Introduction to Econometrics
James H. Stock
HARVARD UNIVERSITY
Mark W. Watson
PRINCETON UNIVERSITY
~
...
Boston San Francisco N ew York
London Toronco Sydney ToJ..:yo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Brief Contents
PART ONE
CHAPTE R I CHAPTER 2 CHAPTER 3
Introduction and Review
Economic Questions and Data ReviewofProbability Review of Statistics 17
65
3
PART TWO
CHAPT ER 4 CHAPTER 5
Fundamentals of Regression Analysis
Linear Regression with One Regressor 1II
109
Regression with a Singl e Regressor: Hypothesis Tests and
Confidence In tervals 148
Linear Regression with Multiple Regressors Hypothesis Tests and Confidence Intervals
in Mult ipl e Regression 220
No nlinear Regression Functions 254
312
Assessing Studies Based on Multiple Regression 186
CHAPTER 6 CHAPTER 7
CHAPTER 8 CHAPTER 9
PART THREE Further Topics in Regression Analysis
CHAPTER 10 CHAPTER II C HAPTE R 12 CHAPTER 13
347
383
Regression with Panel Data
349
421
468
Regression with a Binary Depe ndent Variable Instrumenta l Variables Regression Ex periments and Quasi Experiment s
PART FOUR
CHAPTER 14 CHAPTER 15 C HAPTER 16
Regression Analysis of Economic
Time Series Data 523
Introduction to T ime Series Regression and Forecasting Estimation of Dynamic Causal Effects 591
637
Additional Topics in Time Series Regression 525
PART FIVE
CHAPTER 17
The Econometric Theory of Regression Analysis
The Theory of Linear Regression wit h One Regressor 704
675
677
CHAPTER 18 The T heory of Multiple Regression
v
Contents
Preface
XXVIl
PART ONE
CHAPTER I
1.1
Introduction and Review
Economic Questions and Data 3
Economic Questions We Examine 4
Question # 1: Does Reducing Class Size Improve Elementary School
Education? 4
Question # 2: Is There Racial Discrimination in the Market for Home Loans? Question # 3: H ow Much Do Cigarette Taxes Reduce Smoking? 5
Question #4: W hat Will the Rate of Inflation Be Next Year? 6
Quantitative Questions, Quantitative Answers 7
5
1.2
Causal Effects and Idealized Experime nts
Estimation of Causal Effects 8
Forecasting and Causality 9
8
1.3
Data: Sources and Types
10
10
Expe rimental versus Observational Data CrossSectional Data [ I
Time Series Data II
Panel Data 13
CHAPTER 2
2.1
Review of Probability
17
18
Random Variables and Probability Distributions
Probabilities, the Sample Space, and Random Variables 18
Probability Distribution of a Discrete Random Variable 19
Probability Distribution of a Continuous Random Variable 2 1
2.2
Expected Values, Mean, and Variance
23
The Expected Value of a Random Variable 23
The Standard Deviation and Variance 24
Mean and Variance of a Linear Function of a Random Var iable Other Measures of the Shape of a Distribution 26
2.3
25
Two Random Variables
29
29
Joint and Marginal Distributions Conditional Distributions 30
vii
viii
CO NTENTS
Independence 34
Covariance and Correlation 34
The Mean and Variance of Sums of Random Variables
2.4
35
The Normal, ChiSquared ,Student t, and F Distributions
The Normal Distribution 39
The ChiSquared Di stribution 43
The Student t Di stribution 44
The F Distribution 44
39
2.5
Random Sampling and the Distribution of the Sample Average
Random Sampling 45
The Sampling Distribution of the Sample Average 46
45
2.6
LargeSample Approximations to Sampling Distributions
The Law of Large Numbers and Consi stency The Central Li mit Theorem 52
APPENDIX 2.1
48
49
63
Derivation of Results in Key Concept 2.3
CHAPTER 3
Review of Statistics
65
66
3.1
Estimation of the Population Mean
Esti mators and Their Properties 67
Properties of Y 68
The Importance of Random Sampling 70
3.2
Hypothesis Tests C oncerning the Population Mean
71
Null and Alternative Hypotheses 72
The p Value 72
Calculating thepValue When O' y Is Known 74
The Sample Variance, Sample Standard Deviation , and Standard Error Calculating thepValue When O'y Is Unknown 76
The tStatistic 77
Hypothesis Testing with a Prespecifled Significance Level 78
OneSided Alternatives 80
3.3 3.4
75
Confidence Interva ls for the Population Mean Comparing Means from Different Populations
81
83
84
Hypothesis Tests for the Difference Between Two Means 83
Confidence Intervals for the Difference Between Two Population Means
3.5
DifferencesofMe an s Estimat ion of Causa l Effects
Using Experiment al Data 85
The Causal Effect as a Difference of Conditional Expectations 85
Estimation of the Causal Effec t Using Differences of Means 87
CO NTENTS
ix
3.6 Using the tSt at istic W hen the Sample Size Is Small 88
T he tStatistic and the Student t Distribution 88
Use of the Student t Distribution in Practice 92
3.7 Scatterplot, the Sample Covariance, and the Sample Correlation 92
Scatterp[ots 93
Sample Covariance and Correlation 94
APPENDIX 3 . 1 APPENDIX 3.2 APPENDIX 3.3
The U.S. Current Population Survey
[05
Two Proofs That Y [s the Least Squares Estimator of J.Ly A ProofThat the Sample Variance Is Consistent [07
[06
PART TWO
CHAPTER 4
Fundamentals of Regression Analysis
linear Regression w ith One Regressor 112
III
109
4. 1 The Linear Regression Model
4. 2
Estimating the Coefficients of the Linear Regression Model
T he Ordinary Least Squares Estimator [18
OLS Estimates of the Re [ationsh ip Between Test Scores and the
StudentTeacher Ratio 120
Why Use the OLS Estimator? 12 1
116
4.3
Measures of Fit
123
The R2 [23
The Standard Error of the Regression 124
Application to the Test Score Data 125
4.4 T he Least Squares Assumptions 126
Assumption #1: The Conditional Distribution ofu, Given X, Has a Mean
of Zero [26
Assumption #2: (X,. Y,). i n Are Independent[y and Ide ntically
Distributed [28
Assumption #3: Large Outliers Are Unlike[y [29
Use of the Least Squares Assumptions 130
=[.. . ..
4.5 The Sampling Distribution of the OLS Estimators T he Sampling Distribution of the OLS Estimators
4.6 Conclusion
AP P E NDI X 4 . 1 A PPEND IX 4. 2 A PP EN D IX 4 . 3
131
132
135
The California Test Score Data Set Derivation of the OLS Estimators [43 143
144
Sampli ng Di stribution of the O LS Estimator
x
CO NTENTS
CHAPTER 5
Regression with a Single Regressor: Hypothesis Tests
and Confidence Intervals 148
Regression Coefficients 149
5.1 Testing Hypotheses About One of the
Tw oSided H ypotheses Concerning f31 149
O neSided H ypotheses Concerning f31 153
Testing Hypot heses About the Intercept f30 155
5.2
II
Confidence Intervals for a Regression Coefficient Regression W hen X Is a BinaryVari able
Inte rpretation of the Regression Coefficients
ISS
I.
I
5.3
158
158
5.4
Heteroskedasticity and Homoskedasticity
160
W hat Are Heteroskedasticity and H omoskedasticity? 160
Mathematical Implications of Hom oskedasticity 163
W hat Does This Mean in Practice? 164
5.5 The Theoretical Foundations of Ord inary Least Squares 166
Linear Cond itionally Unbiased Estimators and the Ga ussMarkov r heorem Regression Estimators Other Than OLS 168
5.6 Using the tStatistic in Regression When the Sample Si z~
167
Is Small
169
The tStatistic a nd the Student t Di stribution 170
Use of the Student t Distribution in Practice 170
5.7 Conclusion
171
180
APPE N DI X 5 .1 Form ulas for OLS Standard Errors APPEND IX 5.2
The GaussMarkov Cond itions and a Proof
of the GaussMarkov Theorem 182
CHAPTER 6
6.1
Linear Regression with Multiple Regressors Omi tted Variable Bias 186
186
Definition of Omitted Variable Bias 187
A Form ula for Omitted Va riable Bias 189
Addressing Omitted Variable Bias by Dividing the Data into Groub s
6.2
19 1
The Multiple Regression Mode l
193
194
The Population Regression Line 193
The Popu lation Multiple Regre ssion Model
6.3
The OLS Estimator in Multiple Regression
196
198
The OLS Esti ma tor 197
Appl ication to Test Score s and the Stude ntTeacher Ratio
CONT ENTS
xi
6.4 Measures of Fit in Multiple Regression
200
T he Standard Error of the Regression (SER) 200
The R2 200
The 'I\djusred R2" 20 I
Applicat ion to Test Scores 202
6.5 The Least Squares Assumptions in Multiple Regression
Ass umption # 1: The Conditio nal Distributi on of Given X I, ' X 2i , Mean of Zero 203
Assumption # 2: (XI, ' X 2" . . . , Xiu ' Y,) i I, ... ,n Are i.i.d. 203
As sum ption # 3: Large Outliers Are Unlikely 203
Assumpti on # 4: No Perfect Mul ticollinearity 203
u ,
202
. ..
,Xk, H as a
=
6.6 The Distribution of the OLS Estimators
in Multiple Regression
205
206
6.7 Mul t icollinearity 206
Examples of Perfect Multicollineari ty Impe rfect Multicollinearity 209
6.8 Conclusion
APP ENDIX 6 . 1 APPEN DIX 6 .2
210
Derivation of Eq uation (6 .1) 218
Di stribution of the OLS Estimators W hen There Are Two
Regressors and H omoskedastic Errors 218
CHAPTER 7 Hypothesis Tests and Confidence Intervals
in Multiple Regression fo r a Single Coeffic ie nt
220
221
7.1 Hypothesis Tests and Confidence Intervals
Standard Errors for the OLS Estimators 22 1
Hypothesis Tests for a Si ngle Coeffi cient 22 1
C onfidence Intervals for a Single Coefficient 223
Application to Test Scores and the Studen t Teacher Ratio
7.2 Tests of Joint Hypotheses 225
Testi ng H ypotheses on Tw o or More Coefficients 225
T he FStatistic 227
Applicatio n to Test Scores and the Student Teacher Ratio T he Homoskedastici ryOnly FStatistic 230
22 3
229
7.3 Testing Si ngle Restrictions
Involving Multiple Coefficient s
232
234
7.4 Confidence Sets for Multiple Coefficients
xii
CONTE NTS
7.S
Model Specification for Multipl e Regression
235
Omitted Variable Bias in Multiple Regression 236
Model Specification in Theory and in Practice 236
Interpreting the R2 and the Adjusted R2 in Practice 237
7.6 7.7
Analysis of the Test Score Data Set Conclusion 244
239
APPE ND IX 7. 1 The Bonferroni Test ofa Joint Hypotheses
25 1
CHAPTER 8
Nonlinear Regression Functions 8.1
254
A General Strategy for Modeling Nonlinear
Regression Functions 256
Test Scores and District Income 256
The Effect on Y of a Change in X in Nonlinear Specifications 260
A General Approach to Modeling Nonlinearities Using Multiple Regression
264
8 .2
Non linear Functions of a Single Independent Variable
264
Polynomials 265
Logarithms 267
Polynomial and Logarithmic Models of Test Scores and District Income 8.3
275
Interactions Between Independent Variables
277
280
Interactions Between Two Binary Variable s 277
Interactions Between a Continuous and a Binary Variable Interactions Between Two Continuous Variables 286
8.4
Nonli ne ar Effects on Test Scores of the StudentTeacher Ratio
Discussion of Regression Results Summary of Findings 295
291
290
8. 5
Conclusion
APPENDI X 8.1
296
Regression Functions That Are Nonlinear
in the Parameters 307
CHAPTER 9
Assessing Studies Based on Multiple Regression
312
9.1
Internal and Exte rna l Validity
Threats to Internal Validity 313
Threats to External Validity 314
31 3
9.2
Threats to Internal Va lidity of Multiple Regression Analysis
Omitted Variable Bias 316
Misspecification of the Functional Form of the Regression Function Errorsin Variables 319
Sample Se lection 322
316
319
CONTE NTS
xiii
Simulta neous Causa li ty 324
Sources of Inconsistency of OLS Standard Errors
325
9.3 Internal and External Validity When the Regression Is Used
for Forecast ing
327
328
Using Regression Models for Forecasting 327
Asse ssing the Validity of Regre ss ion Models for Foreca sti ng
9.4 Example: Test Scores and Class Size External Validity 329
Internal Validity 336
Discu ss ion and Implication s 337
9.S Conclusion
APPEND IX 9.1
329
338
The Massachu setts Elementary School Testing Data
344
PART THREE Further Topics in Regression Analysis
CHAPTER 10
347
Regression with Panel Data
349
351
10. 1 Panel Data 350
Example: Traffic Deaths and A lcohol Taxes
10.2 Panel Data with Two T ime Pe riods: "Before and After"
Comparisons
353
10.3 Fixed Effects Regression 356
The Fixed Effects Regression Mode l 356
Estimation and Inference 359
Application to Traffic Death s 360
10.4 Regression with Time Fixed Effects 361
Time Effects Only 36 1
Both En tity and Time Fixed Effects 362
10.S The Fixed Effects Regression Assumpt ions and Standard Errors
for Fixed Effects Regression
364
The Fixed Effects Regression Assumptions 364
Standard Errors for Fi xed Effects Regression 366
10.6 Drunk Driving Laws and Traffic Deaths 10.7 Conclusion
367
371
378
APPENDIX 10 . 1 The State Traffic Fatality Data Set APPENDIX 10 . 2
Standard Errors for Fixed Effects Regression
with Serially Correlated Errors 379
xiv
CONTENTS
CHAPTER II
11.1
Regression with a Binary DependentVariable
Binary Dependent Variables 385
The Linear Probability Model 387
383
384
Binary Dependent Variables and the Linear Probability Model
11.2
Probit and Logit Regression
Probit Regression 389
Logit Regression 394
389
Comparing the Linear Probability, Probit, and Logit Models
11 .3
396
Estimation and Inference in the Logit and Probit Models
Nonlinear Least Squares Estimation 397
Maximum Likelihood Estimation 398
Measures of Fit 399
396
11.4 11.5
Application to the Boston HMDA Data Summary 407
400
APPENDIX 11 . 1 The Boston HMDA Data Set APPENDIX 11 . 2 APPEND IX 11. 3
415
415
418
Maximum Likelihood Estimati on
Other Limited Dependent Variable Models
CHAPTER 12
12.1
Instrumental Variables Regression
421
The IV Estimator w ith a Single Regressor
and a Single Instrument 422
The IV Model and Assumptions 422
The Two Stage Least Squares Estimator 423
Why Does IV Regression Work? 424
The Sampling Distribution of the TSLS Estimator Application to the Demand for Cigarettes 430
428
12.2
The General IV Regression Model
432
434
TSLS in the General IV Model 433
Instrument Relevance and Exogeneity in the General IV Model The IV Regression Assumptions and Sampling Distribution
of the TSLS Estimator 434
Inference Using the TSLS Estimator 437
Appl ication to the Demand for Cigarettes 437
12.3
Checki ng Instrument Validity
439
Assumption # 1: Instrument Relevance 439
Assumption # 2: Ins trument Exogeneity 44 3
12.4
Application to the Demand for Cigarettes
445
CONTENTS
XV
12.5 Where Do VaJid Instruments Come From?
450
Three Examples
12.6 Conclusion
451
455
462
APPENDIX 12. 1 The Cigarette Consumption Panel Data Set APP EN DIX 12 .2
Derivation of the Formula for theTS LS Estimator in
Equation (12.4) 462
LargeSample Distribution ofthe TSLS Estimator 463
APPEND IX 12 .3 A PPE N DIX 12 .4
LargeSample Di stribu t ion of the TSLS Estimator When the
Instrument Is Not Valid 464
Instrumental Variables Anal ysis with Weak Instruments 466
APPENDIX 12.5
CHAPTER 13
Experiments and QuasiExperiments
Ideal Randomized Controlled Experiments The Differen ces Estimator 471
468
4 70
13.1 Idealized Experiments and Causal Effects
470
13. 2 Potential Problems with Experiments in Practice Threats to Internal Validity 472
Threats to External Validity 47 5
13.3 Regression Estimators of Causal Effects
472
Using Experimental Data
477
The Differences Estimator with Additi onal Regressors 477
The DifferencesinDifferences Estimator 480
Estimation of Causal Effects for Different Groups 484
Estimation When There Is Partial Compliance 484
Testing for Randomization 485
13.4 Experimental Estimates of the Effect of Class Size Reductions Experime ntal Design 486
Analys is of the STAR Data 487
Comparison of the Observational and Expe rimental Estimates
of Class Size Effects 492
13.5 QuasiExperiments 494
Examples 49 5
Econometric Methods for Analyzing QuasiExperiments
13.6 Potential Problems with QuasiExperiments
486
497
500
Threats to Internal Validity 500
Threats to External Validity 502
CONTENTS
xv
12.5 Where Do Valid Instruments Corne From? Three Examples 451
12.6 Conclusion
450
455
462
APPENDIX 12 . 1 The Cigarette Consumption Panel Data Set AP PENDIX 12.2
Derivation of the Formula for theTSLS Estimator in
Equation (12.4) 462
LargeSample Distribution oftheTSLS Estimator 463
APPENDIX 12.3 APPENDIX 12.4
LargeSample Distribution of the TSLS Estimator W hen the
Instrument Is Not Valid 464
Instrumental Variables Analysis with Weak In struments 466
APPENDIX 12.5
CHAPTER 13
Experiments and QuasiExperiments
468
13.1 Idealized Experiments and Causal Effects 470
Ideal Randomized Controlled Experiments 470
The Differen ces Estimator 471
13.2
Potential Problems with Experiments in Practice
Threats to Internal Validity 472
Threats to External Validity 475
472
13.3 Regression Estimators of Causal Effects
Using Experimental Data 477
The Differences Estimator with Additional Regressors 477
The Diffe rencesinDifferences Esti mator 480
Estimation of Causal Effects for Different Groups 484
Estim ation When There Is Partial Compliance 484
Testing for Randomization 485
13.4
Experimental Estimates of the Effect of Class Size Reductions
Experime ntal Design 486
Analysis of the STAR Data 487
Comparison of the Observational and Experimental Estimates
of Class Size Effects 492
486
13.5 QuasiExperiments 494
Examples 495
Econometric Me thods for Analyzing QuasiExperiments
13.6 Pot ential Problems with QuasiExperiments Threats to Internal Val idity 500
Threats to External Validity 502
497
500
xvi
CONTENTS
13.7
Experimental and QuasiExperimental Estimates in Heterogeneous Populations 502
Population H eteroge neity: Whose Causal Effect ? 502 OLS w ith Heterogeneous Causal Effe cts 503 IV Regression wi th Heterogeneous Causal Effects 50
13.8
Conclusion
A PP EN DIX 13.1 A PPEND IX 13. 2
507
The Project STAR DataS et 516
Extension of the Differencesi n Differences Estimator to Multiple Time Periods 517 Conditional Mean Independe nce 518
A PPE N D IX 13.3 APP EN DIX 13.4
I
Individuals
IV Estimation W hen the Causal Effect Varies Across 520
I
PART FOUR
14.1 14.2
Regression AnaJysis of Economic Time Series Data
Using Regression Mode ls for Forecasting 527 528
528
523
CHAPTER 14 Introduction to Time Series Regression and Forecasting
525
Introduction to T ime Series Data and Serial Correlation
The Rates of Inflation and Unempl oyme nt in the United States Lags, Fi rst Differences, Logarithm s, and Growth Rates 528 Autocorrelation 532 Other Examples of Economic Time Series 533
14.3
Autoregressions
535
The First Ord er A utoregressive Mode l 535 The p th Order Autoregressive Model 538
14.4
Time Series Regression with Addit ional Predictors and the Autoregressive Distributed Lag Model 541
Forecasting Change s in the Inflation Rate Using Past Unemployment Rates 541 Stationarity 544 Time Series Regre ssion w ith Multiple Predictors 545 Forecast Uncertainty and Forecast Intervals 548
14.5
Lag Length Selection Using Informat ion Criteria
549
553
Determining the Order of an Autoregression 55 1 Lag Le ngth Se lection in Time Series Regression W it h Mult iple Predictors
14.6
Nonstationarity I: Trends
What [s a Tre nd?
P ....... I ~... .."..~
554
" 1:; 7
555
I..... ~
r.., ........ .....I
c:: . __ l... .... ;" ;... T ...,.. .... ...I r
CONTENTS
xvii
De tec ting Stochas tic Tre nds : Testing for a Unit A R Roo t 560 Avoiding the Pro ble ms Caused by Stochastic Trends 564 14.7 Nonstationarity II: Breaks 565
W hat Is a Break? 565
Testi ng fo r Breaks 566
Pse udo Outo fSample Fo recasting 571
Avo iding the Proble ms Caused by Breaks 576
14.8 Conclusion
AP P ENDI X 14 . 1 A PPE NDI X 14 . 2 APPENDIX 14 .3 APPENDIX 14.4 A P PEN D IX 14.5
577
Time Series Data Used in Chapter 14 Stationarity in the A R(I ) Model Lag Operator Notation ARM A Models 589 588 586 586
Consistency of the BIC Lag Length Estimator 589
CHAPTER 15
1 S.l 1 S.2
Estimation of Dynamic Causal Effects An Init ial Taste of the O range J uice Data Dynamic Causal Effects 595
596
Causal Effects and Time Series Data Tw o Types ofExogenei ty 598
591 593
t 5.3
Estimation of Dynamic Causal Effects with Exogenous Regressors
The Distributed Lag Model Ass umptions 601 Au tocorrelated U t , Standard Errors, and Infe rence 60 I Dynamic Multi pli e rs and Cumulative Dynamic Mul t ipl iers
600
602
15.4 H et eroskedasticity and Autocorrelat ionConsistent Stan dard
Er rors
604
604 606
DIstribution of the OLS Estimator w ith Autocorre la ted Errors H AC Standard Errors
15.5 Estimation of Dynamic Causal Effects with Strict ly
Exogenous Regressors 608
The Distributed Lag Mode l wit h A R(I) Errors 609 OLS Estimation of the ADL Model 6 12 GL S Estimation 613 The Distributed Lag Model w ith Additional Lags a nd A R(p) Errors 15.6 Orange J uice Prices an d Cold Weather 15.7 Is Exogeneity Plausible? Some Examples U.S. Income a nd Aus tra lian Exports 625
Oil Prices and Inflation 626
6 15
618 624
xviii
CONTENTS
Monetary Policy and Inflation The Phillips Curve 627
15. 8
626
Conclusion
APPENDIX 15.1 APPENDIX 15.2
627
The Orange Juice DataSet 634
The ADL Model and Generalized Least Squares
in Lag Operator Notation 634
CHAPTER 16
16.1
Additional Topics in Time Series Regression Vector Autoregressions
638
637
The VAR Model 638
A VAR Model of the Rates of Inflation and Unemployment
16.2
641
Multiperiod Forecasts
642
Iterated Muliperiod Forecasts 643
Direct Multiperiod Forecasts 645
Which Method Should You Use? 647
16.3
O rders oflntegration and the DFGLS Unit Root Test
Other Models ofTrends and Orders of Integration 648
The DFGLS Test for a Unit Root 650
Why Do Unit Root Tests Have Nonnormal Distributions?
648
653
16.4
Cointegration
655
658
Cointegratlon and Error Correction 655
How Can You Tell Whether Two Variables Are Cointegrated 7 Estimation of Cointegrating Coefficients 660
Extension to Multiple Cointegrated Variables 661
Application to Interest Rates 662
16.S
Volatility Cl ustering and Autoregressive Conditional
Heteroskedasticity 664
Volatility Clustering 665
Autoregressive Conditional Heteroskedasticity Application to Stock Price Volatility 667
666
16.6
Conclusion
669
674
AP PENDIX 16 . 1 U.S. Financial Data Used in Chapter 16
PART FIVE
CHAPTE R 17
The Econometric Theory of Regression Analysis
The Theory of Linear Regression with One Regressor
The Extended Least Squares Assumption s The OLS Estimator 680
678
675
677
678
17.1
The Extended Least Squares Assumptions and the OLS Estimator
CON TENTS
xix
17.2
Fundamentals of Asymptotic Distribution Theory
680
Convergence in Probability and the Law of Large Numbers 681
The Centra l Limit Theorem and Convergence in Distri bution 683
Slutsky's Theore m and the Contin uous Mappi ng Theorem 685
Application to the tStatistic Based on the Sample Mean 68 5
17.3
Asym ptotic Distribut ion of the OLS Estimator and tStatist ic
686
Consistency and Asymptotic Normality of the OLS Estimators 686
Consistency of H eteroskedasti cityRo bust Standard Errors 686
Asymptotic Norma lity of the H eteroskedasticityRobust tStatistic 688
17.4
Exact Sampling Dist ributions When the Errors
Are Normally Distributed 688
Distribution of ~ J with No rma l Errors 688
Distribution of the Homoskedasticityonly tStatistic 690
17.5
Weigh ted Least Squares
691
W LS with Known Heteroske dasti city 69 1
W LS w ith H eteroskedasticity of Know n Functional Form 692
H eteroskedasticityRobust Standard Errors or W LS? 695
AP P ENDI X 17. 1 The Normal a nd Re lated Distributions and Moments of
Contin uous Random Variables
A PP ENDI X 17. 2
700
702
Two Inequa lit ies
C HAPTER 18 The Theory
18.1
of Multiple Regression
704
The Linear Multipl e Regressi on Model and OLS Estimator
in Matrix Form 706
The Mul tiple Regressio n Model in Matrix Notation The Extended Least Squares Assu mptions 707
The O LS Estimator 708
706
18.2
Asymptot ic Distribution of the OLS Estimator and tStatistic
The Mul tivariate Central Limit Theorem 7 10
Asymptotic Normali ty of ~ 710
Hete roskedastic ityRobust Standard Errors 711
Co nfidence Intervals for Predicted Effects 712
A symptotic Di stri but ion of the tStatistic 7 13
710
18.3
Tests of Join t Hypoth eses
713
Join t H ypotheses in Matrix Notation 713
Asymptotic Di stribution of the FStatistic 7 14
Confidence Sets for Mult iple Coefficients 71 4
18.4
Distribution of Regression Statistics w ith Normal Errors
Matrix Represe ntations ofO LS Regression Statistics Di stribution of ~ w ith Nor mal Errors 7 16
7 15
715
xx
CONTEN TS
Distribution of S$ 717
HomoskedasticityOnly Standard Errors Distribution of the tStatistic 718
Distribution of the FStatistic 718
717
18. 5
Efficiency of the OLS Estimator with Homoskedastic Errors
The GaussMarkov Conditions for Multiple Regression 719
Linear Conditionally Unbiased Estimators 719
The GaussMarkov Theorem for Multiple Regression 720
719
18.6 Generalized Least Squares 721
The GLS Assumptions 722
GLS When n Is Known 724
GLS When n Contains Unknown Parameters 725
The Zero Conditional Mean Assumption and GLS 725
18.7 Instrumental Variables and Generalized Method
of Moments Estimation
727
The IV Estimator in Matrix Form 728
Asymptotic Distribution of the TSLS Estimator 729
Properties ofTSLS When the Errors Are Homoskedastic 730
Generalized Method of Moments Estimation in Linear Models 733
APPENDIX 18, I APPENDIX 18,2 APPENDI X 18,3 APPE N D IX 18 .4
Summary of Matrix Algebra Multivariate Distributions
743
747
748
Derivation of the Asymptotic Distribution of {3
Normal Errors
APPENDIX 18,5
Derivations of Exact Distributions of OLS Test Statistics with
749
Proof of the GaussMarkov Theorem
for Multiple Regression 751
Proof of Selected Results for IV and GMM Estimation 752
APPENDIX 18.6
Appendix 755
References 763
Answers to "Review the Concepts " Questions Glossary 775
Index 783
767
Key Concepts
PART ONE
1.1
Introduction and Review
15
CrossSectional, Time Series, and Panel Data Expected Value and the Mean
2.1
2.2 2.3 2.4 2. 5 2.6 2 .7 3. , 3.2 3.3 3. 4 3.5 3.6 3. 7
24
25
Variance and Standard Deviation
Means, Variances, and Covariances of Sums of Random Variables Computing Probabil ities Involving Normal Random Variables Simple Random Sampling and i.i.d. Random Variables The Central Limit Theorem Estimators and Estimates Efficiency ofY: Y is BLUE The Standard Error ofY 67 47
38
40
50
Convergence in Probability, Consistency, and the Law of Large Numbers
55
Bias, Consistency, and Efficiency
68
70
79
1: ILY,0
76
80
The Terminology of H ypothesis Testing
Testing the Hypothesis E(Y) = J.Ly,0 Against the Alternative E(Y) Confidence Intervals for the Population Mean
82
PART TWO
4.1
4.2 4.3 4.4
Fundamentals of Regression Analysis
I 19
109
I 15
Terminology for the Linear Regression Model with a Single Regressor The OLS Estimator, Predicted Values, and Residuals The Least Squares Assumptions General Form of the tStatistic
131
LargeSample Distributions of.8o and.81
133
1:
5. 1
5.2 5.3 5.4 5.5 6. 1 6 .2 6.3 6.4 6.5
150
f3 1,0
Testing the Hypothe sis f3 1 = f31 ,0 Against the Alternative f31 Confidence Interval for f31 157 Heteroskedasticityand Homoskedasticity
152
162 189 198
T he GaussMarkov Theorem for .81 168 Omitted Variable Bias in Regression with a Single Regressor The Multiple Regress ion Model
196
The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model The Least Squares Assumptions in the Multiple Regression Model Large Sample Distribution of .80' .81' ... , .8k
204
206
xxi
xxii
KEY CONC EPTS
7.1 7. 2 7.3
Testi ng the H ypothe sis f3i = {3,.o Against the A lternat ive f3, 7= !3j.o 222
Confidence Intervals for a Single Coefficient in Multiple Regression 223
O mitted Variable Bias in Multiple Regression 237
7.4
8. 1 8.2 8 .3 8.4 8.5 9.1 9 .2 9 .3 9 .4 9 .5 9 .6 9.7
R2 and R2: W hat T hey Tell You and What They Don't 238
The Expected Effect on Y of a Change in X, in the Nonlinear Regression Model (8 .3) Logarithms in Regression: Three Cases 273
279
282
A Method fo r Interpreting Coefficients in Regressions w ith Bi nary Variables Interactions Between Binary and Continuous Variables Inte racti ons in Mu ltiple Regression Internal and External Validity 313
318
319
287
26 1
Omitted Var 'iable Bias: Should I Include More Variables in My Regression? Functional Form Misspecification ErrorsinVariables Bias Sam ple Se lection Bias 321
323
325
327
Simultaneous Causal ity Bias
Threats to the Internal Validity of a Multiple Regression Study
PART THREE Further Topics in Regression Analysis
10. 1 10.2 10.3 11. 1 11.2 11. 3 12.1 12. 2 12 .3 12.4 12 .5 12 .6 Notation for Panel Data 350
359
365
The Fixed Effects Regression Model The Linear Probability Model Logit Regression 394
388
347
The Fixed Effects Regre ssion Assumptions
The Probit Model, Predicted Probabilities, and Estimated Effects
392
433
The General Instrumenta l Variables Regression Model and Terminology Two Stage Least Squares 435
436
44 1
444
437
The Tv" 0 Condi t ions for Valid Instruments The IV Regression Assumptions
A Rule ofThumb for Checking for Weak Instruments The O veridentifying Restrictions Test (the }Statistic)
PART FOUR
14 .1 14. 2 14. 3 14.4 14.5
Regression Analysis of Economic Time Series Data
530
532
523
Lags , First Differences, Logarithm s, and Growth Rates Autocorre lation (Serial Correlation) and Autocovariance Autoregressions Stationarity 539
544
The A utoregressive Distributed Lag Model 545
KE Y CON CEPTS
xxiii
14. 6 Time Se nes Regression w ith Multiple Predictors 546
14.7 Gra nger Causal ity Tests (Te sts of Predictive Content) 547
14.8 The Augmented Dickey Fuller Test for a Unit Autoregressive Root 14 .9 The Q LR Test for Coefficient Stabil ity 14. 10 Pse udo OutofSample Forecasts 15.1 15 .3 15.4 16. I 16.2 16.3 16.4 16.5 572
600
602
6 17
569
562
The Distributed Lag Model and Exogeneity
15 .2 The Distributed Lag Model Assumptions
H AC Standard Errors 609
Estimation of Dynam ic Multipliers Under Strict Exoge nei ty Vector Autoregressions 639
645
650
Iterated Multiperiod Forecasts
Direct Multiperiod Forecasts 647
Orders of Integration, Differe ncing, and Stat ionarity Cointegration 658
PART FIVE
17 .1 18.2 18. 3 18.4
The Econometric Theory of Regression Analysis
675
680
707
The Extended Least Squares Assumptions for Regression w ith a Single Regressor The Multivariate Central Li mitTheorem The GLS Assumption s 723
710
721
18. 1 The Extended Least Squares Assumption s in the Multiple Regression Model GaussMarkov Theorem for Multiple Regression
G eneral Interest Boxes
The Distribution of Earnings in the United States in 2004 A Bad Day on Wall Street 42
Landon Wins I 7[
35
The Gender Gap of Earnings of College Graduate s in the United States A Novel Way to Boost Retire ment Savings 90
The "Beta" ofa Stock 122
86
The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity? 165
The Mozart Effect: Omitted Variable Bias? 190
The Returns to Education and the Gender Gap The Demand for Economics Journa[s 288
323
407
425
Do Stock Mutua[ Funds Outperform the Market? Who Invented Instrumental Variable Regression? A Scary Regre ssion 426
The Externa[ities of Smoking The Hawthorne Effect 474
498
446
284
James Heckman and Danie[ McFadden , Nobel Laureate s
What [s the Effect on Employment of the Minimum Wage? Can You Beat the Market? Part I 540
The River of Blood 550
573
Can You Beat the Market? Part 11
N EWS FLASH : Commodity Traders Send Shivers Through Di sney World Robert Engle and Clive Granger, Nobel Laureates 657
625
Preface
E
conometrics ca n be a fun course for both teacher a nd student. Th e real world of economics, business, and government is a complicated and messy place. full of competing ideas and quest ions tha t dem and answe rs. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can yo u make money in the stock market by buying when prices are historically low, relative to earnings. or should yo u just sit tight as the ran dom walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes. or should we simply have our children listen to Mozart for tcn min utes a day? Econometrics helps us to sort o ut sound ide as from crazy ones and to find qu an titative answers to important quantitative questions. E conometrics opens a wi n dow on our complicated world that lets us see the relationships on which people. businesses, and governments base their decisions. This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. TIlis simple principle represents a significant departure from the older generation of econ ometrics books, in which theoretical models and ass um ptions do not match the app lications. It is no wonder tha t some students question the rel evance of econometrics after they spend much of their time learni ng assumptions that th ey subsequently realize are unrealistic, so that they must then learn "sol u tions" to " problems" that arise when the applications do not match the assump t ions. We believe that it is far better to motivate the need for tools with a concrete application, and thell to provide a few simple assumptions that match the appli catio n. Because the the ory is immedia tely relevant to tbe applica tions, this approach can make econometrics come alive. TIle second edition benefits from the many constructive suggestions of teach ers who used the first edition, while maintaining the philosophy tbat applications should drive the theory, not the other way around. The single greatest change in the seco nd edition is a reo rgani za tion and expan sion of the material o n core regression analysis: Part II, wh ich covers regression with crosssectional data, has been expanded from fo ur chapters to six. We have added new empirical exampl es (as boxes) drawn from economics and finance; some new optional sectio ns on
xxviii
PREFACE
classical regression theory; and many new exercises, both paperandpencil and computerbased empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be fo und on page xxxii.
Features of This Book
This textbook differs from others in three main ways. First, we integrate realworld questions and data into the development of the theory. and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of top ics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.
Realworld Questions and Data
We organize each methodological topic around an important realworld question that demands a specific numerical answer. For example, we teach singlevariable regression, multiple regression , and functional form analysis in the context of esti mating the effect of school inputs on school outputs. (Do sma ller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as th e empirical appli cation for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework.Thus the instruc tor can focus on teaching econometrics, not microeconomics or macroeconomics. We treat all our empirical applications seriously and in a way that shows stu dents how they can learn from data but at the same time be selfcritical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications an d thereby to assess whether their substantive findings are robust. The questions asked in the empirical appli cations are important, and we provide serious and, we think. credible answers. We encourage students and instructors to disagree, however, and invite th em to reanalyze the data , which are provided on the textbook's companion Web site (www.awbc.comlstock_ watson).
PREFACE
xxix
Contemporary Choice ofTopics
Econometrics has come a long way in the past two decades. The topics we cover reflect the best of contemporary appli ed econometrics. O ne can only do so much in an introductory course. so we focus on procedures and tests tha t ar e commo nly used in practice. For example:
• Instrumental variables regression. We present instrumental variables regres
sion as a general me thod for handJing correlation be tween the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrum enl  exo geneity and relevanceare given equal billing. We follow that presentation with an extended discussion of where instruments come fro m, and with tests of overidentifying restrictions and d iagnostics fo r weak instrumentsa nd we explain what to do if these diagnostics suggest problems.
• Program evaluation. An increasing number of econometric studies analyze
either randomized controlled experiments or quasiexperiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables. simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasiexperimental data .
• Forecasting. The chapter on forecasting (Chapter 14) considers univariate
(autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reli able tools, such as autoregressions and model selection via an information cri terion, that work well in practice. This chapter also fe atures a practica lly oriented treatment of stochastic trends (unit roo ts), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo o Ulofsample forecasting, all in the context of developing stable and reliable time series fore casting models.
• Time series regression. We make a clear distinction between two very differ
ent applications of time series regression: forecasting and estima tion of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estima tion methods, including generalized least squares, will o r will not lead to valid causal infer ences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity and autocorrelationconsistent standard eITors.
xxx
PREFACE
Theory T hat Matches Applications
Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to und erstand the strengths and limita tions of those tools. We provide a modern treatment in which the (it between tbe ory and applications is as tight as possible, while keeping tlle mathematics at a level th at requires only algebra . Modern empirical applica tions share some comm on characteristics: the da ta sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are co llected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic). These observations lead to important differences between the theoretical development in this textbook and other textbooks.
• Largesample approach. Because data sets are large, fro m the outset we use largesample normal approximations to sampling distribu tions for hypothesis testing and confidence intervals. O ur experience is tha t it takes less time to teach the rudiments of largesample approximations than to teach the Student l and exact F distributions, degreesoffreedom corrections, and so forth. This largesample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the largesample approach to hypothesis testing and confidence intervals carr ies directly thro ugh mu ltiple regression analysis, logit and probit, instrumental variables estimation, and time series methods. • Random sampling. Because regressors are rarely fi xed in econometric appli cations, from the outset we treat data on all variables (dependent and inde pendent) as the result of random sampling. This assumption matches our initial applications to crosssectional data; it extends readily to panel and time series data; and because of our largesample approach, it poses no addi tional con ceptual or mathematical difficulties. • H eterosk edasticity. A pplied econometricians routin ely use het eroskedas tieity robust standard errors to elimina te worries about whe ther hetero skedasticity is presen t or not. In thi s book. we move be yond treating heterosked asticity as an exception or a "problem" to he "solved"; instead, we allow for hete roskedasticity fro m the outset and simply use heteroskedasticity
Ii
PREFACE
xxxi
robust standard errors. We present homoskedasticity as a special case that pro vides a theoretical motivation for OLS.
Skilled Producers, Sophisticated Consumers
We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis, but also how to assess the validity of empirical analyses pre sented to them. Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other set tings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errorsinvariables, selection, and simultaneityand ways to recognize these threats in practice. Second, we apply these methods for assessing empirical studies to the empir ical analysis of the ongoing examples in the book. We do so by considering alter native specifications and by systematically addressing the various threats to validity of the analyses presented in the book. Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason , the textbook Web site features data sets, software, and suggestions for empirical exercises of differing scopes. These web resources have been expanded considerably for the second edition.
Approach to Mathematics and Level of Rigor
Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a "high" or a "low" level of mathematics. Parts IIV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts IIV have fewer equations, and more applications, than many introductory econometrics books, and far fewe r equations than books aimed at mathematical sections of under graduate courses. But more equations do not imply a more sophisticated treat ment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students. This said, different students learn differently, and for the mathematically weU prepared students, le arning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that
xxxii
PREfACE
is appropriate lor students with a stronge r mathematical background. We believe that, when the mathema tical chapters in Pa rt V are used in conjuDction with the ma terial in Parts IiV, this book is suita ble for advanced undergraduate or mas ter's level econometrics courses.
Changes to the Second Edition
The cb a nges introduced in th c second edition fall into three ca tegories: more empi rica l exa mples: ex pa nded theoretica l ma terial, especially in the treatmen t of the core regression topics; and additional student exercises.
More empirical examples. TIl e second edition retains tbe empirical examples from the first edition and adds a significan t number of new ones. T11cse additio na l examples include estima tion of the returns to education; infe rence abou t the gen der gap in earnings: the diffic ulty of for ecasting the stock market; and modeli ng the volatility clustering in stock returns. The data se ts for these empirical exam ples are posted on the course Web site. The secoml ed ition also includes more gen eralinterest boxes, for example how sample selection bias ("survivorship bias" ) can produce misleading conclusions about whether actively man aged mutual fun ds actually beat the marker. Expanded theoretical materiaL. The phil osophy of tbis and tbe previous edi tion is that t he modeling assumptions should be motivated by empirical applica tions. For this reason, our th re e basic least sq uares assumpt io ns tha t under pin regression with a single regressor includ e neith er normali ty nor homoskedastic ity, bot h o f which are argu ably the exception in eco nometric a pp li cations. This leads directly to largesample inference using heteroskedasticityrobust standard errors. O ur experie nce is that students do not find th is difficult in fact, wh at they fin d difficult is the traditiona l approach of introducing tile homoskedasticity and normali ty assum ptions, learn ing how to use { and Ftahl es, then being told that wh at they just learned is not reliabl e in applications because of the failure of these assumptions and that these " p roblems" mu st be '·fi xed." But n ot all instruc tors sh are th is view, anti some find it useful to introd uce the homos kedastic no rm al regressi on model. Moreover, even if homosk ed asticity is the exception instead of the rule. assuming homoskedastjcity permits djscllssing the G au ssM arkov theo rem , a key moti vation fo r using ordin ary least squares (OLS). For thcse reasons. the treatm ent of the core regression mate rial has been sig nificantly expanded in the second edition. and now includes sections on the theo retical motivation for OLS (the GaussMarkov theorem). small ;ample inference in the homoskedastic normal model. ,md multicolline <lDty and the dummy vari
PREFACE
xxxiii
able trap. To accommodate these new sec tions, the new empirical exam ples, the new generalin terest boxes. and the man y new exercises, the core regression chap ters have been expanded from two to fo ur: TIle linear regression model with a sin gle regressor and OLS (Chapter 4) ; inference in regression with a single regressor (Chapter 5); the mul1 iple regression model and OLS (Chapter 6) ; and inference in the m ultiple regression model (Cha pter 7). This exp anded and reorganized treatme nt of the core regression material constitutes the single greatest change in the second edi tion . The second edit ion also includes some additional topics req uested by some instructors. One such addition is specificat ion and esti matio n of models tha t are nonli near in the pa rameters (Appendix 8.1). Another is how to compute stan dard errors in panel data regression when the error te rm is serially correlated for a given entity (clustered standard errors; Section 10.5 and A ppendix 10.2) . A third addition is an introduction to current best practices for detecting and han dling weak instruments (Appendix 12.5 ), and a fourth additi on is a treatm ent , in a new final section of the last chapter (Section 18.7) , of efficient estimation in the heteroskedastic linear IV regression mo del using general ized met hod of moments.
Additional student exercises. Th e second edi tion contains many new exer cises, both "pa per and pencil" and empirical exercises that involve the use of data bases, supplied on the course Web site, and regression software. T he data section of the course Web site has been significantly enhanced by the addition of numer ous databases.
Contents and Organization
There are fiv e parts to the textboo k. This textbook assumes t hat th e student has had a cou rse in probability and statistics, although we review that material in Part I . We cover the core material of regression analysis in Part n. Parts Ill, IV, and V prese nt additional topics that build on the core trea tme nt in Pa rt II.
Part I
Chapter 1 introduces econometrics an d str esses the importa nce of providing quan titative answers to q uantitative questions. I t discusses the concept of causality in statistica l studies and surveys the d ifferent types of data encountered in econo metrics. Material from probabi li ty and statistics is reviewed in Chap ters 2 an d 3, respectively: whether these chapters are taught in a given course, or simply pro vi ded as a reference. depends o n tbe background of th e students.
xxxiv
PREFACE
Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence inter vals in the regression model with a single regressor. I n Chapter 6, slUdents learn how they can address omitted variable bias using multiple regression, thereby esti mating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including FteslS, and confi dence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that t he parameters can be estimated by OLS). In Chapter 9, students step back and learn how to iden tify the stren gths and limitations of regression studies, seeing in the process how to apply the concepts of intern al and external validity.
Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental vari ables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data fro m experiments and quasi , or natural, experi ments, topics often referred to as "program evaluation."
Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecast ing and introduces various modern tools for analyzing time series regressions such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.
Part V
Part V is an introduction to econometric theory. This part is more than an appen dix that fills in mathemat ical details omitted from the text. R ather, it is a selfcon tained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, althougb it does demand a higher level of mathematical sophistication than the rest of the text.
PREFACE
xxxv
TABLE I Guide to Prerequisites for SpecialTopic Chapters in Parts III, IV, and V
Prerequisite parts or chapters
I
I I
Part I
I
I
Part II
I
I
I
I
Part III
10. 1, 10.2
I
I
14. 1 14.4
Part IV
14.5 14.8
IS
Part V
Chapter
13
I
47, 9
I
8
12.1, 12.2
I
I
17
I

1
10 11 12.1. 12.2 12.312.6  13
X' X'
xa
I
X· X' X·
c
15 16 17
 L4 
X· X' X·
xa
X· X· X' X' X X
X X X X X
I   b b b
I
X X

X X
X X
xa
X· X X
X
X X
18
X X
X
this table shows the minimum prerequisites needed to cover the material in a given cha!.'er. For example, estimation of
dynam ic causal effects with time series data (Chapter 15) first requires Part I (as neede ,depending on student preparation, and except as noted in footnote a), Part II (except for chapter 8; see footnote bJ, and Sections 14.114.4. _ aChopters 1016 use exclusively large.sa1.le aphroximations to samplinr, distributions, so the optional Sections 3.6 (the Student , d istribution for testing means) an 5 .6 (t e Student r distribution or testing regression coefficients) can be skipped. bChapters 14 16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
'
I
Chapter 18 presents and studies the multiple regression model, instrumental vari ables regression, and generaliz ed method of m oments es tima tion of the linear model, all in matrix form.
Prerequisites Within the Book
Because different instructors like to emphasize different material, we wrote tltis book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV. and V are "standalone" in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table 1. Although we have foun d that the sequence of top ics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instTuctors to present topics in a di(ferent order if they so desire.
xxxvi
PREFACE
Sample Courses
This book accommodates several dillerent course structures.
Standard Introductory Econometrics
This course inlroduces econometdcs (Chapter 1) and reviews probability and sta tistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional fOfm analysis. an d L he eval uation of regression studies (all of Part II) . The course proceeds to cover regres sion with panel data (Chapter 10), regression with a limited dependent vari able (Chapter 11), and/or instrumenta l variables regression (Chapter ] 2). as time per mits. The course concludes with experiments and quasiexperiments in Chapter 13, topics that provide a n opportunit y to return to the questions of estimating causal effects raised at the beginning of th e semeste r and to recapitulate core regression methods. Prerequisites: Algebra II and introdu ctory statistics.
Introd uctory Econometrics with Time Series and Forecasting Applications
Like the standard introductory course, this course covers all of Part I (as needed) and all of Part II. Optionally, the course next provides a brief introduction to panel dat a (Sections 10.l and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering fore casting (Chapter 14) and estimation of dynamic causal effects (Chap ter 15). If time permits. the course can incl ude some advanced topics in ti me series analysis such as volatility clustering and conditional heterosked asticity (Section 16.5). Prerequisites: Algebra f1 and introductory statistics.
.;
:
A pplied T ime Series Analysis and Forecasting
This book also can be used for a short course on applied time series an d fore casti ng, for which a course on regression analysis is a prereq uisite. Some time is spent reviewing the tools of basic regression analysis in Part II. depending on stu den t preparation. The course then moves directly to Part IV and works through forecasti ng (Chapter 14), estimation of dynamic causal effects (Chapter 15). and advanced topics in time series analysis (Chap ter 16), incl uding vector autore gTessions and conditional heteroskedasticity. An important component of this co urse is han dson forecasting exercises. ava il a ble to instructors on the book's accompa nying Web site. Prerequisites: A lgehra {J alld hasic introductory econo m etrics or the eqll;valelll .
PREFACE
xxxvii
Introduction to Econometric Theory
This book is also suitable {or an advanced undergraduate course in which the stu dents have a strong mathematical preparation, or for a master's level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmath ematical, applicationsbased treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11 ) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and Generalized Method of Moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), andlor the esti mation of causal effects using time series data and generalized least squares (Chap ter 15 and Section 18.6). Prerequisites: calculus and introduclory statistics. Chapter 18 assumes previous exposure to m atrix algebra .
Pedagogical Features
The textbook has a variety of pedagogical features aimed at helping students to understand, to retain, and to apply the essential ideas. Chapter introductions pro vide a realworld grounding and motivation, as well as a brief road map high lighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight realworld stud. ies that use the met hods or concepts being dis cussed in the text. A numbered Summary concluding each chap ter serves as a help ful framework for reviewing the main points of coverage. The questions in the R eview the Concepts section check students' understanding of the core content, Exercises give more intensive practice working wi th the concepts and techniques introd uced in the chapter, and Empirical Exercises allow the students to apply what they have learned to answer realworld empirical questions. A t the end of the textbook, the References section lists sources for further reading, the Appen dix provides statistical tables, and a Glossary conveniently defines all the key terms in the book.
Supplements to Accompany the Textbook
The on li ne suppleme nts accompanying the Second E d ition of Introduction to Econom etrics inclu de the Solutions Manual, Test Bank (by Manfred W. Keil of Claremont McKenn a College), and PowerPoint Lecture Notes with text figures.
These include data se ts [or all the text examples. Suzanne Cooper provided invalu abl e suggestio ns and detailed comm ents on mul tip le drafts. OUI Instructor's Resource Disk. In addjtio n. many people maJe helpful suggestions on parts of the manuscript close to their area of expertise: Don Andrews (Yale). an d the Solut ions Manual. The Solutions Manua l includes solutions to alllhe end ofchapter exercises. Lf instructors prefer their su pplements on a CDROM.comlstoclcwatson. Our biggest debts of gra tit ude arc to our colleagues at Har va rd and Prin ceton who used early drafts of this book jn tJleir classrooms. As a coteacher with one of the authors (Stock). found at www. Eli Tamer taught from an early draft and also provided belpful commen ts o n the penultimate draft of the book . rep licat ion fil es for empirical resul ts reported in the text. John Bound (Un iversity of Michigan). Graham Elliott (University of Calirornia. provides a wide range of addi ti o nal resource s fo r students and faculty. cont ains the PowerPoint Lectu re Notes. she also helped to vet much of the material in this book while it was being developed for a requ ired course for maste r's studen ts at the Kenn edy School. Joshua A ngris t (MlT) and Guido Imbens (U nivcrsi ty of Califo rnia. anti Christopher Sims (Prince ton ). A t H arvard 's Ken nedy School of Government. We are also indebted to two o the r Kennedy School colleagu es.3\Vbc. FinaUy. Berkeley) provided thoughtful sugges tions abou t o ur treatmen t of mater ials on program evaluation. B ruce Hansen (Uni versi ty of Wisconsin . and Key Concepts. A lberto Abadie and Sue Dynarski. A t Princeton. EViews amI STATA tutorials for students.com/irc. and an Excel addin fo r OLS regressions. Gregory . a Compa nion Web site.awbc. Andrew Harvey (Cambridge Un iversity). the Test Bank. for their patient explanations of quasiexperiments and the fi eld of program eval uation and for th eir deta iled comments on early drafts of the text. Mad i son) and Bo Honore (Princeton) provided helpful fee dback on very ~arly o ut lines and preliminary versions of the core materia l in Part n. San Di ego). data se ts for the endofchapter Empirical Exercises.xxxviii PREFACE tables. We also owe much to many of our frie nds and colleagues in econometrics who spent time talking with us about the su bstance of this book and who collectively made so ma ny helpfu l suggestions. provides a rich supply of easily edited test problems and questions of various types to mee t specific course needs. Acknowledgments A great many people contributed to the first edition of this book. while the Test Bank. Our presentation of the mate rial on tim e series has benefited fro m di scussions with Yacin e Ai t Sal181ia (Princeton). avail able for Windows and Maci ntosh. offered in Test Ge nerator Software (Tesl Gen with Q uizMaster) . These resources are avaiJ able fo r download from th e Inst ructor's Resource Center at www.
we particularly thank Geoffrey TooteU for providing us with the updated version of the data set we use in Chapter 9. Joseph Newhouse (Harvard) . Hong Li. David Neumark (Michigan State U niversity). Baltimore County Tim Conley. for his help with aspects of the Massachusetts test score data set. Han Hong (Princeton). U niversity of New Mexico ChiYoung Choi. James Heckm an (Unive rsity of Chicago). Agnello. The California test score data were constructed with the assista nce of Les Axelrod of the Standards and Assessments Division. Many people were very generous in provicling us with data. detailed. University of Maryland Joshu a Angrist. U niversi ty of Delaware C lopper Almon. Bauru . Corp. University of New Hampshire Dennis Coates. and Malt Watson worked through several chapters. Eric Ha nushe k (the Hoover Ins titu tion).Alan Krueger (Princeton). and thoughtfu l com ments we received from those who reviewed various drafts for AddisonWesley: We t hank se veral people for carefu lly checking th e page proof for errors. Queen's University. Massachusetts Institute of Technology Swarnjit S. Richa rd Light (Harvard ). and Andrew Fraker. Un iversity of South Carolina A lok Bohara. A rora . David Dm kker (Stata . and AJa n Krueger (Princeton) for his hel p with th e Te nnessee STAR data that we analyze in Chapter 11. Jean Baldwin G rossman (Princeton). The research department at the Federal Reserve Bank of Boston deserves thanks for putting together their data on racial discrimination in mortgage lending. Boston College McKinley L. Massachusetts Departmen t of E ducation. Ori Heffetz. We are grateful to Charlie DePascale. Mi Iwa ukee Ch ristopher F. Universi ty of Chicago Douglas Daienberg. G reensboro) graciously pro vided us with his data set on drunk driving la ws and traffic fa talities. and Richard Zeckhauser (Harvard). A lessand ro Tarozzi. Student Assessment Services. Ch ristopher Ruhm (U niversity o f North Carolina. Steven Levitt (University of Chicago). Pie rre Perron (B os ton University). Blackburn. Thomas Downes (Tufts). University f Montana . Kenneth Warne r (U niversi ty of Michigan). an d Lynn Brown e for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales.PREFAC E xxxix Chow (Princeton). University of Maryland. Canada Richa rd 1.). University of WiscQnsin. which we analyze in Chapter 10. Caroline Hoxby (H ar vard). Amber Henry. California Department of Education. Kerry Griffin and Yair Listokin read the entire manu script. We are also grateful for the many constructive. Michael Abbott. G raduate School of Business.
A ustin . Denver Mototsugu Shintani. State University of New York . University of Pennsylvania John Veitch.xl PREFACE Antony D avies. Stratton. University of San Francisco Edward 1. Davis Frederick L. Villanova University Gary Krueger. James Madison University D avid Eaton. Madison Peter Reinhard Hansen. Berkeley john Xu Z heng. Keil. Sh ippensburg University Tung Liu. Macalester College Kajal Lahiri. Brown University Ian T. Columbia University William Horrace. U niversity of Richmond Robert McNown. Stanford University M. Virginia Commonwealth U niversity Jane Sung. United States Naval A cademy Bruce E. Austin KimMarie McGoldrick. Duke University Serena Ng. Vytlacil. Vanderbilt University Mico Mrkaic. Syracuse U niversity Pierre Perron. Ball State University Ken Matwiczak. Hansen. Naci Mocan. Wunnava. Duquesne University Joanne M. Goodman. Albany Daniel Lee. U niversity of California. Truman State University Christopher Taber. University of Wisconsin. University of Arizona Oscar Jorda. U niversity of Texas. California State University. University of California. Georgia College and State University Yong Yin. University of Colorado. University of Colorado. Boulder H . California State University. The George Washington University Simran Sahi. Los Angeles Frank Schorfheide. UniverSity of Western Ontario Phanindra V. Daniel Westbrook. Joutz. LBJ School of Public Affairs. University of Melbourne. Fullerton Rae Jean B. Claremont McKenna College Eugene Kroch. Georgetown University Tiemen Woutersen. Buffalo jjangfeng Zh ang. Ithaca College Manfred W. Johns Hopkins University Jan Ondrich. Australia Marc Henry. Northwestern University Petra Todd. The George Washington University Elia Kacapyr. D oyle. State University of New York. Henry. F leissig. Murray State U niversity A drian R. University of Pennsylvania Leslie S. University of Minnesota Sunil Sapra. University of Texas. Boston University Robert PhilEps. Middlebury College Zhenhui Xu.
We also benefited from thoughtful reviews for the second ed ition prepared for Addison Wesley by: Necati Aydin. University of O regon Jamie E me rson . Robert Porter. and attention to detail improved the book in many ways. John Donohue. Sylvia Mallo ry. Hansen. F resno Bradley Ewing. Jane and Sylvia patientl y taught us a lot about writing. Roberto E . John List. Avinash D ixit. Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc. Michael Jansson . Bo Honore. organization. Chris Foote. Ross Levine. D anie l Hamer mesh. Steve Stauss. who handled the entire production process. the changes made in the second edition incorporate or reflect suggestions. We have been especially pleased by the number of ins tructors who contacted us directly with thoughtful suggestions for this edition. Hong Li. Ken Simons. Harvey Rosen. William Greene. Do uglas Staiger. whose creativity. sta rting with o ur excelle nt editor. hard wo rk. Tom D oan.PREFACE xli In the first edition we benefited from the help of an exceptional development editor. We exte nd o ur thanks to the superb A ddisonWesley team. and their effo rts are evident on every page of this book . Chris Murray. Jalon Gardella . Clarkson UniverSity Scott England. Jeffrey Kling. Jim Bathgate. Susan Dynarski. Wei bin Huang. In particular. and Samuel Thompson. California Sta te Universi ty. Craig A. Florida A&M U niversity Jim Bathgate. wh o worked with us on the second edition: Adrie nne D 'A mbrosio (senior acquisitions editor). Liran Einav. A ddisonWesley pro vided us with first rate support. Jane Tufts. Linfield College James Cardon . University of Arkansas . Laura Chioda. and presentation. Ed McKenna. This edition (including the new exercises) uses data gene rously supplied by Marianne Bertrand. G iovanni Oppenheim. William E vans. Texas Tech University Barry Palko Iowa State University Gary Ferrier. Charles Spaulding (senior designer). H eat her McNally (supplements coordinator). and Motohiro Yogo. com ments. Finally. Graham Elliott. Depken ll. and Denise Clinton (editorinchief) . and Della Lee Sue helped with the exercises and solutions. Manfred Keil. corrections. we had the benefit of Kay Ueno's skilled editing in tbe second edition. Alan Krueger. JeanFrancois Lamarche. Cecilia Rouse. large and small. and extending through the entire publishing te am. Jeffrey Lieb man. Elena Pesavento. Kim Craft. We also received a great deal of belp preparing the second edi tion. and help provided by Michael Ash. Bridget Page (associate media producer) . Southern Utah University Brad Curs. Peter R. George Tauchen. Minot State U niversity R. Brigham Young U niversity IMing Chiu.
Iowa State University Charles S. Madison Norman Swanson. Emory University Susan PorterHudak. Wassell. University of California. They more than anyone bore the burden of th is commitment. University of Wisconsi n. and for their help and support we are deeply grateful. Brown University Sharon Ryan. . University of Central Florida Edward Greenberg. J L . University of G eorgia William Wood. Sacramen to Ron Warren. Wayne State U niversity Ta eHv. Writing this book took a long timefor them. Northwestern University Tom oni Kumagai. SUNY at Brockport Kyle Steigert. James Madison University Above all. University of Wisconsin. Califo rnia State University.Madison Christina H ilmer. Central Washington University Rob Wassmer. Wright State U niversity Brian Karl Finch. Virginia Polytechnic Institute Luoj ia Hu. San Diego State Un iversity Shelby Gerking. Rutgers University Justin Tobias.xlii PREFACE Rudy Fichten baum. Northern Illinois nive rsity Louis Putterman. Washington University Carolyn 1.ry Lee. Columbia John Spitzer. University of Missouri . Riverside Elena Pesavento. Heinrich. the p roject must have seemed endless. we are indebted to our families for their endurance throughout this project.
This ehaple r poses four of those questions and discusses.. macroeconomics. . marketi ng.CHAPTER 1 Economic Questions and Data A is the a~ sk a half dozen econometricians what econome triCs is and you could get a half dozen differe nt answers. Al a broad level. microecono mics. . A fourth might lell )'ou thai it is the science and an o ( using historical data to make nume rical. including political scicncc and sociology. policy recommenda tio ns in government and business. in general terms. The chapter concludes wilh a survey of the main types o f data available Lo econometricians for Answering these and other quantita tivI: ecunouUc questions. Econometric me thods a re also common ly used in Olhe r soci al sc iences. I n fact.ay thaI econome trics is the process of fi tting m:uhcmatical economic mode ls to rcal world da ta. labor economics. Another might ::. ind uding fi nance . all these answers are right . Or quanti tati ve.~ 1 second might Iell you that econometrics of tools used for forecast ing fut ure va lueS of economic variables slIc h a firm 's sales. Thi s book introduces you to the core se t o f methods used oy econometricians. quantitative questions taken from the world of business and governmenf policy. the econometric a pproach to answering them. One might tell you that econometrics is the lh e~ science of testing econom ic ::. eco no me trics is the scie nce a nd ar! of using economic theory a nd stat istical techniques to analyze economic da l ~. a nd eco nomic policy.. Econo me tric me thods are used in many bra nc hes o f econo mics. We wi ll use these methods to answc r a variety of specific. or stock prices. the o\'("rall growth of the economy.
precise ly. howeve r. evidence based 00 data. we mu st e xamine empirical evidence. bUI for many pare nrs and educators the most impo rt ant objective is basic academic lea rning: reading. Is the beneficial effect on basic learning of smaller classes large or small ? Is it possible that smaller class size actu· aU) has no C(Ccct on basic learning? A lth oug h common sense and everyday expe ri ence may suggest that more learning occurs when there are fewer sludenls. These decis ions require quanlitalive answers to quantitative questions. To provide such an answer.S. writing. building more classrooms. and basic mathematics.1 Economic Questions We Examine Many decisions in economics. BUI what. Many of the pTop os al ~ concern the younges l sWden ts. In the Cali fornia da ta. business. lea rning is e nhanced.4 CHAPTER 1 Economic Ovestion~ oncI Data 1. each student gels more o f tbe leac her's anenrion. public ed ucation system genera te heated debate. To weigh COSIS and benefits. is the e ffect on e lementary school educat io n of red ucing class size? Reducing class size costs money: It requires hiring more teachers and. if the school is already ai capaci ty. and macroeconomic forecasting.re lating class size 10 basic learning in ~kme nt af y schools. cigarette consumption. In this book .that is. tbose in elementary schools. the re are f~wer class disruptions. r Question #1 : Does Reducing Class Size L Improve Elementary School Education? Proposals for reform of tlle U. we examine the re lationship between class size and basic learn ing using data gathered (rom 420 California school districts in 1998. This book examines several quantitative queslions taken from cu rrent issues in economics. With fewer students in the classroom. and gove rnmenl hinge_o n understanding relat ionships among variables in lbe world around us. students in districts with small class sizes te nd to perfonn belter on standardized tests than students in districts with larger classcs. Ele me ntary school education has various objecti ves. One prominent pro posal for improving haste lea rning is to reduce class sizes at e leme ntary schools. such as developing social ski lls. Four of these questions concern education policy. racial bias in mort gage lending. A decision make r contemplating hiring more leachers must weigh these costs against Ule benefits. While this fact is . and grades improve. common sense cannot provide a quanlitalive answer to the question of wha t exact ly is the effect on basic learning o f reduci ng class size. the decision make r must ha ve a precise quantitative understanding of the likely benefits. the argume nt goes.
the n. Pt'Y U~~ Most people buy thei r ho mes with the he lp of a mortgage. lending institut ions cannot lake race into account when deciding to grant or deny a reques t for a mo rtgage: Applicants who are iden. Question #2: Is There Racial Discrimination in the Market for Home Loans? M~~ ilu"""I.<. if so. beca use the bL ack a nd wbjle applicants diffe r in man y ways other than th eir race. districts with small class sizes tend to have wealtbier reside nts than districts wit h large classes. Do these data indicate that. how large is it? The fa ct that more black than white applica nts arc den ied in the BasIOn Fed dala does nOI by ilsel f provide e vide nce of discrimination by mortgage le nders. so students in small class districts could have mo re opportunities for learn ing outsid\! the clnssroom. Many o f the COSIS of smok ing. It could be these ext ra learn ing oppor tunitjes that lead to highe r lest scores. Befo re concluding that there is bias in Ihe mortgage market ..age le nding. there is racial bias in mo ngage lending? 1f so. sucb as the economic background of the students.Jt idtntical applicants and. io C hapter 11 we introduce econo me tric me thods that make it possible to quantify the effeci of race o n c hance of obtainin g a mortgagl1.a large loan secured by t he va lue of the home.is to isolate the eUecl of changes io class size from changes in other factors.cal0 all ways but their race should be equallY likely to have their mOTl· gage applicatio ns approved .1. tbe re should be no racial bjas in mOrl g. we usc mu ltipl e regres. these data mUSI be examined man: closely to see if there is a difference in th e probability of being denied for OIherw. In contrast to tbis theo retical conclusio n.ion ana ly:. it m iglll simply re neel many o lher advantages that students in districts wit h sma ll classes have over the ir counte rparts in districts with large classes. In theory. nOi smaller class sizes. notablv their abilitv to re ay the loan.S. holding COfISWnt o the r applicant characteristics. Question #3: How Much Do Cigarette Taxes Reduce Smoking? Cigare tte smok ing is II major public hea lth conce rn worldwide. in practice. In Part II . By law.1 Economic Oues~oo~ We Examine S consistent with the idea that smalle r classes produce bette r lest SCOfl!S. researchers Dl the Federal Reserve Bank of Boston found (using data from th e early 1990s) that 28% of black appli · cants are d en j~u mongages. To do so. For example.ring fo r Ihose made sick by . whether this diffe rence is large or sma ll. U. such as Ihe medical expe nses of ca.!i. while o nly 9% of while applicanl~ are denied.
[n Chapter 12 we study methods C or handling this "simultaneo us causal ity" and use those methods to esti mate lhe price elasticity o f cigarette demand. have higb smokiog rates. we need to analyze data on eigarene consumption and prices. tlnd s tales with high prices nave low smok ing rales. by what percentage will the quanti ty o f cigarcltes sold d ecrease? The percent age change . Bul by how much? Jf the sales p rice goes up by I %. it does not tell us the nume rica l va lue of the pr ice ci<lSlicil y of demand . and pc.S. say 20%. Because Ihese costs are borne by people other than the s moker. In these data. consumption will go down. One of the mosl flexib le wols fo r CUlling consumption is to increase taxes on cigarettes.o to the beach? . states wit h low taxes.n the quantity demanded resu lting from a 1% increase in price is the price elasticity of demand.es. bU I if there arc many smokers i. there il> a role for govern· me nt interve ntion in reducing cigarette consumption. the analysis of th ese data is complicated because causal ity runs b01b ways: Low taxes lead to high demand. But what is the price e lasticity of demand fo r cigarettes? Although economic theory provides us with the concepts thHt help us answe r this questio n. if so. To \carn the e lasticit y we must examine empirical cvidence abo ut th e behavior of smokers and potential smokers .e. by how much? Will city tax rece ipts next year cover planned expenditures on city services? Will your microeconomics exam nex t week focus on externalities or monopolies? Will Saturday be a nice day \0 g. The dal3 we examine are cigareue sal es. Howe\'er. are borne by other m embers of sO<"iet)'.n Ihe statc the n local politicians might try to keep ciga rette taxes low to satisfy their smok ing constilUents.rsona l income for U. states in the 1980s and 1990s. ta:. by flIi sing taxes. lh en we need to know lhe price elasticity to ca lculUlc the price increase necessary to achieve this reduction in consumption. and thus low ciga re lle prices.. Basic economics says thai if cigarette prices go up. prices. Question #4 : What Will the Rate of Inflation Be Next Year? II see ms thllt people always wa nt a sneak preview of tbe futu re. If we wanl to reduce smoking by a cert ain amounl.6 CHAPTER 1 Economic 0ue5tions and Data smoking and the tess quantifiab!e costs to nonsmoke rs who prefer not to breathe secondhand ciga re lle smok.. What will Sales be next yea r at a firm considering in vestin g in new equipment ? Will lhc st ock market go up next month and . in ot her words.
> cont.. This model.ize . Quantitative Questions.lr. holding othe r thing. then they might increase interest rates by more than that to slo. and the European Ce ntra l Ran k in Fran kfurt. If they guess \'oTong. D. introduced in Part n. the mainstay of econometrics. '" hat effect does 11 change in cla~s :.:l tis tical lel'h niques to q uanli fy re latiunsh ips in hist orical data. so their decisions about how to set inte rest rales re ly on the outlook for innation ove r the next ye. by analyzing data.depcnding o n her best guess of the rate of iol1a tion over the coming year.tant..C. 1'le data we usc to fon:cas\ int1ation :Ice the rates of inflation and unemploy ment in tbe Uni ted States. 1 Economic QuestiOO1 We Examine 7 O ne a~pcct of the fulUre in which macroeconomists and financial economists arc particularly inte rested is the rate of O'w crall price inflation d uring the nc'\:t ) ~a r. For example. our answers alwtlys have some uncerta inty: A differe nt sel of data would produce a different numerical answer.thc conceptual [ramework for the analy sis needs to provide both a numerical answer 10 the queslio n fi nd a measure o r how prcci~e the answer is. and econometricians do Ihis by using economic theory an d st. One of the inflation fvrecasts we deve lop and evaluate in Chapt er 14 is based on the Phi ll ips curve.. the~ risk causing. down an econ omy tbat. An impoTlant empi rical relationship in macroeconomic dat a is the "Phillips curve. that is. O er many. If they thin k the ra te or inflation wil l increase by a percentage point.ht ad\ isc a client whethe r to make a loan o r to tak e one o ut at (1 given rate of interest.111ere(ore." in which a currently low value of thc unemploymerll rate is associated with an increase in the rate of inflation over the next year. A financial profes~ional mig.1. Economic theor) pro vides clues alJout that answcrcigarelle consumpl ion ought to go down when the price goes upbul the actual va luc o f th~ num ber must be learned empirically. in thcir vic\\.. Economists at central hanks like the Fcdcrol Rese rve Board in Washington. provides a math ematical \\ ay to qua ntify how a change in one \!ariable affccts another variahle. Quantitative Answers Each of these four questions requires a numerical answer. Beca use we usc data to answcr qua ntilath e questions. The conceplUnl framework used in this book is the multiplc regression model. risks overheating. Professional economists who rtdy on preci§c numerical forecast s usc econo me tric mode ls to ma ke those forecaSl ~ A fu rcca~ t c r's joh is to predict the fu ture using the pas\. arc responsible for keeping the rat e of price innation under control. dlher an unnecessary recession or an undesirable jump in the ratc of inflation..
Touc hing a hot Slave causes you to get burned. pUlling fe r tilizer on you r tOmato plants cau ses the m to produce more t(lm[l toes. say 100 gram s of fertili zer pe r sq uare meter? One way to measure this causal eUecl is 10 conduct an experiment. e nsuring that any other di ffere nces between the plots are unrelated to whetb er they receive fe r tilizer. the honicuhuralisl we ighs the h<lrvest from each plot .sbles. C <lu sality means that a speciCic action (appJyi ng fertili ze r) le ads to a speci(ic. In that experiment. measurable consequence (more tomatoes).lt is controlled in the sense tha I the re are both a co ntrol group that receives 00 treatment ( no . putting air in your tires causes them to infl a te. holding ConSllI1If the in come of smo kelrs and potent ial smokers? The multiple regression model and it!'. 1. At the end of the growiog season. with one exceptio n: Some plots get 100 gr<lms of fe rtilizer per square meter. the firsllh ree questions in Sec lio n Ll concern causal re l ~lionshjps among var. drinking wa ter causes yo u to bl! less thirsty. ex te nsio ns provide a framework for answering these questions using data and {or quantifying the unce rt ainlY aswciated with those answers. while the rest get none.2 Causal Effects and Idealized Experiments Like ma ny questions encountered in economerrics.. an action is said to cause an outcome if the outcome is the direc t result . ln common usage.8 CHAPTER 1 Economic Questions OM Data ha ve on lest scores. Each plol is te nded ident icaU y.The dif· ference between (he average yield per square meter of the treated and untreilted plots is toe effect on tomatO production of [he fe rtilize r treatment. of fh al action . a horticultural researcher plants ma ny plots of tomalQCs. Estimation of Causal Effects H ow best might we measure [he causal effect on tomalQ yield (measured in kilo grams) of applying a certain amount of fertili ler. or consequence.holdillg COII SWIlf Other (uctQrs such as yo ur ability to repay the loan'? What effect does a 1% increase in the price of cigarelles have on cigarette consumption. whether a plot is ferti lized or oot is de te rmined randomly by a compute r. Thi s is an eX<lmple of a randomized controlled experiment. Moreover. IwldinR consran( slUde nl characteristics (such as family income) Utal a school district administnHor cannot control? What effect does your race have on your chances of ha ving a mortgage application granted.
the causal effect i ~ de fined to be the effect on an outcome of a given action or treatment. A good way 10 ' fo recasl " i f il is raining is to observe whether pedestrians are using umbrellas. how sunny the plot is and whe ther il receives fefli li7. In fact. TIlt:: concep t of the ideal randomized con trolled expe rimenl does.1. however. provide a theoretical benchmark for an econometri.2 Cousal Effects and Idealized Experiments 9 ferti lizer) and a treatm ent group that rece ives the treatment (100 g1m 2 of fertil izer).1 co ncern causal effects.e r. Ihen it will yield an esti mate of th e causal effect on the outcome of interest (tomato pro ductio n) of the treatment (applying 100 &1m2 of fertilizer). ex perime nts are ra re in ccono me lrics beca use often Ih.ndomly assigning '" trea tme nIS" of diffcrent class ~izes to differ~n t groups of student s' If the expe ri ment is designed and eXt!cuted so that the only sy~ (emali. It is possible to imagine an idea l random izet! controlled experiment 10 answer each of Ihe first three qUt!stions in SeCi ion 1. for exa mple. it i ~ not possible to per· form idea l experime nts. or prohibitively expensive . multiple regression a n a l ysi~ allows us to quantify historical . You do not need 10 know a causal rela lionship to make a good fo recast. as measured in an ideal randomized controlled exper ime nl. In this book . Fo r example. As we sec in Chapter 14. Evcn t hough forecasting ne ed not involve ca usal relationships. This random assignment e lim inates the possibility of a systematic rclationsll ip belween. to siudy class size one can im agine ra.cdifference between the groups of students is their class size. so that t he only systcmatic diffe rcnce be tween the treatment and control groups is Ihe u eal ment . economic theory suggests patterns and relationships that might be use ful for forecasting. Forecasting and Causality Although the first thr('c q ueslioos in Section 1. tbe fourthforecas ting innationdoes no\. The concept of an idea l random ized conuoHed experiment is useful bel. If this expcri m~nt is properly implemcnled on a large enough scale. then in the ory this experiment would estimate the effect on test scores of red ucing class size.cy are une thicaL impossible 10 execute satisfact orily. holding al1 else constolH .1. In prllctice. bUll he act of using an um brella docs not cause it to rai n. how ever.:ouse it gives a definh ion of a causal effeCI. I n such an experimenl. the only systematic reason fo r d ifferences in OUI comes belween thc trealme nt and conl rol groups is the trea tment itself.c analysis of causal effects using actual data. II is randomized in tbe sense t hat the trea tment is assigned ra ndomly.
uch us historical n:cords on Illortg. the statl.ers incxpensi\c cig. 1Il some circumstances experiments arc not on ly expensive and difficulllO adminbter but also unethical. which we examine in Chllpter 13. Obscrvatjonal data pose mujor challe nges to econometric allcmph to esti mate causal effeclS. \0 make quantitatiVI! (oreca~1S about the future.. Becaulll! realworld experiments ~it h human llubJects .lesigned to cvahl(lIe a treatment or policy or 10 invcstigute R causal effect. For e. and leachers m er several years. the student. is dc\'oted to methods for k: ~p eri l1lenl j) 1 .&tions and Dota relatlunships suggc~ted by economic theory.Qnomi(::) are ra re.. Much of econometrics. levels of "treatment" (the amount of fer tilizer in the tomato c_"ample. The ' lcnncssee class size ex perimen t cost millions of dollars and required the o ngoing cooperalion of many adm inistrators. pa renl. 1.. Moreover. Data obtained by observing actual behavior outside an t"xperimclltai setting <lre called obscrvaliomll data . experi ments in c(.s size exa mple) arc not a~~igm:(j ul random.o.teacher ratio in the cla:.arcllcs to ~c how ma n) they buy?) Because of these financial. and the tools of econometrics to t:lckle these rhullenge~ In the real ~orld.lcms were ra n domly u~signcd \0 d. practical . and ethical problems.!>.Ire difficuilio :ldmin ister and torontrol..: ofTcnncsscc finance d a I ~rge random it.3 Data: Sources and Types In econometric. Experimental versus Observational Data data come fro m experiments t. th ousand~ of "tut. Instead.10 CHAPTER 1 Economic Q. (Would it be eth ical to offer randomly selected tecnag.:perimental dahl sets.This book examines both expe nmental and oon e. and much of Ihi~ hook. and to asse~s the accuracy of those forecast. data come from One of 1\\0 sou rces: experimenls or nonexpcri mental observations of the world.ed controlled experiment examining class site in the 19SOs. <i. 110 It i ~ difficul t 10 sort out the effect of the "treatment " from other relevant factors. and admini"irative records.:l ss~ of d ifferent si7es for sevcral years ami were given ann ual standardil.. mOllt economic data are obtained by observing real\\ orld behavior. In that expe riment.cd tests.." ampil:.. Observation<ll data arc colkctcd using SllrvCy~ sllch 3" Iltclephone survey of consumers. to check whet her those relationsbips have bl:CII stable over time.lge applications main tai ncLl by lending institutions.thcy have flaws relative 10 ideal randomi7ed controlled exper imcnts..
and panel data.3 Data: Soorces and Types 11 meeting the challenges encountered when realworld data arc used to estimate causa l effects. the pen:l:!1tngc of students for whom En gli ~ h is a s~(.1 . in the Califorma dala set II = 420. The remaining rows prcsent data fo r other d islriclS.10 gene ra l.1. o r Olher economic en tities du ring a single lime period. For exam ple.on d languagc an d who are 110t YCt proficient in Englishis 0% . and so fort hfor a single time period are called '.. is 17. arillblcs listed vary considera bly. Fo r exam ple. The dat a set contains .89. time ~eri es data. so for exam ple. Whether the data are experimen tal or ob~cr"ational. Our data set o n the ratcs of inflation and unemployment in ' he United State~ is an example of a lime series data SCI. CrossSectional Data Da ta on different cntitiesworkers. The averag. the number o f students in district #1. Time Series Data lime series data ure data for a single entity (person. we ca n learn about re lationships among varillbles b) ~Iudying differences across people. data se ts come in three main types: crosss~ctional data.. firms.'fosssectional datil . gm·cmmenta l units. The California te~t score data set contains measurements of ~c\('ral different varia hies for each district. and the number of the district. firm. Some o f these data are tabulated in 'Iable 1. The percentage of studen ts in that district still learning E nglishth at is.te:lche r ratio in tha t dist rict is 17J':Y. In this book )'ou will encounter all three types. The order of the rows is arbitrary. Average expen ditu re: per pupil in district #1 is $6.. country) collected at multiple lime periods. that is.consumers.. all the . divided by the number of classroom tea(.8: this is the ave rage of the math and science test scores for all fi fth graders in that district in 1998 on a standardizeLl test (the Stanford Ach ieve men t TCM). With cro~sectio na l data. firms.3R5. is an arbitrarily assigned number that organizes the datil. Each row lists data for a differen t district. As you can see in the Hlblc. which is called the observation number. the avcmge test score for the firM district ("'d istrict #1 ") is 690. the data on test scores in Californi'l school districts are cross sectional.e stude nt."hcrs in d istrict #1 .nlose data arc for 420 en tities (school districts) for a single time period (1998). the number of entilit:S on which we bave obse rvations is denoted by II.
that is.t he rate of unempl oyment was 5.1 % of the labor fo rce report ed that they did not have a job but were looking for work.lIion was 2. the rate of price inflation was 0. {Fifth Grad_ ) Studen.er Variables for California School Distric~ in 1998 P...0% .0 13.1n Pupil (S) Leominil English 690.2 S6385 5099 0. The data in each row correspond to a differe.2fl 19.1 %.0 672. 5.3%.70 17..t.rcentoee of Ob servation {District) Number j Diih'icl Average rest Stor. February.7% per year at all annual rate. 5993 Calif()mi a !C~t :>ooro: daLa 111:1 <Jc:scrilJ.. 4 1 ' . Some observations in this data set are listed in Tabl e 1.. . Because th ere nre 183 quarters from 1959:11 to 2004:lV.645.. ".d in Appendi x 4..8 661. and June .. ..!mploymeol) for a single enfity (the United Slates) for 183 time periods. the rate of CPT inf1.9 5  6W. this data set contains T = 183 observa tions.0 . and Ma rch : Ihe sec o nd quarter is April... CPl) would have increased by 0.no 17. 1% . May.ln the second q uarter of 1959.1 o bservationS 0 0 two va riables ( the ra les of in(]ation and unt.6 641. and the ra te o f unemployment was 5.8 tKo7 5236 _ . the overall price leve l (as measured by the Consumer Price Index. 24. 1 ~ted Ob$eTvatiom on Test Scores and ()tf.89 2Jj2 18..n l time period (ye ar and qu arter). if inflation had continued for 12 months Ht its rale during the sec ond quarter of 1959. which is denoted 1959:11 and end in the tourth quarter of 2004 (2004: IV). for exa mple. 441)3 . ..6 30..7 5502 7102   0.. pet" Studentt • . In other words. In the third q uarter of 1959.. .0  643..7 % . Each lime period in this dala sct is a quarter of a year (tbe first quarter is January. In the second quarter of 1959... 418 41 ' 420 NOlI' " Th. The observa tio ns in this data set begin in the second quarter of 1959.:1 j) . and so fOrlh )._ . 1 Economic Ouestions and Data TABLE 1 .()4 .2.. 2U lI} 20.   .3 4776 JO 5. lime peri ods) in a time series data set is denoted by T.2 655.12 CHAPTER. The number of observations (thal is.T eacMr bpenclitu.
In the cigarette data set.3 5.1 2OOJ1I 2()().3 1. and the num ber of time periods is denoted b~ T. through 1995.1 % 5.3.3. The number of entities in a panel data set is denoted b)' II..5.or crt an annual ra".s. 1>1 I ~l '.7% 2.l:1II 200UV ~ 5. Unemployment Rat.1. 4. also called longitudinal data.crved at (wo or more time period ~. (I re da ta for multiple entities in whic h each entity is ob. (%) . Panel Data Panel data.. a nd \0 fO r1h.. 3 19. organized alpha helically from Alabamllto Wyoming. cigarcllc sales in Arkan ~~ were 12K5 packs pe r capita (the tolal num ber of pack~ of cigarettes sold = u..:: variables. The fir~ t block of 48 obse rvalion ~ lihts the data for each Slate in 1985.d Observations on the Rates of Consumer Price Index (CPI) Inflation and Un&mpioyment in the United Stoles: Quarterly Data.59:IV 1960:1 1960:11 0. time series data c<l n be used to st udy the evolution of variables over lime and to forecast future values of thos. and !ielecled vilriables and observation:.4 0. in Ihat data SCI are listed in Table 1.59:111 19. wc have obsen 'ations o n" 4..1 5. J9592000 Obefvarion Number (Vear:quarter) . . So m~ dala from Ihe cigarc lle consumption dala set are listed in Tablc 1.6 .3 Dolo.6 • . Sources and Types 13 TABLE 1. For example.1 5.6 '.4 . O ur da la on cigarette COI1 :.2 SeIect.umplion and prices (Ire an ~xample of a panel data sel.59:11 19. CPllnRmion Rote (% per y. Thus the re is a tOlal o f /I x T = 48 x II = 528 obsen·atiollS.4 54 IS3 ·\·olr 35 described in A f'PI'ndi~ 14 I. Ill" t·" InnaLM'on 3nd un<'mplo\1T1cnl tbLa loCI Oy tracki ng a single cnlily over time. The next block of 48 observations lisls {he data for 19M. in 1985..~continental states (enti ties) for T = 11 years (time pe riods) from 1985 to 1~5 .
.!{.8 115.ai  3 A rizona . in Arkans:ls in 1985 divided by the total population of Ark a nsas in 1985 equals 128.986 1987 127. .0i5 2 Arkans.1. ~bc:d i..022 1. st al e.935  n.995 112.. and Taxes.135  528 Wyoming .382 '" 49 ' 985 .The average price of It pack of cigarettes in Ar.8 129.olon tQJ<J SO.2  0. 1lIe Clgar. COP5umpUon dDt~ SCI h.1.. and pane l d ata a re summa rized in Ke y Concept 1.240 0.S.3.360  NOft.335 97 1.n Appendi:>. Alaba ma Cigarett..2 1.""" 1. ta)c.5). of which 37~ we nt to federal .et.kansas in 1985.1107  96 Wyoming Alabdma . by State and Year for U.362  47 West Virgima Wyoming 1985 11 2. incl uding tax. 12.985 .14 CHAPTER I Economic OvesIion ~ ond Dolo TABLE 1.585 0. 19851995 A~Pric.240 0. time series d a ta.015.O!t9 0. Prices.. .3)3 0.8 0.985 (poco per UlpitgJ per Pock (Induding tox••) + .986 Al nbama 11 7.4 J. exci.985 U'S6 0. Tohri Taxe. and loca l taxes.'\4 1. Pa nel da ta can be used (0 learn about economic rl'ia tionships h om the expe rie nces of the many difCe rent enl ities in the da ta set a nd fro m the evolution ove r lime o f the variables for each e nt ity. (cigar. was $1. SlaMS.370 I 1165 1285 1045 SI.3 SeIeded Observations on Cigarette Sales.. TIle definitio ns of c rosssectional data. ObNlrvotion Number . Soles y.
Ma ny decisions in busim:ss and economics require quant itati ve estim ates of how a ch ange in one va riable atfec ts Another variable. J. 2.each of which is obse rved a t multiple poin ts in time. Econo me trics provides tools for estimaling causal effects using ei the r observa tional ( no nexpe rime ntal) da ta o r data from realworld . or 10 0 expensive. the way to estim a te a causal effect is in an ideal randomized con trolled experi ment. whe re each e ntity is observed at two or more time period!. • Pa nel data (also kno wn as longitudinal data) consist of multiple e ntities. Key Termi 15 CROSSSECTIONAL. impract ica l. Conceptually. but performing such experiments in economic appl ic(l ti oos is usually unet hical. Crosssectional data a re ga the red by observin g multiple entities at a single point in time. Key Terms randomiled contro lled e xpe rime nt (S) conlrol grou p (8) treatmen t group (9) causa l effec t (9) experime ntal dat a (0 ) observational da ta (10) crosssectional data (l L) observation number (I I) lime se ries data (11) pa nel data (13) longitudina l dala (13) . imperfect experime nts 4. lime series da ta a re gathe red by obse rving a single e ntity a t multiple points in time: a nd panel data a re ga the red by observi ng multiple e mities.. TIME SERIES. AND PANEL DATA • Crosssectional da ta consist of multiple entities observed at a single time period.1 • TIme series data consist of a single emity observcd at multiple time periods. KEV CONCEPT 1. Summary 1.
De ~ribe: :. Sugge!il some imped.16 CHAPTEt 1 Economic Oues~on5 and Dolo Review the Concepts 1. .. 1. an observational panel data SCI for st udying Ihi~ dfecf.1 Design a hypothetical ideal randomized controlled experiment to study the effect of hours spent studying on performance on microeconomics exams.I.3 You are asked to study the relationship between hours spent on employee training (measured in hou~ per worker per week) in a manufacturing plant and the prodU(. Suggest some imped iments to implementing this experiment in prtlclice.2 Design <l hypothetical idea l randomized controlJed experiment to study the crfect on highway traffic deaths of wearing seat belts. all ideal random ized co ntrolled experimelll to measure 1hb e. an observational crosssectional data set with which you could study this effect . an observational time series data set for studying this effect: and d.:l u~a l effect : b. 1. c.livity of its workers (output per worker per hour). imen ls to implementing this experiment in practice.
chapter.2 covers the mathema tical cxpectation. Sectio n 2. mean..CHAPTER 2 Review of Probability T his chapte r reviews the core ideas of the theory of probabililv thaI a rc nectl ~d to understand regression an alYsis and econometrics.·~ntral importance in econometrics: the randomness that a rises hy randomly drawing a sample o f data from a larger popula tion. and varia nce of a single random variable:. 1f you feel confident with the material.3 introd uces the basic cle men IS of probability theory for two ra n(]olll variables. chi·squared. Mecau5/! you chose the sample a t random. and Section 2. The theory o f proba bility provides ma thema tical tools fo r qua ntifying a nd describing this randomn e~ Section 2. you should rcfrc<:h it by reading thi!.ou still should skim the chnph:r and the Icml S a nd concept s at the end \0 make sure you arc familiar with Ihe ideas J nd nOl3tion. a nti F distributions. ).. We <I~sume p roha ~ i lity that you have taken an introductory course in and statistics. and compu te the average earnings using these ten daln points (or ··observations'·). o ~clions of this chapler focus on a specific source of randomness of (. For example. suppose you survey tcn rece nl co llege graduates selected a t random . you 17 .. Most of the in teresting pro blems in economi cs invoh c more than onc variable. Jf your knowledge of prohabilil~ is state. The fina lt .4 discusses three srecia l prubabili ly d istributions Ihat p laya centra l role in sla Tislics il Od econometrics: the normal. record (or "obscn·c··) their earnings. Most aspects of the world around Uii have an elemenl of randomness.lbility distributions for a single random variable.1 revie ws prob.. a nd Secfion 2.
Only one of these outcomes will <lclually occur ( the outcomes a re mutually exclusive). complicated. referred (0 as its sa mpling distribution because Ibis di stri bution de. and tbe outcomes need not be equally likcly. For exam ple.)\'erage has a probability d istribution. This sa mpling distri bution is. the sample .5 disc usses ra ndom sampling and the sampling di. When th e sample size is sufficiently large. .18 CHAPUR 2 Review of ProOObility could have c hosen ten differt:nt gradua tes by pure random chance.nown lhat is eventually revealed . howe ver. yo u would have o bse rved ten different earnings and you would have comput ed a differe nt sample ave rage.stri bution of th e sa mple (Ive rage. a nd the number of times your computer will crash while you afe writing a term pape r all have an ele me nt o f chance or rnndomness. Section 2.~.1 Random Variables and Probability Distributions Probabilities. It might crash o nce. in general. which is discussed in Sec tion 2. il result known as the ce ntral limit theore m. ' n le probability o f an ou tcome is t he proportio n of the lime that t he outcome occurs ill the long run. the Sample Space. your computer might never crash. you will comple te 80% without a crash . ln each of these exarnpk:s.6. The refore. your grade on an exa m. and Random Variables Probabilities and outcomes. The gende r o ( the nex t ne w person you meet . thcn oyer the course of writing many term papers. there is something nOl yet k. The mutually exclu sive potentia l results of a ra ndo m process atc C<'l iled tbe outcomes. the sampling distrib ution of th e sa mple average is approximately normal . had you done so. Because the a verage earnin~ vary from one randomly chosen sample to the nex l. il might crash twice.ribes Ihe different possible values of the sample average that might have occurred hnd a diffe re nl sa mple been drawn . a nd so on.lhe sample average is itself a random variable. If the probability o f yo ur com puter not crashing while you are writing a lem p<lper is HO %. 2.
or 16%. a nd 1%.<. t hree..umulat ive pro bability di. 6%. if your computer crashes four times. For example. denoted Pr(M ::: 0). 1.lshe~. A n .xa mple. .: probabilhies s um 10 l(XI%..W = 1) + Pr( M = 2) = 0. For e. The probability distributiun or a disc rete r andom \'. respectively. . lll C set of a ll possible o utcomes is called the sa mple space..'.jves Ihe (. le i M be the nu mber o f t i m e~ yo ur co mpute r c ras h~s while you are writing a T erm pape r. ~o il b a random variable.'aria hlc a nd the probability t hat each value . the probability of the eve nt of one or 1\\"0 crashes is the sum of the proba bilities of the cons tituent o utcomes. you will quit and write Ihe pape r by h<lnd. :t discrete nmdom \'ariable lllkes on only a discre te set of va lues. Cumulative probability distribution .ing a term paper is random and takes on a numerical valul.. that is. 111C l a~t row uf'labll: 2.' rrobabilit y o f a si ngk compu te r crash: and so fOTlh . 3 % . 1 Random Variables and Probability Distributions 19 The sample space and rvents. The probability of an eve nt can be eom pUled from t he probability distribution . An eVl'nt is a subset of the sa mple space.uiable is the list o f a ll possible va lues o f the .1 6. Pr( M = I) is th(. l11at is. or four crashes is. Thest. The cumulative probability distribution i"the prnbability lh:llihe random variable is less than or equal to a particular value. TIle event " my computer will crash no more than once" is the ~c l co nsisting of twO outcomes: "no crashes" and " one crash. an event is a set of onc or more outcomes. A random variable is a numerical summary of a random outcome. 2 . Probability Distribution of a Discrete Random Variable Probability distribution . like O. According 10 thiS distributio n.xi1mfllc of a probabi lity distribution fo r M is given in the second row of Ta ble 2.1 b.. the probabili ty or no crashes is 80%: the probabil· ity of one crash is 10%: a nd Ihe probabilit y of two.·. . Tnese probabilities sum to I. Some random variables a re discrete a nd some are eontin uou. This pro bability di~lribution is plo u e d in Figure 2. 1. whereas a continuous Iandom nlri Hble ta kes o n a COnlinuum of J>O".1." Random variables. A~ their names suggest.00 = 0.trihutiun of the random .ible value:.1: in this dist ribution. . Pr(M = lor M = 2) = Pr(. The numbe r of times your computer crashes while you nrc writ. i~ (h e pro ba bi lity of no computer cr.viii occ ur. The probability distrihution of the random variable M b th e list of probabi li ties of e ach possibl e o utcome : th e fl robabi lity that M = O. Probabilities 0/ rvents.10 + 11.
96 0. the probability of at most olle crash. so the P'"obobility of 0 com pu te r crashes is 80'1. th at is. 1 compuler cro~ The height 01 each bor is !he probability thot the the indicated number 01 times. Probability Distribution of the Number of Computer Crashes Pro ln biliry Th e Bernoulli distribution.[. A bi nary rllndom vJJ iable is ca lled a B ~ rnoulli rand om va riable (in ho nor of the seven teenlhce nlury Swiss ma thematician and scienti st Jacob Gemoll lli). th e o utco mes are 0 or I.80 .10 0. Pr(M :S I).8.00 0.80 0.d.1 Probability of Your Computer Crashing At Times Outcome (n umber of trG$h. e probobility of 1 com' puter crash is 10%. FIGURE 2 .. is 90% .' . 0. A n important special case of a discrete random variable is whe n the rando m variable is binary.99 0.' Nu m~r of cn.boes .. A cumulative probabili ty dis lributi on is also referred to as a cum ulative dis· tributioll fUll cti on .01 ProbabililYdi mibution Cumulative probabilily diSiribulion 0.06 0.90 1.. which is t11e sum of the probabilities of no crashes (80%) and oC Ollt: c ra~h ( 10% ).20 CHAPffR 2 Review of Probability TABLE 2. or a cumulati ve d istri bution . For example. a nd so forth for the other bars.00 variable M. and its probability distribution is called the Bernoulli dist rib ution . 50 14. a c. 0. The height of the li r~1 bor is 0. The height of the second bor is 0 ..J 0 2 3 • 0.
For exa mple. The area unde r the pro ba bility de nsit). the probability that th e commute takes less than 15 minutes is 20"/0 and the probabilily (hat il la kes less th an 20 minutes is 78%. le t G be the gende r of the nex t new pe rson you mee t. For example. th e cumula li ve pro bability distribution of a continuo us random va riable is the probabili ty that the random varia ble .. function between any two points is the probabilit y that the rando m varia ble falls between those two po ints.58.d. Inste ud. A probabil ity de nsity function is also calle d a p. . or 58% ..2a pl o ts a hypothetica l cumula ti. 1 R ondom Voriobl63 One! Probobility Distributions 21 For exam ple.2a. Figure 2. which lists the probability of each possible value of the random . Because a continuous random variable can take on a contin uum of possible val ues. it is 03tur31 10 Ireat it as a continuous random variable.2b p lo ts the probability density function of commu ting limes corre sponding to the cumulative dislribution in Jlgu re 2.f. a de nsity fu nctio n. e d istribu· tion of com muting tim es. the pro ba bili ty distribution used fo r discre te variables. .p. The probabililY thai the commu te takes betwee n 15 and 20 minutes is given by the a rea under the p. is no t suitable for continuous variables. lhe probability is summarized by the probability density fundioD . because it depends on ra n· do m facto rs such as the WeatJle r and traffic conditio ns.2.. conside r a student who dri ves from ho me to school.d.. Tha t is.s less than o r eq ual to a p articular val ue. (2. Equi vale ntly. The outcomes of G and their probabilities thus are 1 with probability p { G = 0 with probabilit y I . Figure 2.1) is tbe Bernoulli distribution. this prob a bility can be see n on the cumulative distribut io n in Figure 2. o r simply a density.anable. The cu mula tive probabilily distribu· tion for a continuous variable is defined just as it is fo r a disc re te ra ndom va ria ble. The probab ility di stribution in Equation (2. Probability Distribution of a Continuous Random Variable Cumulative probability distribution . This stude nt '5 co mm uting time can lake o n a conlinuum of va lues a od .. wh e re G "" 0 indicates tha t the pe rson is ma le and G = 1 indicates that she is fema le. which is 0. 1) where JJ is the probability of the next new person you mee t bei ng a woma n.2a as the difference Probability density function .L be tween 15 mi nutes a nd 20 minutes.
78 {78%}.dJ) of commuting times.. :::::. 0.2 Cumulative [)i$tribvtion and Probability Density Functions of Commuting Tima Probability 1. ..) of commuting limes. •· J 1' 1 I~ 0.0.20 {or 20%). ood!he probobilily thai it i.22 CHAPTER 2 Re"Iiew of Probobilily FIGURE 2 . Pr ((ommurmg 1Jme S IS) .H n f..20 0.(1' .11 "'7"'~ . . 25 )) ~ ~ CUJnm\ltin~ ( i lnt' (milUII(lS ) (b) Il ruh.:" ./ I '" Probab ility df'llsi ly 4115 " .20 tl. leu thon 20 minule$ is 0.. The probobiHly thol 0 com muting lime is less than 15 minutes is 0.ll Pr (l) < Comml1ti"9llmt S 20).ll6 Pr (Commuling bIM :> 20):: 0 22 1111.f. Probabililiei orc given by ontos u~ the p.22 '! M . Figura 22b ~$!he probability density fvnctiQl'l (or p.1 "'I/'} \\.' 058 !J20 Illxll .linn of '(}OUlilUtil1!o! nme Figure 2 20 ~~ the cumulative ptobobitity dimibvtion {OI'" c. i1 0 58 158%). 10 0. 058 """' n.L~.78 1l. and i1given by the area under the curve belwNo 15 and 20 minules . Commutin g rim e (llIi nUlcs) •• J~ '" II) C urrmunw .INnbuIHlI'I lillll'UOIl ufcomllLuring 11111(' Pr(CommutmgtJrn«:S 15).d.lbd' t\' d<'lNf'I fUIl.d f The pobobility tnol 0 commuting time is between 15 and 20 minule1.
06 + 3 X 0. where the weights a rc the probabilities of that outcome..2 Expected Volues.35. the expected number of compu ter c rashes while writing a term paper is 0.01. The expected '1llue of a ra ndom \'ariable Y. (2.lhility distribution sho\\ the same information in dif4 ferent formats. 4 X 0.2) means that the average num· ber of cra~h~ o\cr many such term papers is 0.cnse 10 say tha t the computer crashed 0.10 + 2 X 0. Expected yalue ofa Bernoulli random 'IIariable. weighted by the frequency with which a crash of a given size occurs. 2. The e xpected value o f M is tbe a verage number of crashes over many term papers. bUl l % of the time you would get nothing.03 . and Variance The Expected Value of a Random Variable Expected value. Thus. An important special case of the general formula in Key Concepl2. A~ a second cx<lmplc. the ca!culutio n in Equation (2.35. Thus the expec ted value of your repayment (o r the "mean repayment") is $1 08. E(M) = 0 X 0.90.""y. The expected v'ltue of a discrete random variable: is comp Ulcd as a weighted average o f the possible ou tcom~ of that random variable..90. the acw al number of c rashes musl always bt: an integer: it makes no <..01 = $108. co nsi der the number of compul e r ccashes M wilh the probability rJistributi on given in Table 2.35. Mean.1. If the loan is repuid you get $110 (the principal of $100 plus interest of $10).1t..2. The expected value of Y is alw called the expectation o f Y o r the ml.lI ca n take: On k different \ alues is given as Key Concept 2.'an of Y and is denoted by .99% of tbe time you would be paiL! back $1 10.01 = 0.99 and equals SO wit h probability 0.2) That b. ond Voriance 23 hctween the probability that the commute is less than 20 minut es (78%) and the probability that it is less than 15 minutes (20%).Thu~ the probability density func· tion and the cumulativc prob. O f cou rse. the amount you lire repaid is a random vari able that eq uals SIlO \\ jlh probabilit y 0. For exa mple. 'Cordingly.1.99 + SO X 0. suppose you loan a friend SIOO a t 10% inle ces!.1 i~ Ihe mcan 01 u Bernoulli rllnJom . O ver many such loon".!'o on average you would he repaid $110 x 0. is the longrun [lverage v<liue of the randum va riable over many repcated trials or occurrences.00 + 1 X 0. Mean. de notcd £(Y). AI.u lt and yo u wiIJ gel nothing at al l. h Ul lhere i ~ a risk of I % tha i your friend will deff.2 Expected Values. 111t' formula for the ex pected value of a discrete random variable Y .35 times while writing a particular tcnn paper! Rather.
is FXPFrTFn _ _ _ _ _ VAlllF AND THE MEAN . where. . which is the square rool of the \ ariance and is denoted U)" The :. The expected va lue of a continuous random variable i ~ also Ihe prolmbility·wcighlt:d average of lhe pos sible o utcomes of the random va (iable. denotes the first value. . .. _y"..1 Suppose the random variable Y takes on k possible values." Expected value of a continuous random yariable.Y2 denotes the second value. Because the variance involves the square o f Y. the prohabi lit y that it lakes on the value " 1. ) 2]. £(Y) = YI P I + YZPl T ••• + y" Pl = LY.v. The variance of a random variable Y. denoted var(Y). and so fort h. i'" I (2 .. )'. variable. Lei G be the Be rnoulli random variable wi lh lhe probabi lity distribution in Equatio n (2.P. The expected value o f Y. (Z.2. is the expected value of th e square of the deviation of Y from its mean: var( Y) = E({Y _ ~ .3) where the n Olat ion " ~~]Ji Pi " means ·'the sum o f Yi P. It is thererore co mmon to measure the spread by the siundard devia lion . Ihl! formal ma thematical de fi nitio n of ils expecta tio n involves calculus and its definition is given in Append ix 17.4) Thus the expected va lue of a Be rnoulli random variable is p . The ex pected va lu e of G is E { C) ~ 1xP+0 x (I . 1).24 CHAPTER 2 Review of Probability  f KEY CONCEPT 2. These definitions are ~ummarized in Key Concept 2.p) ~ p.Iandard deviation has the same units a~ Y. The Standard Deviation and Variance 1lle varia nce and standard deviation measure the dispersion or the "spread" of a probability distributi on. Because a continuous random variable can HIke o n a continu um of possible values. which makes the variance awkward to interpret. the units (If the variance are the units of the sq uare of Y. fo r i running fro m I (0 k:" The expected vaJue of Y is also called the mean of Y or the expectation of Yand is denoted lJ. and that the prob ability that Y lakes on )'1 is P" the probability thai Y takes on >"2is P2' and so fo rth.1 .y. denoted £(Y) ..
1lO + (I .tax earnings X by the equation Y = 2UOO + 0.0 1 = 0. afterlax cOI rni n.tax earnings)(. Mean and Variance of a Linear Function of a Random Variable 'ntis section discusses random \ ariablcs (say.0.0..2 (2.6475 '" 0.0.lriable G with prohabi lity distribution in E quiltion (2.8.35)' x 0. 10 . the variance of the number of computer cm. .2 Expected Volues. {y. Variance of a Bernoulli random yariable. denoted (f~ •• is °1 ___ VARIANC E AND STANDARD DEVIATION KEYCONCEI"I I (1'\~ = "m'(Y) = f":{( Y .LG = P (Equa tion (2.S) 11\:lt is. For exam ple.1lO.35)' X 0.35 )' x O.r 2. MI!Jon. For e xample.35).X and Y) that arc related by a linear function .J.35)' x 0.6) + (3 . TIle mean of the Bemoulli nm da m v.4)] so its v<Hia nce is "ar(G) = uJ = (0 .1) is J.\) p.L) )2. The units of the standard de viation a rc the Slme as the units o f }'.p) + (I .06 (2. and Vooonce 25 r The: vuriUllCC of the discrete random va riable Y.35: var( M) = (0 . X 0.yf P" TIle ~t3 ndard dcvint ion of Yis (F l" the squart: roo t of the variance.l!. 2:. U nder Ihh ta x ~heme.s )? is 80% uf pre.7) lltus the standard devia tion of a Bernoulli random nlriahlc is (T(.0. con!'ider an income tax scht:mt: under which a ""orker is taxed III a ral~ of 20% on his o r her ea rnings and then given ..\'.pl· :: (2 .1'(1 .6475. plus $2000. i_ I 2. 0. (2. TIle sta ndard d evia tion o f M is Ihe !>LJunrc roOl o f the vari ance.03 + (4 .0. v'p(l p). a ft erlax earn ings Y arc reb tcd 10 pre. (taxfree) gr.mt of 52000.p)' x p .p)' x (I .\ = .hes At is the pro b 3hilityweigbtcd average of the squared difference betwee n M and its mean. so (T M "" vO.(2.
so. Wh at are the mean and standard devia tio ns o f her aftertax earnings under this tax? Afler taxes. ..~ ••• : __ . Thus the expect ed v~ l ue of her afterlax earnings is 0'1. . '1 ". l l1is section dis· cusses measures of two other features of a distribution: the skewness.. Ihe standard de viatio n of Y is a" = 0..8.8(X .I'x» )' ) = O.Lx.. 12) and (2....1 .0. This ana lysis can be ge neralized SO tha i Y depends on X with an inlercept a (instea d o f $2000) and a slope b (i n s te ~d of 0.1 ..8I'x) = O. so that Y = a + bX....8(1'"... and the standard deviation of Y is u y = ba x' Tht: expressions in Equations (2. 1l) 0 + bJ.. her earnings arc 80% of the original pre·tax earnings.SJ.:.. so arc afterlax earnings. (2. taking th e square root of the va riance.8). ~ 2000 + O....64var(X ). ..8(X .·.. plus $2 . Other Measures ofthe Shape of a Distribution The mean and standard deviation measure IWO importaot fcalurc:s of a distribu tion: its cen ter (the mean) and ils spread (the standard devialion). Then the mean and vari ance of Yare Py = (2.I'.·".8 X .)... BC(a use Y = 2000 .26 CHAPna 2 Review of Probability Suppose an individual's pre· lax earnings next yea r arc a random variable wiLh mean /. the standard deviation of the distributio n of her aft erlax ea rnin gs is 80% of the stand ard deviati on of the distribution of pre· tax earn ings.riunce of aft ertax: earnings is th.13) if} = b2ul.l'x)').....111 ) [I re applicati ons of th ~ more ge nera l fonnul as in Equations (2.8X.10) 11mt is._.<o<I"r.9) and (2 . :_••.I ''..Lx and (2.)'J= E[[O..r.9) The va.1' .1 3) wilh a = 2000 a nd b = 0.. 11 follows thal m(Y) ~ O.. :.(2000 + 0.Lx and variance Because pre·tax: earnings are random..1'..t . .. £( Y) ~ ~y ::::: 2(XX) + O.LLy)'l. El( Y . Y .......'. which .12) (2.. (2.11>u...e expected value of ( Y ..000.Ii4 £ l(X .
The skewness of the distribution of a random variable Y is Skewness = £[ (Y .llws.ty)J will be offset on average (in expectation) by equally lik ely negalivc va lues. then some ex lreme depar tures o f Y from its mean are lik ely. then posi tjve va lues o f (Y .rom extreme va lues." are its tail s. If a distribut ion t)as a lo ng left tail. Kurtosis. or "heavy. An eXlreme value o f Y is called an outlier.Y. Skewness. The greater the kurto sis of a dislribution. its skewnes. The skewness of a distri bution provides a mathematical way \0 desc ribe how much a di stribution deviates from symmetry. and Variance 27 how thick. Beca use ( Y .)..15) If a distribution has a large amount of mass in its lails. Ef( Y . Dividi ng by u ~ in tbe denominator of Equolion (2.JL yt. . changing the unils of Y docs nOl ch. lf so.JL)')~ afe no t full y o ffset by neg a live vu lul!s.e.3d appears to devjate more from symme try than does the distribut io n in Figu re 2.eon. Visually.3 plots four distributions. skewncss. is a measure of how much of the va riance of Y arises f././ky) J) q~ (2. The kurtosis of a dislribuLion is a measure of how much mass is in its tails and. lhen a positive va lue of (Y . the kurtosis cannot be: negative. of (Y . th e distribution in Figure 2.3 is ilS skewness.i. in Olher words. 14) ca nce ls the uni lS of y J in the numeral0 r.J.ty) ~J = 0: the skewness of a symme tric distribut ion is zero. for a symmeLric distribu tion. Ilnd these very large values will lead to large va lues.t y)3 generall y is nm offset on average by an equally likely nega tive value. so the skewness is unit fre. the more likely arc outliers. . variance. a va lue of Ya given amo unt abovc its mea n is just as like ly as a va lue of Y the same amoun! below its me<m . and kurtosis are all bused on what are called the moments of a distribution. Figure 2.}J. The kurtosis o f the distri bution o f Y is Kurtosis = El(Y . For a symmetric distribution.2. twO which are symmetric and two which <I re not.'J <7} (2. lhe reIore.s is negati ve./. Th us. positi ve va lues of (Y . on ave rage (in expect ation) . Bdow each of the fo ur dislribulions in Figure 2. If a dislribulio n is nOl symmetric.y)" can not be negative./.2 Expected V~.ts skewness. If a d istri but jo n has a lo ng righllail .). The mean.mge j. the kurtosis will be large. 14) where ITy is the sta ndal'lJ devia tion o f Y.3c. for a distribution with a large Ilmount of mass in its tails. and the skewness is positi ve. so lhe skewness is nonzero {o r a disni butio n lhal is no t symme tric.
.\ " .28 CHAPTER 2 Review of Probability FIGURE 2 . A distribution with kUrlo~i s exceeding . 111." kurtosis exceeding 3 [bd) hove heavy loill " me kurtosis of a normally distributed random variable is:t so a random vari · abl e with kurtosis exceeding ~ has more mass in its tails than a normal random variable.'I Iq~ / \ I i' ~ 0. ( !) "2 IlL 11 ':i) I .\·WIlt"\~ 0. 5 .e djW"ibutiOl'1~ hove 0 mean of 0 and a variance of I. !l_3~ \ u" 11..6.. k urlO'il> . IJ .: . "'t H.\ / \ J 1. n.I'! . ..~ ... kurto". . \ 0. .. II. Below each of the four distributions In Figure 2. kUrooi.2 . The dislribvrioru with skewnen of zero 10 and b) ore symmetric. 3 . . more simply.5 II.hs tribu tion. kunos].. the di$lributions with nonzero skewness (c and d) are not symmetric_The distributions wi. . " = .3 (\.\ '" 1)... .' IU Four Distributions with Different Skewneu al'ld Kurtosis '" "..~ is called Il'ptokurlic or. in tIg ure~ 2. :W !. . so changing the units or y dOC3 nOI change its kurtosis. All of the:..  3 .tiled.. '" :'i " I >.J is its kUrlnsi. ~l (d) " .u U. \\ur<. n.. heavytailed .. Like skewness. (c) ~1..<' .r 0..1l . "" (a) skc\\ I1l"' ''::: ()..3bd arc hC3\yt:. .1. . . the kurtosi ~ is unil free.' (1.. . .1 ( b) ')kCI'v1K"5 .~ .1.1 t" I .
and lei X be a binary random variable that equais O if il is raiD ing and 1 if nol. Y = 1) = 0. is coiled the second moment of Y. Y . Joint and Marginal Distributions Joint distribution. Y = I) = O. 1l1e joint prob. 11le joint probabi lity distribution is the frequency with which each of these fou r outcomes occurs over IlHlny repea ted commutes.lity di~tribut. E(Y).63.0).lribu tion of two random variahles. is the probability tha t lhe f(ln dom variab l e~ simultaneously take on certain values.3 Two Rondom Voriobles 29 Moments. Y = 1).2. ATe collcge graduate~ more likely to have ajob tha n nongraduatcs? HO\~ does the dis tribution of income for women compare to that for men?These questions co ncern the di<. Y = 1). I. In general. Y = y). 1nese four possible out comes are mutually exclusive and const itute the sample space <. The skewn e~s is a function of the first. there are four possible out WIllCS: it rains and the com mute is long ( X . and no rain and shon commute (X = I. involve two or morc variables. Bet ween these two random va riables. The probahilitiell of all possihle (x . . that is.ion s. o. Y = 0): rain and short commute (X = 0. is a l ~o ca lled the firs t momen t of Y. a nd the expected value of the square of Y. Also. 15% or the days have rain and a long com mUle (X = O. and third mome nts of Y. and Pr(X = I.07. the ~xpc clCd value of Y' is called the r l h mome nt of the random variable )J..15. Pr(X = I.. Le t Y be a binary ran dom variable tha t equals I if the commutc is s hort (less th an 20 minu tcs) and cqua l~O otherwise. Pr(X = 0. no ra in and long commute (X ".. income and gender in the second). say x and y .sec ond. An example o f a join t distribution of these two va ria bt e~ is gi\'en in Table 2.o. and Ihe kurtosis is a function of the first through fourth moments of Y. The mean of Y.. and cond itiona l probabi.0 the four proba bilities sum to l. or Pr(X = 0. According to this distribution . say X and y . weathcr conditionswh e the r o r not it is rainingaffect the commuti ng lim\! of the student comm ut e r in Section 2. The joint probHbilily diSlrilmfion of Iwo d iscre te random variable!'.\') combi· n<ltions sum to 1. over mc ll1Y commutes. £( Y:!).abil it y distri bution can be wriUen as the funct ion Pr(X = x. the rib moment of)' is £( 1"'"). lhe probability of a long.2.1. 15. Y = 0) = 0. l' = 0).mar ginal . A n ~wering such questions requires an unde rstandi ng of the concepts of joint. Y = 0) = 0. For example. 2. considered togethe r (ed ucatio n and employment SlatllS in the fi rst exa mple.l11al is. rainy com mute is IY'/o.3 Two Random Variables Most o f the interesting questions in economic.
biJil Y thai it will r<li n is 30%.=1 I (2.litional on another ra ndom variable X taking on a spcci fic value is ca lled Ihe con ditional distribu lion of Y gil'en X. 'Tlte conditional pro bability Ihat Y take!io on t h ~ Vul lle y whe n X t<l kes o n the value X is writte n Pr( Y = y lX = or).16) For example. 1) T . For exa mple. conditional o n it being rainy (X = 0).2. is 50%.2 Review 01 Probobility TABLE 2. the marginal probability of rain is 30%: that is.22 I)  0.78 100 Marzinal probability distribution.2.on) from thc jo int diMribut ion of Y and a na l he r ra ndom va riable.30 CHAPTU.15 0.' Long Commute ( Y Shon Commute ( y Tolal '" 0) "C 0.. so if it is ra ining a long commute and a short comm ute are equall y likely. The ma rginal distrib utio n of commut ing times is given in the final column of Table 2. The m<l rginal distribution of Y can be com put ed fr om the join! distri bution of X a nd Y by addin g up the probabilili~s. over many commutes it rains 30% of the time. in Table 2. Equivalently. Y ~ y).x" the n the ma rginal probabilit y that Y takes on the value y is PT(Y ~ y) ~ :L P. .30 0. 50% = . of £I ll possible out comes for whi ch Y takes on a specified va lue.. what is the probability of a long commute (Y = 0) if you know it is raining (X = O)? From Table 2.. the proba· bility of a long comm ute (Y = 0)..07 0. or Pr( Y 0 IX = 0) = 0.2 Joint Distribution of Weather Condi6ons and Commuting Times Roin (x '" 0 ) No Roln (X .Th is te rm is used (Q distinguish lhe distribution of Ya lone (the ma rginal diSlribut. (X ~ x. as shown in the fina l row 0 1 Ta ble 2. so tt)e probability of a lo ng comm ute (rainy or not) is 22%.15 0. Th u<. Similarly.50. Conditional Distributions Conditional distribution. II X can take on f different v<llues X I ' .. the joint pro babilit y of a ra iny short com· mute is 15% and the joint probability o f a rainy long commut e is 15 %. The mQrgi'ud I>robability distribution of a random variable Y is jusl anothe r name for ils probabililYdistribution. the ma rginal pro b.2.2.63 0.. The distribution o f a ra ndom variable Y conc. Of this 30% of commutes. the probability of a long rainy commute is )5% and the probability of a long commute with no ril in is 7% .70 0. ..
the age of th e computer yo u use.. "' _ 3 "' .50 = 0.3.05 0. is given in Part B of the table.065 0.8 0.0) Pr(. . For e:w:am pie.Q25 O. 1 ".(y = YI X = x) = Pre: ~. Because you are randomly assigned L a a co mpule r. . 0.W' A I) ".= 0. the conditio nal probabilit y o f nocrashes." = Y ~ y) (2 .17) Par example.50 O. Suppose the joint disLribution of the random variables M and A is given in Part A of Table 2.. Y = O) I Pc(X = .02 O.3 T wo Random Voriobles 31 TABU 2. Conditional Distrioorions of M given A '" Prt. 4 0.02 0.1 1.01 '" _ 3 '" 0.35. In contrast . because half the computers are old .00 1. the conditional d istributio n of Y given X = . .t3 0. the cond itional probahili ty of no c rashes given tha t you are assigned a new compute r is 90%.O) / Pr(A = 0) = 0.07 0. As a second eX!:I mple. A .35 . A (= 1 if the compuler is new. consider a mod ification Of tnc cras hing compute r exam ple.Ql Tou l 0.00 0.3.1) 0.01 .0.2. or 70%. "' _ 1 '" .50.1 0.35 /0.Q35 0.OS 0._1 0.r is P. = 0 if it is old) .01 In gene ral. is a random variable.3 Joint and Conditional Distributions of Computer Crashes 1 M! and Computer Age (A) A.'/IA . Suppose you use a computer in the libra ry to type yo ur term pape r a nd rhe librarian ra ndomly assigns you a compute r frum tnose ava il a bl e.1 5/0.06 0..30 0. Then the condi tional distribution of computer crashes.m 0.00 B.00 •. is Pc(Y = OIX = 0) = Pc (X = 0.00 o. lhe cond ilional prohability of a lo ng commUle given thal it is rainy 0) .10 0.70.given that you a re using an old com pute r.90 0. . According to the conditional distributio ns in Part B of Table 2. 0) ~cwcomp u l c r(A 0. Joint Distribution '" Old compu!er (A . the probability of three crashes is 5% wi lh a n o ld computer but J % with a new com puter. the newer compUie rs a re less likely 10 crash tha n the old ones: for exam ple.UO~ 0. . . half of which a re new a nd half of which a re old .03 1. given th e age of the computer. the jo int probabilil YM = 0 a nd A = 0 is 0.45 0..lO . is Pr(M = OIA = 0) = Pr(M .
: X ~ x).omen.Il'r(X ~ x. given thaI the computer is nc:w. The mean of Y is th. lhe mean number of cT<\shes i.05 + 4 x 0.sing thi! conditional distribution of Y given X and the outer expectation i~ com· puted using the marginal di"tribution (If X. is the weigh ted average of the mean he ight of men and the mean heigh l of women .lbl:o E(Y) ~ L .. also c.18) ano (2.20) is computed u.20) is known a. E(Y) = EIE(YIX)].rcisc 2. ..ln number of cr.. the conditional expectation I~ lhe t:=:(pecled value of Y.1 7) (sec EXl. I I E( Y I X ~ .\ 4. Pre Y ~ y. is E(M A 0) = a x 0. computed using the conditiona l distribution of Y given X If Y lakes on k values YI ...56 fo r old com· puter<.J mathematically..19) Eq uation (2.\ cight.:d average of the conditional ('x(lCctation of M given Ihal il is old and the cund.14. Similarly. the law of ilerated expecl.1l1ed the •.r is jU $t thc mean value of Y whe n X . then the conditional mean of Y given X = of is E(YlX ~ x) ~ Conditional expectation .s 0.. TllC conditiona l expecta tion of Y given X = .• )~.'o nditional mean of Y gino X . Equation (2.14. tha t is.Qlion..shes is 0.: weighted CI\' er" g~ of the cond itiona l expectation of Y given X. For example. = = = The law ofiterated expectations...02 = 0.\". the conditional e:\:pectation of Y given that the co mpu tcr is new is 0.10 + 3 x 0.l(pcct<ltion on the righthand .l.ln the e xam ple ofTable 2.13 + 2 x 0.!mong !le. the me. the mean number of crashes M i3 the \..). if X la k. Stalci. h:~~ Ih:1Il for Ihe old computers... That ic.J ..3...56.56. (2.1lJ) follows from Equations (2.111(.I • .lional expectalion .. For example. St:Hed differl!ntJy.' expected Dum· her of computer crashes. co mpu te rs.x..19). . .side of Equat ion (2.32 CHAPTER 1 Review of Probability The condit ion al expecl alioll or Y gh cn X. the I!xpectCltiol1 of Y is the expe ctation of the condit ional expectation of Y given X.. is £(M A 1) 0. (2.lhc expected number of computer cf<I!lhes.. the mean he ight o f adult!. hased on the condit ional dislribUlions In Table 2. so the condi tio nal l! xpecta lio n of Y give n thm thc compute r is o ld is 0.18) For example.. weighted b)' the probability distributio n of X.. \. is the mean of the conditional distribu· tion of Y given X.es o n the I \ al m~~ x l. x1..J.. give n tha t the compu te r is old.70 + I x 0.20) where the inner c. L)' (2..\ eightcd hy the proportions of men ani.
. thaI is.20). the variance of the distri· bu tion in the second row o f Panel H o f Table 2. Stalc.0.lb is is an immcdintc consequence of 0::: E4 ualion (2. if E( Y IX) = 0.2. kt P denote the number o f program~ installed o n the computer. Exercise 2.:. The conditional vari:lOce of M given thaI Ii I j .O.\1 . l~) is less than that for old complllers (0. let X. 0::: 0::: E( YI X =x)1' PreY =).1l1is is the mean of the ma rgina l tlistnhution (If M. so the standard de\'iation of M for new computers is \1'0. t .50 = 0.1properties of conditioml l expectations with mult iple \ anahles. which is 0. :(M ). (3 . iY...21) For example. where E(Y ~ X.56)'2 x 0. is smaller for new computers (0.l tions say~ that E( }') = £{£( Y .0.56)2 X 0. the.t that has P programs installed. and Z be random vari· :lblcs that are jointly distributed.3.cro.. The law o f iteratt. y.50 + 0. 0::: 0::: 0::: . the condit ional variance of Y given X is .56)2 X 0. var(Y I X = x) = L .. so £(/1'1) = E(M iA = 0) X Pr(A = 0) + £(MIA => 1) X (' r(A .1be standard deviation of the contlitiona l distribution of M gi"en tha i A 0 is thu~ v'O. 111en the law of iterated expeC(. For exa mple..22 0. as measured by the condilional s tandard deviation.0.2). i~ tht. The \l ari ltDCe of Y conditional on X i ~ the variance of Ih~ cond it ion al distribut ion or Y gi ven X.1lle expectl!d numbe r of c ra~h~s overall .14 X 0. Conditional variance.22.:r crash illustration of Table 2.:d ex pectations implies thut if the conditional mean of Y given. as calculated in Equation (2.:r wi th age A and number of progra ms P. For example.n the mean of Y is i'cro. (2. '1l1e law of iterated expccHllions also appl ies 10 expectations tha t arc condi· tional on multiple random v::trianlcs. P) i~ the expected nu mber of crashes for a computer with age ..99. [X = x) .99).ero. and the spread of thc distribution of the num· ber of crashes.( .56)'2 X 0..47. thc mean of Y must be zero.3. the expected number of crashes for new com puters (O.70 + (I .0. X. then F.S"id dirrer· cntly.05 + (4 .56 X 0.02 ~ O.56)! X 0.! weighted avcrage of the expected number of crashes for a comput.99 = 0.35.. if the mean of Y given X is I. then it must be th nt thc probahilityweighted avcrage of th ese conditional means is zero.d mathematically. weighted by the propo rtion o f computers with that value of both A and P.3 TYI'O Random Variables JJ of M given that it is new.13 + (2 0. then E(Y) = ElE(Y [XJI = £IOJ .3.1 (I .\' IS z. For the conditional dis tribu tions in Table 2. the cond itional variance of the num ber of crabhes given that the compu ler is old is \ar(M IA 0) (0 .47) th:m (or old (0. Z) is the conditional c'<pcctalion of Y given both X and Z.56).t .99.20 provides some addition. in the comput. 1) 0. Z ) 1.
then the covaria nce is give n by tbe for mula cov(X. I (2. X and Yare inde penden tly distrib uted iI.p. Covariance and Correlation Covan'ance. . If X om take on I values a nd Yea n take on k values./Lx)(Y . (2.23) Thai is.. Finally.. . and whe n Xis less tha n its mcan (so that X . and vice versa).22) into Eq uatio n (2." y) Pr(X ~ x" Y ~ y. the jo int d istributio n o f two independc nI random va ria bles is the prollucl of their marginal distributions.22) Substitut ing Equation (2. Y ~ y) ~ Pr(X · x)Pr(Y ~ y) . Thai is. the product (X ../L y)l .)) =L L . then Pr(X ~ x.x ) X (Y . If X a nd Ya re independent. then the covaria nce is zero (see Exercise 2. suppose that when X is greater tban ils mcan (so that X . Prey c y1 X . In contrast. fnr all values of x and y .24) .) . whe re ILx is the mean of X and /. Y) ~ d Xy ~ E{(X .iJ. if X and Y lend to move in opposite d irections (so thai X is la rge whe n Y is small.19).r) = PreY::: y) (inde pende nce of Xa nd Y) .I'.L x is positive). or indepe ndent. so the covariance is posilive. then Y tends be grcateI than its menn (so that Y . The ctw aria nce is denoted by cov(X..J." . if X a nd Y are indepc ndc nI .Ll' is the mean of Y. y is positive) . Specifica lly. t 7) gives an alterna tive expre!. One measure of tbe exte nt \0 which t wo random varia bles move toge ther is their covariance./. if knowing tbe va lue of one of the variables provicJes no information about the o ther.u y) tcnds to be positive.TIle covariance between X and Y is the expected value EI(X . (2. X and Y arc independern if the conditio nal distribution of Y gh'en X c40als the margina l dislri bUlioD of Y. (x.\ /'" \ To interpret this formu la./Ly < U).· sion for indepe ndent rando m variables in terms of their join t distribution . .x < 0). In bot h cases.Y) or by (T XI' .1 34 CHAP1U:1 Review 01 Probobility Independence Two random variables X and Y are independently distributed.~x)(Y . .i. then Y tends to be less than its mean (so that Y .y}(y. the n the covariance is negat ive.
20) ].27) We now show this result. The Mean and Variance of Sums of Random Variables 'nle mean of Ihe sum o f Iwo random variables.X) = 0 and corr(Y. however.I and I: tha t is. ff the condi tion al mean of Y does not depend on X.X) . = eov( X. divided by their stan dard deviations: corr(X.is the sum of their me'llls: £(X + Y) = £(X) + £(Y) = /L. Speci fi cally. O. the units cancel and the correlation is unitless. X and Y. An example is given in Exercise 2. then Y and X are uncorrelated..}. ·lhis "units" p roblem can make numerical values of the covariance difficul! to interpret. The random va ri a b l c~ X and Y arc said to be uncorrelated if corr(X.28) .2fi) Correlation and conditional mean . deviated from thei r means. That is. the units of X times the uni ts of Y.23.X) = E[( Y . :I Two Random Voriob~ 35 Because the covariance is the product of X and Y. awkwardly. . YJ v'var( X ) var( Y) =~ u.X) = O.\'Uy · (2. The correlation always is between . so t hat cov(Y.25) are the same as those o f the de nominator. first subtract off their means..2: .\ ' + /L y.27) follows by substituting cov(Y.25). '0 oo' (J. 111e correlation is an alternative m e a~ure of dependence between X and Y that solves the "un its" problem of the covariancc.Y) Correlation. Y) :0. if £( Yl X) = JAy. First suppose th at Y an u X have mea n zero.Lx)] = E(YX). Y) = O. It is n OI necessarily true. E(Y IX) = O. E(YX) = E{£(Y [X)X] = 0 be"".1.25) Because the units of the numerator in Eq uation (2. as prove n in Ap pe n dix 2... If Y and X do not ha ve mean zero. its units arc.oo (2. Equation (2.By th e bwofi tcrated expectations [Eq"". tben the preceding proof applies. Sa id diffe rently.I s curr(X. it is possi ble for the conditional mean of Yto be a function of X but fo r Y and X nonet he less to be uncorrelated. tha t if X and Y are uncorrela ted. (2 . (2. the correlation betw een X and Y is the covari ance be tween X and Y. then the conditional mean of Y given X does oot depend on X.J.(y)(X .X) = 0 into th e de(ini tion of correlation in E quation (2. (2. then cov(Y. 1 (correlfHion inequality).
04 $12.74 11...co nditional o n thc high eM cducatiull nl deeree achie ved (high four conditional d i ~tribut i ons Figure lAc).lOll \\'('lmcn.87 24.... ·? Among workers wit h a <.~ure 2..~ .25 21.t horn the M:lJdI .w!! only a high school diploma.2J Jlt85 11. FullTime Worh rs in 2004 Given Educotion level and Gender (n) Women with hIgh sd100.ce n workers gradua lc ~ ho are co lle~e i!> a high school di plo mathat i".'!>l dCgrl'C earnings differ hcl".11I .' I "'l"lctJ annual!) l llc: d~ltll>ulil'n\ . ')tandard dC\'ialion.:grel: than If Ihc)' Are presentcd in Tahle 2. who h. hig h c r' ra~jngjob I{they get higher education.1ch. mean e. as measured cQlI/il/lled N I I/e.02 19..! Ui3 :t:'i.im Har educntiQn.lrihuilon fo r womcn with only II high school degree ( Fittu rc 2Alt ): Ihe ~n m e ~hifl c. and.26 2d ~5 48.. Fo r both men .\'nate huurlr cllrmnJl. Intcresllngly.m be ~cell Onc wny to n!lswcr [h e~c questions is \0 e \:amine ~e hoo l for the tW O groups of ml:n lFigur... hpo. dlf... if Son.clor<.12 Perc~tUe Standgrd o. does the distribution of earning~ for men a nd women differ? For example.. 1For c\. "'''!!CS.: conditIonal mean of earnings for " omen ('05/ wh~e parents rig ht'! Docs th~' \010 high.dJ~t<lcd tw Ihe number ur hU'l!" . UC th.tmple:. Ih..:: 2Ad :lnd tl collc~e t he di~tri bUli on of e.. )1 d iploma (b) Women w)lh [ouryear colle!!c degree (c) NIl. nnd of the conditional dl~tribUlion') able to !let u heUe r.. the ~pre a d of the distribution of ~ilrn ing. and worken.mlings.tl page TABLE 2. and thc mcan. 'urn u( annuat prt'ta:c.54 17. are thc bes t paid college.25 JX:r htlUr.'I 17.: .... salaries.71 ucgrec .viotion "" .:cd ueUled me n? degree = high ~clf{)ol (lip /Villa. ~nd l>unuSC1.15 S 7.31 35..63 27. E( Fllrmnf:~ j/liglr. ho..'7 l3.:u In Appendl\.4. "'hoclll~ d..26 J4.$ 8.S.}.3• CHAPTER 2 Review 01 Probability S ome parenl:> leU their children that ~kip th~y will bl' F.:n with high 'ichooJ diplon1. erc compul.!o<.1 ! .1 (d) Men wllh fouryea r oollcg.lribution of t hc~e some pcr~nli1cs nre a college de.79 lmedian) '''''' "" $ Iti. $13.l~) 2:t'!.nb. $20. first numeriC column ). female college gradu a[c ~ Gelillcr = Il1e distribu tion of iI\'cragc houri\' ea rnin~~ for (Figure 2At'!) is ~hiftcd to thc right of the dif.4 Summor~s of the Conditional Distribution of Averoge Ho urly Earnings of U..4 .educatc!] W Omen paid ll~ wdl as t he l'k!stpaid co Ueg.!005 rum:nl 1\'f'LIlalJon SUI"ey.2) 15...· degree) and on gender_ These arc shown in ble 2.{/.4. fmwlt'}is SJ3.'lrn ing~ arc higher {or those with degree (Ta diploma or b.
3 Two R ondom Voriable! 37 by the st:munrd deviation. is greater for those with a Another feature of these distributions is tha t the .2.
and leI a.29) var(a . and V be random variables.31 ) (2."\I.36) If X a nd Y arc indepcndcm.33) E(XY) = :5 + .{ Q"~ (corrdalion inequality). variance. f:( y I ) "" I Icorr(X. /)(TXl' cov(a + IlX + cV.u.. variances.. VARIANCES.3 arc derht. then the covariance is zero and the \ariancc of their sum is the sum of their variances: var(X +n = var(X) + var(Y) = if !. ami covarian(:e!.l') = (T X ) + C<Tn"' ( 2.o h iog weighted sums or random \'ariable~ are collected in K ey Concept 2.\ + ell i • (2.:d in Appendix 2. b. H nd c be conSlants.30) (2.3. \'ar(lIX + by) """ (/ 2(rk (2.b Y) = h2(f~.Y ) = uk + fT~ + '2<rn.32) + 2abCT. Y.1.'O' + /rn+. leI Q).and (2. AND COVARIANCES Of SUMS Of RANDOM VARIABLES LeI X. Y) (Fr. 2. y be the covariance betwee n X and Y (and so forlh for the othe r vari· abies).34 ) 1 and '" n ! S V{T. (2.xlt y.. in .. 111C fo llowing f. ar(Y) + 2cov( X .3 iT t + bX + e Y) = (I + h/L.. (1.. . (237) Us(.15 fo llow from the ddini lions of the mean. and covariance: E( tI  . let I1x and be the mean and variance of X . + (T~ (if X and Y ar" independent).'rul expressions for means. plus twice their covariance: var(X + y) .35) The varianct! of the sum of X lwd Y is the sum of their \·arianccs. The rcs u lt~ in Key Concept 2. \'ar( X ) .38 CH APTU 2 Review of Probability KEy CONCEPT MEANS. + .uf'.
Chi·Squared. Some sJ'lCcia l nOlalion aDd tc rminology have been developed for the normal di .cd conci"cly a<. I). Student t.u..As Figure 2. 1.2...\" the familiar bell· shaped prob.I) distribu tion art' often denoted by Z ..9<xr and J.5 SllOV. and the standard normal cumulative distribution fun c tio n is denoted b~ the Gree k leiter <1>: accordingl:v.1. ChiSquared.. Values of the standard nor· mal cumulative dil>tribution function a rc tabula ted in Appendix Table L To compuk pmhabilities for a norma l variable \\ilh" genera l mean and \an ance..1 constanl.900 i! 0. Random variables Ihat have a N(O.d. centered at JA. ". Pr(Z s c) = (fl(e).t ribulion. and variance ~J 2 is symmetric afound ill> mean an d ha . ··N(p.md vari:lOce a ~ is expres!l.:lhili ly density shown in Fi gure 2.f. + 1. I. between JA. .t 'JNJ I' JI .." The standard nnrmlll dist ribution i~ the normal distribu · tion with mean J..5.4 The Normal . 95% of its prubllbility between p.%u.t + l.I.I.. q 2J. (J~ ). 1. it mUSI be st:mdardized by first subtract ing the mean.5 The Normal Probability Density The normal probobility den ~i ly function with mean p. and variance . and F distributions. ·lne specific (ullction defining Ihe normal probability densit y is given in Appendix 17. The area under the I"IOfmol p. Student t. where c hi <. Student t.95 The normal di~tribution is denoted N l. the nor· mal denslI) with mean p.4 The Normal. and F Distributions The prohabililYdislnbUlions mOSI often encounte red in econome trics are the· nor maL c hi ~uared.·S. th en Jividing the re:.ull . JI .900 and p. ond F Oi~tributions 39 FIGURE 2 . The Normal Distribution A eonl inuo ul> Tnndom variab le wit h n no rmal dhtrilJUl ioll h.1 is a bell·lhopcd curve.<)()(f )' 2. The normal d istribution with mea n J. = 0 and variance q2 = I and is denoted N(O.
distri bu ted with a me an of 1 a nd <.] ). ThUs. These ste ps arc summarized in Key Concept 2. Accordingly. The nonna! distribution is symme tric.) /u and (2." presents a n unusual applicatio n of the c umulati ve no nnal distribut ion.1). wha t i s the shaded area in Figure 2. that is. it hA S the sta ndard normal distribution shown in Figure 2. Pr(Y ~('I) = Pr (Z ~ dd = I . ( Y . so its skcwness is zero.t) s ~(2 .J. l llen Y is sta nda rdized by sUbtracting ils me an and dividing by its standard deviation..and Pr(c l s Y s e2) = Pr(d J S Z s d 2) = (I'(tll ) .I varia nce of 4./L }/a..(I'« i l ). (2' and let d 1 = (e1 . suppose Y is distributed N(l . by the standard deviation .4 1) where the value 0. Let CI and C1 denote two numbers with c1 < d" = «('2 . that is'1( Y .1) s }.1) / V4 "" iCY . (Tl).that is.40) 'nle normal cu mulat ive distribution function ell is tabulated in Appendix Ta ble 1.691. by computing Z = (Y . The sa me approach can be applied to co mpUle the probability tha t a normally distri buted random variable exceeds some value o r thaI it fa lls in 11 cen ai n range.• 40 CHAPTER 2 Review 01 Probobility r.I ) is normally distributed with meaLl zcro and va ria nce o ne (see Exercise 2.<t>(dl). Y is nonnall)...4 c:r: in other word.69 1 is taken fr o m Appe lldi xTablc 1.6a'! 1he standa rdized vers io n o f Y is Y minus its mean.5) ~ 0. 4).6b. Y is d ist ributed N(IL.4... "Ibe kurlosis or the nor ma l distribution is 3.38) (2. Then. (2. th e random va riable ~(Y . • KEY CONC' COMPUTING PROBABILITIES INVOLVING NORMAL RANDOM VARIABLES Suppose Y is normally distributed with mean Ii and variance 2. For exa mple.8). The box. Th ~ multivariate normal dis tribution_ l lie normal distribution can be gen . di vided by its s ta nda rd devia tio n. Wha t is the proba bility that Y s 2. Ih at is..1)" \1~ Pr(Z s j) ~ <1>(0. "A Bad D ay o n Wall Slree t. Now Y s 2 is eq ui va le nt 10 i(Y .39) PreY ~ el ) = Pr(Z S el2) = ¢I(d2}. (2. tha t is.L)hr. NY " 2) ~ Prfl (Y .
on in I Figure 2.69 1.x + hp. PrjY s 2) = PrCT:S Pril $ 0. 4) r Y) P rlZ s 0. NU . slondordizing Y is J. and the formula for the general mult ivaria te nonnal p.d. the bivariate normal distribution. I) distribution 0. 1Me probability lOOt '( s 2 is shoo..2)..4 FIGURE 2. if on ly IwO vari ables are being considered. then any linearcombin. Pl'(l:s: 0. and FDislributions 41 Cokulating the Probability that Y:s 2 When Yis Distributed Nfl..0 0.0. 4) distnbution y. ~en use the .we u Illultiv:lrinte nomlul distribution.f.0 2.6b.I.51. is a standard normal (II random . ChiSquared.1lle fomlUl a fo r the bivariate normal p.. 4) To calculate Pr{ Y$ 2). 8ecouse the slondordized random variable.60.. 1. the distribution is called the multi"ariate normal distribution. If X and Y have a bi variate no rmal distribution with covariance (Txy.d.n) ( X . if /I random variables h. aX + bY is distributed N(ap.y. Y is standardized by subtroct ing its mean (~ .2.. a2uJ + b2al + 2abif. from Af:Ipendix T obie I. OT. . The multivariate normal dist ribution has thrce important p roperties.f. Siudenl t. I) ..691 .42) Mo re generally.N(o.5) 0.0 (a) ~~1. Y bivariate normal) (2 . tondord normol distribu tion tobia. is given in Appendix 17. standordize Y. is given in Appendix 18. then aX + bY has the normal distribution .'llion of these variables (such as their sum) is normally distrihUied.5) .1.5 (b) 1\'(0..arioble. and if a and b are two constants.own in Figure 2.1) and dividing by Pr(Y s 2) its standard deviation (17 .6 The Normal. and the corresponding probobility one.
whe re {h er e Are . the ~tan d a rd uc \'iation o f daily pcrccnl agl' Yuu will not find thi ~ va lu~ In Appendix Ta hle 1. i< ·2!l ~19. drop eM\ be seen in Figure L 7.!!flli \ c return of Z2( '" 2. 1') .''''~n more:.»<b loll 25.. or more thon 22 siondord deviotion. This probabililY is 104 x 10 w' .OOO ..~ 1"11" 1'1'11 . 1987"BIock Moodoy" iho . so lhe drop of you can cakulal(' it u ~if1g a computer (Ir). On October 19.)(ltbty. On "Black M(.42 CHAPTER 2 Review of probabili ty O n iI typica l d tl~ the overall \ alue of stock:. 1987.Ired In what happened on \.1 6% .hungcs nrc normally distributed. 10 OC loher 16.6% wa ~ fl ne.7 Duri ng Daily Percentage Changes in the Dow Jones Industrial Average in the 1980. 19110.6fL 16) stall t:rm lillll l'd FIGURE 2. bUl price changes on the Dow wtl ~ 1.hul nothing com r.. Y"8 r ... ().'i . t'lS{.:" 510d. market cun rise o r fall hy I % or t.6%.."1\0131 o f l06leTO!>! 25. This is 11 lot. Pc: rce n{ c han gt It) the 1980s. . If da ily pa ccnlage price (.: probabili ty of 11 drop of al k usl 12 standard devin lic)nS IS Pr( t $ .6%! From January I. then th.1987 ·1 ~ " L~c~c~c~~~ I __LLL__L" 1')kU 1')111 I<.7 I '/). . .1 1'~ :. t mded on the 0.hc Oow l o nes Industrial An:mg.. The enormi ty (If Ihi.1onday. dard dC\'ialions. it!).21) '1 '( 22 ). 19H7.'· .IK ~ l"iU 1%. 1')).: (an average of 30 large mdustrial stocks) fell by 25. October 19. returns on the D o w during the 1980s. . thA I is. the overage percentoge doily change 01 "!he Dow" index was 0.05'l ond its standard devioliQn WQS 1 16%. a plot o{ the clail.00014.
tion (2. The ehi·squllred distributio n is th e distribution of the sum o f m squ a red ind e pendent !Standard nOlTtl al ran dom varia bles.2. For this reaso n. For example. or about 5 X 10 1.tion 2. In fact. so the probabilit). ChiSquared. Studenl/. if X and Y h a ve a biva ria te norm al di stribution a nd a xy = 0. The ChiSquared Distribution Thc chisquared d istributio n is used when testing certajn types o f hypo theses in statistics and economelrics. Thu s. This distribution depe nds on m . In Se!.sn the prob· changes than the normal distrib ulion would suggest.I)' thu n o thers. This resull. ~conds. and F Oi$tribution$ 43 How small i~ 1.1 X 10 un. or 2 X Ahhough Wall Slrecl did have a bad day.. use economet ric models in which the \'8r. • Tht: universe is believed to have existed for 15 hil lion years. stock price per centage changes hove II distribution wilh heavier tails than th. U XY =: O.lhe fact tha t il happened at aU suggests tnat its probability was mon: than 1.42) by se tting a = 1 and b = O Third.:ginning of time is 2 x 10. 1l1en + + Zi has a ch i·squared d ist ribution \\. lhen the ma rg inal d islri bu lio n of each o f the va ria bles is no rma l (this follows fro m Equa J. The probabili ty of choosing one at random is 10. Ihen the var iabJes are ind epende n\. there <ITt! more days wit h large positive or large negative to lO.that zero covaria nce imp lies indepe nde nce is a special prope rt y o f the m ult ivariate n o r m<l l dis tribution that is no t tr ue in general. • The re are app roximately 1043 molecules of gas ill the first kilomete r above Ih t! earths surface. financc profe~siollllls ability of choosing n particular second at random from all the !.. the n the con verse is a lso true. TIle name (or this zi Z 1 . Z z.tconds since the hl. so some periods have higher "olatH.4 x 10 101 . of winning a random Jonery among att living people . a nd Zl be ind e pende nt sta ndard norma l ra ndom variables. th e n X and Ya re in de p en de nt.: normal dis tribution: in ot her words. let 2 1.3 il was sta ted that if X and Ya re inde pe nde m then. which is ca lle d Ihe d egrees of freedom o f the c hisq uared d istribut ion. If X a nd Y a re jointly no rma lly distribute d . if a se t or variables has a mullivaria le normal di:\tri bution.13• Second .18.81\Ce of the l>erccntagc change in siock prices can cvoh'e over time. if va ri a bles with a m u ltivariate n o rmal d istri b ut io n have covariances tha t equ a l ze ro .!O about o ne in 6 billion.? Consider the fo!lowing: • The world popula tion b ahout 6 billion . rega rd less o f the ir jo int disu ib utio n.'i lh 3 degrees of freedom. The Normol. These models with changing variances fi re morc consistent with the very had:md "cry gooddays we aemall)' see on Wall Street.
I" .'I( Table 2.let W be a chi·squared random variable with m degrees of freedolll and let V be a c bi· ~quared random variable with 11 degrees of freedom. __ _ _.quared ran I... distrihu tion. the de nominator random variable V is the mean of infinitely many chic..95. The F Distribution The F distribution with m and 1/ degrees o f freedom. it i::. w_ ~" ____ _t ___•• _ _ . divided by 11 .thal is. The Student (di ~lribution depends on tbe degr. x..XI) = 0.. . to an indcpe ndentl)' d i~ lri b ut cd chi·squared random variable with degrees of fre edom 11 . . all important special case of the F distribution arises whe n the de nomimHor degrees of freedom is large enough that the FM II dis tribution can be approximaled hy the /'~' . Thus the 95 111 pcfcentill: of the 1m distribution de pends o n the degrees of freedo m m.44 CHAPTU! '2 Review of Probability distribution derives from the Gree k leiter used 10 denote il : a chisquared distri bution with 111 degrees of freedom is denoted x... The Student t Distribution The Student I distribution wi th m degrees of freedom is defined to be the distri· bution of the ral io of a standard nonnat random variable. The n the random variable z rv'iv~ has a Student {dis tribution (al so called the (distribution ) with III degrees o f frl. let Z be a stamlard normal random varia hie. '. To ~late this mathematicaUy. a nd the I.". In stati"tics and econometrics..."..81. Selected p!. .that is.!rcent iles of the Sru dent I distribution are given in Appendi.i._ _ .!cdom.. I _ _ _ • .. For example. where Wand \' are inde pendently di~Mibuted. . . Then has an I~Jln distribution...so Pr(Zr + Z~ + 21 :5 7. divided by the square root o f a n imlcpcnde ntly distributed ch i·squa red ra ndom variable with m degrees of freedom divided by m .rl d is tribution is 7. _ __ _ :~ .._ • . distribution equals th e 'itandard normal distribution. the Student 1 dist ribution is well approxima ted by the standard normal dislribUl ion. and Jl:\ Z a nd W be independe ntly distribu ted.. divided by m. b UI when III is ~mall (20 o r less) il has more mass in the laih. lei W be a random variable with a chisquared distribu ti on wi th //I degrees of frce· dom.. ._ _: . L .. That is...... The St u· dent I distribu tion has a hell shape similar to that of the nonnal distri but ion. an F distribution with numera tor degrees of lIeedom m and denominator degrces of freedom II. is defined to be the distrihution of the ratio of a chi sq uared random variable witb degrees of freedom m. Wben m is 30 or more.:cs of freedom til . denoted F". Appendix Table 3 shows Ihallhc 951h percentile of the ./' Selected percentiles oCthe distribu tion arc given in Appendix Table 3. _ •. In this limiti ng c<I!'e. This d istributioo is denoted (". a " faller" bell shape than Ihe nonnal.
60.9(I distribution is 2.w:.tribution of the l>3mple averag\!." limit of 2. Because the sample avcrage is a random . divided by the degrees of free dom.ampling and the di:.l rihutiOll S of averages that are u:!ied throughout the book. and the 95 .the 95 .60.lily com muting time has the cumuiali . the 95 th percentile of the F~.'ariable is I (see Exercise 2. Random Sampling Simple random sampling.that is. divided by m: Wl m is dbtribuled F"..1 aspires 10 be a ~tal i stieian and decides to record her commuting times on vilrious days. She .1mple average itsel f a random variable.>t.: begin by discu s~ing random !!:"llllpling. This section concludes with some properties of the sampling di:. the 95 ' " percentile of the P. randomly drawing a s~lmplc from a larger population.24}:Ibus the F.. Suppose our commuting studcnt from Section 2. "hich is the same as the 95 111 percentile of the . WI.KI (from Appendix Table 2).5 Random Sampling and the Distribution of the Sample Average Almost arr the ~ I atistieal and econometric procedures used in th is hook involve ave rages or weighted uverages o f a :. distribution is the distri bUlion of a chisq uared random .. :. which is:1 (7$113 = 2. . The 9()lh. As the de nominator degrees of freedom 1/ increa~es.. 95111 • and 99 111 percentiles of the Fm •II distribUlion arc gi\'en in Appen dix Table 5 for se lected values of In and n. This section introduces soml! basic concepts about random ." dis tribution tend..11 per centile of the Fh.. which is called its sampling dbtribulion.elects these days al random from the school year. For example. the value of the commuting time on one of these randomly selected day:. it has a prohability distrihution. beca use the days werc selected al rantJom.ample of del la.71. an cssenlial step towanj understanding the performallce of econometric procedures.. the "alues of the commuting time on each of the dirfcrent days arc independcntly distributed random vllriahies. c distribution function in Figure 2. dis tribution is 2..92. 7. ides no infonnation about the commuting time on nnut her of the days: tbat is.has thl! effect of making the 5. For example.S Random Sampling and the Di~tribvlion of the Sample Averoge 4S random . 2. pro. knowing..11 percentile of the F!O. to the Fl.Ind her d. Bec.c these days were selected at random.'ariable with In degrees of freedom. The <let of random sampling. ariab1c.(0). Characterizing the distribu· tions of sample averages therefore i.2a. .2.rl distribution. from Appendix Table 4.1O distrihmion is 2.
th is margin al distrib ution is the distribution of Y in the popu lation being sampled . is random . . Y 2 is the second observation. . Because the s. .. " . Y" are rando m. If dif fe rent membe rs of the populati on are chosen. Y" are said T O be iden tically distributed . the values o f the observatio ns Y1' •• • • YII are themselves random.. Y" are randomly drawn iTom the same popu la tioD. Simple random sampling and Li. f1 . . . Y. Had a different sample bee n drawn.i.!lmplc was drawn at random.".. their average is random. they are sa id to be independently and identically di~1 rihut e d . . YII is Y ==  I (YI n + Y1 + .... thei r values of Y will differ. has the same marginal distribution for i "" 1. .. Be<:a use YI . Y" can be trea ted as ra ndom vari abl es. . ...d. + I y~) = If " L Yi' . then Y b .elected days and Yj is the commuting time o n the ph of her randomly seJected days. i.. . Y L•••• . then the obM! rvotio lls and .. under simple rantlom sampling. in which n objects are selected at random from a populalion ( the population of commut ing days) and ~ach member or the population (each day) is equally likely to be included in the sample. the value of each Y. or U.43) An essential concept is that the act of dra wing a random sample has th e eUeet of making lhe sample average Y a random variable. . In the co mmut ing example. knowing the value of Y1 provides 00 infor mation abollt Y 2• so the conditional dis tribut io n or Y2 given Y. U ode r sim ple rando m sampling. Because the members of the popu la tion induded in the sa mple are selected at ra ndom. . .5. draws ure summarized in Key Concept 2.11m&.. When Y. o f the n observations Y. . the act of random sam pling means thar Y1' . .d. Y n can take on many possible vJ lues:a h er they are sampled. called simple r~ndom sampling.d. Y I is the commuting time on the firs t of her II randomly :. Y". . YI!' When YI .. Y" are drawn from the same dLstributi on and are indepe ndently dislri buted . Be fo re they are sampled. a nd so forth . .._1 (2. Because YI . the margina l distri bution of Yi is the same fo r each j == I. . . Yt is dist ributed independe ntly of Y 1. " . . draws... The n observations in the sample are denoted Y. The Sampling Distribution of the Sample Average The sample average.. a specific val ue is recorded [or each observa tion. is the same as the margin al d istri bution of Y"2' In o ther words.46 CHAPTER 2 Review 01 ProlxIbility The si lUarion described in the previous paragraph is an example of the sim plest sampling schem e used in statistics. where Y1 is the first obse rvat io n.
• Y..she would have recorded five difrereot tim esand thus would ba ve computed a dilTerenl value of Ihe sam ple average. D.. and Y. I " E(Y) ~ . so [by applying Equa tion (2. then compUled th~ average of those five limes.Ld. Suppose Illal fhe observHlions Y1•."ussion of th e samplin g distri butio n of Y by computing its mean and va riance under general conditions on the population distributioCl o f Y. RANDOM VARIABLES Jo a simple random samp le. .• Y" are i. .. YJ) O.• and let ~y and (T~ denote Ihe mean and va riance of Y. var(Y\ + Y l ) = 2(11'.i.)J = x 211y = }J. Thus the mean of t he sample ave rage is El~(YI . the random variables Yt .).28): E(Y 1 + Yi> = ~y + ~y = 2~y. because l't . 11 and Y1 is distributed independen tly of Y2"'" YII a nd so forth."tb ra ndoml).. is the sa me fo r aU i . .. We start our dis(.5 their sample average would have been differe nt the vulue o f Y differs from one randomly drawn sample to the next. j. The sampling distribution of averages and weight ed averages pla ys a central role in statistics and econometrics. ..31) with a = b = ~ and co\'( Y "Y2) = OJ. Y" are i.. + Y2 is given by applying Equation (2. t he mean of the sum Y .LE(Y.d . •n).S R oodom Sampliog ond!he [J. Whe n n = 2. Y".1xmon 01 !he Sample A"'"'9" 47 L ..d. drawn objec t is denoted Y. For example. ' nlU S.37). so cov( 1'. y.44) The variance of Y is found by applying Eq uation (2. Tht: di ~iribu t i on o f Y is called the sampling distribution of Y.i.). bccause if is the pwbability distributio n asso ciated wit h possible va lues of Y that could be computed fo r different possi ble sam· pIes Y1 • . 2...1 popu la tion and each object is equally li kely to be drawn .". n objects are drawn al random from . the mean and variance is the !oa me for all i = 1•.i. .. Because each object is equally li kely to be drawn a nd the distri bution of Y. var(¥) = ~(]"f.• In general. tha t isthe dislT i· bution of Yi is the sa me for all i = I.d .d "y. I 2. .. For general/I. Beca use Yis nmdo m. nrc independen tly distributed for i .•• •• Y" are independe ntly a nd idcnlically distributed (i. Had she chosen fi ve different d ays.. il has a probabilit y distributio n..suppose OUT student commuter selected £iye days al random to record her commute times. for n = 2. For example. .  t . Mea n and variance 0/ YO . 1 KEY CONCEPT SIMPLE RANDOM SAMPLING AND IJ.. (beca use Ihl! o bse rva tions are i..) = n ". (2. The va lue of the random variable Y for the ..
and the standard deviation of Yare £(Y) ~ \'<lr( Y ) 0:= IL" . tn cont ra~tot7 i· is the variance of each individual Y.i.dcv(Y) y c u. what the ~:." U and sld. m pIc averagf! Y .so it i~ important to kno". this mean s that.. the variance. .Lt! . .. such a~ the norma l distribution.!. The notation (T~ denotes the va riance of the sampling dhtribution of the ~.4 7). the su m of 1/ nonnally distribull'd random variables is itse lf nor mally diqrihuted. q r denotes Iht: !l. in a mathematical sen'ic.L L '.46) 0 (Tv = • ul.6 LargeSample Approximations to Sampling Distributions Sampling di~ t ributions pia) a central role in the development of statisti!..4S) These rc~ult ~ hold whatever thl. The "\:xac! 'Oappro. is: that i~. draws from the N(Pyoq~) distribution . Suppose thai Y lo. 2. Y" are i. In summary.42) .48 CHAPTER 2 Review of Probability var(Y ) "" var ( I" 1 " ~ . if Y 1• • •• . .) C2 . L .d.) + 11.tandard deviation of the sampling distribution of Y.. (2. Because the mean of"f is Jlr and the variance ofY is (T~Ill.u == . vii ' (lA7) (2. the mean. and (2.'C Y. j"' The standard deviation of Y is the MJuarc root of the variance. I. drawn. = .I LY. . (2 .. dra\\s from the N(Jl).48) 10 hold. for Equa· tions (2. d<xs not need to take on a specific form. As Slated fo llowing Equation (2.~5) LJ 1/. . ".act·· approach :lnd an ··upproximnte·· approach.'1ch entai ls deri\ing a formula for the . the lli ~lribulion of Y.Impling distribution o f Y is.46). (T). Similarly. Umt i~ the vari ance of the population di ~tribUlion from whkh the observation i.}! and econometric procedures.) " L .amplin!: dl'lrihutinn . O'l' )' then Y is distributed N(. " cov(Yjo Y. Sampling distribution ofY when Y is normally distributed.. u~/1!)..uy. There arc two approachcs to charnctcri711lg "'iltllpli ng distribution!': an ··c:.' Jistri bulion of Y.Iv'ii.. Y" are i.
z..y)/Uy.2.· . i = I. For example. druws of a Be rnoulli random . Y will b~ close to p. these approximations ca n be 'cry accurate even if thc sampk.si. (¥ .5) the eXllct distribution of Y is normal wi th mean p. in which she simply records whe1he r her commute was sho rt (less than 20 minut es) or long. II are i.y) / a r does 1101 de pend on the dist ribution of Y.e is o nly " .•• • Y" are i. the asympto tic disLri bu tions are si mple. .i . Y will be ncar /..30 observations. these asym pto tic distributions ca n be counted o n to provide very good approximations to the exact sampling distribution. Because she used simple random sampling. consider a simplified ve rsion of o ur student com muter's expe r iment. The "app roxim ate " approach uses approximat ions to the sampling distribu tion that rely on the sa mple size being large." When a large number of random variables with the same mean are averaged toge ther.. For example.jJ.l )' with \cry high probability when II is large. This is someti mes called the '"law of averages. The Law of Large Numbers and Consistency The law of large numbers states (hat. if the distribution of Y is not nor mal . Thus. when the sample size is large.d .d. Y I .iJ. Unfortunately. under general conditions.·. As we see in this section. Moreover. Y" arc i. . then in general the exaet sampling distri bution of Y is very complic<Hed and depe nds on the di:mibution of Y. This section presents the two key tools used to approximate sampling distri· butio ns whe n the sa mple size is large. This normal approximate d istribution provides enormo us sim plifications and unde rlies the theory of regression used throughout this book. .i . Le t Yi equ al 1 if her commute was ~ h o r( on the j Ib ra ndomly selected day and equal 0 if it was long.y and variance u~ 1 n. and Y I • . The law of large numbers says that.0 lorgeSomple Approximations !o Sompling Distributions 49 descri bes the distribution of Y for any" is c'llIed the exad distribution or finite sample distribution of Y.d. is approximately normal. the large values balance the small values and their sample aver age is close to their common mean. The large !<ample approximaiion to the sampling distribution is oflen called the asymptotic distributioo. when the sample size is large. (he law of large numbers and the central lim it theore m. The central limit theorem says that.remarkably the asymptotic normal distribution of (¥ . then (as d iscussed in Section 2.. the sampling distribution of the standardi zed sa m· pi e average.y with very high probability.."asym ptotjc" because the approximations become exact in the limi t that /I ~ ce. Y.i. A lthough exacL sa mpling distributions are complicll ted and depend on the dis· tribuLion of Y. if Y is normally distributed. Because sa mple sizes used in practice in economet· rics typically number lDthe hundreds or thousands.
consiste nc) (sec K c~ Con cept 2.2) the proba bility Ihal Y. as short. more concl!lch.l.' ). = 1 is O. the n Y ~ J.y . 'Ille propen)' tbal Y is near J1..C 10 Pol + c bccQlm:'s ~ ~ 2. equivalently. Fo r example.: ~ampk a\oerage \. i = 1.r.) "" /L y ::: 0. where (from Table 2. O. The C"Onditions for the law of large numbers that we will USe in this book are that Y.) = if y < '". y (or. one was short. AND THE LAw Of LARGE NlIMRFR<. is rinite. where the law of IUfi!e numbers is pruve n. bccauM! the re is an upper limit \0 our sludent's commuting time (~he could park and walk if the trumc is dreadful).c 10 the Irue proportion in Ihe popuJalion. under cerlain condilions.. The a~sumption that t he varianc. 11 arc Li. ere "hurt ).d. .) if the p robability thai Y is in the fange lJ. un more values and the sampling distrihution bl!comes tightly centered o n }Ly..trihution of commuting times is finile. CONCEPT CONVERGENCE IN PROBABILITY. . When II = 2 (figure 2.::pt:nden tly and idc niitillly dist ri huted with £ ( Vi) = jJ. th:'II Y is consistent for J1."ould be unreliable... <lnd hoth ..2.I.7K As II increases..~. CONSISTENCY. . these large value~ cuuld dominatc Y and th.8 sho\\·s the sam pling distribution oCY for various sample sizes n . assumption holds. Thb l~ wri llc n as y 4 Ilr.that is. equi\alcnlly.afe unlikely and observed infre quen ll~ : otherwise.d).. This assumption is plausible for the applications in this book .. <lnd that the variance of Y. then the Ll.6 ! a rbitrarily close to one as " increases for any constant c > O... Y con veri!l!s in probability to J1. (r~. cally if \'ar( V.h . none of which b parlicular!) c10.78.1f the data arc collected by simple random sampli ng. .. d. out lieni. Y can take on on ly threc val ues: O.7R. The law of large numbers Slall!s Ihal. 111C law of large n umbers says t hat if Y I' i .. Y is con sish:nl (o r J. . . Figure 2.8a).50 CHAPTER 2 Review of Probobility L ". Beca use thr: expectat ion of a Bernoulli ra nd om variab l~ is its success probability.6). .l't II increases is called cOII\'c rgc nce in pro bubi lit) or.: is finit!! sa ys that e"tremc1y large \·alues of Y... The sample avcnlge Y is the fract ion of d A YS in her sample in which her commu te was shurt. howc ver (fig ures V'. Y takc::. nlC math(' rnatieal role uf these condi tions is made dear in Section 17. \ariable.y wilh increasing prohnhilit~ . the variance or the dl.. '!lIe sampk ave rage Y con\'crgcs io probability to jJ. E( Y. 11 a rc ind. <lnd I (neit he r co m mute . y and if large o utlie rs Il T e unli kely (ICChlli .I).) or.)" .
7 P robabili lY '" "4 p " O. .'rage (b ) II 1 .. so the sampling dislribuhon becomes more tighrly concen· 'rated around in mean p.5<J 11. . 'it' (") 7'> 1." ". l)7~ J.). l) . '"' 0.5 1 'Xl flO 02'i O'jj.2.' ( 011 1 1 .I)(. .75 1.\ .II 112 0 25 OS" U. 11 1 .6 lorge·Sample ApproKimolions 10 Sampling Distributions 5I fiG URE 2. 07S .1..78 os the somp/e ~ze n increases . (1 . .1" . the sample average 01 n independent Bernoulli random YOriabies with p Pr(Y. Val ue of sam ple average (d) 11"= tuO The di~tr..0U \ "J .IM' Value of sam ple average (C)II :!. "..1 .bution5 are the $Ompliog di~butions of Y.i Va lue o f sam ple average (a) II Va lue o f sa m p le a\.3 '1'. ~ 1) "" 0..5 ProbabililY Probabili ty o J25 /1 ::: II Ii 'i2:.78 1i. The yorionce of the sam· piing distribution 01 Y cleaeoses a s n gets larger. .78 (the probability of 0 short commute is 78'1. Uti 25 (. 1 .. (!. JI . iI.8 Sampling Distribution of the Sample Average of n Bernoulli Random Variables Probability "..
respectively. One way to do this is 10 slanda rd i%e Y by su btracting it:i mea n a nd dividing by its s tandanJ deviation. Ih is distribution should be well approxi mated by a N(O. so lhat it has a meltn of 0 and 'l variance of 1. After this change of scale. it is easy to sec that. TI}is point is illustrated in Figure 2. The d istribution of the standardized average (Y . 1) distribution when If is large. The sampling distribution of Y. At one extremc. u?). normal approxi ma tion can be see n (a bit) in Figurc 2. According to the central limit the orem.Althougb the sampling distributjon is approaching the bell shape for /I .5.1his leads to examining the distribution of the standardiLed version of Y. and 100. when n is large the d istribution of Y is approximillely N(p. that is qu ite different from the Bernoulli dis tri but ion .8. rrj. and d for n = 5.8: the distributions in Figure 2.y.9 fo r the distributions in Figure 2. under gener al conditions.. how large mUst If be fo r the distribution of Y to be approximately normal? rne answer is '" it dcpend~"The quality of the normal approximation depend. This distribu tio n hus a long right lail (it is "skewe d" to the right).p. C.. themselves have a distribution that is far from normal .9 are e xacliy the same as in Figure 2. is shown in Figures 2. 25. when the underlying Y.25.})/u y . this requires some squ inting. The central limit theorem 5<1)'S that this same resuh is approximlltely true when II is large even if Y1•• • •• Y" are not themselves normally distributed. 103.52 CHAPTER 2 Review 01 Probability The Central Limit Theorem The centrallimit the orem says tha I.}. the normal appro ximation still has not iceable imperfections. Recall that the mean of Y is p. It wou ld be easie r to see the shape of th e d istribution of Y if you used a magnifying glas!> or hud :some other way to zoom in Of to cxpand the horizontal axis of the figure. lOb. the distribution of Y is we ll approximated by a normal distributio n.. then Y is exactly normally distributed for all II . The convergence of the distribu tion of Y to the bellshaped. As dis cussed ilt the end of Section 2. However. if 11 is large enough. 0:= at> One might ask. A ccording to the ccntral limit theorem. the distribution of Y is well approx imaled by a normal distribution when fI i~ large. if the Y. in contrast. ::0 . arc themselves nonnally distributed. that make up the average. after centeri ng a nd scali ng. e xcepl th at the scale of the horizontal axis is changed so that the standardized vuriable has a mean of 0 and a variance of I.}.).fJ. when the sa mple is drawn from a population with the normal distribut ion N (p. the distribution of Y is f'xlIclly N(p. how lmge is " Iarge e nough" ? Th at is.10 for a popu la tio n distribution. y)/U y is ploued in fig ure 2.y a nd its variance is u~ u }ln. (Y . shown in Figure 2. because the distributioo gets quite tight for large n. then Ihis approximation can require" 30 or even more. on the distribution of the underly ing Y.8.
.J(.. the $OIfIpli"9 di1tributions are increasingly well approximated by the normal dhtributioo {the solid I.< ~ " .8 is plolted here otter slondardizing Y. .ndardiu..l on .!ll . The normal distribution is Kaleel $0 that the hei9ht of the di:ltribution5 i5 opproximotely the 5ClmG in 011 figures.7 .0 III lUi " I ..5 " 0.3 02 01 U.l00 tel /I . .5tribt.i1 nnJ..11 .2 ..U . 3 .78 Probability (I. 30 .lO Su.8 and magnifies the scole on the horizontal axis by a Iodot of Vii. U. This centef"51he d. .3.1.. When the sample size 11 Iorije.· tions in Figure 2.. c.6 LorgeSomple Approximations to Sompling Di~bulio.\ "' ('1 0.u 53 FIGURf 2.ample avt rage (d) n .d va lul! or u mpll! average (a) .2. .) P robability 0.. 05 predicted by the cenlmllimit theorem..  Staodardi t:ed va lue or "'rnple average (b ) n " 5 Prob~b iU ry 2 P ro babiU ry 01 Standa rdized value o f sa mple average Standa rdin d val ue or .oe).11 011 " 10 2 11 .9 Oi$tribution of the Standardized Sample Average of n Bernoulli Random Variables with P = 0. (I 2.!5 The 5CJfTIpling dj ~tribution of Y in Figure 2.
\i) <l.ibution.. I) 3( \ Sta ndardized value of sa mple average (a) 01 = 1 Stand ard ized Vol lue o f sa m ple :lvera ge (b) . Probability 0.. ~2 j Stnnd ardi zed val ue of sarllple ave rage (d ) II =.3.0 Standardized value of sam ple a~'erage (c) . like the population dip..lll lI . as ptedided by the ceotrat limit theorem. Bvt when n is large (n = 100). =:.YI 0. 5).100. is skewed.ibu\ioo (~id tine). .201 n."'1 P robability (U~ t'I.1:: Probabili ty 01" I UIIl 0(1\ ~ . the sampling di~bution is well approximated by a stondord oormot dip. the sampling distribu· tion.54 CHAPTER:Z Review of Probability FIGURE 2.Inn The figur~ show the !oOmpiing distribution 01 the sta ndardized sample overoge 01 n drows from the skewed (asymmetric) population distriootion shown in Figure 2. When n is smo!! In . 10 Distribution of the Stondordized Sample Average of n Draws from c Skewed Distribution P ro ba b ilit y 11. The nor mal distribution is KOIed so that the height of the dislriburions is opproximately the some in 011 figvrM. r ~ 1 '4'T~~~ " ...
combmed with its wide applicability because of the central limit theorem.) = 0'1. The joint probahilitics for two random variables X and Yare summ ariLcd by t heir joint prObability distribution. The va ria nce of Y is CT~ = E(Y . 2.7. with £(Y/) = p. 3.. The ccn trullimit theorem is summaril. . 2. makes it a key underpinning of modern applied econometrics.i. whe re o< . for" 2:: 100 the normni approximation 10 th~ distribution ofY typical!} i~ vcr)' good for a wide variet).llion is quite good. the probabi lity distribution fun ction (for discrete random variables) . .vl KEY CONe". To calculat e a probilbility associo t('d with a normal random variable.10 a TC complic<t led and q uile dirrl'TC ni from each o ther.. •. LTHE CENTRAL LIMIT THEOREM Su ppose that Y. amazingly. ha ve a simila r s hape. howc\'er. As II + 0:: .. B ccau~c th e distribution of Y approachcs the !lumMI as 1/ grows large. the distrihuti on of (Y .). The conditional probabilit y distribution or Y given X .lhe norma l approxim.9 and 2. Summary L The probabili ties wit h whic h a random variable takes o n diffe re nt values are summarized by the CUlllubtivc distribution Iunction.cd in Key Con cept 2. is its probabil ityweighted average value." di stributions in Figures 2. 4...1 OU are simple and. In fact.rSummary 55 ~ ' f¥4i%JIlJtp..7~ <?C.}Jy) /u y (where = a} /lI) \lCcomes ilTbitrarily weH lIpproximated by the standard normal distribution. conditional on X taking on the value x. A Tlormn ll y dist ri but ed ra ndom varia hie has the bcll lo hnped probability dc n ~ity in l::"jgurc 2. denoted £( Y). Th. JA. of popula lion distributio n ~ l llC cc nlrallimil theorem is a remark a ble res ult While the "small n" distrib utio ns of Y in parts band c of Figures 2. Y.7 (!? By II = 100. )' and va r( Y. arc i...' com'eniencc! of the normal approximation.. and the sta ndard deviatio n o f Y is the sq ua re root of its variance.d. Y is said to he asyml)loti ca ll~' norma II)' distributed .. or is the probabil ity dist ribu tion of Y. The expcctl!d value of a random variable Y ( also called its mean.lLy)1J. 9d and 2. the "Iarge .5 . . and the probabllity density function (for con tinuous random variables).
l) di stribution] when II is large.i.ing produces n random observations Y•.d. 6.py)lu y . .i. .ly: c. ... dislribulion (44) FdiSlribulio n (44) .d.f.l n. a re i. 5. varies from one rando mly chosen sample to the next and thus is a rando m variable with a sa mpling distributio n. .d. Y. then use the standard norma l cumulative distribu tion tabula ted in Appendix Table 1. • Y". th e central limit theorem ~ays thai the standardilcd versio n of Y.prEII: 2 Review c:J Probability fi rst standardize the variable. Key Terms o utcomes (18) probability (1 8) sa mple space (19) event (19) discre te random variable (19) continuous rando m variable (19) probability distributio n (19) c umulative pro bability distribulion (19) cumula tive distribution fund ion (c. If Y I •. ). b.5. Simple random sampJ. has a standard normal distribution [N(O. Y" thai are inde pendently and iiJeolically distributed (i. .) (21) de nsit y function (21 ) density (2 expected va lue (23) expectation (23) mean (23) varia nce (24) standard devia tio n (24) mo me nts of a distributio n (27) skewness (27) correlation (35) uncorrelaled (35) normal distributio n ( 39) n standard normal distribution (39) standardi ze a variable (39) muhiva ri ate normal disuibution (4 1) bivariate normal distributio n (41 ) chisq uared distribution (43) Studenl. (Y .d. the sa mpling dis tribution of Y has mean iL y a nd variance and ur = «1. CH. The sam ple ave rage.) kUrlQsis (27) oUl lier (27) le plo kurlic (28) joint probability dislribution (29) marginal probability distribution (30) conditio nal diSlribution (30) conditional expecl3tion (32) condi tio nal mean (32) law of iterated expecta tio ns (32) conditional varia nce (33) independence (34) covariance (34) (20) Be rno ulli rand om variable (20) "B ernoulli disuibution (20) pro ba bi lity de nsity funct ion (p.f. then: a. the law of large numbers says tha t Y converges in probability to J.
. You wa nt to calcula te Pr(Y $ 0.) (46) sa mpling distribution (47) exact (finitesa mple) distribution (49) asymptotic distribution (49) law of large numbers (49) convergence in probabilit y (50) consistency (50) cenlIallimit theorem (52) asympto tic nonnal distribution (55) 57 Review the Concepts 2.. 4) distribu· tioo .e rela tionsbip between your answer and the law of large numbers? 2.s Suppose that Y 1• •• •• Y" are Ll. Explain why knowing the value of X tells you nothing about the \'alue o f Y.6 Suppose that YI ..i. is a random variable... Would it be reasonable to use tbe normal approximatio n if II . . 2. Ske tch a hypothetical prObabilit y distribution of Y. What i ~ th. (b) the number of times a computer crashes. Z..Review of (oncepls simple random sam pling (46) popul ation (46) identically distributed (46) independently and idenlicaJly distributed (.l\'c some la rge ou tliers.2 Suppose (hat the ra ndom variables X and Yare independent and you know the ir distributio ns.i.. skewness = 0. and the mean studen t weight is 145 lbs. random variables with a N(t .d.3 Suppose that X denotes the amount o f rainfall in your hometown during a given month and Y denotes the number of children boro in Los Angeles dur ing tht: same month. A random sample of 4 students is selected from the class and their a\'eJ age weight is calculated. 2. 2.7 Y is a random yariable with k "" 0. Y" are i. In words. Sketch the probability density ofY whe n /I = 2. describe bow the densities differ..1).1 Examples of random variables used in this chapter included: (a) the gender of the next person you meet.. .? Why or why not? Use this exa mple to explain why the sample average. A re X and Y independen t? Explain. Explain why n ran dom yariables drawn from Ihis distribution might h.d. and kurtosis = 100. Explain why each can be thought of as random . . Y. (Iy = I. 10a. random ya riables with the probabilit y dis· tributio n given in Figure 2. Will the average weigJ1t of [he studeuts in the sam ple equal 145 Ibs. Repeat this for II = 10 and n . 2. 51 Whal about II = 25 or n = l 00? Explain. (e) the time it takes to commute to school . and (e) whether it is raining or no1.d. (d) whether the computer you are assigned in the library is new or old.4 An econometrics class has 80 students. 100.
3 U~ing.3 . De rive th .hl 0. Compute tbe me a n. c. cumula tive probability dist ribution of Y.6 11le following table give:. ( Him: You migh t find il he lpfuJ 10 use Ihe fo rmulas give n in E xercise 2.4 Suppose X is a Be rno ulli random vari ahle wit h P( X a.1) Total 0.000 . based on the 1990 U. V) .5 In 5e pte mbc r. Wha t is tbe mea n. Shnw E(X~) == p .1l5<J 1. D e ri ve the probabilit y distribut ion of Y. 1990 Unemployed IY" 0) Noncollcg<: grm. Y ). n.24t 0. (b) (T~. I) . 2. and (r ~: and (c) (fll' V and corr( W. status a nd college graduation a mong Ihose ei ther e mployed or look ing for \\ork (u nemployed) in the working age U. a nd vari ance in °C? 2. h. Derive the mean and variance of Y. . st:l nd a rd devia tio n.S.005 0. b.( ~5 EmploYK IY.21. consider two new ran dom variables W == 3 . a nd kur· IOsis o f X. oX and " == 20 .. Ce nsus.1 Let Y denote the number or" heads" tbat occur when IwO coins tIfC tossed. ~ 2.S. the joint probability d istribut io n be tween empluy me". Suppose that p == 0.ls ( X .146 0..S.s k ew nes~. va r i a n cc. .) 2. 2.2 to compUlc (a) £( Y ) and E(X): (b) <I\: ami a ~: Hnd (c) a xy and corr(X. Scatth:'s daily high te m pe ra ture has a mea n of 70°F a nd II !itandard devialion of JOF. l hc nmdom variable s X and Y from Table 2.p .2 Usc the pro babi lity dislfihUl ion given in Table 2. Show E(X*) == p for k > O.7Y.2. Comput e (a) £(W) and £ (V) . Population Aged 2564. c.UJ College gr:ld~ IX "" I) TO III O.58 CHAPU:R 2 Review of Probability Exercises 2. Joint DislTibution of Employment Status and CcMlege Graduation in the U.9'iO O. popula tion.7119 O.
(c) from S (dolla rs) to E: (curos). and va ria nce of Y. t'. The corre lation between male a nd female e arnings for a couple is 0.!uutcs and (ii) non college graduates.9 X a nd Ya re discre te rando m variables with the follow ing joint dist ribul ion: Value of Y 14 0. a.000 per year . Le t Z = ~ (Y . W hat is the mean of C"! b_ Wha t i. b.. Calcu late the une mployment rate for 0) (·ol1 cgc grut.0 1.\andard dev iation o ( C? d. 2. Convert the answe rs to (a).lIld a standard dcvi<ltion of S IROOO. independent? Explai n.!istrihution . d.£( Y). mean.02."' 0. 2. the cova ria nce between m ale a nd fe mail: ~lIrni ngs? c.1 mean of $40.1 5 om 0.02 • X · 8. Lei C denote the comhin ed e<l rnings fo r a randomly selected couple.0:. 0. c.000 per year and a standard devia tion of S 12.15 0. an d variance o f Y give n Calcu la te the covaria nce and corre lation bel\\·ecn X lllld Y. male earnings have . Y = 14) = 0.115 U. e.02 O .01 0.. 2. Pr(X .l).ShowthatJJ. Wh a t is the probabil it y tha t thb worker is a COllege graduate'! A noncollege graduate? f.W 5 0. h. 10 11.80. 59 Compu te £ ( Y) .7 In a given fX)pula lion of twoearner male/fe male couples. . Arc ed ucational achievement and employment strum. a. " 30 40 . Calculate the pro ba bilit y t.02 Value (Ie x . Calculat e the probabili ty distribut ion.01 0.L = OandfT> "" 1.17 0. W ha t is the :. Female cam ings have a mean of $45. A randomly selected member of this population re ports being unem· pJoyud.I . and so Iorth.09 0. The unemployme nt fate is the fraction of the labor force that is une m ployed. Thai is. Show that lhc unemployment fOlie is given by 1 . Ca1cuhHe E( YI X = I) and F :(Y I X = 0)..8 Th e ra ndom variab le Y has a me an of 1 and a varinnce of 4. mean.Exerci5eS ~.1.
Show that £(y2) :0 1 and £(W2) '= 100.t1o. If Y is dimibuted ( 15. S = Ywhe n X = I. find Pr(6 s Y :s 52 ). (Thai is.) 2. d.) o. If Y is distribute d N( I. c. Y ~ distributed N(O. 2).79) . 25). find PreY > 1. find Pr( Y :s 7.83).10 Cornru le the foll owing probabilities: It. b.ewness for :l symme tric distribution? ) c. Show that £(y 4) = 3 a nd £( W4) = 3 X t 002. (l1il1l: Usc the fac t th a t the kurt osis is 3 for a no rmal d istributio n. Derive E(S) . Y s 8). 2.13 X is a Be rno ulli random variable with Pr(X = 1) = 0.99). 1(0). find Pr(Y :s 1.99 :s Y :s 1. I).lind Pre Y > 18.) e.99).60 CHAPTER:I Review of Probability 2. D erive the skewness a nd kurtosis (or 5. b. (ind Prey ~ J).1' = 100 and u} answe r the following questions: = 43. If Y is distributed FlO.1. Why are th e answers to (b) and (c) the sa me ? t!. b. !i> c. If Y is distri buted N(5. If Y is dis(ribUled .11 Compute the following probabiliti es: a. (Him: Use the law of itera ted expectation s conditioning on X = 0 and X = 1.75). 1. £(5 3 ) and £(5 4 ) .. d. (Hilll: Wh at is the sk.. Show tha t £ ( y 3) = 0 a nd £(W3) = O.12). Use the centra ll imil th eorem to . 9).0) .) d. If Y is d istributed X~. Co If Y is distributed N(50. r[ Y is diSlribUl ed N(3.. Why a re the answers to (b) a nd (e) approx imately Lhe sa me? c. find Prey > 4.78). If Y is distributed X~. £(5 2 ).X)W.1.12 Compute the I'o llowing prob. find Pre Y > 1. If Yis distributed Fj4. find Pr(40 :$ d.and S = Wwhen X "" 0. r. b.99. I).:'l bi lilies: a.31). and W is distributed N{O . If Y is distributed F7. If Y is distributed /90' find Pre .99 Y s 1. find Pre . find Prey > 0). 2. If Y is dislribuled N(O. (Hint: Use the definition of (heXi distribution.14 In a popula tion 11. 4}.l2O" find Pr( Y > 2. Le i S = XY + (1 .
you do not have your tcxthook and do nOl have access to a normal probabil ity table like Appendix Table 1. I n a rando m sample of size 1/ = 165. 2. Le t Y denote the 3\l. Pr(Y :S 0. Y~ lOA) when (i) IJ = 20. . 2..0 as n grows large. (i i) II = lOO. fi nd Pr( 101 S Y s 103).1.4.. i. In a ra ndom sa mple of size II  64 .. a. Berno ulli ra ndo m va ri ables wilh p = 0.. Let Y de note the dollar val ue of damage in a ny given year. fin d Pr( Y > 98). the damage is ra ndom. Let Y de no te the sample mean . random variables. c.6 s (iii) n = I.6).37) when n = 4(JO. Suppose c is a positivc num be r. How large would n need to ~ 10 e nsure That Pr(OJ 9 :S Y :S 0.Exercises a. you do have ~ou r com· puter and a computer program that can generate Ll. UnfOrlU· nately.4 1) ~ O.18 In a ny yea r. li nd PreY s 10 1). ii. Pr(Y 2: 0.) 1. the proba bility thai Y exceeds $2000'1 . Comp ute Pr(9 . n are i. 17 Vi' i = 1. Suppo:.d . However. lllld b.6). Show tha t Pr( ID  C S Y :S 10 + c) becomes d ose to 1.95'? (Use the central limit theore m to comput e an a pproximate a nswer.nsura!lCC pool " of 100 people whose homes are suW ciently d ispersed so that. b. Explain how you can use your computer to compute an accurate approx imation for Pr(Y < 3. in llny year.000. . Consider an "i. . arc i.ragc damage to these 100 homes in a year. Use you r a nswe r in (b) tu argue Iha l Y conve rges in p ro ba bility 10 10. eac h d istribu ted N(10. n. 2. draws from Ihe N(5.16 Y is distributed N(5 .e that in 95% of the years Y = SO. b. (i) What is the expected value of the average da mage Y? (ii) What i:. c. 2•.43) when n = 100.15 Suppose Y i .i . What is the mean and standard deviation of the damage in any year ? b. but in 5% of The years Y = \20.d. the d amage to d ifferent ho mes can be viewed as independen tly dis tributed ra ndom variables.d . 1(0) and you wa nt 10 calc ulate Pr( Y < 3. Use th e cen trallimil to compute approxima tio ns fo r i. III. The wealhc r can innict storm damage 10 3 home.OOJ.4 ).. 1(0) distributio n.. In a random sample of size 11 = 61 100. From yea r to yt=ar. " a.
)Pr(X "" . CHAP1Ea 1 Reviewof Probability 2. and so fo rth ....20 Conside r three ra ndom varia bles X. Suppose that w = 0. is ra ndo m with mean 0. Show that q corr(X. 1 .1'/). Y.25 . Suppose tha t X a nd Y a re independent. a. Suppose that IV = as Compute the mean and sta nda rd deviation of R..)' ~ E(X') .for sim plic ity. b. Use your answer 10 (a) to ve rify Eq uation (2 .. after one year and thai $1 invcsted in a bond fund yie lds RbI that R. 0 a nd 2.. b. £(X2).( Z = z) = f'r(Y .• Yk> Ihal X lakes on I values X I ' . 1:". and tha t X takes on I values XI' . Z is Pr(X "" x. Sl and you are pla nning to p ut a fraction w into a stock ma rke t mutual fund a nd the res' .75.. X) . and that Z tak es o n m va l ues ll .).4(E(X) IIE(X ' )] 3IE(X)]'. 16).l'.I1')R/. ) :l. + 6IE(X)]' IE(X') ] 2. in the bond fun d.Z)).IV . Show that Pr(Y = y) = L:~ I Pre Y = yjl X = . a.....19)..05 (5%) an d standard deviation 0.04 ..07. " y~.. [Hint: This is a genera liza tio n of Equatio ns (2. l ' " d y ..08 (8%) and standard deviatio n 0. + (1 . Show £(X .20)..)l ~ E(X') .t /.7 { } . (Hillt: Use the definition of Pr( Y = yJI X = x.19) a nd (2. and th at Rb is rando m with mean 0. E(X})... the n th e return o n your investment is R = wR.. a. 1 . x .. Explain how the marginal proba bility that Y = Y ca n be ca lcul a ted from the joint probability distrib utio n.J b. c. f'r (X x . Suppose tha i $1 in vested in a stock fu nd yie lds R..J b. . Suppose tha I Y takes on k va l ues Yl' ..19 Coosider ( Wo random variables X and Y.22 Suppose )'ou have some money 10 io\'esl. Show E(X . Compute the me a n a nd standa rd devia tion of R. The joint probability distribut ion of X. Z = z). a nd Z.. into a bo nd mutual fund .XI.• .. Show tha t £( Y) = E IE(Y IX. Suppose thaI Y takes on k values Yl' . Y = y.. Y. Wha t val ue o f w makes the mean o f R as large as possible? What is the standard devia tion of R for this value o f w? .r. a nd Rb is 0.. The correlation betwee n R. c.l l X is a ra ndo m va ri able with mo me nlS £(X ). ..Y):< O .3(E(X'JIIE(X)1 + 2(£(XlJ'. [Hint: This is a ge ne ra liza tion o f Equation (2. and the conditional probability di stributio n of Y given X and Z is Pr(Y = IX = ".W .. •. If you place a fra cti on w of your m o ney in the ~Iock fun d and the rest .
p.:'J 'II is dislribuied X.1)0 \' y + b?u f.3./oly)] + '" E(b2(y ..... To derive Equill. a nd le I Y "" X l + z.O. Show that £(Y IX) = X 2 . I k .y)FI . a. Show (hat E( XY ) = O. 2._ 111 Derivation of Results in Key Concept 2.l(Y .n (2. (Hilll: Use your answer to (a). c. y) Z] :>: f>2q}. To de ri ve Equalion (2. b. 3 This appendix deri ves Ihe equations in Key Concept 2.Deriva ~on of Results in Key Concept 2. Eqoatio n (2.) d. b. var(b + bY) = Ella E(b + bYWI = Ellh(Y . zY j n . i. Show that JJ.. lei X and Z be IWO independent ly distribu ted standard nonnal random va ri ables. Show thaI cov(X. .2 . a Show that £(Yl'q'2 ) = I.. use Ihe defin.24 Suppose Y.uy)2) u2 var(X) + 2lIbcov(X.ty)]2] ~ x) l l + 2E [a b(X . (Harder) What is the value of w that minimizes the standard devia tion of R? (You ca n show this using a graph.. Show thai W = ~ ~.J.uy)J'1 = £{[u(X . (249) . usc lht defi nition of the variance 1 0 wrile var (oX + bY) = EH(aX + bY) .(oJJ x + b. Sbow (h a l V::.3 63 d.29) follows from the dd inilion of the expectation.Y ) "" O. (Him: Use the fact that the odd moments of a sl::J. Y) "" O.1 ~I is d istributed 1.lioD of the variance to write. Y) = al ai + 21.d . NCO.ndard normal random variable are all zero.lYt.Io< x)(Y + b 2var(Y) .23 Th is exercise provides an ex ample ora pair of random va ri ablt': s X and Y for which tbe condit ional mean of Y gi ven X depends o n X bU I corr(X.. algebra. o r calculus. n. Show that £(W) = n. 1. 0"2) fo r i :.) 2.) d. Y) = 0 and thus corr(X. is distrib uted i.3 1). y "" 1.~ x) = E(a l( X  + bY  + bey .30).p. c._ 1' APPENDIX _ __ 2_ . .
e(V = E JJv)lI Y . lei n "" .ty£ (X . we huve lilal var(aX + Y) = (l1" i + O'f + 20fTXV '" ( .s b)' collecting lenns./f.II (250) '" b u XY'" coVI" \\hieh is Equation (2. which ( using the definit Ion of thc correla tion) prow:s the co rrelation mequality.Po rl! = E II b(... Itf \ y / (u.II + E iI'(V  ".ud + J. ~o from t he fina l line of Equ.3 1). Ihal is...tfhltf } . Equalion (2.mse var( aX + Y) is II variance. Bec.lx) + JJdl( Y .34). lcorr( X . Y) ~ EU II + bX + cV  £( 11 + b X + e VJI[ Y .IT n /(' j.l S  u} + /.A.: thi rd equilhty follo~ b~ expand ing Ihe quad ra tic.HY JJ.. C4ui\i"lk ntly.[(X .O'. write E(X Y) = E II ( X .loUt (covariance Inequality ).l tion (2. i Jlie cmilria ncc I s (.51 ) it must he tha t u?  ul Y/ tfl ~ O.lIlce and co\ oriance. \II Y ". Y ) I :s.J. Rea rranging this ineq uality yid ds o. To derive EqulHion (2.lxE( Y  j.Y)1 S I ." .64 CHAPU t 2 Review of Probobilily where the 'lCcond equali l )' folJo.33).ty) + j."'I'll ".. . To derive Equ3tinrl (2.fl + .W o"j + u f· + 2(u xy/(I'l)uxt' (2.(y 2) = £U(Y It 1') + 214\.'0\'(11 • !J X + cV. I.35).S1) .J. = We now prove Ihe co rrelation in. 1 252) incqu.y = (T X Y + 14 .JJx ) F.\' .y.FI = L:. >lI Y IIb(X . write F. n .Pr») + J.~ because E(Y . o~ .33). it CHnno t be n..\' Uy) !S 1.32)..r) . lcorr(X. and the fou rth equality fo llows by the definition of the vari.llity implic!' th a t dJ yl(('~ u ~)!S I o r.~x)(Y . t h.E( Y  JJ.:gal ilc.1'IJ.. usc tht· defin ition of the co \ ari:mce 10 write (. To derive Equiltion (2..JJ\') "'f. ilnd b  L A pplying..:quality in Equil tio n (2. o.ly) + ~ III + ILxlA.IJ.
if so.S. such a co mprehe nsive !>urVl"~ would he extremely eXfXn~iw. we might ." of distributions in populations of interesl. population.about characteristics of the full popululion. and the process of design ing the ccn»us (orms.S.5 . For examp1t:.! to the distribulion of earn ings in the population o f worker!>.ing thc da ta ta kcs te n yea rs. Census cost $ 10 billion.sur vey.lordinary commitme nt. Rather tha n . O ne way to answer these questions wou ld be 1 0 perfo rm an exhausti ve slIrw y of the population of work ers. lhe population d istribution of e arninw>..survey the e ntire U. more practica l approllc h is needed.CHAP TER 3 Review of Statistics S IlIl is lic~ is the scie nce o f using d <lta 10 lea rn a bou l Ih e world around us. measuring the earni ngs of each worker and th us fi nding. Usi ng statistical methods.. The o nly comprehensive censu~ l llC survey of the U. SuujSlicallools help 10 answer ques lions about unknow n c ha raclcrislic. llIa ny me mbef100 of the po pulation slip through the crac ks a nd arc no t surveyed. whill is the mean of the distribution of earnings of reCent college graduale~'! Do mean earning.. . by how mucb? 1l1t:~e qu eslion ~ rclatl. differ for men and \\omen and.S. TIle key insight of st atistics is that o ne elm learn about a population d i~lribulio n by st:1ecting a random sa mple frO\lllhat population.Io draw »tati!>tical inferences. we can u~ this sample to reach tentat ive conclusio ns. hi!> cx trl.In practice. Des pi" . selecte d at random hy simp le random sam pling. Thus a di(fere nt. 1000 members o f the 'Pula tio n. population is the dece nnial 20lX) U. say.:. and compiling a nd ana ly:t. howc\"cr. managing and cond uc ting the surve ys.
7. Sections 3. if they are coll ected by simple random salllplin~). from a sample o f data.ty and the properties of Y as an estimalOr of fJ.23. such "" Ih t: mean earnings of women recemly graduated from college. In some special circumstances..such as its mean.ion of the sampl~ correlation and scatterplots in Sect ion 3.4..r) in 11 populalion. lhe methods for learni ng abou t th e mea n of a singJe population in Seclions 3. •• Y" (recall that Y1. Y1.3 are extended to compare means in two different populations. is the re a gap between the mea n ea rnin gs for male and fe male recent college grad uates? In Sec tion 3. hypOlhcsis testing.lals can be based 00 the Student f dis tribution instead of th e normal distribution: these special drCUl11sta nces are discussed in Section 3. hypothesis tests and confidence intef'.3.. H ypolh e~is testing enwils fomlUlating a specific hypothesis about (he population.•• CHAPTER 3 Review of Siotisrics Tbrec types of statistical me thods are used thro ughout econometrics: estima tion. Confidence interva ls use a set of da ta 1 0 estimate an imerval or range for an unknown popula tion characteristic.5 discusses how the means of twO pop ula tions can b~ u~t:d method~ for comparing the to estimate causal effecl'i in experiments. Section 3.3. For example.i. Esti mation entails computing a "best guess" n umerical va lue for a n unk nown characteristic of a population disllibulion. and 3. A natural way to esti mate this mean is to compute the sample average Y from Ii sample of II inde pen dently and ide nticall y dis tributed (ij .\" . The ch<lpter concludes wjlh a discuss. Sections 3.t). Ibis section discusses cSlimation of j.1. MOSl of the interesting questions in eco no mics involve rela tio nships be lween two or more variables or comparisons between different popula ti ons.c is large.d .. Y" are i.3 review estimation.) observations. 1 Estimation of the Population Mean I ~' Suppose you wan llO know the mean value of Y (JI. ••• . and confidence intervals. . and confidence interva ls in the context of statistical inference about an unknown popula tion mean . then using sample e\'idcnce 1 0 decide whe the r it is uue.1 . 3 .5 focus on the use of the normal distribution fo r pertorming hypolhcsis lesls and fOr constructing confidence intervals when the samplc si7. hypothesis testing.2.d.
both are estimators of ILl" When evaluated in repeated samples.. Thus. many estimalors of J. There are many possible eSlimators. while an estimate is a nonrandom number.. fly is biased . the estimator is said to be unbiased. YI' Both Y and YI are functions of the data that are designed 10 eSli mate Jot)': using the term inology in Key Concept 3. ano ther way to estimate /Jy is simply to use the fi rst observalion. y.. you would get the right answer. Es timators. co nsistency. al lenst in sOllie average sense. in fael. To slate this mathematically. Y and Y. we would like an estimator Ihal gets as close as possible to the un known true va lue. It is reasonable to hope that.1. so what makes o ne estimator '"better'" than another? Because estimators are random variables. An eslimutor is a random variable because of randomness in selecting the sample.3. 1 Estimators and T he ir Properties The sample average Y is a natural way 10 estimate j. A n estimate is th e numerical value of the est imator when it is actually computed using data from a specific sample. Unbiosedness. this question ca n be phrased more precisely: What are desirable characteristics of the sampling di stri bution of an estimator'! In general. There a re. on average. o therwise. take on different values (Ihey pro· duce differe lll esti mates) from one sample to the ncxt. but it is Ilot the o nly way.y denote some estim ator of ~)'. 3. Th us a desirable property of an estimator is that the mean of it::. let jJ. Suppose you eval uate an estimator many times over repeat ed randomly d rawn samples. such as Y or YI . . The estimator j.. if so.y) :: Joty. sampling distribution equals ~)'. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (II lack ofbias}.l. the estimators Y and Y\ both have sampling distributions. of which Y and YI are rwo examples.. For example.Ly.. in oth er wo rds. I Eslimalion 01the Population Mean 67 ESTIMATORS AND ESTIMATES An eslimtllor is a function of a sample of data to be drawn randomly from a pop ulat ioll . y is unbiased if £(j. where E(jly) is the mean of the sampling distribution of p. and erficiency. we would like the sampling distribution of an estima tor to be as tightly cen tered on the unknown value as possible.y.
68 CHAPTER 3 Review 01 Sloliuics KEY CoNCEPt 3.y) < v8r(i~y ).l.. Jf iL) has a sma ller variance than il y • the n jLy is said to be more efficient than ill" The terminology "efficiency" stems fro m the not ioll that. co n sist~ney. the uncertai nty about Ihe \lIl ue of J. when th. • Let jiy be Ilno ther estimator of p. Sla ted more precisely.: ~ mpling distribution of Y has a lready bee n exam ine d in Sections 2..p.'Y a re summaril.)" and suppose thn t both jJ. Bias. y.Ly. Let it y be nn e~timator of J. BIAS. iI desira ble prope rt ~ of ~ )' is thilt the probability that it is within a ~mall inter\.r' Then: • TIle bius of it)' is E(~y) .y is tha t.: sa mple ~ i z~ is la rge.5 and 2.y if £(~r) = p. A not he r dcsira ble prope rty of an estima to r ji.y is said to be morc emcienl than ill' if var(fj.6). and dficilo!nl. consistency. so Y is a n .ing the e:. How mig. AND EFFICIENCY • p. bot h o f which are unbiased .timator with the §milllcst vari· ance.y is a consistent estimalor of IL)' if it y ~ p.l. Consistency.2.al of the true villue J.f. The n [A. and efficie ncy'! Bias and consistency.0 b t ochoo~e the estimator wit h the tightest sampling. • ity is an unbiased estimator o f Ji.. Th.lIld iiI b~' pick. if itt hilS a sm alle r variance than iLl" the n it uses the informatio n in the da ta more efficie ntly tha n docs .'y. CONSISTENCY. that is.5..ed in Key Concept 3..y and iii' orc unbiilsed..4) approaches I iI~ the ~a mpl c size increases.l arisi ng fro m random \ ariat ions in the sam pl e is vcry "mall. r is consistent for lLy (Key Con cept 2.0. ~.2 ~.h t you c hoose between the m? O ne wa) to do <. Variance and efficiency Properties of Y H o w does Y fa re as a n estimator o f lLy when judged by the three criteri a of b ia~. jJ. As sho wn in Section 2. This ~ug gests choosing betwcl:n iLl . distribution.y and ILy. l::P' ) = ~y. Suppose you have Iwoca ndidate e~ lim a lo rs..
niance is V3T( Y I ) = O' ~. Th is resul t is sta ted in Key Concept 3. accord ing 10 the cri terion of efficie ncy. Y". th at Y is a more desirable estim ato r than Y 1• What aboul a less obviously poor estimator? Consider the weighted ave rage in which the obst:rvations are alle rn ately weighted by and~: Efficiency. The comparisons in the previous two paragraphs !>how th at rhc weighted averages YI and Y have larger variances than Y.s the least squares estimatoro{J.I I1. it s v.. Thus.i. We start by comparing the efficienc). In fact. the \'ariance of Y is less than the vari · ance of YI: that is. Y i ~ consistent.d .25(T~. the variance of Y is (T ~.6) sta tes thaI Y 4 }Ly. The estimators Y. Y 1• and Y have a common math \!mlHical structure: They an~ weighted averages of YJ Y"..why would you go to the trouble of co llect ing a sa mplt: of n observations only to throwaway all but the firsl?and the con ce pt ot c rricie ncy provides a (o rmal way to sho.l. < ••• .". I " (Exe rcise 3.l r functions of Y1. However. these conclusions reOect a morc general result: Y is Ihc most efficient estimator Y". Y is consistent. + 2Y3 Y" (31) whe re the number of obse rva tio ns 1/ is assumed to be even for conVt:o ience.3. t E 5timotion 01 the Population Mean 69 unbiased estimator of J..y and its variance is var(Y) = 1.( Y I ) = My: thus YI is an unbiased estimator of J.3 and is pro\ en in Chapter 5. because var(Y) t () as n + 0(1.5. Y is the Bc!>t l. Y has a larger variance th an Y lllU S Y b more efficient than Y. Said differ· o ( till unbiased esti mators that arc 'Acighted averages of Y I cn lly. < •••• Y . Y should be used instead of Yt..The sample average Y provides the hcst fit 10 the da ta in Ihe se nse Iha l the average squared differences between the observations and Y are the smallest of aLI possible estimators.inear Unbiased £ stimalOr (BLUE ): Ih'lI is. + 2" I Y"1 + 2" 3 )." llle estimat or Y I might strike }OU as an obviously poor estimator.y.ly. the mean of Ihe sampl ing distribution of Y 1 is F. . What can be said about the effit:icney of F! Because efficiency entails a comparison of estima tors. Simi larly. of Y to the estimato r Y 1• Beca use Y 1• ••• • Y" are i.. Thus Y is unbi ased and . llle mean of Yis p.= Ii 1(2 1Y1 + IYl 3 Y 1 +I 3 Y• + . 11 ). that is. the law of large numbers (Key Concept 2. for 11 === 2. Y is a morc efficient estimator than YI so. . i .. il is the most effi cie nt (be'll) estimator among all estimators Ihat are unbi ased and are line. . we need to specify the estimator o r estima tors to which Y is to be compared.ll'. From Section 2.
m can be thought of as a pre diction mistake.. you can use algebra or calculus to show that choosing m = Y minim ize!> the sum ot' squared gaps in expression (3.2.50 that Y is the least squares estimator of f. yo u can think of it as <l prediction of the value of Y/. •. on the second Wednesday of the month .. . .y ~ ~~_ l aiYj' where al ' " . that is. Thus Y is the Best Linear Unbiased Estimator (BLUE).t y = Y..uy) unless f.2). Y is the most efficient estimator of J.olve the least squa res problem: Try many vaJ ues of m um il you are satisfied that you have (he value that makes expressi on (3. One can imagine using trial and error 10 s.2) can be th ought of as the sum of squ ared prediction mistakes.Ly. Because most empl oyed peo ple are at work at lhat hour (fl a t siuing in the park!) . Suppose that. a statistical agency adopts a sampling scheme in wh ich intervi ewers survey workingage adults sitt ing in cit y parks al 10:00 A. Because m is an estimator of £(Y) . as is done in Appendix 3.Ly among all unbiased esti· mators that are weighted averages of Y. The Importance of Random Sampling We have assumed that YJ ••• • .70 CHAPTER 3 Review of Statistics KEY CONe"'! EFFICIENCY OF Y: Y IS BLUE 3. Y n are i. such as those that would be obtained from sim ple random sam pling. Yw :0 Consider the problem of finding the esti mator m that minimizes " (Y. (3 . The estimator m that minimizes tbe sum of squa red gaps Yj .y is unbiased .. then var(Y'} < var(..2) is called the Jeast squares estimator. • Y". a" are nonrandom constants.y be an estimator of Py th at is a weighted average of Y.3 ~ Let fJ. the unemployed are o verly .In in expressio n (3. so that the gap Y. draws. This assumption is impo rta nt because nonra ndom sampling can result in Y being biased .rl.M .2) as small as possible. . that is. If p. A lte rnatively. The sum of squared gaps in expression (3. L i"'l 01)' . _..2) which is a measure of the tot al squ ared gap or distance between the estimator m and the sample points.i. p. to estimate tbe montMy na tional unemploymen t rate.
3. Roosevelt. This example is ficl"itious. phones..s eCIUOl I. This section describes lesling hypolheses concerning the population mean (Does the population mean of hourly earn in l!. u.s.ilerary Gazelfl! puhlished a poll indicating thai Alf M.. Do you think surveys conducted over the Inler net might have a similar problem with bias? represented in the sa mple. the estimato r was biased and the G(Il(!lte but it was wrung about the winner: Roosevelr won by mad. and an estima te of the unem ployment rate based o n this sampling plan would be biased. 3. CUrTent Population Survey (CPS). as a landslide.2 Hypothesis Tests Concerning the Population Mean Many hypOlheses about the world around us ca n be phrased as yeslno questions. Because the telephone survey did not sample randomly from t he Gll~e(l(! popuhuion but instead undcrsampled Democrats. It is important to design sample se lection schemes in a way that min imizes bias. T his bias arises beca use th is sampling scheme overTepresents. BUI embarrassing mistake. HVDJHhesis tests 109 two populations (A re mean earnings !be s m c Mr me l' 2nd p evus?.or oversamples..and those that did tcnded to be richerand we re also more likcty to be Republican.2 Hypothesi ~ T ests Conce.! an 59% to 41 %! How could the Gazelte have made such a big mis take'l The Ga~elle's sample was chosen from tele· phone records and automobile registration files. but the " La ndon Wins!" box gives a realworld example o f biases introd uced by sampli ng thaI is ~ ot ent irely random. the sur vey it uses to estimate the monthly U. 1 includes a discussion o f what the Bureau o f Labor Statistics aCllIally does whe n it cooducts Ibe U.S. • . the unemployed meml"le rs o f the populatio n.S. Landon would defeatlhe incumbenl. The was right Ihnl the elect ion . are taken up in Section 3.ning the Population Meon 11 S hortly before the 1936 Presidential election. D o the mean hou rly earnings of recen! college graduates equal S20lhour? Are mean earn ings the same for male Clnd fe male college grad ua tes? Both these ques tions embody specific hypot heses about the popula tion distribution of earn ings. unemployment ra te. Franklin D. by a l a n d~ l ide57% to 43%. the in 1936 mllny households did nOI have cars or tele I.4. Appendix 3... The statistical challenge is to aoswer these questions based o n a sample of evi dence.S20?)....
.l.The twoside d alternative IS wnHe n as * H I: £( Y) "* JLy'o (twosidt:d alternative).72 CHAPTER 3 Review of Statistics Null and Alternative Hypotheses "m e starting point of statistical h. because It a llows £(Y) to be c lthe r le ')s than o r greate r tha n ~) :!l. o n average in the populatio n. IIdll iljpGtI1ESa IS tliJi population mean . the sample average Y mrety be exact ly cqua l lo the hypothesized va lue JL Y. Ihis is called a 1wo sided alternative hYR9/hni. t en e nu ypo le IS IS a c . y. The pro bl em facing the of ~taLis ti cian is o n addi ti onal evidence. colle ge Snlduates earn S20/ho ur consti tutes a Pi!!! b YQQ1bo is gbm!! the oopula tion distri bution of ho urly carnin So Sta ted ma thematically. ta kes on a specific value.scussed la te r in Ih is set:tion.. For this reason.u):o can arise because the true t ca ll in fact does not e ual .~ \ ~~s. V _. tha i ho lds if the n ull d oes no t.I..ltl ve h ypothesIs spi:<:llies whalls true If tIle: null h ypotheSIS IS nut Ibe most ge neral alternative hypothesIS is thai E( Y) fJ.:. ~.3) IAA h./4 e. called the IIl1erllatit'e h )'p9Ihc'j:is. t ~ ~ .. ~ I 20 III b qu a bon (3 J).. de noted by 1Ly...4) O neside d a l1 e rnativcs are a lso possible. u~pothes is or fai li ng to do so.Y. •. the conjecture that. a n y give n sample. (the Dull hypothesis is false or mea n equa S fro(the nul] hypothesis IS true Ul Iffers from ~ y'o because of ~iII I . (3. ..!l. The aliem.D iffe rt!nccs between Y and . as t!ither rt!jccti. pare the null hypothesis to a second hypothesis... 11 p. H ypothesis tcstinp enta ils using data to com.:value _ I c.()o The null hypothesis is de noted Ho and thus is inc tile Hri E(Y) ~ ~"" (3.O ::: r r JJ. if Y is the ho urly e arning o f a ra ndomly selecte d recent co cge gra un e. £(Y)../.ypotheses test ing is specifying the hypolbesis to be tested..~ _ . For c xumpie.o. called the Dull hypothesis. sta tistica l hypothesis testing can be posed ~S. .J . aod these a re d i.
~Y.2 Hypo~is Test~ Concerning the Population M. Thepva lue is the leas t as different (rom POPU [:l tIO value of $22. "':~_ 'sel f _A' ''~ A. o r exa mple. the pvalue js the a rea in the ta ils of the distributio n of V under the null hypo thesis beyond IYIIe/ .ol· If the p value is large.... but if the value IS small. H ow v r accord in 0 the . " k• A h mpme t e p va ue ry 0 now the sampling distribu tion of Y under th e null hypothesis....'.24.~ .LlI5.o ) · (3.eon 73 is im possib le 1 0 distinguish betwe en these T WO possibilities with certainly. In the case at hand.. assumin the null h ' thesis is correct. nder the nu Yl norma l d ist ri bu tion is .it ..y'(~ so unde r the n ull hypothesis Y is d istributed N(IJ. also called the signifiamce robabilih'./ is consiste n t with the null hypothesis.. suppose tha t.."1 _~~( average wage is $22.24 by pure nmaom samp im ' 120 (Ihe esis is true. when the sam Ie size is small this dlslnbutJon is com licated . it is possible to do a pro babilistic ca lculation tha t permils tes ting V I" <>a'''I'I.Y al leas! as far in the tails of its distrih si~ s sam Ie avera£e ou actually computed. . .6." ~ .ol > IV"" .~.. ~"The pvalue.5) /JrI \ .u. Thi.the pull hYpothesis in a wa y thaI accounts (or §ampUrg unserlaint)'. is the r . AHhougha sample of data canno l pro vide conclusive e vide nce about the null hypothesis.' ue IS I c probability of drawmg. the p. (Tt ). ~ That is. . is no . A s d iscussed in Section 2. when th Ie size is lar 'e the sam li n dlSlri Imate by a normal dist ribution .. it the o rem.say . in your sa m ple of recent college g ra u ate~ c r'fI • P"f1AIfAJ: but ion und /) fJ'''''.5" pl~ would have been drawn if the null hy  p value = PrHJI"V .3..II. IS pva ue IS sma .)~o. then the observed value y uc.u. . ingasfallSIIC31 caslasa versetol cnu y t sisa I comute In your sam Ie...s calcula lio)involves using the da ta to compute the pva lue of the n ull hypothes is.'.
Thus the pyolue is the ~ 51ondord normal tail probabilIty oul~ide tYOC1 .1 Calculating a pvolue The JM'Olue is !he probability of drawing 0 voIue of Y tOOt differs from ""'(0 by at Ieo~t 0$ much os Y "". The p .. bg wcye r d COrud o n whether a1.cd in Figure 3.74 CHAPTER 3 Review of Statistics FIGURE 3.. so (V' . "" O"~ / " .. The details of the cakuhll ion . Thus.lriboled """." °i) .0. then under the null hypothesis the sampling d istribution cry is N(JLy'o." . Y is di~tributed N(p. the pva!uc) is ij pvnluc  P'HW ~:Y'I > IY ·'u/.1. unde r the null hypotbesis. (Y JL}. Calculating thepValue When Uy Is Known Th e c:llcu la lion of the p value when U y is known is summari7.ro.~r)/ar' = !'f{O.. is di..Th i s largesample no rma l approximation ma kes it possible III compute the pvalue without needing 10 know the population d istribution of Y.'.ln large samples. is k nown .1 (that is. lT y)' whe r~(:~. (3.rr)/ u.2q{IY. has a standa rd normal d istrib ut ion .U)IcT} ."'~i). r " """I 1 "y z where u~ = (T ~/ Il.ul o ".fl va lue of Y 0 . I). the stan darJlI~cd ve rSIOn o f Y. as long us the samp le size is large. l~ lUi! prob abil it)' in Ii gure 3.U~1 under the null hypolhesis. val ue is t he probability of o b ta ining .6) .jA. I) _ I f oli' ~ It I. If the sample size is la rge.
ILy)2.6) ~kgepd § 99 the "niaper Of !he ·0 ulalion distribution.Iluc . is .t:'L~I (l'. wo 10 xercise 3.o) / u y. ... this variance is ty iennv unknown. with twOmodifications : Firsl.J.s the area in the tails of a standard norm lll distri hutio n outsid~ == (Y""  ILy.f~. and as a res ult s~ is unbiased.l. so th a t 0: 0 e ' rees egrc£s o 0 f freedom reedor l. t±r is repl aced by Y. and second. is the average value of (Y .L. in which case the van ance ISdele rimned bv the nul! h)'QQ!h£§i§' Set' EpllalipQ (2 7l ! Because in general must be estimated before the [Iva lue can be compu ted. • ui.y )2.1 8. e reason for th~ fin. is the sou.7) instead of ncoI" rects for this small downward bias.! ios'C o " g f by II is Ih O' nO mati ng J.'lrr root of the s.2 Hypori'Iesi$ Tem Concerning the Population Meal 7S where <1'1 is the standard norma l cumulative distribution fUllc tio n.. one "<.in remain  is. £!( Y( y )2 J [(II 1) / lIlo} Th us. and Standard Error 'ille sa mple variance $~ is an estimator of th<!' population va rianc~ (T ~: the sample ~ I<lndard deviation s} is a n estimato r of the popul ation siandmd deviation (Ty: and the sta ndard error of the sa mple average Y is an estimator of the standard devia Lion o f the sampling d istributio n of Y. t he lH' :. nle sumplc "ariance.I V)' .n.I . The sample variance and standard deviation .I in Eq uation (3.l. . The Sample Variance.I )(T5 " Di viding by n . e data.. ..y)2 in the population distribut.7) l nc sample ~tanda rd deviatjon h·. pie average of (Y. n 1_ . Similarly.y by Y inl roduc ' .yfl = (II .i = 1. In practice. The populat ion variance . ut The formula for the pvaluc in Equalion (3.. D ividing b" 2 ! in 5'11 1'$11 til:a) '.lOlDlc= variance. A n exception is w en i IS mary so !ts ISln UUon IS e rnou I.weed of n is called a degrees or freedom correction: Est'f' matin the mean uses u some of the IIIlo rmatlo ll that uses u . the avera s the divisor" I instead of II. :~rce 0 . ThaI is. (3. c formula for the sample variance is much like Ihe formula fo r the popula· ti on variance. the sample vuriantt is th e sam.JJ. . Yl! " £(( Y. .ion.'flle rea· son fo r the second modificatio ndividi ng bv " .•f = _ ~(Y.3.I. . we now turn to the problem of es tima ting (T ~. £(Y .t modificatIon y is unknown and thus must be estima ted: the natural estimator of IL )' is Y. Sample Standard Deviation. .
C'I VJ . (y.9) is proven in A ppendix 3. Because the standard deviation of th e sampling dis tribut ion of Y is CTr = a ylVn. th e reason that s~.76 CHAPTER 3 R eview of Stotisti c~ KEY CONCEPT THE STANDARD ERROR OF Y ~ 3.i.. . the sample varia nce is close to the population va ria nce with high probability when 11 is large. 1.d..p )/ 11 [see E q ua lio n (2. wh en U y is . The standard error of Y is sum marized as Key Co ncept 3...7)1. The fo rmula for fhe sta ndard error also ta kes o n a simple form that depends o nl y o n Y and 11 : S£(P) = v' Y (l I IT . (3.)2911'5' hi] illite. in o the r words.Y" are i. The standard error ofY.8) Consistency ofthe sample variance. must have a fin ite fo urth mo me nt. bas a finite fo urth mo me n!: thaI is. 5£(1') = up = sy/'Vii . aod Y.. When Y " . itively.value can be computed by replac ing u y in Eq ua t. the p. u Y) Calculating the pValue When Uy Is Unknown Because s~ is a consiste nt estima tor of u ~.9) justifies using sy/Vii as an estima· to r of a y .6) by the standard e rror..3 unde r the assumptio ns tha t Y.ion (3. Whe n YI •. Yn are Li. dra ws from a Be rn o ulli distribution with su ccess probability p.i. .d.d . The stan dard e rror of Y is denoted by S£(Y) or by uy. de no led by SE(P) o r by y (the . est imator of th e population varil'lDce: 111e sample variflnce is a consiSle lll sf ~ u~.4 The sJaPdard error of Y is an est jmatgr of the §1ftpda rd deviatio Q of Y.. SE(Y) = (r y_ That is.9) In other words.. E quati on (3.4.. ~" over the symbol me ans th a I this is a n esriml'l lor of u y ) . Y~ are i. Y.. The resull in Equal ion (3.. the formula for the varia nce of Y simplifies to p(1 . But for st to obec le law o f lar' 4\ must be i 6. 15 consistent IS that It IS a sample e 'sin Ke awof lar: e numbers. (3. The estimator of a y • Sy / VIi. is cal led Ihe standard e rro r o f Y a nd i!.
o r = S£(Y) (3.s} is close to (1~ with high probabilit y.7). assuming the null hypothesis . suppose tha t a sample o f II "" 200 rece nt college grad uates is used to test the null hypothesis tha t tne mean wage. I de note the valu e of the Isia tistic actua lly computed: or r".06.14) As a hypothe tica l e xa mple. Le t I~(. . The lStatistic is a n im po rtant example o f a test statistic.. 1) fo r la rs. tbe pvalue can be calcula ted usin g p. t is approximately distributed N(O. is $20Ihour. Thllt is. the lstutlslic o r lraUg.2. .u SE (Y) (3.}Ly. under the null hypothesis.d ... Whe n " is large. Acco rdingly... a test st atistic is a statistic used to perfo rm a hypothesis test.3.28.o)/SE(V) plays a cent ral role in testing sta tistical hypotheses and has a specia l name.va lue ~ 2e1l( . LarResample distribution ofthe tstatistic.l 2) The formul a for the p value in E quati o n (3. 11) In gene ral.p. whe n " is largt:.. 13) Accord ingl y.9%.l' .1 0) ca n be re written in te rms the Istatistic..y..'1 y .}Ly.(6) = 0.2.64 a nd the sa mple standa rd deviatio n is $)' = S1 8.20)/1..2 HypoIhe~i ~ T esh COf"ICefning the Pop!Jlalion M.64 ...'I). (3. the pva lue . ( I l'~' (3..""' sEc'Y) .10) The tStatistic The standardized sample ave rage (Y .ean unk nown a nd Y I _. The value of Ihe Ista tistic is ( fl<1 = (22.039. E(y) .28 . Y II are jj. or 3. The n the sta nda rd e rro r o f Y issyl vn "" 18.14 / \1200 = 1. Fro m Appe nd ix Ta ble I. I) . (J.. the pval uc is 2~( .14 . Thus the distribution of the /Sta tistic is approxim ate ly the sa me as the distributio n of (Y ./J. lue = 2<1>  .s calculated using the fo rmula 77 pv.e II. The sa mple a ve rage wage js Y"CI = $22.yu)/a V' wh ich in turn is we ll approxim ated by the sta nda rd norma l d istri butio n whe n" is large beca use o f the ce ntra l limit tbeore m ( Key Conce pt 2. Y .
i.96. so th e hypothesis is r~jected a t the 5% level.15) is 5%. t hen unde r the null h y po th ~s i s the (statistic has a N(O.96.o at the 5% sig nificance le vel. . the critica l value of this twoside d te!. repon ing o nl y whe ther the nu ll ..05. 1) distribution. the probabilit y o{ e rro neously rcje("tmg the null hypothesis ( rejecting tile null h)'Polhesis w h~n it is in fact true) is 5% . Hypothesis Testing with a Prespecified Significance Level When you undertake a statistical hypothesis lest. Hypothesis tests can be pe rformed wi thout computing the p valuc if you ar e willing to specify in adva nce the proba· bility yo u a rc willing to rolerate of maki ng the first kind of mistake . summ arized in Key Concept 3. A1thoug h performing the test with a 5% signi ficance level is easy. 1f th e lest rejects a t the 5% ~ig oiJicance leve l" tb e po pu lation mean fJ. hut in many prac tica l sit ualions th is prefe re ntial treatme nt is appropria te.y is said 10 be sta tislica ll y sig nificantly dif fe rent from J. (3.5 .t is 1. 'me signi(icance levt:l of the test in Eq ua ti o n (3.15) 10at is.96.>ugh. T11. In the previous exampl e o f testing th e hypothesis that the me a n ea rnin g o f re ce nt college graduates is $20.:d fro m the sample j~ greate r {han 1. of ~. reject if the absolute value of tllC (statistic c()mpu{.  p Reject I/o iftt'''''1 > 1. This a pproach gives prefere nt ia l trealme nt to the null hypothesis. ~ 'F" ~ (forcxa mple_5%).96.96. th e (sta tistic Wil') 2. If I I is large e n.: e»ceeds 1.9%. r. the probabili ty of obtaining a sample 3vcragt! atlCasl as different from the null as the one actua lly computed is 3. or you can fa il to reject the null hypothesis when il is fa lse. you clln make fwO types of mis takes: You can incorrect ly reject tbe null hypothesis whcn it is true. the null the n you wi ll r~ject the nul! hYPO l h~si s if and on ly if the p value is less th an 0.06. Thus. l cs ting hypo theses using a prespecifie d significance level docs not req uire computi ng pvalues.that i ~. and the rejection region is the va lues o f tbe Istatistic outside :t 1. If you chouse a prespeci .78 CHAPTER 3 Review of Sloli~tics to be true. This (camework fo r testing statistical hypothes~s has some s pecialized termi no logy.
legal cases sometimes involve sla tislical evidence. For e xample. and it t~' pc U error. yo u would incorrectly reject the null on aver· age once in 20 cases.s is th e . and the null hypothesis could bc thai the defe ndant is not guilty. So metimes a more conservative signi fica nce level might be in order.. Eq uivalently.2 Hypothesi~ Test~ Concerning the Population M. The p vnlur.5 hypothesis is rejected at a prespecified ~i gni fieance level conveys less informa tion than rcponing the pvalue. • "/''l'"Y.//.1 of Ihe tcst. lind the prob at'lilit)' thai th e [cst correctly rejeclS the !lull hypothesis when the al tcrn ative is true is the power of tlli~ tr.3.". the pvalue is the smallest significance level at which )'OU can reject the null hypothesis.. and the va lues of the tt'l)l statistic for which it d oe s not rcjectlhe null h)'pothes. a very conservative standard migh t be in order so that consumers can be sure thOl the drugs available in the markel actually work.::st.· ticians and econome tricians use a 5% si gn ificance level.. in which the null hypothesis is nol rejected whe n ill fuet it is false. if a government agency is considering pCfmilling the sa le of a new d rug. What significance level should you use in practice? In many cases.1 'Yo. The sct o f values of the test statistic for which the test rej ~cts the null hYP()l hesis is the rcjl'C: lion region. "'."'. KEY CONCEPT ~ 3. assuming tha t Ihe Ilull hypolhl!sis is correct. Similarly.sis value a~ is the statistic ac tually ob<. in which the null h ~rpot hesi s is rejected whe n in I'aet it is true.'.eoo 79 '. by random sampling vnriat io n. the prespccified probability of n type I e rroris the significance le. stat h.ervcd. to avoid Ihis sort of mistake. . then one would wan l to be quite sure that n rejection of the n ult (conclusion of gui lt) is not just a result of random sample varia tion . a t least as adverse (0 the null hypotht. The p re~ p ecificd rejec tion probability of a statist ica l hypothesis test when the null hypothesis is true that is.lcccptanc:c region. In some legal sc ll ings the significance level used is 1% or even 0.:: is the proba bility of oh ta inin g a te st statistic. 'nli:: critical "alue o f the teS! sta tistic is the va lue of the statistic for wltieh the test j ust rejects the null hypothesis a t the give n significance level.''Y''''''' ""1 THE TERMINOLOGY OF HYPOTHESIS TESTING A Statislica! hypothesis leSt can make 1\\10 types uf mislakes: a type I error. The probability tha l the test actually incof reelly rejects the null h y pOTh es i ~ when it is true is the size of the test. If you were to test many !>latistieal hypotheses at the 5% level.
Being conse rvative. if l t~1 1 > L96). has a cost: The smaller lhe significance level.Y. 1. and th e more difficult it becomes to reject the nu ll when the null is false. Comp ute the p value [Equation (3. with the modifi cation that only large positive values of the (statistic reject the n ull hypot hesis.values and to hypothesis testing is the same for onesided alte rnatives as it is for twosided alternatives. the lower the power of the test. 3.14)]_ Reject the hypothesis at the 5'"1/0 sig nificance level if the pvalue is less than 0. * . so th e reI· evant alternative to the null hypothesis that earn ings are the same for college grad· uates and nongraduates is not just that their ea rnings differ. The p value is the area under the standard norma l distribution to the right of the . Many economic and policy application s cao call for less conservatism than a legal case..13)J..O. rather than values that are large in absolute val ue. 13). the alternat ive hypothesis might be lhat the mean exceeds . OneSided Alternatives In some circumstances. 10 test the onesilled hypothesis in Equation (3.6 ".Ly." J KEY CONe"" 3. then you never need to look at any statistical evidence. In fact .16) The general approach to computing p .o (onesi ded alternative). construct the istatistic in Equation (3. the larger Ihe critical value.8)1. This is ca ll ed a o nesided allernathc bypothesis and ca n be written HI: E(Y) > f.05 (equivalently.o . For example. Compute the {statistic [Equation (3..• 80 CH .S£ (Y) IEq uation (3. TESTING THE HYPOTHESIS E(Y) = My. Specifically. 2. so a 5% significance \evel is often considered 10 be a reasonable compromise. the most can· servative thing to do is never to rejecllhe null hypothesisbut if thaI is your view. PTER 3 Review of Statistin "O_.'I. Key Concept 3. .0'1/". one hopes thai education helps in the labor market.6 summarizcs hypothesis tests (or me population mean against the twosided alternative.. (3. 16). Compute the standard erro r of Y. in the sense of using a very low signific ance leve l. for you will never change your mind! The lower the signifi cance level. but rather th at graduates earn mo re than nongralluates.O AGAINST THE A LTERNATIVE E(Y) Mr.
the p va lue.5% level) based on your dat a: If someone walks up to )...17) The N(O . I) approximation to the distri bution of the (statistic.u Y.o is no t rejected a t the 5% level. H e re is one way to construct a 95% confide nce set fo r the popul atio n mean. write this value down o n your list.. Tha t is. it is impossible to learn the exact value of the population mean of Y lL <.. the 5% rejection region consists o f values o f the (statistic less than .645. called a cOlllidence interval. ing only the info rmation in a sample. fo r exam ple.Now pick anothe r a rbitrary value of JAr.o by com puting the {statistic. Test the null hypo thesis tha t JAy = I'y. if you cannot reject it. . (3. The confidence sct fo r fJy tu rns out to be all the possible vn lu ~s of the mean be tween a lower and an upper limi t. if il is less tha n 1.O ' t he n the discussion of the previous paragraph applies exce pt that the signs a re switched.3 Confidence Intervals for the Population Mean Because of rando m sampling error. However.yexcceding I'y. This list is useful becau~e it !'>ummarizes the set of hypotheses you can and can not reject (at the . Such a set is called a confidence set . based on the N(O.96.):0. keep doing this for a ll possible val ues o f the pop ulation mean. and write down this nonrejected value 1'.o a nd test it .lblc property: The pro bability Iha t it con tains the true value of the population mean is 95%. Ihis hypothesized va lue . . and the prespecified probability tha t JI..0.ou with a specific numbe r in mind. A bit of clever reasoning shows that this se t of values has a rc markl.'). ' m e rejection regio n for th is test is all values of Ih e (statistic exceeding 1. so that the coo fid e nce set is an interval.3 Con ~dence Intervals for the PoptJlo~on Mean 8t calculated tstatistic.. Begin by pick ing some arbitrary va lue for the mean: ca11 this 11y.o. Do this again a nd a gain: indeed..o agains t the alterna tive tha t Jt)' .3.. If instead the alternative hypothesis is that E(Y) < .t)  I . 3.645.« Jtr. Continuing this process yields the set of a ll va lues of the populatio n mean tha t cannot be rejected al tbe.<1>(. is p value = PrJ/a(Z > .u r.1. it is pos sible to use da ta fro m a rando m sample to construct a set of va lues that con ta ins the true popu la tion mean JJy with a certa in prespecifi cd probability.645.5% level by a twosided hypothesis test.y is conta ined in this set is called the confidence le\·el. I) critical value for a onesided test with a 5% significance leve l is 1..16) concerns values of I'. you can tell him whetber his hypothesis is rejected or not sim ply by lOOking up his number o n your handy list. The o nesided hypothesis in Equa tio n (3.
.82 CHAPTER 3 Review of Slothtic~ 3.y ::= 21. That is.):u is rejected at the 5% level if it is more than 1.64 and SfTV) 1. Ill" = 21.5 (although we do nOI know this). ier approach. l11i s discussion so far has focused on twosided co nfidence in tervals.(Y). Y+ 1.y.et uf values of Il r that cannot be rejected hy a onesided hypothesis test. a\\a~ from Y.7 summaril. fo r 11 requires you to test all p os~ible VAlues of lJ. the values on your liq co nstitute a 95% confidence set for J. $25. Thus.5 1 = [$20.5: Ihis means that in 95% of all sampl~ your list will contain the tr ue value o f Ily .96SE(y) of Y.13. you will correctl y (Iccep l 21.64 :: 2. if n is large. I) d istri bution.2:< = 22. O m' could instead construct a onesided confidence interval as the . According to the formula for the Istatistic in Equation (3. Then Y has a normal d istri bu tion centere d on 21.7 A 95% twosided confidence inlervaJ fo r Il y is an inte rval constructed so that it contain s the true value of Il l' in !}5% of all pt)~sib1c random samples.l. K~y Co ncept 3.96 stamiard error. Su ppose the true va lue o f J1. the probability of rejecting the null hypothesis J1. 99% confid ence inte rval for Jly "" TIle clever r e asoning goes li ke t his.96S£(Y) :s: Jll :S approach. [J ut bccau~e you tested al i possible values of the population mean in constructing your set . of those values within :':" J.13) .5.5!{S£(YH· 90% confidence in terval for Il) = [y ::!: 1.yas nu ll hypotheses.% x 1. y that are not rejected a l Ihe 5% leve l consist. Fortunately there is a much e. in particular you tested the true VAlue. 95 % . Thu~ the set of "alues of J.96SF.::!:. Although onesided confi .. a trial val ue of J1.) is 21. a 95% confidem:t: interval for Il ) is Y .5 at the 5% leve l is 5%.I. (Y ::': 2. When the sample size II is la rge.t:. consider the problem of constructing a 95% confidence intl' r val for the n K~an hourly earnings of recent co llege graduates using a hypothetical random SAmple of 200 recent college gmduatcs whe re Y = $22.5 has a N(O.es tb i~ A s an example. Thus. ] n 95% of all samples.64SfiY)}. 90% .%S£(V) j. This met hod of constructing a confidence SCI is im practical. The 95% conUdenC(~ interval for mean hourly earn ings is 22.28.5.l.15 ). 1.64 .1. and the Istatistic test in g the null hypothesis Ily "" 2 1. a nd 99 % confidence intervnls for Jly are CONFIDENCE INTERVALS FOR THE POPULATION MEA N ')5 % confidence interva l for Jly = (y =.
. they m U!lt be estimaTed (rom somple~ of men and wome n.· of cltrnings for men. 18) The null hypothesis that men a nd women in these populations have the same earn ings corrc~ponds to Ho in Equation (3.. lin using Y".Y. (f . is appro:<i matcly distributed N(iJ..18) with dn ::: O. . .Y.Then an estimator of Po". To lest the null hypothesis That IL"... Considc!' Ihe Il ull hypoth esis that earn ings for these two populations differ by a ce r tain amount . . fO const ruct confide nce intervals (or the d iffe rence in Ihe means from IwO diffe rent Hypothesis Tests for the Difference Between Two Means Let 11..""q~/" ... . is Y"... sa y do.The n the null hypot hesis and the twosi ded a lte rna tive hypothesis ure 11& 11". 3. Leltbe !lample u"erage a nnual earnings be Y". dn "s.. approximately distributed N(JJ.radu (lIeu fro m college and let 11. Coverage probabilities. Simi larly. This sectio n summarizes how to lest hypothe!oes and how populalions. Suppose we have samples of II". women drawn aT random IIOIn their popula tions...~ Recall thill Ym is. ..4 Comparing Means from Different Populations Do recent male and female college graduHles earn the same amount on a verage? This question involves compa ring the means of two diffe rent population distribu tions.. H I: 11".11 . Y..". dV' (3.. Because these population means are unk no\\ ll.3.lhey are uncom mon in applied econometric analysh. . be the popula tion me an fo r rccently grad ua ted men.. ..fllm )' where ""}" is the popUlation vari anc. .. for women.). . according to the central limil theo rem.... The cove rage probah ility of u confidence inlerv" l for Ihe POPUI<l lioli mean is the pro bability. for men and Y. :.. we need to know the distribution of Y". be the mean hourly earning in the popuilltion of women recently g.IL.p..  11. ....JL.4 Comparing ho\eons from Dilferent Populations 83 denec 1I11en'uh have application!) in some brunches of st8tistics.. computed over all poss ible ra ndom sarn plesIhal il cOnlai ns the true population mea n.. men and " .
CHAPTER 3 Review of Stoti)tlCl where O'~.. and If " are large. is Y".In practice. tbe n tb is approximate normal distribution can be used to compute pva lues for the lest of the null hypothesis that J.14) ..65.. is defin ed as in Equation (3.Lm . f .I. Thus. .. Th us the standard error of Y".Y _ _ M. (3. the pvalue of the IwosideJ test is computed exactly as it was in the case o( a single population: that is.8.. Y . are large. .. is defined si milarly for the women. ) Istatistic or companng two means . ) . and .Y. is the population vAriance of earnings for women .f. H u~ and 0'. For ex ample.: value of the I·statist ic exceeds 1.J."'" from the eslimalor Y"... these population varia nces a re typica ll y unknown so they must be estimated.20) and compare it 10 Ih e appropria te critical value. th ey arc indepe nde nt random va riables. and Y.J. except that the statistic is computed only for the men in the sample.. If the alternative is one·sided ra ther than twosided (that is.))... however. Also..20) has a standard normal distribution under the null hypothesis when tI". if the alternative is that "'...nifi can ce level rejects when ( > 1.Y". 17).is Istatistic has a standard normal distribution.: extends to constructing n confidence imcn'al for the difference belween the .. > do)..J) .  SE(Y...) = \ nm / S2 m + n·. simply calculate tho:: Istatist ic in Equation (3.'1. and a test with a 5% sig. As before. Confidence Intervals for the Difference Between Two Population Means The method for constructing confidence intervals summarized in Section 3." where . 2.L.~. then the lest is modifi ed as outlined in Section 3. (U~' ''''I) + (0''. . To conduct a lest wi th a prespecifi ed significance level. Because the (statisric in Equation (3 . = do.L.Y ) . The pvalue is computed using Equation (3. . Because Y...11".w. the null hypothesis is rejected at the 5% significance level if the absolu t. . Ih e . II' . 19) The Istatistic for testing the null hypothesis is constructed analogously to the /Statistic fo r testing <l hypothesis about a single population mean. recall from Sec· tion 2..F".. they can be estimated usi ng the sample variances.7).1 11. arc constructed from differe nt randomly se lected samples. by subtracting the null hypot hesized va lue of J..val ue is comput ed using Equation (3.2 (3. s~ and . is distributed ~'" . II' ~.: t = (i'm.96. then th...4 that a weighted average of two normal mndom vu riables is itself normally dislribUied .do ( SE( Y .20) If hoth Ifm and If. and dividing the result by the standard e rror of '1".' are known.f.
m }l .96 standard errors away from do. then the causal effect (Ihut is. the box " The G ender G ap of Earni ngs o f Col · lege Gradu ates in Ih e U.S DifferencesofMeans Estimation of Causal Effects Using Experimental Data Reca ll from Section 1. 1. ). Y". . The diffe rence bd wccn the sample means of the Ire.. m ..  Y"" is less than 1.E(YlX = 0) in an ideal randomized colllrolled experiment.5 DifferencesofMeons Estimorioo of Cousal Effects Using Experimenlal Dolo means..Umen! and camral groups is a n est ima tOr of the causa] effect of the treatmen t.fl. or to a conlrol group. then randomly assign~ them either to a treatment group.S. ..lhe causal effect is also called the treatment effect. where E( Y IX = x ) is the expected value or y ror the trealment group (which rece ives treatmcnt 1e\'el X "" x) in an ideal randomized controlled experiment and F:( YI X = 0) is the expected value of Y for th e control group (which receives treatment lc vel X = 0). entit ies ) from a populati on o f interest. is (Y.96.  Y..E(YIX = x) .: in hand.. morc generally. 3.. Specif ically.. 8S Because the hypothesized value do is rejected al the 5 % level if lll > 1." contains an empirical investigation of gender differ· ences in cumings of US college graduates.96 mea ns IhHI the estimated diUercncc.21) 95% confidence interval for d = IJ.Y.2 tha i a random ized controlled ~x pcri mc nt randomly selects su bjects (individuals Of.96standardcrrorsofY". is the difference in the conditional expcclalions. then we can let X = 0 dcnOlc fhe control group und X = I denote the treatment group.1::(Y IX "" 0 ). RUI III s: 1. W ith these form ula!.96S£(Y.96'((0 will he in the confidence sel if fll:S. Ihe 95% twosided conridcnce interval for (/ consists o f those values lII' d within ± 1. the causal e ffe ct on Y of Iremment level . which docs not receive Ihe {rcalme n!. )" 1. . the trcalmenl effect) is E(Y I X = I) . if the treatmen t is binary).: (3.Y. The Causal Effect as a Difference of Conditional Expectations The causal effect of a treatment is the expected effecl on the outcome of interest of the treatment as measured in an ideal randomi7c d controlled experimcnL 1'11is effect can be expressed as the difference of two conditional expectations. which receives the exper imenta l treatment.3. (/ = IJ.. . lllUS. In the context of e xperiments. If there arc only two trealmenlle"e ls (that is. If the lreatment is binary treatment..
$18.1 gives eSlimltlcs of hourly earnings for collegeeducated full time wo rkcrsagcd 2534 in the U nited States in 1m. Ages 2534.00 10.~uming a 4Ohour work wed..'J I . "'imfican!ly dlU crcmirom zero at Lbc ··1 % IIgniroana: te'·el.. An hourly gup of $3.~O.39 " 1592 y 17.163.1 Trends in Hourly Earnings in the Uniled Slote.on a verage.03 8.52 21..86 CHAPUR 3 Review 01Stotistk:$ The box in Chaptet 2.. The 95 % confi dence intenal {or the gender ga p in earnings in 2004 is 3. and 2004.31 a sta nda rd c rror (~ V lO.." shoW! that. w~ Confid.rKe . Ow.31 :woo 2004 21. L In 2004.52 might not sound like much. in 2004 of the increase is not statistically significant :It the 5% sig· nificancc level (Exercise 3.901 men surveyed was $21. FirsLthc gender gap is large.47 ' 6. and 50 paid weeks pc. aDd the sta ndard deviation of earnings was SS. Y_l 1992 1996 20.73" 2.s of Working College Graduates..70 8. Sli P. ~ .60 16.<1. The difference I. What arc the recenltrends in this "gender gap" in earnings? Social no rms and laws governing gender discrimina tion in t he work place have changed 5ubstantially in the Uni ted Stales. "The Distribution of Eam 1739 women surveYl:d was $1 8.99 of 50. but o\'er a year it adds up to $7 .99. f_ ' 8..20 IU6 0.3....99 ThdC estimaLes are compuled usintdaLI on III flill ume " 'orkers aged 2534 wrvcyed 111 the CllrrC"nt l'lIpul..29 2." .3'P11901 + 8.1 7).21 18.. Second.40 2. the avcrage hourly earnings of the 1. $21. using data collected by the Curren! Popula tion Survey. Ihis gap is large if il is meas ured instead in percentage tcrms:Accord ing to the estimates in Table 3. The a verage hourly carning.37 0. 12) . the dati for::!O:M wen: collected in M a ~b ZOOS).22.48. as. wit h i~ I ings in the Uni te d Slates in 2004.33 19.52 . 2000.. w~ Diffto"• • Mo. Earnings for 1992. 1996. The results in Table 3.3 1377 1300 1901 7.72 18.80" 3. in 2004 DoH ars . from 52..HJ4.96 x 0.31 "" ( S2.39.52 ( .30 2. this Mable or hll ~ it dimini~hed over time'! Table 3.16.n ¥'S. in 2004 women C :UlI/illIlCti TABU 3.56" 3.90 " 1370 1235 l lR2 1739 Y  y. male college graduates e arn more than female ("Ollegc grad uates.48 10.n 2. and 2000 we re adjusted for infla tion by pUlling the m in 2004 dollars using the Consumer Price Indel(.79/hour in rcnl terms over th is sample. Third.!: 1. the esti mated gender glLp has ineTensed by $O.ulX! Surw:r ron dueled m Mardi of lhe f\l:lt yel r (for exlmple. Y_.1. 1992 10 2004. and the standard deviation of ea rn ings for me n was $10.73/hour to 53. Is the gender gap in c:lrnings of college grad uate~ SJ. 1996.111us the estimate of the gende r gap in earnin~ for 2004 ..1 @1I739).914 1..'17).30 0.521hour: howevc r..\ suggest four conclusions.:r yea r.52" 0.29 2. .
. while for men this mean lBecause ofi nflatioll.4): As reported in Table 2. the pric.. given in Equation (3. difficult to administer.. ~ was $27. A welldesigned. so a 95% con fidence interva l for the causa l effect can be constructed using Equation (3.·as worth more than a dollar in 2004.1) than it is for al! college graduates (ana men and women'? Does it reflect differences in choice of jobs'! O r is tht:re some other cause? We return to [hest: questions once we have in hand the tools of multiple regression analysis. A 95% confidence interval for the difference in tbe means orlhe two groups is a 95% confidence interval for the causal effect. which corresponds to a gender gap of Estimation of the Causal Effect Using Differences of Means If the treatmen t in a randomized co ntIoU ed experiment is binary.6%: in other words. then the causaJ effect can be estimated by the difference in the sample average outcom es berween the trea tment and control groups.12. For this reason.99).a dollar in 1'192 .346 to PUt them into "2004 dollars. TIlis empirical analysis documents Ihat the "gen a measure of the price of a "mMket basket" of consumer de r gap" in hourly earnings is large and has been goods and ~el"iiCC5 constructed by the Bureau of ubor Sta fairly stahle (or perhaps increased slightly) over the tistics. which can be tested using the 'statistic for comparing two means. tell us CPI baske t of goods and ~rvjccs thai COSt SIOO in 1992 cost $134. On. the gender gap is smaller for young college gradu31t's (the group analyzed in Table 3.12)127. Ol'cr the twcl\'e ~'cars rrom 1m to 2004.83] among all full'lime earnings in 19')2 canno t be directly compared to earnings in 2004 without adjusting {or mfla tion. For Ihis reason.: of the CPI market baske\ rose hy 34. in the sense tha t a dollar in 1992 could bu~' more goods and ~crvices !hun a doll~ r in lOOJ could.Means Estimation of Cousal Effec~ Using Experimental Dolo 87 earned 16% less per hour than lIlen did (53. more than the gap o f 13% ~cn in 1992 (S2. In economics.4.: wa y to ma ke collegeeducated workers. Thus 24% [= (27. the topic of Pa rt IJ. To mak e earning. !his adjustmell t is to uS<: th~ Consumer Price Indt .' in 19')2 and 2(Xl~ C()m why this gap exists. experience. howclJcr. experi ments tend to be expensive. The hypothesis thatlhe treatment is ineffective is equi valent to the hypothesis that the two mea ns are the same. the mean earnings for all collegecduc:lled women working fu lltime in 2004 was $21.21) . by 1. lyzed in Table 2. the recent P<lSL The analysis does not. Founh. randomized controlled experiments are commonly conducted in some fie lds.3. econometricians sometimes st udy "natural experimen ts:' a lso called quasiexperimen ts. so they remain rare.21. 1. however.60 in 2004.33). Does it arise from gende r dis parahle in THble 3.lhHI is.73/ $20. ethically ques {ionab le.5 Difference~ol.~ (CP1).20)..1S3. b~' multiply fere nces in skills. in some cases.83 . or education between ing 1992 earning:. such as medicine. in which some event . 1992 earnings ~re infla ted b~' the crimination in the labor market? Does it rencet dif amount of (wendl CPt price inflation. wellrun experiment can provide a compelling estimate of a causal effect. and .521 $21.
the standard normal distribution can provide (j poor approximation to t he distribu tion of the {slatistic.Although the standard normal approximation III thl! Ista~ tl. where the standa rd error of Y is given hy Equat ion (3. onc special en.. 7). and it can he \'cry complicated.5. Y" .:!" the sample siLe b small .Substitution of the laller expression into the formur yields the fur mula for the {statistic:  y V J.um. As discussed in Section 3.B). The box .. hypoth es i ~ Co nsider the {·statistic used 10 test the thai the mean of Y is . .cd controlled experime nt." provides an exam ple of such a quasiexperime nt thai yielded some surpri si ng conclusions. under ge neral wndilions the !Stalistie h~s a (tan dilrd normal distribution if the sampk size is large and the null h ypo t hesi~ i::.!' (~st at i stic testing the mean of a si ngle population i!> the Student f distribution with /I . If.2 through 3. the finitesample dil3tribution: see Section 2. 'fbcre is. it can be unrcli able if /I i. the (stalistic is used in conjunct ion with critical va l ues from the standard nonnal distribution for hypothesis It!lIli ng and for the con struction of confidence intervals..ign ing dif ferent lrenlmcnLS to different subjects as if they had been part of a randomi/.88 CHAPTER 3 Review 01 Skltistic~ unrclnted 10 the treat ment or subject characteristit:S has the effeci of a&.1 2)j . 3..~..LI. .: in \\hich . Wh!. .1 degrees of freedom. small. The usc of t he standard norma l distributio n is justified by the ccntra l limi! theorem .2. which Hpplies wh en th e ~ am plc size is large. however. using dala Yl . The exact distribution of the fstausticdepcnd.22) where ~~ is gi\'en in Equation (3. then the exact distribution (that is... The formula for thi s stlltiSlic is given by Equation (3. The tStatistic and the Student t Distribution The tstatistic testing the mean . and critical values can be taken from the St udent (distribu tion.6) of thl.. sr·/n (3.6 Using the tStatistic When the Sample Size Is Small 1n Sl:ctions 3. tfUt' [see Equation (3.O . the population d ist ribution is ilself normally distributed. however.. "A No\cI Way to Boosl Retirement Savings.'i on the di~tribution or Y.tie is reliable for 11 wide range of distributions of Y if II is large.10)..
1 degree~ of freedom. y. (1'~)..l)s~ / rT~: then some algebra l shows that the Istatistic in Equation (3. the null hypo th esis would be rejected <It the 5% si gnificance Icvel against the twosided al ternative. even if the population dist ribu tion of Y is normal.. YII are i.15 and 11 = 20 so that the degrees of freedom is" ..I)Jifui· Vvrlll '\ " Z ...!!.3.al is somewhat wider than the confidence interva l constructed using the standard normal critical value of 1. if the n ull hypo thesis JI.4 that if )'l.o)I'/r:rf/1I and let W = (/I . Spccifi c.19) does 11 0t produce a de nominator in the Is ta tistic with a chisquared distribution..I degree ~ of freedom .(T ~) .22) ean be writt en as Z /VW /(lI . thc n critical va lues (rol11 the Student I distribution can be used to perfo rm hypothesis tests and to const ruct confidence intervals. distribution ofY is exactly N(p.llIy..6 Using the IStati1lic When !he Somple Size Is Small 89 distributed.09SE(Y} This confide nce inter.ViV / (>II).. .}'. the Istatistic can be written as such aralia. if the population dist ribu tion of Yis nonnal. Recall from Section 2.1 degre'es of freedom is deIJncd 10 be the distribution of Zl'l w /( 11 .22) has HStude nt t diSlrihUlion with 11 . then th e {slatistic in Eq uation (3. To "crifv this resu ll. and the populat io n dis tribution of Y is N(J. consider a hypothetical problem in which lOCI = 2. W is II random varinble with 11 chisquared distribution with 1/ . and Y and ..96.1 degrees of freedom. IThe d<:sired ex pression i~ obtained by mul!jpl~'ing and dl\'idinj! by ~ and oollccting terms: l' .O is correct. wo uld be Y !: 2. d ist ri bution .JA~_") \l'ul"" + GI..= The tstatistic testing differences ofmeans. In additio n. where Z is a random variable with a standard no rmal distribution.. are indepen den tly distributed./J.. does not have a Student . The Ista tis tic testing the d if· ference o f two means.09) ..l )s~h'~ bas a X~ . The St udent I distribution docs not apply here because the variance estimator used to compute the standard error in Eq uation (1 .py. The 95% con fi dence interval fO r My.1.. then under the nu ll hypoth esis the H t n ii~ ti c given in Equation (3 .20).J..1 = 19. . When y 1•.) . 1(11 .I~n I"" V4. y..l distribution for all II. recall from Section 2. ' Y" are i. . a nd Z and lV arc independently distributed.09. then Z = (y . thus. It fo llows thllt.d.4 that the Student I distribution with 11 .22) has an exad Studell t I distribution wit h II . .d. then the sampling.. W = ( 11 .15 > 2.iJ). y. Fro m Appen dix Table 2.. give n in Equation (3.i.f~.i. u }! n ) for all II.1). constructed using the / 19 dis tri but ion.y = P1.I) . l . A s an exa m ple. If the population dist ri but io n is normal ly d istributed.lel Z = (Y . the 5% twosided critical value for thl' ' 19 distrihution is 2.o)/V(T~ / 1I has a standard normal distribut io n fo r all /I. Because the Istatistic is larger in abso lute value than the critical value (2.. \ "i ( Y . and the pop ubtian distribution of Y is N(p.
{[cel its enrollment rate? To mcasure the rf(ce t of t. "Iadrian and Shea argued thm there ".. The esti male of thc trealment effect is 48.igned treatment and the clluS<\1 effect of the change coutd be estimated by the difference in mean~ between the Iwo groups.ays optional."II other firms employees are enrolled only if they choose to opt in. ~a\'i n gs no sY!olcmatic dif· ferences bel\\ccn the workers hired before and after the change in the enrollment ddault... 'JbC) compared two groups of worlen: those hired the year before the change and not automatically enrolled (but could opt in).2%).:onfusing thatlhey his o r her enrollment stlltus simply and th~' fill~ out a fonn.! the same.efl' nomic models. Brigitte Madrian and Dennis ShCl1 considered Onc sueh uncon\'cntional method for Slimul tit ing ret irement saving.9% (II = 51301). 10 change To economists sympathetic to the conventional view lhat the dc{aull enrollment scheme should nOl mattcr. called 401(k) plans afler the applicable section of the U. the 95% con.. As a con)cque ncc. in full or in part. whereas the cnrollment rote for the "OPI()UI " (lrC3tmem) group was 85.o c. however. In an imponllOl study published in 2001. is al.he method of enroll ment. from an econometrician's pcrspecti\'c. The ior is not 1I1~'ays in D eeord with conventional cco financial aspect!:> of the plan wer.S.:liable ad vi ce~ cOlild this coO\'entiona l reasoning be the metliod of 1'Ilrolllllf!1It in a . r. Con\'cn ~vings nonpartidpation to panicipallOn. nre aUlomlllically enrolled In such 11 plan unless they choose to Opl QUI: . Although neither cxplanation is eco nomically rational in a conventional sense. and those hired in the }'~ar tional mel hods for encouraging retirement ceonomi~ts focus on financial incen tives.·ladrian and wOllder~d... Madrian and Shea studied a large firm Ihilt chang~'d retirement.8% to 50.imply Ircat the defanlt option as if il were r.. the method of enrollmentopt out ..: that behav amI au tomatically enrolled (bu t t"Quld l)pl OUI).. the change was like a randomly as. One potential exphmation for their finding is thaI many worken. lax code.livings anot her cxphmalion is thai you ng workers il~ ing wrong? Doe~ would simply rather not think abou t and plan directly . Howcver... s. ) phm from . tend nOlIO save enough for their retirtmenl . ThUs. find these plims:.4% (n = 4249).90 CHAPU t 3 Review of Stotis~c~ M any economists think th tll worker. According to conven tional cwnoll1 ic models of hehavior. Beeau~ thei r sample is large. Madrian and Shea found that thc default enrol! taken oul of the paycheck of particip. Enrollment in such pl:tns. dollar value of the ti me required to fill out She~ the form is vcry small compared with the finRm:ial implications of thi~ decision.lIing employ· ees. both are consiMent with the predictions of "bchllviortll eco nomics. or opt inshould scarce ly milller: An cmployee who wnnt:. Recently. at some firms employres ment rule made a huge difference: The enrollmcnl ralC (or thc "optin~(control) group ~as 37.·· and bolh could Icad to lIcttpting the default rOlllilJu{'l f the default option for it~ 401(1. Madrian and Shea's finding was lIstomshing. fidence for thc treatment effect is light (46. have increa)ingly o~f\'cd arter the {hangt.9% 37' %). Many [UIllS offer rClircmelll savings plans in which the firm matches.c. But. there has heen an upsurge ill interest in unconventional ways to influ ence economic decisions.5% ( = 85..
. Inc reasingly mll ny ccono mi..: design of retiremen t ~1\'in gs plans.. (T~. + "". based on a differ ent standard error formulat he "pooled " standard error COOTmlahas an exact $lUdenl . . Bena rlzi (2004). ( Yi  Y".such details lIligh! be as iml'or IImt a~ fmll nci(ll aspecb for boosting enrollment in retircmcOi savings plan. where the standard error is the pooled standard error.o group variances are the ::. SEp(><>I~iy.20) .) . The pooled standard error of the dif fcr~ncc in means is SEpooirA Ym . A modified ver"ion of the dit'fcrcncesofmcans Istatistic..  y. Q". it does not cven have a standard normal distribution in large samples.. ] (3.y~.. if the popUlation distribution of Y in group n· is N(.. and the pooled I·statistic is computed using Equation (3. .19).r:.rM is Ihat it applies only if the two population variances an: the same (assuming lIm *. the pooled variance estimator is bi<lsed and incon sistent. I'f the population distribu tion of Y in group 1M is N(J. The pooled variance estimator is.so that the twO groups arc denoted as m and IV.) .23) ~' ('! u p g. the pooled standard error and the pooled (statislic should nOI be used unless you have a good reason to believe Iha t th e population variances are the same.ts <l rt To 1e:HIl more about behavioral economics lind tho. Adopt the nOlalion of Equation (3. .I '" . in fact.m. .y~.2 degrees of freedom ." ).. Therefore. cven if the data arc normally distributed. see Thaler and starting [ 0 think tha t . S ..1.~ ).'v . where the fi rst summation is for the ob:.) = spoolrd X ~ + l In". If the population variances are different but the pooled variance formula is used .mup . and if the \v.Stoti~tic VVhen the Sample Size Is Small 91 enrollmellt option .6 Using the t..J.)2 + ~ iI ' (Y.. The drawback of using the pooled variance estimator S~~ .n1e (that is.U the pop ulation vmiances arc differen t. the pooled SI<lD d<lrd e rror formul <l applies only in t he spt:cial case Ihallhe two group~ h:lVC the same variance or that each group has the same number of observations (Exercise 3. " 2 [ L . the null distribution of the pooled (statistic is not a St udent I distrihution . = 0". = n".). then under the null hypothesis the Istatist ic computed using lhe pooled standard erro r has a Stude nt I distribution with " ".21).}Q()I~d ..u ~l (. + 11 .:l. distribu tion when Y is normally distributed: howc\'cr.ervalions in group In and the second sum· mallon is for the ohservatio lls in group IV.
Even thoug. which allows for diffe rent group variances. th e . 10 anothcr. In practice. normal distributions lire the exception (for example. the diffcrence in th e pvalu~s comp uted using th" Student (and standard normal distributions ne ver exceed U. is as given in Equation (3. the normal approximation to the distribution of the .ic reason for two groups havin g dif fe rent means typically implies that the two groups also could have different vari ances. this does not pose a problem because the differe nce betwee n the Student I distribution and the standard normal distribution is negligible if the sam ple size is large. Even if the population distributions are norm al. sec the boxes in Chaple r 2.19). relates one variable.19) . used io conjunction with the large sample standard normal approximation. Even if the underlying da ta are not normally distributed. inferences about differences in means should be based on Equation (3 . "The Distribution of Earnings in the United Slates in 2004" and "A Bad Dayan Wall Street"). some software uses the Student I distribution to compute pvalues an d confidence mler vals. the sample sizes are in the hundreds or thousands. stat istic computed using the standard error fonn ul a in EqouLioD (3. \¥hen comparing two means. and in all applications in th is textbook. the Sample Covariance. For economic variables. and the sample correlation coefficient. like many oth · ers. however. therefore.. .92 CHAPTER 3 Review 01Statistics Use of the Student t Distribution in Practice 'F or the problem of testing the mean of Y.h the Student {distribution is rarely applit:n ble in economics.002. the Studenl r distribution is applicable if the underlying population distribution of Y is normal.7 Scatterplots. for n > 80. In most modern applications. any econom. This section reviews three ways to summarize the relationship between variables: the scan crpiol. X (age). 3. Y (earnings). the pooled standard error form ula is inapp ropriate and the correct standard error fonnula. Accordingly. the sample covariance. inferenceshypothesis leslS and confidence intervals about the mean of a distribution should be based on the hlrgesample normal approximation. and the Sample Correlation What is the relationship between age and earnings? This question.statistic is valid if the sam ple size is large. Therefore.Ul .19) does nOI l w\'~ n Student t distribution. In practice. the)" never ~xceed 0. For n > 15. large enough for the differe nce between the Student r distribution and the sta ndard norm al distribution to be neg ligible.
3 .•. " I . I i·· .2. Figure 3. For example one o( the workers in this sample is 40 years old and earns $31.2 lf~ Scatterplot 01 Average Hourly Earnings Vi.YJ For example. II · ' . the Sample Covariance. . I • • • • I \0 I. I : : 201. The colored clot corre~51o a dO·year·old worker who oorns $31 . •• ••• ... and the Sample Correlotion 93 ! . E ach dot in Figure 3...25 per hour. and earnings could nOI be pre d icted perfectly using only a pe r~on '" agl::. FIGURE 3.2 is a sca ne rplol of age (X) and hourly earnings (Y) for a sa mple of 200 workers in Ihe information industry from the March 2005 CPs. Age Averagt Hourly earnings 90 • R() 70 ' ~}f • • • • • • • •• • • •• • • • • • I • • • 401• • • • • • • • • • • • • • • I •• • • I • I • "'lI • • • • •• • • • I ••• • •• I • •• • • • I . The sc<l. tlerplot shows a positive relationsh ip between age and earnings in this sample : O lder wo rk ers tend to earn more than yo unger workers. Scatterplots A scatterplot is a plOLof n observa lions on X . I .25 per hour: this worker"s age Clnd earnin gs are indicated by the colo red dot in Figure 3. . I' .7 Scotlefplo~. I •• • • •••• • I . and Y" in wh ich each observation is represenled by the poin t (X.2 corresponds 10 an (X.• • • • • • • • • • • •• •• • • • 501 • 0 !O 25 30 35 40 45 50 5S 60 A" 65 Exh point in the plot reprosent5the age and average earnings alone of tho 200 workers in the sample. . Th is re lationship is nO! exact. however.. Y) pair for one of the observations. The data are for technicians in the informotion industry from the March 2005 CPS.
The s1tmplc correla tion coe ffi cie nt. this difference ste mS [rom usi ng X a nd Y to esti mate the respective popula tion mcans. then there i~ a negative rdat ion<:hip a nd the correlalion i~ I. If the line slopes upward..2. the sampk correlalion is unitle~) a nd lic~ betwcen . for all i and equals I if X.2') b computed by dividing by 1/ .: popula tion covariance and correlation can. tht: closer is th e correlation 10 :: 1. The ~amplc cova riance a nd correlation arc estimators of the popula tion cova riance and corrc lation..1.1'...). Like the cstim:llors di~u~sed previously in this chap te r.:rplot fall very close to a straight line. (3. The closer the scanerplot is 10 a straight line. in practice we do Dot know the population covariance or correlat ion.\ ) nle sample correlation measures th e ~trl:!ngth of th e linear a~socjation be tween X and Y in a sample of II observations. the average In Equation (3. Th(.] and l: r\y l s: 1. Y. 1.:0.' (the c>. thl! correlation is . . A high correlation eodfiden t dOes not necessarily mean that Ihe line has a ~tl:!ep slope . too. be t!stimalcd by Inking tI random sample of /I members of thc population and collecting the data (X" Y. amI the correlation is I. it means that the points in the snllh. .I instead of n: here. or . . denoted. Whe n IJ is hLrge. The sample correlat ion equals I if X. n .X . then thcre is a po~itive rela tionsh ip between X and }. ' I . it makes Ii HIe diffcrL'nce whether division is by n or n . If the line ~lopes down.24) Like thc sam pic va riance. Like the population correlation. Thot is. == .:ova riancc.a mple correlation . Like the sampk . the)" are computed by replacing a population "vcrag!. the sample covariance is consiMenl. variancc. rather.. for all i. IS dcnoteJ r x)" and is the ratio of the ~am ple covariance to the sample sumdard d~\'iati ons: rn' =~) )' tn (3. The sa mple I.3 as two propertics of the joint prohability distri bution of the random variables X a nd Y.)( Y. . More generally. I Y) . Consistency o/the sample covariance and correlation . Because the population distribution is unknown. i = 1. how eve r.1\).94 CHA PTfR 3 Review of SIo~5ticS Sample Covariance and Correlation The covariance and correlat ion were introduced in Section 2. . is 'n° ~ 1/  L " (X.:!: I if the scattcrplol is a straighlline.pectat ion ) with a sample average.
.. 111.h large and ~m a ll values of X. the sample st<tndard deviation of age is. and Y.20). For these 200 workers. and is left as an exe rcise (Exe rcise 3.d.anooce. suppose that earnings had been reponcd in ce nts.\1I = 10. U. l3ecause the sample variance and sample oova riancc arc con<.7 5<_pl"I>. Y in itiaUy increases but then decreases..11lUs. mple correlation is 0. nol readily interpretable).0.3d ~hows a clear relationshi p: A..01 (the units arc years x doUarll per hour.75 X 13. the rea. Figure 3.:.) are Li.: the sample sta n d ll rd devia tions of earn ings is 1379~Jhour und the covar iance bet ween nge ami earning:. for these data.2S or 25 %.hip.25 or 25% ..3c shows a scallerplot wit h no evident rela tions.26) In othcr "orus. Figure 3.79Ihour. X increases.75 years and the sample standard deviation of earnings is. a measure of linear association.79) = 0. in large samples the sample covariance is close 10 the popula tio n covariance wilh high probahilily.1mple correlation is zero.l'l:.and the sample correlation is zero. Fig ure 3. Figure 3.. . proof of the result ill Equation (3.25 means Ihal there is a posit ive rela tion:.75 X 1379) = O .orrelill ion of 0.3 gives additi onal e xa mples of scan erpJOl!. A~ an example. . that is. th is rela tionship ill far from perfecl. and that X. = 37. The covariance between age and earnings is s"t. This final example emphasizes an important poinl:'i"ne correlation coefficie nt i. To verify Ihat the correlation does not dl:!pe nd o n the uni ts of measurement.hip between age and earnings. thc correlation coefficient is r!lF = 37.9. and correlation .2. have rinite fourth momcnlS is similar to Ihe proof in Appendix 3. the Sample Co.).' . in which casl.n ~ corr(X.8.26) under the assumpt ion that (X.1l1ere is a relationshi p in Figure 3. Y. The (.Ion"" ~n 95 L.3a shows a strong positive linear relat ionship between these vnriables.on is tha t. and the Sample ("".U" (3.3d.3b shows a strong negative relationship wilh a sample correlation of . Y. r . small vlIlues of Y are associated with oo.i"!cn!.. Figure 3. but it b not linear. the sam ple correlation coefficient is ('on3blem. consider the data on age and earnmgs in Figure 3.3 that lhe sample covariance is consistent. but as is evident in the SClt u erplol. the s. and Ihe s. Example.01 /(10. i~ 3701 (u nit s a re years x cen ts/ho ur): the n the correlation is 3701/(1 0. Despih~ this discernable relationship between X a nd Y.S13.
8 7 6 W 50 " ..0 . . X is inde pendent 01 Yond rhe two voriobles orc uncorreloted. . ." . ..: " • ' . c· '0 · ." .0 (qu:ldratlc ) The scolterpioo i..3 ill. .. '. "''hen Y ...:..01 . • . Y is unbiased: . i: ... ." .0 30 301 20: Wi • tlO '.3b show strong linear relationships between Xcod Y. . ·.30 oM 3. .3(...\h \ . " '" ...' . .......96 CHAPTfR 3 Review of StotitliC\ FIGURE 3.. Summary I . Y.0 (d) Corrcl:lt K)o ...... ..y. " '. . 120 . 1111 . .'.9 (b) Corro:hlll<. w 10 01 70 .~ ... . a. Y" arc LLtI .. .m • . . Figures 3. 1"0./11. is :\O estimator of the population mcan.. . '. . '" '"o 7" 70 80 90 100 110 120 130 . (a) O mclation . 60 '0 ~.1:' • '. . .. 100 130 . . (d CorrcLmon .. ' . . .. 70 . ! . 01 . . ' •..Ill '" 01 70 1'10 t)O 100 110 t2t) 130 . Scotterplob for Four HypothetKal Data Seb .' .' ' .. ': '... ..3d. the two variables olw are unconeloted even though they ore related nonlirleOrly. KO . the sampling dist ribution of Y has mea n JLy lind \ ariancc /r ~ = (T ~/ " : h.' . . . The s. In Figure 3.: ..: :. " 7u 90 lfIO 110 120 Do .jJ. • . ' • \"1>" w 50 .. . 6111 ... In Figure 3. .0. . '" •0 30 .'ll1lple average.'" " :...0.. ' I . .. \. . ' I ..' " ' : ..
H ypolhesis lesls ami confidence intervals fo r Ihe difference in the means of twO popu lnt ions arc conceptually similar to tests and intervals for the llli:!aD of a sin gle popu la tio n.ample covariance (9~ ) sample currelation coeffieienl (sample correhlli<1I1) (94) eau~al .ample size il> largc..s i~l c nl: and d .y in 95% of rcpealed samples. The sample correla lio n cue[ficic nl is an estimator of Ihe population correlation coe ffi cient and measures the lini:!ur relationship be tween IWO varia blesthat ill.n cslim. !. 3.lard error of 1I.'I.: ~landaf(1 de"iation (75) degrees of frecdom (75) .. and efficiency (68) critical value (79 ) rejection region (79) ..q uares estima tor (70) hypothesis test (71) null and a lternative hypotheses (72) twol>ided alternati\'e hypothesis (72) I)value (significa nce pro ha hilily) (73) "ample variance (75) "ampit.. 2. the 'statistic has a standard normal sampling dis tril:>ulion when the null hypothesis is true.. con<:i<.Ieeeptanee region (79) size o r a Ic<:1 (79) powc r (79) onellided al terna tive hypothesis (80) confi d e llc~ sct (81) confidencc Ic\'el (S I) confi de nce interval (HI) coveragc prohahility (83) t~ 1 for t he difrerencc hclween two means BLUE (69) lea<:t <. 'i"he ' lltatislic is ust!o 10 test the null hypolht!sis Ihal the popuilltion mean takes on a part icular value. A 95°/() con fidence in terval for J. Y has an approximatcly normal samrling distribution when the <. Key Terms e"timator (68 ) cS li ma t~ (68) l. The (stat islic can ~ used to calculate the pva lue associatt:d wit h the nu ll hypoth esis.ly is a n interval conslructr:u so thot 11 conlainb the true value of p.Iam. 6. 4 .lior (76) I~talistic «(ralio) (77) te". by the central limit theorem. A sma ll p va rue is evide nce Ihrl t th e null hypothesis is falsc. If II is large. (77) lype I error (79) type II error (79) "ignificl\nce level (79) (8] ) c{(ecl (85) treatmen t effect (85) scalterplut (93) <.ht line. 5. by the law of large Dumber~ Y is cOIl.'a tblic.te ncy.Key Terms 91 c. how well their scallcrplot is approximated by a st ra ig.a5.
6 Wh y does a con fiuence interval contain mo re informa lion than the re:<.0. a..ul t of a single hypot hesis les!'! 3. and (e) n = WOO. Show that p is an unbiased esti mator of p. In a random sa mpl e of size n = 100..5: (elO.. .0..PTER 3 Review 01 Stoli~liC5 Review the Concepts 3.0. significa nce level.1 Explain the diffe rence between the sample average Y and the population mean.98 CHA..2 Let Y be a Bernoulli random va ri ab le with success probability Prey = l) =< p.d. Ihlll v!lr(j1\ = n(l  . find Pr(Y < 101)..9. Show that b. Exercises 3. applied 1 0 data Crom a ran domized COntrolled experi me nt. find Pr(lDl c. ~hn .d. y = toO and a~ = 43. Use t he centra l Limit th eorem answer the following questi ons: It) o. Cd) . and Je t Y .7 Explain why the differe nces·ofmeans estimato r. . 3. (c) 0..O. •' p = y. and power? Between a onesided and lwosidt:d alter· oative hypoUles is? 3.e 11 = 64. 3. In a random sa mple of si7. In a random sample of size n = < Y < 103). 165. Provide an example o f each.1. (b) .2 Explain the diHcrence between an estimator and an estimate.. t In a popula tio n p. 3.he law of large numbers. draws from this distribution. sample fro m this populat ion for (0) n = 10: (b) 11 = 100. Re late your answers to .4 Wh "t role does th e central limit Lheorem play in stati stical hypoth esis test ing? In the construction of confidence intervals? 3. Le i p be the (ral" lion of successes (I s) in this sa mple.5 Wh at is th e difference between a null and allernative hypothes is'? Among size.. is an estimatOr of the treatment effect. . 38 Sketch a hypothetical scatterplot for a sample of size 10 for two random vari· abies wilh a population correlalion of (a) 1. 3.\/ . y~ be U . b. find PreY > 98). De termine the mean and va riance o f Y from a ll i.3 A population distribution hal' a mean of 10 and a variance of 16.i.
02. and lei tion of Yoters in the sample who prefer Candidate A. Construct a ~5<J/o confidence interval for p. Why is the iolerval in ( b) wider than the in terva l in (a)? d. Why do th e rc~ ults from (c) and (d) differ? f. iv.5. b.5 vs. O. an d the votcrs are asked to choose between candidate A and candidate B. HI: P j:.S? d. p denote the frac 3.50 vs. Use the estimator of the voriance of p.5 A survey of 105S registered voters is conducted . Compute the power of this test if P = 0.5 vs. 3.5 using a 5% significance level. You arc interested in the competing hypotheses: Ho: p = 0. Construct a 50% confidence interval for p. i. ii. 215 responded tbal tbey w(luld vote for the incumbent and 185 responded that they would yote for the challenge r. nod let p be the fractio n of survey respondents who pre ferred the incumben t. HI:p > 0. What is the p.value for (he test Ho:p = 0.3: a.5 Ys. Construct a 95% confidence interval for p. H r: p "* 0. Did the survey contain statistica lly significant evidence that the incum bent was ahe<ld of {he challenger at the time of the survey? E:\ plain.5? e.3 In <t survey of 400 like ly vOIers. Test Ho:p = 0. What is the size of Ihis test? ii.0. Construct a 99% confidence inte rval for p. p= 0.5 vs.p)ln.51 > 0. Construct a 99% confidence interval for p. a. Use the sur vey results to estimate p. iii. c. HI: P 0. What is the pva lue for the leS1 Ho: P = 0.53. HI :P > 0.0(1 . Without doing any additional calculations. Let p denote Ihe fraction of voters in the popu lation who prefer candidate A .5 vs. ={:.50 at the 5% significance level. .54 . . HI:P :#0. to calculate the standard error of your estimator. c. test the hypothesis Ho: P = 0. b.Exercises 99 3. ". a. In the survey j. Test Jio:p = 0. Let p denote the unction of ali likdy voters who preferred the incumbent at the time of the survey.S uppose that yo u decide torejeci Ho if 1 ft .5 using it 5% significance level.4 Using the data in Exercise 3. b.
Ho w la rge :.:rwi~e.8 A ne.05.. 3. 3. version of the SAT Icsl is given to H)(X) randoml)' selected high schoul seniors.. she "'HI conclud e that the new p races:.hould 1/ h~ if the sur vey usc:'> simple random sampling? 3. the " margin of error" is 1.pI > 0...01) ~ 0. plant produces bulhs with a mean life o f 2000 hours and a standard dc ~' jation of1oo hours.9 Suppose that a Lightbulb manufacturing.7 In it given population..03. How many Qf lhese con fidence the true value of p? int~f\iab do you c '\:pcct to contain d. fl 2100 h ours:othl.d.6 Lei Yl" ' " Yn be i. Fo r each of Ihes.cY". Is there e\ idence that the sur\ey IS biased? E xplain . d ra ws from a d imibution with mean /l. Can you determine if p. Supp~t. i..lcncc inter· \al? Explain.. Suppose you wnn tcd to design a survey thai hau a margin of error of a1 mO~1 I %.:. 11 % of the like ly VOlers arc A frican American . using independently selected voters in each survey. a. \Vhal is the probability thai the true value of p is contained in all 20 of these l.'d Pr{ li) . Ie!S I score i5 3.96 x SE(p): that i'l.100 CHAPTER 3 Review of Slatistics c. Suppose thallhc survey is carrieo out 20 lime. 8% African Americans..'Onfidence intervals? ii..i .process is in fact beller and has.: thai the ne\\.: invenlor 's claim if the sample IDcan life of the bul b!! is grea te r th.lom !!ample of 6(Xlland·li lle telephone numbers find:.t mean hulb life of215O houf'. value of 0. l ll<'ll t~ you wnnh. The plnnt l1lannger randoml~ St!lcCI!! 100 bulbs produced by the p rocess. il is ~ linH:!S the length of 95% co nfidence imerval.denote the mean of the ne w process. What is tbe po. An inwnlOr cI"im ~ 10 have developed an impro"fo!d process that produces hu l h~ with a longer mean life an d the S<l me standard devitttion. A sur vey using a simple r an<. "" {) is containcd in the 95 % confi<. is no better than Ilk old process.:: 20 ~ur. a 95~i) confidence interval fo r p is constructed. b.. She says that she will bt:licvc th. u. A test o f Hu: J1 = 5 versus Ill: J1 "* 5 using the usual fstatj~tic yield" a ". In survey jil rgon. The sample mean 1110 and the sample 1Ilamlard de\ ia· tion is 123. What is the sitc of the planl manager '~ testing p roce dure? h.. Dues the 95% confi dence interval con lain J1 '" 5? Explain. = 2()(X) vs. fl l : It > 2lWlO. Let Il. Construct a 95% confidence inkrval for the population mea n test score for high school senion. cr of the plant manager'" h!sling . ConsiJer tht: null ami alternative h ypothesi~ Hu: p..
Suppose the same test is givc n (0200 randomly selected third grade rs fro m Iowa. b. 3.) b.value of the le:. first slale the null and alternati\e hypothesis: second .l. compute thc relevant (statistic: third.. The sa mple average itCorc Y on the test is Sf! points and Ihe sH mplc standard deviatio n.l yield Y "" 646. iation of 11 points. 3... 11 Consider Ihcc!'.2 and ~larHhtf(1 de"ial.t about w<lgc d irfcrenec~ in Ihe firm ? Do Ihey repre~nt staListie<llly significan t evidence Ihat wages of men and women are d ifferent? (To answer this qu~ ~tion . c.25(T ~.a sample of 100 men a nd 64 women with simila r job descri ptions <lrc selecled al random. Whal lesti ng procedure should the pla nt m. and finally usc the pvalue to answer the question. 1). a. Do these data suggest Ih:lI thc firm is guilt}' of gender discri mina tion in its compe n ~a lion polir. sr. Can you conclude with a high degree of confidence that the populat ion means for 10. he wants the si/e of he r test to be 5 %? 3...a and New Jersey students are different'! (\Vhat IS Ihe Slandard error of the difference in the two SHmplc means? Wh<lt is the p . A !'.mager use if ... What do these data sugge:.\lOO $2900 $200 Women S:t20 64 n. is 8 poin t!\.5 . Construct a 95% confident/. Show that (a) E(Y ) = J.timalor y .. um· mary of Ihc resu lling monthly salnries follo ws: Averoge Solory (f ) Standard Deviation (sr) " 1 00 Men S.13 Data on fift h·grade test scores (reading and mathemafics) for 420 school Iriels in ("alifllrn. y a nd (b) var( Y) = 1. producing a sample average of 62 points and sample sta n· dard de. 10 Suppose a new standardized te:.dcfi ned in E q uation (3. 12 To invcstiga te possible gende r d iscriminatio n in a firm .\ of no difference in mean'\ veniUS some difference?) 3.Exerci~ 101 c. Th e authors plan 10 admini ster th e test to <III thirdgrade stude nts in New Jersey.t is given to tOO r<lndomly selccted third· grade st uden ts in New Jersey. comput e the p value associated with the /'SllIlislie. d is~ .l n.: inte rval for Ihc mean score of all New Je rsey third graders.:ics? Explain. Construct a 90% confi de nce interval fo r Ihe difference in mean scores between 10Wl! and New Jersey.on ~ y = 19.
102 CHAPTER J Review 01 Stati stics B. a.9 [5 there s tatistica lly significa nt evidence Ihal the dist ricts wit h smaller classes have higher average test scores? Explain. b. s x ::: I. c. TIle resu lting summary sta tist ics aT e = 70. 3. Con vert these stati stics to the me tric system (meters and kilograms). a nd 378 re ported a preference (or Ke rry.. a. Construct a 95% co nfidence inte rval for the fraction of likely voters in the population who favored Bush in early September 2004... ~orc in t be 1>. 5 inches: Y = J 58 Ihs. Construct u 95 % confidence inteTVul fo r the average test score for Florida stude nts. and r XY = 0.14 Va lues of heigh I in inches ( X ) and weigh! in pounds (Y) tl f e recorded fro m X a slImple of 300 male college stude nts.0 19. the mea n is 1013 and the sta ndard deviation (s) is 108. Bush. 3.1 6 Grades on a sta nda rdi7.. ConSlruCI a 95% con fidence iDlerval for the mean test population. 405 reported a prefere nce fo r Preside nt George w.4 17.R inches: Sy = 14. results were found : A. Was there a statistically significant c ha nge in vote rs' opinions anoss the twO dates? 3. SXY = 21.85. the {ollowi ng. Ckru Siq Score I Yj Stondard ~ lsrl " 238 182 Small Large: 65 i A 650.cd test a re known to have 3 mean of 1000 fo r studentS i.n the Un ited States.73 inches X Ibs. The test is administe red to 453 ran domly selected stu de nts in Florida: in this sample. . When the djsuicts we re divided inl o districts wjlh small classes «20 students pe r leache r) a nd la rge classes (2:: 20 stude nts per leacbe r) . sur veyed 755 likely voters.surveycd 756 likely \ '0[ t: rs: 37l=i re pon ed a prefe re nce for Bush. Construct a 95% confide nce inte rva l for the frac tion of likely VO ters in the popula tion who favored Bush in t:a rly October 2004.. 2004.3. 1lle Cl\'N/U$ A 10day/GaUup poll conducted o n Octoher 1..2 Ibs. 1004.. and 350 re pon ed 11 preference (or Senator Jo hn Kerry.15 The CNNtUSA Today/Gallup poll cond uc ted o n Septe mber 35.
c.18 n lis excrcise shows that the sample variance is an unbiased estimator of the popula tio n variance when Y i •..2Il). Ii.I91J2 .) 3. The o riginal 453 student s iJ re s iven the prep course and then asked to talce the test a second time..I. I s there statistically sis nificant evidcnce Ihat the prep course helped? d. (Him: Y..and the standard devialion of the cbange is 60 po int ~ i.". Describe an experiment (hal would quantify these two effects. . i.3 1) ( 0 show tbal E( Yi .. Construct a 95 % confidence in terval for the change in the gender gap in average ho urly earnings between 1992 and 2004.Y." iI. with mean tL y and variance tT~. ii. The average change io their test scores is 9 points. S tudents may have performed hetter in their second att empt because of the prep course or because they gai ned tcst·taking experience in the ir first attempt. Is there statistica lly significa nt evidence th at students will perform beller on thei r second attempt after laking the prep course? iii. I992 is independl!nt oCYm.d . Construct a 95% confide nce interval fo r the change in men 's average hourly earnings between 1992 and 2004. Another 503 students are selected at rando m from Florida. Construct a 95% con fi dence interva l for the change io average test score associated with the prep course. 'TIley are given a threehour preparation course before the test is adminis tered. Construct a 95% confidence in terval for the change in average test scores.Exercise5 103 b. it. b..y)2 1= var(Yi ) 2cov(Y"Y) + var(Y) . 3.. Construct a 95 % confidence interval ror the change in women's aver age hourly earn ings between 1992 and 2004 . Use Equation (2. . Y" are U.17 Read the box "The Ge nde r Ga p in Earni ngs of College Grad uates in the United States. Thei r average test score is 1019 with a standard deviation of 95. Is there statistica lly significant evidence thaI Florida s tudents perform diffe rently than other slUdents in the United Stales? c.2004 . ..y.
comlstock_wa tson you \\ill find a data iilc C I'S91_04 that con tai n) an extended version of the dmascl used in Table . )'.S. In 2004.ing power from 1992102004. 1I.m. (llm/: Usc th~ strategy of Appendix 3. the value o f the C PI was 140. avai lahl\! on the We h site. 1n 1992. Y.u. '1 of the text for the years 1992 and 2(04.20 Suppose that (X.i.:: sam· pie co\ ariancc is a con~ i s l cn l estimator of the population covariance..3. ).\'Y ~ (1".h:!111 estimator 01 3..23) cqual$ the usual standard error fo r the d ifference in mean<.). "contains data on fulltiml!. l On the text We b site W W \". d. with a high school d iplo ma or G..~'! b.l fo r the mc:m of A H E for high school graduates. Use the 2004 dlltll to con~truct a 95% confidence intcrv".Y. would you use the re!tults fro m (a) or from (h)? Explain.. Compute the sample mean for average hourly earnings (A H E) in 19l)~ and in 2004.l. fullye ar workers.' degree.d. Usc the rc. 19 u.. Is yl a consi!.) are i. Use Equation (2.. where SXY is defined in Equation (3. Is )72 an unbiased estima tor of . lhat is.Y. agl! 2534.')I given fo llowing Equat io n (3. Empirical Exercise E3. with finite fOUl1h momtmt~ IL5? Prove that th. If you \\ cre interested in the change in workers' pu rchll:.24). . as the ir highe. A detai led descript ion is given in C PS92_04_ Descriplio n. R cpc <11 (a) but usc A IIE mca sured in real 2004 dollars ($2(X)4): that is. Con"truct a 95% conndcnc~' .3 and the Cauchy· Sch\\aru inequality. adjust the 1992 data fo r the price innation that occurred between 1 992 and 20()4. c.truCl a 95 % confidencl! interval for the population means of A H E in 1992 and 2(1().A. the value or the Consumer I'rice Index (CPI) was 18R9.33) 10 show that cov(¥.u\\·bc.) 3.104 CHA PTf R 3 Review ol Statistics b. 3. c.19) wht:n the two group sizes lJrl! the S<lnlC (1/".t aDd the change hetween 1992 and 2{X). Y is a consistent ~slimutor of JJy. in Equation (3. Use these data to answe r the following questions.ults in parts (0) <lnd (b) to show thai E(si·) '"' (I"~. 11..21 Show that the pooted standard error [SEp<WJI~d(Ym . Con:. Y is an unbiased estim:ltor of /J.)  u} f ll.. = n...
confidence inter.ks que~ti(ln~ u~ing pre\'iou~ ~enr. Did rcal (inna tionadjusted) wages or hi~h school graduates increase from I Y92 to 2004" Explain.1 The U."lbe exact random sampling scheme rather complica ted (firsl.tnd a. unemployment. S.lhe u. using appropriate l'~timalcs. Repeat (d) using the 1\)92 data expressed in $2004. g. in lithic .. .h school graJuatcs increase? Explain. Current Population Survey Fnch month the Bureau of L. go~) The ~Uf\C) conducted :aeh March Jbout earnmgs during the ~un'cy~. small geo.and earnings. households are sun'eyed each month. Con struct a 95% confidcncc interval for the. f...:n hoUSing units within Ihese = s are (WW\\ can he found in the Hallllh()tIK of Labor Sranstic5 and on thl' Bureau of LatlOr StatistiC"i Web site hl.r.S. which provides datu on labor foret' ehnraeterisl!t'S of the population. IS more delailed than in other month~ .lrch The CPS earnings d.:u!.The sample I~ chosen by nmdomly select ing addresse~ from a datab<lsc of addresses from the I~ mo~t recent decenmal c<::nsus aug· mented with dllta on new housing units corntrucled afler the I:ISt census.000 U.'~ bt:1\\een the results for high ~chool and college gradualc..t 1 were computed workcr~ thl' M. Are there any notable differc.1bor Sllllistics in the U. Oepartment of Lahor conducts the "Curren t Popu lation Surve y" (CPS).S.1 presents information on the gender gap for college gradu ates.lh. Prepare a similar lable for high schOOl graduatcl' using the 1992 and 2~ data. Current Population Survey l OS imen·al for the mean of AHE for workcrs with a college degree. The statl~lic:.c/eclcu.s. l abJe 3. including thi: level of emrlo)·mcnt.._ 1 . e.h(fercnce between the IWO me:ln5..tphical areas are random ly randoml~ ~Iected}:dctllils . More th<ln 50...(. and test sl3tis tics..'! APPENDIX ___3 _. . Did real wages of college graduates increase? D id the gap between earnings of college and hig.'als.lIa are for fuil·tlme defined to be somehody emplovcd more than 35 hours per week for al least 4R weeks in the prevIous vcar.
..m)2 .2 Two Proofs That Y Is the Least Squares Estimator of f. . one using c. l n + d .l 2: (YI. I :. . Because both terms in the final linc of Equat ion (3.([Y. from which it follows tha i Y is Ihe leasl squ a r~ estimator.'Y)1 + 2deY.Ly This appendix provides two proofs.~1  Y) + I1(P = L (Yi .m. 1 " (3. . T 2t(2 (Yi .l lQr o f E(Y ). thai Y is the leasl square~ t.::!d) where th. ..'\lng d to make the second term. N .". Noncalculus Proof The strategy is tashow that Ihe difference between the least squares esti mator and Y muSt III . thai Y minimw:s the sum of squared prediction mistakes in Equation (3.lculu$ and one nOl .. CHAPTER 3 Review of Statistics APPENDIX 3..III) . (3.2)lhal is.. This is done by setting d = 0. . m .I  ¥f' + II(J ~.. II m)2 = L (Y.d. the sum of squared predict ion mistakes IEqUnll0 n (3.. Then (Y. 1(Yi . second equality uses the [act that I ~ 'l (Y'  Y) = O..m )2 is minimized by ch()().27 ) Solving fo r the final eq uation fo r m sho ws thaI ~:. .(Y  be tert>.28) are nonnegative and because the first term does not depe nd on d.(Y..y) ~ . by selting. Calculus Proof To minimize the sum of squared prediction miSHlkcs. (Y. Thus.2 LY.Y] + d)2 "" ( Y. .so that Y is the least squan:s es tim. thai is.1.'D. + 211111:: O . Lei d =: Y .2)) is II L (Y. as small JS possible. lake its derivat ive lind set il iO zero: d ~ d L( Y m I_ I ' m = m)l = .m)2 is minimized when y. . .Y . ndl. so thil! dW ."Slimalor of £(Y).Y .=1 n .
W" art: i. .) < "".: (Y.: u ~ (by the d~tinition of tht: variance). First. . y)F .:r L(Y .).21}) .)'l (" )(y.ly)(Y . SO ~ ~~.so IV satis fi es the condItIOns for the law of large num· ber<.~Yf.I(YI  /. Thus IV..i.l y) +.. Because the random variables Yl " . .d. The law of large numbers can now be applied to the twO terms in the final ti ne of E qua· lion (3. . In Key Concepl 2.y)1 into Ih(: def· I • " • ..: E[(Y..d. Dut IV .y)l II  1 . add and subtract J...=j'" 2" L(Y.L (Y .! 2.~y)2 ~ 0 so Ihe second term conerges in probability to ze ro.. . Because Y !:.n(Y ..t y)2.J._ 311 A ProofThat the Sample Variance Is Consistent This a ppendix uses the law of large numbers to prove thM the sample variance $~ i!. J.d.l n.y.y)] and by collecting terms. .. y) .'._l (V ' ~. a co n· sisten t est imator of the population varia nce uf·. Combining tbese results yields j~. £( Yi) < 00. Y" are Li.. Y~ arc i. we have that  y)2= [( Y.J.~ con\'erges In probability ul.7)].'L )')~ J < "" because.d.~.29) where the fin a l equality follows (rom Ihede finit ion ofY [which impl ies that I:71( Y . ./J. In addition.p.1 ~ ' I II _ 1 ..l yf Substituting this ex pression for (Y.J...p. Also. whe n YI.lIy)(Y . w(n  I) + I.9).) .A Proof That the Sample Variance Is Consistent 107 APPENDIX __ 3 _. . hy a~sumptio n ..l y) + (V ..4) 2 and E(Wi): ui·.J'. £(Wl) .\ L(Y. ni (3.  lIy)2 .·1 J. and E( Y ~) < ""'.i. . so the first term in Equation (3.l y 10 wrile (Yi initionaf I~ [Equation (3.y) l  .(.(Y .l)) .6 and \V ~ E(W.. ~ u1. Now £(W. CY toul·.  #. . and var( IV. Define Wi = (Y.2(Yj .29) p.p. a s st ated in Equation (3.I( Y' .)·f '·1 I II •(_ n )[! ± n...:.. • IV~ are i j. the Tnndom variables W" .
PA RT TWO Fundamentals of Regression Analysis CHA PTER 4 Linear Regression with One Regressor Regression wilh a Single Regressor: Hypothesis Tests and Confidence Intervals Linear Regression with Multiple Regressors Hypothesis Tests and Confidence Intervals in Multiple Regression Nonlinear Regression Functions CHAPTER 5 CHAPTER 6 CHAPTeR 7 CIIAPTER 8 CHAPTER 9 Assessing Studies Based on Multiple Regression .
one student per class.scs: What your fut ure earnings? i~ Ihe e(feet o n All three of these questions are abom the unknown effect of changing one variable.sfully compielc one more year of college cla:. 111 . imple men~ I. Y (Y being highway dealhs. say. 'nle economC"tric problem is to esti mate this slopeth at is.ough new penaltie. using dala on c1a~s size~ and lest scores from different school districts.'1t is the: dfcct on hLghway fatalities? A school dlSlrlCI eUls the S ize of 11'0. The slope and the intercept of the line relating X and Y can be estimated by a mcthod called ordinary leasl squares (OLS) . or earnings ). Y.This model postulates a linem rela tionshi p belwccn X and Y: the slope of the line relating X and Y is the efft!ci of a oneunit change in X on Y. X..ChAPTER 4 Linear Regression with One Regressor A S'3IC. duting class sizes by. a sample of data on these IWO vari:tb l t'~ ll1 is cha pler describes methods for estimating this ~Iope using a random sample of data on X and Y. to a nothe r. X (X being penalties fOT d runk driving. 111ls chapler introduces the linea r regression model relat ing onc variable. the slope 01' the line relating X lind V is an unknown <.s on drunk ~rivers. size.. we show how to e"l imate the expected effeci on teM scores of r. Just as the mean of Y is an unknown charaClerif'tie o f the population distribution of Y. on a nolhe r variable.: Wh. or )'e:lf~ of schooling). .1 scores? You succc:. dcmc:ntary :lehool classes: Wha t is the: effec t on its students' sta ndardized le:. to e~limate the dfec\ on Y of a unit change in X using. daS:. studen t test scores. For inslance.:haraetcristi e of the population join t distribution or X and Y.
. In many school di strict~ student performance is measured by standardi7ed tests.1 • 112 CHAPTIR 4 linear Regre~sioo wilh One Regl"e$SOf 4. (3c:w&~" where tbe suh script " ClassSize" d istinguishes the effect o f changing tbe class size (rom otbt:r effects.) Suppose thai f3(1JwS.1 ) so that !J. which is nOI (0 the liking of those paying the bill! So she asks you: I f she eUl s class sizes.u is the change in the test score that results from changing the class size. wha t wiu Ihe effect be o n slUdent performance? '. she will reduce the number of students per teacber (the student. Sill! Caces a tradeoCe. and the job status or pay of some administrators can depend in part on how welllheir student s do on these tests.TeslScore change in ClassSize .:r = 0.(.6.6. Parents wan I smaller classes so that their ch ildren can rccci\e more individualized attent ion. (J. which concerned changing class size by two st udents per class.. But hiring more teachers means spending more money. Thus.2: tb al j" )'ou would predict that lest scores wou ld ri~e by 1. We th t:refore sharpltn the superintendent 's question: If sh. ClassSize ' ('..Si:t' You could also answer the superintendent 's actual question. Thcn a red uction in class size of IWO st udentS pttf d ass would yield a pred icted change in test scores of (0..) X ( 2) . If the superin tendent changes the class size by a certain amount. J3c~s.ssSi~r = change in TestScore _ .e reduces the average class size by twO student s.ll where the Greek letter.... . what will the effect be on standa rdized test scores in her d istrict? A precise answer to this qu estion requires a quan titative stat ement abo ut changes..6. If she hires the teachers. what would she expect the change in standardized lest scores to he? We can write this as a maLheOlalical relationship using the Gree k let ter heta . If you were lucky enough to know (3climSlw You wouJd be able to tell the super· intendent that decreasing class size by ooe studenl wou ld change dist rict wide tesl scores by f3 C/a. 1. To do so.6. J3CI.1 The Linear Regression Model The superintendent of an elementary school dislrict must decide whether to hire atlditional teachers and she wants your advice.2 points as a resu lt or the redll(' (iew in class sizc!> by two siudents per class. rearrange Equution (4. (della) stands for "change In:'That i s.e. divided by the change in the class size.teacher ratio) by Iwo.TeslScore = f30""Si:t X Il ClassSiz.
.'iI~t. we sim ply lump all these "other [<lC tors" togethe r and write the relationsh ip for a given district as Tes(Score = f30 + f3CltJS. a nd Ihal lwo districts with the same class sizes wi U ha ve dilfe re nttest scores for many reasons. but you also would be able to predic t the average test score itself fo r a given class size. it shouJd be viewed as a statcme n! a bout a re la tio nship that holds 011 overage across the population of districts. t he leSI score fo r the districi is writte n in te rms o f o ne co mponent. as before. the idea expressed in Equation (4. A lthough this discussion has focused on test scores and class size.t rict mieht h ~ te r teachers differe nt test scores for essentially sons having todowith the performa nce of the individual s\udc ntson Iheday ofthe test. she te lls you Ihat some tbing is wrong with Ihis formulation. A versio n of this linear rela tionship that holds for tile" distric! must incorpo ra te these othe r faclors innuencing. For now. One di<.. When you propose Equatio n (4.3) whe re f30 is the intercept of Ihis straight line.of course. including each district's unique characte ristics (for example.JsSize. Instead. This straight li ne can be writte n TesrScore = f30 + f3 C/<U. test sco res. b<lckground of lheir stude nts..3) 10 the superinte ndent. (4.'\i~~ x C/assSize + o lher factors.4) is much more geneml. She points oUl th3t class size is just oDe of many face ts of elementar y educalion.3) will not hold exactly fo r all districts. quality of their teachers.4. 1) is the definit io n o f the slope of a )t raight line relating test scores and class size. f3CIll»SI~t is the slope. for all these reasons. that re presents the :weruge e ffect of class size on scores in the populat ion of school districts and a second component that represents all other factors. not only would you be able to determ ine the change in test scores a t a district associated with a change in class size. She is right. a nd. howeve r.•. f30 + {JC/lu.'~:~ X ClnssSize. if you knew flo and f3cw.3) (an idea we re turn to in Chapter 6)..3). how lucky the students were on test day). (4. One approach would be to list the most important factors and to introd uce the m explicit ly into Equation (4.4) ThUs. According to Equa tion (4. t The linear R egrenion ~r 113 Equation (4.'il~~ x CIt. so it is useful to introduce more • . Equation (4.
. denote the ot h ~ r factors innuenci ng the t C~1 score In the i 'b district. = f3a + /3 IX. let X. .4) can be written more generally as Y.. Let Y."" dislrict. Then Equa tion (4 . be the average class SILt? in the . and let fI ."" district.114 CHAPTfR 4 linear Regression with One Regressor ge neral notation. be t he average test score in the . Suppose yo u have a sample of " district". (45 ) . 14..
wr. 111is mea ns that test scores in district I#l were bette r than predicted by the population regression line.. = .1 summarizes the linear regression model with a single regressor fo r seven hypoth etica l observations on lest scores (Y ) and class size (X). the hypotheti cal observa tions in Figure 4. so the e rror lenn for that dist rict.1 Yj = fJo + {3 !X . as mentioned earlier. where Ibe subscript. and III is the error Icrm.. ..4. runs over obs~rvations. For exam ple.. . .teacher ratio by two stu dents pe r teacher? The answer is easy: Th e expected change is (2) x f3 C/IlSSS.1 The Uneor Regression Model 115 ~ . i = 1. it has no realworld meHning in this example. .linear regression model is 'Jl'I~ TERMINOLOGY FOR THE LINEAR REGRESSION ~jr. . Figure 4.:. bul. is above the population regression linc. + U i . . 1"<" . In con· lTasl. . which means that districts with lowel" studentteacher ratios (small er classes) tend to have higher tesLscores. The intercept 130 has a mathe matical meaning as the value of the Y axis intersected by the population regres sion line. . . Y\. the regress(I1ld..:r? ~ • . MODel WITH A SINGLE REGRESSOR I.1'J" II KEY CONCfPT :_ _ 4. X j is the illdepcndell1 variable.. 11 . the regres. Y2 is below the populalion regression linc.ze' But what is the value o f {3 ClimS.. ii i' is positive. and U2 < O. or simply the righl·hand varia/)/e: f:3o + fJJX is the population regression line or poplI/miol! regression funclioll: f10 is t he inlercepl of the population regression line: /31 is the s/opt of the population regression line. Yi is the depelUlf:1Il variable. The popu · lation regression line is the straight line f30 + " IX. Now return to your probl em as advisor lO the superin tendent: What is the ex pected eHect on test scores of reducing the student.so lest scores for thai district were worse th an predicted. or simply the left·hand varit/bll!. . Because of the other factors that determine test per{onnance...1 do not fall eX<lctly on the population regression line. The population regression line slopes down (f31 < 0). the value of Y for dbtrict ttl.
Although the populalion mean earn ings ur. for the ..Y I) " " • • • <>Ill 6:!lI 6001L_ _ _~_ _ _'_ _ _'_ _ _J 10 15 20 25 )0 Stutl entteac h tr rauo (X) 4.. (or example.. unknown. Thert: · fore. possible 1\) lc:un about thl! population mean u~ing a sample of data drawn from thai . hypothetical observations for s. point to the popukrlion regreuion line is Y. i\)O li nc relating X (class si. such as t he applica tion to cI .nuJ uatcs in the sample. observation.ISS size lind tcst seorel>..nown popula tion n. lhe intercept 1311 anlll>lopc f3 1 of the popula tion regression line are unknown. Then the natural estimator of the un kno\\ n popu lution menn earnings for \~ ome n.:c) and }' ( tcst score . 'rne same idea extends to the linear regression model. StvdentTeoc. 1 Scatter plot of Test Score n.. The population regression line is/Jo + PIX.' the slope or the unk. veriicol distance from the jl. This estim ation problem is similar to others you have faced in statistic.. whieh is !he population error term u.fllf example.(~I . We do not know th\! population va lue of i3ulW$. But jus t as it \\a.grt:s.'lt" college g... """ (. SUPPOSt. is the average earmngs or the feln. we can estimate the populat ion mean s using a random samph: of milk nnd fema le college graduates.even school dislricb.11 6 CHAPTER 4 linear Regnmlon with One Regressor FIGURE 4 .her Ratio (Hypothetical Dotal Test sco re (Y) 7j~) The =1Ieq>Ioi >how. we must usc data to el>timate the unknown slope and intercept of the popu· lation reg ression line. n. ).. (XI.2 Estimating the Coefficients of the Linear Regression Model In a practical situa tion. LBo + PIX).:.l you want to compare the mean earning"S of men and women \\ ho recenlly grad u.lIed from college...
Ivcrllge studentteacher rati o is 19.9 re'>t ~Lore 630. while the district 3t the 901h percentile has a studentteacher ratio or 21."Ores and the ~Iudenltcache r r:llio is ::._ . there arc other determinants of test scores that keep the observat ions from fa lling perfectly aiong a sl raight line.'~' based on these data..l.0 variables. to use the ord inary least squares (OLS ) estunotor.3 (that is. The test score is the districtwide average of reading and math scores for fifth graders. if one cou ld somehow draw a srraight line through these data. The measure u~t!d here is one of Ihe broadest. The data we analyze here consist of test scores find class siz~ in 1999 in 420 California school districts thai ser\~ kindergarten through eighth grade. only 10% of districts have SiU· dent..mkr using a sam ple of datil . the districlwide student teacher rauo.9 19.1ope {3( 't..23.ould be an estimate of {3CIIU><.7 679...3tios belo w 17.teacher r. While this method is easy. I 654. SO is it possible to lea rn about the population <.2. __ '. Despi te th is low correlation. Class size ca n be measured in various ways. 1 Summary of the Distribution of Studen.• .9 students per teacher.Ita ta arc described describel n Appendix ". k h Un 10 ' t hu l sample.6 640. ..4.Ii .that is..1 popu]'Hion. which is Ihc no . e~c d<l ..9 90% 19.U· ~llc ~ r o .1 11.ou cou ld. the district divided by Ihe number of II is.ul£ 4 . The . "i1UJ<:nt~ leather ratio Standard Deviation '0% "" 18. Although larger da~~cs in this sample tend 10 have lower test scores. . One way to draw the line .. it is \ery ullscien ti fic and different people will creale different e!)tima lcd lines.5 ""'.6 ~ l udcnls per teacher and the standard deviation is 1.. • ..ould be to take out a pencil and a ruler and 10 ·'eyeball"' the besl line ).3 197 ZIl.9.. then the sl o~ o f this line \l. How. then.Teacher R atios and f jftflGrode Test Scores for 420 K8 Districts in Califomio in 1998 Av" rQg. • .4 MI}. The samph! correlation is 0.] Percentile (median) ""' 60% "" 10.3).2 E~rimoting the Coefficients 01 the linear Regression Mode! 117 T. indicating a weak nega live relationship betwee n the 1\1. A scatterplol of these 420 ob~ rval ion s on test M. The IOIIa percentile of the distribu tion of the stud enlteacher ratio is 17.hown in Figure 4.l MilA 21.6 6652 1. JIJ. "hou ld you choose among the many possible lines" By far th e most common way is 10 choose the line thai produces th e ··Ieast squares" fi l to Ih c~c data.
ll1e sum of these squarcd prediction mistakes ovcr all n observa tions i~ .'. ..6) The sum of the squared mistakes for the linear regression mode l in expres sion (4.o (California School District Data) Dolo from .:"' . _ . the sample average..l. ..6). . ..he least squares estima to r of the popu lation mean.6» ). ... " '.\.X.2).• 118 CHAPTIR 4 Linear Regreuion with One Regressor fiGURE 4 . .I •••'. 'i. £(Y). bo in expression (4..2).) . predicted using this line is bo + b l X . . " " " The Ordinary Least Squares Estimator T he OLS est imator chooses the regression coefficients so that the estimated regression line is as close as possi ble to the observed daiS.·:t"' .~. . In fact. Y minimizes the 10lal squa red estimalion • mistakes L~_ I( Y.. ". ' '. . . The OLS estimator extends this idea to the linear regression model... .. then III does not e nte r expressio n (4....I !..~I~ . that minimizes Ihe expression (3. . so is there a unique pair of estimators o f 130 and f3 1 that minimize expression (4. Th us.h. to '.mJ2 among all possible eSlima tors m (see expressio n (3_ 2)J .. Just as there is:1 unique estimato r. . T..' 10 ____~~L~~____~~~~~~~______~~ 15 2() 25 30 Stud e. so the value of Y.. 660 640 (...:. .2).. .ttl .. . ."I :'..' .. Y. . As discussed in Section 3.. Y. 1').... .e mean in expression (3..:f.23. " .. . .. '...b L o b. (4. 1.st SCO ff no There i~ 0 week negative relotion~ip 700 ""') between the student teocher rotio one! leU scores: The sample correlo~oo ./) lX i. • Of''' ".j'.. is !.~ •.. is 0. '. Let ho and hI be some estimators of I3u and f3 1• The regression line based on these csll mators is ho + b l X.' l"'.. •• .: .20 &O.... .2 ScatterpIot of Test Score v.1 " (Y... the mistake made in predicting the j1 11 observation is Yi .'.. ."$.(bo + bI X .. ...••Q: .. . Studen....6) is the extension of the sum of the squared mistak es for the problem of estimating th.Teochef' Rat.. .6) and the two problems nre identical except for the different notalion (m in expression (3. whe re close ness is mel· sUTed by the sum of the squared miswkes made in predicting Y given X.'':..420 Coli· fernio ~hOOI diitrim. that is. if there is no regressor..ntt each er ratio . • ...) == Y/ ...t.... .
80 is denoted 130. dUl l strea mlin e the calculation 01" the OLS estimato rs. . and error term (uJ s estima 'ilimation (3.5 IS mea The esti mated intercept ({3o). You could compute the OLS estimators f30 and .2 (4. slope (13 1) .7 ) 1'1 f'i _ i~l " (X. . then b1 :ept fo r the IS there is :t unique pair • .2 A _ 'lXi·ThuS..4. L (Xi .. II. Y.2)].limatcd . These form ulas arc im pl emented in virtually all sta tistical and spreadsheet programs. =1 PO= Y .8) The aLS predicted vulues Vi and residuals iI.10) 30 ra tio j'i = . and Yioi = 1.. i = 1. Fo rtuna te ly there a re form ulas.X.im iz ing ex pression (4. (4. OLS has liS own specia l notation a nd terminology.81 by tTying diffe rent \'alues oC bu and b 1 repeatedly until you find those tha t minim ize the total squared mis wkes in expression (4.PIX' . These f()fmulas arc dcrived in Appendix 4...6) are ca lle d the ordinary least squares (OLS) estima tors of . are y.. .6) : they are the le ast squares estimates. AND RESIDUALS KEY CONe"" iI The O LS estima tors of the slope p.hu 'alions is (4. slope (~I)' and residual (iii) are computed from a sample of " o bserva tions of X. The OLS estimator of . .9) (4. II. and the intercept P il are 4. and tbe OLS estima to r o f . "' II (4 . \. YJ . . = . el.2 Estimating the Coefficienb 01 the lineor Regre~sion Model 119 ~ THE OlS ESTIMATOR.8 1.v.6) In cxpres )roblem of m. The OLS formulas and terminology are collected in Key Concept 4.X )! Y) _ SXY . L .80 and .81 is denoted P I' ~c OLA S regress ion tine is the straigh t line constructed using the OLS estimators: f3Jl + 8. . f30 + {3\Xi• Yj i == I. The 'ili bet ween Y/ a nd its predicted resid ual f' a1ue: Iii "" Y. Le i /}o hese esti = The estimat ors of the intercept and slope thai minimize the sum of squa red mistakes in expression (4.}J..t X)(Y. .. These are estimates of the unknown true population intercept ({3o).. however. PREDICTED VALUES. derived by min.6) usi ng calculus.2. This method would be quite tedious.
TIle slope of 2 . observations is TeslScore = 698.: ... !he e1limoted regres sion predim IMI las! ~e~ will increcne by 2...• 120 CHAPUR 4 linear R egression with One RegresS()( OLS Estimates of the Relationship Between Test Scores and the Student. figure 4.ion line for these '20 II.. studenH cachcr ratio.2.~ Mil '.\.\ISCOrt' in E4u31ion (4. 111e symbol .. ' 5111 ~ '!.... "'" . . the OLS regrc. ".3 The Estimated Regression line for the California Data Tell . Accordingly.. ~ (4. " (.'It '... by 2.. .tht: FIGURE 4. ' 'S ' he ' mere 5"':!(....28 • ..·. j J "1 ~ •• .11) where Tes/Score is the ave rage {eM score in Ihe district lind STR is tht: s tudent.Teacher Ratio When OLS is used \0 estimate a lin.....9 .2. . ..18 and tht: estimated inter~pt is 698. .7) indicalc ~ Ihallhis is the predicted value based on the O LS regression line.lhc estimated slope is 2. '.. '_~ '" • "" . ' .Ieacher ralio. r ' ... . for a dbt rict with 20 student:.' 2"" ~( J Slud"ntleacher nuio ..flo (>(. per teacher... . .. .... 28 points o n the test... ~I~_~_ _ _ ~~_ _ 10 15 ~~:'_.1 ' II ' ..." .. .:. on average..'C' •• 1": ·.. ~" over T._ .9.3 plots this OLS regress ion line superi mposed over Ihe scatterplolo[ the data previo usly ~ how n in Figure 4.. • '.~.teacher ratio 10 le~1 scores using the 420 obse rvations in Figure 4. ~ :: 6989.._ _ _ ~=~_~_~.H':' _ '.teacher ratio by ont' ~tudcn t pcrclass is.2..1. For example.p belween te~t KOfe~ and the student teacher ratio.. .. ' " 'I D" . 'kdecrcass ip .: relu ting the student." • ' •• . associated with a decline in district wide l~lIt score:... 28)1 "0 $ :egaliiC sls.228 X STH. 1 = 2 X ( .. . .~.. .. tf doss !>izes fall by 1 stvdent..{I ....'C .".?~.!. fl:?l) . 21\ " ~ (... pe r cl ass is 99 uyepve 2§SOCIiIIC" with a n in£[CilW in I "S I scows pf 4... . .(X•.be >'II"cR'dCa r h gr prje AI' 2 st lldent..... ".': ".....28 means that an increase in the st udent..56 poi nt.~cor(' The emmoted regres sion line show$ a 720 71ltl negotive relatian~...28 points. .PIS p~r tcacher flarger classes) is associated with poorer pe rformance o n the test IS now possible 10 prediclthc di'Slriclwide lest score gl\~a lu e of tht..
at leasl.3.2 E stimating !he Coefficients of the Linear Regrew.hat would place her district close to the 10% with thc smallest classes would movc her test scores from the 50tll to the 60 Ih percentile. ThUs.teacher ratio by a largc amounl (2 students per teacher) would help and might be worth doing depe nding on her budgetary situ· ation. based on their studc ntteacher ratio.5.28 X 20 = 653.5. 'Ihis regression was estimated using Ihe data ill Figure 4.1. Why Use the OLS Estimator? There are both practical and theoretical reasons to use lhe OLS estimalQrs f30 and ~]' Because OLS is the dominant melhod used in practice. Wh at if the superintendent were contemplatin g a far more radica l change. wou ld move her studenHeacher ratio from the 50th percentile to very ncar the 10111 percentile. the median studentIeacher rat io is 19. finance (see Ihe box). Suppose ht!r d istrict is at the median of the California districts.1 .7 and the median test score is 654. 654. This is a big change.9 .::ording to Equation (4. cutting the s tud en t U~acher ratio by 2 is pre dicted to increase test scores by approximately 4. from 19.1 .culling the student. they are predicted to increase to 659.11). this improvement would move her dist rict from the median to just shan of the 6(Jlh percentile. Is this improve mentlarge or small? According to Table 4.11) would not be very usefu l to her..~s perform . the estimates in Equation (4. Of course.. These data con tain no infor mation on how districts with ext remely small clas. Is this estimate o f the slope large or small'! To answer this.2 .6 points: if her district's test scores are at the median. so these data alonc lire nOI a re liable basis for prcdict ing Ihe effect of a radical mO\'e to such an ext reme ly low student. absent those other factors. and as the figure shows. From Table 4. Recall that she is contem plating hiring enough teach· ers to reduce the studentteache r ratio by 2. the smallest studentteacher ratio in these dala is 14.4 . we return to the superintendent's problem. According to these esti mat es.2. and the social sciences more generally. such as reducing the stude ntteacher ratio from 20 students per leacher to 5? Unfortunately.teacher ra tio.7. A reduction of2 Students per class.on Model 121 predicted tcst score is 698.7 to 17. it has become the com Illon language for regression analysis throughout economics. How would it affect test scores? Ac. Presenting results using OLS (or its vari ants discussed tater in this boo k) mC'lrlS that you ure "spea king the same language" w •• • . but it would not be a panacea. BUlthe regression line docs give a prediction (the OLS prediction) of wllat test scores would be for that district. a decrease in class size t. and sht! would need to hire man)' new teachers. this prediction will not be exactly right because of the other factors that determ ine a district's pcr formance.
M uch or Ihlll be mea~ured by il~ OLS re!!ressioll of the actual CXCC~) return on the stock again~t the actual excess ret urn on a broad markcl index. H.Ire an.l5 2. imcstmclII.R.S. studied in Section 3. .ur. (4.. ri~k ier than the market pMt U linancial incenti\'c to tak e (l ril.xces.R / ). a Mod buu~ht On January I (or 5100.\1anagcmcnt (\\iI~te di~ptJ. In 0. or cx~('ted The "hcla" of :l qock has l:'Iecome a workhorse of tho: iD n:.al) Spri nt Nextd (tdecommUntC3lllms) nam6 and '\ohle (hook retaikr) \lim~fl (software) Bcsi Bur (electronic I. thc CAPM says tbat R R.Th('" f<.\umplc. This me~l1I~ Ihat thc right way to (mce U.M folio and /3 c.... for hundred5 of :.R.. rolio and thu s eomaods a higher C\rec1cd e:\fX:>:.~ r('tu m on a portfolio of iJlI a. Under tbe 3$SUmplions introducted in Section .e! pncing model l('APM ) formal· lZes Ihh idea. Sm. can he reduced by holdin). Iht.)mtionul 10 Ihe (the ·'nlllrke t portfolio~ ). r• In practice...· ~ion of R .. H. btimoted IJ O. ~tocks L o\\" ri~k ~ix .:! mar ket pori Kellogg (breakfasl cereal) \Val.122 CH APTfR 4 lineorRegrenionwitno...l'hort·tcnn U.. like owmng stock in a company.~ as other economi!ots and statisticians.. The OLS formulas are built into \'irtual1\" .fAthe OLS _ __ .lhe c:I:pcct. estimators also have d esirable theoretical properties.. (3(1< "..\f.f the population mean.50]/$JU(l 7.ailahlc as~ts Thai i!\.llkr) Ama?Qn (online retailer) . on R". .' a relOm (If H [(S105 $1(0) + $2. making OLS easy to usc.01 Y a!o un est imator l.. R .:d return l on a ri~ky inve~lmcnt. and you C Dn oblBin ei>li· mated fJ·~ risk·free. These .12) whcre R".:0nlra5t.. 1.lurn on (In in~C!I\m~\l( is the chRnge in itl! pnct any payout Idil'ldend) frum !he ill" estlllent ItS" pcl ccnillge of ill> initial priet:o For c. 0.."peclCd Compony return on an asset is pro]. . . . mu.... The QU..ll· ogous 10 the de~ira blc properties...u.!!t portfolio. ken 10 be the ra te of interest on d~'bl....! .locb on in\'cslment Those /3's I~pically fiml Web ~iti:r.:! 1.. CAP. ~·url· logg have Stock:!. arc c~limated hy At fi r$t it might ~hould ~ecm like the ri~k of a stock I'ariance.es. the risk·rree return I~ often t. consumer producl!I (irm~ like Kd mCil~u re th¢ risk of a siock i~ not by i1:. Thus thc exc. government According to Ihi. 3 stock "'illl a fJ < I hil~ bs risk Ihlln Ihe ma rket ponfolio lind therefore has a lower expected excess rcturn than the mar!.W! di\l\knd dUl'ln~ the year and ~lU on Ikccmt>cr 31 [or$J05 . ho wcver. in\'e$tmcnl. Solid di ffcre nll~ . ~hC)ul d be po<.5% p!u. expected i. According to the l' xce!:>~ e".Ill spread !ohcc t a nd s ta tistical software packag. is the: ex~C'tcd rdum on Ih.s.:.O~ The capital ai>.! other stocks in a "portrolio" in olher word~ by diversify· illg you r fin ancial hllidi ngs. The table below gi\'cs estimate:d p's for stocks.lhc Iff>.70 U. CAP~l.. . which then pmd 11 $2...quipmcnl ret.. ret urn.Mart (discount retailer) Waste.eRegres~ A fundamenlalldca uf modern fina nce is that an inveslor nc:c:d~ a stock with iI {J > I 1<..17 2.\..itin:. m I....RI' on a risk).7K I.::>s retu rn.O.ould ha\(. wilh [Oil' {h: riskier technology hal e high but rathe r or its 1·(11 /lria/ICt' ""jlh the market.lment industry.' t"1)l!frlCient in thc population rcgre:.k.! c x~cd the rcturn un a S<1fe.65 U..
thai is expbined by X.timator is also efficient among a ~crtain class of unbiased estimators.\. you might wonder h<...llhemalically.5 . from their avcrage. "" ~ + iiI' (4. ):. Y.) " .. ESS = ~ (Y. I (4.1.· 1 (4.3 Measures of Fit H aving estimated a !ineur regression.3).. as Ih e sum of the pre dkled value. typically is from it5 predicted . and the total s lim o(squllres (TSS) is the sum of squared dc\iulions of 1'. this efficiency result hold::. the R2 is the ral io o f Ihe sample \arianee of Y.lw well that regression line describes the data. from its 8\'Crage: In Ihis notalion.. .15) Equation (4 . 3 Meonlr~ of Fit 123 C:.1l ) 10 the sample vari ance of Y. 'alue.Jained sum or ~qullres (ES~l is thc sum of squared deviations of the predicted values of 1'. M. 4. plus the residual i : u Y. and furtherdi~ussion of thi~ result is deferr~d until Section 5.timator is unbiased and consbtcnLThe O LS e::. " . TheR2 The regression R2 is the (raclion of Ihe sample va ria nce of Y.4.::gres sion measures how fa r Y. 14) uses 11ll: (.. . explained by (or pn.:diclcd by) X . The slaodl"lrd error of the r. or arc they spread out? 11lt. Ihe R2 can he written as the ratio or the e xplained sum of squares to the total sum of square. however. TSS = ~(l~ .ures Ihe frac tion of the variance of Y.ell the OLS regression line filS the dala.2) ~l1ow us to write the dependellt variable Y. The definitions of Ihe predicled value find the residual (~ce Key Concept 4.Y)2 . Does the regressor account for much or for little of the varhuioll in the dependent variable? Are the obscnation.el thaI the sllmplc average O LS predicted valuc equals Y (provcn ill Appendix 4.lhe Rl r anges between 0 and I and mea..~ The eXI. R1 and the standard e rror of the regression measure how ""..j')'. tightly c1u~tered around the regression linc. .. under sollle IIddilional special conditions.
The Standard Error ofthe Regression The sta nda rd e rror of the regression (SER ) is an es timator of the stan da rd deviation of the regression error iii .. = Y ." sq ua red OLS residuals: SSR = " 2.. . thus the R2 is ze ro. the n the SER meas ures the m agnit ude o f a typi cal devia tio n from the regressio n line. nOI e xpill ined by X i. Ihe magnitude of a typical regressio n e rro r. Ii.)ua re of the co rre lati o n coeffi cient between Y a nd X The R2 ranges be twe e n 0 and 1.SS R. The units of 1(. then Y.. O1ed sured in the units of the dependen t variable. th e R2 of Ule regression of Yon the single regressor X is !.1 8) Fina lly. fo r a ll i a nd every residual is ze ro (that is.16) A lte rn'Hively. i~l (4. t l i = 0).3 th at TSS = ESS . In this case.in dolla rs. The s um of squa red residuals.he S<.. In genera l. If ~ l = 0.ile <I n R2 nel:lr 0 indic a t~ " th at the regresso r is not very good at predictin g Yi . if th e units of the depen de nt va ri able are dollars.that is. if XI expla ins a ll of the variatio n of Y. so Iha t ESS = TSS a nd R2 = 1.17) It is sho wn in A ppendix 4. the expla ined sum o [ sq uares is ze ro <lnd the sum of sq ua red resid uals e q uals the to ta l sum of sq ua res.124 CHAPTER 4 linear Regreision with One Regressor The R1 is tht: ralio of lhe explained sum of sq uares to the total sum of squares R2 = E SS T SS' (4. so the SER is a measure of the spread of the o bservations aro und th e regr ession lin e. the R2 can be wrin e n in te rms o f the fraction of the variance o f Y. and Y. the n Xi expl ains no ne of tbe va ria tio n of Y. In coolTas!. For example. is the sum of Iht. or SSR. a nd the pred icted va lue o f Yi based o n the regressio n is jusl the sample average o f Y.. Thus the RZ a lso can be a tbe tOlal sum o t expressed as 1 minus the ratio of the sum of squared residua ls L squares: R2 =t _ SSR TSS' (4. A n Rl near 1 indicales that the regressor is good at predicting Y" wh. tbe R2 does nut take 00 the extre me values of 0 or 1 but fa lls somewhe re in be tween. are the same.
imply Ihilt this .l.Y in Equation (3. it. SSR = So. The R2 of this regression is 0.or hy 11 2 is negligible. This large spread means that predictions of test scores made using only the sludent. relating the standardized test score (TesIScort!) 10 the student.. Figu re 4. the Stll dent. 11 1" = n _ 2' n i. by n . but much varia lion remains unaccounted for. estimated usi ng the California test score data. where sJ = =2 . si Application to the Test Score Data Equation (4.1 9) is similar to the fomm la for the sample standard deviation of Y given in Equation (3..19) whe re the form ula fo r uses the fact ( proven in Appendix 4.1 % of the va riance ot" th e dependent va riable TwScore.11) reports the regression line.2 here (instead of II) is the same as the reason C or usi ng the divisor II . fil e difference between dividing by n..7) is " . As the sc3 uerplot shows.so the divisor in this factor is 11 . The formula for the SER in Equation (4.1. and the S£R is 18... The R2 of 0.I U (4.1 %.7) is replaced by it" and th e dh'isor in Eq uation (3..6 means that standard deviation o f the regression residuals is 18.un arc unobserved.the OLS resid uals u\.6 means that there is a large spread of the scalterplot in Figure 4. o r 5.4.teachcr ratio fo r that district will often be wrong by a large am ount. two "degrees of freedom " of the data were lost .2. the SER of 18. except that Y i .3 superimposes this regression line on the scalterplot of the Tes/Senre and STH data . Tbe form ula for the S£R is SER 1 ~. whe re the units are points on the standardized lest.I in Equation (3. because two coef· ficients were estimated Iflo and (31). (The mathematics behind lhis is discussed in Section 5.(. Because the standard deviation is a measure of spread. The reason for using the divisor IJ . whereas here it is n .7) in Section 3.6.7): It corrects for a slight downwa rd bias introduced beca use two regression coefficie nts were estimated..3) Ihat the sample average o f the OLS residuals is zero. .051.thc ScR is computed using their sample counterparts.2.teacher ralio e xplains some of the variation in test scores.051 means Ihal thl: regressor STR explains 5.. This is called a "degrees of fn:ed om" correction.teacher ratio (STR).. What shou ld we make of this low R2 and large SER? Th e fact that the R Zof this regress ion is low (and Ihe S£R is large) does nOI. 3 MeowrM 01 Fit 125 Because the regression errors U I••.3 around the regression line as measured in points on the lest. The S£R of 18. . by itself.2.6.6.) When n is large.
> 0) and so nw times to worse perfe rmam:e (II.S willand wi ll notgiv~ useful estimates of the regression cocfficknts. given X.Hlu understand ing these assllm pt iom is essential for understa nd ing whe n O I. bUI on avera c over the popu lation the. someli ll1~') these olhe r ractors lead to hetter perfonmnce th. or luck on the leSI. has a mean of zero.4.: r ratio.ero: stated mathcmutically.:1 to differ from thc prediction hased on the population regres. given a .20 and." Wha t Ihe low R2 does te ll us is Ihal oth er important factors innue"ce lest scores. have natur~11 illtcrprctMions.. Assumption #1 : The Conditional Distribution ofuj GivenX j Has a Mean of Zero The firs t least squares assumption is that the condit ional di~tribution of u. ! con 1110na on " . £(uI IX.o. The populat ion rcgrc~si on is the relationsh ip tha t holds on a verage bet ween class size and test scores in the populat ion. howe ver. . in the sense that.) . f30 and (3 1. . l lley do... at othe r va lues x 01 : j c . and asserts that the~' otner fuc tors are umt' laled 10 X. As shu\\n in Fi gure 4. = x) = 0 or.im plcr notation.)rncwhat .4. the distribution 0 iI.The low R2 and high SER do not tell us what these factors arc. say 20 stude nts per class.4 The Least Squares Assumptions This sect ion presen ts a sel of three assumptions on the linear regression modd and Ihe sampling scheme under wbich OLS provides an appropri. re prese nts the other factors that lead test sco res at a given diM no. 1111.I X. <II I ere nlly.Icl\ch. < 0).. "nlCSC fa ctors could include difrcrcnces in the student body neros)' districts. a t a give n value of class size. ' < In' .pre dictionisri~h t . .e iven = 20 the mean 0 tel!) ~ 7SW ' 9 Flgms ] .lit o f 7.• "6 CHAPTU 4 linear Regres~ion with Ooe Regressor regression is eithe r "good " o r ·" bad. and thL' e rror teml II.4 Ib is is shown as the distrihution of II . but Ihey do indicate that Ihe studcnHeac her ratio alone exp l ili ll~ only a small pari of Ihe variation in leSI scores in thc!oC da ta .Ini tia ll y these assumptions might appear abstract. differences in !>Chuo! quality unrclalt'd 10 the Slu dcnl.lIc estimator (I r the un known regression coeffi cients. This assumption is a form al mathem:ltical sta tement abllul the "other factors " contained in II. 4.lI1 predicted (II. centerc on l' .. in SI. lll ol h er words.'alue o f X" the mean of the distributiun 0 f these uther factors is 7cro. more gene rally.ion line. E(II .11 is illustmted in Figure 4. population regression line at X . be in!!.
." The figure shows the conditional probability of test scores b. has a condilionol mean 01 ~o for all values of X . The mean of rhe conditionol di~lribvtion 01te1l scores.4(1 6~1 f.1 the popul ation regrcssioll lin e is Ihl.nt to assuming. 20.di~lrich wirh cla~~ $iws of 15.Y .4 The Conditional Probability Distributions and the Population Regression Une T~st score 7:!1I 7~1 Dbufbution 01 Y wIll'l1 X '" 15 (!l'>tJ / (Ylh IS) DlSIribution of Y whtn X • 20 N >I I / E(Y!X" 20) ((f!X". III Fi '" . . f {YjXj. As shown in Figurc 4. • . AI 0 givoo yolue of X.4 . The conditional mean ofu in a randomized controlled experiment.' conditional mean nf Y.lgment . StuJC'ntlea c her nliQ . given X. n IX.4.IX. Random assignmen t makes X find II indepe ndent. Ih. WheLher this assumption holds in Ii givcn cmpirical appli cation with observational data requires earefullhought and ju<. th e assumplion that E(III IX. Y is distributed around the regr~sion line and the error..and we return to this issue rc pea tedly. a nd 25 slvdC11h. e nsuring that X is dislri huled independently of all pcrona l characteris tics of the <. is the population regrenion line flo . g llfOfl the Uudenlteochet rolio.fW.. the best thai c.6) . suhjccls are randomly assigned to the treAt mcnt grour (X = 1) o r LO the control group (X = 0).PIX (.. in thc precise sense that E(II.4 The least Squares Assumptions 127 fiGURE 4.1n be hoped for is that X is fll if randomly assigned . In obsen'ationa l da ta.ubjccl. Instcad. which in tum implies L hal the conditional mcan o f II givcn X is zero.{flo + (jIXI. (a mathe matical proof of thi s is left as Exercise 4.) 0 i~ o. X is not randomly assigned in a n experimen t. The random assignment typo iC<. v . 25) Olstrlbu\Jon 01 Ywl!enX = 25 / fj(J .) = O. Tn a rdndomized controlled experiment.:qui va1e.Illy is done using a computer program thit' uses no information about the subject.
that is. It is the refore often convenient to discuss the conditional mean assumption in terms of possible corre lation bc{weell X.d. . If the obser vations are drawn hysimple random sampling from a singJe large popula tion .• 128 CHAPTER 4 linear Regression with One Regre»af Correlation and conditional mean . . Thus.) ::=.• n Are Independently and Identically Distributed The second least squares assumption is that (X"Y. i "" 1. then il m us t be the case that £(u. O.i. a re uneorre latcd. . and U r If Xi and Il i are correl ated . If she picks Ih e techniques (Ihe level of X) to be used on the J1h plot a nd applies the same kchniqu e to Ihe ph plot in all re petitions of the expe riment. IX j) = 0 implies tbat X i and u. the conditio nal mea n assumption £(u. However.. Thus X . or corr(X.n are independe nt lv a nd ide ntically distributed ( i. Recall from Secti on 2. assumption is a reasonable one for many data collection schem es. That randomly drawn person will have a certain age lind ea rnings (that is. If ther are drawn nt random they are also distributed independently t"rom one observa tion to the next. a nd II. 1Xi ) is nonzero.). then the IWO ran dom variables have zero covariance and thus are uncorrclated [Equation (2.) sample of II workers is drawn from this population. necessarily have the same distrihution. then (XbY. For example.5 (Key Concept 2. then the value of X.i.. this implica tion does not go the other way: eve n if X. so the sa mpling sche me is nO I Li.i . . is nonrandom (although the outcome Y j is ran dom). /I aTe i.d.). then (X"Y.. if Xi and 1/.27)].5) .observa tio ns on (X" Yj). 3fe uncorreialed.. does not change (rom o ne sample to the next.d. howe ver. O ne example is when the values of X are not drawn from a random sample of the pop' ulation but rather are set by a researcher as part o f an experiment. As discussed in Sectio n 2. 3 tha t if the condit ional mean of one ra ndom va riable given a nother is zero. and imagine drawing a person a t random fro m the population of workers. suppose a borticuhuralist wants to study the effects of different organic weeding methods (X) on tomato produc tion (Y) a nd accordingly grows d iffe rent plots of tomatoes using different organic weeding techniques.. Not all sampling schemes produce i. i = I•.i . 1f . X and Y will take on some vaJues).) across observations. Assumption #2: (Xi' Y. .i . are correl a ted . The i. the n the conditional mean assumption is violated. For example. let X be the age of a worker and Y be his or her earnings.d.n..i. i = 1.d. For example.. Because correlat ion is a measure of linear assa cia Lion .d.d. the conditional mean of u i given Xi might be nonzero.J i := 1.. s urvey dala from a randoml y chosen s ubsel of the population Iyri caUy can be treated as i. .). 'Ib e results prese nted in this chnph:: r . this is a statement about how the sample is drawn. II . they are i..
i.4. a key Slep in lhe proof in Appendix 3. tbe level of X is random and (Xi' Yj ) are i.. ypiJ ill. then the law of large numbe rs in Key Conce pt 2.5 using hypot het ica l data.d. • Y" are i. assumption. modc rn experimen tal pro tocols would have the hort icu lt uralist assign the level of X to the differe nt plots using a computerized random number gencrator. Another example of nOIli. One source of large outliers is dalRen try errors.d. observations with values n/" X i andlor y/ rar out side the usual range of the dataare unlikely.9) Slales tha llhe sample va riance s? is a consistent estimator o f the population variance oi. In this book. regressors 3re a lso true if the regressors are nonrando m.i.74. ••. TIlis is an exa mple o f time se ries data. 1 (Yi . A no ther way to sta le this assump tio n is Ihal X and Y have finite kurt osis. Equation (3. they might be recorded fo ur limes a year (qu an e rty) for 30 years. The assumption of fini te kurtosis is used in the mathematics that justify the largesample approx imations to the dist ributions o f t he OLS test s ta tistics.ling \ S (If ~ o l!S l C j1h 'ro lll s? .d. y)2.d .IJ. sampling is when observa tions refer 10 thesame unit of observation over lime. howevcr. If Y1. The ca. For examplc. ~ "'i.>that is. Specific<l lly.4 The Leost Sqvares Assumptions 129 developed for i.'<perimental protocol is used .e o f a no nrnndom regressor is. This potential sensi tivity of O LS to extreme o ulliers is ill ustrated in Figure 4. where these data a re collected over tjme from a specific firm.p Iplc.! ". . For example.i. We encounler ~ d this assum ption in Cha pter 3 whe n d iscussing the coMistency of the sample variance. and a key fe ature o f time series data is tha t observations fa lli ng close to each o lher in time are oot independent but rather tend to be correlated with each uther: if interest rates are luw now. Assumption #3 : Large Outliers Are Unlikely The third least squares assumption is that large outlier. such as a lypographicaJ error o r incor rect ly using differel1l uni LS (or different observations: Imagine coUC<:ting ncS.i.6 a pplies to t he average.d . Ti me series data in trod uce a set of complications that arc best hand led aft·e r devel oping the basic lools of regres~ i o n analysis. When this mode rn e. This pattern of correlation viol a t e~ the ·' in dependence·· part of the i. is finite.3 showing that is consis tent. the assumption that large outliers arc unl ikely is made mathe ma tically precise by assuming (hat X and Y have nonzero fi nite fourth mo menLS: 0 < E(Xn < (XI an d 0 < E( y n < co. Large outliers can make OLS regression results misleading.i. thcre by circum venling a ny possible bias by the horticuh urll iist (she might usc her favo rittl weed ing method for the tomatoes in the sunniest plo t)..(s~ ~ u~). we might ha"e d ata on inventory lev e ls (Y) at a firm an d the inte rest ra te at which the firm can bo rrow (X) . thcy arc likely to be low next quarter. for exa mple. and the fourth mo me nt o f Y. q uite specia l.
)1~ () ... 2.)\) hll~ C'xduJmg oot!J ~r I I _~J )0 (. The OLS regres ~ line estimated with me oudier ~ a stro..130 CHAPTER 4 linear Regression with One Regressor FIGURE 4 . Onc way to fi nd outl iers i::. BecamL' c\a~s size and lest scores ha ve a fin ite ran g~ .. they ncce~sari ly have finite kurto<. O lS 5~) n:j.The least squarc~ assumptions play t Will roles. commonly used distributions s u(.i ~ is a plausible on" in man)' applications with economic data. the assumption of finite k unrn.:J by a fewobse rv:lIions. If this a ~sumr lion h old ~ then it is unlikely tha i statistical in fe rences using O LS will be tlominatr. then you C<ln either correct th~' error or.CI.'19 POSltrve reIotioruhip 17uO 141_1 • between X and Y. Use of the Least Squares Assumptions The Ihrec Ica.3. ~ \ lgOU(hn . to 1'101 your tlata. Data entry e rrors aside.'h as the noml<l l di stribu ti on h<l\'<. und "L' return 10 thcm repc. as a ffi <l the matinll matter.i:.umm:l rized in Key ('nllcepI4..t squa res assumptions fo r the linear rcgrcs')ion model nrc .ucdl y throu ghout this textbook. but inadvertently rccortling one stutlent\ height in centimeters instcatl.re"H.: four moments.5 The Sensitivity of OLS to large O\Itlien y 2(1)1) Th is hypothetical dolo ~ has ooe oudie. some dist ributio ns h. Class size is ca pped by the physical capaci ty of a classroom: the hcst you can do o n a st andardiLed test is to get all the qucstions right and the worst you can do is 10 get all the q uestions wrong. drop the obse rvalio n fro m your data !..• I 40 OLS n::. .!:.rC'i~ion lin. More generally. ir that is impossible.er shows no relationship. If you decide tiwi an out li er is due 10 a tlata entry error.. nnd Ih is assumption rules ou t those distributions.(l 7(1 X data on the heigh t of sludents in meters. Still . but the OlS regression line estima ted without 1 HNl MOO the ou~..l ve infin llc fourt h mome ll t~.
2. this largesample normal distribution JetS us deve lop methods for hypothesis testing and constr ucting conCidcnct: intervals using th e OLS estimat ors. 111 turn.. c<ln be sensiti ve to large outliers.. One reason why the fi rst leasl squares as. 'caust' rtosi!>.:. c n f~ ryou ~t the ~ ono.YJ i = 1.d.you should examine those out liers carefully to make s ure those observations arecu[ rectly recorded and belong in the data set I I and . If your data set comains large outliers.el.:?.1 ha\'c ilfinite isump linatt:d Their fi rst role is mathematical: If these assumptions hold. n are inde pendent and identicaHy distributed (i.5 Sampling Distribution ofthe OLS Estimators Because th e OLS es timators ~() and ffi 1 are computed fr om a randoml y drawn sample.'c 4. ' .. As we wit! see. (Xi.1\.: ~' s ica l lll tn!. where 4. as is shown in the next section..ons with time !!oeries data. It is also impo rtant to consider whether the second assumption holds in an application.5 Sampling Distribution 01 the OlS Estimators 131 v THE LEAST SQUARES ASSUMPTIONS rAi\H"1I4'1!(?~ .: . .) draws {rom their joint distrib ution: and 3. K£Y CONCEPT : I Yi = f30 + f3\X. and additional reasons are discussed in Section 9.i. the estimalors therru... the inde pendence assumption is inappropriate for lime series data. .'/"::. The third assum ption se n 'es as a reminder Ih<ll O LS. 11.. urge o utliers are unlikely: Xi and Yj have nonzero finite fourth moments. The t'fror term IIi has conditiOnill mean zero given Xi: E(u j!X 1) = 0: 2. Although it plausibly holds in many crosssectional data sets.":':.es are random variables with a probability distri bution. 1l1eir second role is to organize the circun1Sl ances that pose difficultic~ for O LS regress ion ... just like the sa mple mean. the firsll east squares assumption is t he most important to conside r in practice. + II i.:>ed in C hapter 6. then.4.3 1.i = 1. in large samples the OLS esti malors have sampling distribu tions thaL are normal. the regres· sion methods devc loped under assumptio n 2 req ui re modificat ion for some applic. : .:>ump tion miglll no t ho ld in practice is discus. .Ihe sampling distributionthat describes the values they could take over >umm a • . Therefore.
). A llha ugh Ihe sampling distrihution of Y ca n be compli ca led whe n thi.6 about the sampling distribution of t he sample average. In particu la r. P I) ill II. lf n is la rge. y. Decause Ihe OLS estim ators are calculated using a random sam ple. This implies Ihal the marginal distributions of and are nornlJ l in large samples. In small samples. an e~t l ' malOr of the unknown populati on mean of Y. by the central limit theorem Ihe ~ampJinf dist ribution of and ~1 is well approximated by the bivariate normal dislTibutiOl1 (Section 2. it is possible to ma ke certain sta te me nlS about il t hat hold for all II.) : ~.> lions 2. the mean of the sampling distributions of ~() and are J30 anJ 131.! sa mple size is small. iJo Po iJo PI ..7 If the sam ple is suf(iciently large.Po (ind ~I are unbi(ised estimators of f30 a nd P I' The proof that PL is ullhitlseJ is given in Append ix 4. a nd ~t a re random va riables tha t take on diffe rent values from o ne sampk to the nex t. (4. Y is a random variable that takes o n differen t values from o ne sa mple to the next the probability of these different values is summarizt!d ill its sampling dislribUlion. the mea n of the sampling distributio n is 111' tha t is.J). Review ofthe sampling distribution of Y . bUI in large sampk.A. the n more can be said about the sa mpling distribution. the cenl ra llimit t heorem (Section 2.5 and 2. tions.4.6) states that this dist ribution is approximately normal. Because Y is calculated using a ra ndoml y drawn sample. £(Y) . these distributions arc complicated.• 132 CHAPTER 4 lineor Regression with One Regressor differen t possi ble random samples. Th is section prese nts these sampling distribu .so Y is a n unbiased estimalor o f J). The sampling distribution of and These ideas carry over to the OLS estima tors a nd ~l o f th e unknown intercept AI and slope /31of the population regression line. under the least squares assumptions in Key Concept 4. they are approximately normal because o f the cenlrallimit theorem. ~y. the probability of these di(fe re nt values is summarized in their sam pling distributions.21)) Lhat is.. y. ill E(P u) : ~o and E(P. 1n othe r words. it is possible to make certain state ments a bout ill hat hold for all In particular.3 and the proof that is unbiased is \e ft <IS Exercise 4. In pa rticu la r. Y. The Sampling Distribution of the OLS Estimators Recall the discussion in Ser.3 . Po Po PI Although the sampling distribution of and can be complica ted whcn lh~ sample size is small.
d ~~. as reliable unless there arc good reac. The largesample normal 4.] l • is (4 .) £( X l) X I_ (4.enot a simple ave ra ge. it is normally distributed in large sam ples.7) for Pl' you will sec thai iL too.6 we suggested that n = 100 is sufficiently large for the sampl ing distribution of Y 10 be well approximated by a normal dist ribut ion. In virtually all modern econometric applications" > 100.L.)F . and of th e eSli mators decrease to lCro liS 11 increases (1/ appears in the denominator of the formulas fo r the variance!!).1) a jointly normal sam pling distribution.:re H . f30 and /31' when 1/ is large. where _ _ . (rt ~. the centm llimit theore m co ncerns Ihi: distribUlion of averages (like }i). This criterion carries ovcr to the more complicated 3\'t!rages a ppearing.ults in Key Concept ~.ons 10 thin k otherwise. lik e the simpkr average Y . If you e xamine the numer aTor in Equtll ion (4. in regression analysis. U summarizes the de riva tion of these fo rmulas.5 Samping DistributiQ"l of the 0lS E~timotors 133 lARGE·SAMPLE __ and PI JHlVt. where the variance of this distribution. (Appendix .4. Ihe cenlral limit theorem applies to th is a\'erage so thai ."lmple size is large. . then in large !>l'lrnp1cs j. . (Y. The rc:. malo r~ will be lightly concentrated around their mcnns.21) n [var(X. The largesample normal distribution of iJo is N(Ar (TJ).22) This argume nt invoke!!. il) a I)'PC of averag. Technically. ll . (Tk .4 distribution of ~ I is N(fJ" ("tll.1 var(llilll ) "iJ. := .Y)(X.! DISTRIBUTIONS OF ~O AND ~I KEY CONC£PT I lfthe lcust squares assumptions in Key Concepl4. .50 \\C willlrea t the normal approxi mations to the distributions of the OLS estimator. In Section 2. bu t a n average of th~ product.) A rc [e" ant queslion in practice is how large n must be [or these "pproximat io ns to be reliable. The normal npproximation to the di~lrib ut i on o r the OLS estima tors in large samples is summarized in Key ConccpI 4.4 ...4 = 1 vur\(X. like )7.3. when the s.~ (.X ).1 . a nd sometimes smaller /I suffices. II I E(H ~}j2' \\hr. SO Ihe distribution of the OLS est..E. .tL.4 imply that the ()LS estimators arc consistent that is. ~o a nd ~l will be close to the true popula· tion coeffi ci ent~ /30 and f31 with high probabilit y. lhe ccnlrallimit theorem.3 hold. As dis c ussed furth e r in A ppendix 4. This is bt!cause the variances ~ .
witn a lorge variance... .' .. The no rm al approximation to the sampling distribution of ~f) and b l is a pow erIul tool...21) so the smaller is To get a bett er se nse o f wh y thi" is so.·" I··" •• . the larger {he variance of X. which have a larger vari ance t!:l an th e colored dots. which presents a scatlerpio\ of 150 arli1icial data poi nts on X and y' The data points indicated by the colored dots are th e 75 obse r yation ~ closest to X..6 The Variance of p..... 202 t :t 1% • • • . ~ ... · . cise is b" .. the variance erJ .... : ..b for making infe rences a bout the true population values o f the regression coeffi cientS using only a sample of dat a.. in ge neral. ."'. Suppose you were asked to draw a ljn~ as acc urately as possible through either the colored or the black dots. The 0 black dots represent 0 set of X.... th is arises because the \'ariance of PI in Equation (4. ::'1 :" ...• 134 CHAPTER 4 lineor Regres~ion with One Regres!>Or fiGURE 4 . . ·.4 is lhal. . the more pre PI' u J.: the larger is var(X i ) . •• • · . . . • • ••:: • ... .. The regren ion line can 204 be estimated more occurotely with the black dots thon with the colored dob... look a1 fi gure 4.. o ( Mathemat ically. .' ·.6.. ' • • ' • • • • 194 LI_ _ _"_ _ '_ _ _'_ _'_ __ '_ _. ....' ... Similarly. (he targer is the de nomi n<l tor in Equalion (4.. and the Variance of X y 206 The coiOfed don repre sent 0 set of X:s wjth small vorionce.... . we arc able to develop methoc. With this approxima tion in hand .r. .J 97 98 99 100 101 102 103 X Another implication of the di~lributions in Key Conce pt 4.. the large r the variance of X" the smaller .2 1) is in versely propo rt ional to the squ are of (he variance of X . ·.which would you choose? It would be easier to draw a precise line through the black dots.. ..
X. and confidence imervals is taken in lhe next chapter. This assumption implies lhat the OLS estimator is unbiased.. then Ihe sampling dislribmion o f tJle OLS estimator is normal.1 summarizes Ihe le nninology of the population linear regres sion model. The second assumption is that (X.that is. Slated more fo rmally.given the regressor X.) are i.of It observa tions o n 3. There lirc ma ny ways to draw a straight line th ro ugh a scauerplot . . The rtsulls in this chapter d e~cribe th e sampling distribution of the OLS esti m<ltor. if n is large. Y.. The third assumptio n is that large o utliers ure unlikely.Y. are consistent. pre sented in Key Concept 4. ~ is rri pre: ~"ic nl S ~ : Pi Ie Id Summary l. These impor1l'lol properties of the sa mpli ng distribut ion of the OLS estima tor hold under the three least sq uares assumptions. The slo pe. 130. By themselves. The popula tion regression line. X and Y have finite fou rth moments (fi nite kurtosis).Sumroory 1 .d. This stepmoving from the sampling distri bution of to its standard error. for the vari ance of the samplin g distribu tion of the OLS est imator.. f30 + PIX. dependent variable. but doing so using OLS has scveral virtues. If the least squares assumptions hold. for . and have a sampling distribution wit h a variance (hat is inversely proponionallo the sample size 11 _ Moreover. The reason for 'h is assump tiOn is {h at OLS can be unreli able if the re are large ou tliers.!st a hypothe sis anoul tbe valu e of {31 or to construct a confidence inlerval [or {31' Doing so requires an estimator of the standard deviation of the sampling dist ribution. This assumption yields th e formula . th ese results are not sufficien t to tl. is the mean o[ Yas a fu nction of the va lue of X.6 Conclusion Thischapter has (ocused on tbe use of o rdinary least sq uares to estimate the inter ce pt and slope of a popula tio n regression Ii n ~ using a sampk.3 S 4. The inte rcept. {3 1' is the ex p~c t ed change in Y associated with a Iu nit change in X. hypot hesis tests. and a single regressor. however. The first ass umption is that the error te rm in the linear regression model has a conditional me:m of zero.4. determines the level (or hcight) of the regression line. as is the case if the data are collected b} ' simple ra ndom sampling.i. thcn the OLS estim fllOf'3 of the slope and inter cept are unbiased. the standaru error of the OLS estimator. Key Conce pt 4.
The OLS estimators of the l.: and between the O LS predicted value Y. arc 10 the estimatcd n::gression\ine. '1l1C R2 hi between 0 and 1. P I .. . ·n lere are three key as:. regression intercept and slope ure denoted hy ami P 3. . and then provide <In example In which the a. i = l...i. If these assumpt ions hold .ee n the residual Ii.' unlikely. The standard error of the regre<:<:ion is an eSlimator of the standard deviation of the regression crror.. .). II.ion \I..d.3 Sketch a hypothetical "Cauerplot of data [or an estimated reg. 136 CHAPTER 4 linear Regression with One Regressor 2.sion (S £ R) are measur~ of how close thl. and ".. (1) unbiased: (2) consistent :and (3) normtl ily distribut ed when the S<L Po Key Terms linear regression mooel wilh a si ngle regressor ( 114) depe nden t variable (!l4) independent variahle (114) regressor ( 114) populatio n regression lint! (114) popu lation rt!gression fu nction ( 11 4) populat ion in tercept and slope (1 14) population coefficients (1 14 ) pnni metcfS (114) error term (I 14) o rdi na ry least sq uarc:s (Ol S) esti mato r ( \ \9) OLS regression line ( 119) predicted va lue ( 11 9) resid ual (1 19) regrcssion H2{I 2.9. random draws from the popu lation: ~nd (3) . X.5. \\ ith a larger vulue indicat ing Ihllt the Yj's are closer to the linc. have a mean of lero condit io nal on the regressors X..lX) 4. n by ordinary le.. The popu lation rl.: (2) the sample observations arc i.2 For each least Stluares assumption.( }/. 4..I arg~~ outlier') an. and the regression error If.'gression linc can be est imated uliing s(lmpie observat ions ( Y..umplions fo r the linear regression model: (1) The regres· sion errors. values of Y.sion "Jth Rl _ 0. jlh R! 0. the OLS estimators (30 and (3 1 arc mpie is large. provide an example in which the :lSSUI1lI " lion LS hllid.rc<.sumption faii~ 4. 111e R' and qandard error of the regre<. Sketch a hypothetical scaacrplOi oC data for a rcgres:.Lst sq uares (OlS).1 Explain the difference he tween and 131: bet .1) explained sum of sq uares (ESS) (123) lOla I sum o f squares (TSS) (123) sum of sq ua red resid ua ls (SSR) (1 24) standard erro r o f the regression (SER) (124) least squares assumpt ions (126) Review the Concepts 4.
ize across the 100 classrooms is 2 1. . What is the regrcs. R2:= O. A man has a late growth spurt lind grows i .82 x CS. Rl.2. Wha t is the regression's pred iction fo r the increase in this ma n's we ight? c. sizc (CS) and a verage le st scores from loo thirdgrade cl asses. R2 "" O ..<.SER = 11. ) 4.) \\'ilh with 4.4.. What is the sam ple a verage of the test scor~\o across the 1(X) classroom!.6 x A gC'". What i:.M. and this year il ha:.) rl .5 inches over the course uf a year. The sam ple a verage class !.io n·s prl!dictio n fo r tht! change in the classroom a ve rage test score? c.). o n clus:. S£R = 10.81. estimates the OLS regress ion . Wlla! i!.? (llint: Review the fo rm ulas fo r the OLS estimators. lh ~ rcgn:ssion's predict ion for thlll classroom's a verage tesl score? b..5.Exerci$eS 137 Exercises 4. selected from a popula tion a nd tha tthest! me n's height a nd we ight are recorded. and S £R. for the R2 and S ER.41 + 3. . inche~ where Weighl is measured in pounds and llldg'" is measured in )I. thc. A cla:.99.. B.2 Sup pose tha t a ra ndom sample of 200 twentyyea rold men i!.rooms? (Hint: Revie w the fo rmula:.7 + 9. Last yea r a classroom had 19 s tude nts.3 A regres.1. A rcgres· :. Suppose that instead of measuring weight a nd height in pounds a nd inches.023.ion of ave rage weekly ea rnings (A IV£.e varia ble are measured in centimeters and kilograms.5.4 . estimated coefficie nts. the regression's weight prediction for someone who is 70 inc hes tall? 65 inches tall ? 74 inches tall? b.SER = 624. Ill l} ' regression? (G ive all resuhs. What a re the regres!. Wha t is the sam ple sta nda rd de via tion of test scores across the 100 clas.ion of weight o n height yields Weigh! ~ = .ion estima1es from this ne w centime t cT~ L:ilogra m the '. usin g dill .c.94 x Height.1 Suppose lha l a researcher.sroom has 22 students. measured in dollars) o n age (measured in years) using a random sa mple of collegeed uca1ed fuli tim e workt!TS aged 2565 yields the following: AWl:' = 6%. 23 stude nts. Rl = 0. TestScore ~ = 520.
. What is the regression's predicted earnings for a 25·yearold worker? A 45yearold worker? e. . Ex pla in what the eaefficieol values 696.) 4. Suppose that the value o f f3 is less than 1 for a particula r slock.7 and ~.023..)? (Hilll: Don '\ forget the regression error.• 138 CH APTE R 4 linear R egression with Ooe Regre~sor o. = Po . Will the regression give reliable pred iction s for a 99yea ro ld worked W hy or why not? f. b.. let X. but some stude nts have 90 minutes 10 complete the eXdTll whi le others have 120 minutes. What are the Unil'l of measurement for the SER (do llars? years? or is S£R unitfree)? c.R. and consider the rcgrcssion model Y. What is the a verage value of A WE in the sample? (Him: Rev iew Key Conccpt 4.4 Read the box "111C ' Beta' o f a Stock" in Section 4.2.2.1. :5 loo). Each student is ra ndo mly assigned one of th.3%. The shmdard error o flh e regression (SER) is 624. Is it possible Ihal variance of (R . Let Y/ denote the number of points scored o n the exam by the jlh stutlent (0 :5 Y.. 6 mean. Tn a given year.R. the rate of re turn on 3monl h Treasury bills is 3. do you think il is plausihle Ihm the distribu\j on o f errors in the regression is nor· mal? (Hint: D o you think thai the distribution is symme llic or skewed? What is the smallest va lue of earnings.: examinat io n times based o n the nip of a coin.). He gives each of the 400 sfude nts in his course the same fioa l exam. .5% and the rate of return on a large di versified portfo lio o f stocks ( the S&P 500) is 7.6 years.) c. The average age in th. " nd is il consistent with a normal tlislIibu lioo'!) g. a. Show that the va riance o f ( R . use the estimated value of /3 to estim ate the slock's expected rate of retum. + II /< .90 or 110). denote the amount o f lime thai the student has to complete the exam (X. What are the units of measuremen t for the Rl (dollars? yea rs? o r is R'l unitfree)? d. G i ve n what you k now abom the distribution o f ea rnings. fJ IX.Rrl for this Slock is greater than the variance of (R". TIlt regression Rl is 0. b.Rj ) for thjs stock is greate r than the va riance o f (Rm . For each company listed in the table al ine end of the box .5 A professor decides to run an experiment to measure the effect of li me pres ' sure o n fi nal exam scores. 4.is sa mple is 4 1. Suppose thll lthe va lue o f f3 is greale r than 1 for a part icuklf siock.
Explain why t:{u.IX. Compute the estimated gain in sco re fo r a s tuden t who is given an additional 10 minutes o n the exa m..21). where (X . = 0 for [his regression model.9 u.. Whi':. ) =: 2.6 Show that the firs t leasl squ a res assumption .!X.4? Wha t about ~l'?) 4. in Ke y Conce pt 4.3.::d. Wh y will different students have b. E(II . ii. p. II . this imply that #. Doc!. is N(O.O .IX.20. Suppmc you know that f3t.d. 4).4 continue 1 0 hold ? Which changc'! Why? (Is (31 normally distributed in la rge sa mples wilh mea n a nd variance given in Key Co ncepl 4.) 4. I).) £(Y.) ~ ~o ~ 0& O. and X .3 a re sal is· fi.core o f studcn [s gi\'e n 90 minutes to complete Ihe exam: ' 20 min utes.X. b.1 Show tha t P o is a n unbiased esti ma tor o f f3o. . i.) are Li. s~l i s fied ? A re the other assumplions in K ey Concept 4.. De rive c!'timlllor of fj .:::: Iln + f3\X. When X = \.) l·._ 4.? tI . 139 represents. 4lJ + 0.: ~Iirnal()r of {31 " It formu la fo r Ihe \cast squares Ihe least squares :::." ion assumpt ions in K ey Conccpt 4. (ll int: U~e the fact that ~I is unbiaseJ . Derive a n e xpression for the largesample variance o f [Him: E valuate the Icrmsin Equalion (4. . Show thaI Ihe regression satisfi. d.24 X . im plies that P. 4. + " /. Suppose you kno\\ Ihal J30 = O . + II " a.1f. A linear regression yields PI = O .10 Suppose Ihat Y. b. is a flernou ll i random va riable wit h Pr( X =: 1) . Compute the estimated regression 's prediction for the average :.::y Concept 4. D erive a fo rmula for .3 are b. 4..11.3 =: Explain.0 .f3\X.. Show tha t R2 . A linear regression yie lds R2 = O .. W llich is shown in A ppendix 4.Exercises a.:d exce ptlhal lhe fi rst assumptio n is rcplaced wilh E(II.. = O? 4. 11 Co nsider Iheregression model Y.. a ssu l11 p ti o n~ n.l X. = f30 '. and 150 minu tes. l h ! estim ated regression is Y.8 Suppose that all of the rcgrcs.J 4. whe n X = 0.h pariS o f K . i~ N (O. Explain what the term d ifferen l values of II.
l'I"htK dutll " ere provided by Professor Daniel Hamcrmesh of the Um..ity of'texas at AtlSun lind ".369376.ree. Ru n a regressio n of average hourly earn ings (A H £) on age (Age). "Beauty in tbe Cl1IS)room: t llsiructor~' Pulehntude lind l'utJ' live Peda'O£. and professor characteristics for 463 courses at the Univt:f· sit y of Texas al A U51i n. That is. fuilye<lr workers.Economics of £dllCfJllOll Rt:v. Alexis is a 30yea rold worker. (Gene rally. Does age account for a large fraction of the variance in earnings across individuals? Explain .ilb Amy !'arke r. rho b. show Ihal Rl::. leadi ng to highe r productivity and earnings. Show thai the r~gr ession Rl in the regression of Yon X is the squared va lue of the sa mple corre la tion between X a nd y.1 for 2004.~.) n. 1 A detailed description is given in Tcachi ngRat· ings_Dcscription . Predict Bob's earnings using (he esti. One of tbe characteristiCS is an index of Ihe professor's " beauty" (I S rated by a panel of six judges. Predict Alexis's earnings using the estimated regression .12 a..com!stock_watso n).ica1 Producli~ily.o::re u~. .you will find a dahl fi le Teachin gRatin gs tha t contili ns data on course eva lu ation:>. In this exercise yoo will investigate bow course eva luations :lre re lated to the professor's beaut )'. A detailed description is given in CPS04_Dt:sl:ript ion .. Wha t is the esti ma ted intercept'! What is the estimated slope? Use the estimated regressio n 10 answe r this question: .2 On th e text Web site (www. £ 4. 24(4): Pf'.AJB.S. Empirical Exercises E4. age 2534. Bob is a 26yearold worker. Show that the R2 from the regression of Yon X is the same as the R~ from the regression of X on y .1 On the text Web site (www. (These are the same data as in CPS92_04 but an: limited to the year 2004 ) T n thili exercise you will investigate the re l ~lIion sh ir between a worker's ag~ and ea rnings.. with 11 high school diploma or B. August 2U05.140 CHAPua" linear Regre~on with One Regressor 4.aw· bc.. c. o lder worke rs ha\'e mort: job experience.you will fi nd a data file CPS 04 that contains an extended version of the dat a set used in Ta ble 3.c. also ava ilable on tbe Web sil e. as their higheSTdeg.H ow much do ea rnings increase as workers age by one yea r? b. mated regression. It contains data for full·time .aw·bc. also avail able on the Web site. course characteri stics.comlsfock_wafson).n hiS paper ".
e'· and "sl1'I<.. In this exercise you will usc these da ta (0 investiga te the re lation:. ~va l uatio ns E4. Run a reg. Run a regression o f years of compieled education (ED) on diSlance to t h~ nearest college (D ist). here students go to high school? !J1. data [rom a random sample of high school seniors interviewed in 1900 and re·interviewed in 1986. where Disl is mC<lsured in te ns of miks.III. Commen t on the size of the regression 's slope. 2 D. also avai. Predict Profe ssor Stock's and Professor Watson's course evaluations. ( Hint: What is the sample mean of Bcalll)'? ) c_ Professor Watson bas an average val ue of Beuuty.~ data we re pro. pp217. so Il)at slUdents who Ihe closer to a four. D ocs there appea r to be a relation ship between the va riables? b.ation or DI"er(ion1 The EUecl of Community Colleges 011 Educational AIt ain· men!:· Journal of BI.hi p be tween Ihe number of completed years of ed uca tio(l for young adult.. (Prox imil Y( 0 college lowers the cost o f educa tion.) A detailed descriptio n is give n in College Distance_Descriplion . (For example. .·ided hy Professor Cecilia Rou. while Professor Stock's va lue of Beauty is one standard deviatio n above the ave rage.) What is the esti mated inte rcept? What is the estimated slope? Use the esti mated regression 10 answer this q uestion: How does the average value o f years of comple ted schooling change when colleges are built close 10 . Is the estimated effeci of Beauty o n Course_Eva/ large or smu ll? Explain what you mean by " Ia rg." e. d. Construct 14 1 a scauerplot of ave rage course evalua tions (Course_Evnt) on the profe ssor's bcaUly (Beou/y).ression o r average course evaluations (Course_ElIlIl) o n the professor's bea uty (Beottty). Apnl 1995.><! of Princeton Un lVe n.lable o n the Web site..224.3 O n Ihe text Web site (www. complete more yea rs of highe r educatio n.you wi ll find a dala fil e Coliegef)iSlonce Ihat cont ain::.awbc. o n average. . Does BC(tllIY explain a large fraelion of the va riance in across courses? Explain .comlstock_walsoo) . Disl .s nnd the distance from each stutlcnt's Iligh scbool to the nea rest fo uryear college..year college should . 12(2). What is the estima ted inte rcept ? What is the estimated slope? Explain wby Ihe esti mated interce pt is equal to the sample Oleall of Course_Eval.surtsS (lltil Et·o"umic Slil/l~·'IQ.2 means that tJl(~ distance is 20 miles.Empericol Exercises 3.ity and wcre used In her paper ~ DcmOCfa!iz.
ny 3ntl wef(' Ol<J j()I!nW/ . d.comfstoek_wwtson).. Does the re a ppear [0 be a re la tion· ship bet ween the variables? b. Hlso lIv"H"ble on the We b site.. .e Gr..yo u will fi nd a da ta rile Growth that contains data un Jve rage growth rates over 19601995 for ti5 coun tries. 2\00. Does Malta look like an outlier'.. Find Malta 0 0 the sc'ltte rplot. Ilow would the prediction change if Bob lived 10 miles from the nearest college? 1..~. lit! Thor~h:n Bcd.. along wi th va riables that are put e nt ia lly re la ted to growth A uelailed description is given in Growth_D escription .wer the same questions in (c).. In hi~ p.4 On the text Web !>ile (www.aw.C()/IOIIUO.0. Using all obser vations.hc. What is the estimated slope'! Wh. years. n Uni~cr. Constr uct a S(·atterplot of a verage a nnua l growth ra te (Growlh) on Iht: 'lverage trade share (TrtttieShare)."lf'< ' of Finan.a. ] n. Ma lia.:. "'~r~ pro\"idcd by Profc~1IOr R ~ U\ln( 01 Bro ..142 CHAPTER 4 Linear Regression with One Regressor h.' c.o: WI. or some thing else)? E4.. ami ~orman Loa)7..5 "nd wit h a trade sha re e qual to 1.300. Bob's high sl:hool was 20 miles frOTllthc nearest college.th. An::..~ l:. dollars. Predict Bob'!. has a trade s hare much larger than the other countries. E::.'C:'o l.ship between growth and trade.timate the same regression excluding the: dat a from Malta . " I 2h1. run a regression of Growth on TmdeShare.. yea rs of completed education using the estimated regression. cent ~. Does distance to college expltlin a large (met Ion of the variance in educational allainment aeros.. Where is MaILa? Why is the Ma lta IraJl' share so large ? Should Ma lt. UFin'ln~.UrI.. In lhis exercise you will in ves tiga te the re lati on.! be included or excluded rrom the unaly~ i~? ~I"h~'s.. One country.." indi\'iduals? Explai n. e.lI is the esti mated intercept? Use th~ regression 10 predict the growth rate for a country with trade share 01 0. W hat is the value of the standa rd error of the regression? What are the units for the standard error (meters. grams. ml the s. d.
2.:iI)ufcd liS "fulltime equ ivalcnl) "). .\ssroom. school dWl"nctcrblic.b . firs t lake the parti..( dcrivath e~ with respect to blJ and b l : iJIo . and cxpendi\ un.lhe vn lues of /Junod b.ca..dl\idcd by the num her of full time cqui \'alcm teachers. The dat a used here :tn.:n in Key Concept 4. l " bll  bIX. enroll me nt. To minimit:c the sum of squared prediction mlSlaKes I:~ I (Y' .23) p  l I ~ l btX.)! IEquation (46)1. and the percentage of studenls who are Eng lish learne r" (that is. equIValently. giv. O. School characlcriSlic. (4.~ bll ._ 2. number of computers per cl.gov).s (:tvc r aged :lcross the district) include. (Y.' (rom fl ll 420 Kfl3nd KlS dbtricts in Califo rnia wilh dutl! OV:lilllbtc for l\J9g and 1999. .~.rcent age o f sludenls who qualify for a reduced price lunch.b() b.\udenl. The OLS CSlHnator.:d l b ling !lnd Rl'pl1rting da ta )ct contains d<ll:\ on lesl JX rfor mance. Ihe p.e L: .l. a slAndanJi'lcd leSI administered 1 0 fiflhgrade SlUdc nts.1 Derivation of the OLS Estimators I biS appendix u~es calculus to derive tho: formulas fo r thc OLS cstimnton. I f1~ . and studen t dcmogruph ic backgrounds.. I( Y.'. APPENDIX _ _ _4_.hal minim.) and (4. 11 .blX. .·d from the Califorma Do:pan ment of Education (w\W.'mographic vurinb1cs {or the students also are 1I\'cTilged across the di"lricL'Ille demographic variables ind ude the percentage of students who) are III Ihe public lI~ista nct program CaIWorh (formerly A FDC).L " (V.hlch the ucri\luivcs in EquatIons (4... All of these data were obtain.Oemot. f '" 2 "L (Yi .)X. for "..oo of the as EsJimoton 143 APPENDIX 4 ..TIl": S\U denlleacher ralio used here b the number of ~tudenl S in the district.cde.24) .ba .:) per f..I.1 The California Test Score Data Set 'fhe Cillifornin Stnndardizo.L " . and ~" are the values of bl• and b l t.b" " Ix . nu mber of t e ach e r~ (mo. Test scores arc Ihe It\'cmgc of the read ing and mmh scores on the Stanford 9 Achievement lest.Xi nr. (Y.23) fi... SllIdCnlS for whom Engl ish is a second language). .xy = 2 2.
.+:~: the formula~.28) are the fo rmulas (o r . fJ. i(X.Po and PI.  u)j (4.::!lI) Equa l i on~ (4. (4.htX "" 0 and 1 " • • I " LX.27) is i( x.1 "/1 Solving this pairor equations for (4. PI given In Key Concept . (4 2~) Y .esequallo zero.) iJo and P I yields i{X.v.X)[~. . bas the normal sa mpli ng." A _ II !_I ! II .27) and (4. I_I .e nllmerator of the form ula for AJ + f3 1X/ + u j• Y.y) . LX. in large !>Bmpl¢.lIl<.2(. = P! in terms of the regressors and ~'rrors.ii. must satisfy the two equation.29) " "" PI L I I (X j  " :\. . . . we show Ihallhe O LS estima tor PI is unbiased and. 1 X)(ll._I LX.XI' " (·1. . .. .0 " /. 144 CHAPTER 4 linear Regreuion with One Reg(~~ and (4. ..(X. .24) equnl zero.Y. collecting lerm\. APPENDIX ___4 _" _ 311 Sampling Distribution of the OLS Estimator In this ap~ndlx.Accordingly.X) + II.iJo . BcC.4.X)(¥.1. and dividing by" shows thill thc OLS esumalors. distnblllion given in Ke y Conce pt 4.o and lion (4. .(X I' L{X.~..X ._I ! i x. . . Representation of P I in Terms of the Regressors and Errors We sta rt by providing an expression fo r Y.(X.)1 + :L.Ii) .P.X Y 13. .Y) " '""..2 7) b yn . . $0 the ~I in Equation (4.XI i_I + (u.x.sxyh} is obtained by dividing tht numerator :md denomm tllor in Equa.X)( Y.27) iJ" Y ._ _ _ __ .  Y = /31( X. set ling these dcrivuti.
4) .. of iterated expecta tions qjJ.31) fo ilo \\'5 by U ~tng the la w of itcm ted expecta tions (Section 2....27) yidds 1 " L(X . .).X)F. XJ =.~1 l'1condilion' all y unhill'!ed .L~ I (X.' )11. . The cxpcct.31 ) _ f31' n ' :L(X.. I (1. Ihal is. .X)II ..X)II.X'. .. ] " .X)(u. It fol1o.29) yields l ~ I( X i L. ~ thalthc (undilional eXJ'Cc\i! ' large hraekets .. II ... ..il'__ Pi .  f3 11 X I" . By the fir~t lenst squarc~ assumplion...PI.X l' i~I where Ihe ~econd equality In Equ at ion (4.30).X H Y. Equi\. Is Unbiased ln E(P1)=P j "' /:.trit'outed mdepclldc nlly of X for all Obierl'.. .X. .X.L: :(X.. .(u. which implies that ~ 7. I " " . . given X I' .30) ProofThat Thus.3 ).. PI IS unbiased..)' ) ::: PI }. . . E(lIiI X.O.X l' 1/ ..Pi . X" .X f . I.X) ! + cxpre~sion in turn into the formula for P : in Equa tion (4.UO). .PdXI•.L ( X..  X)II. . X~ ) ~ E(II . .I X.h:..XlU . .. ~o that [(P I .I (X.\:.  >. By the la.(XI. .t. .7 _t (X.._I ' ~ ] (4.X )(11/ X)II..X )/1 L. I~ (. P. . II) = 145 2:7.uion 01P I ~ obtained by taking the elpect3tion of bolh "id~ of b.1  .:L (X.. Subslituttng this II ) . . ) . herl' the finn l equnluy follow'S from lhe definitiOn of X .n the 'Second line of Equation (4.Ptl = FIF.'. .onccpt4. .alenlly..O .. howcvcr.) = O. Xn)) =' O.. + E [ . so L::(lItI X I" lion In .kl  I ~~ 1X. so that E(PI) = Pi: that IS.] ~. (Key C. .E(~tl X.s E~timo lor No w ~.~ "".) II . I(X.{J . . B~ the second lea~ t squares assum pt ion. LargeSample Normal Distribution of the OLS Estimator The largc slLmrlc normal approximalion to Ihe lim1llngdi~trihution olP.31) is zero.) obt3lncd h} l'un~ldc ring the bcha\'ior of Ihe finallenn in Equation (.11 X III = O. Substituting 2:7 d I(X... tnto Ihe Gmll e x pression in Equation (4. .. j(X.Sampling Distribution of the a.' I(X.(.ltions other tha n i. [ f(X.jurttion (1. I(X.
the samp le covariance s~\ betwee n the OLS residuals and the regressors is ze ro.ILx)u J /[ n[var(X. whe re (Tt. Thus. Nc). and (4. By the first least square5 assumption.il~·. Combining these two results.3)) (. . the term in the numerator of Eq uation (4. in large s<lmph.".1 4). is i. II satisfies all the requiremen ts of the cenlrallimil theorem (Key Conc. i. (4. ze r(l~ the sample average of the O LS predict ed values cq u. if the sample sue close a pproximation. in large 'l amples.d.32) through (4.ux)lI. TSS. 7). we have that. Vi has a mea n o f ze ro... is. ii lvar(X.. 1). where v.! consider the express!on in the denominalO r in Equal io n (4.). 17)1. so in large samples it is arbitra rily close to the population vari· flnce of X . tT. (Xi .I " :2>.popul..35) say t ha t t he sample average of the OLS residuals i<.X :' 0.32) nL Y.30) is The sample ave rage V.. Thus the d istrib ution of v is well approximated by the N(O.:pl 2. X is nea rl y equal 10 I'x' Thus.i. whe re u~ = u~/ n. )n) distribu tion. By the second leas t squares ass umption. "" var( v)/ [va r( X.)J"I .2 [Equa tion (3.15).~. Th ~· fero re. =O' . l]. and SSR arc defined in Equatio ns.) so IhM lite sampling distribution of lion (4. ..30). ilnd the ro tal sum of squares is In. = var[(X. .1.JLX)If. is u . '" 0 and Ir.. to il Bcca u~ X is consistent.x.l which. which is inconseque ntia l ir n is large).I (4. d istribu ted N(O. 13..ltio n variance. is large. ( 4. N(t3I ' q~. (4.J· Some Additional Algebraic Facts About OLS The O LS re~ idua l s and predicted values satisfy : 1 " nL u..2 1) . =Y.8)].) TSS = SSR + ESS . = var(Xi . which is the e xpression in Eq u. ". . In large sa mples. =1 I " .3. by the th ird least squares assumption.! is Y . the sample variance is a consiste nt estimator of th e. (4.: sum of the sum of squared residuals and the expla ined sum of squa res [the ESS.• 146 CHAPHR 4 linear Regre~5 ion with One Regre~SOf Fll'S\ consider the numerator of this tenn. vl u " is.~ nonzero and (in.3~) Equations (. Ihis is the sample v" ri am'c of X(exn:'pt di vidingb}' n [alher Ihan n .".". As discussed in Section 3. ~ I  f3 1. The variance of v.md i.
 y. + P I I:~ lil. r. L (X. = ( Y. '0 .  y }2 + 2i(y. combined wi th t!le preceding resul ts. ..Y) . + Y I y)2 = I_I i (y.27).so l L~.Y}(X. i" = O . SSR + ESS . "" Y. note tha t I:.. To verify Equation (4. ~ X)'  (06) o. .34)._ i I {(Y.X ): thus ft L u.X. = O.. . .Y) (4. 1_ 1 i~ ' Y)  p.= .(Po + PIX.}2 + . = X) ] (X.I . • .i r:.l. _1 . ) = Pur:.X) L(Y .I iu.y) 2 = "iCY.x. + .].Y i r.~IX. ~ y) .IY whe r(' lhc:: second equalil}' is a consequence of EquatIOn (4 . )(Y. . in Equa tio n (4.P. I.~ \ Y. where t he fina l equa lity in E<luru ion (4.. im plies that 5"X Equatio n (4.(X._ 1 • ~ X).IY / + L:_I.rify Equation (4. . fin a l equality fo llows from = 0 by the previous n:suhs.1i... ::: L:.32).Sampling Oi ~tribution of the 0lS Estimator 147 To \c. .  iJo .(X.I . . .. I• y.~J Y.33).L:~ Ili.Y.i.i.I i:(y.o r._I iCY.. To verify Eq uation (4. .j "" 0 im plics r..  But t he defini tion of Y a nd X imply that }:~_ I (Y' .yi .}~ ) :: 0 and S:. lli.35) fo llows from the previous results and some algebra: TSS = . .   X}  P '/L ( X.~ ii.II. nlis res ult.37) = SSR whe re the + £ss + 2±U..:. nOle t ha t the definition of AI lets us wri te the OLS residu als as Ii. ~ L (Y.36) is o btained using the formula fo r /3. I(Xi X) = O. no te t ha t Y.(X/ . :.:.X. .X ).32).
. .5 presen ts thl:! G '1U ~~ ·M arko\' theorem. which qates thaI. we show how knowlft dge of this sa mpling distribution can be used to mak~' Sl alCI1lt. in additi on. some ~\ron g. Section 5.. which measures the spread o f the s ampling tl istribution of 131 Sect ion 5.cusscs the d istribut ion of the OLS estimator when the population distribution o f the rcgrc"sion c r l'\)r~ i~ nonna!.6 dic. 15.. Section 5.: intercept). OLS i!I effiei!!n! (hilS thc smal1e~! variance) among a certain class of estimators.4. how iJI has a sampling distrihution. The sla Tting poin l b the slancJa ru error of lhe OLS esli ma lo r. then some stronger rc\ults can be derived regllrding the dbtribu tion 0f Ihe OLS estimator One of these strongc. a concept int rod uced in Section 5. Section 5.2 explains how to construct confidence intervals for f31' Section 5. Ih r. conditiOns is Ihm lhe crroI3 lire homo)kedastic. Chapler 4 cx plain.: Ihe sampling unce rlai nty.:d how the OLS estimntor {J! of the slope coefficie nt {3\ diffe rs from one sampl e to the nexl. [n th is chapter.. under certain condition.. then shuws how \0 use [J.1 provides an expression {or this s ta ndard ~rror (and for the Siandard e rror of the OLS estimator of 1ho.:c k asl squart:s assumptions of Ch apter 4 hold. Sections 5.:nts abou t f31 tha t llcc ura le ly Sllmmnri/o. and ih standard error to test hypot heses.Ihtll is.3 assume that tho.CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals T hiSchapler conlinucs (he trea tmen t o f linear r~g ress i on wilh a single regressor.. If.: r ~pedlll case of a binary conditions hold .3 takes up thl:! regressor.
'iJu of the popu lation regression line is 'Z e ro.6. the superint edent ask~. evidence in your sample of 420 obse rvations on Californ ia school dislricts thai th is slope is nOfl1. which has the general form gi ven in Key Concept 5. We sta n by discussing twosided tests of the slope f3 ] in de tail .\l:~ =. has no effect on Icst scores.J.ero? Ca n yuu reject Ihe taxpaye r's hypothesis thai {3Ch1u. The third step is to compute the p value.! tcst statistic actually obsc rvcd: equivalcntiy. She has an a ngry tax payer in her offi ce who asserts that cutting class sizt'owill not help boost test scores.'\'idenec? This section di'SCusscs tests of hypotheses aoou tt he slope {31o r inte rcept Po of the popu latio n regressio n line.o)/ SE(y). applied he re. assuming t hat the null hypot hesis is correct . at leas! ten tatively pending Cunhcr ne\\ I. Is there.ilalis tic actua lly obse n'ed. which is an estimator of the sta nda rd deviation of the sa mpling distribution of Y. (Y  I' r. so that n:ducing them further is a waste of mon ey.r:o. the n lOrn to one·sided tests a nd to tesls of h ypo the!)e ~ rcgarding the inte rcept f3 u. The gene ral approc lch to testi ng hypoth eses about these coefficie nts is Ihe S(lm c as to testing hypotheses about the population mean.1 Testing Hypotheses About One of the Regression Coefficients Your clie nt.i r. the (sta tistic is r Tes ting hypotheses about the population mean . th e slope f3 0 <l".. The tax payer's claim ca n be rephrased in the language of r~gre ss ion analysis. Heca use the effeci o n test scores of a unit c hange in cl ass size is Ihl(f~SiU" the tl'lx payer is asse rting th at the population regression line is fiatthaI is. by ran dom snmpling varia lion .2 that the null hypolhe!)is Ihat the mea n of Y is a specific v<l lue l.O.1. The tes t of the nu ll hypothesis Ho aga inst the twosided alternative proceeds as in the three ste ps summarized in Key Concept 3. The second !)Iep is 10 compute the Istatistic. The [i ~ 1 is to compute t he sta ndard error of }It S£(Y'). the supe rintendent calls you with a prohle m. which is Ihc smallest signilicancc level at which th e null hypothesis could be rejected. Class sizt'othe taxpayer cla ims. Recall fr om Section 3. so we begin with a brief revie w. or stl ould you accept ii .1 Testing Hypolheses About One 01 the Regres~ion Coefficients 149 5. based on Iht.L}' o. TwoSided Hypotheses Concerning 13. o can be written as flo: E( Y ) = p.5. the pvalue is the prObabi lity of obtCl in ing!\ slatistic. at Icast as diffe rent from the null hypothesis value as is the f. a nd the two·sided alte rnative is " l: £(Y) t:.
m ce level For example.'sted using the same general approach. u twos ided resl with a 5% significance level wOllld rejectlh c nulJ hypoth esis if I.Under the twosided a lt ern ative. tes t is 2¢( l r<Jell) . lion of 131' Specifica lly. Beca use the Istatistic has a standard normal distributi on in lurge samples under the null hypothesis.lope f31 can be It. the p va [ue fo r a twosided h y p o th e~ i.  SE(Pl ) = ./3 l. The null and alte rna tive hypotheses need to be slated precisely before thl!~ can be tested.0' Tha t is. H1 : {3. Pt... iD large samples. the sampling distribution of Y is approximate ly normal..statistic has Ihe form I = csLimator . the null bypothesi. the critical fCJ ture justifying tht..<1 (t wosided a lt e rna tive ). ..' foregoing test ing procedurc for the popula tion mean is that. Because PI also has a normal sampling distribution in large sa mples.. we follow th e sa me three sleps as for the popul a· lio n mea n..n A ppendix Table 1 A lternatively.o. the .s ta tistic act uall y computed and e l) is the cumul ative standa rd normal distributi on ta bula ted i.1 is the value of the .l ) ~ (Key Concept 3. "* /3 1. More gener ally.Al a theoreticallevc l. In Ihis case. under the nu!! hypothesis the true popuilnion slope OJ takes on some specific value. hypotheses about the true value of the s.0 "s.. The angry l<ixpayer"s hypmhesis is that O CllUSSj~t = O . (5. the third ste p can be re pl aced by simply co mpa ring the Istatistic to the critica l valu e apprO pri<:l le for the test wilh the desired signific.96. Testing hypotheses about the slope P t..J) ..1 In general. where lU/.hypothesized value sta ndard error of the estimator (5.5). {31 does nOI equal f31.~ and the twosided Illtemative hypothesis are Ho: {31 = f31.arr l > 1. Th e firs t step is to com pute the st andArd error of SE(~ l) ' The standard error of ~I is an estimator of lTiJ • th e standard devia tion of the S<1mpling diSlribu ..2 ) To test th e null hypothesis H o. (5..1 . the popul at ion mean is said to be sta tistically sig nifkall1ly diffe re nt than the hypothesized value at the 5% significance level.• 150 CH APTER S linear Regres~ion with One R egressor KEY CONCfPl GENERAL FORM OF THE ISTATISTIC 5.
the hypoth esis can be tested at the 5% significance leve l simply by comparing the val ue of the Istatislic to ± 1.0 (5. the probability of observing a value of at least as diffe rent from f31. 1 _ X )l/il 2 . .2.PUll! SE(~.. the null hypothesis is correct.1 Testing Hypotheses About One of the Regreuion Coefficients T51 where _ 1 _ "t. the critica l value for a twosided te"t. uoder Lhe null hypothesis the (statistic is approximlltely dist ributed as a StHn d<lrd no rmal random variable.6) where PrH" denotes the probability compuTed under the null hypothesis.96. Because ffi) is approximately normally distrihuted in large sam ples.(31. the sec ond eq uality fo ll ows by dividing by SE(Pl)' and I n(/ is th e value of the .5) The third step is to compule the pvalue . If so.) Pr ~ p. (5. in fact. say less than 5%. °lllese ste ps are summarized in Key Concept 5.a).. (5. the null hypothesis is rejected at the 5% significance level.11("11) = 2$( l lq("ll). 1 /I  [1i (x. 1. [IP 'fu'l > Ik' p.S.oas the estimate actually computed (jf. and rejecti ng the nuU hypothesis al Ihe 5% level if IrMfI > 1.~ SE(iJ. ) (1 ' 1> ". H. PI pvalue = PrHJ IP I  f31.ul > liff' .statistic actually computed. I '~'I)· (5.7) A small va lue of the pvalue. assum iog tbat the nuU hypothesis is correcl.) ' (31 ." I] = SE(~. so in large samples. ~ . The second step is to compute the Istatistic.96. Stil ted lnilUle matically. pvalue = Pr( IZ I > 1. _ 1 ± (Xj 2 /. _X)'] II .4 ) is disc u s~ d in Appe ndix 5. in applications the standard e rror is computed by regression software SO that it is easy to usc in practice. Alt hough the formula for oJl is compl icated.. x . Altern:Hively. provides e vide~cc against the null hypothesis in the sense that tbe chaoce of obtaining a value of /31 by pure ran dom variatio n from one sample to th e next is less th an 5% if. .4) Th e estimator of the variance in Eq uatio n (5.
' I ~\[ldard errors of these esti ma tes arc SE(~n) = 10.% . SER ::: 11:1.4) (0.2. estimates of the sampling uncertainty of the slope and the inlercepl (tbe standard e rrors). * 5. Reject the hypothesis 011 the 5% sig nificance level if the pvaluc is less than 0. To do so..28. The O LS regression of lhe test score against the studentleacher ra lio..~ • 152 CHAPTER 5 linear Regres1ioo with One Regre~!>Or KEY CONCEPT TESTING THE HYPOTHESIS {31 = {3.2 . Because of the imporlance of the standard errors. and its standard error from Equa ti on (5.S) Equation (5.3)].52) . equivalen tly.) {Equation (5. llle Istat istic is const ructed hy substituting the hypothesized value of fJl under the null hypothesis (zcro). ll).5)].1(11> 1.05 or.S) provides the estimated regression lin e.2.4 and S£(&l) = 0.. 0 AGAINST THE ALTERNATIVE {3. 1. Compute the standard error of ~I' SE(P .8) into the general rormula . Suppose you wish to test Ihe null hypothesis 1hllt the slope PI is zero in th. R( 10.8) at the 5% signi lica nce level.: population count erpart of Equat ion (5. Compute the pvaluc [Equation (S.= O..7)J. (5.  TIle standard ~rror and (typica lly) the tslatislic and pvalue testing f3! = 0 afe computed automatically by regression soflware. . and it will he used tbroughout the rest of th is boof..9 .ncl udtXI whe n reporling the estimated OLS coe((icicn ts... and two measures of the fit of this regression line (the! R2 and the Sl:R):l11is is a common form at for reportlllg a single regression equ ation. repon ed in Equa lion (4.6.S) also rcporl s Lhe regression R2and Ihe Slandard erro r of the regre.. construct the Istatistic and compare it to 1.96. the 5% (twosided) critical \'alu ~' take n from the standard normal distri bution.· sian (S E R) followin g the estimated regression line. Thu s E quation (5 . 3.. the estl" mated slope.28 X STR. if It..OS l .52. {3'l..O <. The :. Reporting regression equations and application to test scores.. 2. by convent ion they are . Compute the tstatistic [EqUlIlioTl (5. One compact way to rdpaft the standa rd errors is to place them in parentheses below Ihe respective coeHicients of' the O LS regression line: Te~'ISwre ~ "" 698.9 anti ~I = . yielded iJo = 698.
the pro bahility of o btaining a value of as far fr om the null as the value we actually obt . to thf ~ft of U8 l cst! the art.ac' = 4_ 38. at value e te ct ll~ the e~ti for !l1 11 \l1 * • .teacher ratio/test score problem. ma ny people think that sma lle r classes provide a better o do SO . 0 ·DM . Because this eve nl is so unlikely.1. t h~ result .001'Yo . it is appropri ate to use a o ne sided h y pot hc~is tcst.4. so th e null hypothesis is rejected in ravor of Ih" twoside d alternati ve at the 5% significance le vel.Sided Hypotheses Concerning P. where Z i~ 0 stondord normal ro ndom variable and fX' is the value of the tstatistic colculated from the ~mple. when .$ I QCf = ( . the pvoloo 1 s lg 96. That {JI I rc£. however.o against Ih e h ypothe si~ Ihat {31 f3 !. exceeds ( in absolute va lue) the 5% twosided cri tical value of 1.2. in th e studen t. if the null hypoth esis {3Ck'>iS.resinty of I (5.l to the riqIIt of "'4. Thi s Is ta [ist ic.4.0)/U.4.38 The pva!ue of 0 twosided test is the probabili ty thot II > 1t"""I.38. e xtre me ly Slll<l ll. AlLernativeiy.%.o.s. I are N(O.28 .11115 probability is the area in th e tails of standard normal distribution.: to c:om:lude thai Lhe n ull hypothesis is fal se.1l1is is a tWOsided hypothesis test.S) c !'it of porting ~ i.5). IOLS ~ qua ~ 1M pvalue Is tilt area ~.38. it is rcasonubl(. This probability is ext.00001 ..lined is.y are ra\' to bcu\·e (5. Th e dis(:ussion so far has foc used on testi ng the hypothesis th.001% . l e~s than 0. For exa mple.:. bec.l Testing Hypoiheses About One of the Regression Coefficients 1S3 FIGURE 5.38.0' Sometimes. book in the One. • . 1) ~. as shown in Fig ure 5.' = 0 is true. approxim ately 0. IW can wmpute thepvalue associated with ! "O .:n f3! = f3l.52 = . 1 Calculating the pValue of a TwoSided Test When t(J(l = . is only 0.'luse under the alternative /3 1could be eithe r larger or sma ller than 131.0000 1.8) in Equation (5. o r 0.S.remely small.
') (pvalue. HI : P I < fh.o.10) are reversed.onesided lefttail les!).. This rea son could come rrom economic theory. th e constructi on of th e . but not large posit ive. so the pvaluc is the rightt ail probability. The pvalue for a onesided test is obtained fTom the cumulati ve standard nor mal dis tributiOI) as pvalue = Pr( Z < "4 . (5.0.9). the nuH hypothesis is rejected aga inst the o nesided alte rnati ve for large negative. the hypothesis is rejeCied at the 5% significa nce level if l ilCI < . . Pr(Z > ( l!('I ).. In prac · tice. upon reflection th is might not necessarily be so. or bOl h. TIle onl y difference betwee n a one .and twosided hypothesis test is how you interpret the (statislic. Under that hypothesis. (5.n the cl ass size eXlImp1c. prior e mpirica l evidence.9) is reve rsed .10 ) If the alternatjve hypothesis is lhat {31 is greater than P I».0 vs. to tes t th e nu ll hypothesis that P I = 0 (no drect) against the onesided alternative th at (3 1 < O . Ho"' ever.~ hypotheses should be used only when there is a clea r reason {or do ing so. 0 is the value of P..96. unde r the null (0 in the !>I udent. th e inequa lities in E4u3tio ns (5 . I.154 CHAPHR 5 lilleQr Regre~~ ion with One Regreuor learning environment.. Because the null hypothesis is the same Ior a one. (Doesided alternative) .645.teacher ra tio exam ple) a nd the alterna ti ve is tha t 13. is less tban f3l.the inequality in Equation (5. When should a one~sided test be used? In practice. eve n ir it init ia lly seems that Ihe rclevant a lte rna ti ve is o nesided . one si ded a1t e rn a l i \o./\ > 1. we are reminded of the graduati on joke th ai a uni versity's secret of success is to adm it talented students and the n make sur~' that the facu lty stays oul of tbei r way and does as linle damage as possiblc.o' If the ahe rna tive is that P I IS greater than f31.1.9) and (5 . therefore.<1>(.... A newly rormulated drug undergoin1! cli nica l tria ls actually co uld prove harmIul because o f previo usly unrecognized sid e effects.statistic is the same. the null hypothesis and [he oncsided alternative hypoth esis are Ho: {31 = /11 . It might make sense. For a onesided lest. .and a two sided hypothe sis test. such ambiguity often leads economeuicians to u~ twosided tests.9) where P. For the ones id ed alt ~ matj ve in Equm ion (5. va lu es of the ( st<uistic: Instead o f rejecting it' . PI is negative: Smaller classes lead to higher scores.
()(XI6%. lnc Istatistic tcsting the bypOl hes is tha t there is no effect of class size on test scores [so /3 1. however. Ye t. th e hypothes is concerns the intercept. It • . This is less tha n 2. rd.I3o "* f3o.li os t the onc sided alternat ive at the I % level. Occasion ally.IO) ~it iCS in ~ability.33 ( the crit ica l value fo r <l onesided test with a 1% signifi cance Ic\'e l) .5. In prac 5. t3o.1l1e null hypothesis con cerning the inte rcept fi nd the twosided a lte rnative are Ho: (31) = .9)) is / iKI = 4.2 Confidence Intervals for a Regression Coefficient Because any stati stical estimate 0 1 the slope {31 ne.2 Coo ~del'l(e Intervals for a Regression Coeffici ent 155 :ad h at ) Ih 5. H ypOl hesis tests are use(ul if you have a specific nuU hypothesis in mind (as did o ur a ngry taxpayer).0 = 0 in Eq ua tion (5. there a re many times that no single hypothesis a bout 3 regression coefficient is domi na nt. In fac!_th e pvalue is less tban O.l3o. (3\ . H I :. we cannot de termine The true value of f31 e xactl y from a sample of da ta.0gnized ~ o k e that )ukc sure ~. upon f.2.st the the t 5% 111i5 discussion has focu sed on testing hypotheses abo ut the slo pe.cessarily has sampling unce r tainty. and instead one would like to koo w a range of values of the coefficient tha i are consiste nt with the data. If th e alte rnative is onesided. This calls for constructing a COnfidence inte rva l. you ca n reject the a ngry ta xpaye r's assert ion tha t the negat ive estim ale o f the slope arose purely beca use of ra ndom sampllng variation at the I % signi ficance level .1). 131 is Testing Hypotheses About the Intercept 130 ot be ~ec n lr the .o (twosided alte rnative).1J vs.9) ~am Application to test scorl!s. so the null hypothesis is rejected 3g. appli e d to.11) lC fd nor P o (S. HoW ~ergoi ng The genera l approach to lesting Ihi s null hypothesis consists of the L hree steps in Key Concept 5. Based o n these data. (5.38.l3o (the formu la fo r the standa rd error of is given in Appendix 5. this approach is modified as was d iscussed in the previo us subsection for hypotheses abou t the slope. ~ rn ati vC l11is rca' b. Being able to uccept o r to reject this null hypothesis based on the statistica l evide nce provides a powerful tool for coping with the uncer tainly inherent in using a sample to learn about lhe population.
). l '= 0. llie reason these two de finitio ns a rc eq uivale nt is as fo llo ws.c PI l. wi th Po. by definitio n.96 X 0. 156 CHAPTER S linear R egression with One Regressor is. possible to usc the OLS estimator and its sta ndard erro r to construct a confide nce inle rva l fo r the slope 131or fo r the inte n. First . T he 95% confiden<:e inte rval is tlwl! the cnllectio n o f nJl lhe va lues of 13) Ihnt a re nOl rejec te d.0) ill thr.)}. Second. 131 is summari/c d .· Reca ll .1<11. in 95 % of a ll possible . il is said to have a confidence level of 95%. re po rted in Eq uation (5.%S£ (P. PI .H).ll fo r f3!\ is const ruclt:J i3u <lnd SE<flIl) repiacing .3. Application to test scores.1.! It 95% co nfide nce in ter val ctln be computed hy tcsting all possi ble \'al· ues of /3 1 (t hat is.amplcs the true vulue 01 will 1/01 be rejected . in principiI. Because the 95 % co nfide nce inte rva l (as de fi ned in thc t1r\t de finitio n) is the set o f all valu es o f 13 1 I1Mt a re nor rejec ted 3 1 the 5% significan ce level. il is an interva l ilia! has a 95 % probabili ty of containing the true value of 13 1: that is.:cpt /3r. that is.  lie will reject the hypothes ized va lue J3 I. it is the sct of va lues thai ca nnot be rejected u ~ i ng a twosided h)'pothesis les! wit h a 5% significa nce level.' 1 .8.3.:I S K ey Confidence interval for as in Kt:y Co ncept 5.(p]) .3.. A hypothesis t ~ t with il 5 % s ig nificance level will..•. the confidence inl erval will contain the trUI: va lue of Pl' Beca use this inte rva l contains the true val ue in 95% of all sa mp les. A n easier way to construct the confid e nce inte rva l is to nOtC thlli lhe 1.hat a 95 % eo nfi d ~ n ee interval for 131 ha.Th is argume nt para lle ls the argume nt used W develop a confide nce inte rval fo r the populati o n mean....U whe never /3 10 is o utside t he rang. 1. tes ting the null hypothesis /31 "" (Jill fo r all va lues of /31. it fol l ow~ th a t the true va lue of {3. The 95% twosi ded confidence interval for /3 is [. however. B UI construl·ti ng the tsta lis tic for a ll va l u e~ o f /3 1 would ta ke fo re ver.96SE<p.teacher ratio.521 .26. in 95% o f ~ sible samples tha t might be d rawn... As in th e case of a co nfidence inte rva l fo r the pop ulatio n me a n (Section 3..3).2. TIlat is. two e q uiva lent de Clllilio ns..96S£(P ')'{. tlf 1 .: 0 is not contai ned in Ihis confide nce lntc n <ll.81and SF. = T he conSl ructi un of a co nfidence in terval for Co ncept 5. reject the true va lue of 13 ) in on l} {J I 5% of a ll possible s:lmples. the 95% confide nce interva l for (JI h the int cc\:ll 1.52. The value {:J I ::.sta Tislic. Th ~ OLS reg ressio n of Ihe tcst score again ~t til" stude nt.28 and SF:(~..2. Confidence intervai for 13 . yielded == . A 95% conridence in te rv.: 5% sig ni fica nce le ve l using the ..30 :5 13 1 S .28 . will he contained inl he co nfi dence inter · val in 95% of all possib le sam ples.1 + 1.
52.dx • 1.30. • .f pos true rtCs.96SE(~1 ) 1 X ax.96SE(P.60 points.13) l SITlIC1...teache r ralio by 2 is predicted to increase test sco res by between 2.3.96SE(~l) ' the predictcd effect of th e chan ge !l x using this estimate of (31 is [~I .52 a nd 6.965£(&1) ' and the predicted effect of the change using that esti mate is + 1.3 "is l l!SI only lut' of IP. The other end of the confideoce inter val is ~ I + 1. When the sample size is large.26]. The 95% con· n 3. Because one end of a 95% confidence interval for III is ~l . as Key 95% confidence in te rva l fo r PltU: = IPl <l x . Ie val al th. Thus decreas ing Ihe student . it is th e set of values of {3 1 [hal can no t he rejected by a 5% twosided hypothesis test .2 Confi dence Intervals for 0 Regression Coefficient 151 ~ 1  . kruct 5. it A 95% twosided confidence interval for {Jl is an interval tha t con tains the t rue \." CONFIDENCE INTERVAL FOR (3.x can be expressed as u3. For example.) x ~x. + 1.2) = 2.teacher ra1io by 2 could be as great as .3.96SE(P . our hypothetical superintendent is contempl ating reducing the student.. 60.teacher ra1io by 2.nle rvai fo r Ill . 96SE(P. it contai ns the true "'" Iue of f31 in 95% of aU possible randomly drawn s amples.( P11 ( 0.alue of /3\ with a 95% probability: that is. the eHect of reduci ng the st udent. j( is conslTuCted as 95% confide nce inte rval for (3 1 0:: 5.26 x ( . with a 95 % confidence level.> x l.p. .! predicted change in Y ASS O cia ted with this cha nge ill X is f3lux.1. x. Equivalentl y. inter so (as we knew already from Section 5.2) = 6.: is then lng the Istcll is I e ran ge used !O iin tt:rval fidence interval for (31 can be used to construct a 95 % confi dence interval for the pred icted effect o f fI general change in X Consider changing X by a given amo unt.52\. or as littl e as . Beca use Ihe 95% confidence in terval for /31 is [.d ~a im' t \ h l! \d SE.96SE(~I ) I x !l x. but because we can co nstruct a confidence i. The pop ula tio n slope f31 is unknown. or e int erval.96SE(jJ I). p ..1 2) c n r~ l icullce .1 .)]. (5.1.. ) x . rii&'I(lf*'4 KEYCONCEI'T I ~sing has erval ~ . ~ x. Confidence intervals for predicted effects ofchanging X .1 ) the hypothesis III = 0 can be rejected at th e 5% significance level.we can construcl a confi dence interval for the predicted effect f3 1tJ. .1. Tht.1.l.3 ). (5. Thus a 95% confidence interval fo r the effect of chang ing x by the amount tJ.30 X ( .
.: coeffid ent multiplying D J in this regress ion o r. depend· ing on wh ether thl:! siud ent. = 0 ir male) . one at a l im~ the two possible cases. + /I . For exa mple..158 CHAPTU S linear Regression wi th One Regressor 5.141 The popuhuion regression model with D j as the regre5<. Thus we wilt not refer to /3 1 <IS the sloJX in Equa tion (5.. i = 1..15) is 11 0 1 a slope. X mig. and il turn s OUt tha t regres· sion with a bin ary variable is eq uivalent to performin g a lIirf~ r ~nce of means analysis. To sec this.:gression model with the con linuous reg resso r X .ht be a worker\ gender ( = I if fe male. or whether Ihe district's class s i7c is small Of large ( = I if small. whe n it lakes on o nly two va lues. = 0). Because D.teacher ratio in 0· = { . ... ::: 0 (11ld V I = I. more compactl y.l me as the r. suppose yo u have a variable 0 . except Ihat now the regr~sso r is the binary variable D j .1 51 This is the S. lhal equals either 0 or I. (5. 11. as descrilxd in Secti on 3. thC:fI D. 1h .::: f30 + PJD. (5. == A. Regression a n a ly ~is can also be used when t b ~ regressor is binarY. 15): instead we will simply rere r to P I a~ Ih.16) . If the studt. then what is it? TIle be:. wbether a school dislfict is urban or rural ( = 1 if urban. (0 . h not tonti nuo us.4. 0 or 1. because D j ca n ta ke on o nly Iwo \ <11· ucs. indeed.teacher mlio in . Th e interpretation of /31' however.0 and' Eq uat ion (5.15) becomes Y. 0 . Interpretation of the Regression Coefficients llle mechanics of regression wi lh a binary regressor arc the same <IS if il is con · tinuous. there is no " li ne" so it makes no sense to talk about a slope.l + II. 0 if the student. not useful (0 think of P I <IS a slope.:nl. A binary variable is also called an indicalor \'ariablc or somet imes a dummy \·Hriable.lhat is.. 1h district < 20 district <!: 2U.teache r ratio is less than 20: 1 if the stude nt.! way \0 inter pre t f30 a nd /3 t in a regression with a binary regressor is to co n~id e r .. . = 0 jf large). is different . it j.Ieachcr rat io is high . variable.3 Regression When X Is a Binary Variable The discussion so r<lf has foc used on the case that the regressor is a cont in uou'i. = 0 if rural). . (5. the coeffi cient on 0i' l! /3 J in Eq uation (5 .or is Y.
{3o is t he popul ation mean valueof lest scores when the stude ntteacher ratio is high. 17) TUTal :::n large ThUs. Specifically. £( Y. the null hypoth esis can be rejected at tht' 5% level agai nst the two sided altern ative when the OLS Istatistic I "" ~\I SI::(~I ) exceeds . R2 = 0.::: 1). "" I) 'iable or it is ca n· at regrcs of nleans .ID.14) Hypothesis tests and confidence intervals.40. In the lest score example. is tile difference between the sa mrle averageS of Y. when D j = I . As an exampl e. Si milarly.1I the two population mea ns a re the same can be lesled against the " licrna tivc hypoth esis that they diIIer by testing the null hypothesis f3 1 = 0 against the ahernalive (3.96 in absol ~te value. a 95% confidence interval for (3 \. Similarl y. /31 is the differe nce betwecn th e conditional ex pect ati on of Yi when D. yields res/ Score = 650. 10) .14) esti mated by OLS usin g Ihe 420 obscn'ati ons in Figure 4. 13u + PI is the popul. then 13 1 in Equation (5. Because f30 + f3 j is the popula tion mean of Yi when 0.15) is rem.E(Y.S.111is hypothesis can be tested lIsing the procedure outlined in Section 5.teacher ratios and the mean leSt score in diSlricts wi th high studentteacher ratios. constructed as (3 \ :: 1. (D . it makes se nse (hal the OLS e~timator f3. the conditi onal ex pectation of Y. til. = I and when Di = ().15 ) r X" except :muous.. and in fact this is t he case. /31 is lhe d iffe rence between mean lest score in d ist ricts with low student. Thu ~ the Dull hypothesis Ih. = I) . it is rliy tWO val we will not .0. (5. If the two population means are the sa me. a regression o f the lest score inter Ine at a ti nle. = I. (5. when D. provides a 95% confid encc in terval for the difference between the I WO popula iio n means.8) ~ (5.7.96S£(/3 I) a" described in Section 5.teacher ratio binary variable D defined in Eq uation (5. In oth er words. 18) (5.\ /J. +.3 Regression When X Is a Binary Voriable 159 iouO US ~·~th a( Because E(ud D. whe n Di = 0 is £ (YiI D .1. ) = 0.Jtion mean value of test scores when the student.1.0 + 7.035. or /31 == £(Y.3) ( l. Because {3] is th e difference in the porulation means.1 Di = 0).2. 'orker's Jf Y. SER == IS. = I and f3'1 is the pop ulation mean of Yi when Di = O. is high. = 130 + 13 1 + II. Ibm i!>. the difference ({Jo + f3\ ) . depend· = f30 + /31. in th c two groups. to /31 as the ~ coctlicient Application to test scores.leacher ralio is low." Nay 10 again slthe student.2. (5.. = 0) = /30: that is.f30 = /3 1is the differ ence between these two means. (1. .
0) is 650.conditional o n Xi is Ihal11 ha!l a mean of zero (t he first least squa res assumption) . cally significantly different from zcro at the 5% level? To fi nd OUI .ti. . Thus the average test score for Ih\: subsamplc with student..4 / 1. If.teache r ratios less than 20 (so 0 = 1) is 650.:.4. Th is section discusses homoskedasticity. IhC II.8 ~ (3.11\1 5 confidence int e rva l e xclude:.. Th is is 7..4 Heteroskedasticity and Homoskedasticity O ur o nly assumptio n "bou t the d isLribulion of lI. illr which D .96 x 1. and the risks yo u run if you use these . so thc hypothesis that thc popu la tio n mean test scores in districts with high a nd low MU dent. 10.... Is the difference in the population mean tesT scorl!S in the two grou~ stati. given X. The OLS estima tor a nd its sta ndard error can be used to construct a 95% Con fidem::e inte rval fo r the !r ue differe nce in means. What Are Heteroskedasticity and Homoskedasticity? Definitions of heteroskedasticity and homoskedasticity. its theoretical implica tions.st a tistic on {31: ( = 7.4 = 657'. so tha t (as we know from the previo us paragra ph) the hypothesis f3 1 = 0 ca n be rejected a l the 5% signt ficance level. Thc error l~rlll is homoskedaS Iic' if the variance of the condi ti onal distri bution of II. c rror term is h e l ero~ ke d as tic . the I'orial/n' of this cond itio n.0 + 7. the coefficient on the stude nt.4 :!: 1. the simplified formulas (or the standard errors o f the OLS estimators that a rise if the errors arc homoskedaslic.9.teacher ralios is the same can be rejected a llhe 5% sign ificance leve l.n and in particu lar does nOI depe nd on X.96 in absolute va lue. 5. This is the OlS estimate of fJ. construct thc . The differcn.]Ill plificd formula s in practice.: between the sample average It:st scores for the two groups is 7. lhe n the errors arc said to be homoskedastic. is constant for i "" I. OtherwisC". I . 1distributio n does no t de pend o n X.04. fu rthe rmore.teacher ratio binary variable D.160 CHAPTER. (11 = 0.. given in parentheses below the OLS estimates. and Ihe average test score for the subsample with Stu dent. 5 lineor Regres~ion witn One RegresSOf" where the staodard errors of the OLS estimates of the coefficie nts f3 0 and {3t ilro.8 = 4..0. This exceeds 1. .9).teache r ratios grea ter than o r equal to 20 (Ihat is.
.04 .. the simplified formulas for the standard errors of the O LS estima tors.0 ..8 = 4..0 1herwise. The error {al11 uj is homoskcdastic ir the va riance of the conditional distribution of II.Ieache r ratios less than 20 (so D .96 in absolute value.teacher ra tios is the same can be rejected at the 5% significance level.onst ruct a 95% con fide nce inl(. subsample with studen t."\! betwee n the sample ave rage \es\ scores (or the two groups is 7.4 657. so the hypothesis tha I th e population mean test SCa TeS in districts with high and low ~IU · dent.• 160 CHAPTER 5 linear Regreuion with One RegreMOr whe re the standard errors of the OLS estimates or th e coe ffi cients (ju nnd {J.. thai arise if the errors arc homoskedastic.0.% x I. Th is exceeds 1.lIcs.. . co nstruct the . [he error term is heleroskedaslil.4 Heteroskedasticityand Homoskedasticity Ou r o nly assumptio n about the distrihution of u. .. . an d the risk. 10. If.sta tistic on 131: t = 7..l lte differen ..H :..4 / 1. Th is section discusses homoskedasticit y. = ~ .s you run if you usc t h e~ sim plified ro rmu l a~ in pracLice. Like F 5.4 ~ 1. What Are Heteroskedasticity and Homoskedasticity? Definitions ofheteroskedasticity and homoskedasticity. This is 7. Thus the average test score for tho. de nt.. Th is confide nce interval excludes /31 = 0.4.teacher ratio binary variable D. The OLS t!~l i m ato r and ils standard error can b~ used to(.\1 for the true differen ce in means. (3. conditional o n X i is that it ha~ a mean of zero (the fir st leas t squares assllmption ). 11 lind in particular docs not dcpend on Xi... gi\ cn X.teacher ra tios grea ter than o r eq uaJ to 20 (tbat is..9. furthermore.HI. for which D == 0) is 650.9).4. and the average test score for the subsu mplc with Slu. so Iha t (<IS we know from Ihe previo us parasxaph) the hypOlhesis 131 = 0 can be rejected :l tt he 5% significance le vel.. Is the diffe rence in the population mean lest scores in the IWO grou ps slatl'>\I' cally significa ntly d iffe rent {rom zero at the 5% level? To fin d out.! given in parentheses below the O LS estim. .':(\" . is constant for i = 1."'0. This is the OLS estimate of 13••the coefficient on the st udent.: I) is 650. 7. its theore tica l implica tions. the l·ari(llU"t· of this condi tional distribution does not depend on X" tben the errors are said to he bo moskedaslic.
all these condi tional distributions have the same spread: more pre cisely.tribution of lesl ~es 101' Ihree diffeeenl doss sizes. Ihe varia oce of these dist.. given Xi IS Jtherwise. that it ha:'o a As an illustration .4 are homoskeoastic.. vor(u(XI.4 Heteroskedasli<:ity and Homoskedosticity 161 10/31 arc ~ fo r the la\ is. so that tile e rrors in Figure 5. in Figure 5.4. Becouse lhe voriance 01 1 the dislribution 01 vgiveo x. For small val ues of x. u i. Unlike Figure 4. Because tbis distribution applies specifica lly for the indicated vaJue of x. re turn to Figure 4.ld low stu lcvl. so the FIGURE 5./J IX 96x 1.2 the variance of uj given Xi <= x increases with x.8 =' )W from the ~igni{ic a ncc 6OO. il has a greater spread.. the . Thus. is shown fo r vario us values o( . so tbe errors illustrated in Figure 4.. The definitions of heteroske dasticity and nomoskedasticity are summarized in Key Concept 5.2 An Example of Heteroskedasticity Test seore 72U 7()O XIS "1M"'''' 0 " 'hm) O_tioo '" 660 1.>0 StlldcntIeac her ratio l C' Uke Figure 4. in Figure 4.. tne:le distributions become more $preod out lhave 0 lorger variancei for lorger doss sizes. hetero.t.. spreads o ut as x inc reases. the conditional variance o f Iii given X.2 illustrales a case in which the condi tio nal dis tribuLion of Il.4. That is. for \\'ith stu lifferellce .ri bUlioos is the same for the various \'al ue~ of x. / Y hmX .4.4.4. In contrast. 0:= 0:= l[C said to be lica! irnplica tirnato rs that Ise these sirn 11C e rror tern 1 f II . bu t for larger values of x. I y varia/l ce of . The dis tribution of th e errors II . Figure 5. depends OIl X.!L ~l 95 % con 64U 620 IU /3 0 . 20 01"" ".'( 0/ f Ywl\en X= 25 . Ihi~ ~how1 !he c.the QLS b\e D ps statisti lstruct the lue.kedostic .orM:liliOflOI di!.~f~~__~~~~__~ ~__~~~____~ J'i 10 25 . t!lis is the conditional distribution o f IIi given X j x. this d istributio n is tight.2 are hete roskedastic. x does nOt de pe nd on x. As drawn in tha t fi gure.
is homoskedastic if the varia nce of the conditional distribution of IIi given X .f3 1)' Jt foll ows tha t the: . the error te rm is he. He re the regresso r is MA LE.. and in particular doe!' not depend un x. IIi is the deviation of the /. 11. Deciding whet her lhe vari ance of " i de pe nds on MALE. 'l11e bin a ry \'a riable regressio n model re la ti ng somcone 's earn ings (0 his or her ge nde r is &millXs. = II .. Example. O therwise.. (or wome n.. it is heleroskedastic.in this case. The definition of homoskcdasti cit).1 9) as t\\·o sepa rate equations. the e rror is homoskcda ' tic: if not . one for me n and one for wome n: Eamiltgs. Because Ihe regressor is hi na ry. it is use ful to write Equa tion (5. s t at e~ (h a t the va ria nce o t II . t he differe nce in mean e a rn iTl g~ betwee n me n and women who graduated from college.h woman's ea rnings from the popu l.h ma n'~ earning!. . does \lll t depend on the regressor..20) t30 + f3 1 + II. we digress from the stude ntteache r r. so at iss ue is whe the r the va riance o f the e rror term de pe nds on MALI:. = f3(1 + {3 \M A L E i + It. 11 . . (5. is the va ria nce ot' the e rror te rm the same for me n a nd for wume n':' If so.. is the de viation of the /. 5." Lei MA LE.. is (. (5. var{lIi l Xi = x).lt ia/lest score problem and in stead re turn to t he example of ea rn ings of male vcr sus female coJlege graduates co nsidered in the box in Chaple r 3.ll ) Thus.. In this rega rd. In ot he r wo rd~. (wome n ) a mI (S..te roskcdastic.onstalll for i = 1. u.. .4 . These terms a re a mouthful a nd the defi nitions might see m a bstract. requ ires think in!! hard a bout what the error term act ually is. (m en).. (Jl i!> the diffe rence in the popu lation means of th e t wo group ~ .". . = f30 + Earnings. "'Jh e G ~ nder G ap in E a rni ngs of College Grad uales in the United Slates. 1~) (or i = 1. .1' li on mean earnings for women (l3Q )' and for me n.162 CHAPTER S Linear Reg re~sion with One R egressor KEYCONcm HETEROSKEDASTICITY AND HOMOSKEDASTICITY The error te rm If. To hc lp cl a rify them with an e xam ple. frO Ill the pop ulatiolllllcan earnings for me n (f3o '. be a binar) va riable Ihal equals 1 [or male college gralJua tes a nd equals 0 for female gradu a tes. .
tract. Because the least squares assumptions in Key Concept 4. the OLS estimators remain unbiased and consiste nt even if the e rrors are hOOloskedastic. Whet her the errors are homoskedastic or he teroskedastic. p . they apply to both the general case of heteroskedas· ticily and the special case of homoskedasricity..wri te Equa 'n : Homoskedasticityonly variance formula . Mathematical Implications of Homoskedasticity The OLS estimators remain unbiased and asymptotically normal. then there is a specialized formula that can be used for tbe standard e rrors of ~o and ~l' The homoskedaslicityonly standard error of ~l' derived in A ppe ndix 5.1 . Therefore. does not depend on MALE. "the variance of e arni ]]g~ is the same for men as it is for women. consistent . t_teacber male ver :nder Gap e a binary ale gradu tS to his or (5 . 1 the popu HI earn ings H.S. This result. (5.20) (5.L. X. 1 ab!. then the OLS estimators 130 and {J I are efficient among all est ima lors that are linear in Y j .l n addition.. and asympto tically normal.1. the error term is heteroskedastie. the OLS estimators have sampling distributions that are normal in large samples even if the e rrors are ho moskedastic." In otlleT words." is eq uiva lent to the st atement. is discussed in Section S. conditiona l on XI". "th e variance of II.211 2 a = ~J . If the error term is homoskedas tic.22) (X. .5. Consequentl y. if these vari ances differ.' . the t:~timator of the variance of P I under homoskedaslicity {thaI is.4 Heteroskedos~city and Homoskedas~cily 163 lribution does not statement. (5. if the errors are homosked astic. the • .X) n the popula .19). homoskedasticityonly estimator of lhe va riance of {J l: ~ v'iif. which is called the GaussMarkov theorem. The homoskedasticity·only fornlul a for the st andard error of ~o is given in Appendix S. then the formulas for the variances of ~o and ~! in Key Concept 4. i =i ~ J" .t he e rror term is homoskedastic if the variance of the population distribution of earnings is the same for men and womcn . If the least squares assumptions in Key ~o llceel 4.3 hold and Ihe errors are homoskedastic.4 simplify. . the OLS estimator is unbiased. that tht: where s~ is given in Equation (4. In the special case thaI X is a binary variable.1Y) Efficiency ofthe OLS estimator when the errors are homoskedastic.2 (homoskedasticityonly) . in this ~Xal11 p k:.f the i!h man'" IlIow!. docs not whether the variance of omoskedlls res thinking . ••• • Y" and are unbiased.3 place no restrictions on the condit iona l variance. is SE (Pl ) = where ij~ is the p.
i. the variance of the e rror term in Equ a . Th is sug gests that the distribution of earnings among women is ligh ter than among men (See the box ill Chapte r ). However. Specifically. Because the swndard errors we have used so far (i.e. lJu b~r (1!J67) . today. be(.· clues as to which assumption is more sensible. In fact. but there have rarely been highly paid womc n. Because these alternative formulas are deri ved l or the special case thallhc errors arc homoskcdaSlic and do not apply if the errors are hClcroskcdastic. those based on Eq uat io ns (5. heteroskedasticity or homoskedosticity? ·lllC answe r to thi ~ question depends on the application. and (jJ" of the va riances o f PI and given in Equa tio ns (5 . Familia rit).26)] lead to stalislic.ritical values cannot be tabulated.. Simil arly. wi th how pcopk are paid in the world around us gives ~O Oli.4) and (5.. they are also referred to as EickcrHuberWhite stan dard errors. and White (1980).sts.J on thoSt: standa rd e rrors a re valid whether o r not the errors a rc hd eroskcdastic. cven in large snmpics. then the {statistic computed usin!! the homoskedaslicityonly standard error does not have a standard norm al distIi hUl ion. Thus hypothesis tesLS and confidence intervals ba~e t. UJI #ll What Does This Mean in Practice? Which is more realistic.23). to a le ~ ~t:r exten t. For many yearsand .4 ) the est imators and (5. the i"slles can \:"It' clarified by returning to the example of the gender gap in earn ings among collcgt: graduates... so those (.26) produce va lid statistical infere nces whe the r the errors a re hel e roskcdast ic o r homoskcdastic. if the errors are het croskedas tic..women were not found in th e toppaying jobs:There ha ve alw a~~ been poorly paid men.96 homoskedas· ticilyonly standa rd CHors.. the correct critica l va lues 10 u~e fo r thi ~ homoskeda sticityon ly (statist ic depend on the precise na tu re of the het ct\1skedilstici t y. In conlnlSt. they will be referred to as the " homoskedaslicityonly" formulas for the variance and standard error of the OLS esti mators.Cluse ho moskedaslicity is a special case o r he te roskedaslicily. if the error~ are hi:!lcroskedaslic but a confidence interval is constructed as == 1.'ll in ference. give n in Equation (3. CHAPTER.• . "111e Gender Gap in Earnings of College Grad uates in the United States") . even in large ~a m p l es.. In other words. As the name sugge. Because such fo rmulas were proposed by Eicker (1%7). then t he homoskedastici tyonly standard errors nre inappropriate. if the errors !Irc hctcr<)skedaslic.n general the probability th at thi s intervid contains the Irue va lue of th e coefficicnt is not 95%.they are called heteros ked ast idtyrobust sian da rd errors .5 linear Regression wilh One Regre5S01" square of the standard error of ~l lLndcr homoskedaslir:ily) is the socalled pooled variance formula for the di ffe rence in mea ns. thaI are valid whether or not the errors are beleroskedastic.
3.nd.'lbe data come from the Ma rch 2005 Cu rren t Population Survey.43: and for workers with a college degree.07.. slItndard deviation is $7.5 .. by $1. workers .77.3 Scatterplot of Hourly Earnings and Years of E ducation for 29. 01 education H" lor 2950 full·ti me..3.07) Rl = 0. very fe w workers wiLh Jow li:\'c!S of education have highpaying jobs.46. The coefficient of I 47 in the OLS rcgr es. 'Ibis increase i:. 1.. this standard deviation increases to $10.II!illl .. Figun: 5. The 95% confide nc¢ interval for this coefficicnt is 1. ". (5.mong ml!\l radua h. (0. S£R .the regrc~s i()n error~ are hete roskedastic.. s ug.3 is a scanerplot of the hourly earnings and the Ilumherof years of edu· cation (or a sample of 2950 fu lltime workers in the United States in 2004.rrors n::::q uin's analyzing data.93) (0.:' ill n in E qu a '" ~ I . [t11 only te n years of education ha\'C no shot at those jobs.u es \\i ll he earning S5OIhour by the time they Art: 29. I L.61. Hourly &arnings are plotted against year. In realworld terms... they .ol1. The second striking fr:ltl ur.. so ans we ring it : <.96 X 0. While some ". this laSlieilY· (5 . r~ or ~u n"jon 20 .: of Figure 5.78. Because thcsc stAnda rd devillliolb differ for diffe rent lncls of education.cClling that rhe regreuion erren ore hel~kedo~tic . This Ho urlr cUlli ng.11 . distri or this O n average.4) HC hel ntervals are hcl bascdon 1ethc r or ). but some wili . the standard deviation of Ihe residuals is $5.llnd worke r. on 'ty ? The les can lx ng colkg. summari7ed by the OLS regression line. til 15 "'''.~ion line means tbat ..1. . which is de~ibed skcdas a ins the arou nd the OLS regression line.~7Years Educm. . or \ using tion.ith more education have higher earnings Ihnn workers . H uhe r lOS in A ppendix 3.33 to 1. ages 29 and 30.47 for eacb additional year of educatio n... orkers with many years of education ha\'~ lowpa~ in g jobs.23) !lile sIan This line is plotted in Figure 5. !he yean of education. with between 6 and 18 ~ears of education. for workers wi th a high school diploma. il migh t n l~o be tha t the spr('u(1 of the distribution of earnings is gl't"ate r for workers with more education. This can be stated more precisely by lookin!! at the spread of Ih ~' rc ~ idual ~ IC' hel out as education increases? This i ~ an empir ical q uestion. hourl~ earnings increa5c.130.B + 1.ilh le~ educa average.3 has two striking features.47 ::!.4 Heteroskedaslicity ond Homoskedosricity 165 loled the ·.. 8. The spt"eod arOUnd !he regression line increases wit/.23) depends on thc value of the r<:grc)sor (the years of educlltion): in other WOrds.e FIGURE S .. BUI if the beslpaying jobs mainly go to the col lege educated. . .\1 'I! and 'C het pri atc.to 3OYear Olds in the United States in 2004 ?. lbe first is that the mean of the distribut ion of earnings i~reaselt with the num ber of years of education. no t ali college grudu. Figure 5.3 is that tbe spread of the distribution of earnings increases wi th the years of r:duea tion . For workers with len years of education.us! st an ~ "" .i vcssome to a lesser a ve alw a ~ s 1. the variance of Ihe residuals in the regressio n of Equation (5. 29· to 3Q·yoor·old 'H(lrkers. Docs the dis tri bution o f earnings ~pread 1..
1 used In conjunction .20) for ".:cd. Unless there are compelling reasons 10 the con· nary. J( the homoskedas..5 The Theoretical Foundations of Ordinary Least Squares As discussed in Section 4.t9).uIIC'I)robu~1 )tandard crmrs are U50ed. As this example of modeling earnings illustrates.. is consistent. Th e main issue ot practie. the presence of a "glass ceiling" for women's jobs and pay suggests that the error term in the binary variable regression model in Equation (5.c. nothing is lost by using the heteroskedaslicit).5."t standard errors: if they diffe r.166 CHAPTER 5 Linear Regression with One Regressor tion (5.. I '5.:li releva nce in this disell" sion is whet her one should use lleteroskedasticityrobust o r homoskedaslieityonly standard errors.robu. All of the empirica l examples in this book em ploy heteroskedasticityrobusl standard errors unless explicitly staled otherwise. The de tails of how to implemcnI bctcroskedasticilyrobusl standard errors depend on the software package you u. the OLS estimator is unbiased. Tn thi!> rega rd.ticity..~r. many software programs usc the ho moskedasticity·only standard e rrors as their default se lling.omen is plausib ly less than the variance of the error tenn In Equation (5. example as heteroskedastic. The simplest tbing.$ addit lonlll IIs~umpuon is nOI needed for the validil)' of OLS regression analysis I" lon~ J' hctemd.Jiscussed. Practical implications. AI a general level . A ~ just .:s that a llow for hctc roskedas. ... then you should use the mo re re liable on. is a lways to use tht' hete roskedaslicily..ter chap1en. Ih. has a \'. then c hoo~ ing be tween tbem.robust standard errors. ibIS iC!CIIOn IS optional . economic theory rarely gi \'e~ any reason to believe \hal the errors are homoskedastic. 10 the Ii~t of lea~1 squarcs assumption '!..ilh OIher I C ~I~ it might he helpful to TWtt: Ihm 5(lme le\I' booh add homO$ledo)hCII). \.ticityonly and he tc roskedastidlyrobusl standard errors are the sa me.and we can think of noneit mak es sense to treat the error term in Ihi. and has a nonnal sampling dislributhln lI n casc: Ihis book .·. hcteroskedasticity arise!> III many econometric applicalions. ho.t he n. so it is up to the user to s pecify the option of heteroskedaSticityrobust standard errors.. Thus.:Iri anec th at is in versely proportional to n.. howeveJ. It therefore is prudent to aSs ume that the errors mi ght be heteroskedastic unl ess you hu ve com pelli ng reasons to believe o therwise. it is usefu l to im agine compu ting both .21) for men. For tlistorical reasons.nd is not used in I.s hcteroskedastic.
lhen the OLS est'im ator has the sma llest variance. s The Theoretico! Foundations of Ordinory leosl Squores m In . . it is BL UE.. under certain conditions the OLS esti· mator is more efficient than some other candidate estimators.. the esti mcllor lJl is conditionally un biased if E(~ ll Xl··· · . 5 as lonF. ' a'l can depend on Xl •. YII" This section explains and discusses this result. . then the O LS estim ator has the smallest variance of aU conditionally unhiased estimators that are linear functi ons of Y\ .25) hold~ (it is .. if the least sq uare:. that the sample average Y is the most efficienl discus ty only choos· robust 'robust )le oneS usc the estim aLOr of the population mea n among the cla ss of all estimators Ihm ure un bi ased and are linear fun ction s (weighted averages) of Y l •• ••• Y". . conditional on X l" .. :ityonly :!oplion plcment you usC. . summ ari zed in Key Con cept 3.= 1 (5. l11e class of linea r conditionally unbiased es timators con sists of all estimators o f f3J that are linear fu nctions of Yl .· ..tributiofl where lhe weights 01 . Tn other words. among aU estjmators in the class of li near conditionally unbiased esti mators.. if ~l is a lin ear estimator.3. given Xl' .r) (). III <ldd ition.s.. which is a consequence or th e Gauss·Markov thcorem. Specific<l lIy. the OLS estimator is the Besl Linear conditio nally Unbi ased E stimatorthat is.• X"..24) . .3 ) hold and if the error is homoskedaSlic. Linear Conditionally Unbiased Estimators and the GaussMarkov Theorem If the three least squares assum ptions (Key Concept 4. • Yn and that are unbiased.lbe section concludes wit h a discussio n of alternative esti mators that are more efficien t than OLS whe n lhe conditions of the GaussMarkov theorem do no t hold..l l~ ~l /31 ([3J is conditionally unbiased). assumptions hold and if the errors are homoskedastic. X".. jobs :lei in : con n th is i<:es in I gives lent 10 )ellin g 167 when the sa mple size is large. Xn) = t somf. This result extends to regression the result . Yw'rlle estima tor ~ I is conditionall y unbiased if th e meiln of its condition al sam pling dislribut ion.. is P l' That is._) d. X"..as a van . ((31 is linear). however. That is..24) (it is linear) and if Equation (5. conditional on Xl. •• ·. then it can be wrillen as (31 = fO iY. X" bul not on Y1..as The esti mator (3\ is a linear conditionally unbi<lsed estimator if it c<ln be writ ten in the for m o f Equation (5. . yrobus t Linear conditionally unbiased estimators..
Conscqucntlr. if the e rror term is hete roskedasticas it oCi e n is in economic a ppiica lion'ithen the OLS estima to r is no longer BLUE. ll1e Ga ussMarkov theorem i!ro sta lt': d in Ke y Conce pt 5. The GaussMarkoy theorem. called the weighted ICilst squa res estinHII OI. I ' discus$ed be low. 111e Ga uss·Markov The nrem stales Ihat.5 If the thrce least squares assumptions in Key Conccpt 4.• .'it ic.. the OLS estimator is BLUE. . then the OLS estimator ~I is the Best (mo:. X n • of all lincar condit ion ally unbiased estimatOrs of fil: that is. l condi tionally unhiased). The GaussMarkov theon. First.3 hold alld if errors are homoskeda. these othe r estima to!' .168 CH APTE R 5 linear Regr~sion with One RegrElS50r KEVCONcm THE GAussMARKOV THEOREM FOR ~1 5. which a re stated in Appendix 5. the OLS estim ator /3 1 has th e s m{lllt:~t cond it ional variance. A n alte rnative to OLS whcn lht:fe b het· eroskedasticity of a known form .:m provides a theoretical justification fOT using OLS.m~ implied hy T he three le a~ t squares assumptions plus the assumption that the e rrors nre ho moskedastie. the theore m has t\\O important lim itations. I he GaussMarkO\ conditio ns. if the three least sq ua res assu mptions ho ld and the e rro r~ ilre homoskedtlstic. Limitations ofthe Gauss Morkoy theorem .2. Regression Estimators Other Than OLS Unde r certain conditions. its conditions might not hold in practice.5 amJ prove n in A ppe ndix 5. the re are other cand ida te estimators tha t a re not lin ear a nd conditionally unbiased. It is shown in A ppend ix 5. of Ihe theorem hold. undtt a se t of conditi ons known as lhe GaussMarkov condi tions.. given XI •. some regression estimators arc mo re effici c nt than 01> l . As discussed in SecLion SA the prt':\encc o f helc roskedasticity does not pose a th reat to infe rence hased on hcte ros k ~d. but it does mea n that O LS is no lo nger th e efficien t lin car condition<l lly unbi ased estimator. The second limitation of the Ca ussMarkov theorem is that e ve n if the Clm" diti on:.2. In part icular.t efficient) Linear con ditionally Unbiased Estimator (is BLUE). under some condi tions.ls" ticit yrobust standard e rrors. Howeve r.2 tha t tilt: O LS estim ato r is li n ear and condi tionally unbiased.Ir~' mo rc efficient tha n OLS.. then OLS is HLU E.
how e\'er.:' lI Y. so OlS. is uncommon in applications. If extrt!me outliers are not rare. . ca lled weighted least squares (WLS). except th at the absolute value of the prediction "mistak e" is used instead of its sq uare.jnear con DalOr is lin IhaLunder 's(imator /31 :mditionalh . the exact distribution of the Istatistic is compli cated and depends on the unknown popul ation distribution of the data. in II •5.6) . is het S estimator.6 Using the IStonsnc in Regression VVhen the Sample Size Is Small 169 The weighted least squares estimator. This method.6 Using the tStatistic in Regression When the Sample Size Is Small When the sample size is small. weights the i lh observation by the inverse of the square root of the condi tional va riance of u. lerrors are . If. when a pplied to the weighted data. the least absolute devi a tions estima to rs o f ~. In many economic data sets. then ot her estimators can be more eHicient than OLS and can produce inferences that are more reliabl e. ~das ti c. • . then OLS is nO longer BLUE. As discussed in Section 4.ussMarko v east squares bnsequentl\'.bo Pu aod PI are the values of ho and bl tha t minimize blXil. severe outliers in II are rare. is BLUE.5. then rep! 5_5 and The least absolute deviations estimator. the Ihree leasl squares assumplions ho ld . then the OLS if the CO Il nal are nol lin Yell estimator" tlrC' "111is secli()n is oplional lind is nO! used in la\cr c hapICrs. If the nature of the hClcroskedastic is knownspecifi ca lly. and Ihe regression errors a re normally distributed .then il is possible 10 construct an estimator that has a small er variance than Ute OLS estimator. depends on Xi . i~ than i~ OL$. Although theoretica ll y elegant. 3. given X i' Because of th is weighting. One such estimator is the least absolute deviations (LAD) esti mator. the e rrors in this weighted regression arc homoskelias tjc. if the conditional variance of u j given X j is kn own up to a constant fac tor of proportionahly.re ho moskedastic. or other estimators with reduced sensitivity 10 outli ers. In practice. the practical problem with weigh ted least squares is that you must know bow the cond itional variance of II. That is.then the presence Ilcte roskedas :f the efficient 'n there. 'alions. the OLS estimator can be se n ~ iti ve to outliers. in which the regreSSion coefficients (3() and {3. Thus Ihe tre<ltment of lin ear regression throughout the n:mainder of this text focuse s exclusively on least squares met hods. If the errors arc heteroskedastic. are obtained by solving a min im iza tion like that in E quation (4.something that is rarely known in applications. the regressio n e rro rs a. so use of the LAD estimator. this estima tor is less sensi tive to large outliers kov theorem )rcm has tWO In particular.
.• 170 CHAPTER 5 Linear Regr~sion with One Regres5Ol'" es tima tor is normally dislribuled and the bomoskedasticityon ly Istatistic has a SlUden! I distr ibution.'c assumptions.i'~ o f the result Ihal..21. conditio nal on X l. if the h~V populatio n d is tributio ns are norma l with the same varia nce a nd if the Islat lslir i~ constructed U Sing the pooled standard e rror form ula IEqutl tion (3.2 degrees of freedom. hUlian wi th 1M degrees of freedom... ' X".u) h. \\be n X is binary. Be catl~c a wc i ~hl ed a verage ot inde pendent normal random varia bles is no r~ a l ly distrit'l 0'1 uled. Y has a norma l distri bution..:d in Eq ua tion (5. th e weigh IS de pend o n XI " .tble with a s tn ndard no rmal distri bution. the (normalized) homo~kcdas ticityonly variance eSlimtllOr has it c hisquared d istribution with 1/ . Xn .. cussed in Section 5...t of testing (o r the eq ualit y of the mea ns in two samples.I.10).5.j1.2 degrees of fr eedom.'~ Exerci se 5.5 in the cont!. The homoskedas lidt yonly {statistic testing fJr ~ f3 10 is I = (~l . Under the nul\ hypothesis. . divided by n . and iT3r and PI arc indcpcndenlly distributed . These fj. A ') di. if the hOllloskcdaslic normal regression assumptiom hold....22).~ sio n ilssllm p li on ~ .. the homoskcdaslicityonly r· stati~tic hu~ a Stude nt I distribution with n .23) ). It fo llows th at the resull of Section 3. X". the Istatistic computed usi ng the homoskedasticityon ly standard emIr can be wrille n in this form . Use of the Student t Distribution in Practice If the regress ion c rror~ arc ho moskcdastic a nd norma lly dist ributed and if Iht' homoskcdastici tyon ly lsUltist ic is used .IIl!..). Under the homoskedastic norm al regr.'!1 the homoskedasticit yo nly regression rSI::J lislic has a Studenl {di stribut ion (:. In that proble m. The t·Statistic and the Student t Distribution Rccull from Section 2. where Z is a rnndom vurt.. thc OLS estimator is a weighted a verage of YI. lhal the e rrors aT C homoskedastic. the n critical values should be taken frtHII ' I . conditional on Xl' .a re collecti vely called th e homoskedastic normal regre. thcn the (pooled) Ista tistic has a Student {d iSlribmion.32) in Appe ndix 5. the horn os ktd a~ ticityo nly standard erro r fo r P t simplifies to the JXlolcd sta ndard error formul a {or tbe diffe re nce o f means.:s s ion ass umptions.111US (/31 . P I has a norm<ll dis lrihulio n.the th ree least sq uares a~!\u mp · lions...4 lhul the Sludell( ( distribution with III degrees of freedom is defi ned to bt' the distrihution o f Z / v'W I m .. This result is closely related to a result discusS('d in Secliun 3. lV is a random variable with a chisqua red dbtrj .P I n) h~I" a normal distributio n under Ihe null hypothcsis. Conseq ue ntly.5 is a speci(ll . and that the errors are normally distrib· uted. In addi· tion. cond itional on XI "" ..(3 1. and Z and Ware independent. • X" Isee Equalion (5.' wh ere i ~ uefint.2. Yn • whcu.
1 Conclusion 171 las a Lmp . TIle coefficie nt on the slUde ntteache r ratio is statistically significnntly d if terent from 0 at the 5% significance level. Ye t. Bu t does thi s mL'"an thilt red ucing the student..1 and 5.that is.tribution of test scores to approximatel y the 60 th pe rcenti le. in fact. A 95 % confidence in terval for f3) is . increase sco res? )) Iii~" t"!gres A. showed t ha t there was a nt' g<lti vc relati onshi p between er classes have higher the slUdent. .30 :s. highcr test scorcs.'fll e population coefficient mig.e null I error 5. nlls represents considerable progress toward answering the supcrintendent's question. a pproximately 0. Thcre is a negative relat ions hip between th ~ student. a n<lgging concern. and coufidence intervals. In econometric application s.Q) has Inaddi . however.6 points higher.uib 0 115. Howeve r. th e proba bi lit y of doing so (and of obl<lining a [sta tistic on f3! as large as we did) purely b)' ran dom variation over potentia l samples is exceed ingly ~m all . on average.teache r T atio and teS1 scores.ligible if 11 is moderate or large. (3) 5 .quared j ~L are istic has context the t WO atis!iC is then the oskcdas for mula ~da l case old.teacher ra lio and test scores: Districts wilh srnaU tes t scores.. O!dom : wi th diSl ri the St udent I distri bu tion (Appendix Table 2) instead of the standa rd nor mal dis trihut iOll.lion (sec find if tlW ~ken tronl • . this disti nct ion is relevant only if the sample size is small .robust standard errors.7 Conclusion Re turn for a momenl to the problem Ihat started Cha pler 4: the su p"ri ntende nl who is considering hiri ng addiLionalteachers 10 cut the st udentteacher mt io.S dis . there is rarely a reason to bel ieve that the errors are homoskedastic and normally distri buted . and we might simply have estimated OUT nega tive coerricien t by random sampling \'ariation.(. t e ~t scores that arc ·1. Because ~amp l e sizes typically are large.1. inference can proceed as dcscrihcd in Sect ions 5.2. and thcn using tht! standard nonu<ll distri bution !O compute p values. in a practical sense: Districts with 2 fewer students per teacher have. on av era ge.ht be 0. Because the difference between the Student I distribution and the nor mal distributiun is neg. based on the 420 observations for IY9R in the Cali fo rnia test :.26. The coefficie nt is mode rately large. where ·ecaust distrih 1 t.S.3. then . h)'pothcsis tests. TIli s corresponds (0 moving a district at the 50 110 percentile of the dil.:ore data st't. What have we learned that she might fi nd useful? Our regression analysis. re mains.00 ] %. but is {his relationship necessarily the causal one tha t th e superi ntendent needs 10 make her decision ? Districts with lower ~ tudent t e<lc hcr ratios have. by fi rst computing heteroskedaslicity.teacher ratio will.
. including better facilities.i. For example. California has a large immigrant community. H ypothesis testing fo r regression coefficiems is analogous to hypothesis testmg C or the population mea.nce int erval for the popul ation mt':all.or "o mi th. in fact.112 CHAPTER 5 linear Regre~on with One Regressor There is. Moreo\Cr. Like a confid e.96 stand ard errors.:' ses about the difference between the population mea ns of th e "X . These o ther faclors. we need a method that will nllow us to isolate the effcct on tt::~ t scores of changi ng Iht:: stu dent. . it could be misleading: C ha nging the student. students at wealthier schno ls tend themselves tn come [rom more affluent rami lit:s. lht:: to pic o f Chaplc r 7 and 8. reason to worry tha t it might not. in millly cases.0" group and the "X = 1" gro up.n : Use th e (s lat i ~ tic to calcula te th e pva lu es and either "crept or rej eellht': null hypothesis. and betterpaid teachers. That method is mUl ti ple regression analysis. 2.teac he r ratio alone would nOl cha nge these other factors Ihat determine a child 's pe rformance al school.." could mea n that the O LS anllly<.teacher ra tio is a consequence orlu rge cl as~t:~ being fouml in conjunction with many other fa ctors that are. Summary I. their children are not native English speakers. aft ~t al l. a 95% confidence interval fo r a regression coefficien t is computed a~ th~ estimator ± 1. these immigrants lend to be poorer than Ihe ovenlil population and .:d va ria hies. holding Ihese other /oclOrJ COlfSlal/(. in faci.. B ut students at wealth ier schools also have ot her advantages ove r their poorer neigh bors.costs money. Hiring mare teachers.lndeed. and thus have othe r advantage!> not directly aSSOCiated with their school.0 estimate and test hypolh. Ihe real came of the lower tes t swres. To address this problem.Ieac her ratio.so wealthier school di!itricLS can hetter a(ford smaller classes. done so far has linle va lue 10 the supe rintendenl. newer books. Ihe regression model can be used 1. When X is hi nary. It thus migh t be that Oll r negative estimated re lationship between test scores and the student.
( 158) coefficient o n D. fa mi :>1. fac tors Key Terms null hypothesis (150) twosided alternati ve hypothesis (150) standanJ e rror of (151) Ista listic (lS I) pya lue ( 151 ) confidence interval for f31 (156) confidence level (JSh) indicator variable (158) du mmy vari able (158) coefficient multiplying variable 0. The diffe rence between the Studelll l distribut ion and the normal distribution is negligible if the sample size is mod erate or large. For end to 3. as a resu lt of the GaussMa rk ov theorem. A specia l case is when the error is homoskedas lic.statistic computed using homoskedasticityonly standard errors h as a Student I distribution whe n the null hypothesis is true.lX. 4. = x} is constant. (15S) he te ros kedasTicil y and homoskedast icily bomoskedaSlici (yonly stand ard errors P. that is. S. = . I f the three least squares assumption hold and if t he regression errors are Ire not onship classes I cause r nal~sis ~'e need It he stu ~u 1t iple ~ adlOg: horooskedastic. Homoskcdasticityon ly standard errors do not produce valid statistical inferences when the errors are helerosk edastic. the variance o( III at a given va lue of X" var(uilX. the n.and if th e regression errors are normally distributed.1") depends on x. (163) be lcroskedasticityrobust standa l"d crro r (164) best linear unbiased estimator (BLU E) (168) GaussMarkov theorem ( 168) weight ed least squares (169) homoskedastic normal regression assum ptions (170) GaussMarko v conditions ( 182) Is testing either D putatioO cd as the rd IhypothC ~roup (lnd (160) • .Key Term$ '73 I. then the OLS .fl" f Bu! )cigh. ~ovc r. If the th ree le aST sq uare s assumption s hold. the OLS estima tor is BLUE. but hetc roskedasticityrobusl standard e rrors do. In general Ihe error Ui is hcteroskedasticthat is. if the regression e rro rs are homoskedas lic. vaT(l/ .
Wh~ 1 arc the dependellt and indcpem.0.ulue of a twosided lest of Ho: f3 1 "'" 0 in ~ n:grcssion model using nn i. determine whe ther ...2 Explain how you could use a regression model 10 estimate the wage gentkr gap using the data on earn ings o f me n a nd wome n.4) (2 .. al ue for the twosided test of the null hypothesis Ho: /3 1 = 5. Outline the pru· cedures for comput ing t he p ·\. ...lk workers and 280 fe male worke rs. Calcula te the p va lue for the twosided tesl of the null hypothesis Ho.nl for f3n.i. " 5..5. i = 1. Without doing any additional calculat ions.el? c.12 x Ma le.52 + 2. Do you reject the null hypothesis a t the 5% le . l 174 CHAPHR S linear Regr~ion with One Regre~sor Review the Concepts 5.i = 1.1 Suppose that a rescarcner. 5. sel of obscrvitlions ( Y X . My = 0 using an i. T~!ifSCO'~ .leol variables? 5.d ...2. set of observations Y.23) (0.4 .. (20..6 is cont ained in the 95 % confidence inle rval for fJl' d.06.11..idty and heferosker/aslicily.3 Define hQmosketlol'. II. Construct a 95% confide nce interval fo r 131' the regression li lope cod fici e nt. estimates tbe OLS regression.. using data on da ~ size (CS) and average te(t sco res from IOU lhird grade cJas::. "Wag(: = 12. n. = O.0.36) = 4.).08. SF.d. and expl ain you reasoning.. .R .1 O ut line the proced ures fo r computing the p value o f a t wosided lest 0 1 H rj. b. Construct n 99 % confide nce inter.. eSlimales tbe OLS regression. .2 Suppose that a researcher.81  x CS.el? At the 1% le . /3. P rovide a hypothetical empirical example in which you think the errors woul d be heteroskeda"l ic.\·ER (.5. Ca lcula te the p ·.520.. .. using wage data on 250 randomly selected m.21) II. Exercises 5.es. R7 .5. R~ = O.i.
but regresses Wages on Fema lt!.R1. d. 5.ssion esti mates calculated from this regressio n?  Wage  + .5 inches over the course of a year.) l\ :c:r m' c. How much is this worker's ave rage hourly earnings expected 10 increase? • . is measured in inches. What are the regre.f " where Woge is measured in SJhour and Male is a binary variable that is equal to I if the person is a male and 0 if the person is a fema le.SER (2.31) = r f  lO. A randomly selected 30yearold worker reports an education level of Hi years.23) to answer the followi ng. . a variable th at is equal to I if the person is fema le and 0 if the person a male.4. SER ~ _ _ _. Use the regression reported in Equation (5. What is the es timated gender gap? h. 5. A high s< :hool graduate ( 12 yea rs of education) is contemplating going to a communi ty co llege fo r a two· year degree.Exercises 175 .val uc for testing the null hypoth esis that there is no gender gap. a. a. What is the worker's expected ave rage hourly earnings'! id mall! b. Anot her researcher uses Ihese same dat a.2. Is Ihe es timated gender gap significant ly differen t (rom zero? (Com· pute the p .41 + 3. A man has a JUIl. R . Define the wage gender gap as the difference in mean ea rnings between men and women.99.15) (0. A regression af weigh t on height yields ~ ~ .: grllwth spurt and grows 1. [ I whe re Weighl is measured in pounds and H eigh. == x Female. what is the mean wage of women? Of men'! cal r' test c.4 Read the box "The Economic Value of a Year of Education: Heteroskedas· licil y or H a moskedaSlicity?" in Section 5. In the sample. CoO$lfuct a 99% confidence interval for the person 's weight gain.3 Suppose that a random sample of 200 twentyyear·old men is selecled [rom a population and their heights and we ight s ilrc recorded. Construct a 95% confidence interval for the gender gap. Rl = 0.94 x Height.
ln! sample of size 11 .5 10 the 198Os. college gradu <lteS ea rn $10 per hour more than high school gradua tes.9 (1. A regression of Tes(score on Smal/Class y ie ld~ ~ "" 918. the s tandardized tests have a mean score of 925 poi nts and a standard deviatio n of 75 points. 3.5(c)? Expla in.5) 0.7 Suppose tbat (Yo Xj) sa tisfy the assumptions in Ke y Concept 4. Suppose tha t lhe rcgres· sio n errors were ho moskedasLic: Would th is affect the v<:alidity of the conf ide nce interval con5IJucied in Exercise 5.ression evidence? What range or values i<. on <:average.5) a. Is the estima ted e He ct of class size o n test scores statlsticaUy s ignifi ca ot? Carry oU( a test at the 5% level. R2 = (3. S£(bl) was computed usi ng Equation (5.SER = 74. .oullse lo r [e lls a student that.6 Refer 10 the regression described in Exercise 5..3. Let SmollCltlSs denote .. b.sion with One Regreuor c.~s on test score.2. in (be populatio n.01. b. Is Ihis state· ment consistent with the reg.250 is drawn and yields Y== 5. Co Construct a 99% confidence interval for the effect of SmaIlCla.176 CHAPTUI: 5 linear Regres.26.stic ! Expla in . Do small classes improve les t scores? By how much? Is the effect la rge? Explain. (Regular c lasses con ta ined approx imately 24 students a nd small classes contained approximately 15 students.ll (1.3). and given s ta ndardized tests at the e nd of the year. 5. (2.4 + 3. A hi gh school (.6. A rand. Tennessee conducted an experiment in which kindergarten stu dents were r ando mly assigned LO " regular" and "small " classes. 5. Do )'OU toinl:: th at the regression errors plausibly are homoskeda.6) X SmailClas~) R 2 = O.0 + 13.I binary variable equal to l if the stude nt is assigned to a small class a nd equal to 0 ot herwise.2X.5. consisteot with lhe regression evidence? 5.) Su ppose Ihal . SER = 6.
H J: {31 :f.) satisfy the a!isumptions in Key Conce pl 4. Test Ho: {31 = 55 \'s.. in addi I' tion.8 S u p po~e {hat (Y.54 . + U i . Suppose thai Yi and lues is 'ten st U Xi are independent and many samples o f size L d given approx :udents. ::0 f3 X. b.2) (7. Test Ho: P I = 0 vs.3. SER = 1. . (10.9 Consider the regression model skedast ic? Y..Y b .. . and X. Construct a 95% confide nce interval for f3o. In what fraction of the samples would Ho from (a) be rejected? [n what fr action of samples would me value {3. where Y and X are the sample means of Yi and X i' respecti vely. II i is N( O.Exercises 177 gradu .3 and. + 11 . Suppose you learned th at Y. a~ ) and is independent of X A sample of size n = 30 yields 5.. = f3 0 + {3 IX. Construct a 95% confidence interval for {31 c. y = 43 .52. Y".3. R2 = 0. for rhe regressi on coeffic ients. Show tllal l3 is conditionally unbi ased . we re independent. H I: (3 ) > 55 at the 5% level. d. . '" 0 al lhe 5% level. ) score of n = 250 are drawn. a.10 Let Xi denote a bin ary va riable and consider the regression Y. X . . 0 be included in th e confi dence interval from (b)? denote a Lnd equal I 5. Le i Yodenote th e sam ple mea n for observa tion s with X == 0 and ~ P • . c.tate 9.. ffeet signifi : iass on 5. Show lhal {J is a linear function of Y l. regressions estimated. he r egres . h.5X.4) where the numbers in pa ren theses are th e homoskedast tco nly standard erron. lilY of the . . and (a) and (Il) answered. H l : J3. Let denote an estimator of f3 that is constr ucted as {j = .55 at the 5% level. Test Ho: f31 = 55 vs. A ran dom where u. b..2 + 61. and Xi satisfy th e assumpt ions in Key Concep t 4. 5. Would yo u be surprised ? Explai n. a.
c $51 . = J. + II . is N(O.Y". i = P. Write the rcgre ~sion [Of men as Ym.10 and s..r + Il. denotes earn ings. IT~ ) a nd is independe nt of X.• 118 CHAPTER 5 lineor Regression with One Regreuof de note the sample mea n for obse r va tio ns wilh X = J. X... = x) is consta nt ? d. .28) in A ppe ndix 5.. = 13 1 womr. Lt:t WOmell dcnOie nn indie<t lOr vari able that i s eq ua l 10 J for women and 0 for men. )2) is $1).) ~a ti s ty the Ga ussMarkov conJi lions given in Equation (5.1.. To p~ specific. wbere (1I i' X. Womell. II. a.3? 5.1. + 1/..10. derive the variance of ticilYgiven in Equa tio n (5..3 1). I X.. b. How would your a nswers to (aJ and (b) change if you assumed an i) that ( Yj • Xi) sa tisfi ed the assumprio ns in Key Concept 4. = }l1  Yo 5. <lnJ the independe nt samples are for men and wo men.) sa ti sfy the assumptions in Key Concept 4. U Sta rt ing from E quation (4. (Y "..1. = D e rive the least squares estimator of {3 and show that it is a linear fU lictio n of Ylo." + /(""1' a nd tbe regression for WOnll!n as )' = {J..:n The sample 3\'e rage of me n's weekly earnings <V".. H.X. {3X.be assumptio ns io Key Concepl 4. Show that AI ~ ~J AI + ~ I = YI.. The corresponding values for women are Yo. Show that the estima tor is conditionally unbiased. denotes yea rs o f schooling... Prove thaI the estima tor is BLUE.andll. Yn . suppose that Y. Find the OLS estimates of f30 and {31 And thei r correspond ing sta ndard errors. d. in addl' tion .o + {J.. 5.:: $485. S.. ~~:1 y~ $523.U A random sample of workers co nt ains n.1 0. and the sample st:mdard deviation (sm = L:=. VA {Jo + /3. IX .d' Let ~m I denote lhe QLS cstimatorconSlructcu using .15 A researcher has two indepe nde nl s amples of o bservations on ( Y.. == 120 me n and " ".13 Suppose that ( Y" X. Js {J l conditi onall y unbiased? b..o + Pm..3 and.3 and " va r(II/IX.14 Suppose that Y. a nd s uppose that 311 251 observat ions a re used in the regression Y... Is {J l the best linea r co ndit io nally unbiased estimator o r P I? c. ". is run....22). c. How \\ould your answers to (a) a nd (b) c hange if you assumed only that (Y Xi) satisfi ed t. P ounder homos kedn& 5.). De rivc the conditio nal variance of the estimator.
5%.2 Using the data set TeachingRatings describe d in Empirical Exercise 4.]) and SE(j3."J  p".umed pt4. or 1% significance leve l? Whal ..statistic'! I I? .. or 1% signifi canct' leve l? What is the p value associated with coeUicien t's . l is given by SE(P... the sample of men .10.~ .3 and sumed only )' Marko v condi· is a linear un E5. stat istic'! ::.)1) is $68. Construct a 95% con fi dence interva l for the slope coe fficient. Show (hat the sta ndard e rror of ~"'.. Run Ihe regressio n using data o nly o n fema les a nd repeat (b).s thc pvalue associa red with coefficie nt's .. the data set ConegeDistance described in Empi rical Exercise 4. e.) E5. t·.3.. run a regression of COI/I'se_Eva{ on Beatay.Empirical Exercises 179 lat flo = Y o· 131 women.. Construct a 95% confidence interval for the slope coefHdent.l Using lhe data set C PS04 described in Empirical Exercise 4.. Re peat (a) using only the data for high school grad ua tes. can you reject the null hypot hesis Ho: {31 = 0 versus a two sided ahe rnative a t the 10%.1O . and SE(h". can you reject the null hypothc sis Ho: 131 = 0 versus a twosided a llernativc a t the 10%. run a regres sion of ave rage ho urly ea rnings (AH£) o n Age and carry OU I the followi ng exercises. c..· b.5%. :onstructcd using b. X .l. (Hillt: See Exercise 5. • $51.To hc Df schooling. Is the estimated regression slope coe fficient statistically significant? That is..1denote the aLS estimator constructed from the sam ple of women. . or 1% significance level? Wha t is the p value associa ted with coefficient's Istal istic? o n (Y.ression Yj = J31and their Empirical Exercises ES..en and 0 for .2. Is Ihe effcci of age o n ea rnings diffe rc nt fo r high school graduates (han for college graduates? Explai n. and. d..3 Usin g. homosked<lS a.). Repeat (a) usin g o nly the data for college graduates. can you reject the null hypo lhesis Ho: 131 = 0 vcrsus a two· sided alte rnativc at the 10%.) ~ v'ISE(~m. 13.1 .. T a regression of years of completed e ducation (ED) on dist ance to th e near es! college (Dist) and carry out the fo llowing exc rcise~ u. Is the estimated regression slope coef(icien t statisticall y significant ? That is.I ) denote the corresponding standard errors. ) J' + ISE(~. Is the estimated regression slope coe((icient sta tistically significa nt? Thai is.1.. in add. anJ l e regressio n [Of )r women as Y.. ~ $52 3. • .. Let .5%.1 5. ~.. )I'.
defi ned in Equation (5.in Equation (5.. The consistency of heterosk~dasticlly .jliil ~ " 11 2/_ 1 OJ. e.21) b)' Ihl'.3.) APPENDIX 5. The ~Stimator of the variance of 1 ~ .e 1\\0 da rd errors is discussed in Sect ion 17.X )lil!.robust " standilrd errQrs..) m Equllt ion (4 . Thc~ :Ire first prese nl~d unde r the least squ<tres assum ptions in Key ('.· tty: these are the "he leroskedasticity. L:'.it h a modification llle variance in the nume rator of Equation (4.4). ca~e HeteroskedasticityRobust Standard Errors The est imator (. Is the effect of distance on completed years of educat ion different for men lhan for women? ( Hint: See Exercise 5.2 (instead of n) i ncorpcmlte~ a degreesoffreedo m adj ustment to correct ". yields "i.J. here the dIvisor n . The variance in the denominator is estimated b~ ~ L..3.15.2 1) is estimated by ~. v. . for downward bias.I(X. R un the regression using data on ly on males a nd repeat (b). varianc/. analogously to the degreesoffreedom adj ust meOl used in the deflm· lion of the S £R in Section 4.3.:'I { XI  xl Replacing \'a rI{Xj  estimatoN.16J . '" it x (l±HfY nj &1 ( ~.J and va r(X.21) by the corresponding sample variances.4) is obwined by replacing the population \3ri ances in Equation (4. which allow for heleroskedasti.onccpt 4..1 Formulas for OLS Standard Errors 111 is appendix d i ~cu ~es the formulas for OLS standard errors. . FormuliLS for tht..180 CHAPTER 5 linear Regression with One R egre1W1" d.robu~1 ~(Iln' iJo is 1 ILx)II. of the OLS esti mators and the associated standa rd errors arc thcn given fo r the special of honlOskedasticity.
2 1) as var[(X.28).0". then var( II. .)] ". and ste ms from replacing population Vi expectations wjth sample averages. 'lbe result in Equation (5.) == (T~ so E{(X.28) skeJaslic aria nee of ca~ To derive Equ.}tioll (5 .  )eclal E[(X.] = 0 (by the first kaS l squares assumpt ion) and where Ihl: final equality follows from the law of iter ated expectatio ns (Sectio n 2. . .X l ) s! I .J. here H.. The standard e rror or Po is SE(Po) = VJ[.Only Standard Errors The homoskeda:<>ticityonly stand ard errors are obta ined by substituting sample means and va riances for the pop uillll00 means and van a nces in Equations (5.28).] = £ (l( X .30) . the fann ulas in Key Concept 4.21 ) and simplifying.IX.J. . The ril~. and by estimating the variance of ui by the square of theSER. HomoskedasticityOnly Variances U nder homoskedastici l'1.dudl ") = E[ICX.tX)II. write the numera tor in Eq uation (4.I X. It to COffect Homoskedasticity.(f I~ ~:_l XfJX. The homoskedasticityonly estima tors of these variances . . L(X.d u:l = £[( X.timated by )'1 these \\\'0 'robust stan' _.27) if. L(X.tx) 2vu r(u.) = fT~. given X j is a eon"lant: "ar(lI. (T~ E[(X. o 11fT).) )..tX) II./oIX)ll.Xl' " (5.27) follo ws lation vari cation.. If u.. . If the errors ar~' homoskedastic.2o ) ij2 _ ~.J. • .27) and (5. . A simila r calc ulation yields Equation (5. .JLX)2j by s ubsti tul ing this expression into t. = I . IX.t.h~ = CT~CT1.27).rl = £[(X.. .' 2 £( X f) 1 (5. ( 1 . where the second equality follo ws Occause £I(X.4 simplify tY't • = ~ and fliT..X) ' • . the condItional vari.J. IX. s~ . (homoskedasticityonly) and (5.29) ..fixJ1var(u.:z 1 Ihe defini .•: ~resent cd '" (5.t . .mcc of to II.3.formulo$ for OLS Standard Errors 181 ni for v.J. The reaso n ing be hind the estimator b the same as behind OJ./oIx)II . (5. is homos kedastic.m: ~. where nume ralOr of Equation (4.. ." L /I '1 U (homoskedasticit'1only) .
We next show tha t the OLS estimator is a linear condit io nally un hia wd estimator. APPENDIX The GaussMarkov Conditions and 5. which is conslant .1( cond ition (iii) is implied by Ihe tellst square~ U!>suml" tions.. plus the addit io nal assumptions thai the the ob~rvations erro~ are homoskedastic. X. Similarly..1 X. notc that E( uju/I X I . has mean olCro. (Assum ption 2).I X I . il fo llows that E(II/II. so condition (il) holds...d.. . an d lhat the e rrnTS !'I re unco rrelated for Jiffe re m observlllions. . and a t. £(uil X.3).. j .\Usc (Xi' Y.1 X I' .).I. respect ively.. This appe ndix begins by s til ting the Gauss·Markov condI tions and showing that they are implied by the three least sq uares c(lOdi lion pl us homoskcdasticity. )C.X~).efficient) condi tionally lifl1!dr unbiased estimator (is BLUE). Xl "'" X... XJ .) . n. IX .. . lind by Alisump lion I. The three condi tions.c E('I. hilS a CODSlant va ria nce. ... the n the OLS estimator is the bcsl (mo~t .. Finally.) arc i.31 ) "* j whe re Ihe condi tions !:Iold for i. XII) "" 0 (ii) va r (u. The homoskedasticityonly SlandaroJ errors are Ih.! X. . i 00 (5 .).I X.) . Assumption 2 also implics Ihm E(II. Because arc i.II/I X. the Ga uss·Mil rkov theo rem siaies Ihal if the GaussMarkm condilion§ hold . where aillhese ~tate mc nts hold coodition:tl1y on all ob scr\.. . . that I' .: square roots of Di. .) = £(II.I X !. and becau~ the errors arc assumed to be homoskcdastic.... The GaussMarkov conditions are implied by the th ree least sq uares assum ptIon) (Kc~ Concept 4.I X I . . To show Ih. j.) . 0 < O" ~ < (iii) £(11. The GaussMarkov Conditions The three Ga uss·Markov conditions are (i) £(u.. by Assum ptio n 2..0.IIJIX" X.i.182 CHAPTER S linear R egreuion wi"" One Regressor where ~ is given in Equatio n ( 4. we turn 10 the proof of the theorem. var(u.) "" ~ .. £(u. .. so condition ( L ii) .) . XI) bec.1'1 )..X~) "" 0 for all i ..i.5.d. X..) = E(II/I X') £(II/I Xj ) for i ? j: bc:cnu<. . .) = 0 for all i.. .Assumplion 3 (nonlero fini te fo urth moments) ensures that 0 < ~ <.var(u. XI " " .O: thuscond ilio n (i) ho lds.~·d Xs (Xl.11. hy Assump tion 2. va r( II. ..2 a Proofofthe GaussMarkov Theorem As discussed in Section 5. stall' thJI u.E(/I.• X..
X) = :l:... X.1 I1...) + PI (L::' lll.) = P I. Because ~1 is conditionally unbiased by tlssumption. a nd collecting terms. the OLS estima tor 0 is 1 11 linear esti mator. .. thc least squares assumptions in Key Concept 4..IQ iY. th us.= ". . ) ~<% . l (X.32) depe nd on X l •. o f Iht' errors.8 d X I• . X )' " . . Substituting Y.'.Xn) = 2 7 _IG . for a ll esti mato rs PI salisfying E q uations (5. (i n . is . ~Ie that Proof of the GaussMarkov Theorem We start by deri ving some faclS Ihal hold [or ail linea rconditionai lly unbiased e stintators tha i IS. = I... i.) = .31) The rl!sult tha t f31is conditionOlUy unbiased was previo usly shown ill Appendix 4.X.!.'.)+ i. Xn) ".e the weigh IS a " i = 1.x. . . 0." linear ndi ro ::E (X. .25). (5. imply the GaussM arkO" conditions in E quation (5. .61 10 be condilio na11y unbiased.33) ::E(X..'..X..Pl i~ conditionally unbiased. .Q.7) yields  ~7 1 (X...(in. ' X R bUI nOI o n YI . ::E (5.BnO:. b ut for thisequa1ity to hold for a ll values of 130 (lnd P I it must be lhe case thaI. Y ix l i where.The GaussMarkO\' Conditions and a Proof01the Gou~~Morkov Theorem 183 'h' holds. = f30 related bsen 'ed + f3. biased Y" ..X. = (X.I . iI X) X) ' " (X.34) \ SSUOlP' ." ..32) I plu!> Beca u:. it mus t be t ha t f30 (L.) +~. + U j into PI = I :..suOlP· Assump' By the fi rst Gauss·Markov cond ition. and the \'aria nce of the conditional distribution of ~T' give n Xl'" .: \ (5. 0. " "" La.X) = O(by Ihe definitio n o(.. . bccau~c il ion (iIi) " La. X) " ~ '" (5. • X. a.::. E(27 . (. plus homoskcdaslicil). (or .  I"" .YL.3 1).3.35) • .'. flISt note thill. Under the Gauss·Markov conditions.X)(Yi y) = ~:.. because :E~_I (X.X)' . .. .  " (X. . The OLS Estimator (3 .(Xi .34) yi e lds E(.... .. Is a Linear Conditionally Unbiased Estimator 10 show Ihal ~ 1 is linear. II1. .E(ud X1. .24) !lnd (5._I(X. va r(tld X j."<" 1 ' '' 1 II (5. ThUs. ••• .).lIi IX I" .\a. i = I 0 and ~G. ta king condition al expectations of both sides of Equation (5.x. .X) Yi • Substi tuting this result into the formu la for ~l in Equalion (4. wc have thaI ms(Key Becau$t ~.3.. 11 in Eq u!llion (5.) .) + f31(2. '.. X n .. XlV.Y). ::E ~rko\' X) Y .
..... . • • • u ~ • .c: 11 .o .c:..~~ ~ H! . of c ~.c: 0>': ...' " .:..' "0 E o c: Z > ~ . E . 0:.:.~ . .
.' is no regressor.. To see this. are i.I' . a.d" lhen ... + d..i. .) when Vi' .i. Y. ~u m · The Sample Average is the Efficient Linear Estimator of E(Y) An implication of Ine GauJOs· Markov theorem is Iha\ the sample average.. is the most efficient lincar estimator of £( Y . It follo ws that. Y" are i.'C.i..3...' )l"..36) Equa· for Ine iJo = Y.). 11 . ...37) ... .The Gouu'lv\oriov Conditions and a Proof of the GouwMortov Theorem . Thus his result (5.xCCpl [nat • . O. mple 10 the next. X~ " s t nlcmcnlS arc unne. lllis result was stated previously in Key Co nce pt 3.• XII take on the same \ alues from o ne :. so iI {ollow$ that Y is BL UE if YI. Y. Then the OLS estimator (5." so that the only regressor is the constant regre!>SOT X I\' = I. But o nonran· : repealed >sumptio n i.11. Xn ): . 1". consider the case of regression without an " X. 'xceeds \ .d. . Y is BLUE Note thn! Ihe Gauss·Markov requireme ntlh!!t the error be homosl:edastic is irreleva nt in this case because then...o..ssary because X l •.baS 18S all of Ine "condnional on Xl. under the GllussMarkov :lssumptions.d.
A lthough school dislricls with lower studentteacher ralios te nd 10 have higher test scores in the California data set perh aps stude nts from districts with small classes have olher adva ntages that help the m perform weI! on standardi7cd leSIS. Thh c haple r e xplains how to estjrnate the coefficients of the multiple tincar regression model. I Omitted Variable Bias f " By foc using o nl y on tbe student. a nd the re by estimate Ihe e Uect of one regressor (the studenHeache r ralio) while ho ld ing conStant the other varia bles (such as srude ot characte ristics).l nese omitted r.nts attest scores b) col lect ing thei r influences in the regreo.sion error term.teacher rat io. slUdied in C hapte rs 4 and 5. Co uld th is have produced misleading results a nd. The key idea of multiple regression is that. a method lhal can eli minat e omi!led varia ble bias. suc h as stude nt cha racte ristics.CHA PTER 6 Linear Regression with Multiple Regressors C ha Ple r 5 e nded o n a wo rried nOle. biased . if so. wha t can be done? O mitted fac tors.. then we caD include them as addilional regressor. ctors include • . more precisely. Ma ny aspects of muhiple regression parallel those of regression with a single regresso r. TIlis c hapter e xpl:lins this "omitted variable bi::ls " a nd introduces mllhipJe regression. 6. if we have dHta on thest: omitted variables. The coeffici e nts of tbe mult iple regressio n model can be estimated fr om da ta using OLS: the OLS estimators in multiple regression are random variables beca use the y depend on data trom a random sample: (Itld in large samples the sampling distributions of the OLS estim atOrS are approxim :Hely normal.can in fact make The ordjoary least sq ua res (OLS) estima tor of the effect of class size on lest scorc ~ misleading or. the empirical analysis in ChaP tcrs4 and 5 ignored some potentially imponant detennina.
. the OLS esti ma lor of the slope in the r egrcssion of test scores on the s tude nt.ud< been omin ed from the analysis (t he percentage of English learners) and that deter mines. then the OLS estimalo r will have omitled variable bias. To illustrate these conditions. in part. If dis tricls with la rge classes also have many siudents still learning English. 19.lence to this concern. H ere is the reasoning. thell it would be safe 10 igno re E ngIJsh profic ie ncy in the regression of lest scores ag<li nsl Ihl: ::.teache r ratio (larger classes). A ccordi ngly. Ihen the OLS regression of lest scores o n Ihe stude nt . but her hopedfor improvement in test scores will fa il to maleriali ze if the true coefficient is small or zero. We begin by considering a n offiillCd student c haracleristic that is particularly re levan t io Ca lifo rnia beca use of its large immigrant population: the prevalence in the school district of students who a re sli1l learning English . But because the student. the de penden t va riable (tesl scores).l compute r usage.1 Omitted Variable Bios 187 school c haractcrislics. the superinte ndent might hire e nough new teach e rs to reduce the studentteache r ratio by two. By ignori ng the percent age of English le arn ers in the district . A look a t the California dal3 lends crec. The correlation belweeo Ihe SludenHe<lchel r:lIio and the percenlageof E nglish learners (students who a re not na tive English spt:akers and who have not yet mastered English) in (h e district is 0.Iude nttcacher ra1io. and st uden t characteriSlics.even zero. il is possible that the OLS coefficie nl in the regression 01 test scores on the student.Icacher ra Tio could e rroneously lind a correla tio n and produce a large estima ted coefficie nl . col . Stude nts who a re still learni ng Eng li!oh might perform worse on standardized tests than nativc English speakers.teacher ratio. Definition of Omitted Variable Bias II" the regressor (the studen l.Ieacher raljo were unrebte tlt o the pe rcentage of E ngli sh learners.teache r r:uio and the pe rcentage of English learne rs are correlated.Ieacher ratio co uld be bia!1.teacher rat io reflects tha t influe nce. based on lhe analysis of Chapters 4 and 5.teacher ratio) is correia le d wi th a varia ble thai has !+ ~ "p .. consider three examples of variables tha t arc omiltcd from the regression of test scores o n the studentteacher ratio. such as tcache r quality a nc. when in fact the Irue causal effect of cutting class sizes on tesl scores is small.ed. Th is small but posilive correlation suggests that districts with more English learners tcnd to have a highe r studen t. If the swtlent. O mit led variable bias occurs when lwo conditions are true: (1) the o mitted variable is correlated with the included regressor: a nd (2) the o mitted variable is a determinant of tbe depende nt va ria ble.6. such as fam ily background . that islhe mea n of the sampling dislribution of the OLS estima tor might not equ al the tr ue effect on test scores of u unit cha nge in the student.
This variable satisfies the first but no t the secolld co ndition fo r omit ted variable bias. no t the parking lot. For example. so the second condition holds.e r omiued variable is park ing lot space per pupil (the a rea of the teac he r parking I~ I di vided by the n umber of stude nts). Because parking lot space per pupil is no t a det e rm in a nt of test scores. Thus. H owe ver. in which case tbe percen tage o f English learners is a determinant o f test scores and the seCOnd conditio n for o mitted vari able bias holds. Fo r th is o mi tted variabl e. unde r th ~ ass umption that learning takes place in Ihe classroom .3is incorrect.thaI E(u. Speci. Thai is. other than Xj. so the first conditi on wou ld be satisfie d. omitt ing it from the analysis docs not lead to om itte d variable bias. sch ools with more teachers per pupi l probably have mo re teacher parking space. Another variable omille d fro m the analysis is the time of day that (he test was administered.teacher ratio. it is plausible that the first condition for omitted variable bias does not ho ld but the second condition does. CHAPTER 6 linear Regression with Multiple Regressors Because the percentage of English Ic arners is correlated with the stude nt. the OLS estimator in the regrc<. 1. However. To see why.· sion of les t scores 00 the stude ntIeacher ralio could incorrectly refleci the inOu_ ence of the o m illed variable. thus the second condition does nol hold. th en the time of day a nd class size would be uncorrelated so the fir st conditio n does nol ho ld. the lime of day of the test could affect scores (ale rtness varies thro ugh the sc hoo l day) . A no lb. the firs t condition for omiued variable bias holds.teacher ratio could not he incorrectly picking u p the " lime of day" e ffect Thus o mitting the time of day of the Lest does nOI resuh in o miued variahle bias. Omitted variable bios and the first least squares assumption . a5 listed in Key Concept 4.• . because in this example the time that tbe test is administered is uncorrelate d with the studen t. the pe rcentage of English learne rs. Example #1: Percentage ofEnglish learners. recall thaI the error term II. parking lot space has no direct effect on learnin g. O mitted variable bias means that th e firs\l east squares assumpt io n. the slUdent. .) = 0. if the lime of day of the lest varies from o ne districT to the next in a way that is unrelated to class size. It is plausible thai students who are still learning Eng. omitting tbe pe rcentage o f English learners may introduce omitte d va riable bias.teacher ral io. lish will do worse o n standardized tests than native English speakers. tha t arc determinants of Yi .. Example #2: Time ofday of the test. Example #3: Parking lot space per pupil. If one of these other factors is corrclHled with XI· .fically. I X. Omitted variable bias is smrunarized in Key Concept 6. in Ihe linear regressio n mooel with a single regressor re presents a Ufactors.. Con ve rsely.
6. Y. PI . Omitted X) d ). e parking (6.!!.(<J.3>'. and if it is correlated with Xi' Ihen the error lerm is eorrelal. X. This bias does no t van ish even in very large samples. twO conditions must be true: f"" '*''. is nol a consistent estima tor of f31 when there is o mitted vari ahle bias.1 :gres influ lilt ing 1.ITor term (which contains this factor) is correlated with Xi. 1) summarize~ severaJ of the ideas discussed above about omi tted variabl e bias: 1.. Con l ( r:l~ :~: . For omilled variable bias to oceuT.. of day" variable A Formula for Omitted Variable Bias The discussion of the previous section about omitted variable bias can be sum marized mathema tically by a formu la for this bias.1) P . be corr(Xj..(<J .1) is Ihe bias in {. the condit ional mea n of lI.. ThaI is. as tbe sample size inc reases. and the aLS estimator is inconsisten t.. lId riable. and Xi a re correlated . Omitted variable bi as is a problem whether the sample size is large or small.\'I' (J'x' ~ is park : number fo r omit ~bly ha"':: 1. TIlt! term Px. /3 1 '+ f3 1 + P. 2.s from !of day / . Suppose that the second and third least squares asswnplions hold .. KEY CONG'" j 6. .m the this means that the (. Let the correlation be tween Xi and U. then it is in the error te rm.E..other tn:lI1 ed with X" Because P I does n OI converge in probability to the true value (31' ~1 is incon sistent.) = Px. given Xi is nonzero. but . 1 Omitted Variable Bio~ :e of n ror 189 Eng case 'eand O MlnED VARIABLE BIAS IN REGRESSION WITH A SINGLE REGRESSOR Omitted variable bias is the bias in the OLS estim ator that arises when the regres sor. is correlated with an omitted variable. {31 is dose to (31 + Px. but the first does oat because fix" is nonzero.I that persists e ven in large sa mples.1 ) condil io n Int of lest ias.Iff x) with increas ingly high probability.. rauo... The formula in Equation (6./ux) in Equat ion (6. X is correlated with the omitted variable. as r term Ui in . The omitted variable is a determinant o[ the dependent variable. This correlation therefore violates lhe firsllea sl squares assumption.0WCver. if an omilted variable is a determinant of Y. TIlen the OLS esti mator has the limit (derived jn Appendix 6.In o ther words.:d with Xi' Because ii. tha t is.. and the con sequence i~ serious: The OLS estimator is biased.u.
1176) Hnd the one by Lois Hetland (pp.0% . or those schools with . especially the lIr1iclc by Ellen Winner and Mon ica Cooper.lson trolled experiments on the Mozart eff. we specullted that the percentage of $lU' dents learning English has a lI eglllil'e effect on district test scores (students still learning English have lower scores). 111e direction of the bi as in ~ t d epend s on whether X an d 1I Alt<.~ilil'ldy correlated with t he stud entteacher ratio I l'er~entagj <:: 1. ~O. the larger is the bias.. (pp.art. 190 CHAPTER . ever.148). By omitling factors such a~ the s tudent's innate abil ity or the overall quali ty of the school. I99J) suggested that listening to nlUsic courses appears to have omined variable bias. suggests that the rea! fI. the many con What is Ihe evidence for lhe "MOlart effect"? A re view of doz~ns of studies found Ihal students who take optional music or arts courses in high school do in fact have higher E nglish and math test scores than those who don't. Shaw ::Ind Ky. In our data. The larger is IPx. it seems that listening to classical music (Ioes help temporarily in one narrow area: fold ing paper Instead.:ct fail to show thAt listening to MOJ:art improves IQ or general lcsi pt:rformancc."l'am for an origami exam. too. Th. For example.O~ :>23.) Taken together. the 'luthors of the review suggested that the correlat ion between testing well and taking art or music could arise from any number of things. more interest in doing aCross the board. how· e\'er..' A closer look at these Sludies. For rcnsons not fully understood.8.1 deeper music curriculum might juSt bo.ISY way to make their childr\!n smarter.lislne are positivel y or negalively corre late d. For exampk. the fraction of English learne rs is pv. For a while.1I sludy made hig new~ and politicians and parents saw an o.. TABLE 2.R2J .9% 1. So the next lime you !. belween the regressor a nd the error term. so th at the percentage of English learners enters the error term with a negat ive sign.9...8% H. 6 linear Regression with Multiple Regressors A stUd)' published in JIoiarurl! in 1993 (R::Iuseher. try to fit in a lillIe M07. the acade mically b. the state of Geor gia even distributcd classical music CDs!O all infanlS in the state.: bc tter schools In the terminology of reg ression .:ncr students might ha \"c more time LO take optional mu. (As diocussed in Chapter 4. to have <1 n effect on test scores when in fact it has none. 3 . W hether Ihis bias is large o r small in practice depends on the corre lation Px.. Mozan for 1015 min utes could temporarily raise your IO by 8 0l '} points. randomized controlled \. l. how· for the betler lest performance has littk to do with those courses. the estimated relationship betWeen lest score~ and laking optional lSec the jrJ/lTllul of At!Slheric Edlt((Jl'OIl 34: 34 (FaIiMill ler 2(00).art effecr~ One way to find out is to do a randomized controlled experimcnt.105.'xpe riments eliminate om itted variable bias by ran domly assigning participants to " treatmen t" and "control" groups.sic courses or and visualizing: shapes.~. So is therc a M07.$tudying music appear..
72 0.13 denl'> gli~h II} 'ItR"o ion of nl ti Cl r.0.5 665. Ratio ~ StudentT_'20 DiH... Among this subset of dis tricts...0.teacher ralio ~I would be bi. ::.. Htgh 5TR . Districts " " . \ " .04 PCrCtn13gc: of English learners '< I~·ll.3 low".1 " 44 50 51 1. 665.ct.' 7. 7..6.s. so Px" < 0 and the coeffi cient on Ihe student.' ...d T ABU 6 ..nn(U 650. Test $c_ " 182 di. so one reason tha t the O LS estimator suggests that small classes improve lest scores may be that the districts with small classes have fewer English learners.U% .. 23. holding conslUnt other factors. In othe r words.statistic 4. 657. instead of using data for aJI districts.68 634.\ " 238 .7 27 44 .2 654. f SlU ['or 664.. This new way of posing her question sug gests that.9 3. includ ing th~ percentage of English learncn>. but she has no control ove·r the fraclion of immigrants in her community. As a result ....8 6'9.. by the Percentoge of English Learners in the District StvOentT_heI'" ROlio < 20 pr_ the ~s ~II Te st Sc:_ .lscd toward a negative number.. she is interested in the eITect of the studentteache r ral io on test scores.4 .I Omi!!ed Variable Bios I9 T (dist ricts wilh more English learners have larger classes)..teacher ra lia (X ) would ~ lU!g(lfjVe/y correlated with Ihe error term (II). 1 \ Differences in Test Scores for California Sthool Districts with Low and High StudentTeacher Ratios. Thus llle student.~t $c. 636.''"% .teacher ratios.rer>(e in T.4 661. 1 reports evidence on the relatiom hip between class size and test scores with. perhaps we should foc us on districts with percentage!. I. do those with smaller classes do better on standardized tests? Table 6. Addressing Omitted Variable Bias by Dividing the Data into Croups What can you do about omilled vari able bias? O ur su perint endent is considering increasing the num be r of teachers in her distr.. of English learners comparable to hers.in districts with compa r<l ble percentages of English learners. having a small percentage of English lea rners is assoc iated both with high lest scores and low student.' 61 I.30 1...0 .
So.teacher ratios is 664. districts wi th low stude ntteacher ratios had test sco res tha i averaged 3. so (h e null hypoth esis that th e mean test sco re is the same in the two groups is rejected at the 1% significance level.2 points for the third quarLile and only 1.9 points fo r the quartile of districts with the most English lea rners. . Once we hold the J>l!r centage of English learners constant. the average test ::score for those 76 with low student.04 .1 8) as the OLS estima te of the eeefti · ciem on DI in the regression of TesrSco re on Db where Di is a binary regressor thaI equals 1 if STR . th at is. First. the diffe rence in test scores between these two groups without breaking them down further iOlo the quartilcs of E nglish learners.3 poi nts highe r th an those wi th bigh stu dentteacher raLios: this gap was 5.4 points. The final fo ur rows in Table 6. At first this finding might seem puzzli ng. depending on whether the studentteacher ratio is small (S'f R < 20) or large (STR ::! 20).192 CHAPTER 6 L inear Regression with Multiple Regrew>r~ ore d ivided iota ei!!)lt groups. The difference in the average le~1 score hetween districts in the lowest and highest quartile of the percentage of En!!" lisb learnerS is large. The districts with few English learn ers tend 10 have lower student. within each of lhese four categories. the average test Scare is 7. the districts are broken into four ca tegoric~ that correspond to the quartile~ of thc distribution of the perce ntage of English learners across districts.te acher ratio th an a hi gh one .5 and t. How can the overall effect 0 1" te'! scores be twice the cffcct of test scores witllin . for the districts with the fewest E nglish learners.teacher ratios.teacher ratios: 74 % (76 of 103 ) of \be districts Ln the first quartile of English learners have small classes (STR < 20) . Thus.1 report the diffe rence in test sco res between d istricts wit h low and high studentteache r ratios. test scores were on average 0.4 points higher in d istricts with a low student. (Recall that this difCerence was previ ously report ed in regression form in Equation (S.my quarti le? The answer is Ihnt Iht: d ist ri cts with the most English learn e rs tend 10 have borh the h igh~~t student. or the districts wi th the fewes t English learners « 1.teacher ratios alld the lowest test scores.he average for the 27 with bigh student.) Over the full sample of 420 districts.. The firs t row in TabJe 6.9 points lower in the districts with low stu dentl eacher rfllios! II) the second quarti le.)triclS.teac he r ratios is 665. the dist ricts with the most English learners have both lower lest scor~s and higher studentteacher ratios than the other d i. d istricts arc further broken down into two groups. approximately 30 points. < 20 and equals 0 oUlerwise.9%). the difference in performance betW een di~ trie ts with hig h and low studentteacher ratios is perhaps half (or Jess) of the overall estimate of 7.1 reports the overall difference in average te51SCorcs between districts with low and high student. while 01\Iy42% (44 of 10:5) of the djstricts in the quartile with the most English learners have sm311 classes. Second. broken down by the q uartile of the percentage of E nglish learners. the {statistic is 4. Th is evidence presents a diffe rent picture.4.
(6. Eq uatio n (6. or rc for 7 wilh nglish o w stu The Population Regression Line Suppose fo r the mo ment thai the re a re o nly two independent variables.eacher I gh s1U . given that X l.teache r ratio (X]. th e average r elationsbip be twee n these two independent variables a nd the de pendent \'ariable. Xli a nd X 2i . if the student. mo rc simply..) while holdi ng the other regressors (Xi i' X. Slili. the coef ficient f3 .2).2 The Multiple Regreuion Model 193 ~ies i ·lish are he r ores "nee into usl\' £fi r that triets. = Xl a nu X 2. th is analysis does not yet pro vide the super intende nt with a use. given hy the lin ear fUllction I '..ful estimate of the effect on test Scores of changing class size.teacher ra tjo in the jlb district (X li) equals some value x. using the method of mull iple regressio n. O ne or more of the inde penden t variables in the multiple regression model arc some times referred to as control l'ariables. Y. given the studentte acher ratio anu the percenl flge o f English learners is given by Equation (6.t] "t" f3rz. holding constan t the fract ion of E nglish learne rs. the lest score differences in the second part o f Table 6. In the linear multi p!e r egression model.2) b highest ~Tage test ~e of Eng ~isb \caW districts in :only42% ~ave small Itest sco T':S wh cre E( Y.2) is the population regreSSion line o r popuJatioll regressio n runc tio n in the multiple regressio n model. = X 2' That is. 1.6.~ ha n ging onc variable (X t . more simply. the mu ltiple regressio n model provides a way 10 isola(e the effect on lest scores (Y. the coeffici ent on XII' lind the coefficienl Pz is the slope coeffi cient of Xu o r. In the class size problem. = x2) is the condit ional expectation at Y.). Such an estimate can be pro vided .1 improve lipan the simple differenceofmeans analy sis in the firsl line ofl"ahle 6. of . This model permits estimatin g the effect on Y. is.2 The Multiple Regression Model The multiple regression mode l extends the single variable regression mode l o f C hapte rs 4 and 5 to in clude additional variables as regressors.) while holding constfln t lhe percentage of studen ts in the district who are E nglish learners (X 20. a nd the percentage of English learners in the J"th district (Xu) equals X2' then the expected value of Y. is the slope coeffici e nt of Xli or. By look ing with in quartiles of lhe percent age or English learners. l vel.9 points the per reen dis ~) of the ~\ of I CS t thai the E( Y. • . 6. The coefficient f30 is the inte rce pt.) of the stu dent. acher n1l! an This analysis reinforces the superi ntendent's worry that omitted va riable bias is prescnt in t he regression of test scores against the studentteacher ratio. tween tile o f re. IX jj = Xl' X 2i = Xz) = ~o + f3 1. the coeffici ent o n Xu.l X l' = X l ' X z. however.i' and so forth) constan t.
2) is the relat ionship between Yand Xl Rnd X 2 that holds on average in the population. is the effect on Y o f a unit change in X I ' holding Xl co nstant or controlling for Xl ' This interpretation of P! follows from the defi.4) The coefficieot p\ is the effect on Y (the expected change in Y ) of a unit change in Xl' bo lding X'2 rucd . Thus the popul ation regression function in Equation (6. o ther student characteristics. yielding 6.nition that the expected ellen on Yo ( a change in X l' 6.194 CHAPTER 6 Unear R egrcs5ian with Multiple Regres$Or5 The interpretation uf the coeffi ci. + /3zX2 from Equation (6.sion wi th a single regressor.holding X 2 cons taot (6. J USI as in the case o f rcgre<.2) needs to tK: augmented to incorporate these additional factors. the intercept 130 dctt:r mines how fa r up the Y axis the population regressio n line starts. hOldi ng X 2 consta nt. ctoTS Ihal deter .3) An equation for 6 Y in terms of /lX l is obtained by su btracting the eq uation Y = AI + PtX. write the popu lation regression (unelion in Equa tion (6.2). Ano ther phrase used 1 0 describe 131is the plllrtial erred on Yoi Xl' holding X1 fixed . Just as in the case of regression with a single regressor. Accordingly.2) is different than it was when X II was the only regresso r: In E q ua lio n (6. while holding Xl constan!. The Population Multiple Regression Model TIle population regression line in Equatio n (6. (or exam ple.(X I + AX\) + P2X 2• (6.Ieacher ra tio and the fractio n of students still iea rniog E nglish. in Eq uation (6. fJ . that is. is the difference bc tll. say . the new va lue o f Y. een tb e expecled value of Y when the independenl variables take on the values XI + I I and X2 a nd Ihe expec ted value of Y whe n the independent variables take on th!' values X I anu X 2.Y = 1316XL' Th at is.fJo.2) as Y = Po + (3 jX 1 + ~X2 ' and imagine changing X l by the amount .11 change by some amount . After this change. Y w1. Y + 6 Y..3).1 X \ while O Ot changjng X 2.XI . The interpret ation of the intercept in the multiple regression model. and luck. Tn addit ion to the stU dent.1 y. this relationship does not hold exactly because many other (acto rs influe nce tbe depe ndent variable. when X li and Xu are zero. Y = Po + J3 . the f. 131 = :~.enl p.. Because Xl has changed. is y + 6.i s sim ilar to Inc interpretatio n of the intercept in the singl e re gr~ s sor model : It is the e:'Ipected va lue of Y. however. test scores are innuenced by school characteristics. Simply pU L..
tj in the multi ple regression model is homoskedastic if the vari ance of the con dit ional distribution of II.. given Xli"'" . The multiple regressio n model with k regressors.. = (30 + {3\ X Jj e + M 2i + u" i = I.6. This reasoning leads u ~ to consi de r a model with three regressors or.4) aoge cl an r sim is the (leter· ~ ynnd rcs~i o n )cca use . n and thus does no t depend o n the values of Xli' . The multiple regression model holds out the promise of provid ing just what the superintende nt wa nts to know: the effec t of cha nging the student. Xl e where the subscr ipt i indica tes th e ph of the 11 o bserva tions (distric ts) in the sampl e. tltink of f30 as the coefficient on Xu.. more gener ally. (In The variable Xr~ is sometim es called the constant regressor because it takes on the same value. . The definitio ns of ho moskedasl ic ity a nd heleroske dastic ity in Ihe multi ple regression model are extensions of their defi nitions in the singleregressor mode l.IXli •···• X ki ). == f3oXOi + {3\Xli + ~X2' + ui.5) and (6.. a re eq uiva le nt The discussion so fa r has focused on the case of a single addilio na l variable.. n. the populat ion multiple regressio n mode l in Equa lio n (6. a mod el tbat includes k regressors. n. To be • . For example. where XOi == 1 for i = 1. E quation (6. TIle error te rm l. The two waysofwriling Ihe po pula tio n regressio n model. . . . teristi(:S. just as ignoring the frac tion of English learners did. (3(1) is some limes called Ihe const aol term in the regression ...he stu· "ample. + . .2 The Multiple Regres~on oYlodel 19S . Accordingly.. Xl i' X li' " . is constant for i := 1. .. X li and X2r I n regression with binary regressors it can be useful to treat (30 as the coeffi cien( on a regressor tnat always eq uals 1. Eq uatio ns (6. holding constant o the r fac tors that are beyond her con trol. .. Accordingly. .teacher ratio. Similarly..n.5) .\i' var(u.5) can a lte rnatively be wriuen as d.whe reXr~ == l. .the value Ifor all observati o ns.X. incl uding the economic background of the siudenis. we have Y. is summ a rized as K ey Conce pt 6.. XJ:.2. (6. ignoring the stude nts' economic background might result ill omitted variable bias. the re might be multiple factors omitted from the si ngle regressor model.3) ~ i on Y.6). however. Xl' In practice. 5) is the population multiple regressio n model whe n there are two regressors. bIll Olhe r measurable fac tors that might a[fect test pe rformance. These factors incl ude nOI just thc percentage of English learners.2 ) aj. t "e rror" term u i' This error term is {he deviation of a particul ar observation (test scores in the /1h district in our eXample ) from the average population r elationship.Xkr Otherwise. ds to \'lC a l d~ter .i. . the intercept. (6. the error term is beteroskedastic.i = J.6) K6.
• The population regression line is the relationship thai holds between Y and the Ks on average in the population: £(Y !X II =.. that equals I for:lll i... . however. X . 'Th.XI.. Xo.196 CHA PH I 6 lineor Regres!. these coefficient!..Xl l = X2. e.. ... The coefficient fJl is the expected change in Y. Fortunately. is the error term.:. holding constant Xl.ion with Multiple Regre$$OrS I Kf' CONaPT THE MUlTIPLE REGRESSION MODEl The mu1ripk regression model is YI = 6... ... sian model calculated using a sa mple of data.. ·· · .. . we need 10 provide her wit h c~ti m'ltcs of the un known population coefficien ts flu" . II (6.. '~I:of the population regre::. " "..Xtj + Ill ' i = where • ...tit observations on each of the k regressors: and Yj is . + fJ. . are the /I .e interce pt can be thought or as the coefficient on 11 regressor. d '"""fJo + P I Xj + ~1'2 + .7 ) Po+ {JIXU + fJ2Xu + .3 The OLS Estimator in Multiple Regression This section describes how the coefficients of the multiple regression model can be esti mated using OLS. . .. resulting from changing X li by one unit .2 ~ 1. X/w The coefficients on the other Ks afC interpreted similarly. • The intercept fJo is the expected valu~ of Y when all the X s equal O. + fJlrf /i' • f31 is the slope coefficient on X" {32 is the coefficient on X 2• and so on.Xj. 6. of practical help to the superintendent.ln be cstimaled using ordinary least squares. = x.11t observat ion on the dependem variable: XI I' Xu ..
8) is the extension of the sum of the squared mistakes given in Equation (4. =[ " (Y. . X kl' based on Ihe OLS regression line...f3k· The OLS estimators are d e noted 'ou· 131' · .. 0= call e~tima t ors could be computed by trial a nd error.8) are called the ordinary l~as! squar:s (OLS) estimators of /30.. The formulas fo r the OLS estimat ors in the multiple regression model are simil ar to those in Key Con ccpt 4. 13k· The terminology 01" OLS in the linear multiple regression model is the same as in the linear reg.okin the multiple regression model.ression model with a single regressor..6.2 (or the singleregressor mode l..bl Xy. These form ulas are incorporated into mod ern statistical software..1.b1X t. .. + .00'. 2: bo b.es for the linear regression modd in expression (6. The OLS regression line is the strai ght line consLIucted using the OLS estimators: ~o + ~I XI + .. . b. It is far easier.:i) ~ Y. . . . + b0/.l'. .x. T he es timators of Ihe coefficients fio. w their presentation is deferred to Sect ion 1B.. Y. + b~k" and the mis wke in pred icti ng Yi is Yi .8) Tlle sum of the squared mistak.2 shows how to estimate the intercept and s lope coefficients in the singleregressor model by a pplying OLS to a sample of obserntions of Y and X.XI:'. . repe atedly tryin g differen t values of bo.. however. .. the OLS residual is can u. is V. The estimators that d o so are the OLS eslima 1 0rs. + ~kXk' TIle predicted value of Yi given Xli' . . by choosing the est imators bo and b l so as to minimize L~.I (Yi . Lei boob]. . In the mu ltiple regression model . The method of OLS also can be used to estimate t he coefficients /30..'oo and . is bo + b]Xli + .8). . 131.X" . .Yj .. .(bo + b 1X1.. The OLS residual for t he /"Th observation is the dif fere nce between Y.b. . 111e OLS . the formulas are best expressed and di ~ c usse d using matrix not Cltion .'ok that minimize th e sum of squart:d mistakes in expression (6. and its OLS predicted value_ th at is. (6.bo . to use explicit fo r mulas for th e OLS estim ators that are derived using calculus.. . .OJ. .. h k until you are satisfied that you have minimized the total sum of squares in expression (6. .6) fo r the linear regression mode! with a single regressor. /31' . .. .0 1.Tht: sum of these squared prediction mistakes over all II observations thus is .3 The OLS Estimator in Muhiple R:egfe~sion 197 The OLS Estimator Section 4. •. ~ ~o + ~lXl.b o .0. + + ~r.The predicted value o f Y" ca[culated using thcseestimators.ok.tXkt.. lhal is. . The key idea is Ihat these coefficients can be estimated by minimizing the sum of squared prediction miSlakes. bl: b e eslimators of ...
. .hl be picking up the effect of having ma ny E ngl ish learn e r~ in ois tricts with large classes. IJ.. .bo ..28 x STR.. reported in Eq ua tio n (4 .11 ). h( that minimize the sum of squared prediction mistakes L~_I (Yj . The OLS estimators ~o. and residuals ii. •fh arc the values of bo.)... TIle defi nitions and terminology o f OLS in muhiple regression are sumnw · riled in Key Co nc~ pl 6. We arc now in a posit ion 1 0 address this concern by us ing OLS to estimate a multiple regression in which the dependent variable is Ihe test score ( Y. is r.9) (6...i = 1.. These are estimators of the unknown true popula tion coe ffic ients {3o.... PREDICTED VALUES.Xk/' i = I. .198 CHAPTER 6 Linear Regression with Multiple Regressors THE OlS ESTIMATORS. i = I. Application to Test Scores and the StudentTeacher Ratio in Section 4.Y' The OLS predicted values Y.• (3A and error term. ANn RFSlnllAI ~ IN THF MIILTIPLE REGRESSION MODEL ~CONaM . we used OLS to estimate the inte rcept a nd slope coefficient of the regressio n relating test scores (TcsrScore) to the slUdeoHeacher ra tio (STR ). and (6. using OUf 420 obse rvations for California school distric ts. it is possible that the OLS estimator i" subject to om itted va riable bias. That is.b1XII ..b~J.) and thc:rl! are I~O regressors: the studcnI. 11 /. are y/  Po + b1 xU il.teachcr ratio (XI.:. f31' .{3I " . 6..3 The OLS est imators fJo.2..__~~==~~~~ ______________________ T ••..10) Yj  Y....9  2.11 ) O ur concern has been that this rela tionship is mis lead in g becau se the stu · de nHeac hcr ra tio mig...s..b l•. '~k and residualtii are computed from a sampl e of II observati ons of (Xli' .) and the percent age of Engli~h ... 11.3. (0. = + ~S. ~I' . the estim ated O LS regression line.. X ki • Y.e = 698. 11....
65. so we will focus on that which is new wit h multi ple regression. 12)]. it is estimated to increase test scores by only 1. .0 .) for our 420 distri cts (i = 1.11) ).1) and tbe multi ple regressio n approach (Equation (6.11 ). . Pc{E L is not held constant. O f lhese two methods.0. Second. balding constant (or controlling for) PctEL. and the OLS estim ate of tlte coeff icient on the percentage E nglish le arn ers lfh) is .teacher ratio by two differ ent pa ths: tile tabul ar approach of dividing lh e data into groups (Section 6. TIle rest of this chapter i~ devoted to understanding and to using OLS in the multi ple regression model.1. The OLS estimate of the intercept St3o) is 686. . We begin by discussing measures of fit for the multiple regression model. These two es tim ates ca n be reconciled by concl uding Ihat th ere is omitted imate in the singleregressor model in Equation (6.10. 0.10 X STR . • . we saw Ihal districts with a high percelHage of E nglish 1c arn e r ~ tend to have no t on ly low test scores but also a high st ude ntteacher ra lio.1. multi ple regression has twO important advantages.  (6. First.12)]. it provides a quantitative esti mate of the effeCl o f a unit decrease in the studentteacher ratio. This difference occurs hecause the coeffici ent on STR in the mu lti ple regression is tbe effect of a change in STR. th e OLS estima te of the coef ficient on the sL ud ent. We have reached the same conclusion that the re is omitted variable bias in the relationship between lest scores and th e student. red ucing the stu dentteache r ra ljo is estimated to have a larger e ffect o n test scores. but in the mu l tiple regression equation (Equation (6.65 X Pct EL.3 The OlS Estimator in Multiple R egression 199 learners in the school district (Xl.. .28 points.420) .0. The estimated effect on test scores of a change in the studentteacher rat io in the multiple regression is a pproximately half as large as when the studentteacher ratio is the onty regressor: in the single regressor equation (Equation (6.teacher ratio (131) is 1.6. If the fractio n o f E nglish It'a rne rs is o mi lled fro m the regressio n. ln variable bias in th e eST Section 6.a unit decrease in the STR is estimated to increase test scores by 2. Much of ".1 0 points.1 2) f the us~ ng t SSIOn (6. TIle es ti· mated OLS regression line for litis multiple regression is TesrScQre = 6M. butl his estimate refl ects hmh the effect of a change in the studenH eacher ratio ami the omitted effect of baving fewe r English learne rs in the distri ct. to make her decision. so that muHiple regression can be used to con trol for mea surable fac tors olber than just the percent age of English Jearn ers. \'vhcreas in the singleregressor regression.'hal yo u Jearn ed ahout the OLS estima tor wit h a single regressor carries over to mu lt iple regression with few or no mod ifi cations. it readily ex te nds to mo re than two regressors.1 1l he stu 1earo l ataT is t imate D ~d ther: :English where PCfEL is In e percentage of students io the district who are English lea rn ers. which is what the superintendent need!..
in gle regressor: R2 _ ESS SS R TSS = 1 .l summary statistics in multiple regression are the standOl rd error or the regression ..1. In Section 4.. The mathematica l defin ition of the R2is Ihe saml. or ·'fits. TheR 2 The regression R! is the fraction of tbe sample variA nce of Y. (Y. the dfcr. The only difference b~ t wcc n the definition in Equation (6. ." the data.l) where the SS H.)2 and where the explained sum of squa res is ESS = squares is TSS = l: ~"I (Yi .n . lhe di visor n . As in Section 4.) as for regression with a .3.k ... Herl!.Y)~.1..ion line describes.le regressor.3 is the same as in Equa tion (6...2 _ 1 '" ' 2 _ k _ t ~fli  1' II _ SS R k I' (6..k .. When II is large.3 for the singleregressor model is that here the dIvi sor is 11 ~ k ~ 1 rathe r lhan n .In unee of Y i ll o t explained by the regressors.TSS ' (6 141 }. and the adjusted R2 (also known as R ~ l.4 Measures of Fit in Multiple Regression Three commonly usel. the SER is a measure or the spread of Ihe distribution (If Yaround the regression line. the regressio n R2. All lh ree stat istics measure how well the OLS estimate of the multiple rcgrcs.: l of the degrees·offree dom adjustment is negligible. SSR . The Standard Error of the Regression (SER) Tbe standard error of the regression (S ER) estima tes the sta ndard deviation of the error term H i' Thus.2.  the total sum of .3. 'i. 13). the SER is SE R  ~.l bias int rod uced by I!stima ti ng twocoefficie nlS (the .· 200 CHAPTER 6 linear Regression with Multiple Regres$(I(1 6. :£7. If there is a sing.. Equivalently_the R2 is I minus the fraction of Ihe \.7_ l itf.explained by (or pre dicted by) the regressors.13) and tht: defm ition of Ine S£R in Section 4.2 (rather than 1/) adjusts for the downwa n. so the formula in Section 4.lope and int ercept of the regression line) .I adjusts for the downward bius introd uced by es tima ting k + 1 eoeft"iciell1 S (the k slope coeffi· cients plus the inte rcl!p t) . using n . is tht: s um of squared residuals. the divisor 1/ . then k =.j. w 1 l ere. In multiple regression...1 rather than n is called a degreesof·freedom adjuslmenl.
.1 T SS = 1 . But this means th at the R2 generally incre ases (and never decreases) when a Dew regressor is added. "nlin.1) increases. unless the e~ l im a( e d coefficient on the added regresso r is exactly zero.\)/(11 . Th is happens when the regressors. sothe . about starting wit h o ne regressor and I h~ n add ing a second. is a modi fie d version of the R2 that does nol neces sarily inc rease whe n a nelV regressor is added. an increase in the R2 does not mean Ihat adding a variabl e actually improves the [it of tb e model. One way to correct {or this is to deflat e o r reduce the R2 by some factor. But if OLS chooses any value other than zero. the R2 gives an inflated estima te of how we ll the regression fits the data. 'r (6. When you U51.k . 15) by (or prt: of the vari ~ with a sin (6. Tn this sense.. TIle Til = I _ he defin i the divi 'r than n} the slope ts for the pc cocfii II is called " l. taken toge th e r. SO]?2 is always less than R2.k .1) /(11 .6_ 4 Wieowres 01 Fit in Multiple R egrenion 201 lndard <IS Rl ).ller than 1.1)1 (n . the e Heet Ii?· is n I SSR 5'1: n .: OLS 10 estimate the model with both regressors. The adjusled Rl.L th e R2 Clln be negative. the R2 increases whe never a regressor is added.I).I).k . rc~~ ion llion of ution of In mult ipl e regressio n. think. In practice it is extremely unusual for an estim ated coefficient to be exactly zero. Ihis mea ns thai the a djusted R 2 is 1 minus the ra tio of th e sa mple variance o f the OLS residua ls [wilh the degreesoffreedom correct ion in Equution (6. OlS finds the values of the coefficient s that minimize the sum of squared res idual ~ If OLS happens to choose the coeffi cient on the n ~ w regressor to be cxactly zero. and this is what the adjusted R 2. add ing a regresso r ha s two opposite effects on the R2.usted n2" Because the R2 increases when a new variable is added. . To see this. does.13) The "Ad. On the o ne hood.(n . Second .1) /(n .the factor (n .As the second ex pression in Equation (6.k . then the SSR wi ll be the same whether or no t the seco nd va riable is included in the regress ion. Whe ther the 'R2increases or decreases de pends on which of these two eHects is stron ger. First. O n the other hond.14) is tha t the ra lio of the s um of square d residuals to the Lmal sum of squares is multiplie d by fhe faclor (n . the SSR taIl s.1 4 ) total sum of The differe nce b~t wee n this Connula a nd the second defin.k .ition of tbe R2 in Equa tion (6.. or /i. which increases the R2. red uce Ihe sum of squ ared res id uals by such a small amount that Ihis reduction fails to offset the factor (11 .2. or "R 2. so in general the SSR will decrease whe n il new regressor is added.·alue reduced the SSR relative 10 the rcgression that excludes this regressor.15) s hows.13)} 10 the sa mple varia nce of Y Th ere are three useful lhings to know about the R2. (6.J) is always gre . then it must be thaI this .
Compa ring these measures offit with those for tbe regression in whic h PetEl.ng the least squMes assumptio ns of Cha pte r 4 to the case of multiple regressors. this va lue falls to 14. or explain.426.lbles to includcand which to exclude. 424 . Because 11 is la rge and only two regressors appea r ill Eq uati on (6. I . the \'a ria tion in the depen' dent va riahle. the decision about whe tbe r to include a variable in a muhiple regression should be based on wheth er including th aI va ri able alloW S you belle r to estimate the causal effect of interest. TIle SER for tlte regression excluding PcrEL is 18. cent age of English learners (PcrEL).12) re ports the estimated regression line for the mull iple regression rela ting leSl sco res (TestScore) to the stude nHeache r ra tio (STR) and the per. o nly a small iract ion o f the va riation in TestScore is explained . firs!. The R2 fo r this regression line is R2 "" 0.in Chapter 7.5.202 CHAPTER 6 lineor Regression with Multiple Regre~S()(~ Appl ication to Test Scores Eq ualjon (6..6. whe n Pc. th~ difference between R2 a nd adjusted R2 i$ very small (R l "" 0. heavy reliance on the l?2 (or R2) can bot: a trap. 6. the extent 10 which the regressors account for.IEL is add\!d to the regressio n.. Neverlheless. " maximiu Ihe Rh is ra rely the a nswer to a ny economically or sHlt islically meaningful queslion.11 )J shows tha i including PerEL in t he regression increased Ihc R2 from 0.5 when PCIEL is included as a second regressor. ion with o nly STR as a regressor. The units of the SER a re points on the sta ndardized test The reduction in lhe SER tells us tha t predic tions about stan· da rdized lest scores are s ubsta ntia lly morc precise if they are made using the regression with both STR a nd PC:IEL tban if they a re made using the regrc~ . In thi s sense. we need to develop methods for quantifying the sampling unce rt:"linty of the OLS estimator. Using the Rl and adjusted R2. Whe n lht: only regressor is STR. The starting point for doing so is e xtcndi.051 100.426 ve rsus"R2 "" 0. The R2is useful because it quanti rie. is excluded (Equatio n (6.6%) of Ihe variati on in test sco re~ is explain \!d..41. mo re than twofifths (42.5 The Least Squares Assumptions in Multiple Regression lliere a re fo ur least squares assumptions in thc multiple regression moJ:. the adj usted R2 is R2 = 0.424). We return to th e issue of hoW 10 decide which vari. In <lpph· cations. including the percentage of E ngli sh learn ers substant ially improves the fit of th e regression. however. and the stand ard error of Ihc regression is SER "" 14. however.12).instead.(.
TIle second assumption is that (Xli" ..) . the OLS estimator of the coefficien ts in the multiple regression model can be sensitive to large outliers.TIlis assump tion means that sometimes Y i is above the population regression line and someti mes Yi is below the population regressi on li ne. the expected value of II. !ssion e peT j0. ertaintyof ast squa re5 Assumption #4 : No Perfect Multicollinearity The fo urlh assumption is new to the multiple regression model.1his assum ption holds automaticall y if the data are collected by simple random sampli ng.5 The Least Squores Assumptions in Multiple Regression 203 (Key Concept 4.5. l 11e commellls on this assump tion appearing in Section 4. . ca ll ed perfect multicollinearity. n Are i.5 ~o tn t s on lout stan ~sin!1.Xki. YJ . 'In is assumption extends the first least squares assumption with a single regressor to multiple regressors. . PctEL ession a small .) random v:uiables.ln appli Assumption #3: Large Outliers Are Unlikely The third least squares assumption is that large o utl iersth at is. This assumption is used to derive the properties of OLS regression sta tistics in large samples. ariahle in ble al\ows ue of how ter 7. X /. is zero. . Therefore..d . ~atistica\lY .6. .3 for a single regressor also apply to multiple regressors. observations with values far outside the usual range of lhe dataare unlikely.3). This assumption serves as a reminder that.426.) < 00 . ex tended to aHow for multiple regressors. We rNurn to omitted variable bias in mulliple regression in Section 7. As is the case for regression with a single regressor. p.I • •• I X ki Has a Mean of Zero The first assumption is that the condi tional distri bution of II i given Xl i'" . Y. for any value of thc regressors.i.. but on average over the popu lation Yi fall s on the popul at ion regression line. It rules out an inconvenient situation.d.. . ~gresstOn ~ tifie s the re de pen . Another way to state this assumption is that the dependent variable an d regressors have finit e kurtosis. .: i . . ~ to \4.i. s S£ R Assumption #1: The Conditional Distribution of U j Given X liI X 2.u has a mean of zero. . The fourth assumption is new and is discussed in mOTe detail. have nonzero finit e fourth moments: 0< E(X1. . X . 11 are independently and identically distributed (i. The assumption that large outliers are unlikely is made matllcmatically precise by assuming that Xli' . the Assumption #2: (X ii I X 2il • •• I X k i ..0 < E(X l. . and these are dis cussed only briefl y. First.) < 00 and 0 < £(Y . as in singleregressor case. this is the kcy assump tion that makes the OLS estimators unbiased. in which it is impossible to • . and Y. i = I. added bores is antially ~ress oTs iT)' small .) < 00• . i = 1.
. The regressors are said 10 be perfectly mu1li co ll in~ar (or 10 exhibit perfert multicollinearity) if one of the regressors is a perfect linear funclion of the o the r regressors. and STR j • This is a case of perfect mull icollin earily beca use on~' o( t he regressors ( th e first occurre nce of STR) is a perfec t linea r fuoction of anoth... Why does pe rfeel muLticolLioearilY make it impossible 10 compu te the OLS estimator? Suppose you wam to estimate the coefficient on STR in a regre~~io n of Tes/Score. Large outliers a re unlikely: X l i •. uted (Li.<.. At an imuili ve level. • ! _ _ ." are independently and identica lly distrib. · .I()N M()nFI  Y. thl! coefficient 00 one of the regressors is thc effect of a change in lh:l! regressor. .P1 6. ' _ u _ _• . J::(II. _ . In the hypothetica l regression of TesfScon' on STU and STR .: Ihal is. hold ing the olber regresw rs const ant. you reg. if you try to estimate this regrt:~ · sion the sohware will do one of three things: (1) II will drop one of the occu rre nce~ of STR.. 2.. Depending on how your suft ware package handles perfect multjcollinearity.F<.exce pl tbat you make a Iypogra pttieal error Jud accidenta lly type in STR. perfect muhicollinellrit).n..IX!i' X 2.. holding constanl STR.4 THE LEAST SQUARES ASSUMPTIONS IN THE MUlTIPLE REC..where L ". The (ouClh least sq uares assumptio n is Ihal Ihe regressors are nOl perfectly multicollinear.. . Vi)' i "'" 1. on STRi and PaEL. on STR.'" ' XJ. .Xk . The mathematical reason [or this fai lure is that perfect multicolli nearity produces divi sion by lero in the OLS formuJufo. (Xli' Xli' . " I comp ute th e OLS estimator.. + f3zX2i + . _ _ _ _ .::ss TeSISt'orc. In mult iple regression.• X ki amI Yi have nonzero fi nite fourt h moments.. I/.h the computer...er regre!.: IhM is. ___ ! __ • _ __ _ !.sor (the second occurrence of STR). a second time instead of PC/EL.I ! 204 CHAPTER 6 linear R egression with Multiple Regreuoo ':~:~::. = f30 + J3 IX j. is a problem because you are asking the regression to answer an illogical question . and 00 _____ • _ . . has condit ional mean zero given Xli' X:u..X/ci) = o. . the coefrici em on tbe fi rst occurrence of STR is the effect on t~SI scores of a change in STR . .. 4. + f3. (2) it will re fu se to ca lculate Ihe OLS estimates and give an error me S)~lge : or (3) it will cra:. 111cre is no perfect mult icoll inearity. . . ..) draws from their joint distribution. This makes no se nse.xk/ + ff"i = t. 3.d.
in large samples.6... differe nt sample. In large samples. :t lnd OL S on test . tb e OLS estimators ~I)' ~l' . A dditi o na l examples of p erfect mu ltico lli nearity ar e give n in Secti o n 6. the cen tral limit theorem applies to the OLS estimators in the m ultiple regression mode l for the sa me reason that it applies to Y and to lhe OLS estima tors when the re is a single regressor: The OLS est imators bo o ~ \..7 . wruch is the exte nsio n of the bivariate normal distribution 10 the general case o f rwo or mo re joinlly norma l random variables (Section 2. it o ft en refl ects a logica l mistake in choosing the regres sors or some previously unrecognized feature of the da ta set.::ssag. A lthough the algebra is more complicated when there a re multiple regresso rs. and if the sample s ize is sufficie ntly large the sampling distribution of those averages becomes normal....4. The least squarl!s a ~umptions for the mult iple regression mode l are SUmma· rized in Key Concept 6. .4 tha t. th is varia lion is s umm a rized in the sampling distri bution of the OLS estimators.• {3" in the linear multiple regres sio n mode l.b". 6. That is. {3 I. This example is typical: When perfect multicollinearity occurs. the OLS estimat ors (~() and ~l ) are unbiased and consis tent estimators of lhe un known coefficients ([30 and (31) in the linear regress io n model wi th a si ngle regresso r.. .p. This variation across possible sa mples gives rise to th e unce rtainty associated with the OLS estim ators of the population regressio n coe fficients. the joint sa mpling distribution of '~k is well appr oximated by a muh iva ria te norma l d istributio n..4. under the least sq ua res assumptions of Key Concept 6. R eca ll fr om Section 4. hold )c{Jre 0 0 . th": ()T .. which also define s and di scusses imperfect multicollinearity.. . produce different \lalues of the OLS es timators.6 The Distribution of the OLS Estimators in Multiple Regression Because the data differ fro m one sample to lhe next . '~" are unbiased a nd consistent estima tors of 110 ' {31> .. . the solu· tion \0 perfect multicollinearit y is to mod ify the regressors to e liminate the problem. In additio n. are averages o( th e randomly sa mple d da ta. is that ~' OU art' :ion. under the least squares assu mptio ns. Because the multivariate normal l. 13k" Just as in the case of regression with a sin gle regressor.4).6 The Distribution of the OlS Eslimotors in Multi ple Regres~ i on 205 The solutio n to perfect multicollinearity in this hypothe tical regression is sim ply to correct the typo a nd to re place o ne of the occurrences of STR with the vari able you o rigina lly wa nted to include.. . . In general.~: Po. These results can)' over to multiple regression analys is..[3o. the samp ling distribution of ~o a nd ~I is well approxi· mated by a bi\lariate normal distribution.
5 P i distribution is best handled mathem atically using mat rix algebra . and PC lEL..A . 6. nor does it imply a logical problem with the choice of regressors. in Equation (6.. the distribUl ion of the OLS esti mato rs in multiple regression is approximately jointly normal. . This section providcs some examples of perfect mu lt icollinearity and discusses how perfect multi collineari ty can arise. . 13k :~~~~~~If the least squares assumptions (Key Concept 6.5.. and the general case is diSCU SSed in Section 18.. 131. 6.5 by exa[ll ini ng three addit ional hypothet ica l regressions. In each.lTj ). 12).y CoNcEPT LARGE SAMPLE DISTRIBUTION Of 13o. Unlike perfect multico llinearity.. the expressioru. for the join! d ist ributio n of the a LS estimators are deferred to Chapler 18. a lhird regressor is added to the regression o f TestScQre.! r~sult th aI . and can bc avoided. in regressions wit h mult iple binary regressors. PI' . Th e joint sampling distributi on of th e OLS estima tors is discussed in more deta il for t he case t hat there are two regresso rs anu ho moskedast ic errors in Appendix 6. on STR.2. then in l(lrge samples the OLS estimators ~o. the OLS est im ators arc co rrela ted. However.5 summarizes Iht.j = 0..4) hold .hut not perfectly corre lnted. Key Concepl6. perfect multicollinea rity arises when one of the regres sors is a pe rfect linear combi nation of the o the r regressors. ~1 arc jOin tly normally distributed and each is dis tributed N(P. In general.with th e oLher regressors. Imperfect multicollin earity arises when one of the regressors is very highly correlated. this correlation arises (ro m the corn: lalion between the regressors.. . impe rfcet multico llinearity does no t prevent estimation of the regress ion .. .7 Multicollinearity As discussed in Sect ion 6. it docs mean that one or marc regression coefficients could be estimated imprecisely_ Examples of Perfect Multicollinearity We continu e the discussion of perfect mul ticollineari ty from Section 6. k.2 .. in large samples.• 206 CHAPTE R 6 linear RegrElS$ion with Multiple Regresson K. .
If the variable Fr(/CEL . and PcrEL" the regressors would be perfectly multicollinear. the perfect linear relationship among the regressors involves the constant regressor X o... .2. There are in [acl no districts in our data set with STR i < 12. Thus we can write NVS. per fect lfluJticollinearilY is a statement about the data sel you have on hand. gres vides ulti inary very nlike al ion ~~ors.PctEL j • . This il lus trates two important points a bout perfect multicollineari ty. this question makes no sense and OLS cunnOI answer it. be the fraction of English learn e rs in the i ll! district. Example #3: Percentage of English speakers. the n one of the regresson.: For every district. In ne Ola a nd d in Example #2: " Not very small" classes. NVSj = 1 fo r all observations. Th e reason is that PcrE L is the per· cenrage of English learners.At an intu itive le vel ." specifi call y. it is impossible to compute the OLS estima tes of lhe regression of Tes/Score j on STR h Pc/EL. While il is possible to imagine a school dist rict with [ewer than 12 students per teacher. PerES. . specifically. OLS f:lil s beca use you are asking. Th is regression also exhibits perfec t multicollinearity.Thus one of th e regressors (PcIEL. A gain the regressors will be perfectl y mult icollinea r. there are no s uch dis tric ts in our da ta set so we cannot anal yze them in our regression. that eq uals 1 {or all i. hol ding constant the fraction of English learn· ers? Because th e percentage of English learners and the fraction of English learn e rs move together in a perfect linear relationship. First. XOj. for every district.7 Multicollinearity 207 Ie i'" m Ion r Let FracEL . mated Let NVS j be a bin ary variable that equ als 1 if the student. whieh v<lries between 0 and 1.ressioll includes a n inte rcept. Because of t his perfect multicollinearilY. exit l1l . but for a more subtle reason than the regressi on io the previous exa mple. wht'n the reg. that can be implicated in perfect multicollinearity is the constant regressor X o.) can be written as a perfect lin ear func tion of anoth er regressor (FracEl_ i ). = 100 x X\Jf . aod FracELi. it eq uals Xo. defined to be the percentage of stu· dents who arc not Englisb lea rners. Example #1: Fraction of English learners. added Let PCIES.6. = 1 x XOi for a l1 lhe observa tions in our dala sel.. as you can see in the scatterplot in Figure 4. be the percent age of "English speakers" in the jIb disuic t.teacher ratio in the jlh district is " not very small. 2:: 12 and equals 0 otherwise. so that Pc/ELi = 100 X Frac EL.6). NVS j eq uals I if STR. Thus. Second. were included as a third regressor in addit io n to STR. the smallest value of STR is 14. Now recall [hat the linear regression model with a n intercept can equivale ntly be thought o f as including a regressor. N VSi ca n be written as a perfect lin· ear combina tion of the regressors. Like the pre vious example. Wbal is the effect of a un it change in the percenrage of E nglish learners. as is shown in Equation (6. thai is.
Each d istric t fa lls inlo one (and o nly one) category. all G binary regressors can be inc luded if the interce pt is o millcd fr o m the regrcs~i on.: mistake is easy to spot (as in the first example) but sometimes it is not (as in tbe: second example). the n the coefficient on Suburban.. if each obse rvation fa lls in to one and amy o ne category.1 of the G binary va riables are i. then the regression wi ll fail because of p. For example. AnOlher possible source of perfect muhl colLinearity arises when multiple binary. lhe regressor Xflj) or PctEL. a nd urba n.. r SlIbw'balt. This situatio n is called the dummy variable Inlp..:r feet muhicolline<lrity.... Ihe regressors will be pe rfect m ulti collinearity: Because each district ~Iongs to one and o nly o ne ca tegory.. Le t these binary variables be Rurali _ which equals I for a rural district a nd eq ualsOolberwise:SLlburbanj.. you must exclude one of th ese fo ur variables.... Ahernative ly. lhe constanl term is relained. in which case one of th e binary indica· la rs is exclude d . or dummy. H e ithe r the intercept ( Le... Multiple Regressors This example illustra tes another point: petiee! multicollinearity is a feature of the e ntire SCI of regressors. ho ldiog cons ta nt the o the r regressors..• 208 CHAPTER 6 linear Regression wit!. ... Tn one way or another your software will le t you kn ow if yOU make such a m is take because it ca nno t compule tbe OLS estima to r if you h9\ ':· When your software lets you know th a t you have perfect mu ll icollineanlY. a nd at a minim um you will be ceding control over your choice of regressors to your compute r if your rcgre ...(. jf lhe rc a re C bina ry variables. + Urban.ual wa y \0 avoid the dummy variable trap is to exclude one of the binary variableS from the multiple regression. so on ly C . =0 PerCcet multi coll inearity typi CAlly arises when a mistake has been made in specifyi ng the regressio n... SOT S. ho lding constant tbe othe r variables in the regression .soli . suburban.. suppose you have panitioned the school districts into three ca tegories: rU(31...... a nd Urban . were excluded from Ihis regression .11. and if aU G binar: variables are incl uded as regressors.. Somctintc~ th.. 111US.ncloded as regressors. The dummy variable trap. By conventiOn . The u<. In Ibis case. .. eithe r one of the binary indicators or the constant term. variables are used as regrc<... For ~xample. . Solutions to perfect multicollinearity. If yOli include a ll three bina ry vari· ab ies in the regress ion a long with a conslaDl . In ge ne ra l. ~t is imponant that you modify your regression to elim inate it.. tht: regressors would not be perfeclh multicoll inea r. we re excluded . the coefficien ts o n the include d binary variables repre sent the incremental effect of being in that category. Rllral... relative to the base cast' of Ihe omitted ca tegory. would be the average difference between test scores in suburban and rura l dist ric l:'.. if there is an intercept in the regression. 10 eSliro3tc the regressio n. 1..1" .6).. Some software! I) unreliable when there is perfect multicollinearity.. J = XUj' whe re XIJj de no tes the conSlrmt regressor intra· d uced in Eq uat ion (6. if Rural.
the closer is th is term to zero and th e large r is the variance of (31' More generally.x!" whe re Px. impe rfeci m ulticollin earity is not necessarily an e rror.) for the special case of a ho moske dastic e rro r. tl }' It i 'es· ree ne) . in the sense thatlhere is a linear func tio n o f the re gressors th.la ta to estima te the pa rtia l e fftet o n test scores of a n increase in PC1EL. the perce ntage th e distTict 's resid ent s who are first ge neration im migra nts. t. Imperfec t mult icollinearity d oe s nOi pose a ny pro ble ms for the theo ry of the OLS esti. tlle variance of {31 is inverse ly pro portio na l to 1 .net 'ari iulti I. Because these two variables a re highly correlated. it o va re I ~ will tM: ress ots .The la rger is U le correlation betwee n the two regressors.cs thl! il1 the ir yOU have. the data sel pro vides li ule informa tion abo ut wha t happe ns 10 lest scores when the percentage of Eng lish learne rs is low hut t he fraction of immigrants is high. will be imp recisely est imated.HScore o n STR . . th ey will have CI large sampling vari a nce. is the correlation between XI and X 2.se of I' all G :ssion. means that two o r more of (he regressors a re highly co rre la ted. imperfect muhieoll inearity is conceptually quite differ e nt th an perfect multicoll inearity. arilY. The effect of imperfect mul ticol\ineo rity on the variance of the OLS estimators ca n !:'Ie see n m a t ~ ematica lly by inspec ting Equation (6. icall)' . yo ur da ta . th en the coefficients on one o r more of these regressor.lnd Pcll::L Su ppose we were to add a third regres· sor. a purpose of OLS is to so n o ut the independent influences of the var io us regressors whe n these re gressors a re po tentially cor related. In o the r words.x . but ra the r just a fe ature of OLS. If the least squares assumptions hold. 7 Multkolineority 209 of Imperfect Multicollinearity D espite its sim ila r name. In this case. however.] n contrast . ho ld ing consta nt the percentilge immig ra nts.at is highly corre la ted with a nothe r regressor.2. so the vari · IEL a nd pt reeot age immi gra nts will be highly correlated: Districts with a bies PC ma ny re cenl immigrants will le nd 10 have ma ny stude nts who arc still learn ing E ng lish. If the va riable s in your regressio n a re the o nes you mea nt to includethe o nes yo u chose to address the po te ntial fo r omitted varia ble biast hen impe rfect mul ticoll inea ri ty impJic£ tha t it will be d iffic ull 10 estima1e precisely one o r mOTe of thc parlinl e ffects usi ng Ih e da ta ::It hnnd .6. it will have a larger variance than if the regressors PCIEL and percent age immigrants we re uncorre lated.P}. By ica t on and one inar)' per· usual abies l uded eprc . then the OLS estima tor of the coefficient on PclEL in this regreSSion will be unbiased.1 7) in Appendix 6..Ih a t is.ma to rs. indee d. Imperfe ct multicoWnearit). which is the variance o f III in a multiple reg ressio n with two re gressors (Xl a ~d X.. Fo r example. If the regressors are imperfect ly muhicoUinear. + lro eof .. whe n multiple regressors are imperiectly multico ll inear. Firstgenera tion immigrants ohen speak E nglish as a second language. consider the regresslon of Te. then the coefficients o n alleast o ne ind ivid ual regressor will be imprecisely estimate d. it would be d ifficuJt to use these <. Perfect multi colli nearity is a problem that often sign al s th e presence of a log ica l e rro r. and the q uestion you are trying to answer. o r vice versa.
8 Conclusion Regressio n with a single regressor is vuloe rable to o mitled variable bias: If an omilled variable is a de te rminant o f the de pendelll va riable and is correlated W ith Ihe regressor. X I' X 2. The coeffici ent 131 is the expected change in Y < Issociated \vith a one.5Of~ 6 . Do ing so red uced by halt the estim ated effect on t e~ t scores of a change in the student teacher ra tio...4 are satisfi ed .J.. holding Ihe other regressors constant. The statistica l theo ry of multipl e regression builds on the statistical theory uf reg ression with a sin gle regressor. When the four least squares assumptions in Key Concept 6... plu s a fourth assumption ruling uut perfect mulli collinearity. Muh iple regression make) il possible :0 mi tigate omilled varia ble bias by ind w.. which occurs when one regressor is a n exact linear fu nction (If the olher regressors. The other regn.. X k.. In the test scort: exa mple. then the O LS es timator o f Ihe slope . 4. 13 1. s io n coefficie nlS have an analogo us interpretation. the OLS estimators have a joint sampling dis tribu tion and . us ually arises from a mistake in chuosing wh Ich . The least sq uares ass ump tions for mu lt iple regn:ssion are extensions of the three least squares assum ptio ns (or regression wi th a sin gle regressor.IJOil change in XI. . have sampling uncertainty. <lod the ways to do so in the mult iple regre ssion model 3re the lopic of the next chapter. ing the omitted varia ble in Ih e regression. . 3. and no rmally dislributed in large sam ples.. the OLS estimato . a ~ unbiased .13k .. Associated with each regressor is a regression coeHi· cie n! . 2. there fore. holding co nstant the perce ntage of En \! li sh learne rs. Beca use the regression codficients are estimated using a single Sam p le. .:oefCicienl will be biased and will re neel both lhe effect of tile regressor fi nd the e ffect of the omitted variable.f English lea rners as a regressor made it possibl e to es timate ute effect on test l)Corcs of a change in the student.. Q milled va riable bias occurs when an omiued va ria ble ( 1) is correlated with an included regressor and (2) is a determ inant of Y.• 210 CHAPTER 6 linear Regreuion with Multiple Regres. The coefficie nts in multiple regression can be estimated by QLS.teac he r ratio. consistent. XI in multiple regressio n is the part ia l effect o f a change in X I_holding co ns tant Ihe olher included regressors. . 13l . The coe ffi cie nt on a regressor.. Summary 1. includ ing Ihe percenlage I. Perfect mu lt icollinea rity. The multiple regression mooel is a linear regression model tb al includes multipk regresso r:. Thi s sampling uncertainlY must be quaoti[ied as part of an e mpirical stud y..
nclude in a m ulliple regfes~ i o o . she rcgrcsse~ di strict average test scores on the num be r of compu ters per student. . ~t lineal g whid' l " . The standa rd error o f Ihe regression. (193) coefficient on X l i ( 193) slope coefficient o f X2i ( liJ3) coefficie nt on X 2i ( 193 ) con tro l varia ble (193) holding Xl conSta nt (194) cont rolling for X (194) pa rtial effect (1 94) popu lation mul tiple regression model (195) conslliat regressor constant term (195) homoskedast ic (195) he te roskt:dastic (195) OLS estim ators or f3o. a nd the J?2a re measures o f fi t fo r the muhiple regression mode l. ~he fou f IO rS ~Ire I .3 Explain why two perfectly multicollinea r regressors cannot be incl uded in a linear mull iple regression. an coe W· cd with Using school district data like that used in this chapter. {3. Key Terms omitted variable biaS ( UP) multiple regression model ( 193) popula tion regression line (193) population regression fu nction (193) inte rcept (19 3) slope coerticient o f XI. be an unbiased esti mator of the effect on test scores of increasing tbe number of computers per student? Why o r why nol? If you think 13 1 is biased. 201 ) perfect multicollinearity or to exhibil pe rfect muh icollinea rilY (204) dummy va riable tra p (208) sam have ofa n \he Review the Concepts 6._ Wha t is the expected c hange in Y if X I increases by 3 un its a nd X 2 is uncha nged? What is the expected cha nge in Y if X l decreases by 5 units and X I is unc hanged? What is Ihe e xpecled change in Y if X I increases by 3 units a nd X 2 decreases by 5 unilS? 6. regrCC..2 A mulliple regressio n includes IWO regressors: YI :: f30 + I3\X •• + ~X2J + 11.1 A researcher is in terested in the effect o n test scores of co mputer usage.1.Review 1M Concepts 2 11 an regresso rs to i. is it biased up or down? Why? 6. Solvin g perfect m ulticollinearit y requires changing the set of regressors. the R2. G ive two examples of a pair of perfectly multi collinear regressors. .(Jk (197) OLS regres!iion line (197) predic ted value (197) O LS residual (197) R! and adjusted RZ(RZ ) (200._ . Will (3 ... 5.
I.(2 are highly correl ated . O o the rwise) South '= bin<lTY variable (I i [ R egion "" South . ing X 2 conslanl. computed usi ng d ata fo r 199R from the CPS.'SI = b inary va ria b le (J if Regio n = WesLOothe rwise) 6. Is age an important determinant of earnings? Explain. a nd number of ch ildre n.4 Using the regression resuhs in column (3): fl.h inary variable ( I if R egion = Midwest . O o therwise) W(. Do meD earn more than wome n on ave rage? H ow muc h morc? 6. Do there appear to be important regiona l diffe rences? h. Sa lly is 29year·old fem ale college graduate. ment for each worker was either a high school diploma or a bachelor's degn=e. orkcrs with only high school degrees? How m uch more? b.4 Explain why it is diffic ult 1 0 t. marital ~Ia· IUs. Why is the regressor West omitted rrom the regression? What would happen if it was included'! . if X I and. The data set also contained in fo rmatio n o n the region of the country where the person Jived. Predict Sally's and Be tsy's earnings. Exercises The rirsl four exercises refe r LO the table of estimaled regressions on page 2 1~. 0 if male) Agt" = age (in ye:lrs) N ln easf = binary variable (1 if Regio n = North east.O if high sc hool) Female = binary variable (I if fem ale. 6. a Olherwise) Midwesl =.212 CHAPTER 6 linear Regre~ion with Multiple Regres~ 6. 'Inc da la sct consists of infor ma tion on 4000 Culltime fullyear workers. than l. h. The highest educational achie'y e.1 C ompute R1 for each of the reg ressions.3 Using the regression results in column (2): a. For the purposes o f these e xercises let A H E = average hourly ea rning!> (in 1998 dolla rs) College = binury variahle ( I if college. The worker's ages ranged fro m 25 10 34 years. on average.:Slimale precisely the partial effect of XI' hold.2 Usin g the regression results in co lumn (1): II. Do workers with co llege degrees earn marc. 6. Betsy is a 34y(? arold female college graduate.
4(0) " 4000 4(0)  c. Norlhe3s1 (X~) Midwest (X 5) South (X6 )  0. BDR denote arold lhe num ber of bedrooms. SER = 41. :{ info r I Ieh'\e\'e degree. .194 0.48alh + O. \ workers !re? 6. 176  . Lsiz.21 0.62 0. + O.156Hsiz.09OAge .72. 801ft denote the number of bathrooms.2.Exerci~~ 213 I hold Results of Reg ressions of Ave rage Hourly Eornin9 ~ on Gender ond EduCOhon Binory Voriobles ond Other ChoracterishCS Using 1998 Octo from the Current PopulotiOfl Survey Dependent variable: aven:lge hourly earnIng.2 + O. Jen nifer is a 28yearold female college graduate from the Midwest.190 6.44 .60 . 0. Hsize de no te the size o f the ho use ( in square feet ). College (X I) 5.64 2.ssor ( 1) (2) (3) Ige 2D.5.62 2.29 0.027  Inicreepi Summary Statistics 12. Reg".29 .e + O.27 0.e denote the 101 size (in :.69  0.5 Data were collected from a random sample of 220 home sales fro m a com munity in 2003 . Jua nita is a 28yearold fema le college graduate (rom the South .".46 5.69 '4<) 3.22 0.8Poor.quare (eel ).4HSBDR + 23. and Poor denote a 0 J if the condition of the house is reported as binary variable that is equaJ 1 "p OOL" An estimated regression yie lds • i\at wo uld 'PriCe = 119. Cal culate Ule expected difference in earnings between Juanita nnd Jennifer.. 'R2 .75 SER R' R' 6.48 5.Age de note Ihe age of the house (i n years).o tained rltal sta r::FeOla!\: (Xl ) Age (X. ..OO2Lsize ~ 48. lAME ). Let Price denote The selling price (in $1000).
Compur e the R2 for the regression..lata tha t need to be oolicCled and tbe appropliate sta tist ical techniques for ana· Iyzing the data. Your c ritique should e xplain any proble ms witb tbe proposed resea rc h aDd descr ibe how the resea rch pla n might be improved . wh ich inc reases the s ize of the house by 100 square fee t. Include a d iscussion of any addi tional !.n Equa tion (6.S. What is the elCpcctcd increase in the value of the house? c.ich variables would you add to the regression to conlrol for imponant omitted variables? b. Explain why this regression is likely to suffer fr om omitted variable bias./s crime rate on the (per capita) size of the county's police force.214 CHAPTEII 6 l ineor Regression with Multiple Regressors a. A researcher is interested in determining whether ti me spent in pri. elh nicilY.\'ho have never served time in prison.on has a permanent effect on a person's wage ratc.7 Critique each of Ihe following proposed research pla ns. U. Suppose that a homeowner converts part of an exist ing family room in her house into a new bathroom. The researcher theo plans to conduct a "differ ence in means" test to determine whether the average salary for wome n are significantly less than the ave rage salary for men. 6. W'b.6 A researcher plam to study the causal effect of police o n crime us ing data from a random sa mple of U. th e researcher collects salary and gender jnforma tion for all of the firm's engineers. Suppose tha t a homeowner adds a new bathroom to her house. age.or unde restimate the effect of police on the crime ralc. A.1 ) to delermine whether the regression will likely over. What is the expected increase in the value of (he house? b. Wha t is the loss in value if a homeowne r le ts his house TUn down so tha t its condition becomes "poor"? d. He collects data on a random sam ple of people who have been out of prison fo r a t least [if tee n yea rs. tenure . (That i\ do yo o trunk that PI > III or /31 < 131?) 6.. counties. Use your answer to (a) and the expression for omitted variable bi a~ giVe\l i. He collects similar data on a random sample of people \. education . He plan s \0 regress th e COll n!. A researche r is interested in delermining WbeUler a la r g~ aerospace firm is guilty of gender bias in setting wages. a. To deTermine potential bias. gender. The data set includes information 00 each pe rson's curre nt wage.
var(II. .11 (Requires calculus) Conside r t he re~ ress i o n ri ~oi1 on ' fif Ie \liM ·. 111is calculation was then repea ted for people sleeping six hours. the variance of PI i ~ Jarger than it wo uld be if X l an d X2 were uncorreialcd. Based on this summary.1 million observations used fo r this study came from a random survey of Americans aged 30 to 102. Compute the variance of 13}· [Hint: Look at Equati on (6. and so on).on 0 11 t model Y j = f3 1X li + f32X 2j + u. Xli' X 2. Assume that X l and X l are uncorrelated .2 : . Assume that (or(X}. Does this estimator suffer from omitted variable bias? Explain. X 2) = 0. 17) in the Appendix 6.) satisfy the assumpti ons in Key Co ncept 6. .no (Notice that there is no eonstant term in the regression . and ullion status. in addition.) co ial of "r b. Tne researcher plans to estimate the effect of incarceration on wages by regressing wages on an indicator variable for incarceration. . would yo u reco mmend that Ame rican!. X li) salisfy the assumptions in Key Concept 6.5.llms.) = 4 and var(Xli) "" 6. union status. occupation. Comment on the folloA wi ng Malements: "When X I and X z are corre laled. ata y's s ely i. You are interested in f3 1• the causal effect of X I all Y Suppose that X I and X l are ullcorrelated . The 1. and higher than the dea th rate for people who sleep five or fewer hours. if you are interested in # 1' il is beS1 to leave X ] out o f the regressio n if it is correlated with X l'" 6. Co mpute the varia nce of ~I ' c. ted 6. as well as whether the person was ever incarcerated. tenu re. inclu di ng in the regression the other poten lial determinants of wages (education . and so on. a. IX 1" X 2. oukl tho: dat<l ana 6.) Fol lowing analys is like tha t used in Appe ndix 4. who sleep nine hours per night consider reducing their sleep 10 six or seven ho urs if they wam to pro long the ir lives? W hy or why not? E xplain .10 (Y i .Exercises n 21 S (time in curren t job). 6.8 A recent study found Ihal the death nHe fo r people who sleep six to se ven hours per night is lower than the death ratc for peopl e who sleep eight or mOTe hours.4. The death rate for people sleeping seven hours was calcul ated as the ratio of the number of deaths over the span of the study among people sleeping seven hours to the 10lal number of survey respondents who slept seven hours.9 (Y" X I!.4.2. You estimate f31 by regressing Y onto Xl (so that Xl is not included in the regression). Each survey responde nt was tracked for four years. A random sample of size n = 400 is drawn from the populalion. m ' for i = 1. .
OneCredir.• 2'._ Show that the teast squares estimators Empirical Exercises E6. b.P2X2' + PIX!.. a. CHAPTER 6 Uneor Regression with Mo~ip\e Regressors a.. B lack.X] . include as additional rcgn:~' sors Bytest. 11.. inc lude as additional regressors Imra. . In partic ular. "" O. Suppose thaTthe model includes an intercept: y / :: f30 satisfy P I) :: Y . Pre' dict Professor Smith's course eva luation. ClIeBO. the student's famil y. carry oul the fo llowing exe r ci s~s.1 Using the da ta se l TeadungRarings described in EmpiriCA l Exe rcises 42. includi ng some additional va riables to control for the type o f course aod professor c ha racteris tics. Run a regression of ED on Dis/.c<lf0 out lhe following exercises. Professor Smith is a black male with average beaUly <l lld is a natl'ie English speaker. What is Ihe estimated slope? b. OWl/home. Specify the leas! squares funClion tha t is min imized by OLS. and the local labor market In particuJar. + /3)(21 + It. D erive an expressio n for ~l as a functi on of Ihe data(Y"X 1"X2l ). Minoriry. Ru n a regression of UJ/me_Eval on Beamy. E6. R un a regression of years o r completed education (ED) on distance th e nea rest college (Dis /). a. Compute the pa rlia l de rivati ves of the o bjectio n fu nction with respect to b l and b 2• c..2 Using the data sel ColJegeDistance descr ibed in Empirical Exercise ·U. Suppose ~~. J l~~ l xii' d. He teaches a threecred it upper·d ivision course. and SMlllfg80. Female. Fem(lle. . Run a regression of Course _c. and NN English. What is the estimll ted effect of D isl on ED'! . Incoillehi. but include some add itional regrc!< sors to control for characteristics of the student. What is the estimated slope? ((I b.P .·tll on B~tlUly. Dat/Coll. Show lhat PI  17 \ X 1/"y.. Wh at is lbe estimil ted effect of Beamy on Co/me_Enll? Does the r egrc ~s ion in (a) suffe r from im ror· tanl o mitted va r iable bias? c. Hispan ic. I XIi'\'21 O. i::= 1. "* e. Suppose:t~ l XI /X.
attended college. His baseyear composite test score (B y test) was 58.75. E63 Using the datti set Gwu1h described in Empirical Exe rcise 4. g.car ry stance to regres fa mily. o ne standard deviation a boye tbe w ean . Construct a table tha t shows the sa mple mean. does Ihe regression in (a) seem 10 suffe r from importan t o mitted variable bias? d. Include the appropriate units for aH entries. H is mother ses 4.t. Is it large or small in a realworld sense? 4. Jim has the sam e characteri sti cs as Bob excepL (hat his high school was ted ditional teris edit.4. Yea rsSchoo/. lind his family owned a home. His family income in 1980 was $26. Re peat (c) but now assume that the country's value for TradeShare is . Rev_Coups. Assassinations and RGDP60.pect c. Compare the fil of the regression in (a) and (b) using the regression n of !>ta ndard e rrors.) wha t you would have believed? Interpret rhe magnhudes of these coeffic ients. Assassinalions. Whal is th e val ue of the coefficien t on Rev_Coups? Inte rpret the vldue of this coe fficient. A re Ihe signs of their esti rnalcd coefficients (+ or . d. b. E D? Co Use the regression to predicl lh e ave rage annual growth rate for a co untry that has average va lues for a ll regressors.Empirical Exercises 217 . but exclud ing the da ta fo r Malta . TradeShare. Pre dict Bob'!:. ca rry OUI the following exercises. a. Pre· 40 miles from Lhe nearest college. E xplain why Cue80 a nd Swmfg80 appear in the regression. but hi s fath er d id not. I rcgrcs (ldCol/.2. Run a regression of G rowth on TradeShare. of Impor a tive ·se. Is the es tim ated e ffect of Dist on ED in the regression in (b) substan tively diffe re nt from the regression in (a)? Bast:d o n this. a nd mi. Oil. and the Slate aver age manufacturing hourly wage was $9. What does this coefficient measure? tors f.000. Bob is a black male. R 2 and sion (b)? /i 2.'1lle unemployment rate in his county was 7.nim um and maximum va lues for the series Growth. TIle va lue of lhe coeffic ie nt o n DadCofl is positive. Pred ict Jim·s years of complt:ted schooling using the re gressio n in (b).3.5'%. YearsSchool. Rev_Coup. h. standard deviation. years at comple ted schoolin g using the regression in (b). His high school was 20 miles from the nearest col lege. Why arc the R 2 and R2 so similar in regres e. RGDP60.
ltors. X II und X 2 homosktdastic.\i.5.JO) in Ap pendix 4.16) yields Equal ion (6.3. and of t hese limit s iolO EqUDl it. APPE NDIX 6.: \ nri· ance of thls distribu\ion . . 1) l h is appendix prc~'nls a derivation of the formula for omillcd variable bias in Equ31 ion (6. can be writll:n var(u" X II . .xJ = px~rr~ux. (1l 11) .in large samples the sampling distribution of P I is N(PI. _ . 1 ~' .X) ' . and the error ternl l ' .he distribu tion vf th . Whcn there lire two regresso rs.1 ).:. _ X)". SubslilUlion kk~I(Xi  X Y ~ O'~..fT!. Equation (4.." (6.• 218 CHAPTER 6 linear Reg(e$$ion with Multiple Regres500 e. n I [t ( Pi.._ 1 11 Derivation of Equation (6.. . * ~: . tben the formula ~jmplifies enough to provide some insights into . X b ) . 11_1 _ _ f31 = f3 1 +'~ i 1 1 lI L _ (X.· .3 st ates tha I I " . where th.lTj. is ~" /1. if there are OLS estim. IT). (6 Ib) Under the last two assumptions in Key Concept 4.~ .). Why is Oil omiu ed from the regression? What would ha ppen if it weft.X )II.2 Distribution of the OLS Est imators Wh e n There Are Two Regre ssors and Homoskedastic Errors Although the general form ula for the variance of the OLS estim:ltors in multiple regrl!s· sion is complicated.x .::cause the errors are homoskedastic.: induded? APPE NDIX _ _ _6 _. L P.I~ I WO rcgrciWrs (k = 2) und the errors are hnmoskeda..( X. B.. ~ COV (II/. . the conditional variance of /1.1 ).
The variance ujl of (he sampling dis tribu tion of P.and th us the term I . the corre lation berween the OLS estimalOrsp t and ~ is the negative of the correlation betWeen the regressors: corr(j3t.Pk ...\. Ihen ISciosc to I. is small and the variance of P I is 1:lTgcr than il would be if P'r l . If X Land X 2 are highly correlated.{J2) . isthe population correlation between the two regressors X I and X~ and u } is the population \'aria nce of Xt. either positively or negati\O.. 18) (fi. A no ther featu re of the joint normal largesmnple disuibutio n of the QLS esti mators is Ih:lt t WO P l and ~ are in general correlated.· (6.Px l.I~) .Di~tribution of the OlS E~timom W'hen There Are Two Regreuors ond Homoskedoslic ElTOfs 219 where Pxlx . When the errors arc homoskedastic.x: in the denominator of Equ3tion (6. 17) pl . depends on the squared correlation between the regressors..\ ' were close to O.~ly.x.
2 lind 7.teacher ratio using the California tes t St.3 show how to teSI hypotheses thaI involve lwo or more regression coefficients. and confiJe nce intervals.. The general approach to testing s uch "jo int " hypotheses new test stat istic. muJtipJe regression analysis provides a way 1 0 mitiga te lh ~ problem of omilled va riable bias by including addi tional reg ressors. ilb a single regressor to multiple regression.5 discusses ways to approach this pro blem.1 extends th e me thods for statis tical infe rem:e in regression .:lIes o r the e ffec t on test sco res of a reduction in the student. we appl y multip le regression analysis to obtain improved es tim. Thl. lhe O LS estim<l lor has sampling uncertainly beca use ils value differs from one sample 1 0 the nex t. Section 7. in volvc~ a . the F·statistic. in volves two o r mo re regression coeffici ellts..4 ext(!nd~ the noti on of confidence intervals for a single coeffi cienll O confidence seh for mul1iple coeffici ents. O ne new possibility thai arises in multiple regression is a hypoth esis that simultaneousl). thereby controlling for lhe effec ts of those additio na l regressors.:ore data se t. Like nil estima tors.6.CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression A s discussed in Ch<lpt<:: r 6. Deciding which I'ariables to include in a regression i~ an im port ant practical issue.. In Section 7. Section 7. Sections 7.! coefficients o f the multiple regression model can be estimated by O LS. SlaliSlica l hypolhe!li!l lesls. so Section 7. This chaple r presents me thods fo r quantifying the sa mpling uncertainl)' of the OLS estimato r through the use of s tandard errors.
ho lding constant lhe pe rcenlage of English learn ers in tbe district.imate consistently the standard deviation o f their sa mpling distribution. If Ihe allc rnative hypoth esis is twosided . TIle OLS ei>l imator of the i" regression coefficient has a stand ard devia tio n.1 HypoIhesis TM and Confidence Intervals for 0 Single Coefficieot 221 7. the la w of la rge numbers implies thai these sample averages conve rge to the ir population counte rparts. o r 12 OJ.o \IS. The key ideasthe largesample normal it y o f the estimators and the ability to esl.). and how 10 construct confidence intervals for a single coefficient in a mult iple regression equation.. in the case of a single regressor. there is nothi.2). as in the sludenl. The important point is that..1) . More generally.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient This section describes how (0 compute the standard error. so for example tti . y or.(3). PI OJ1 is the standard error of ~I ' SE(f3I)' an estimator of tbe standard deviation of the sampling dislribUlion of P I.ion ivcs a with" test regressors. we migh t want to test the hypothesis that the true coe[[icient f3. (7.l1le fo mwla for the standard error i1> most easily stated usi ng matrices (see Sectio n 18. how to test hypotheses.g conce ptually different betweeo the single· or multiple· regressor cases. il was possible to estimate the vari ance or the OLS estimator by substituting sample averages for expectations.ndS elS for is an t r 110: f3j = f3 j .o (twosided alternative) . on the regressor takes on so me specific va lue.teacher ratio exam ple. All this extends directly to multiple regression. then the two hypotheses can be wri tten mathematically as ext. which given in Equation (5. {3"o. Under the least squares assump led 10 lhe estimator tions.o comes either from economic lheo. two. as fa r as standard e rrors are concerned. yo( .4)..The null value f3}.a re the same whe the r o ne bas one. I Icrt ~ J. The square root of ~. This corresponds to hypotbesizing that the true coefficient f3 1 on the studentteacher ratio is zero in the population regression of test scores on STR and PcIEL. Standard Errors for the OLS Estimators o Recall thai . H( {3J *.. Hypothesis Tests for a Single Coefficient Suppose tbat you want 10 lest the hypothesis that a ch ange in the studentteacher ratio has no effect on lest scores.7. £rom the decision· making context of tbe application. SE(p. l 11c all lifters P I . and this standard deviation is eSlimated by its standard error.
10 compare the . Key Concept 5.5. tben the nuU hypOlhesis that changing the stude ot.'/ is the value of the (statistic actually computed. Compute the .0 arc computed au tomatically by regression software.statistiC actually computed i' . The theoretical underpinning of Ibis proced ure is that the OLS esti mato r has a largesample no rmal distributio n which." =i" .tcache r ra tio has no effect o n class size corresronds to the null hypo thesis that /31 = 0 (so 111. alternat. irt llK"l! > 1.value testing f3j =....statistic 10 the critical value corresponding to the desired signif· ica nce level o f the tesl.3) . The second step is to calculate the {·statistic using the g\!n' eral fo rmula in Key Concept 5.O" The variance of this distritlU' {ion can be estimated consistently..2) 2. For example.p 22.i. Compute the standard error of /Jr SEcfl). Therefore we can simply follow the same prO"" cedure as in the singleregressor case to lest the null hypothesis in Equation ( 7. The .0 j j '" • J.equivalcnt!y.. As stated in K.1 j.. CompUie the pvalue.:~ Concept 6.statistic.J~ P i The procedu re for testing a hypothesis on a single coefficient in muluplt regression is summarized as Key Concept 7.05 or.~'.Q. KEY CONa" AGAINST THE ALTERNATIVE h /3.1. has as its mea n the h)'P0thesized true va lue. The third step is to compute the pvaluc of the lest using the cumulative nonnal dislJibutJon in Appendix Table 10 r.. the sampling distribution of is approxima tely normal. The standard error and (typically) the [statistic and ". (7.2 gives a procedure for testing this null hypothesis when Ihere is II single rt:g.96."e1}. Ho aga inst the alterna tive H I using a sample of data. f3jJJ 3. _ 13/ I  SE(~i).!.ressor.... 111i5 unde rpinning is present in multiple regression as well..2¢( ! t""cl). and that the variance of Ibis distributio n ca n be estima ted consistently.~~ ~HE HYPOTHESIS i3 = i3 0 _ _..0). ~ (7. 7. Und\!! the null hypothesis the mean o f this d istri bution is I3j. . pvalue =.d Confidence Intervals in Multiple Regre~~ion r. Our task is to test the null hypothe:<. 2 CHAPTER 7 Hypothesis Te~t~ or. under the null hypothesis. • where . Reject the hypothesis at the 5% significance level if the pvalue is less than 0.0 =.. if the first regressor is S TR. The first step in this proced ure is LO calculall! the standard error o{ the coefficient . 1..
The regression of test scores against STR and PCIEL. Iy.645. it sho uld be kept in mind that these methods for qua nti fying the sampling uncerta inly are only guaranteed to work in la rge samples.2 rely o~ the huge sa mple normal approximat ion to the distribution of the O LS estimalor f3.12) and is restated here wit h standard errors in parenthcses below the coefficients: ed in Kt:~ ~Il der til': . CONCEPT 7. controlling for the percent age of Engl ish learners? We are now able to find QU!. it comains the (rue value of /3.. (lnd we adopt th i~ simplified nota tion fur the rebt of the book.1 1. Equivalently.tc':lcher ralio.. lhat cannot be rejected by a 5% twosided hypothesis les!. When the sample size is large.3) CONFIDENCE INTERVALS FOR A SINGLE COEFFICIENT IN MULTIPLE REGRESSION A 95% (wosido.!' the null ~ria n cC (If Application to Test Scores and the StudentTeacher Ratio Can we reject the null hypothesis that a change in the studentteacher rat io has no erfect on test scores. n \ll ultipl. was give n in Equation (6..965£(/"91. it is the set of values of P.2) (7./ in this Key Conce pt.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient 223 (7.1.7.s distritlU' same pro ~tion (7....4) A 90% confidence interval is obtained by replacing 1. 1111S method is summa rized as Key Concept 7.1. K.I.2. in 95% of all possible randomly d rawn samples. estimated by OLS.2  "" [~i . natIve ~ tha i the C dsign. ~angi ng r he null lesis Ho Confidence Intervals for a Single Coefficient T he m e thod for constructi ng a con fide nce int e r val in the multiple regression model is (lIsa the same as in the si ngl eregressor model.96Sf. 1 and the method for constructing a confidence interval in Key Concept 7. fSis at t~·:~ denoted [.e of the I . (7. e nthere lt3ndard the gen r .16 in Equation (7. is a n interval that con tains the true value of [J/ wit h a 95 % probability: that is. Accord ingly. the 95 % confidence interval is 95% con fidence interval for f3.(i3)'~j + 1. il is customary to denote thI s simply as t. once we control for the percentage of English learners in the district? What is a 95% confidence interval fo r the eft"cct on test scores of a c hange in the stude nl.4 ) wilh 1. The mEthad for cond ucting a hypot hesis test in Kcy Concept 7.: bmput ed IS .!d confidence interval for the coefficient {3. Howeve r.
lI she is to hire more teach ers. A 95% confidence inle rva l (or the popula tion coefficienl on STR is 1.2). we ca n be 95% confide nt Ihal the true value of the coefficient is be twee n .. Inl e rprelt~ d in the conte xt of Ihc s uperinte nde nt 's inte rest in decreasing the SlUde nl. Now.656 X PuEL.0)10. holding expendi tures per pupil (and the per centage o f E nglish lea rne rs) constant ? Th is question ca n he addre.032) where Expn is total annual expenditures per pupi l in the district in thousatlds of dollars.% X U.95 X 2.I''''_ _ __ _.~ d by estimat ing a regression of lest scores on the studentteacher ralio.1.95 a nd .5) (0.hi' t.1 %.95.11) z: 1. (R. and th e percent3ge of English leam ers:111e OLS regression line is ~ = 649. the ' statistic is' = (.reduced maintenance.224 CHAPTER. changing tbe student.cry sma ll effect on lest scores: The estima ted coefficien t on STR is \ \0 I~ Equation (7.1.6 . significance level (bu t not quile at the I % significa nce leve l). is the effect on test scores of reducing the studelllteacher ntio. reducing class s ize will help test scores in her district. . which.O?Q M nrrnvrr .59) (0 . thc 95% confidence interva l for the effeci o n test scores of this reduction is (.0 ..3. Ihal is. a nd so on). Because the null hypothesis says that the true value of this coefficient is zero. however.. taxpayers do nOI favor.0.S) (0. 0.1.90.031) To test the hypothesis that the true coefficient on STR is O.'1 at which we can reject the null hypothesis is 1.1.52).5) l1as persuaded the supe rinte nde nt th at.1 0 X STR .lir rnr u""lin~ Th AI . (7."' n li.".U ..7) (0.54.lcache r ratio by 2.she asks. 0.h.43 .·statistic in Equation (7. that is.6). the smallest significance 11.650 X PCIEL. based on the evide nce so far .2.J .1% . Holding expenditu res pel pupil a nd the percent:lge 01 English learners constant.26). The resu lt is striking.43 = ( . the null hypothesis can be rejected al the 5CY. Hypothesis Tesb end Confidence Intervals in Multiple Regression ~ = 686.val ue is less thall 5%.29 X STR + 3. it is 0 1111 ..87 X Expn . . Because the p..0. she can pay for those teachers e it her through ellis e lsewhe re in the budget (no new compute rs.48) (1.26. The associoted p·"aJuc is 24>( 2.:\1.10 .54) = 1. lolal spe nding pe r pupil .teacher ratio is esti mated to hat' a . 'n1f~ value... Wha t.26 X 2) = (. aft er adding Expn as a regressor in Equation (7. or by ask ing (or an increase in be r budge t.5) but.43) (7. of ' he ~fficientlj':.1.6) ( 15. she moves 0 0 to a mo re nuanced queslion. we first need 10 compute the . Your analysis o f the multi· Adding expenditures per pupil to the equation pic regression in Equation (7.0.0.
tba t is.7.62) can make th e OLS estimators less precise.5) ccd to hat the 2.59 = 2.) and the coefficient o n spe nding per pupil ({32) are ze ro. tech nology. fro m 0. o nce • . Altho ugh it might seem that we can reject thi s hyp o tht:sis because the Is ta tistic testing /32 = 0 in Equation (7. school administra to rs a llocate their budgets effici ently. I r e teach Jdgel (no Icrcase in cst scorC!. O ur a ngry t axpayer hypo thesizes that neither the stu dent. Put diffe re nlly. in these Cali fo rnia da ta. Thus Eq ua tion (7.5) to 0.teacbcr ratio (ft. the F sta listic.6) indicates thai this transfe r would have little effect on test scores.1. res on tnt .0. he hypothesizes t ha t both 131 = 0 a nd {32 "" o. If so. (7. introduce d in Section 6. Wha t a bout o ur angry taxpayer? He assens th at the po pulation values of bOTh the coefficient on the slUde nl.645 ). so the hypothesis thallhc population value of this coefficicnI is indeed zero cannot he rejected e\'cn al Ihe ]0% signif· icance level (1. and so o n) and transferri ng those fund s to hire more teachers.48 in Equa tion (7.60.6) provides no evidence thaI hir ing more leachers improves tcsl scores if overall expe nditures per pupil are be ld constant. cL. expendi tu res pe r pupi l.48 = .0.43. the small a nd statistically insignit'icant coefficient on S TR in Equation (7. Now.54. school districts could raise the ir test scores simply by decreasing funding for other p urposes (textbook s. However.0. Suppose. Note tha t the standard e rro r o n STR increased whe n Expn was added. and the percent age of English learners. lila! the coefficie nt on STR in Equatio n (7. thi s reasoning is nawed.0) /0.60 1 < 1.6) is thaL. and to test it we need a new lool.2 Tests of Joint Hypotheses TIli s sectio n describes how 10 fonnulate joint hypo th eses on multipl e regressio n coe ffi cie nts and how to test th em using an Fstatistic. c level 'O ( ss Ihan quile JUG " ue value I t t of the he 95% ~ .29 . ODe inte rpretation of tlle regression in Equa tio n (7.87/1. co un Ic rfaclUaily.r English 7. This illustra tes the gene ral pa int .95 x 2. The t ax p flye r '~ hypothesis is a joint hypothesi s.teacher ratio.0. rcm ulti based 00 iet. spans.6) o f the test sco re aga inst the stu de nt.2 Tests of Joi nt Hypolhese~ 225 (7.teacher ratio nor e xpenditures per pupil have an effect on test scores.) were negative and larg~ . th a t corre latio n be tween regressors (the correlation between STR and Expn is . d the per zero is now ( = ( .01 )usands of Testing Hypotheses on Two or More Coefficients Joint null hypotheses_ Consider the regression in E quation (7. districts are alread y alloca ting the ir fu nds e fficie ntly.6 ) is I = 3.6).7 in the context of imperfect m ulti collinearity.43 ill Equatio n (7. thereby reducing cl ass sizes while holding expendi lures constant.
the fo llowing c:llculation show~ thall~l' approach is unreliahle. the null hypothesis is tha llhe coe fficients on the 2ndA'h.c. . .6) thai {JI = 0 and f32 = O. then the joint null hypot hesis itscl f is fa lse.:maticall~ ae.. and pj.i\ f1~ doce. . •• ~ I.so that there 3rc q = 3 restrictions. I.ll l:{J1 '* Oand/or fJ.' null hypolhl!si:..~{J1 = Oandp2 = Ovs.. _ . 1l.e. .. O .tric. If anyone (or morc than one) of the eq ualities unde r the nul l hypothe~b !II' in Equa tion (HI) is fal<.c. hypothe.7) is <In e xample of Equation (7. vs./b i~ .1. What happens when you use the "one al a time testing procedure: Reject the joint null hypothesis if eilher" or'2 exceeds 1. Why can 't I just test the individual coefficients one at a time? Althou. Anot he r eX<lm ple is Ihat.ion model.: values of these coeffkients under the null hypothesis.. refer to differe nt regression coe fficil! ll ts.tric· lions on th e regrt!ssion coefficients. "" O.... . Il l: onc or more of the q reslriclions unde r H o docs not hold.nt •••• ? . a joint hypothesis is a hypothesis Ihat imposes two or more re<.. f3 z = O.o. and leI t 2 be the t'3tatistic {or test.p".. ing the null hypolhesis thHt fJJ.8). .226 CHAPTER 1 ~is Tests end Confidence Intervals in M ultiple Regrenion we com ro l for the percentage o f English learners. In gene ral.6) a nd £'(1''' is tht:' second. The null hypolhe"is in Equation (7. en. Tn general. that is.o.!!~ it seems it should he possi ble to lest a join t hypulhcsis by using the U3ual r·:Wll s· tics to test Ihe restrictions onc at a time.liter' nati\"(~ hypothesis is tha t al least one of the equalilies in the null hYPolhl!. under the null hypothesi s Ho there arc q such restrictions.0.. and 51h regres~ors arc 7Cro.'r to th.Jlh.. Ln II regression with k = 11 regre~ors. the .O. S pecifically. In Ihie.8) where P" 13". Let I I be the 1_!>t 9' tistic fo r lesting the null hypotbesis tbat 131 = O..fjJ = O .. ses of the for m 1111: 13) = f3 . minology wc can say thatt hc null hypot hesis in Equation (7.7) im posc~ IwO rl!<.. TIlu!. tions 011 thl! m ultiple regression model: f31 = 0 (Il1d f3 2 . (7. ..teacher ra tio (P I) IIlId the coc(fjcic nt on expenditures pc r pupil (f3 2 ) are "lero is a n example of a joint hypothesis on the coefficients in the muh iple regrcs. restricts the value of two of t he eoeffick·nts. We consider joint null and fllternntive hypl)th~. so as a mailer alter..is 1l1..o' P".13"'.e. rcfi. suppose that you are interested in test in!! t~~ joint null hypothesis in Equation (7. sor in Equation (7 . (""'"'J 11\c hypothesis lilal bolll the coefficie nt on the studen t. for a total o f q restriclions. thi. and Ps = o. · . Beca use STn b the first regrc\. we can w rit ~ th h..z +.. not hold.
:!.9025 = 90. in a It<: on lhe h at there re q such t h~sis fill the alter the<i.. arc. If th e regressors arc corrda ted.. d~ 1.11 T he F Statistic Th e F statisti c is used to test joint hypothesis about regressJon coefficients. What is In e size of the "one at a lime"testing proced ure: that is.: 0.s H~ Because Ihis question involves the IWO ra ndom variabkl) II and 12. Firs t consider t he special case in which th e (statist ics are uncorrela ted and thus arc independent.': rcstric ypothe (70S) . TIlis "one at a time " method rejects the null too often because it gi ves you too many chances: If you fail to rejeci using the first {statis tic . there is anothe r approach to testingjoinl hypotheses tha t is more powe rful.a new approach is needed . The Fstatistic with q = 2 restrictions. in larg~ samples ~I and have a joint normal d istribution.96) X PrO'2! !S \.96) = 0. So the proba bility of reject ing the null hypot hes is when it is true is 1 . .96. lllC ad va nl:lgc of the Bonferroni method is that it appHes very ge nerally.96) = P r(I'll !S 1. method. ·lb c liize of the "one at a time " procedure depends o n the value o f the correla tion betwee n the regressors.7.2 :cgre~_ Test1 01 Joint Hypotheses 227 math_ (7.952 :c.lt.. tha t thl! :. I. so under Ihe joint null hypot hesis the .omplicated .hhou~h 11 .e. you get to try again using the second. the of ler restr. what is the pro bability that you will rej ect the null hypot hesis wh en it is true'! More than 5%! In this special case we ca n calculate the rejection prohabil it y of t his me thod exactly. Fortunate ly.96. the situatio n is even more (.lIe normal distribu tion .As mentioned in Section 6.lhat i ~ ils rejection ra te unde r the null hypothesis docs not equa l the desired significance leveJ. .75%. The for mulas fo r the Fstatis tic arc integrated in to modern regression sofly. il freq ucntly fa ils to reject the null hypothesis when in fad the ailcm ative hypothes is is true. re fer pothesi<i thilt. answeri ng il requires characterizing the joint sampling distribution of'l and . the n turn to the general case o f q restrictions. We fi rst d iscuss the case o f twO restrictio ns."..he two tSla . O ne approac h is 10 modify lht' "o ne 31 a t ime " method so that it uses differ enl cri llca l values tha t ensure Ihal its size equals it!> significa nce leve l. When the joint nuU hypothesis has the two restrictions thnt PI = 0 and tisl ics I) and 12 using the formula pz'" O.statistics ' l and have a b ivari.96 and I (~ I ~ 1.95 2 = 9. Pr( lld s 1.t ing tnt the I_~ta c for I~~t lit a till).25%. ~speci ally when 'he regressors are highly correlated. Because the Istat istics arc indcpendent. This method . Because t he "o ne a t :'I time" test ing approach bas the wrong siL. is d ~s cr ibed in Append ix 7.6.96 aDd \12! s 1. the Fstntistlc combines .0._stali~· .s di sadvantage is that it can have low powe r. That approach is based on the Fstatistic. ih '2 . The null is nOI rejected only if both I'tls 1. whe re each Istatistic has mean equal to 0 and variance equal to I. called t he Boo (erron..7) I) (mel a joint sc.
a hcteroskedast tc1W robu:!. Consequently.x distribUlion.L·" ' (19) where P ")I is a n estimator of the correlation between the two (sta tistics.~. Th e homoskedaslicityonly V CT"iofl of Ihe Fstatistic is discussed ot the e nd of this section. leading the teslloreject the null hypothe~is.228 CHAPTER 7 Hypolhesis Tests end Confidence Intervo!s in Muhiple Regression f .! estimate of the "covaria nce matrix"). in large sa mples. is given by the Fq. lf so. That is.9) adjum for this correlation. Computing the heteroskedasticityrobust Fstatistic in sta tistical software.1t the Ista tistics a re unco rrcl ~ t ed so we can drop the terms involvingp'. for historical reasons most statistica l software computes homosked ns ' ticily·o nly standard errors by default. Eq ua.. in l:lrge samples. first suppose that we know th. . Uode r the null hypothesis. then either 1 1 o r II (or hath) will be large. 1. In genera l the {·stati stics are (. so under tbe null hypolhesis F has all F2. This adjustment is made so that. ('1 P" ._ distribUTion in Appendix Table 4 for the approprillt e value of q and the desired significance level..<~ distribution (Section 2. TIle formula for the he teroskedasticity· robus t Fsta lis!ic T esting the q restrictions of the joint null hypothesis in E quation (7. a nd the formula for the F·stal istlc In Equati o n (7.ll. the F statisTic has an F2. the Fsmistic has a sampling distribu tion that. + f2 ' . + fi): that is. / 1 a nd I] are independent standHtd normal rando m va riables (because the Islalistics are uncorre la ted by assumption ).. Under the alternative hypothesis thai either f3l is nonzero o r f31 is no nzero (or bo th). This fa nnula is incorpo rated into regression software. U nder the null hypOIhesis.4).'/ If ] ) 1 1" I'.10) Thus the c ritical values for tbe Fstatistic can be obtained from the tahlt:!~ of the F'I. under the null hypothesis the F·statistic is distributt!d Fq. To und e~tand the Fstalis llc in Equation (7. If the F. under the oull hypothesis.3.9). (7.sla tistie is computed using the gener~ 1 b etc rosk edasticit ~ · robust fomu la.2' _:: :. As discu!)sed in Section 5. the Fstatistic is the average of the sq ua red ' ·stat istics.~ distribution in lnrge samples whether or no t the {·stati st ics are correlated. lio n (7. mo re genernlly.8) is given in Sec tion 18. i({T The Fstatistic with q restrictions. in some software pack'lg~S you must select a ··robust'· option so that the Fstatistic is computed using he!' croskcd asticityrobust standllrd errors (and.9) simplifies a nd F ::::.orrelated. m:lking the P ·stal istic easy to compute in practice. its large II distri bution under the null hypot hesis is Fq•x regardle~ of whether the errors are homoskedastic or he teros kedastic.4.
The pvalue of the Fst:uistic 7.. at le ast o ne j.. That is. the p\"alue is .Teacher Ratio We are now able to test the nu ll hypothesis that the coefficients on hmh the s tu dentIeacher ratio m ill expenditures per pupil arc Lero.. k.. Becau~e the P·statistic has a largesample "~. we need to compute the hcteroskedaslicityrobust F sllltis(ic of the tc.il' . In IMge sam ples. The "overaW' regre~sion Fstatistic tf!:>IS the joint hypothesis that all tht: slope coefficients are zero. and the I~slal i s lic is the square o f the Istatistic.:z . the null and alternative hypotheses are . lind tbe The F·statistic when q = I .0 using the rcgre~ion of TestS('()re on STR. because a x~d is lribut(:d random variahle is q limes an "~/. ah hough lhe intercept (which under the nu ll hypot hesis is the mean of Y.nlage of English learn ers in the dj"tricl._ > F. and the overall regression f'statistic is lh e I'~stat is t ic computed for Ihe null hypot hes is in Equation (7.13k = 0 \'~ Jf1: {3/ "* 0. Iht! overall regressio n Fstatistic has an Fk•.. . r the p value .".. controlling for Ihe pcrC\. I . in der the U nder this null hypothesis. Then the join t null hypothesis reduces \0 the null hypothesi~ on a single regression coefficient.7I. The p value in Equation (7..!· paekag. It. because formu las for the cumu lative chisquared and F distributions have been im:orporated in lo most mode rn slOlIis tica l softwa re.11) can be evaluated u~ing (7. I' then bcsh. no ne of the regressors expla ins any of the variation in Y./32 = 0.2 ~hofJoinlHypotheses 229 Computing the p.7. .8).volue using the F·statistic.9) can he compu ted using the largesample Fq ap proximation to its d istribution.. the p. ..value can be c\alualed using a compu te r. that qua· fthe dard ion).. Allematively. . W) ~abiCS of .e of the general null hypoth esis in Equat io n (7 . Lei Ff denote the value of the Fstatist. .~t i dt y uation ·ftwa re. I s til' in under ether The "overall " regression F·statistic.<£ trihution (or. illy versi(lll \ . distribution under the null hypothe<ii:o:. (7.Pr [f·~." dist ri bu t ion when the n ull hypothesis is true.'il Ihal {jl = 0 and f3. .againsl the alternative th at at least one coefficient is nonzero.lClual1y computed. 'l.1 2).e" u~ i n g tu:!' das ticit~ Application to Test Scores and the Student.thl! Fstatislic (Csts a single reslric lion . 12) t hat...11 ) a ta ble of the Fq dis .. OI .''i !cussed 11'1 oskedn. Hn: {3! = 0. j = 1. alternatively. To lest Ihb h) pothcsis. a table of the x~ distribution.) can be llon"lero: rhc null hypo thesis ill Equlllion (7.dislributed random variable). . When q = l. (7. fl!gardk.stlC ~asticit).12) is a special C<t!o..
if the e rro r is homosked. because it is va lid only if tbe e rror term is ho moskedastic. The HomoskedasticityOnly FStatistic One way 10 rest ate the questio n addressed by the Fstatist ic is to ask wht=thcr re laxing the q restrictions (hat constitute the null hypothesis improves the fit or the regression by enough that this improveme nt is unlike ly to be the result merely of rando m sampling va riation if the null hypot hesis is (Jue. the F·sta listic can be wrincn 10 terms o f the im proveme nt in lhe fit of the regression as mc:asured eit be r b) Ihe s um of squ ared residua ls o r by t he regressio n R2. ~o the null hypothesis is rejected at the 1% level. the relevant reg. if tbe e rror te rm is homoskedastic. lhen the tcst reiects the null hypothesis.. that is.00 (Appendix Table 4). ea]Jed the restricted regre~sion . distribu tioo is 3. we can reject t..ressors arc excl uded from the regression.'llled the unrestricted regression. and the 1% critical value is 4.. c.1l) as summarized in this Fslatistic.. stic. be associated witb .) is va lid whe ther the error term is homoskedaslic o r heteroske daslic.. u. such as might be repon ed in a t<lble thai includes regres~ion R2's but nOI F·slatisti cs. exceeds 4." distribution. The value of t he Fstatistic computed fr om the data . Ihe he terosked astici tyrobust F·stati~tic comput ed using til e formu la in SeCllo n 18. In the second regression . This resta tement sug gests thaI there is .005). where all the hypothesized \'alue~ are 7ero. This Fstat istic is S. If the sum of squared residuals is sufficienlly smaller in the unre stricted than the restricted re£ression. and P".61. Despit e thi s significant limit ali on of (he homoskedastk ity·oruy F slalistic. It is very uolikely thaI we would have drawn a sample that procluced an Fstatjstjc as large as 5. The ho mos kedasticil yonl y F·statistic is compllled using a simple C ormul ll hased o n T he sum of sq uared rcsidu als fr om two regressjons. the restricted regressiun is lbe regression in whieh th ose cocfficicn(~ are set to 'le ro. lo contrast.61. the null hypothesis is forced to be true. Ihe simple formul a ca n be comput ed using standard regression outpu!. In the [irsl regression. Tha t js.43 if Ule nuU hypoth_ esis really were true (the pvaJue is u.230 CHAPHR 7 Hypothesis Tests and Confidence Intervals in Multiple Regreuion Expn. Under the null hypothesis. .43. th e altemative hypothc~i~ p allowed to be true. its simple form ulAsheds light on what the F·~t a li stie is doing.. In addiTi on. Based on the evidence in Equation (7. it see ms. The resu lting Fstatistic is referred to as the ho moskedasticity·onJy Fstalistie.EL reported in Equ ation (7. this intuition has an exact mathematical expres· sio n..6). ln fact. The S% crit ical value of the F2. link betwee n the Fstatistic and the regression R2: A large F· statistic sho uld. When lhe null hypothesis is of the IyJ)C in Equation (7.8). in large samples this statistic has au Flo.h e taxpayer'S hypothesis lhat nei_ ther the studenll eacher ra tio lIor expend itures per pupil ha\ e an effect on t t:~1 scores (holding constaot the pe rcentage of English learners).43. 5. substantial increase in we R2.
q is the number of restrictions under the null hypothesis. SSR"".R:. The ~ thesiS is esi~.:ssors STR.:1ttl .k"nrtm i.R?u"u. 14) Ir the errors art. An alternative e quivalent fo rmula fo r the ho moskedasticityo nl y Fstatis6c is based on the R2of the two regressions: F = (R~". To test the null hypothesis that the populat io n coefficients o n STR and Expn a rc Q. they a re valid o nly if the e rrors are ho moskedaSlic.t _ . 14 ) a nd the het e roskedasticityrobust Fstatistic vanishes as the sample ~ i ze n increases. C ri tica l va lues rOr tbis d istribution .7.rricrfd .1. a re give n in A ppe ndix Ta ble S.mtl) / (n .<.smt'frd ) l q (I . in practice the ho moskedasticityonly Fstatistic is llo t a satisfactory substi tute fo r the beler oske dasticilyrobust F sta tistic.4366. then the diffe re nce between the ho mos kedas ticityonly Fstatistic computed usi. or mo re generally wit h data sets typica lJy fo und in the socia l sci ences.:'I"':I(d is the sum of square d residuals fro m the re stricted regressio n.4366: tha t is.o< distri bu tion as n increases. for large sa mple sizes.nIt~ .SSRullmlrkr."'~wia(d is the number of regressors in tbe unrestricted re gression. however.14) has an Fq. bo moskeda!>tic. and k.rftl .k""~rT1(. Because homoske dasticity is a specia l case tbal caono l be counte d o n in a pplica tio ns wiiJl economic data.1 distribution converges to the P1 1 . the sampling distribution of the ruleof· thumb F statistic unde r the null hypothesis is.. U nfo rluna te ly.4. the di[fc rc nces between the two distri butions are negligible. in large samples..s small..2 TestsofJointHypolheses 231 n e homosk edQstitityorn)' Fs tatisti c is given by the formula F'" (SSR mmqrd . In II>' Application to Test Scores and the StudentTeacher Ratio. and Pct£L.13) where S5R'.d.6): its R l is 0.. As dis cussed in Section 2. Thus. conlrolling for PcrE L . we need to compu te the SSR (or R2) for the restricted and unrestricted regr ession .i. if the e rrors a re homoskedastic. the two sets o( c ri rical val ues d iffe r. the n Ihe homoskedaslicity o nly Fstalistic defined in Equations (7. The unrestric ted regression has the regn..d ) /q _ SSRunrnnicrtdl( n k UllujUlCIM I) (7.I dis tri butio n under the null hypothes is. For small samples.13) or (7. no rma lly d istributed . These rule ofthumb (onnulas are ea sy to compute and have an int uitive int e r pre ta tio n in (e rms o f how well the unrestriCied a nd restricted regres. ~. a nd is give n in E quation (7. R~"r~$lflcrffl = 0.1) ' (7. whic b depend 00 both q and II . heUJl~ . Fq.ng Eq ua tio n (7.wrirred is the sum of squared residua ls from the unrestricted regression. th e FqJl .13) and (7 . : F·sta frthe Using the homoskedasticityonly Fstatistic when n. ress10n f n11Ula rt:nthe ressiofl. t1 "alu~ ~nts art . £t:!JII..ions li t lhe datOl . If the errors rnd~rd a re homoske dastic a nd a re i.
1)[ = 8. The homoskedasticityon ly F·statistic.Ol exceeds the J % critical val ue of 4. 7. Because t!. all hough PclE L does ( the nu ll hypothesis does nlll re~trict the codficicnl on PctE/.0.. It s disadvantage is that the va lues of the homos kcdas tieityo nly and he! t:roskt:dnsticityrobust FstatiSlics can be very different : The helcroskedastiClty robu~1 F·stntislic testing this joint hypothesis is 5.4366 .1 . The restricted regression.3 Testing Single Restrictions Involving Multiple Coefficients Sometimes economic theory s uggesl ~ a single restriction tha t in voh"es two or nlOre regression cocf[icients.0. quite different from tht. the task is to test this null h}'Polhcsis against t he alternative thu t th~ tWO coefficit:nts differ: Ho:f3 I =/31 VS · lI j:{3I'1f3 2· (7 16.tEL.. . H2 = 0. theory might suggest a null hypothesis ("Itthl' form {31 = {3l.01 .4149.).15 R~n. cstimated b} 01 .4J~ 9.43.IM = O. so q '" r. I~ TestScore  =:c 66·'1 .0. computed usiog Eq ua tio n (7. has a single res triction .032) ::= so 2. 111is example illustrates the adv::lDtages and disadvan tages of the hom osk~uas· ticity·only Fstalislic. The number of restrict ions is q = F ~ [(0.0 1. the effects of the first and second regressor are the sam~ 10 this case. the hypothesis is rejected <It the j % le vcl using this ruleo fthumb approach. but that reslrict iOfi involvcs multiple coerricienls ({31 and {:J2)' We need t o modify the meth. tions is 1/ = 420. and the number of regressors in the unrestricted regression l~ k 3.. lls ad van tage is that it ca n be computed using a calculJlor..4366) / (420 ...' 1e~s rcliahle homoskednstici tyonly ruleof·thumb value of 8. the number of obse na.d~ pre~entcd <."lches: w hich one \\ill ttc c:'i')ie. under the null hypothesis STR and E:r:pn do not enkr the population regressi on.232 CHAPTfR 1 Hypothesis Te~t$ and Con~dence Intervals in Multiple Regression re~trictcd regression i m po~es the joint null h }' pot h es i~ that the true coefficient<:.4 )49)1 211[«( .6J.. IJn 5TR and E:r:p" are zero: that is.1 This null hypothesi . Tllcre are two appro.0 far to lest this hyp01hesis.671 x Pc.0) (0.l'lt depends on your 'illftwarc.\.14). Fo r example. (7 5) (1. that is.
I and het :edaslicit" I . then estimating the regression of Yi on X li and WI' A 95% confi dem:e interval [or the diffe re nce in the coefficients f3 1 .81 .f32 can be calcul ated as ). + f32 W i + ui' (7.Io that the tWO \7 .17) can be rewri tten as Yi = f30 + YVYI. is ted at the Y.17) into Equation (7.1 ± J . the po pUlation regression in Equa Lion (7.1tol ~ t restricti')" the methodS hieh onc Because the coefficient YI in this equation is YI = f3 1 .) = YIXli + f3 l W!. + X2 i + .is (7.) Des no t O LS.f32 (Illd Wi = Xli + X 11. by turn ing Equation (7.16). = 0 while un der the alternative.. The two methods (Approaches #1 and #2) a re equivalent .9). This me t hod c(ln be extended to other restrictions on re gression eq uations usi ng the same trick (see Exercise 7. the hypothesis in Equat ioll (7 .8\X 1t + f3VY2i = f3 1X II .f32' under th e null hypoth esis in Equation (7. To be con crete.16) into a restriction on a single regression coefficient.1. suppose therc arc only two regressors.1l1US. lti) can be tested using a trick in which the ori ginal regression equation is rewritten to tu rn the restriction in Equation (7. ~m the \css Here is the trick: By subtracting an d adding f32X li' we have that .18).3 Testing Single Re5tridions Involving Multiple Coefficients 233 entS on )\ .f32X li + f3JX 1 .18) I ~ wo or mort Ithcsis of the Ithe same. = f30 + f31XI . Because the restriction now involves the sin gle coe ffi cient YI' the nu ll hypoth esis in Equation (7.82)XIi + .'" distribu tion is 1. so the 95 % pe rcentile of the Fl. X l i and X 2t in t he regression. Th us. so the population regressio n has the form observa .4 that the square of a standard normal random variable has an Fl.17) lloskedas ~alculalO[. y.15 ) Approach #2: Transform the regression .:ntcr Approach #1: Test the restriction directly. + fhX2t + Ui' (7.96 2 = 3. . because q = 1."" distr ibution . (Recall rrom Section 2. If your statistical pack age cannot test the restriction dir ectl y. Y\ of:. this is do ne by first constructing th e new regressor Wi as the sum of t he two ori ginal regressors.16) can be tested using th e Ista tistic me thod of Section 7. in the sense that the Fsta tistic from the first method equals the square of the Istatistic from the sec ond me thod . • . O.82X 2j = (.8l(X 1 where Y1 = f31 . we have tu rne d a rest riction o n two regression coefficienb into a restriction on a single rcgression coeffic ie nt. has an F Loc distribution under the null hypothesis. Some statist ical packages have a specia lized command design ed to test restrictions like Equ ation (7_Hi) and the result is an Fstatistic that .84 .7..96S£(Y)).sion is k 14). In practice.
Because the lest has a 5% significance level. 2.J shows a 95% confidence seI (confide nce e llipse) for the coefficients on the s tudent.1 for constructing a confidence set for a single coeffici ent using th e I·St8Iistic. except that the confidence sel for multiple coefficients is based on the F·statistic. the true population values of (3\ :md [32 will not he rejected in 95% of all samples. stalistic ca D be comput ed by eithe r of the two methods just d iscussed for q = I Precisely how best 10 do this in prac!ice depends on Lbe specifjc regression soh.. flGU ellipse C' 00$ of PI rejected 7.111is ell ipse does not include the point (0. ware being used . Secl ion 7._ _ __ _ _ . Figure 7.2 showed how 1O usc the F s tatislic to test a joint nuJl hypothesis tha t /3 1= /31. 'The method is conceptually similar to the method in Section 7. Thus.4 Confidence Sets for Multiple Coefficients the 5% si.6). ciems. This means that the null hypothesis thai these two coefficie nts arc both ze ro is rejected using tilt Fsta tistic at the 5% sil!.0 a nd /32 = fJz p.61. suppose ). the set of values nOI rejected at the 5% level by tbis F·statistic const itutes a 95% confidence se t for /3 1 a nd {h Although this melhod of trying all possible values of {31.nificance level.!(l. TIlis section explains how to construct a confidence set for two or more regressi on coeffici ent s. This a pproach.. TIle p.z.0).Suppose you were to test every possi hie va lue o f . In general it is possible to have q restrictions under the oull hypothesis in which so me or all of these restrictions involve mult iple coefli.234 CHAPTER 7 Hypothesis Tesi's ond Confidence Intervals in Multiple Regression Extension to q > 1./3~.u aDd [32.e case of multiple coefficie nts. When there are two coefficients.0 a nd l3z.. in practice it is much simp ler to use an explicit formula for the con fidenc e ~c\· This fonn ula for the confidence set for an arbitrary number of coefficients is based on the formula for the P·statistic. holding consta nt the percentage of English learners.OU a re inte rested in construc ting a confidence set for twO coefficients. The F·siatistic of Section 7.0 works in thc· ory.To make this concrete..teacher ratio and expenditure per pupil.. A 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples.tI)· }'OU construc t the Fstari slic and reject it if it exceeds the 5% crit ica l va lue oj J. which we al ready kne w from Section 7. can be extended to th. As a n illustration.6J and f3. based on the estimated regres· sion in Equation (7. Recall thac a 95% confidence interval is computed by findin g the set of vtll· ues of the coefficie nts thai are not rejected using a Istatistic at the 5% significance level.o at the 5% le veL For each pair o f cand ida tes (/3t O. the resulting confidence se lS are ellipses.2 extends to Ihis type o f join! hypot hesis. a confidence set is the generalization to two or more coefficient s of a con!"i· dence interva l for a single coefficient. Thus. .
This mea nS tllJI C is rejected using tll ~ " from Section • .\ \ .0). 7 (. " 2.11 .1l 0 . ' me re ason fo r this o rien ta tion is th : H the estim ated corre latio n between P I a nd {3: is positi ve." 1. .0 wo rks in tM' the con[ide ncl: s.. hat contains the ~ drawn samples. ~7) .S JOS under the lultipl e coeffi.. lho)· :Iie"l value 00.:s of fit such as the Rl o r "H2. . m values o f {3 \ and ues not rejected al 7. which in tum arises because Ihe corn: la tion b ~t wee n the regressors SiR and Exp ll is negative (schools tha i spend mo re pe r pupil te nd to have fe wer stude nts per teache r).sed for q = 1. "Ine stan ing poin t for choosing a regression specification is thinking through the po~sible sources of omitted " a riable bias. because some useful guidelines lire available.o.S Model Specification for Multiple Regression The jo b of detemli ning "hich .00.sloli~tic 01 <Je~ the 5% significance leoeL . But do not des pai r. jidales (J3 l.'aria blcs to include in multi ple regressio nthat is.29.6) Coeffici elll on El:p" The 95% confidence set lor the coeff icients on STR (.:t. othesis.5 Coeffici ent o n STR ({JI) : ng the set of vat e 5% significance fficie nls. . Section 7.0 1. cie nts of a eonfi· ~stat i st ic. . :t fo r /31an d 132 ' Id i32. .) and Expn iJ311 j~ on ellipse.8.5 0.11:t) = (0.1. statistic.:l ~ndi\ure per pupil. To make idence se t for tWO istic to test a joint o test eve ry possi Th e co nfi dence e llipse is a fal sausage with the long part of the sa usage oriented in the lowerIeflfupperrigh t directio n.U 0. th e problem of choosi ng a regressio n specifica tioncan be quite challen ging. The F· .5 I.1 for except . It is important to rely on your expert knowledge of the empirical problem and to focus o n o btain ing an unbiased est imate of the causal effect of interest:do no t rely St)lely on purely statistical meilsun. .ts iTIore re gression I + (11 1.7. 3. The elIip~ contoin$ the poir~ of vol· . :oefficicn ts is ba~l:d :ients. he estimated rcgre~0. egression sofl· Model Specifico~on lor Multiple Regression 235 FIGURE 7. 1 95% Confidence Set for Coefficients on STR and Expn from Equation (7. and no single rule applies in all si tu atio ns. (iJ2) 95% Confidence Set of t31 ond 132thot COnnot be relected u ~i ng the F. the resull iM (confidence e llipS.
and if it is correlated with at least one of the regressors. the afnu ence of the students and the stude nt. then the schools will tend to have large r budgets and lower st udent. so in general the OLS eSTimators of all the coefficients will be biased. For example.teacher ratios. the OLS estima tors are correlated. As was discussed in Section 1l. This base specifica tio n should contain the variab l e~ or prim ary int. A t a ma lhema tjcal level.This means th at the conditional expectation of II. if the two conditions fo r omitted variable bias are satisfied. The two conditions for omined va riable bias in multiple regression :He summari zed in Key Concept 7. As a result. Fi rst.teac her ratio and the percentage of English lea rners. decid ing whe ther to include a particul ar va ri able ca n be d ifficult and requires judgment. given X Ir .. even after co n[roJtj ng C or the percentage o f English lea rn n short.stude nts from afflu ent families o h en have more learning oppo rtuni ties tha n do their less a rnuen t peers. when dal<! are ava il able on the omitted va ri able. and the OLS esti mate of the coefficie nt on the stude nt. the solution to omil· ted variable bias is to include the omitted va riable in the regressio n. omitted variable bias implies thilt the OLS estimators are inconsisten t.3 . then at least one of the regressors is correlated with the error term.teacher ral'io wo uld be negatively correlated.teac he r ralio would pick up the effect of average district income.236 CHAPTfa 7 Hypothesis T esbond Confidence Intervo11 in Multiple Regrenion Omitted Variable Bias in Multiple Regression The O LS estimato rs o f the coeffic ients in multiple regressio n wiU have omitted variable bias if an omitted determ inant of Yi is correlated with at least one o f the regressors. The O f] Model Specification in Theory and in Practice In theory. however.1l. ch en the OLS est imators will have omitted variable bias. economic theory. . omitting the stude nts' economic backg round could lead to omitted ers. so that the first least sq uares assumption is violated. Moreover. and knowledge of how the data were collect ed: the regression using th is base set of regressors is sometimes referred 10 as a b ast" specifiClttion. . O ur approach to the cha lle nge o f potential o mitted variable bias is twofold. if the district is a wea lthy o ne.!r est and the control variables suggested by expert judgment and economic theory· . the omitted variable bias persists even if the sa mple si.• Xk/ is nonzero. A t lea variabJ! 2. If so. .OMlnE Omitted \1 more inelu able bias l' I.:e is large. a core o r base set of regressors should be chosen using a combination of expert judgment. that is. wh ich could lead to bette r test scores. Th e gene ral cond itions for omill ed vari able bias in multip le regression are similm to those for a single regresso r: If an omitted varil:l ble is a de terminant of Y . In prac l i c~. T variable bias in th e regressio n of tes t scores on the student.
but if il does this docs nol necessa rily mean that tbe coefficient on that added regressor is statistically significant. If the estimates of the coefficiems of interest are numericall y similar across the alternative speci. The "R2 does not always increase. However.his approach to mode) specification in Section 9. 2. O MITIED VARIABLE BIAS IN MULTIPLE REGRESSION 7.. . This bzero. lhis ohe n provides e vide nce tbat the original specifical ion had a mi lied variable bias.2 after studyin g some tools for specifyi ng regressions. so that variable bias . ff.lIeS of the coefficie nts of interest c hange substantiall y across spec ifications.< K£y CQNC.! omitted rWWMJM '. then this provides evi dence that the estimates from your base specificat ion are reliable ." .'I' 7. two things must be true: I..e effect of ~omitted variable bias is the bias in the O LS estimator that arises when one or more included regressors arc correlated with an omitted variable.. S estimators I DL$ estima· fienlS. on the other hand. We elaborate on t.3 j glish leam to omitted tio and the lression are ri nant of Y. JJJtfll.. it is easy 10 reat! more into the m than they deserve. Interpreting the RZ and the Adjusted R Z in Practice A n Rl or an "R 1 near 1 mean s that the regressors are good at predi cting the values of the dependent variable in the sample. alternati ve sets of regressors.. and an R2 or an R2 near 0 meam tbey are 1)0t.P1 I me of the ~Iearni ng rettc r test 1d to have Ie student!'. and often the va riables suggested by economic theory are no t the o nes on which yo u bave data. A t least one of the included regressors must be corre\<lted with the omitted variable..Foromitted \'ari· able bias to arise.. QLS esti .s t\\'orold .. The Rl increases whenever you add a regres sor. • .s implies that Expert judgment and economic rheory are rarely decisive. however. the eslim. int erconomie the°!'}'· 1. that is. All increase in the Rl or Rl does not necessarily mean that an added vari· able is srarisfically significant.Will be gress10n a re r1b1e bias art lor {efln....2: ution to om it :>0 . To asce rtain whelher an added vari able is statistically significant. In practice. you need to perfonn a hypothesis test using the (statistic.fications. Y. :ombins tion of I were collectl!d: 'red to as a bs. Ie difficu!l and bias .S Model Specifica tion for Multiple Resre~sion I 237 .'l'" . This makes these statistics useful summaries of the predictive <lbility of the regression . whe the r or no l it is statistically significan t.<it )f primar). The omitted variable mllsl be a determinant of the dependem variable.. There are four potential pit fa lls to guard against when using the R2or R. Tbe ref01e the next step is to develop a list o[ cand idate aheroative specifications.
lS. then the regressors produce good predictions of the dependent variable io that s<lmplc. 1. Recall (he l10t 2. or a high R2 .4 Tht: Rl and R2 tell YOEI whether the regressors are good at predicting. If the R~ (or 7?2) is nearly 1. and.011 whether: 1. The re is omilled variable bias: or 4. Non e of . 4. which concerned omitted variable bias in the re gre ~ sion of test scores on the student. a moderate }<2.nors. data quality. the opposite i ~ true. Decisions about the regressors must weigh issues of omitted vari able bi <ls. You have chosen the most appropriate set ot" regressors. and possibly wi th district income all things that arc correla ted with test scores.statistically significant: 2. The R2 of the regression ne\ ~r came up because it played no logical role in this disc ussion. most importantly.but the relations hip is not causal (try telling the s uperintendent that the way to increase test scoreS is to increase parking space!). mean that tlie regressors are a (rue cause oj the dependent variable. wi th whether the school is in a suburb or a city. or "explain_ iog. O mitted variable bias can occur m regressions with a low R2.1. lIor does a low Rl or lil necessarily mean you have an il/ap' propriate set of regressors. Pa rki ng lot are{l is correlated with the studentteacher ratio.238 CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regre~sion WHAT THEY TEll AND WHAT THEY DON 'T R2 AND 7?2: You ~~ 7. in the sense t hat the variance afth c OLS residual is small compared to the variance of the dependent varia hie.Ieacher ra tio. The question of what const itutes the right se t (If regressors in multiple regression is difficult and we return to II throughout this textbOok . A high R l or liz does not necessarily mean you have (he mosr app ropria rt sel oJregre. a low R2does not imply that there necessarily is omitted variable bi. The regressors are a true cause of the movements in the dependent Variable: 3.1 does discussion of Section 6. Imagine regressing test scores against park ing lot area per pupil. economiC theory and the nature of the ! mhstantive questions being addressed . A high R2 or li.111us the regression of test scores on parking lot area per pupil could have a high R2 and R2. A high HZ or /i 2 does not mean there is no omirted variable bias. If the R2 (or l?2) is nearly 0. The R2 and R2 do NOT tell ). Cun verse ly." the v(llues of the dependent varia hie in the sample of dat a on hand . da ta avail ability. An incl uded variab le is .
Ih~ fraclion of "lUden ts who arc still learn ing Eng lish. .t scores Rcca ll lh~ I the rcgre.teacher ratio using the Californi<l data set.4. The two other variables are new a nd control (or the economic background o f tht: stude nts.1fI. so omilling them [rom the regre s~io n will rc~uh in omitted variable bias. The firs t new variable is the percentage of students wh o are eligible for receiving a subsi di zed or free lunch at schoo L St uden ts are eligi ble fo r this program if their family income is less than a certaio threshold (approximately 150% of the poverty line). These poinls are summarized in Key Concepl 7.7.~ not perfectly correla ted (their corrclillion coe fficient is 0.6 Aoolysis of the Test Score Doto Set 239 these questions ca n be answered simply by having a high (or low) regression R2 or n 2. holding constant student cha racteristics that the superintendent cannot control. Our secondary purpose is to demonstrate how to use a tabl e to summarize regres sion resulls. These two variabks thus measure the fraction of economically disadvantaged chil dren in th e district. Whe n we do this. Here we I. None: ~ )mittcd vari Iy. the coefficient on the studentteacher ralio i ~ the effect of a cha nge in the studentteacher ratio. a lth ough L hey are related. apl)ropriQlt Ql 't (UI jl/UP ~ right set l. so in stead we use two impe rfect indicators of low income in the dist rict. One of these conlrol variabl es is the ol:"le we ha ve u~ed previously.6 Analysis of the Test Score Data Set This section presents an analysis or lhe effect on lest scores of Ihe slUde nl. Some of th e fac tors th at could affect lest scores are correlated with the studentteacher Hltio.· 'ssion nC\"t. va riable: Discussion ofthe base and alternative specifications.:onsider th ree variables that contro l for background characteris tics o f the stude nts that could affect tesl scores.. Tne second n ~w va riable is lhe pe rcentage of students in the district whose fami lies qualify for a California income assistance program ..74). with income test scores e!ationship : te :. Con a riablc bias. holding constant these other factors. economic . they . Although theory suggests that economic : use of the 1 & lot area ratio. Fxplain ffthe R2 pendent 7. If data are available on these omined varia hles. There is no perfect measure of economic background in the data scI. is small benrly O. the solution 1 0 this problem is to indude them as additio nal regressors in the multiple regression.teache r ratio. but th e t hreshold is lower (stricter) than Lhe threshold for the subsidized lunch program .1 'oughout thi~ Ised . Families arc eligible for thi s income assistance progra m de pending in part On their family income. Many factors potentially affect the average k'~\ score in a district.1 lcd variable gh R2. Our primary purpose is 10 provide an exam ple in whi ch multiple regression a nalysis is used to mitigat e omitted variable bias. This analysis focuses on estimat ing the effeci on test scores of a change in the student.
and betwct: n test scores and the percentage qualifying for in come assistance is .. the regreSSion would have had an identical R2 aDd SER. { . lation between test scores and the percentage of English learners is . pet'tenlogeq r t' nnr' PfI . holding STR constant.240 CHAPTER 7 Hypothesis Tests and Confidence Intervals in Mulliple Regression baCkground could be an important omin ed facto r. In the It:st score appli· cation. howc:vef. How. Pcf£L could be replaced by the fruc/iolf o f E nglish learne rs.00000356. C stam.2._ _ __ ~ . to us. for the purposes of interpretation the one with PctEL see ms..0. Each of these variables exhibits a negative correlation with test scores.ti5. For ou r base specificatio n. then. The corre. FracEL ( = PCIELl I OO).6. theory and expe rt judgment du not really help us decide which of these two variables (percentage eligible fOr <I subsid ized lunch or percen tage eligible for income assistance) is a better mea!lUrc of background . which would range betwee n 0 and 1 instead of between Oand 100.·s 720 71~1 I)(~ ! '''" . if a regressor is mcasured in dollars and has a coeffici ent of 0. in regression analy· sis some decision usually needs to he made about the scale of both the dependent and independen l variables. to read. th e coefficient is the pred icted cnange in test scores (or a o nepercenta ge·point in crcase in En!!' lish learne rs. but we consider an alter· D ative specifica tion that includes the othe r va riable as well. of the va ri ables? The general answer to the question of choosing the scale of the variables ~ to make the regression results easy to read and to interpre t.O.. In the specification with Pc/EL. more nalUral. Mure gene rally.?II "''' I (e) The SCOlterpl hon ~ 0.64.87. it is easier to read if the regressor is convc rted to millions of dollars nnd the coefficil: nl 1 "ifi i~ l . th e units of the variabl es are percent. for example. Fo r e. should you choose the scale. ".0.650. in the specification with Frac£L. we could have defined th ese variables to be a decimal fraction rathe r th an a percent. fIGURE T. so the maximum possibl e range of the data is 0 to 100. .. bctwcl:n tesl scores and the percentage eligible for a subsidized lunch is . the coeHicient on Fr(IC£L would bave been . Alte rn atively.'\a01 pie. In the regression of Tes/Score o n STR and PCIEL report ed io Eq uation (7. Although these two specifications are mathematica lly equivalent. If instead the regresso r had been Fr(lcEL.0.. A no the r consideration when deciding on a scale is to choose the units of the regressors so that the result ing regression coefficients are eas).5).63. Scalle rplots of test s scores and these va riables are presented in Figure 7 2. the coefficien t on Pct£L is . the natural unit for the dependent variable is the score of the test ilSde. we choose the percentage eligible for 'J subsidized lunch as t h~ ecooomic backgrouDd variable.ho ldi ng STR N Il English learnersthat is.. In Fig· ure 7. N " 62( "" •• (0) I What scale should we use for the regressors ? A practica l question that arises in regression analysis is what scaJe you sbould use for the regressors. the coeffi· cien t is the predjcted change in test scores fo r an increase by I in the fra ction of or a lOOpercenlage·pointincrease. o r uni ts.0.
6 Analysis of the Te51 Score Data Set 241 t do FIGURE 7 . Each regression has . . as in Equation (7.. ow /._... We are now faced wit h a communica tion fraction of g STR con enl. • ". '.f. This works wcll whcn there are only a few regressors an d only a few equations. Each column summarizes a separate regression..0. . Three Student Characteristics Test score "o r a sure {or a score Iter e 7. :.. 0 !..2(1 f>l~' .1Jn ~ 'll.' I~'::·.87)." ' .. we have pre sented regression results by writing oul the estimated regression equations.. in a table. • I:" . For c"a(11' p000356.'<OTh. A better way to com m unic. it I~ ~e cocfficiCld I units of the problem. . . .. the lio'! Percem (c) r"rn'ntJ~'\' 'lua h/\'illS lor UK OllW J~<i~t:ll1c \"" r The scolterplob !>how 0 r icient on negative relation~hip between Ie! scores and {oj !he percen!oge of Engli!>h \earners (correia 0.7.2'1 • • ' .. {Of th~ [ural.1111 (. itself. orre tween t ween .0 ~ ifNI r. '" 7S Ir~. . Table 7.1 summa rizes the results of regressions of the test score on various sets of regre. What is the best way to show the results from several multiple regres s ions that contain different <.. {b) the percentoge of students qual ifying for c wb~jdjzed lunch (correlation = .:?(l 6f)1 • • UM) o 2'.l so qU~hfyUlg 75 P e rcent P e rcen t n that (a) Pcr(C'm.0.. e nd (c) the percentage qualifying for income 0~5i5!o nce (correlation = . (oiill •• • • 7>' ~ f.lg~ o{En~h". bU I with morc regre ~sors and equations this method of presenta· tion can be confusing.5). 2 T~st Scatterplol'J of T est Scores vs.. "~I '. the cE L.".4(1 • .01 . eo • "" .<. .63). coefficient c in Eng : tht: cnefti Tabular presentation ofresult.6). " . /. ~..J pncc IUllch In Fig ange of to be a aced by d range n analy pendent ts.2.~ • • 4.1gc learners (b) PercC'ilIage (or n:dun.(/1 • ( . of the ~bk" 1S to Test score ~2(1 7..re appli..(~t nil h4U (. In (7.lIc the results of several regTcs sions i:.64). • • ' .ubsets of the possible regressors? So far. .
01 ** (0.11 = 420.58. with their standard erro rs below the m in paren· theses.626 I).52) = 18.)1 ·· (0. descri bed in Appendix 4.2" (5. " 420 420 420 ThClie regressioon5were estimated llSing tbe d~ta on KS scb{K)1 d. R2 = O.·.650'" (0. (7. The asteri sks indicate whether the rstatistics.) Percent on public income assistance (X4 ) Intercept ( 1) 1 (2( : (3) .9 .J()" (0.1.28" (0.'l.024) (4) (' ) . Standard ~rro'" are g.43) 1.O.6) 698.0.~trkts in Ca lifornia. Fo r example.52) UO· (0.049.033) .28 x S TR."'" (0.. osided test. and the adjusted R2. The e nLries in the fi rst fi ve rows are thl! est imated regression coefficients.65 0.OS 0. is signi fica nt at the 5% level (one asterisk) or Ihe 1 % level (two asterisks) .068) 698. 7 HypoIhe~ is Test$ oncI Confidence lotervoh in Mulliple Regression TABLE 7.059) 686.036) . 420 observations).O JR) .euor Studen tteacher ratio (Xl) Percent English learners (XJ Percen t eligible for subsidized lunch (X.1.58 0. I n equation fonn.0.2.242 CHAPTE II.1.529u (O. All the informa tio n that we have presented so far in equation Cannat app('al"S as a co lumn of this table.9) 700A'" (5.0.424 9.iobl.te acher ratio.08 11.0.19) .~n in parenth('se~ un d~r codf.122" (0. The md.79()· (0.:J4) .4) (0.. testi ng the hypothesis th at the relevant coefficient is zero.46 0.n3 420 .27) .9" (10. the same depe ndent varia ble.0'" (6. wi th no COO lrol variables..dual coefficient i1 StahstlcaJl y signifi ca nt al lhe "5% level or *"1 % signifi cance k .5 ) Summary Statistics SER R' 18. test score.27) . consider the regression of the test score againsllhe student..030) I . SE R (10.tlti:> regression is  TestScore = 698.547" (0.: clVerage tnt lCore in the district.LOO" (0. R') and the sa mple size (which is the same for all of the regressions.deDt.0.031) 0.. Reg.488 (0.7) 700. SER.773 420 o.0·· (8.1 Results of Regressions of Test Sc:ores on 1M StvdentTeacher Ratio and Student Characteristic Control Variables Using Colifornia Elementary Sc:hool Districts Dependent vo.2.049 14.4) 0.. The final three rows contain summary statistics for th~ regression (the sta ndard error of the regression.·el using a t .
5)  lO.. In the fo ur specifications with control variables. The blank enlries in lhe rows of (be other regressors indicate Ihat tbose regressors are no t included in this regression.(1.6 Analysis of the T esl Score 0010 Set 24J .1.059) (5. regressions (2) .2.28) appears in lhe fi rst row o f numerical e nlries.2. the percentage of E nglish learners and Ihe percentage of stude nts eligible fo r a free lunch.27) . was previously staled as Equation (7.2. which is indic(l ted by the dou ble asterisk next to the estimated coeffi c i~nl i.4" 9.111e intercept (698.0.. these can be computed fr om the information provided. as discussed in Section 6.5S).teacber ratio and ( IVa control vari ables. reducing the st udent.036) 700. ve rows are the them in parc n Discussion ofempirical results.529· · (n.038) 0.Thc /i.uggest Ihree conclusions: I. In column (4). the R2 in Ihe base "" 420. In all cases lhe coefficient on Lbe studentteacher ralio remains statisti cally significan t at the 5% le vel.5) . Column (3) presents the base specification. tbe intercept can be viewed as the coeUicjent on a regres sor that is aJwaysequallo I. Controlling for these student characterist ics cuts the e ffect of the s tudent I hypothesis thaI : asterisk) or Ihl: statistics for the listed R2. the (slatistic testing the hypothesis that the coefficient on the studentteacher ratio in column (1) is zero is .teacher ralio (.9) and its standard error (lOA) are given in th e row labeled "Intercept. and in column (5) both of the economi c background variables are included.38.049. The estimated coefficient 00 the student. Column (2). Although the table does not repon I·stati stics.1 9) • .52) appears in parentheses just below the estimated coef ficien t. the R2(O.This hypothesis is rejected at the 1% level.lhe SER (J 8. For example.1 30" . "R!) and observations).. 111ese results s. Columns (4) a nd (5) present alternative specifications thaI examine the e ffect of changes in the way the economic background of the students is measured.7. The student characterisTic variables are very useful predictors of test scores.28/0. and its standard error (0. for example.\.773 420 Slon d3td eH()T'i lr H\ % ~igni {i All this information appears in column (1) ofla ble 7. .tcacher ratio alon e explains only a small fraction of the variation in test scores:The R2in column (1) is O.(5).048 (0.n Ihe table.52 = .2 jumps. (1." (Someti mes you will see this row labeled "constant" becau~c. in which the regressoTs aTe the stu de nt.01" .0. Th e stude nt.ion of test scores on the stud elll.and the sam ple size n (420) appear in the fin al rows.teacher ratio by one stu dent per teacber is estimated to increase average test scores by approximately one point.teacher ratio and on the percentage of English learn ers. however. tt'lis teacber ratio on test scores approximately in balt 'Ibis estimated effect is not very sensitil'e to which specific control variables arc incl uded in the regres sion.) Similarly.. which reports t he reg res.4. Regressions thaI include the control variables measuring slUde nt ch<fracteris Lies are reported in col umns (2)(5). n format appears of the test scOre: 4uation form .049). the percenlage ofstudents on income assistance is included as a regres sor. ho lding constant st ude nt characteristics. 2. when the st udent chamClerislic variables are added.llS 0.
f~ro at the 5% significance level.. 773..ingle·regressor estimates o f Chapters 4 and 5. denl.: popu li!' tion regrc"~ion function is linear in the regressorsthat is.<.\I .. Duing SO cub. In fact.. trict" th. . 'his additional control variable i" redundant. is 0.nificant in specification (5). The analy)is in this and the preceding cha pter has presumed that tht. 3. i..lhe OLS estimator would have omitted \'.. tha I the condition.J . and confide nce interva ls are much ma rc useful for advising the superinten dent than the ... 7.lhc estimated effect of a urnt change in Ihe sludentlcacher ratio in half... at least for the purposes of Ih is analysis.11 cxpectation of Y! given the regressors is a straight line..ICachcr ralio. If so.llly E nglish learners and d istricts with many poor children have lowe r test ~orcs. Bec:lUsc adding Ihis control varia ble to the bas~ specification (3 ) hil)' :I negligible effect o n the esti mated coerricient for the sludentIcadh. tbe population regression line is nOt linear In the: X's but rather . holding these co ntrol vnriabk ). although it rema ills possible to reject the null hypo thc)'is that t h ~ population effect on h:).leacher ral io in the d istrict.' r ratio and its standard error. hy p(llh l!~IS tests. There is. Thus.. thc hypothesis that the coefficien t o n tbe pcrcentage qUilli· fying for incnme assistance is zero is Dot rejccted at the S°k.omitted student charac\(:risljcs thai innuence leSl scores micl11 be correlat ed with Ih~ student. 111e control varia bles are nol always individuall y statistically significant In specification (5). he coefficients on the 'tu· dent lJemographic variables are consiste nt wllb the paHems seen in Figure 7 2 Dist ricts with m. a nd U so the \(U. these muhiple regression estimate).Ieacher ratio in the dislrict wou ld pick up the d fect on lest scores of Ihc'''' omitted s tudent characteristics. we need the lools devc lop<!d "'!~ ____.pcdfication .. we a ugme nted the regn:''> sian by including variables that con lrol [o r va rious student ch ar a c l erisli~ (I hl' percentage of English ICnrnl!h and two measures of stude nt econom1C bJck · grou nd). no PitT ticular reason to thin k this is so. however.limmate omitted va riable bia" ari'lo)! from thc"c student chllracteristics.t scores... The signs of .'I ri able bias. To mitigate this potentia l o mitted varible bia::. constant.1t aln.. lic is 0.. To extend o ur anul). Because they e.n: ~ · sion function " thal Are nonlinear in the Xs.82). leve l (the /·st. and because the coefficient o n this con trol vun· able I!> nOI sig. regression (3).:ady hme small classes.. ItJ rcg.. tbe eUecl of reducing the StudCnltcachd ratio migh t be quite d ifferent in districts wilh large classes than in d.~ a nonlinear function of tbe Xs.7 Conclusion Chapter 6 bl!gllO with a conce rn : In the regressio n of lest scores against the stu d\!nl. • 244 CH APTiR 7 ~is TasPs and Confidence In lervals in Multiple Regression ). howc\ cr.
. ~ The basi specification can be modifi ed by including additional regressors that . es. .ricted regression (230) unrestricted regression (230) hOJll OSked . p d l = 0 in the multiple . · Ilh Ihe highest R] can " tcallon w lead 10 reg.D_ : ~.. 1 /I. first detern. nn par· esis that /3 1 = 0 all(/ fJz = O.. Explain wh y the '\c y 10 J OLS eStin1310rs "'ould be biased and in col"I ~ i stc nt.I / nato rs of the regres.In F.sed in the onevariable linear regression_model o f Cha ~% con fidence Lllle r • ple r 5.Ie. 'testS regressio n model.l ~t ici t\' _o n l) Fslatislic (231) 95 % confi~lcnce 'set (2. . 's a.~ .. . <>sion coeffici ent a re log1 e regrc carried oul using essenliaUy the same procedure. Yj = f3u + fJ. Hypotheses involving more than o ne restriction 0 1\ the coeffit·ic nls are called joi nt hypotheses. Regression specification proceeds bv . ·.. sion coc fficient (s).1 Ex pla in how you would lestl he null hvpo. ExplaiP how you would test 12. !he Coocepb 245 Summary L H ypOIhesis lests and con fi de nce int ervals fo r a .~ m p Ie . IS 3. Joint hypotheses can ~ lested using . 'h a . is givcn by p. :::!: 1.3 4 ) base speCification (23t j ) ahe mali\ c 'Ipecific3l it 1ns (237) Bonfe rroni lest (251) Review the Concepts 7. dd . For clI.! causal cf(~C l of mleresl. Explain ho\l. . Expl<lin why Ihe Rl is lit I be I igh ..o I Id testlhe joint hypoth the null hypothesis that .' .·· " tllllng a bil .96SE(j3t)· 2..Isten..a 9 . Why isn't the tl:sul l or the joint test implied by thc resulls o f Ihe fi rst lwo tests? 7.e specification cho sen to address concern about omitted variable bi .l ler potential sources I ress 0 o f omill ed variable bias.IO rt!striclions (226) joint hypolhesis (226) Fstatistic (227) reSl. Si mply choosing the spc"lr. Key Terms ~Iu V. va l fo r P.2 Provide 3n example:: o r a regression Ihat atkuablv WOl ~ ld have a high value of 1<2 but would produce biased and iocol"I. I IHII were .ression models Ihat do nol esti mate thl.z )'ou \\ O U popul a · . = O.X li + fJzXz.
Is the male. degree. tained info rmation on the region o[ (he count ry where the person lived. b. 7. 0 if hig. mar. Do there appear to be important regional d ifferences? Use an appro priate hypothesis test to explain your answer. a if male) A ge = age (in years) Nth easr . me nt fo r each worker was either a high setlool d iploma or a bachelor'. Midwest. 7. 0 otherwise) Sow" = binary v:v iable (1 if Region = Sout h. and number o( children. The worke r's ages ranged (rom 25 to 34 years. Is age an impo rtant detenninant of earnings ? Use an appropriut e ~IO' lisli call est <lndlor confidence interval to explnin your answer.246 (HAPHR 7 Hypothe~s Tests ond Confidence Intervols in Muhiple Regr~on Exercises The firs l six exe r ci~es refer to the table of estimated regressions on page 2.year workers.. The data set consists of infor. The highest educa tional achlc\t:. . b.3 Using the regression resuhs in column (2): 9..<. Betsy is a 34.1 Add . O othe rwise) 7.ion statistically significant at the 5% level? Construct a 95% confidence interval fo r the differe nce. Sa lly is a 29ye arold female college graduate..4 Using the regrc.. 7. compllled using da ta for 1998 from the C PS. Uo the rwise) Midlvest = bina ry variable ( 1 if Region ::.2 Using the regression results in column (1): ll. yearol d female college graduate . ital status.n. Construct a 95 % confidc nce inte rval for th~ expected difference between their earn ings. Is the collegehigb school earnings difterence estimated fr om thi~ regression stati stically significant at the 5% level? Construct a 95''' 0 confidence in ll!r val of the difference. (5%) and " *¥" (1 %) to the table to indicate the statistical signifi· ca nce o f the coeffici ents.h school) Female = binary variable (1 if female . mation on 4()()() fulltime full.female earnings diffcrence estimated from this regre<. For th e purposes of these exercises k't AH E = overage hourly earn ings (in 1998 doll ars) Co f/(!ge = binary variable (1 if college. Uo the lw ise) IVes/ = binary variable ( I if Region = West.binary variable (1 if R egion = North eas t. The data set abo Con."~i on res ults in column (3): a.
eJl.21) 2. Molly is a 28yearold femil le college graduate from the West.28) . converted into 1998 dollars using the consumer price index). (Hil1l: What would hap pen if you included West and excluded Mid· wes( (rom lhe regression?) 7.05) 3.26) 12. .10 6.27 6.60 (0.1 94 R' 0.JllSti c for regional effects .29 (0.69 (0.47. The rcsuils are .0 6.or 1' 1 5.21) (21 JAS (0.27 (0.S The regression shown in colunUl (2) was esti mated again .04) Age (X l) l'llnheast ( X~) 0. ii.Cor 247 .62 (n. Jennifer is a 28yea ro ld [e male college grad uate [ro m the M idwest i. Conslruct a 95% conIjde nce in terval fo r Ihe d iffe re nce in expected r1ate sIn L\rold eumings be lwee n }lIa nil a anu Mo lly.06) s.62 (0. vc ~pe nc:lent variable: ovet"age hourly eorningl IAHlJ• ROSI .21) (31 or's College ( X t ) Fe male (X2 ) 544 (0.1 4) 4.21 0.69 (0.176 " 4000 4000 4000 dence b.64 (O.40 ( 1.W) 2. Ju anita is a 28yearold female college grad uate from the South.ignifl Summary Srotisriu and joint Tesh SER th is :95% ~re!>Sion F'\[.75 (1. 2.20) 0.46 (0. Explai n how you wou ld conSlruct a 95 % confid ence interval fo r bl fo r the an apprO' the difference io expected earnings between Juanita and Jennifer.30) MId west (Xj ) Soulh (Xo) ln tcrcc pt 0.04) I con let ".20) 0 .22 0.190 6. this time using dala from 1')92 (4000 observations selcclcd at random from the March 1993 CPS.0.29 (0.Exercises Re5Ulis of Regressions of Average Hourly Eamings on Gender and Education Binary Variobles and 0tMr Characteriltks Using 1998 Data from tf1e Current Populorion Survey .
0(1) (O.7 Question 6.. SER = 4!. to Table 7. Construct the homoskedaslicityonly Fstatislic for tcsting {31 :: {34 () in the regrcssion shown in column (5).59Fenwle + 0. the cocfficio.:!e l fro m an adjacent lo t.2l .2 = (0. '1l1e f:·statist ic for o mitl ing BDR and Age' fro llllhc regression is F =' O.03) Xiil = 0.~~) (0. labor marke l.29College (O.. A homeowner purchases 2000 square f. «US) (0. large.20) Comparing this regression 10 the regression for 1998 shown in column ( 2).! for the change in the value of he r house.77 + 5.OO04S) + O.6 1) (!U.9) + OAS5RDR + 23.6 E\alua te the fo U on Female is negative.I 248 CHAPTER 7 HypoIhe~is T ests ond Confidence Intervob in Mulriple Regre~sion 2. Is Ih is consistent with your answer to (a) and with the regrec..5 a.8 R cfe rrin ~ !l. R.2 . .1 11) 0. tUl\C p.0. This provides strong stati sti cal evide nce o f gende r discriminalion in lhe U.= 119. SER = 5$5.1 in the text: Construcllh c R2 for each of the regression s.'· 7.OR Are the coefficients on RDR and A ge statist ically different from ze ro at the 10% levct? 7.8Puor..5 re ported th e fol lowin g regression (where standard e rrors bee n added). Typicall y fivebedroom houses sell for much more than two· bedroom house:.94) (0. Con struct a 99% confiden t inte rv.. was there a sI31islicaily sign ifieanl change in I be coefficiem on CQllege? owing stateme nt: ' In all of the regressions.1. b.2 (2J .002Lsiu (2..48.156NsiU + O. and statistically significant.48(1/11 + O.::nt 7. R.S.72 .· sion more gene rally? c. D o you think thai another sca le might be more a ppropriate? \\'h )' or why no t? e.40Age..5) (8. d.. Is the s tatistic significant at tht: 5% level? t"" Test 133 "" f34 .OYOA gL' . Lot size is measured in square feet. Is the cocfficienl on BDR statistically significa nll y difre re nl from :Le TO? b. 0 in the regression shown in column (5) using the Bon rerroni tCoSt discussed in Appendi x 7.
= 130 + {3 IXli + fJ2X :1' + u. What is the cslima!cd effect of Age on e arnings? Construct a 95% confidence in terval for the we (fi cient on A ge in the regression . Show tha i the two formulas are equivalent 'iz.m(lle and Bachelor can be deleted from the regression.3 to lIansform the regression so thai you can use a (stu ~ge? fii cicnt s slrong et. c. A lexis is a 30 yea ro ld fema le worker with a co llege degree.14 ) show two fo rmulas for the homoskcdasli cilY only r "statist ic. (Him: You must redcl'inc the de pendent variable in the regression. Test I.1 Usc th e dat.Empirical Exercises 0. c. Bob is a 26· yearold male worke r with a high school diploma. f31 + /3 2 = 1.) 7. Are the result s from the regression in (b) substa ntively differen t fro m th e result s in (a) regardin g th(' e ffects of A ge and A H £'? Does !he regression in (a) seem to suffer from omitted vari able bias ? ther scale sion is F == '[ereol froJ11 d.13) and 17. f3 r = 131: b. and educat ion (Bachelor). whe re a is a constan t. What is the est ima!ed inte rcept? What is the e stimated slope'! rcgrcs· lof her t lot. rbedroom ~le Run a regressio n or average hourly ea rnings (A HE ) on age (Age).e Empirical Exercises {rom E7. Use "A pproach #2" from Section 7.he null hypothesis that bOlh FI. R2 and R2. Pre dict Bob's earnings using the e stimated regression in (b). 19 {3~ = f3~ == II [lineant al the Llsing the Bon • . II. d. Compare !he fit of the regression in (a) and (b) using the regressio n standard crrors. 7. Why are the Rl and R2 so si milar in regres sion (b) ? f. Construct a 99 % confidence interval for {31 for the regression in column 5. Tcsi Ih e null hypot hesis that Bllchelor ca n be de leted from the regressio n. Are gender and educatio n de lerminants of earnil. f31 + a{32 = 0. 249 mn (2)." ors have listie to test a.10 E quations (7 ..:a l Exe rcise 4.2 l. c. gender ( Female).1gs? Test the nu ll hypothesis lha l Female ca n he dele ted fro m the regression.9 Consider Ih ~ regression model Y.1 to answer the following questi ons. Predie! A lex is's earoings using the regression .:l se t CPSQ4 described in E mpiri r. Con b. R un a regression of A HE on A ge.
Discus~ boW th e estimated effect of Dist on ED changes across th e specifications.15 year if dis ta nce to the nearest college is decreased byl0 miles. c. O lhe r factors a lso aHee l how much college a person comple tes. Construct a 95% confi dence interval for the effect of Beauty o n Cou rse_ E mf. Include a simp le specification [co nstructed in (a)l· a base specificatio n (t hat includes a sel of important conlrol "an· abIes). b. i E7. carry Qui Ihe fo llowing e"e rcises. R un a regression o f years of completed e ducation (ED) on disl30ce to the nea rest col. Ilnd Hi~ panics complete more college than whites. a pe rson·s edu catio nal a tla iome nt wo uld increase by approximately 0. II has been argued Ih at. b. Whic h do you think. H. Conslruct a 95 % confid e nce interval fOT the coefficient on TradeSJ1I1r/'. bu t e:'tcJud ing tbe data for Malta .n ess of ihe confidence inte r"al that you constructl!d in (a). What a re these two conditions? Do these condi tio ns seem to hold he re? E7. A regression will suffe r from o milled variable bias when two condI_ tio ns hold.1.4. examine the robust. Q. controlling for other factors.1. Is the coefficic nt stulistically significant at the 5% le ve l? . A!iS(lssil/(lfioll. o n avernge. Is the advocacy groups· claim consiste nt with Ine est imated regression? Explain. II. What is a reasonablt: 95% confidence interval for the effecl 01 Beauty on Co urse_Evof? E7. lege (Di~I). A n educa tio n advocacy gro up a rgues Ihal. YearsSchoof. Rev_COUPs.4 Using the data sel Gro\\1h described in Empirical Exercise 4.. Docs controlling for these other factors change the estimaled effect of di!> tance on college ye<lrs completed? To answer this q uestio n.2S(l CHAPTER 7 Hypothesis Te~~ and Coohdence Intervols in Mvlrip1e Regression g. Consider lhe various control variables in the data set.~ a nd RGDP60. carry o ut the following exe rcises.1 U sing the da ta sel TeachingRa liugs desc rihe d in E mpirica l Exercise 42. blackl>.. construct a tabl e like Table 7.wer loe following quesrio ns.3 Use Ihe data set CollegeDistancc described in Empirical Exe rci se 4. and several modifica tions of the base specifi cation. Run a regression of Course_ Evaf on Beauty. Is this result consistent with the regressions that you constructed in part (b)? I . I Run a regressio n of Growth o n Tr(llieShare.3 10 anl>. should be included in the regression? Using a lable like 1able 7.
Then Pr(A U B) = Pr(A) + Pr( B)  Pr(A n B). and RGDP60can be omitted from the regression..20) (Bonfe rroni o ne·atlltime (statistic test)..The Bonlerrooi Tesl of a Joint Hypotheses 2S 1 Indi seem 'ise 4.al fOf ·· A or B or both" (the union of A and B) .. Lei N and W be the complements of . 011 legreSSion This method is an appli catio n of it very general testing approach based inequality. rejecl Acce pt if li t I S (" a nd if 1 (7. Let A n a be the even t " both A and B" (the intcrsection of A and R). it foll ows tha t Pr(A U B)._111The Bonferroni Test of a Joint Hypotheses able 7. APPENDIX b. structed The method of Section 7. taken as a group. Does C l of dis conslrUct 8 ..o tes. then you will not be able to compute the F·sliltistic of Sec!1on 7. say 5%. respectfully. This inequality in turn ~ implies that I . where ') and'2 are t he Istatistics tha t (esllhe reSlrictions on f3 L and f31._ 7_. This appendix describes a way to test Joint hypothest's that can be used whe n you on1)' have II tisc 4.value of (he Fslatislic? .vise. and let A U R be the event I Rev _Coul'~ ~ inter. and you do nOThave the o riginal data..2 is Ihe preferred way to test joint hypotheses in multiple regre:.· r eel of sian. s and His . TIle trick i~ lo choose the critical value c in such a way that the probability that the one· atatime test rejects when the nu ll h ypothesis is true is no more than the desired signifi· cance level.ion in which you are interested .2 do ne prope rly..Pr(A UB) I . Let A and B be evcnts. ~~ van rriSCUSS hO\\' irtcalions...1. Test whether. However. 1\ . Because Pr(A n B) ~ O. ·ntis is done by using Bonferron i·s ineq uality to choose the critical Y <ll uc c to <lllow both for the fac t th at two restrictions arc being tested and for any possible cor· relation between I) and 12' = /310 and {32 "" f3z. Bonkrroni's ~ on's edu a r [((lis rest col estimated TIle Bonfe rroni test is a test of a joint h ypo Lheses based on the Istatistics for the indi vidual hypothe ~s. the Bonferroni test i~ the o neatatime Istatistic test of Section 7. Rn'_Coups.(Pr(A) + Pr{B)J. Assassill(l (ions. What is the p. that is. if the author of a study pres~ lIt s regression results but did nollest a joint rcstricl.sisten! with kbut cxduJ· Ily significant Bonferroni's Inequality Bonfcrroni's ineq ualit y is a basic resull of probability theory.2. antiYOU _ . Yea rsSchool.The Bonfe rroni test of the jnint null hypothesis /3! based o n tlte critical value c > 0 uses t he fo llowing rule: 11 s c: olhe .2. Pr(A) + Pr(B).ted in ((1)1.3 to table of regression res ults.
l ~r ho moskedasticity.:e k Id I~ S'!Iv and q = 2.lard crror~ th~' 11 the Oonrcrroni test is \"8lid whe ther or not there is heteroskedasticity.lI1dard errors.1 distribution.(oneatatime tes t rejects) S l Pr( IZ I > c) . in large samples. but if Ihe t_l>I 3tl~t i<:" a re b.lity..252 CHAPTER 7 Hypothesis Te1h 000 Confidence Intervols in Multiple Regression A and R.Pr(ArnW) .~r\l l'Orree\.TIlis critical \'a1ue is grealer than \. Then the meq PrOI.'l the Jomt nu ll hypothesis.lctor of 2 o n Ihe righthand ~ i de in Equation p. by looking at two I·~tatistic:s.96 bccau'ie 11 pro\. the f. 1~ Zl > 2. For eX<llI1p[e.s if at Icu~t om: 1_~JjJ \I .: c w that the probability of [hc rc)cc[ion unde r [he null hypothesis equals Ihe deSired sip.Pr( A U 8) . Under the null hypollh.241 in a~olute "" llue. Alx'Ording to Tahle 7. 1> cor 1121 > c or both" IS the rejectio n region of the onc·atH Ulle ti m~" In test..22) The inequality in Equ:uion (7.IPr{A) '" Pr(B)] Now kt A be the e\cnt that it) l7(A U 8) :S Pr( A) I'. 1bis critical value i~ t ho: 1.241) = 2. which yields Bonferroni's inequ:..' 21 > r. that is.21) implic~ tha t. A' nlJ<. the BonfeTToni test is \illid onl\ unt. For cxampk. Pr( 11 1 1> c) '" Pr(jlll > c) = Pr(l l l > c).3 presents cnlical valul'S (' for the one·al3·lime Bonferroni ICSI fur "ilri''u~ ~ig' nlficanec levels and q . .l . Table 7. critical value c w that t he "CIn(' at II larg~ Istatistic has the desiroo significance leve l in large samples.. Pr(Nnl1' 1 .2. m samples.25% percentile of the standllrJ norm.11 \alul.21) provides a way tochoos<: tho. Equation (7.?!. su ppose the desired ~ignifical1i. the probability that the onea tatime test rejec ts under the null Pr/J. so Pr(1 rej~cl IWO cocfficienb: if there an: q restric tions under the null.3 arc larger than the critical values for te~tlllg" ~1T1 ~l<' restriction. (7.22) tells us t hat. "' I J Bonferroni Tests Because the event "1 /. tbe cvents "not A " and " not B:' Because the complement or A UII i.3.nificance le\d The Bonferroni approac h can be extended 10 more than replaced b) 11.S ror the fac l that.! + Pr(R))ieldl Pr( t' .. (( the indi\'idual/slaliSli~ are based on heteroskedasticityrobustMam. as discussed in Seclion 7..\lC exceeds 2. Thus Equa tio n (7. (. Thl' critical values in Tabk 7.s {\: . the onca tatime te5t in Equation (7 . with q .20 ) \\ 111 al most 5% of the time under the null hypothesis.l h u~ Equation (7.22) provides a way 10 choose critic.! the c\'cnt that . > c and B b. the oneatatime tes t reject. i~ in large samples.1. you get a second chance to rcjel.3. and 4.scd o n homoskedasticityonly ~t.1 > c or I' ll > c or both) S > c ) + Pr(1r11> c ).2.241. the critical value c is 2.
22) e c sn lhal the ificance lev". we can rejecl the join t n ull hy ~thesi$ hypothesis at the 5% significance level uSing the Bonienoni test.241.· ificunco: level i~ ieal \ alu. at the 1 % significance level w~rt using Ihe Bonferro ni les!.2 .128 2.ull5. 'nle /. ~ vtllid ani) tln lic' .3 1\'n8') 2: Bonferroni Critical Values c for the Oneatotime tStan5tic Te51 of a Joinl Hypothesis Significon<e Level he inequlll Number of Restri ctions (q) 10% 5% 2. Application to Test Scores 3 e_al:l_lim~ cal a time" ypolhc5is in implies thaI.t:l li~lic.Ih.1 onc 1_~13 Ii~lic cau<..241 2.:rl~ t Tldard errol".807 (7.c it pror.W 2.394 2.slalistics tes ting th e joint null hypolhesis that the true coefficients on test scores and e xpenditures per pupil in Equ a tion (7. because 1 121 > 2..1\.6) are. we able to for var. both (I and 12are less t han 2.241 .The Bonferroni T est of a Jail'll Hypotheses 253 il AU 8 i~ TABLE 1. respectivel y.023 .935 3.:!l I if Ihe 1_. so we canno l reject the joint null rejeel this hypolhesis at the 1% l:.ignifica oce leve l.5%·Thus :.43.01'1 (7.tic in Section 7.2 41 2. 11 = .60 and 12 A ltho ugh lId = 2..0.sig. However.\' if the re are q (Ilion (7.: ~ tesling a .960 2.498 1% 2 (7.: is Ihe )"" I 2. us ing the Fslati. null is < 2.22) is in ahsolute value.ioglC: 1.21 ) 1.20) ~ill lance \0 reject ttl. Tn cont rast.
If so. Whereas the linear population regression (unction in Figur~ K I. ss under control.l~ The methods in the second group are usefu l when the effect on Yof fl has a steeper slope when Xl is smat! than wh en it is larg. (h~ popu lat ioo regression [unCI ion was assumed to he linear.Iconstant slo pe. X I_ depends on the va lue o f XI itself. the test score (Y) is a nonlinear function oC the Studen Heacher ralio (X I)' where this fun ction is s teeper when X \ is small..amplt!. Tn other words. ntis first group of met hods is presented in Sectio n S. But what if the effecl on Y of a ch(lnge in X does depend on th e val ue (If one or more of the independent variables? If so.• CHAPTER 8 Nonlinear Regression Functions I n Chapters 47. for ex. the sL ope of the population regression fun cl io n was con!ot'. students still learning English might cspecially benefit f TQ m h<1\" ing more oncononc attention.iependt'nt varia hI e. the pop ulation regression f unction i s nonlinea r. shown in Figure S.... Th is chapter develops two grou ps o f methods fo r de tec ting and madding nonlinear populatioll regression functions. ln this exa mple.2. l. reducing class sizes by one s tude nl per tcacher might have a greater effect if class sizes arc already man ageably smaU than if they are SO large that the teache r can do little marl! than kee p the cI. if so.teacher ratio \\ill be greater in di~trict~ with many student s stillle:lrnin~ English Ih. say X.:. the effect 011 .ln in diSlricb wilh few English leurncr. the non linear popu lation regression function in Figure S. change in Xl depends on the val ue of another in<. the effect on test scores of n. The me thods in the first group ar~ useful w hen the effect 0 11 Yof a change in one indepe ndenl variable. An example of a non linear regression funerioD wjth this fea ture ie.. I so th at the effect o n Y of a unit chunge in X does nOl i tself depe nd on the valul! of X.lnt. has . For exa mple.dueing the studcnt.
I Xl " . J ('P ~n d~ 0 <1 the val\l ( uf X 2 ~" nction s In Figure 8 .>nds on the value of X2 . X 1. e ffe ct oil . In Figure 8. ng To Population regres. 1a . the slope of the population regre~i on function depen ds on the value of Xl In Figu re 8 .1 b.~ '"" R i~ '"" x\ (a) C omta m ." F()r ving c nonlinear function o r the independent vari ables..ioo ftmctioo wheo Xl eo 1 " Ri~r.) is a nonlinear functi on of one or more of the X's..2 and 8..p~nd.<.le. 01 a In th~ models of Sections 8. that is....Nonlineor Regression Functions 255 FIGURE 8. 1c.fficicnts (or parameters) of th e populat ion regression model and th us arc versions of the multiple regression model of Chapters 6 and 7. ta test scores ( Y ) of a red uction in the st udentteacher ratio (X l ) depends on the re S. (c) Slup". these models are linear funct ions of the unknown coe. Altho ugh they are nonlinea r in the X's. tbe \ \carning. on the v:11u<" of X l luc of y ' ''1 ''.\ _____ '"" x.3. the pop ulation regression function is a X.As shown in Figure S.. "" Population regressjon jU r'K tioo whfn X2 = 0 x. the conditional expectation E(Y. lo pe (b) Slope J". howO 8. the popvlation regre~sion fu ndioo has a constant sk>pe.{~ aL n t. . . the slope of the populotion regress ion fu nc tion dep. Therefore. the slope of Ihis Lype of population regression func tion depends on the value of X l' Th is second group of methods is presented in Sect ion R3. 1 YI Population Regression Functions with Different Slopes Y .1b percentage of English learn ers in the district (X2 ).
256 CHAPTER 8 Nonlinear Regression Functions unknown paramete rs of these nonlinear regression function s can be estimated . 1. Appendix 8. .1 A General Strategy for Modeling Nonlinear Regression Functions Thi s section lays OUI a general strategy for modeling nonlinear population reg res · sion functions. In th is st rategy.. That analysis used 1 '. If sO.5.llt sidized IUDCh and the '''' for inco fl1 ' r .. In practice. Test Scores and District Income In Chapter 7. 1 and 8. but they cll n be estimated using noolinear least squares.3 extends thi.rccntage o f district families q ualifving . the nonli near models are extensions of the multi· pie regression model and lhcrcfore can be estimated and tested using the tools of Cha pters 6 and 7. hOWC\'\. it is impor1ant to analyze nonlinear regression functions in models that comrol for omitted variable bias by including control va riables as well. we re turn to the Californ ia test score data and consider the relationship between test scores and district income. First. and Secljon 8.8. Sections 8. Iv tWO inde pendent \'anables. holding studen t characteristics constant.lnd tested using OLS and th~ methods of Chapters 6 and 7.. In Section ~. we found that the economic background o f the students is an imp<W tant factor in explain ing performance on standardized tests.3. To keep things si mple. rn some applicat ions. addilional contro l va riab lc~ arc o mitted in the e mpirica l exam ples of Sections 8. wc combine uonlinea. t ' .teacher ralio.. the regression function is a nonlinear fu nction of lhc A S alld of the paramete rs.2 irtl roduce nonlinea r regression functjons in the conk\1 of rcgressio n with a single inde pendent variable..!f. the parameters canno t be estimated by OLS...\'0 econom ic background variables {the percentage of studen ts qua lifying for a ~I.r regression functions and additional C(lntrnl variables wh en we l l1ke a close look a\ possible nonlinearities in the relationship between test scores and the student.1 provides examples of such funct ions and describes the nonlinea r least sq uarc~ estim:lIor.tIJ 8. however.
but the linear a s regression line does not adequately de~ribe !he rclotiomhip between the!>e vorioblos score 74" 720 700 """ 660 640 62fi 600 0 10 2 . <lIa ng w. a nd il ranges fro m 5.nd assista nce) 10 measure Ihe fraclion of stude nts in the district com ing from poor fami lies. 1 A General Strotegy lor Modeling Nonlinear Regression Functions FIGURE 8.71).000) or very high (over $40. {he medjan dislrlCI income is 13.2 shows a scallerplo t of fifthg rade leSt scores against disirici income for the Califo rnia data set. A llo nlinear func tio n is a fu nction with a slope thai is no t constan t:1l1e function [(X) is linear if the slope o f [( X) is the l)ame for a ll values of X. slUde nl S from afnue nl districts d o bette r o n the tests tha n slude nls (rom poor d istric ts. bu t if the slope d epe nds o n the va lue of X. it seems that the relationship betwee n d istric t income a nd lest scores is no t a stra ight line.000).'dislriCi i~ com e " ) .TiCf in come (tho usands of doU an) .300 per person). the n f( X) is no nline ar.000 a nd $30. with a co rre latio n coefficie nt o f 0. But this sca u e rplo t has a peculiarity: Most o f thc poin ts a rc below the OLS line whc n income is very low (under $10. ~) '0 50 60 DiSl. broader measure of economic back ground is (bi! average annual per ca pila income in the school dislricl (. A differen l .. Figure 8. . Test scores a od average income a re stro ngly positive ly corre la te d. Rathe r..3 ($5300 per perso n) to 55. but are above the line whe n inco me is between S15. The Cal ifornia data sel include.sl Score Vi. The re see ms to be so me c urvnturc in the re la tio nship he tween test scores and income tha t is not captured by the line a r regression . District Income with a Uneor OLS Regression function Tf'~1 There is 0 positive correlotion between lest ~ ood dldJicI iocome (correlation "" 0 . $13.8. dislrici income measured j('l tho usands of 1998 dollars.3 ($55. TIl e sam ple contains a wide ra nge or income levels: For the 420 d istricts in o ur sa mple.2 2S 1 Scotterplot of fe.lb tbe OLS reg ressio n line rela ting these t wo variables.. In sho rt.700 per person).000. it is no nlinear.71 .7 (I hat is.
llllcome. is a quadrat ic fu nctIon of the independent vari"ble.ln d /31 are coeffici ents.. = f30 + {JI l n collle. The fir st regressor is I ncome. In come. what is? Imagine drawing. One way 10 appro'{imalc such a curve mathematically is to model the relations hip as a quadratic (un(lioo. + f311 tlcome! .. usual) standard errors o f the estimated coefficicll h are gi\cn if! parenlhesc~ Thl..s. however.2. Eq um io n (80 1) is e~\ l led the qu adratic regressio n model became the population regression function.Ih diMricl.. If YOli knew the population coefficien ts f3r" 131 ' and 132in Eq uation (8. . it might seem difficult to find the coefftcie nts of the quadratic func tion that best fi ts the data in Figure 8.2 yields r.) = f30 + f3Jncorne.sion .}.. /l1wm e. and th e second re gressor is IIIC()IlI e1.2... is M error term thal. R~ 12. you . E stimating the coe ffici e nts of Equnti<lfl (K 1) using OLS for the 420 o bservations in Figurc H.. A quadralic population regression model reialing lest scores and incomo: i~ wriu... But these population coefficients are unknown and therefore must be estimated using a Sdm· pic of datil. is the inr.3 + 3. as Income a nd I ncomt? the nonl inear model in Equamll1 (8. as usua l.! e~limated regression function (So2) is plotted in Figurt' :o.. TllU~.• 25' CHAPTER 8 Nonlinear Regression Functions If t\ slraighl line is not an adeq ua te description of the relationship beh"cen district income and test scores.IH) . 'AC could model test scores as a functi on of incomt: {/Ild lhe square of income.:n mathematically as TesrScore. rcprest:nts "Il the other fac tors T ha t de termine leSt sco res.554. Income. lI ' unknown popul ation coefficients can be est imated a nd tcsted using the OLS methods descr ibed in Cha pters nand 7.1 ) where /30' 131' .I ) is in fnet a version of lhe multiple regression mode l with twO regre~..9) (0. ill see that Equa· tion (~ . 4 f32 / I/(:om('~ + 1/" 1'. in Figure !j.IJO.2.27) (O.. 0.1 where (a. Ilttn would nallen out as dist rict income gels higher. (':\.nrs.:ornc in th e . you cou ld predict the lest score of a district based On ils average income.0423IlIcom r ..3 cu rve that (jLS the point>.851IlCOIIIl' ~ O.11131 \. is the square of income in the ith dist rict.1 ) is simply <1 mul tiple regression mode l wilh two regressors! Be<:<lu~c the quadrat ic regression model is a variant of m ul tiple regrc<. =0 607. E ( TesrSto re. If you compare Equation (8. :Iller defining the r~gn:sson.1). This cu rve would be steep for low values of district income. and II . I) \\:ith the multiplt!' n:gre~sio n model in Key Concept 6.. At first..
1 \ (. the n Equation (8. after del in E4uation ~Ie regrc%ion. that is. we can test the null hypothesi s that the population regression fu ncti on is linear against the alternative that it is quadratic by testing the nu ll hypothesis that {32 = 0 against the alternative that /32 *. District Income with linear and Quadranc Regression Functions Test score 740 Li near fE"9 ff'S>ion The quadratic 01.01%.1"11at j~ f income. In short.5 regression function. the null hypothesis that /32 = 0 can be tested by constructing the tstatistic for Lhis hypothesis. If the relationship is lineaL then th e regression function is correctly specified as Equation (8.: m~ .unc : q ua d ratlc ~ (8. against the alternative that it is nonlinear.2 and 8.1).8. so we can reject the hypothesis th at /32 = 0 at aU conventional significance levels.1) holds with /32 = O.tu " Quadrati<: regressil)l1 he r ll district.) fllts are given iTl I ted in Fi~ure s}· superimposed over the scatterplot of the data.. Thus.0043 = 8.0423/0.1 FIGURE 8. Indeed the pvalue for the Istatistic is less than 0. Thus. th en mate such a :tion. . you bille. In absolute value.. The quadratic function captures the curvature in the scatterplot: It is steep for low values of district income but flat tens out when district income is high. ~ te rm tl.aL (IS luation (8 . (It.5 regre~ion function fits the dolo better than the linear 01. 620 (. Because Equation (8. ::w 700 680 660 (8.3: The quadra tic modd fit s the data better than the linear mode l. except th at the regressor Inc:ome 2 is absent.96).2) is ( = 0. Thus thi s formal hypothesis test supports our informal inspection of Figures 8. \. if the relationship is linear. the quadratic regression function seems to fit the data better than the linear one. . 1) . ntl income is Scatterplot of Test Score n.81.1 ) with tho: i'see that Equa' two regn::sson.M IJ 10 20 30 61) 40 so DiHTicl income (thousands of dollars) Itic (unction of ~tion (R. this exceeds the 5% critical val ue of this test (which is 1.3 A General Strategy for Modeling Nonlinear Regression Functions 259 ip between t!> the points lcollle. This {statistic is ( = (~l . Bul theSi: ed using a sam l I .1) is just a variant of the multiple regression model.0)/ SE(ffi2)' which from Equation (8.n is ssion fu nction.554.O. its ~ using the OLS knts of Equation f = U. We C(ln go one step beyond this visual comparison and formally test the hypothesis that the relationship between income and test scores is linear.
penden t variables X2.) + Il j. When the popula tion regressio n fun ction is Iinc'H. froll! .. X k ..) . flIC models In the hod)' of Ihit chapl er fire all in the first family_ Appendl~ ft.. up model. .l t.mJI)... .. holdi ng constan t other im.. X l . II . i~ the e rro r te rm .3) allows for nonlinea r regression func· tions us well.. r rej!rt!>l>lon·· IIpphes IWO co~ccptUlllly ~iffcrcnt ramlh~ of modeh.)n function IS a noJl' hnear funcuon of lhc unknown parnmclcl"5 and mlly or mll~ not be" mmlmear funcllon or rhe . H o wever.01 whcre !(X li • X2i . i = t. . .) = f30 T f3 1/ncome. Dccausc the population regressio n func tio n i~ the co nd itio na l expectation of Yi give n X I/' X u •. .X ) =Al + 13IX" 2l•. I Tbe no nlinear popula tion regressio n models conside red in this chapte r a re of the form \'alue Ih~ p ~~' jI \~ YI :::: A X il. X 2J •• .. .. .. X k .3) . where! ca n be a non line a r fun clion. In the :)«ono fllmil). the e xpected change in }' h more com plicaled to ca lcula te because it can depend o n the values of the inde· pendent variables._ that is. . constant. Xt.fi X II. The effect on Y ofa chanEe in X. I ).the populalJon n:tt:rc:u.. + t3. IX II' Xli' .Xk/. this effec t is easy to calcula te:As sbown in Eq uation (6. Fo r exam ple.X! j. Ihe pi A generalformula for a nonlinear population regression function . second h..<SInn funcllon IS a nonlmear function o f the ). in Ihe q u ~d rat i e regressio n mo de l in Equa tion (8. Equa tio n (8. (~..a nd Eq ua tio n (8. Ihenj(Xli.. ilnd II.~~ but I~ II hnear fundl "it the unknown pammetcrs (the 11").) .... so X I is Income a nd the population regres sion (unction is j{ lncomt:... . ' X ".. • X A + t32XZ.. the e£fe~1 on y or a change in X I' uX I • ho lding Xl> .4). X" . • Xli.... . . is the d iffe rence In the: '1l1<:" tcon Mn(mline . a possi· bly non linea r fun ction o f Ule inde pe ndent varia bles Xl i' X 2i •• . I tllk~ . TI XI ·10 run exp..2.3) becomes tbe lincar regression model in Key Concept 6..only o ne inde pe ndent vari:t ble is present.• . + .. X k.) is Ihe popula tio n nonlinear regreSSion f\lndio n. where 131 is the popu lation re gressio n coefficien t multiplying Xl. in Eq ua lion (X3) we a llow fo r the possibility Ihat Ihis condi tiona l expectation is a nonlinear fu nctio n o f X l. however.2.Ii" '? fll.. If the populmion regression func lio n is line ar.lly. Y = t3 l AX I . Whe n the regressio n function is nonlinear.. As d iscussed in Section 6. In lhe i~ the populllllon rcsre. X 2.m.. + fJJ l ncomer. the expected cha nge in Y is 6..260 CHAPT(R 8 Nonlinear Regreuion Functions The Effect on Y of a Change in X in Nonlinear Specifications Put aside the test score example for a mome nt a nd consider a gene ra] probk:m You want to know how the depe ndent va ri'lble Y is expected to change when tht independent va riable Xl changes by the a mo unt 6X I ..• X"i.\ .. £ ( Y..
5) ./f. ( is de· The c:. .. j. d ... . . .. ~ m. the e~pt!clcd change in Y is the difference : 8.3).. hold i ng X 2• .~s.Xk• The met hod fo r calculating the expected effect on Y of a change in Xl is sum marized in Key Concept 8.ff.6.3) 'SSI The estimator of this unknown population difference is the difference between the predicted values fol' these two ~ascs. ""~ ". X 2····· X". Xd· Because the regression function/ is unk.:ssion m ouel of Equation (8. What is the predicted change in tcst scores associated with a change in district income of$I 000..Y = l(X I + l it.. .md the expected value of Y when the independent variables take on the values X l' X?..X 1. We there fore . associated with the change in X l' ~Xl. Y.. ··.X2 .. Xl .1 . between these two expectcd val ues. ..4) :be .) . th e population effect un Y of a change in Xl is also unk nown. . X 2 ... XI: ilnd th e predicted value of Y whe n {hey take on the values X I . A t a general level.r'\ :lion . X.). say !ll~ is whal happens to Yon average in the population wh e n X l changes by an am (l unt !lXI' hol ding consta nt the oth e r variables X :" • . constant..XI:' Tn the non linear regr. X:!.odd Irune· CI OO n {he J(X ) + .( d pVO .f(X"X.tion in Equation (8.2). %flJJ¥H5M>b.~. X2•• · •• Xk ) be the predicted value of Y bas~d on the estimator f of the popu lation regression fUllction. T11en the predicted change in Y is ~y ~ j(X. (S. .. X I . Lei I{X!. .' 8. To estimate th e population effect.\'"I' X 2.... he Ie· aT.1.nown... 'he . X. lion. .n of Ihi~ expected value of Y whe n the independent variables take o n the values Xl + aX I. Xj.2)? l:kcausc that regress ion function is quadratic.l.. ). .. " ..1y) of the change in X l is the diff~re nee between the predicted va lue or Y whe n the inde pendent variables take on the values XI +. . .l. .1b' 0111 \)IC Application to test scores and income.pected change in Y. X 2' · •. . Ih~ (. ... 111e estim ated efIect on Y (denoted. Ihis effect on Y is [) Y = is. based on the estimated quadratic regression function in Equation (8..:: 261 "'I KEY CONCEPT 1 in X.. is the difference between the value of the population regression function before and afler changing Xl ' hold ing X 2_•••.W. + "X.nly res. X 1..r.~/:.1 A General Strategy lor tv\odeling Nonlinear Regression Functions . thb... denot e this esti mated function by f: an example of such un estimated fun ction is the estimated quadratic rt:grcs sion fu nc.3) "~ .  THE EXPECTED EFFECT ON Y Of A CHANGE IN Xl IN THE NONLINEAR REGRESSION MODEl (8 . X I:' The difference. Xd . XI: . X k constant That is. (8. I lXI.rm .X.'" . effect d e pends o n the initial dis trict income.. X .. lirsi estimate the population regression fu nct ion.
3 ror testing a single restriction involving multiple coefficie nts. the slope of the estimated qua· dra tic regression fu nction in Figure 8.000) than a t the higher values of income (li ke $40.COO) .(Po + ill x 10 + h x 10').. To comp ute AY associated with the change in income from 10 to I L .3 + 3.53. lha t is. " Ln the nonlinear regression models of th is chapter..() per ca pita to $11.2) .641.0. the predicted value is 607. Do ing so yields o (A.O<!<J) and an increase in district income from 40 to 41.3 + 3.. O ne wa y to qua ntify the sampling unce rtainty associated with the estima ted effect is to com pUle a confide nce mlt!fVa l for the true population effect.0. The estima tor o f the e ffect on l' of c ha nging Xl depe nds on tbe estim a tor of the po pula tion regression fun ction.(607.000 to $41 .1inS sampling error. II is easy to compute a standa rd error for ~y whe n the regression function is linea r.3 + 3. Thus.icted va lue of Y when income = 10. we need to eompUle the standard e rror of ~ y in Equation (8.0423 X 402) = 694. Said differently.e. x II ' ) . the differ e nce in the predicted values in Equation (8.0. Th e te"" in the fir st set of parent heses in Equa tion (8. Whe n Income "" 11 .(0) tha n if it is $4U. Accordingly.000. we lan apply the ge ne ral form ula in EquaLion (8.85 x 40 .OOO is 2.57. (S. iJu. To do so. These predicted va lues a re calcul at ed u~ing the OLS estim ates of the coefficients in Equation (8.n) whe re PI' a nd ~ are the OLS estimators. The ~stimated c[(ect of a c hange in X l is ~IAX1' so a 95% confidence inter· va l for the estim ated c hange is ~16XI :!: 1.5) 10 the quadratic regression model. from $ toJ).3 + 3. whe n income changes from $40.85 X 11 .96 points. Y Standard errors ofestimated effects.6) is the predicted \ alue of Y when income = 1\.0423 X 412) .6) is AY = (607. wh ich va ries from one sample to the next.0423 X 11 2 "" 644.oo:J (the predicted chang~ are 2.96SE(PI )llX1.0.42 points.85 x 4l .53 . th e pre dicted value of test scores is 607. the predicted difference in lest sco res be twee n a district with ave rage income o f $11 . In the second case. the sta ndard error of :lr' can be comput ed using the tools introduced in Section 7.96 points. consider lbe r .000 and one with average income of Sl O.57 = 2. and the term in the second se t of parentheses is the pre.62 = 0.693. There fore the estima ted effect c(Jnt. d.04 . 0423 X J01 "" 641.3 is steeper a t low va lues of income (like $10. a cha Dge of income of $1000 is associated with a large r change in predicted tt!st scores if the initial income is $10.85 X 10 .262 CHAPTER 8 Nonlinear Regression Functions conside r two cases: an increase in uistrict income fro m 10 to 11 (i. when hu:vme = 10.96 points versus 0. To illustrate this me thod. The di ffe rence in these two predicted values is 6 = 644. + ill X 11 + il.5). 42 point).
6).94 = 0. In the multiple regression mode l of C hapte rs 6 and 7. applying Equati on (S. le differ . But.3. I~Yl Vi' (8.3 for testing a single restricti on on mu ltip le coe ffi ciem5. + 2tb f 1~YIS£(dYW. For eXH mplc. predicted d changes alcd qua Ollle (like \V'hen applied to the quadratic regression in E qu a tion (8.s had a natural inte rpretation .6) I ~d \'a\ue the pre ~u using Inco me Thus. o ne of the coe ffi cie nts is {3. if we can com pUle I~e standard error of {3] + 21~.3.ilnd solving for SEedY).. Thus a 95% confidence inte r val (o r the change in th e expected va lue of Y is 2 .7) (8.29). The standard error of flY is then given by2 S£(6Y) = V 10' = ~. + 21.961'\1299.. That is. 8.(XlQ estimated change in test scores associated with a change in income from 10 to \1 I'e can nodel. tllis is not generally the case in a nonlinear model. v .0. which is to compute the Fstatistic te~ting the hypothesis th at (3.8.. lion effect. it is not very helpful to th ink of /31in Equation (8. The first method is to use "a pproach #1" of Secti on 7. c\ on yof unction.96 == 1.).17 or (2.C lesting this hypoth· esis. which correspond to the two a pproaches in Section 7. the regression coefficient.Y = 2.0423 2 points.3.1()2) = PI + 21{J2' 'Ine standard error of the predicted change therefore is in Eq uation (8.82 = O. l t function i~ deuce in te ! error of oli' lconsider l}\e ~ting a siogle ~Equall('" (8.8) 6 points.J b)' noting thar the F·stati~\)c is the square orthe lstatbl.. holdi ng the square of th e d istrict's income constant. the regression func tion is best interpreted by gra phing it and by ca lcu lating t he predi cted effec t o n Y o f changing one or more of the independent variables. F . The second me thod is to use "approach "2" of Section 7. /31 is the expected change in Yassociated with a change in Xl.z. Because 6.9).thal is.17.10 ) + ~2 X (1 12 .5 ). There afC two met hods for doing this using sia n· dard regression software. as we have seen. Th is means that in nonlinear models. r.96 x 0 . in the transformed regressio n. + 21f3.94. then we have com puted Ihe standard effor of 6 Y. ct contains cd with tho! A comment on interpreting coefficients in nonlinear specifications. which entails trans forming tbe regressors so that.8} is derh·.63.1 A General Strategy for . ~I X ( 1 1 .96. constant. j.2).57 riet with . w hich is uY = S£(6Y) = SE(jJ. Doing this tra ns fo rmatio n is left as ao exercise (Exercise 8. the Fstatistic testing the hypothesis that /31 + 2 1/3 2 "" 0 is F "" 299.1) as being th e effect of changing the district's income. (8. + 21iJ.Modeling Nonlineor Regre~on Functions 263 'O. + 2Ib~Sli<P." l(b.8) gives SE(ily) = 2.0423 x 641. holding th e other regressor.
th e e lJe" Oil Y oj a change in X. Determin e whether fh e nonlinear model improves upon a lin ear model.:nd on [he value of X or on another independent v ari able. r lol/he estimated nonlinear regressionJunclioll_ D oes the e stimated regres· sion fu nction desc ribe the dat a we ll? Looking a t Figun:s 8. Th e fi nal ste p is to use tht' esU m ate d reg ress io n to ca lcul ate t he e ffe ct on Y of a change in one o r more regrt!ssors X us ing the method in Ke y Concept 8.. Before you even look althe data. thin king about cl assroom dynamics with l l yearolds suggests that clIlting cl ass size from 18 students to 17 cou ld have a gre ater eHec! than cutt ing it fro m 30 1029. 2. tha t can be cqi ma ted by OL~. pic. • 264 CH APTER 8 Nonlineor Regression Functions A General Approach t o Modeling N o nlinearities Using Multiple Regression The genera l approach to modeling nonlinear regression f unctions taken io thl~ chapler has five elements: I. ask yourself whether the slope of the regression function relating Yand X migh t reasona bly dcp. E~'l imate 8. 4. The best th ing to do is to use l'l:O_ nomic theory and what you know aboullhe application to suggest a pO~:'lblc nonlinear relationship.. Jdemijy a possibl~ nonlinear relationship. Why might such non linear dependence ex ist? What non linear shapes does this sugges t? For exam.1.2 Nonlinear Functions of a Single Independent Variable This section provides two methods for modeling a non linea r regression fun ctiOn To keep thi ngs :. we develop these methods fo r a nonlinea r reg r e"~L()1l .2 and 8. Most o f the time you can use (sta tistics a nd F·stat istics 10 t e~1 the 11ull hypothesis Ihal the popula tion regression funct ion is li nea r against the tllter na ti ve thaI il is nonlinear. JU~ 1 bet:ause you think a regression funt:lL orr is no nline ar does no t mea n it really is! You must dClc mline e mpirica lly whe the r your nonlinear mode l is aprro priate.tcd that the quadratic model [i t th e data better than the line ar model. 3.2 and 8.3 suggc:.:. 5. SpeciJy a l1ol1linear Junction and esrimate its param erers by OLS. After work ing through these sectio ns you will unde rstand the c haract eristics of e ach of these functions. S ec l jon ~ 8.3 cont ain var io us nonlinear regression functio n. implc.
tbe unknown coe ff icients Po. (8.• X.X.sible f rtcutting ~g \'hether depend uch non exam· Polynomials O ne wa y to specify a nonlinear regression functio n is 10 use a polyno mia l in X.. {3.)(2. .. an exte nsion of the quadratic regression used in the last seClion 10 model the re lationship between test scores and income. the null hypothesis (Ho) that the regression is linear and the allernative (HI) that it is a polynomial of degree f correspond to HO:f32 = 0. that is. When r = 3. and so o n .. Equalion (8. In pa rticula r. wh e rea ~ he re the regressors are powers o f {he same depe nde nt variable.. + f3. these models can be modifieu to include multiple independent variables. . :st! the cs!\' I ne Of mort: Testing the null hypothesis that the population regression fun ction is linear. ted regres ~ suggested I . H. F.. + 132Xr + ..8.j = 2.1. tbe regressors are x .. The polynomial regression model of degree r is br it from Yj = f30 + {3.10) ion func li01'l ~ rt:grc~ . ..9) can be esti ma ted by O LS regressio n of Y.5.9) is the quadratic regression model discussed in Section 8.D. The first meUuxi discussed in this section is polyno mial regressio n. in Equa tio n (8.9) I Sections t be esti stand the When r = 2.: at least one{3J =I. X. f.10).. = OV5. . .. against X" X?. {3. Just is appro sl the null Init really t the alter· called the cubic regression m odel.. then the q uadratic a nd highe r·order temlS do not e nler the population regression functi o n.• {3.2. . they call be used in combinatio n. Accordingly.9) is rodel. .iOfl The nuH hypothesis tha t the popula tio n r egression function is linear can be tested against the alternative thai it is a polyno m ial of degree r by testin g No against H I in Eq ua tion (8. The polynomial regression model is simi lar to th e multiple regression m ode l of Chapte r 6.2 Nonlinear Functions of 0 Single Independent Variable 265 ~ in tbis I ~. If th e popula tion regression func tion is linear. so tltat the highest power of X included is X 3. lei r denote ihe highesTpowe r of X tha t is included in the regressio n. Thus the tedmiques fo r esti ma tio n a nd in fe re nce developed fo r multi ple regressio n can be a pplied he re. As we see in Section 8. TIle second method uses logari thms of X andlo r Y A Itho uglt these met hods a re presented separale ly. X l. Equation (8. ./33 = 0. Xi . Because Ifo is a joint null hypo thesis with q = r . however. . se eco funct ion that in volves on ly one indepeodcn t variable.1 restrictions on the coeffici e nts of the population po lynomial regression model. .Xi + U i · (8.. In ge ne ral. except tha i in Chapler 6 the regressors were distinc t independent va riables. it can be tested using the Fstalislic as described in Section 7.
2.. c~ressioo fo r Ihal r. This procedure.. Ihis answer is nOI very useful in practice ! A practical way to dete rmine the degree of the polyno mial is to ask whct ht. Pick a maximum value o f r and estimate the polynomial .: regfl::~' S1011 function rclati ng dist rict income to lest scores is (8 ~ .our po lyoo mi al is statistically significant." If so.nA!1vc thai it is a cubic at Ihe 5·. then il is appropriah. U:>c the I. The .! .statistic on Illcome' is 1. (he n belongs in the regres· xr sion. the)' do nol have sharp jumps or "spikes. Ihe non linear functions are smooth..eliminate X' (rom the regression and e"ti mate a polynomia l regressio n o f degree r .I bends (that is. use lhe polynomial of degree r . continue this proced ure until the cod ficient on the highest power in ). L = 0 in slep 3.I .000)5) ti' R. such as 2. H you reject this hypotbesis. 1) (0. in Pqua. so use the polynomial of degree r. . Iha l is. Test whether the coemcien! on )(..h. inflection point~) In its g. is summarized in the fo llowing ste ps: 1.:r the coefficients in Eq uation (R.. The estimated cubi<.. Increasing the degree r introdUl.Q2Jncome .1 + 5.97... If you do nOI reject (3. If you do not reject {3. . Th is recipe has o ne missing ingredient : the initi al degree r of the polynomial. If '\Q. If you n:jccl. 600. (5..!_!_ _ .1 is ze ro..:\ more flex ibility into the regression fun ction and allows it 10 match more shape ..55:'1. Unfort unately. In many applic<Ltio ns involving economic data..:t. = 0 in step 2. be~io wit h r = 2 or 3 or 4 in step 1.ffi6/ncomil + O.stat istic to teSllhe hypotbesis that the coefficie nt on )(' {fi. Ihen these tcrms can be dropped fr om the regressio n. tion (~t9» ) is zero..71 ) (0. which can reduce tht. more regressors. Thus the a nswe r \ 0 Ihe questio n of how many terms 10 include is Ihal }\.(0)69JllcomeJ. but no more.li yidua l hypotheses are tested sequcntially. 3.. Application to district income: and test scores. how many powers o f X should be incl uded in a polynomial regression? The unswer balances a (radealf between ncxibility and statistical precision. I ....I . or 4that is.lU should include enough to modcl lhe nonlinear regression funct ion adequately..raph.!~~.I 266 CHAPTER 8 Nonlinear Regrenion Functions Which degree polynomial should I use? Th at is.~~ . J.029) (0.. 0.9) associa ted with largest va lues o f ra re zero. prccbion or thc estima ted coefficie nts. 50 the null hypoth~is Ihat the regression fo~' •__ . But increasing r means add ing.O. which is called seq\lenti al bypot he~is tes ti ng beca use int. a polynomial of degree r can have up \0 r . lI. .' \0 choose a small maximum order for th e polynomial .1 4. _.
.so the null hypothesis Ihal the regression func tion is linea r is rejected againsl tht! alternative Ihal it is either a quadratic or a cubic. the wa ge gap was measured in terms of dollars. Regression spec ificat ions that use natural logarithms allow regression mod els to estima te percentage relationsh ips such as these.Ol%. might it be Ihat a change in di strict income of 1%rather than SIOOOis (lSSo cia led wjlh a change in test scores th aI is approximately constanl for di fferent va lues of income'? • In the economic analysis of consumer deman d. Moreover." examined th e wage gap between male and fe male college tor that r. where e is lhe constant 2. Would this relationship be lincar using percenlage changes'? That is. The coef· ficie nts in pol ynomial regressions do not have a simple interpretation .71828 . Logarithms convert changes in variables into percentage changes. we revie w the exponenti al and natural logarithm fu nctions. hich is e tested Logarithms Another way to specify a nonlinear regression fu nctio n is to use the nalural loga rithm of Y and/or X. Here arc some examples: • The box in Chapter 3. regrcs t and l:sti ficient on graduates. the nat ural logarithm. hat you lely. the exponential fu nction i. Berore introducing those specifica lions. rc smooth. and many relationships are naturall y expressed in terms of percentages. • In Section 8. I . The exponentiul runcti on of x is e'< (that i::. professions and over lime when 'hey are ex pressed in percenlage terms. 111 The exponential function and the natural logarithm _ Tbe exponent ial function and its inverse. but N hether ro.8.If so. it is often assu med that a 1% increase in price leads to a ce·rtain percelltllge decrease in the quantily demanded..1. play an import ant role in mode li ng nonlinear rcgres. ic at the ) • . wi th a p value less than O.. the FSlatislic testing tbe joint null hypothesis that lhe coefficienls on Illcome2 and II/comc' arc bOl h zero is 37. Interpretation ofcoefficients in polynomial regression models. ropnate w hat is.7.2 Nonlinear Functions af a Single Independent Variable 26 7 deoff uces Ihapes: ints) in Lee Ihe ~ or X lcvel. Th e best way to in terpret polynomial regress ions is 10 plot the est imated regression fu nc lio n and to calculate the estimated effect on Y assoc iated with a change in X (or one or more values of X. In that discussion.ion fu nctions. . begin ~bic regreS (8. o \ynomial. "The Gt:nder G ap in Earn ings of Collt:ge Grflduatcs in th e U nited States. we fo und that district income and test scores were nonlinearly rela ted. II the coef l ificant. llle percentage decrease in dema nd resulting from a 1% increa~ in price is ca lled Ihe pricc elasticil}'. ~e in Equa. e raised to the power x)." ression fun': . it is easier to compare wage gaps ncros!!. Howe \'er.
The logarithm function h as th e fo llo wing use ful prop e rties. y = In(x).. such as base 10. InlXI is steeper for.lfil hm is the funelion for which x 0:: In (e") o r.4 The Logarithm function.:L"L20 .0 80 100 60 l ~(J . the di ffere nce between the loga r ilhJl1 o f x + ax a nd the logarith m o f x is approxima telv . TIle slo pe of th e logarithm function In (x) is l / x. the natural log(. In (x) . equiva lently. ~. is graphed in Figure 8. In(x / a) ::::. in th is book we conside r on ly logarithms in base e.. . The link between the logarithm a nd percent· ages re lies o n a key fact: Whe n !:u is small..~on thon for lorge vo'ue~ of X. con tinu c~ to I In(l/x) = ·In(x). The aaturallogllTitlun is Ih e inverse o f the expone ntial func tion." The loga rithm fun ctio n. tha t is. No te th at the log arith m func tio n is defined o nly fo r positive va lues o f x. That is. then fl <l ltens out (<llthough th e function increase) . x = In[exp(x)J.13) l + In(x)... 8 Nonlinear Regre~sion Functions FIGURE 8. (S.12) (8. In(ax) = In (a) (b. so when we use the term "logarithm " we always m ean " n atural logarithm. is only defined for X> 0.and In (X') == lfln(x). The logarithm fu nction has a slope that is sleep al firs t. y '" InOO 4 3 2 °o" ~~~~.268 CHAPTER.In(a). X also wOlten as exp(x) . 15) Logarithms and percentages. .111e b ase of the n a tura l logarithm is e.4. the na tura l logari thm . ond has slope 1/ X. p ) (•. A hho ugh there are loga rithms in o the r bases. Y = In{XI y 5 The loga ri th mic functi on Y". the pereenlage change ill J d ivided by lOO. that is.
+ 6<) ..c. In (X ).16). . 1l1e re are three d iffe re nt cases in which loga rithms mi ght be used: whe n X is transt"o rme d by ta king its logarithm but Y is not.t OLS regreSSio n of Y .1 7) Because Y is no t in logari thms but X i~ this is sometimes re ferred to as a 1in~8"" log model.l3.In(lOO) = 0. ~ta tistie. the regressio n mod e l is Po + p \lo(Xj ) + u"i .). Thus /lx / x (which isO.]e In(. while In(x + Ax) . lf X changes by 1 %. hypotheses abo ut 131 can be l es l e~ using the l . The three IOKarithmic regression models.5/100 = 0. Case I: X is in logarithms. The o nly diffe re nce between the regressio n model in Eq ua tio n (8.0 113 1.!.00995 (o r 0.In(x) = In( 101) . no (8. consider t h l. 17) and the regression model o f C hapter 4 with a singk regressor is Iha l the rightha nd vari able is now the logarith m o f X raliler Ihan X itse lf. E stimating this regression by OLS yields • . thus. we cou ld usc t he linear·log specifi catLon in Equation (8.In (x) (which is 0.x 1x = 1/ 100 = 0.995% ).00995).Dl. x ( when ~.05. in this model a 1% change in X is associated with a change of Y of 0.1 7).OI ) is very clo se 10 In(x + ax) . Fo r example.In( IOO) = 0. (8. 6< /.In(x) 6. o n In(X.01 (or 1%). W he n ax = 5.oc ia ted wit I) a cha nge in Yof om f31 ' To see this. In the lin ear· log mode l.) = In( 105) . fi rst com pu te a ne w va ria ble. t he n 6.[.2 Nonlinear Functions of a Single Independent Variable OIl 269 In(x + llx) . ._ + J3 l ln(X )] = .96SE(fJI)' As an example.. Yi ::.and a 95% confidence interval for /31 can be consuucted as /3 1 :: 1.1 7) . th is is readily done using a sp readsheet or sta tistical software. a 1% change in X is as. when Y is tran sfon neu to its logarit hm b ut X "is not .. x 15 small ) . Th en Poa nd 131 ca n be estimated by the in . To estimate the coefficien b 11) f3 0 a nd (3L in Equa tio n (8.In(X)] . Y is not.l311n(X + JiX )] . .1.8. wh. {3t( ilXIX) wh e re th e fi lial step uses the approxi· mation in Equation (8 .l3d ln (X + JiX) .! diffe rence betwee n 111e pop ulation regressio n func 10 ti on at values of X that differ by 6 X: This is [J3u + . The interpre tation of the regression coefficients is diffe rent in each case. a nd when bOlh Y and X are lransfonn ed to their logarithms. return to the relationship betwcen district income and test scores.16) wh ere " S!'" me <lns " ap proxim<ltely equal to:'1l1e de rivo tion of this ap p roxima ti on relies o n colcuhL\ but it is readily demonstrated by trying o ut sOme val ues of x and !lx. We d iscuss these three cases in turn . th en :1XIX = O. when x = 100 and A.r = 1.In(.04S79. In this case. Instead o f the quadratic specification .
In(lO)J = 3. 18). il is initially stee p bu l Ih. the predicted d ifferenc&.5.8 + 36A21n( lOll 36.1 For example. 1S) is the nalurallogaril hm of income ralher than income.000 versus SI1.8 + 36. H2 = 0.jlh an increase in lest scores of x 36. (3. The estimaled linearlog regressio n functio n in Equation (S.! hctwel:n a district wit h average income of $40.90. 42In(/ncomc).42 = 0. Po + ~ 11ntXl captures much of the nonlinear relation between ted )(O(e~ ond di~trid income.56 1.lhtricts.: n natlens OU I (or higher levels of income.8) ( lAD) {K lS) AC\':ording 10 Equ ati on (8. a 1% increase in income is associated ".5 The linearLog Regression function Ten 5core 7of() The estimated linearlog regrenioo functioo y .42 X [I n(ll) .47.20 600 U L IU 20 30 40 5U (. this regression predicl~ Ihal a SIOOO increase in income has a larger effect nn teS I scu res in poor districts than il does in aID uen tl. om = . Because the regresso r in Eq uation (8.(X)() and a district with avcmgc income of S41. wha t is the predic ted di(fe re nce in lest scores (or d istrict!> wit h a\'~ r<lge incomes of $10.n~ bch~ een the predicted values: AY = [55 7.(XXJ is 36.In (40)] = 0..36 points. 720 1011 '''''' 660 610 (.3.ue the effect on Y o f a change in X in its original un its o f thou~nds of dollars (nol in logari lhms)..• 270 CHAPTER 8 Nonlinear Regression funcnom FIGURE 8. we can use the method in Key Concept 8.R + 36. Thus.\8) I!> plollcJ JI'l Figure 8. To cstim. Similarly.4210(1I)J .1557.000? The estimated value of /1 Y is the d if£er. like the quad rat ic specific<ltio.lo. the estimated regression fun ct ion is not a slraight lint: Like the quadratic regression funct ion in Figure R..' 1 Distr icl in corue (thousa nds of dollars)  F WScore = SS7.42 x (10(41 ) .
Many e mployment con t ract s spcciry th a t ..0005) lth average difference = 0. Ibis is referred to as a loglinea r model.6.19) SO that each additional year o f age (X) is.1 ).TIlis percentage re lationshi p suggests esti mating the loglinear specifica lion in Equation (8.u. com pare the expected val ues of In( Y ) for val ues of X that differ by l l.) "" {J(~ + {3I X . Y) = /30 + f3J (X + .1/30 + .In (Y) == a Yl Y Thus. When X is X + dX. In(Eurnillg. the ex pected value is given by In(Y + 6. From the approximation in E quation (8. for each additio nal year of se r vice . this is referred to lIlo d c.:cause both Y (lnd X are specified in logarithms.(086) % I for each addi tional year of age. th en ISI(Y + a Y) .the rela tionship between age and ea rnings of college gradua1t:s. In t his case.20) ~6.19) Because Y is in logarith ms but X is no\.777 obser vat ions on college graduates in the 2(X)5 C urrent Popul atioll Survey (th e data are described in Appe nd ix 3. The expected value of In(Y) given X is In(Y) = 130 + f3 1X.655 + 0.2 Nonlinear functions of a Single Independent Varioble 27 1 Case 1/: Y is in logarithms. If LiX = J. associa ted with some consta nt per centage increase in earnings (Y ). earni ngs arc predicted \0 increase by 0.llnit change in X (. % ch ange in Y To see this.86% 1(100 x 0. ~ffect on \t::~1 ! inc001e (l ( According to this regression. o n average in the population.421i\(IO)1 e between (I pecificatiO !l. this relationship is ~ In (E arnings) = 2.\1 = fJ \AX.. In this casco the regression model is In( Y.e in hi s or he r wage. if /3 taX is small.1.\'.21 ) a loglog Bt.l. X is not ..Thus th e difference between these expected values is In(Y + tJ. so that X changes by one ullit. we re turn to the e mpirical example of Section J. Tra nslated in to perce ntages. r. l1 tlnit change in X is associ ated with a 100 x f3 !% change in Y As an illustration. the unknown coeffi cie nts f30 and {31 ca n be estimated by the OLS regression of In (Ra rnings l ) against Age. • .. a one. an tbousand~ FIll In(Y) = 1/30 + f31(X + aX)1 .010) (0.thc regression model is plotted ill logarithm (It I straighl litl~ teep but Ih(tl " In(Y/) = fJu + 1l1ln( X.7.. D conll~ dollau) (8.030. (8. Case III: Both X andY are in logarithms. however. When estimated usi ng t he 12. a worker ge ts a certain percentage increa:.OO86Age.~ X = I) is associUlcd with a 100 X 13. n ~.16) . I n the loglinear model.8.81. Rl (0.).\j . then a Y I Y changes by fJ t. By first computing the new dependent va riable. aYI Y == {3J a x.) + IIi' (IS (8.18) id with pt 8. (8.
1og regreuion function. Thai is. If thc percentage change in X is 1% (Ih is. in this speci[ica tion 131 is the eillsticity of Y with respec t to X.log model. A:i.8 1 is th e elasticity of Y wi tb rcspeC! to X.557. iC 6.the ratio of the percenta ge change in Y .55 l.ogll'\t¥ ~ L at I i L " th e r 6.lX .006) (0. In(Y) is Q linear function of X. is the percentage change in Yassociated with a I change in X. 1: lhus In(Y + i1Y) .In(X)J.) :. 16) 10 bot h sides of this equation yields . in thc loglog specification PI is.5d d. In the 1og... In(YI is 0 linear function ollnlXl.6 The LogUnear and Loglog Regression funcrions In{Test score) In the Iog·Iinear regression functioo.: lh aga in apply Key Concept 8. The res ulling estim ated equation is l ri(Te:'·I S·~. 'R2.05S4Ln(lncome)..XI X = 100 X (AX / X) = percenlagechangeinY perce ntage change in X ' (. '''' (. retu rn 10 the relationship betwee o income and Icst )cOf' When this relationsh ip is specified in this Conn.% chang(: in Th us.40 1 o j II 10 20 J I) 50 61 Di strict ill COIll (thousacds of doll ars 40 II In the log. (0. ao illuslrat ion .1 130 + .d.B)ln(X») = 13..0021) (~.23 l .lln(X + uK) .In(Yl = 1 130 + 13\ln(X + .1~sO' cia ted with the perce ntage change in X.272 CHAPTER 8 Nonlinear Regr~sion Functions FIGURE 8 . a 1% change in X is associated wit h a 13.OI X).45 1'. the unknown coeffici ent~ arl! <!s mated by a regression of the logarithm of test scores against the logarithm income.' 2 Thus. then 13.O: f3cyor (31 = flY I Y 100 x ( fl Y I Y) o. Y fl X y:. A pplic3 Iio n o f the approximati· in Equat ion (8.. 6.336 + O. 6. To sec.0.X =O.
Eve n so.:: in X is assoc iated wilh a {31 'Yo chsngt in Y.8.0554 % increase in test scores. most of the observa tio ns fan be low the loglog curve.. incomt "" A ill Xhy 1 unit (. For comparison p urposes. fW¥$~ 82 K~Y CONlOEI'T r . the re gression function in Equa tion (8.X. A I % chang. 'nle following table summarizes th ~sC three cases a nd the interpretation of the regression coeffi cient {j .Blln(X1 ) + It. + II f In( r ..1•. f dollars) m In(Y.6 is t he logarilh m of the test seore_and the scalte rplo t is the logarithm of test scores versus d istrict income.: in X is associated Yi = f1rl + .01131. (8.6. The estimated loglog regressio n fUllctio n in Equalion (8.221 According to this estima ted regression fun ction.557) lb an fo r the logli ne ar regression (0 .) + II. 23) is plotted in Fig ure 8.439 1ge in Y aSSO'  + Q. so {31 is Lhe elasticity of Y with respect to X.' U j . an independen t variable X. Figure . the loglog specificatio n fits slightly better than the loglinear specificatio n. Rl = (O.e vertica l axis is in logarithms. in Y. cie nts arC c~\I' ~ logarilh Jl' of ..7.Xli roximation y X· (B..497.change ' with 3 cha nge in Y of 0.2.24) is the Slraight line in Figure 8.2 Nonlinear Functions 010 Single Independent Variable 273 I~ lr LOGARITHMS IN REGRESSION : THREE CASES Logarithms can be used to transform the dependent variable Y. the loglog specifica tion does not fit the data especially well: At the lowe r values o f income. Th is is consisten t with the higher 'Rl for the loglog regression (0.) = f30 + .s.iX '" I) is associated with a IOO{31 % change in Y. o sec this.ooOl~) 0. {3q + {3. As you can see in Figure 8. Because Y i ~ in logarithms. o r both (but they must be positive).003) X is 1% (thai ed with a l % nd test score.6 also shows the estimated regression (unclion for a loglin t:ar specification_ which is In(Test Score) "" 0.B 1ln(X. . (X + . (B.OO284ln com c. while in the middle income range most of the observa lions fall above the estimat ed regression function...11 Because th.6. In each cascof31 can be eSlimated by applyi ng OLS after raking the logarithm of the depen dent and / or independent variabl e. tbe vertical axis in Figure 8.) = . A 1% Chang. a 1% increase in income is esti mated to correspond to a 0. The three logari thm ic regressio n models are summa rized in Key Concept 8. ~ 'Og.24) (0.497).6.[t!>sion "'M n t Regrouion Specifi(orion Interpretation of /1.
is di\tributcd indcpendemly of X" then the cxpccteJ vulue 01 Y. i.'>_ sion of Y against Xln the test sco re am.19): the res ult is Computing predicted yalu~s of Y when Y is in logarithms. it seems (to us.tI.."urC$ tbe fraction of the variance of the depende nt variable explained b~ Ihe regressors. the appropriate predicted value of Y.!> the data? As \I.exp(.'\eu l X.ln modeling lest scores.'ll d~pcndent variables are different lone is Y. Sl) Ih\! linearlog model fits the data ocHer. the estim ated regrL's sion can be u::.J lo'ibis OlJlen.kp.508. To sec Ihis. sion has an "R2 o f 0.) = ~+iJ.J.'ay) nat ural to di <.• 274 CHAPTER B Nonlmeor Regression Functions A difficulty with comparing logarithmic specifications.1111..:u~ test result:. it docs nol muke se ns~ to I:ompure their R~' s. .. using economic Iheory and either your or olhe r experts ' knowledge of the problem.1 \J more ad~anccll af"J can he: '~lppcd "ltOOUt I""s or ~onunully_ . gi\cn X IS E( Y.561 while the linear regression has an RI o f 0.. consider the loglinear regression model in Equa tion (8. HO\\ e .) If II. Bccause the depcndent variables in the loglog and lineurIog modcl~ arc diffen:nl.:~lrc rather than its loga rithm.23) and (R. the loglog model had the higher R2. . the 8..! income regression.. labor economists typicnUy model earning~ using logarithm::. Il ow can we compare the linearlog model and the loglog model? Unlor!u. Hnd 50 (Orlh arc often mosl naturally discussed in pc'r centage term!t..e saw in the discussion of Equalion~ (8. because wage compari~on:. natel}.: R: mca. Similarly.~ E(e"').1 rewrite it so that it is specified in t~rrns of Y ralher than In( V ). (Ie.. II r\ a bit trickier to compute the predicted value of Y ibdf.I X. Which or the log rcgr~ion models best iiI.'r.) e £(~. Reell lllhlH th!. contract wage in c rea5e~." din:ctly the predicted value of In( Y).:n dent variable Y has heen transformed by taking logarithms... whether it makes sense to specify Y in loga rithms. nOI simply obtiUI1. tak~ thl' I!xponcnlial function of both sides or the Equation (H. £(e". The problem is that . Beca use of Ihis problem.ed to compUl.lhe linearlog regrL'\.24). .. If the I. 18) and the linear rcgf('. anyv. the R~ COIIl/ot he used 10 compare these two regressio ns beeau~1. th e R2can be u~d 10 compare the log·linear and loglog moo_ els: as it happened.' thl."veil ir !:lId = O.1 Y).) F 1. Fo r example. the best thing to do ill a pUrlieul "r application is 10 decide.2 can h. To do so.. the other is In(YJJ.ll\'e "· .. . ThUs. J Y.j. in terms of poinL'i on the test rather than pcr<:enlage inCTe asc~ in [he le~t score~ so we C ocus on models in \\hich the dependent va riable i!> the test s.: used to compare the linearlog regression in Equation (8.. ) = e A.8o r 13 IX/ + II.
Oue way to dn so is to a ugme nt it wit h higher powers of the logarithm of income. but we did not test Ihi~ formally. 1 + llJ.06[ln(/m:omt')j3.8! Yo: l b is predicted value is biased because of the missing factor £(e"').) lohtl1lllo. . filling a no nlinear function therefore enta ils deciding which method or combination of met hod~ works best. but in the end the t rue form of the populat io n regression fu nction is unkn own. but thb. so we select the cubic model as thc preferred polynomial specification.18) see med to provide a good fit to th ese data.. Rl "! t he ode ls blo of the labor I risons. it is ofte n most natural just to usc the logarithmic speci fi ca tion (and the ltssociatcd percentage interpretations) throughout the a na lysis. quadratic [E quation (8. If these additional terms are not statistically different from zero. the cu bic specification provided an improvement over the quadratic. n p.:J + 3.. Accordingly. We conside red two polynomial specifica tions spt:cifi ed usi ng powers of In come. economic theory or expert judgment might suggest a func tional form to use. T n practice. gets compliC<lted and we do not pursue it further.: 1 (8.2)} and cubic [Equation (8.4) (87. depen I "r. we compare log arith mic and polynom ial models of the relationship bctwt. is to compute pre di cted values of th e logarithm of Y b ut not to transform them to their origin al un its. II I~ 19).. the mtu their Ite by taking the exponential funct ion of l30 + that b.g:~c~.26. n. A s an illustration.x. 00 II be .rc<. (3.7) ~ Jt if £(".7?1::: 0.1 I)}.':cn district income and lest scores.2 \o~ Nonlinear Functions of a Single Independent Vorioble 275 one. fi . A nother solut ion. this is ofte n acceptab le because when the de pendent variable is specified as a logarit hm.8. = e ti .9) (31.11) w as significa nt at the 5% level..4In(/l1come) .+.9[ln(/llcomc)j2 (79.:!:'t i\'cn X i~ Logarithmic specifications_ The logarithmic specification in Equation (8. whieh is the approllch used in this book. Polynomial and Logarithmic Models ofTest Scores and District Income In practice. then we can COIl clude that the specification in E qulilion (K 18) is adequate in Ute sense that it can not be rejected against a polynomial function of the logarithm. ltnd Ink. In practice.560..re!. O ne sol ution 10 th is problem is to estimate the factor £(e".:r iscuss in th~· I scurc Polynomial specifications. Because the coefficient on ilicom CJ in Equat ion (8.74) (K26) . . the estimated cubic regression (specified in powers of the logarit hm of income) is T eS1Score = 486.) and to usc this ~ stimatc whe n computing the pred icted value of Y. by setting Y..: th.
818.561 aud for the cubi c regression it b 0. The Fstati.~ 6) does nOI provide a statistically signi ficant improvemen t over the model in Equa.7 plols the estimated regression funct ions from the cu hic sp(~cification in Equat ion (8. 740 720 7(".1 8)..555. lilM'ar. Comparing the cubic and linearlog specifications.64. Thus the cuhic logarit. FIGURE 8.The"R'1. with a p value of 0. tio n (8. wllich is linea r in the logarith m of income. One statistical tool fo r compa ring these specifications is the R2. lions are q uite simila r. esis that the true coefficient is zer o is not rejected a1 the 10% level.hmic model in Equa tion (8. CHAPTER B Noolinear Regression Functions TIle (·statistic on tbe coefficient on the cubic term is 0.44. so the nul] hypOth. and becHuse this specification does not need highe rorde r polyno mia ls in the log· a ri thm of income to fit these da ta . tic testing the joint hypothesis tha t the true coef[icie nts o n tbe quadratic and cuhle term are both zero is 0.7 'The linearlog and Cubic Regl'tiSton Fundi on~ Tes t score The estimated cubic regre~ion IiJnction [Eqvation (8. Because the logarit hmic specificat. Figure 8. of the logarithmic regre ssion is 0.' .ion has a sli ght edge in terms o f the R!.181J are nearly identical in this l1Omple.log 680 6(. we adopt the logarithmic s pecifi cation in Equation (8. The two estima ted regression [unc.• 27.1 1I[ and the e~imoted linear· los regressiOl1 IiJnction [Equation 1 8..0 " Cubk 640 620 600 0 10 20 '" 40 50 D is trict in come (th ou sands of doJlar1) . 18).11 ) and t he lincar· logspecificat ion in Equation (R l 8). so this joint null hypothesis is nOI rejected a l the 10 % leve l.
:iCicat ion. the effect of DlJ o n Y. on these two binary variables is Yi = f30 + {3ID li r J. and when both are continuo us. d cubic sis is not on (8. The specificat ion in E quation (8.27) has an important limitalion:Thc effect of having a colkge degree in this spl.< where Dh = I if the jlll person graduated from colleg. is the same for men and women.:hc r ratio wou ld depend on the fraction of English lea mers. This section explai ns how to incorporate such interact ions between two inde pendent variab les into the multiple regression model. (8.111e pOpulll1ion li near r~grcssion of Y. . bolding schooling constant and f32 is the effect of havi ng a college degree..l he presence of many English learners in a dist rk'\ would inte ract with the studen t.' 50 ''"' ':> + f32D 21 + 1/. if students who arc slilllearning English benc fit differenti<llly from oneonone or small grou p instruct ion . holding D II constant. could depend on the value of 1.3 Interactions Between Independent Variables 271 ~ l Fstatis· l hypath 8. when onc is binary and the other is continuous. The possibl e interaction be tween the studentteacher ratio and the fraction of English learners is an exam· pie o f the more general situation in which the effect all Y of a change in one inde pe ndent variable depends on the va lue of another independen t variable. the individual's gen dcr (D I" which = I if the (ill per son is fem ale) and whether he or she has a college degrcc (D 2. however.. no reason Ihalthis must be so. Interactions Between Two Binary Variables Consider the population regression of log earnings [Y.27) '"'" . where Y.c ).. Therc is.8. f31is the effect on log earnings of being fe male.(8.ll1is could arise. the re could be an interac tion between gender and having a college degree so that the value in the job market of a degree is diHerent for men and women. for example.< ' rr . 1\) MOd ssion func l ions is th~ lession i~is . If sO.in the log Ification in rof thC R2. ° .ct jncOil1C Is of do\1 pd) In this regression model.3 Interactions Between Independent Variables In the in troduct ion 10 th is ch apter we wondered wheth er reducing the st udenlleacher ratio might have :l bigger effect on test scores in districts where ma ny students are slillieaming English than in those with few s@ learning Eng Iish. = tn(Eanlillgft)] against two binary va riables.:r..teache r ratio in s uch a way that the effect on test scores of a change in the s tud en tIc~u. Phrased mathematically.26) in EqulI r p lo~s the . holding gender consta nt. holding constant gender.In other words. We consider three ca ~ e s: when both inde pcndent variables are bina ry.
AJtho ugh tJlis example was phrased using log earoings..) of having a college degree (changing D21 from 0 21 = 0 L a D y:::: 1) to depend on ge nder (D \.1. The c frecl of this c ha nge is the difference o f expected va lues (tha t is. The coeffici ent f13 on the interaction te rm is tbe dif· fe rence in the effect of acquiring a coUege degree for wome n versus me n.ID " = d. + f32D 2t + f3lD \.. it is easy to mollify the spec.D. but if the person is female (d 1 I). is called an inlersC1ion term or an inter. t he e(fect is /31 + ~..28) is call1. whic h is f10 + f3 \d l · The E(Y.28 ).. acted regresso r. in effect..29) }. given a value of 0 \.4)].278 CHA. The int eraction term in Equation (8.'d a binary vari able interactioo ~gression model. and the popula tion regression model in Equation (8.ID \. Th is method.) Firsl of bin bee e1(pcC 1llt: new regressor.D b = \)  £(Y. = 0) = /3. to work through each possible combination of the binary variables. (82.. = 111' DlJ "" 1) = Po + /3\ X d1 4. aher the cha nge. for O2. The me thod we used here to inrerpre t the coeUicie nls was. given the same value of D. ification so tha t il does by introducing anothe r regressor. the product of the two binary varia bles. is summarized in Key Concept s j · = = . the effect of acquiring a college degree is {Jz. + ~3dl' (8.). The (irst step is to comp ule the conditional expectat ion of Y. l Dl i "'" d l . and acquiring a college degree.27) does not allow (or Ihis interac_ tion between gender and acq uiring a college degree. for Ov = I.PTER 8 Nonlineor Regression Functions Although the specifi cation in Equation (8.lle (d1 0).19) Thus. calculate th e po pula tion effect of a chaDge in 0 21 using the general method lai d out in Key Concept 8. lhe point is a general one.lhat is. which applies 10 all regressions with binary variables. this is £(Y..nge in D 1i ) depend ~ on the person's gender [th e value of O k which is d 1 in Equa tion (R. X Du. The bina ry va ria ble imeraction regre~' sio n allows the e ffect of changing one of the bina r y indepe ndent "a riab lc~ to depend o n the value of the othe r binary variable. uni t cha. the d ifference in Eq uation (8. = a. ~ d.D 21 = 0) = 130 + 131X (I I + fh X 0 + ~ X (d\ X 0) = next Slep is to compute the conditional expectation of Y. in the binary variable interaclion specification in Equation (8. x Dli ) + I~. If the person is m..ID . Ih x l + P3 X (d t X I ) = /3 0 + /31(i l + /3 1 + f3 3(/ t.gender. To show this malhema tically. the effect of acquiring a college degree (a. lhis is £(Y. the product 0/ . DI/ x Dk The resulting Jegression is Y i = Po A IN R + P lO t.28) allows the population effect on log earnings (Y.
to \\.5(HiSTR x HiEL) .4 points.3 an inter S) is caned Application to the studentteacher ratio and the percentage of English learners. Whe n HiSTR . be a binary variable that equals 1 ifthe studentteacher ratio is 20 or more and equals 0 o therwise. This is done using the procedure in Key Concept R.2 ( = 664 . the sample average is 662.3) (1. = 0) is 664.9 + 3. Let HiSTR . "'hie Key Concept 8} to one with a high slUde nt.5 ( = 664.2 . and let HiEL I be a binary variable that equals I if lhe percentage of E nglish learners is 10% o r more and e q uals 0 other wise."ti ll s method.9) (3.1 . 8.9 points.' is [feet all log D ll : 1)10 l ~ population ept S.30) also can be used 10 estimate the me8n test scores for each of the four possible combinations of the binary variables.:ht:r rat.30) 7<2 = 0.5 ) . If the fraction of Eng lish lea rners is high. is givc n by Equa tion (8.5 ::< 5.28) )f First compute the expecled values of Y for each possible e<lse described by the set "fbinary variables.9 . and when H iS TR j = I a nti Hi£L j = 1.290. the sample average is 645.4) (2.18.9HiSTR .3.1.3.2HiEL .18. [.1 . ~ 0 a nd Hi EL j '" l. test scores a re estimated 10 decli ne by 1. . That is. Each coefficient can then be expressed c ither as an expectcd value or as the difference between two or more expected \'alues. if the fra ction of English learners is low (HiEL = 0) .1.29). this effect thus is .9 ("" 664. interac th~ ~pec If the two fOR INTERPRETING COEffiCIENTS IN REGRESSIONS WITH BINARY VARIABLES ~ A METHOD KEY CONa/'T II (8.8.1) (8. The inte racled regression of test sco re~ aga inst HiSTR. the sample aver age lest score for districts with low studen t. then the effect Oll lest scores of moving from HiSTR = 0 to HiSTR = I is fo r lest scores to decline by 1. For di stricts with HiS TR j = 1 ( high stude ntteache r ratios) a nd Hit: Lj "" 0 (low frac t ions of English learners). i33d\.1 .1 . the the pers oO's pe rson is Olal~ erson is female term is the dif· .o ~Jdl' (8. The pre dicted effect of movi. According to the estima les in Equa li o n (8.30). Accordingly. Next compare these expected values.1.3 Inleroction~ Between Independent Voriobles 279 r.ng from a district with a low srude nHeacher rati.5Hi EL.1.us mell .18.3. with estimated coef ficie nts replacing th e population coefficients. the sample lIveragc is 640. and HiEL. The estimated regressio n in Equatio n (8. The difference io (1.ios (HiS TR.tell<.teache r ratio.3.29) )Il ion (8.The rive n a value f(dl x O)~ r'j after the TesrScore ~ = 664. holding constant whel her the pe rcentage of Englisb learners is high or low. I.1.2).9 .9). = 0) and low fr actions of Engli sh learn ers (HiEL.and acquiring e ractjon regre~' ent variables 10 n effect. t D u : d l • D.28).
'.ff~. .. ariable X <... Utnl' ~lo~ X (b) l>uft:Tl""nI 111l1.. a college dt!grec (D ..ln anle f) in three diffe rent way~ In Figure RXa...lOO5varioble$ can produce rhree different population regreni<N1 fvr"IChons lal f30 + PIX.PIX fhD • p)(X x D\ allows for different intercepts and different slopes.. PI reren t slopes. SbptPI+Pl po+flz r /Jf) .2 80 CHAPHR 8 Nonlinear Regression Functions FIGURE 8. PI flo x (a) O . x (el 5.."T<t:pl\../ I"U .. hown in Figurc RI<..:nt mt("rt"~rt~..w d I II one binury variable..:a n depcnd on the lmwr~' ~'.8 Regression Functions Using 8inary and Continuoos Variables (Po ~fJ2)+(fJt +p)X }' I (jJe +P1hPIX I' .. (he popul:Hion rc~rC~' sion line relaling Y and the continuou::._ __ J ....\) .. Ihc individual's years of work cxperkncc (. ) ... and (clPo + PIX+ P2f X x DI has !he some intercept but allows lor dif· l Po +PI X jJopt ..+fl2 I" .. =: In( c·(ll"IIill.111 ' ... As . whether the worker hl\:::...l.11lle intercept. when~ f) the j lh penon i') a college graduate) ...the 1\\0 regrcssion lines differ only in their imcrcepL 111 C ((lr rc~ponding populntion regression model is 1 ( ~.JJ against onc continuous variable.. ditr"'rC"1l1 . fl.1opc'l Illlffodion~ of binory voriablel and conlinl.)1 " sJope ~ PI +P: P' l~. . fbi fJo +... fJ2D allows for different intercepb but nos the some slope. IhtTt:Tl""llt ~'ol'''''' y I fJo~ (PI +13... Interactions Between a Continuous and a Binary Variable Next consider (he population regression of log ~ilrnings [ Y..J I :. .
8a. A third poss ibility. th e two lines in Figu re 8. Doi ng so shows Ihat .) and ~feV"'lir . as shown in Figure. a pply the proced ure in Key Co ncept S. Thus. holding years of experience constant. the population regression fu nction is (Jo + f3 1X j.Sb. Slated in terms of the earnin gs cxample.. In th is specification. 8. To allo w for different slopes.8a have the Silme slope.. the product of X i and Vi. To in te rpret the coeffi· cients of this regression. The di fference betwce n the two intcrcepts is f3z. 8Bc. 10 terms of the earnings example. the two lines have differen t slopes a nd intercepts. (his specification corresponds 10 the population mean entrylevel \VolgC being Ihe same for college graduates and • . is that the two Jines have di fferent slopes bu t the same intercept. 13. In the earnings example.. but requires that c xpcclcd log earnings be the sam e for both groups whe n they have no prior expe rience.. is the effect on log earnings of an add it ional year of wo rk experic nce.Jl l The coefficien ts of this specification also c.3. . and the difference be tween the two slopes is {3~.3. = 0) and f3 1 + (3J is this eUect ror g rHduates.Sb.'y Concept 8. Said differently.33) Ilion rcgrl!~ binaf'\'ari· ~pl. fhc co f (:. shown in Figure. hold ing college degree stat us constant. ~) + (3\X .. + f32(Xj x V i) + II . The interacte d regression model for this case is Yi "'" Ice (X. The differ e nt slopes permit the effect of a n additional year of work to differ for college grad uales a nd nongraduates. li where Xi X Di is a new variable. jf D.3 Inleroctions Between Independent Variobles 281 This is the fa miliur multiple regression model with a population regression func tion that is linear in X I and D j " When Dr = 0. so 13J is the difference in the e. (8. {31is the effect of an additional year of wo rk expe rience for non graduates (D. and f32 is the effect of a college degree on log earni ngs.32) l luoctiom: allows s for dif £IIrningl. but the intercept is f30 + f31·lllUS (J2 is the difference between the inlc rCepls of the two regression lines. add an int. so the intercep t is (Jo and the slolX' is (J \..ffect of a n additional yea r of work expe rience for college graduates versus 110n gradua tes. + ~Di + (3J(X i X 0 i) + Il.31): Yi = (3{) + 13 I X.eraction term to Equa tion (8. as is shown in Figure S. + {32' so the slope remains f3. the populatio n regression fu nctio n is 130 + f31 Xi' whcre(t ~ if D j = 1. When Di "'" 1. the effect of an additional year of work expericnce is the same for college graduates a nd llongradUalCs. Ihi$ specification allows for different effect ~ of experience on log earnings between college graduates and nongradua tes. thc popula tion regression funct ion is (f3l) + {32) + (131 + f3J)X. "'" O.8.that is.an be interpreted using K(. (8. the populat ioll regression function is f30 + PI X. this specifi ca tion allows for two diffcrenl population regres~ i on functions relating Yi and X i' de pending o n th e val ut: of D . In Figure 8.
31).Sb ): Y.4. d ifferenl slope (Figure 8. which allows for dif ferent intercepts and slopes.): Y/ = 130 .ari able are summarized in Key Concepi 8.33). This does nOI make much sense .1.97STR + S.282 CHAPTER 8 Nonlinear Regression functions INTERAcnONS BETWEEN BINARY AND CONTINUOUS VARIABLES 8. The three regression models with a binary and a comjnuous independent .0. This is achieved usin g I h ~ d iffere nt intercept/different slope specifica tion: TeslScore ..2 . 1. no ngraduates. + 1320 . are vcrsion~ of the multi ple regression model of Chapter 6 and. x the population regression line: relating Y. f31X .fl 2(Xi X 0) + /1 /. i~ cre· ated . Application to the studentteacher ratio and the percentage ofEngJish learners.6HiEL .J.. the coefficients of all three can be estimated by OLS..1. and (8.! = 0. 1. All three specifications. (ll.28(Sf"R x mEL). X Di) + 'I..<. There are three po£sibilities: 0.59) (10 . = 130 + f3lXi + P20i + f3iX. .h or loW'! One way to answer this questio n is 10 use a specificalio n Ihat allows for '''0 dif· feT eD( regression lines. + II. o nce the new variable Xi x D.~7) (8. and the continuous variable Xi can have II slope that d ep~nd s on the binary variable D . Same intercept.4 Through the use of the interaction term X. (8. depending on whether there are a high or low percentage of English learners.}4i "R..: 2.n Lhis applicatioo. and in practice this specificaiion is used less frequently than Equation (8. Differen t intercept and slope (Figure 8.32 ).: 3.~)  (0. .305.8e): Yi = Po + PIX.teacher ratio depend on whet he r the percentage 01 stmJe nLSsti llleamjog English is hig.R.32).. Does the eHec r on test scores 01 cuuing the student. same slope (Figure 8.:: 682 . Eq uations (8.5 ) (O.. Different intercept.
Fi rst the hypoth esis thatthc two lines are in fact t he same can be test ed by computing the F·statiSlic testing the jo int hypoth .lgc of students stillleam· ing English in the district hi greate r than 10% and equals 0 otherwise. esi~ that th e coeffi(.5 == 0. is less tha n 1. the hypothesis that the st udent. Finally. t he estimated regression lin e is 682. for twOcbI" I ' percent~: [ferent s\~ riEL). Thus. which has (I pvalue of 0. but the tes ts of the individual hypot heses usi ng the (·slatisti c fail to reject it. = 687.teach er ra tio does not enlt:r this spec· ificalion can be tested by computing the F·statistic for the join t hypot hesis that the coefficients on STR and on the inte raction tcnn are both zero. the hypothesis that the lines have the sa me intercept can bc tested by testing whe ther the population coefficient un HiE L is zero.:annot be rejected using a two·sided test at th e 10% significa nce level.9. Th e I·statistic.. so the hypothesis that the lines have the same lIltercepl cannot be reject ed at the 5% level. an d the coefficient on the interaction term S TR. reducing the stu· dentteacher nnio by J is predicted to increase test scores by 0.. = 1). (lrc highly cor· related.97 poin t in districts wi th low unctions of E nglish learners but by 2. Even though it is im possible to tell which of the coefficie nlS is non7. The OLS regression in Equation (8.25STR.25 poinls in districts with high fra c· tions of English learners.32. X HiELjare both 7.97S TR.28STR. ·n lis results in la rge standard e rrors on the individual coefficien ts.1.6 .97 = . so the null hypothesis that the two li nes have the same slope (.97STR. isCf<· ~ pr<lclice ndent van· offni /i" !acher ral1l1 high or lOll" .2. 'Ine difference be tween these two effects. Thc HiCa tistic i~ I = 5.ero.6119. (~ }J) where the binary variable I1iELI equals I if the perce nt.2::1 . . is the coefficient on th e interaction term in Equation (R. 8 . For districts with a high fraction of English le<lfncrs (Hi EL. the h ypot h e~is that (WO lines have the same slope can be tested hy testing wh e the r the coeffic ien t on the interaction term is zero.04.645 in absolute value.28 points. • . These three (csts produce seemingly coorradictory results: The jOillt test using the F·statistic rejects the joint hypothesis that the slope and t he intercept are the same. HiEL a nd STR x HiEL. the coefficients on the stude ntte'lch ~ r ratio a f C sta tistically significa nt al the 1% significance level .3 Intero(:~ons Between Independent Variables 283 tES ·n line n the Iws for dif· ~ o ns of the ~ D.004 . According to these estimates.lhe estimated regression line is 682.28/0. TIlis F·statist ic is 89.2 .34). 1. . .O. Second .O.34) can be used 10 test seve ral hypothe· sc:s a bout the popula tion regression line. Tnis F·statistic is 5.1.:ien t on HiEL.8.2 + 5.ero. &. For dimicts wi th a low fraction of English learners (Hi EL J = O). which is significan t at the 1% levcl. Third .1. there is strong !!videncc against the hypothesis that both a re zero. The reason fo r th is is that the regressors.
006) Midwest South \\Ie. II ) 0. In<h"'dual COC:ITOC:lCnlS arc "ausually 51gnirlCllnl al the ..0.242 'h.018) 0. l"f Intercept R' I.cX~) 0.63 ob!.0. The analy~ j s in those boxes was incomplete.~/.01 80"' " (0.(l.. MIII . 1 The Retum to Education and the Gender Gop: Regression Results for the United States in 2004 Dependenl _rjobl.023) 0.078" (0.0008) si Femafe X Years ofI! (!m::miol/ POIentiaf e.S1.on' for cadi regr"".n pI . .cy (scc Append..fper ienctf O.lXlII) .. A!.484 (0.54 S" (0.. <00 add aris. mfJoCt Are level. R~renor.o:".. Second.oo.058" (0. SI.:" and "qual) 0 nth" . Slandllrd " nor< a.OOOJ(l<'l· · (OJIOOO IS) ... Ill!. 3.. the boxes in Chaplcl10 3 and 5 show.220 1.011) 0.(122) 0.s of educaflOn 0... "" '" canl h. workers with morc education lend 10 earn more than their counterparts with less educa tion. education has economic rewa rds.030" (0.011 ) 0..cp.52 1" (0. OIl~ " Year.ll!e 'lampk SlZC IS n .. Sou/h.~ .\ 284 CHAPTER 8 Nonlineor Rq~tiOl'l f UI'ICliOM I n addi tion to its intellectual pleasu res.22 1  . MIII" ·<"1/ equals I If 1110:: wurke Tlivn . Tepmlcd . F.0.hIo..1XXJ8) )2) (3) )4) O.·al. .0861" (O. w the OLS estimator orthe coefficient on education could have omined variable bias..(XH 6) 0. Ililt omitted region is North NS/). the fu nctional form used in Chapter 5a simple linear reiatiOn implies \hal earnings change by a conSllI nl doll ar (roil/if/li ed) TABLE 8 . arc 1lI<l' calor va riableS de nou ng the reJlOO of the Un lled SlIlla In ".1XXJ8) . .237'" (O..n 1I..721 " (0...1).xperien ce su: dOl ).0.621" (0. and 1\..n the Mid. reg: 1.0232" (0. The dala are from 11M! March 200S Currenl PopulauOfI Sur.30" (0. however. 11.(X)1 I ) Female ..09.006) .2 15" (0.0 15) 0.f. il failed to control for other dctcnninants of earnings lhat might be correlated with educational uchievcmcnt. Porell1iul t!.0207· (0.:ilC&lOf uriilblc thai cquab t for "vtn<"11 and 0 for men.0016) (O.: logarithm of Hourly Eornings. theses ""low the: CSll mllled coeffICient$.()914" (0.006) . ~ ••• .:h the work er loves: For example..mal. for al least three reason" First.1 74 1.0...
Nevertheless. Third. ages 30 through 64.0207...242 otl!Iernlioro> J W". thereby addre<.0%. The dependent "ariable is the logarit hm of hourly earnings.3% kill) i21 "· )22) (= 0.I are oomistent with Ihose obtained by economists who cllrcfully address these limitations. in percent): for 16 years o f education.. See Card ( 1999).' .. Table 8. All of these Iimi1ations .. 19. A recent survey by Ihe econometrician David Curd (1999) of dozcns of empirical studie~ concludes that labor cconomi<.ts' best estimates of lhe return to education generall y fall between 8% and II %. These estimate~ )207·" m32·· <XXI8) ~ 1\ . regres.07S'" 1.~ing potential omitted variable bias that might ari~e 0. and that the ret urn depends on the quality of the education.06'X. hich.!. Decau~c tbe regression functions for men {4 ) ' ""'.. so anothe r ycar of education is a~iatcd with a conSr31l1 percentage mcrease (not dollar increase) in carnings. relative to those reported in regression (3).. and that uses a nonlin car fu nctional form relating education and earnings.. notably the native abilit}' of the worker.1:(6) .3 Inleroctions Between Independent Variable" 285 at mig. as measured by years since completion of schooling...(!d by a multiple regression analysis that includes those deter minants of earnings .orkers. For 12 years of education.99% for cach rear of education for men. . The estimated cOt!fficicl\ts imply a declining marginal valuc for each year of potential experience. der enters regression (2) significa ntly and with a large coefficient. regression (4) control<. the e<. Scr. on average men and women Ita\'!: nearly I]}e same il!"els of education. Thble 8. and potcntial problems associated with the way variables are measured in the CPs. ~. art in(. tha t is.0899 + 0... data on rulltime .1).."Ond. gender and years of cducation are (XXXII') .0.. and 11 .006) that they are 1he samc is 11..00t6).05!!.1 summarizes regressions estimated using in regression (4) is 8. the box in Chapter 5 ignores the gcnder diffe rences in earnings highlighted in the bo. for the potentilll experience of the worker.~ion (4) controll> for the rcgion orlhe Country in which the individual lives.(06) uncorrelated. The estimated economic retUrn on ~due~ t ion 'elation 'fl/i/lw:~dJ nt dolhlr can be addrcs.030..OI8OlO. the gender gap depends o n thc years o f educa tion.!!· :& \ !flbc rted III pare n· JlgIllrlC3!lCC Ir years of education differ sysremat icnlly by return to cducation.. ). If you are inte rested in learning mo re ahout the economic sU~lIln tial . the umis ~ion and women have different slopes.0.scribc:d in Appendix 3.0207 x 12 . .521 . if omitted. the gender gap is estimated to be 27. OOO~ of the return to education and of gender in regressio n (I) docs nOt result in omiHed variable bias: Even though gen the gender gap still have limitations.215" 1018) Third..timatcs in Table S..ht tbe [)n could It. the gender gap is less in perccntage terms. Fourth.. 10 percent) for women. herefl§ onc might su~pect region. . could cause omi tted va riable bias... including the possibility of other omitted variables.1 has rour salient resul t\' First... ( . Controlling for region makes a small difference to the estimated coefficients on the edu that1he dollar change in earnings is actually larger at higher levels of educa tion. SO amount for each additional year of education..25 (= O. from the Cu rrent Population Surve)' (the CPS dltla arc de. the returns to education are e<:onomicaily and regression (3). the tstatistic testing the ~tatisti \.J cally significantly diffe rent for men and women: In hypolhe~i~ 1. 1.nctional cation terms.8.= in Chapler 3.
. the effect of an additional year of education does not depend on the number \11 years of work experience..' i~ the number of years he or she went to school. Y .:onstanl.:. in the earnings example.{3\ 6.) + "l' (K . if XI change!.\' [Exercise 8. = INTI The i prod jn inXt Xz T and b. and X u: Y.! X z. there might be an intet<ll:linn between these twO variables so that lhe effect on wngc!. To sec thi!. A similar calculation ~bows that thc c[fect on Y of [1 chang~ . am.thc effect on wages o f nn additional year \' experience does not depend on Lhe number of years of education or. by the amount {3" for each additional }ear of education the worker has. are combined with logarithmic Iran~fonnutions..!t t~1 = (~ + fl~I)' PUlling these two eff~cts togcthe r shows that the cocfficient fl. Y ::: (f3\ + fhX2P. Interactions between two variables are summarized as Key Concept 8.md X 2 • above and beyond the 'lUll of the effects of a unit increase in Xl alone and a unit increm:ie in X~ alone. 1.5.l.: depe nds on the number of yea rs of educa tion.111e difference in Equation (K4 ). Wheu interaction!.lX2 in X 2• hllkl· . apply th e general meth od for willputing effects in nonlinear rcgrcs ~iOll models in Key Concept 8. then the effect on log ea rnings of all additional year of expe rie nce is greater. i. . however..lX1.35) . 11lUs.XI . they eJ O be used to estimute price elasticitie:i when the price elnstic::ity depends on Ih~ which depends on Xl' For example. is!:J. and X 2s ) are continuous". Thall. increa Po + {3 ]X II + (32X2i + (3)(Xli x X 2. In reality.equivalen tly. X ]' is his or he r years of ""\1r~ experience. is the cffcct of a unit increase in X] .10(a»).(/3] + /3.lX .1I is the product of X].1 on the illil'r' action tern.I~ the extra effect from changing both Xl and X 2. of an additional yei\! uf expcricnci. is log earnings of the lih worker. if fll is positl\<.':') 'Ille intcraction lerm allows the effect of a unil change in XI to de pend on X~.the e[fect on Yof a change in X l' holdi ng X 2 constant I!.286 CHAPTU a Nonlinear RegrM~ion Functions Interactions Between Two Continuous Variables Now suppose that hoth independent variables (X ].lXl~Xl (Exercise 8. This interaction can h~ modeled by augmenting the linear regression modd wilh an in teraction term th.X\ + ({32 + f3JX\)~X2 + .lX] = {3t + fl3 X 2· ~y (f06) ing Xl constant. computed tor the interacted regressiun function in Equation (8. then the expect ed change in } h 6. If the POpll lalion regression functio n is lincar.~X~).IO(c>l 111~ first term i ~ the effect from changing Xl holding Xl constant: the second lern11~ the effect from changing X l holding Xl (..8J.n example is when Y. and the finalt crm. by !iXI and Xl changes hy .
59) (U .67 PctE L ~ + O.422.Iem. (11. wit h a slope of 1..06.g5). conversely_allows th ~ e t'fcc! of n change in X. A different way to study this interaction is to examine the interaction between the studen tteacher ra tio and the continuous va riable .0).:her ratio and a binary variable indicating whet her the percentage of E ng lish learne rs is large or small.Ind uding this interaction te rm allows the effect on Y o f II ('hangc in XI to depend o n the value of X2 and.. ThaI i. bUI for a district with 23.3 Interactions Between Independent Voriobles 287 rt. ~.36) I .09 ( = 1.019) PcrEL).1 2 + 0.teacher ratio is estimated to be lang!:: In f 1<' 1.teacher ratio by o ne unit is predicted to increase test scores by o nly 1. j ~car of 8.**1 KEY CONCEPT rfwork ~ pOpu INTERACTIONS IN MULTIPLE REGRESSION The: interac tion term between th e two indepe nde nt variables X l nnd X 2 is thei r product X l X 2.1.12STR .019 = 0. is (8.11 points. the percen tage of Engl ish learners (PctE L)."1 ~. inbcr of tract ion year of can hI:: ~ nn that x (8. Ihe\ call ipcnds o~ tl1~ • . h oSl tl \' '::.85). \Vhe n the percentage of English learners i ~ at the 75 th pe rcenti le ( PctEL = 23. Th e pre vious e xampl es co nsider ed in te r actions between th e SIlI de nt.the slope o f the line relating test scores and the student. this li ne is esti mated 1 0 be natler.0012 x 23.11 ( = .hOld Tes/Sco re = 686.0% English learners. for a dislrict wilh 8.85 % English learn· ers.IO(c)I·I'h' (3)~Xl ~·y:· i~ When the pe rcentage of E nglish le arners is a t the median (PeIEL = g. reducing the student. ho wc ve r:The Ista tistic reSoting wheth er the coefficient on the interaction term is zero i ~ I " U.0. jus.37) (8.()9 points. ~ ut e d rcg res for _~Xl) ~X \ Application to the studentteacher ratio and the percentage of English learners. kms.3 .0). which is not significant althe 10% leve l.}¥J!VCf.. '111 1S is true whet he r X I a nd/or X 2 arc continuo us o r binary. )0 the inter bnd the sunl 'one. above and beyond the sum of the individual effects of a unit increase in X l a lone and a unit increase in X 2 a lo ne.5 I r lalcnlly.8) (0..37) N' = 0.! 10 depend on lhe value of X l The coefficient o n X I X X2 is the dft!ct of 3 unit increase in XI (/lui X~ .. r rd an c:\amp le ). ·111C difference between these eMimatcd cffcCl~ is n 01 statist ically signi ficant.l. A n 8.the estimated effect of a unit reduction in the studentIeaeher ratio is to inercasc lest scores by 1. That is. t en ~tcr.1 2 + 0.0012 x 8.35) characteristics of the good (see the box "The De ma nd for Econom ic lou rna ls" fo r 0 " X. by the ~n X2 .0012lO. iCond ternl i) ~ccrl H5. The estima te d in te ractio n regressio n is llslanl.UOI2(STR x (0.
figure 8. o '0 ) 6 S 4 3 2 . ) 6 5 " . not thc paper on whkh it is prmted but mther the ideas it contains. . 'o( Itu In(S ubscriprions) 8 1 .... p. o .9 Sublc r ipdons library Subscriptions and Prices of Economics Journals In(Subscriptiom) .s [ lIlla. Although we cannot mea')ure "ideas" directly... " Sf:" if llt<. Most research in economics first apPC.. I.. Bul 0S }~ in figure 8.I 2 < P r ice pe r citatiou (a) Subscri poons lnd P rice.r 3 There . !\fal]. •. we analyzed the rela tio n~hip between Ihe numher of subscriptions to a journnl at U. ib price is logically me~ urcd not in dollars per year or dollArs per page but in<itcad in dollars per idea ...)X'r ei t3tion lo(pricc pt{ citalio n ) (b) In(Sub:<criptlons) and In{Yrlct ptr CIl<lIIt)n) In (.:ation .S.$ 0 nonlinear inverse relolion between the . o s shown in f ig ure 8. Bccaus. the number of times thm articles in a journal are cOnlinllell . library $ubK riplion~ (quontity) and the ljbrary price per citotion (price). How elastic is thc demand by libraries for ceo nomks journals'! To find OUI....51 than for old journals (Age .288 CHAPTER 8 Nonlineor Regression Funclions P ro ressional econo mists (o llow the most recent research in their areas of speciali:.80).. . the relation between log quan tity CW"Id tog price appears 10 be appro)(imote/y lin eo.' .or thcir librariessubscribc to economics journals. . " fiGURE 8. S. so economists.9c shows thai demand IS mQ(e elastic lor young journals {Age . • • 0 II II < 2 • 2.90 lor ISO economics ja''nlOr~ in 2OCX>.. lihraries (Y1) am! ils libmry sUh$cri p lion price using data for tnc year 2{)()O for ISO eco nomics journals.per ma tion) number of U.9b.lrs in econom ics journals.491' :: 80 De~nd w~n • 0 6 S l 3 2 _I 2 3 I Ln(Price per citation) (c) In(Subolcnpuo1l1) lOO 111(1'II("e... . a good indirect measure h..: the product of a Journal u. .
.156*" (0.11Y ) 0.onomic Uevie:w: Because we UTl' inleresh':d in estimming clastici nOlin ead in. B ~cause casure IS the cominued al arc TABLE B.n Y ) 0.21·· (0. Ac(. \i\!UI:) the in Fig )ood 0.'( cllario/l) ...tlcalJv :\.lIS ) 0.2).(XXUX X1) !nttre!:p!  0.lInd . The priel.f)44 ) 0..206(0.0037 (OJX)55) [In( l'rice per ("luNion )f [lnr Price pt'r d /(II/ol/)]' !n(Agt' ) 0.id· U~[ cuetliclCQt5 ~re ~tatlj.. (2) T80 observation• .W8'· (O.ord Journa! (ll a l m~1 E C(lm) 1III'u'jCJ cost m OT'.(96) 4.: few ci l ~lion .9a and R lJh provide c: mpi r ical '>uppo rl fo r this transformat ion .017 (O.llbsequentty d ied by other researc he rs.Iute"'! ~re g.()~ 9 times \h~' prk <: of a library subscription 10 th. per cita tion or mu re. a library su bscription 10 the The scaHe rplots in Fig ure 8.1 4' " (0.040) In(Agt'j X In(Pru'" pn ci(m/rm) In(CllllfO("lt'r. Rej!IC". • .38) 3..0.424 '" (0..235· (0.Am ~ri{'all Ec. Ih<:: h\·poth~.555 I. Ih~1 the ~frKiclIIs on rlnll'Ti.118) 0.n:.145) (0.s.:.1[14) .13ndard Cl'mf! arc .626 elastic JrI'IOis  ~hc r\Iali~hc t .en In parcmh~ un<kr F·s[llhSUC5. from iCl peT ci ta tiun (the .1i91 O..pf>anl at u.229 (U.961 "* (') lnl Price PI.f + l.i.607 (). Inc:l. libraries in the Year 2000.8.2 Oe~ndent Iw i mwll I Estimates of tne Demand for Economic Journals va.. ()th o erS because their libra ry subscription price per year is very high: III 2006.098) 0.38) FSlatilnu and Summary Slatiltin F~lallslic testing coefficieniS on qUddratic and cubic lenns (p.750 0.1<.ly !in k2 0.373" (0.622 0.899·· (0. Some journ als an: c xpen· ties.OS33" (0.~ pt"TCllollmlWand [In(P'''~ ~r'II"lUm)I' a r~ both zew..'ll11eriflln Ecollomic Ri:~'if')\') to .9\<::n in p"r<::nlheses under codflCienl).705 0.. we u~c a loglog ~ pcc itica tion (Ke) Concept 8.38) 3. lndi.u. IWlos 'leN u.or (I) (3) .098) 0.43" (().O. ~rna:1 is leT the r 'w~ iogly.41" (0.6SS quan .0) 0. tnan $2700.Ii'nlricant al Ibe ·S~ lc'cl or •• I 'l(.. 3 Interactions Belweefllndependenl Voriables 289 <.. we m<:llsure price as the "prin' per ci tatio n" in the jourual.77" (o.'" range is enormous.S ) 3.025) 0.able: logarithm of l ubse. sivc per citation tx c au.. "..15 I (o.~e ther hav.374 (O .052) 0.
after controlling fo r differences in economic ch aract. does the eHect on test sco res of reducing the s t udent.2. Conseque n t I)" t hese results a rg uably are s u bject to o m itted \'3T1ablc bias.0. nOI a luxury.67 (SE . 2.4 Nonlinear Effects on Test Scores of the Student. 3.teacher ro ti o.:r istics of different dis Lricts. Demand is less cI. against log price CQuid have omitted \"ariablc bias.. afte r takjng econom ic facto rs and non linearities into account. upstan arc superimposed on the !OC8lterplot in Figure 8. s. abou! the effect on [est scores of reducing tbe s tude nt.9c: the older joumal"~ demand elltSticit) is O.3 to 0. funct ion of log price. Those resulLS yield the following conclusions (see if you can find the basis for these conclusions in the table!): 1.0. TIle evidence SUpp<lflS fI linear. If you areinlCf eS led in learning mOfe about the cronomics 0{ economIC! JOurnals.teacber ratio? Third.te acher mlio by tWO s tudents per tcac he r.teaeber ra tio. as our s uperintendent from C hapter 4 propose). and it is to such an exercise that we now turn.2R (SE = 0. a rtgression of log quantit). t he s pecifications in Sec tio ns 8. these nonlinear s pecifications must be augme nted ~~ith control va riables.teacher ratio depe nd Oil the frac tion of English learners? Second. experts estimate the demand elasticity for cigarettes to be in the range of . and mo SI impon ant.: two cont rol vari· abies. Demand curves lor . it seems. To d raw s u bstantive concl us io n:. 18.· 290 CHAPUR 8 Nonlincor R:egreutoo Functions some of the o ldest and most prestigious journals are the cheapest per citation. to do? . This demand is very inelastic: Demand is VCt)" lbc regression resuhs arc summarized in Table 8. For libraries. the logarithm of age lind the logarithm of t he n umber of characters per year in the journal. having thc most recent research on hand i~ a necessity.. IThese dat a wcrc graciOllsly provlded h)' Profe~~or lheodore Bergstrom of the DepilTtmcnt of Economic! III the Uni .:n.it)· of Califomill.3 excl ude additio na l control va riables s uc h a'i t tLe s tude nts' economic backgro und .m 80yearold journal and a yea ~old ~. dOCS th is effec t depend on the value of t he slUdc nI. First. So what is the elasticity of demand for economics journa ls? It depends on the age of the journal.Teacher Ratio Thi s secti on 3ddrCS5e~ Ihree speci fic q uestions about tes t scores and the :. \\ hat is lhe estimated e ffect on test scores o f reduci ng the s t udent. rather than a ~'hur cubic. Our regressions therefore includ. 8.S<lnlu.5.. holding pricc and age constmlt. Economics jour· nals arc.tU de n t. To keep the discussion foc used o n no nlinear models.OJ)8). ns addictive as cigarett~sbut a lot ben e Tfor your hea lth! ' jourIlnl!l.tstic for older than for newer inSl:nsith'c to price. " 'hile the younge r Joumal"s is . By way of comparison. espcciall~' (or old~r journals.06). Demand is greater fo r journals with more lIcters. Barbara.ec UcrgSITOm (2001).
The cha nge in the coeffic ient o n STR is large e nough between regressions (I) a nd (2) to warrant including the loga rithm of income in the remaining regressions a!> a de terrent to omitted variable bias. As in Section 7. allow ing expendit ures per pupilla incre ase (t hat is. The results are given in regression (2) in Table R.s and thdr pvalues.6. When the economic control variables (percentage eli gible for subsidized lu nch and log incom e) are added [regression (4) in the tableJ.4 Nonlinear Effech. 0 . the coefficients change.3 i ~ tbe interacted regression in Equation (8. we do not include expe nd ilures pe r pupil as a regressor and in so doing we are considering the effect of decreasing the studen t.teacher ratio by includin g a cubic spec ifica tion in STR in add ition to the ot her control variables in regression (4) [the interactio n term . Discussion of Regression Results The OLS regression results are summa..1.50 = .73.0.1 l dOC:~ f hat i~ . standard errors.0010 .58/0. is regressio n (3) in Table 7. as indicated by the description in e<lch row. extended 1 the economic background of the studen ts: the percentage of st udents eligible fo r a subsidized lunch and the logarithm of average district income.y l\\ll " . but in neither case is the coefficient on the interaction term significa nt at the 5% level. The first column of regression results.1.rized in Table 8. on T est Scores of the SIudentT eoc:heI Rotio 291 We answer these questions by considering nonlinear regression specifica tions 0 include two measures of of the type discussed in Sections 8. although it remains statisticall y significant at the 1% level. Regression (5) examines whether the effect of changing the student.3.0. cerlain Fstatisttc.34) with the binary vari able for a high or low percentage of E nglish learners.3.bles. This regression does not control fo r income.2 and 8. laheled regression (1) in rhe ta ble.3. Regression (3) io Tahle 8.teacher ratio.8. do is check whe ther the results change substantially when log income is included as an additional economic contro l vari able.16).l1te entries in tbe table are the coef ficients. but with no economic control varia.2 suggests tllul th is spec· ification captures the non linear relationship betwee n lest scores and income. Based on the evidence in regressio n (4). fallin g from . The logarithm of income is used because tbe empirical analysis o f Section 8.1l1e log of income is sta tistically significant at the 1% level and the coefficient on th e s\ udenttt:achcr ratio becomes somewhat closer to zero.teacher ra tio depcnds on Ihe value of the student. lhe hypothe sis Ihatthe effect of STR is the same for djSlriets wilh low and high percentages of English learners cannot be rejected <l Ithe 5% lelfe l (t he /SLlJ tisLic is I "" . so the first thing we. The columns labeled ( I) through (7) each report separate regressions. and summary sta tistics. IliEL x STR. we are oat holding expenditures per pupil constant).1 repea led here fo r convenience. was dropped because il was nOI significa nt in \ ·ec l1 ie bk 109 'ith stu' cter ~ thC tJl.
..3 Noolineor Regression Models of Test Scores ~ Dependent TOriab': R..1 7 « IJ.6·· (V.1 (50.:all) "~.mll) Jl..OhO (OJJ21 0.'J) () I n~' (O.~l· A\erag.059 (0.. 5....I~ .80) 5. ('I..~) 65.% (0 (~ji) (0.0 5..420·~ O.1 d"ln.7IJS 0.024) 11....0 ( \63.'} 11 .69 (O.5) STR J 4 :is'' ( \.') 658.R ~.6) s (1 85.19:"· to..U3I) IO%? (Binary.l. nu: k.2.6.5 1) .12" 12. '.gressor .50 (9.[ 'M "ltnrj..\ (327.44) J..!)..lKII) 4.".04 15. j( STR' ". ) (7 ) StudenHeacher r3110 (STR ) Snl~ I OU ((l.7'1') 07'l" Ibex rC!U"""'\ln~ "ac C'llm~ l cJ U'll\£ tile data (\Il Ii.XU) ~2..U.b rn C"ahr(lrnia. 11t F.41~' fl.81 ) 11.04 (0.L x 5rH! 6.2fi (1J.n.2) Hi f.12 ( I.n".O:!9) (O.tlJl) 0. IJ] (t).'.~1 ~..1J7) .IIIE! X STR·... K :'1 R' an: ""'.dr.292 CH APTER B Nonlinear Regreuion Fuodion~ TABU 8.Jj)""· (tU1~'>J (O.59) 0. and p·..7) HitL.V) 252.5) (lM.<j.e.: ui~trici inl'Ome (logarithm) I nterc~pl 11.\:1) h.J/.Wnl CTT"" lQ fQrcnl~ urwk..004) 5.50) ~ 123. .J1.·all1CS IIrC 11''('11 In Pl'fo:nl~ under F·". (a) All STR vari:lblc ~ and interacuons .\6h ({J.41 1·· (OJT..% l < (Will ) ~.101 (0.lrc Sl~ U!1I11.305 U.f).003) « (WII ) C).3 1151 '· ( UO) 2+1 700..1.47 (l:!7) f % English learners % English ieamc" 2: 0.?3" (0.53 (0.<}2 {'I:11 (0.O.0·0) OS47·" (0.1 .7b) (1.97 (0. ( I) (2) (3 ) (4 ) (5) (. 420 observations..2' ('.'" .54) ~HiEL )( SfR' % Eligihle ror subsidized lunch 0...6) (11.075"· 10024 ) O.1\ U.1....I ) /<.7. n" ( 103) 816./Iln'"". STR' 0 (e) /f.003) 2.d . '1.6' (8.7<)1 0.26) .\lCf1i<:l..!:1I (25 ?6 ) ).SS (0..f"'''n!.27 ) O.EL )( S TR. 0 Sf. lnd..7) fStotiltiU and pValu.t "_ In dbrrict.021) 0.176·· (O.lIJ (b) STR' .!9) 12.1 on Joinl Hypothe .1 0795 K~n 85 ..nh .VKiu.) 122.34 ) &U3' (Zt86) tlUO" (2S.2" O..lj.42" ( 1.88 8.< STR O.. lllEL) 11 9.H) \1 507" (1.'i. d~"1Ctlbed in ArflClldh ~ 1 '\lIll.75' (1.
'llIis "as done for "arious . I6n •• 1(0.]6j ~3..Oo. regression (4) at the 10% kvell. ""S of S J 1<.17. STR".57 0..69.. 7 ) 5..fXH). with a pvaluc of < O.::ssio n (7) is <l modification of regression (5 ).% (f). "For each curvc. thC"Se fi.1lle estimates in regression (5) arc consiste nt wit h the studentteacher ratio having a nonlinear effect. 3 :lre most easily in terpreted graphi call y.o!: the other variables ron~tant at their sample aver figes. o ther than STR. The resulting Fstatisti c is 2..91 (UJ))) ) 5. holding fixed other vulues of the independent variables in th e regression . and STR" we can check whethc r the (possibly cubic) population regression s funct ions relating test scores and STR are different for low and high perccntages of English learners. To do so.47" (1..teacher ralio but also on the frac tion of English le(lmcrs.402 (O.27) o))60U (0.lre ~t~ • .1I"cra!Jc "!Lluc and compUling thc predicted vatue by multiplyinp.ression functions are different for dis tricts with high a nd low percentages of English learne rs: however. ' t(lcicn!.teachc:r ratio for the linear spccific. intlic<ltmg thaI the result s in regression (5) arc no! sensitive 10 whal measure of the percentage of E nglish learners is actually used in the regression.. we test the res triction that the coeffi cients on the three interaction terms ure ze ro. comparing regressions (6) <lnd (4) makes it clear that these differences are associated \\ith the qU3dratic and cubic terms.d values of the inde~no.~1·· (1.03') r 0.10 g raph~ the esti mated regression functions relating test scores and lhe sludc m.The coeffi ci ent!:."]\1:<0 ~ndarJe"''' .. in which the con ti nuous vari able PctEL is used instead of the binary variable Hil:L to control for the pe r centage of English learners in th e district.~ oomput<:d by setllng each ind.. which has a p value of 0.'lti on (2) and the cubic specifica tions (5) and (7). Regr. 111e null hypothesis thaI the rela tionship is linear is rejected at the 1% signHica n c ~ level against Ihe uherna live Iha l it is cubic (Iht. T he estimated regression funct ions are a\1 cl ose to each other. By including in te ractions between Hi EL and STR.8.·aIUl.1)21 ) 0 ..alue wa..: ~hmllted coefficients (rom Tohle RJ. Figure 8 .4 Nonlinear Effec~ on Test Score~ of the StudentTeocher Ralio 293 (7) ~S.OJ') 1]. although the cubi c regressions flatten out for large val ues of the studentteacher ratio. Regression (6) further examines whether the effetl of the studentteacher ratio depends not jusl on the value of the student.:~ndent YJlriahte. alo ng witb a SC3uerplot of the data.! Fstatistic testing the hypothesis that the true cocfficienls on SFR2 and STRJ are zero is 6.046 and thus is significant at the 5% bu t not the 1 % significance !evel Illis provides some evidence thai the reg..X (165. hold. the predicted . '1l1e nonlinear specifica ti ons ill Tanlc 8. In all the specificat ions.knt Yllriahle~ hy the respecti . to its ~mplt.29·· l<'...' ) :. 4 These estimated regression functio ns show t he predicted v<llue of test scores as a function of the studentteacher ralio. the hypothesis tha t the studentteacher ratio does not enter the regressions is rejected at the 1% level. and th" graph ofthc resulting adjustl""tl predicted values is the ~\1m3tcd Tegre!> ~ Ion line rclalll1!: test scores !Lnd the STR.81) 244. on the olher regressors do nol change subsln nlially wheD this modification is made.
)0•• .~.• tillt'ar regression (2) _ _ • 62tJ 6OO1~CCC~~CC~=~~C~ 12 14 16 H! 20 22 24 26 . Thus. 11 graphs these two estimated regression functions so that we can scc whether this di fference. _.11. • '~t .....teacher ratios for which we have the most data. .... :o "fIi.. 1~ Studentteacher rrllio R egression (6) indica tes a sta tistica ll y sigoifica nt differe nce in the cuhic regression fuoClion s relating lest scores a nd STR .. hut we m ust be careful not to read mllTe into this than is justified..teache r ratio does not de pe nd on the pe rcentage of Engli'\h learners for the range o f student ... They indica te 0 $ffiQ1I amount o f oonlineority in the relation between ~I ~Qre~ Ten score no 700 and ~tudentIeocher rolio.. "...a range that inclu des ~% of the observations. ~"' . ] .. for student.. in addition to bei ng statistically significant..(' 64{) . • . ..~ .. is of practical importance. Cllbk regression (5) .... .PTER 8 Nonl inear Regression Func tions FIGURE 8 . . : t .. we conclude t hat the effect on test scores of a cha nge In th e st ude nt.. .11 shows.. . • "# '\ iK>n. 10 Three Regression Functions Relating Te5t SeOret and Studentleocher Ratio fl Di The cubic fegression~ from column~ (. but tbe e ffect o f a c ha nge in the studen tteache r ratio is essentiall y the sa me for the two gro ups..." ..... Cubic ~rtMion (7) ... 11le two regression func tions are diffcn: nt for student. .5 constitute onl y 6% of Ihe observations.. ..teacher ratios below 16.. tha t is." . Figure 8. districts with a lower perce ntage of E nglish learners do better.teacher ratios between 17 and 23.0 .. '111 e diSlricts with STR < 16... so the differences between the nonlinear regression functions nTe renecting differences in these very few districts with very low stude ntteacher ratio~..the two function s are se parated hy approximately ten poi[\ls but otherwise are very similar. .... '" • .3 ore nearl}' identical... . for STR between 17 and 23. As Figure 8. :::0' • o • .. holding co nsta nt the stude ntteache r ra tio. de pend ing on whethe r the pcr centage of EogJish learners in the district is large or small.. y. )~ ..5.• 294 CHA. J ...'l. '. based on Figure 8. ~. 0 0 680 6(. . . ... "~*::..5) and (7) of T obie 8..'".
ression (6) provides sta tistically significa nt e videoce (at the 5% level) that tht: regression functions are diUe reni fo r disuicts with high and low per ts.73 x 2) points. 10 the linea r specificalion (2). This effect is statistically significant at the I % level (the coefficients on STR2 and STR J are al ways signif. ilfter controlling for economic backgro und.i cant at th e 1% level). and the estima ted effect o f this red uction is to impro ve test scores by 1. with a tant the r ratiO is ce ntages of English learners.. .. In the linear specifi cations. " '. wheth er there are many or few English learners in the district does not have a substantial influence on the effecl on test scores of a change in lhe student. this effect does nOI d epend on the student.11 Nonlinear Effects on T est Scores of the SIudenHeocher Ratio 295 Regression f unctions for Districts with High and Low Percentages of English learners Test score Districb with low perc:anlogcs of English learners (HiEL =" 0 ) are shown by groy 0015 and dis/riels with HiEL = 1 are ~own by colored dols. The cubic specification in reg.i have th~ • . 4 FIGURE 8.teache r ral io itself. . She wants to know the effect on test scores of reducing the student. " I '' 28 12 __~__L__L__~__~__~__~__~ 14 16 22 u 26 20 ~ ratio Studentteacher ratio " Summary of Findings e cubic the per hs these renec.8. Second. The slopes of the rogreu ion functions differ most for very large and small values of STR.46 ( = 0. 'Tllird .. fOf which there ore few observo liom . but otherw<!. after controlling for economic background .11 des 88% n pointS These results let us answer the three question s raised at the start of this section. there is evidence of a non linear effect on tes Lscores of th e stud en tlea ch ~r ratio. as shown in Figure 8. In fhe diffe rent ead jllorc % of the ctions are t~ I C ilc her coreS of a t nlage l. ~ .le<lcher ratio by two SludenLS per teacher. .teache r ratio. . i'J.11.. howevec the estimated regression functions have similar slopes in the range of studentteacher ralios i.. there is no statis tically significant evidence of such a diffe re nce. in ure 8. • • • • •• ••• .2U ~ LI • .' • • 8 3 i ~ oppfOximaiely 10 point$ below the (ubic regression function for Hifl = 0 for 17 "5 SfR "5 23.'O n taining most of our data..e rile two rune iions hove similor ~pe$ and slopes in this range. The cubic regreuion function ~ HiEL = 1 from regression 1 6) in tobie 720 700 680 660 610 (.. First. • • • • • • . • ~ . we now can return to the superinte ndent's problem that opened Chap ter 4.
_ __ _~ . most importantly. ll1e single most important step in speci fying non linear regression functiOnS is 10 "usc your head. .. in ge ne ral depends on the values of XI' x. By making the qucs. and yo u could nol be ol. arc muny dHferent mode ls in this chapter.93. while based on regression (7) this esti mate is 2.J geno:ral approach {or :)uch an analysis.. the unkn own coefficients can be eSlimated by OLS.lUdenl~ would differentially benefit from more personal aUention. wh y th e slope of the populatioll n:p. . can you think of a rea~on .l~ecl Oil economic theory or expert judgment.i(lOs and exe rcise judgment along the way.. X~. then based on regression (5) the eSIimated erfe.t of this reduction is to improve lesl scores by 1. If her district currently bas a studentteacher ratio of 22.! expected effect on Y of a change in onc of th e independen t variable:.296 CH APTf.S Conclusion This ch~lpler presen te d several ways to model non linear regression functions.R 8 NonlineoT Regres~i on Fundion ~ nonlinear specifica tions.'. we were able to find a precise answer: After cOlltrollinR fOf lhe'_ j".and '·statistics as described in Chap ter 7. . then based o n regression (5) th e estim ated effect of thi~ reduction is 10 improve tesl scores by 3.93 points. \\hlch nonlineari[ies (if any) could have major implicatio ns for the substantive i~.1f her district currently has a slUden tlcacher ratio of 20. llo\\ should you analyze possible nonlinearities in p racticc? Sectio n 8. such reasoning led us 10 in\'e~ tigate whether hiri ng more teachers might have a greate r effect in districts \\ith ." Before you look at the data . perhaps Ih:cau. hul 10 practicc data analysis is rarely that simple.. but this approach req uires you to make dcci". or another. and hypotheses ahout the ir va lues can ~c tested using /.. will ie based on regre:'siun (7) this c~lima t c h 1. There. 8. In Ihese models.I large percentage ofstudenb stilllcarning English.. tho. Because these models arc variants of the mul tiple regression mode l. ~ addressed by your study? Answering th ese questi ons carefully will foell:' ~our analysis..rc~· sion function mighl depend on the value of that .t that cutting the studentIeacher ralio has a somewhat greate r effect if this mlio i\ already small.• X . consta nt.1 laid o ut .lnd she is considering cutting itto 20. cring cUlling it to IS.. XI' holoing the other independe nt v(lriables X 2• . . this effect depends on the value o r the studentteacher ralio. what sort of dependence might you expect? And .ue.00 points. tion precise..z. for example_. In the test sco r~ application.90. b.e those :. and she is COINu. independent variable? If so. II would be convenient if there were a ~in' gle recipe you cou ld follow that would always work in every appl ication. The estimates from the nonl inear specifications suggl.:uncd fo r being a bit bcwilde red abo ut willcb 10 use in (I given applical ion.
we found no statistically significant evidence of such an interaction. Summary 1. g.rc~ l. regrl.lriablc. b ut in nclions i~ 011 . based on reg. 1. which live iSSlll·~ focUS your portional changes and clasticities. they allow the regression slope of one variable \0 depend on the value of a nother variable. Explai n how you would specify a nonlinear regression to model this shape. 4. Can you thin k of an economic relationship wilh a shape like Ihis? us 10 i tl\' ':~ Inch \\il\1 a se s\Udel'l~ Ihe qU CS ling r(lr .cst tio is The procedure is summar ized in Key Concept S.RevIew the Concept~ 297 'id '"' his (7 ) and Icc\ sian economic background of the students.. Small changcs in logarithms ca n be interprdcd as proportional or percentage liolls.I.hi g . the 10iding o n the blamed .lble~ b c.. A quadrat ic regres sion includes X and X 1.:asing (has a positive slope) and is steep fur small values of X but less steep for large values of X. 5.'ariablc ·) nlly. The product of two varil. R egressions involving logarith ms are used 10 estimate pro nown can be c is. 2. When interaction terms are included as regressors. In a nonlinea r regression. and a cu bic regressi on includeS X. changes in iI v. A polynomi<ll regression includes powers or X as regressors. the slope of the popUlation regression funct ion depends on the value or one or more of th e independent variables. X 2• and X l..!ssion function at two values of the independent variable(s).. Key Terms quadratic rcgrc~sion model (258) tog·linear model (271) loglog model (271) interact ion term (27~) il'L1 cracted regressor (278) interaction regression model (278) nonlinear regression funl:lion (260) polynomia l regression model (265) cuhic regression model (265) elasticity (267) exponen tial function (267) natural10garithm (268) linearlog model (269) nonlinear least squares (309) nonlinear lcaM squares estimators (309) Review the Concepts K_ l Sketch a regression fu nction that is incr. ~hould g.llled an interaction tenn.:ne ral eClsions re a sin· 11. The e ffect on r of a change inlhe independent variabk(s) can be computed by evaluating the.
. Exercises 8. Repeat (a) assuming Sales xm :0 205.labor (L) . II/!. b.0 and # 2 = 0. How good is the approxima tion when Lhe change is small? Does the quality of the approximation deteri.'2002 ) . where In is the q ua ntit y of (reil!) money. a nd an e rror term II using the equation Q :0 AKf'>I L jJ~MjJle" .298 CHAPTER 8 Nonlineor Regression FUn<:lions 8. CDP is Ihe value of (real) gross domestic product.. Sales1f." E.2 A " Cobb· Douglas" production function relates production (Q) to factors of produc liun. . Your profes· sor says. . /1 ($1001 saJ(~l. '0 '0 Ln" Po. 2002. Ii' VarianJ. Supposc you ha ve data on produc tion and the fac l or~ of production from a random sample of finn s with the same Co bb· D ouglas pro d uc tio n functio n.S Suppose tha t in probll!m 8.m '" 500.2 you lhought that the value of {32 was not can· stan t. what is the ex pected cha nge in prict: of bui lding a 500·square·foot addition to a house? Construct a 95% can fid e nee inte rva l for the perceOlage change in price_ . How would you usc regression analys is to estimate tht: production parameters? 8. How could you use an inter action te rm to capture this effect? Con. where A . \Vbat will happen Lo the value of III if CDP increases by 2%? %a1 will happen to m if the interest rate increases from 4% to 5% '? 8.xplain /1 ff Si. .2 Suppose th at a researcher collects data on houses thal have sold in a partiC ula r neighborhood ove r the pas t yea r and obtain s the regression results III the table shown below. S.In(Sales2(X ll )]. a. Sales2(. 9_ Using the results in column (1).1 Sales in a company a re $196 million in 2001 and inc rease to $198 million in Inter. think iliat the relationship between Yand X is nonli near..orate as the percentage change increases'? 8. Compare th is value to the approxim ation 100 X [In (Sa le.1 binary . c. a nd R is the value of th e n(lminai inte rest rate measured in perce nt per year.:" . Compute thl! percentage increase in sales using the us ual fo rmula 100 X SaJ° SER  Sum. T. aud raw materials (M).4 You have estimated a linear regression model relating Y to X. Suppose that 13 1 = 1. PI' P2• a nd /33 a re produc_ lio n pa ram c te rs.3 A s tandard " money de ma nd" func tion used by macrocconomists has the fo rm In(m) = /30 + t3 lln(GDP) + t32R.XI2 = 250. ~ ble {! .. capital (K).02. }Joo. . but rat he r in creased when K increased. how you would test the adequacy of your linear regression.
73 0.071 (0.e or In(Size) to expl ain bouse prices? c. 071 (O.Exercises Regression Results for Exen:ise 8.035) 6. Uolh.00342 (0.69 (0.102 0.72 0.029) (O.73 ula llXl 00 :::> 500.57 (2. is it better to usc Si:.·i!." 5 bi naf) V'dnabJe (I if house: has a nice ' ·ICW.74 0.026 (0.:e) 0. Using column (2). The regression in column (3) adds the number of bedrooms to the regression.1 2 (0.e)..00 (0.026) 0.10) POlll x View Condition 0.07 1 (0.F fJrdroom s 0.03) O'<Xl78 (0.OJ5) 0.037) Pno/ lil(W 0.<c: has a swimm ing pool.12 (0.0022 (0.034) 0.098 0. Condition 5 bind!} '~riable (1 if realtor rcporl$ house is in excellent cotldit ion.027 (0.60 Intercept 6.!) 0.000038) has Ihe f (real) Ie value that (31 1creases 1105%? r profcs· Explain not con an inter· 0.071 (0.14) 0.0036 (0.<)99 0.) Construct a 95% confidence interval for thi s effect. r". what is the estimated effect o f pool on price? (Make sure you gellhe unit'> right.032) 0.099 0. oes the ang e b.028) 0.Ol.69 (0.73 0..029) 0. 0 othc:rwl:.2 Oepe ndent vorioble: In (Price) 299 lOTS of 'T le rm Toduc· lors of Reg reSiior [ 1[ (2) [3 [ [' [ IS) as pro· tile the Sizt III{SI.91 (0. 5 house >II.036) 0.021 (0.037 (0. mm ory Stotiltin iUion in SFR l?l 0.055) InL~il .054) 0.c): Vic.68 (0.12 (0. Vall~bk dc:tinlu005' T'riec::5 sale price (5): Soli".1m (0. Compa ring columns (1) a nd (2) .02 (7..39) (0..087) 0.027 0..c results i(l in pric~ of 95% coP  o. rool5 binary 'ari· able fltf hOU.c: (in square reet ): BedrQllnls 5 numbl:r of ~drooms.O otherwise).OJO) 0.53) 0.12 (0.63 (O.045) 10.50) 0.13 (0.n a paru. .069) U. How large is the estimated effeet of an additional bed room? Is Ihe effeci stati stically significant? WIlY do you think th e esti matetl effect is so small'! (llilll: Which other variables are being held constant?) • .40) S.099 0.035) 0.036) 7..035) 6.
4 Read the box ··The Re lurns to Education and tbe G e nJ ~ r Ga p " in St: c tio n tD. th en jumps and is COm l <1 nt for cl ass SI 7(.:d with . 8. defin e the binary vu riables STRsmoll = I if 5TR < 20.0 o the rwise. A resea rche r tries to estimate the regression TewScore.' ~ between 20 and 25. a. and STRmoderatt! = 0 other wise. th e regression Tr. the r~L I ' lionship is constant in the in termedia te region between 20 and 25 stude nt~ and th ere is no loss 1 0 increasing class si ze when i t is alread)' greater t han 25. + {3~TRlllrl?t'. tor comme nts. . Is the quad rat ic term In(Size)2 important? r...: 1\ constant fo r <. Sketch the regression tunetion relating TeJrSCQre to STR for hypothe tical val ues of the regress ion coefficiem s thai are consistent with the educatoc·s statement. Rather. Consider b.\ size is less thH n 20 students and do very roorly when c1:ts. STRmoJerole = I ir 20 ==: STR ~ 25.studenl pcrformam:<: depends on class Sileo but not in the way your regressions say. T {33STRlar8ei + u r and fi nds Ihal her com puter crashes.Her til<l1l 25. and STRsJnn{/ . " In my experience. Why? 8. stud ents do well when c['l. Use the regression in column (5) \ 0 com pute the expected chonge III price when a pool is added to a ho use withou t a \'iew.<ic wi lh a vicw. + II..ears of education. There are no gains from reducing cl ass size below 20 students. b. 1 and the method in Key Co ncept 8.:m addi tional year of expcricn~. and then jumps aga in Ior class sizes gre.1 to estimate the expected change in the logarithm of average hourly earnings (All E) a~soci at.To model these threshold effects.3 Aft~r reading Ih is chapter's analysis of test scores and cl ass si?c. Repeat (a) a~uming 10 years of elCperience.• 300 CHAPTER 8 Nonlineor R C9re5Sion Funclions e. and 2 years of experienl!":· who is from a westcrn stalC. Consider a man wilh 16 ). Usc the results fro m co lumn (4) of Tahk' 8. £In educt!.'·The educator is describing a "threshold effec(· in wh ich per{ormanCl. and S TRmo(IerOle = 0 o the rwise. = {3r. Repeat the l!xcr cise fo r a hOll.srSwre.:lass sizes less tha n 20. = p (l + {3ISTRmwll. Is there a la rge difference'! Is the differ ence sl atisticaUy significant? 8. + {J 1STRsmnllj + ~STRmoderale.s size is greater th'ln 25. and STRlarge = 1 if STR > 25.
he conjeclUres (ba l increases in this varia ble from 10% to 20% have little effe ct o n lest score ~ bu t tha t cha nges fr o m 50% to 60% have a much lar ge r eUecl. Expla in why the a nswecs to (a) and (b) are differe nt. Wou ld your answers 10 (a}(d) c ha nge. In pa.ne d from the estimated regression ? anc!! is ss siz!!5 25. Is the diffaence in the answers to (a) and (b) statistically signiJicanl at the 5% level? E xpla in .3. Using the results in reg ression (4). I.000. ) c.s. 1.0. D escribe a nonlinea r specifi cat io n tha t ca n be used to m odel this form of no nlinea rit y. 30 T ! in d.3. How • .3? ii. r than 8. Describe a nonlin ea r specification Lhat can be used to m odel this fo rm of nonlinea rity. xer er duca size. wha Li ~ the basis fo r eac h of these conclusio ns? b. Looking a t the results in the ta ble. To it The box reports tha t the sta ndard error fo r the estimated elasticity is 0.o the linear specificatio n in column (7) of Table 8. r.5 Read the box "The D eman d for Econo mic. would you test wheth er the researcher's conjecture was bel ter tban the linear specification in column (7) of Table 8.000. A researcher s uspects tha t tbe effect of %Eligihle for subsidized Illnch has a no nli nea r effect o n lest scores. Suppose tba t tbe vactable Characters had been divided by 1. ii.28. b: and i. a. i. H OIII wo uld yo u ca lculate this standard error? (Hin!: See the di scus. if the person was a woma n? From fh e So ulh? Expla in.J ? A resea rcher suspects tha t the effect o ( income on test scores is differ ent in di stri cts wi th small classes tha n in districts with large classes.6 Refe r to Ta ble 8. How would the re. lhe box reports that the elast icity of demand fo r a n ROyearold jo urnal is . c\fl:.'S e. How wo uld you cha nge the regressio n il" you suspecte d fha l fhe effecl of expe rie nce o n ea rnings was diffe re nt fo r me n than fo r wo me n? r Ihnn e re ladents.rtic ula r.i>ults in column (4) ch ange? 8.io n "Sta nda rd erro rs of eSL imated effects " below Key Concept 8.&ercises c. How wo uld you test whe the r Ule rese arche r 's conjecture was bet ter tha.. How was this va lue de te nni...06." Journals" in Sectio n 8.000 instcad of 1. Th e box reaches three conclusio ns.
< between 5 and IOU on the horizontal axis and va lues of Y on the vt:rllci\l axis): . «1. 111e SER is 2. T he st udy compares total compensation among top execut ives in a large set o( U.O.670. Two new variables.86 . A regression o[ the logarithm o[ earnings onto Fef1wlc yielJ.S X is a conti nuous variab le that takes on values between 5 and 100.65. Are large firm s more likely to have female top executives than ::.) a.: + O.003) 46.03) (0.()()4Rcwfn.28.) = 6. The coemcient on In( MarkcrValu e) is 0.g.S. b. Does this regression suggest that fe male top executives earn ics':> th an top male execu tives? E xplain. S. / i~ 3 binary variahle. iii.0. ii. fn(Eami. are added 10 the regression: [D (Earnings) = 3.7 Thi s problem is inspired by a study of the "gender gap " in earnings in top corp0nLtc jobs [Bertrand and Hallock (2001 )J . c. in the 1990s.28Female + 0. (0. ii. in perce ntage points).004) /I .44Felll(lle.345. Explain what val ue mea ns. i .302 CHAPTER" Nonlinear Regression Functions 8. Does this regression suggest that there is gender discrimimHion? Explain..37.01) (0.. Explain wh at this value means. Let Female be an indicator variable that is equa l to I for remales ami 0 for males.65. "R 2 = 0. the market value of the firm (a measure of firm size. Ex plain what this value means.48 . "Illc coefficien t on Female is now 0.mall firms? Explai n. SER = 2. in millions of dollars) and stock return (a measure of !irm perfor· mance. (Eacb yea r these publicly traded corporations must re port total comp~nsation levels fo r their top five executivcs. iv. public corporation. Tne estimated coeffi cien t on Female is .05) \h i~ i.04) (0.44. Explain why it has changed from the regression in (a).37 In(MarketValue) (0. Sketch the (allowing regression fun ction':> ( with value~ of .0.
If Age increases from 25 to 26.lX1. with Z + 3. \.1 Usc the data ~el CPS04 described in Empirical Exercise 4. how are earnings e xpected to change? c. how arc earnings expected to change? If Age increases from 33 10 34.02  =.0X . . Run 11 regression of the logaritllm average hourly earnings. Run a regression of "vcmge hourly earnings (A H £ ) on age (Age).0 X In(X).i1 = /32 + /3~Xl (effect of change in Xl holding Xl constant). Same as (i ).9).] earn less 8. Usc Key Concepl 8. Empirical Exercises E8.0 1.lHim:This requires esti mating a new regression using a different definition of th e regressors and the dl!pcndcnt variahle.Empirical Exercises g. gcnder (Female). .O.01X 2.. In(AH E).3 to calculate the conlidcnce interval discussed helow Equation (KS). and Bachelor. If Age increases from 25 to 26.0 + 3.wil h Z = 1. = {3(1 (I. Ires lot a\ l)Orations port total b. Female. how are earnings expecleLl to change? If Age increases from 33 1034. Run a regression of the logarithm average hourly earnings. = lie:::. on A. of fnm firm perfor t' c.'er\1G:1 . e.() + 125.mgel! by AX l and X2 ch anges by .10 Co nsider Ih(. If XI ch. i. If Age increases from 25 to 26.0 X In(X) + 4. bUI with Z = O.0 X In(X).XJ !1X I + (/32 + f3~XI)ilXl + (3J!:J. = 2. .1 to answer Ihe following questions.02. and Bachelor.! rcgrc~ion model Y. See Exercise (7. but with Z = O. ii.s r l and \00.1 10 show: a. how I I=. Z is tl ith values ()~ .. :lnd education (Bachelor). Ii. Female.0 + 3.0 X Z x In(X). [)U3) hal this value it has than small a. l' l' = 2. = 2.0 c. how arc earnings expected to change? b. 8.lhen AY = (/31 + /3. In(A II E). on In (Age).£:e. b.Xl ~2 · )U4 Refill'll. SHme Ull (i).::' in top 303 a.. I.\ J on the . and 0 wll' yields 2.0 X In(X) + 4.~\ = {3 l + {31X 1. l' i. + (3 zX Zi + f3iXlI x X 2.9 th is ilal ExplHin how you would use "Approach #1" of Sect ion 7. Y= l. d.) + [lination'! + /3 3X2 (effect of change in X l holding X 1 constant). 3.
After running all of these regressions (a nd any others that you w3nl iO run). RUll a regress ion of In(AHE ). k. I. if Age increases from 25 to 26. and (d) fo r males with a high school diploma. " C. orkers. FenUlle. on Age. PIOl the regression relation between Age and In( AH£) fro m (b).' out the following exercises. (C) .1rry . Do you prefer the regression in (d) to the regression in (c)? Explain. o n Age. how a re earni ngs expected to change? d. What doe!. Do you prefer the regression in (c) to the regression in (b)? Explain.St' . and the in teraction term Female X Bachelor. Is the e Uect of A ge a ll earnings different for high school graduates than college graduates? Specify and estima te a r~g(ession that you cafl use to answer Ihis q uestio n.304 CHAPTER 8 Nonlinear Regrenioo Functions are earni ngs expected [0 change? If Age increases from 33 to 34. What does the regression predict for his value of In(A H E)? Jim is a 30yearo ld male with 3 high school degree. Do you prefer the regressio n in (d) to the regression in (b)? E xplain. how are ea ming~ expected 10 change? If Age increases from 33 to 34. D esc ribe the si m il arilie~ a nd differences between the estimated regrc ss~o o func tions.2 Using thc da ta sel TeacblngRatings described in E mpirical Exercl . Agel. and Bachelor. E8. th e regression predict fo r her value of In (A H E)? What is Ih e pre dicted difference be tween A lexis's and Jane's earnings? Bob is a 30 yearold male with a bachelo r's degree. ln(AII El. Wha t does lile regression predict for his value of In(AHE)? What is the predicred differe nce betwee n Bob 's an d Jim's earni(lgs? j. Ru n a regression of the logarit hm average hourly earnings. Btlchelor. summarize the effect of age on earnings fo r young v. Agr? Female. g. h. how are ea rnings expected to change? e. f. What does the re gressio n predict fo r her value of In(A H £)? Jane is a 30yearold female with a high school degree. What does th e coefficient on the interaction term measure? Alexis is a 30yea rold female with a bache· lor's degree. Is the CUCCI of Age o n earnings different for males chan fo r fem al e~? Specify and estima te a regression tb at you can use 10 answer this ques· tion. Would your answer cha nge if you plotted the regression function for females with college degrees? i.
OWllh ome. Mom ColJ :os 1. 1. . fro m 20 10 30 m iles) .~1 increases from 6 to 7 (fro m 60 [Q 70 m iles). Fema/e. What is his value o f Beamy before {he surge ry? After th e surgery? Using th e regressio n in (c). M omColf. O wnhome. e.H increases from 2 to 3 (fro m 20 ( 0 30 miles). If Di. O Wl/home. l. and NNEnglish. e. DadCo/l. By test.fl. Black .3 to a nswe r the foUowing q uestio ns. R un a regressio n of ED o n Disl. who is a woma n. Mom ColI.. Hisponic. and SIwmfgSO.k. OlleCrulir. Cue80. ]n c:omehi = 0. • . Byte. co nstruct a 95% confide nce fo r the increase in his course evaluaticm . Is the male. BYlesl . d. cs· Hispanic. Run a regressio n of £ 0 o n D ist. 305 b.. If Dist inc reases fro m 2 to 3 (fro m 20 to 30 miles) . DadCoU. a. [mro. lllcomehi. a nd S llI'mfg = SlO. hmaJ e.fema le dif fe re nce in the effect of BeaUty sta tisLica lly significJ nt? d. CtieSO. how 3re yea rs o f ed uca tion expected 10 change'? ca' t t. His panic.2. and . Ctle80. Inwmehi. Tuilinn. Female. Coosider a H ispanic fe male wilh Tuirion '" $950. Professo r Smith is a man . If Dist increases fro m 2 to 3 ( tha t is.58. how £Ire yea rs of educa tio n e)( ~c te d to clllmge? If Disl increases fro m 6 to 7 (that is. Is the re evide nce tha t A ge has a nonli near cffect on Counoe_E van Is the re evide nce tha i A ge' has a ny effect o n Course_Evan c. a nd Srwm/g80. Estimate a regress ion of Course_EvaJon Beallty.3 Use the dat a set C ollegeD istance descr ibed in Em pi rical Exe rcise 4. CueSO = 7.Empirical Exercise~ a. from 60 to 70 mi1es) .. E8. Run a regress ion of In( £D) o n Disr. how are years of ed uca ti on expected to cha nge? b. MilWriry. OWllhome "" 0. Hlack. how a re years of e duca tion e xpected to cha nge'! c. Femaf~. Tuition. B/tu. Tuilion. ho w a rc years of educaLion expected to cb ange'! IJ Dis' increases from 6 10 7 (from 60 to 70 m iles).06. Igh . DadColI . Mom Col/. Byrest. Add A ge and A gr2 to the regression. Mod ify the regressio n in (a) so that the effect o f BeOltry o n Course_EmJ is di ffe re nt fo r me n a nd wome n. how a re years o f educa Lio n expected 10 c hange? If Di.)Mm/gSO. R epea t (d) fo r Professor Jones. Dis?. DadColI.. Do you prefe r the regression in (c) to the regression in (a)? Expla in. lncornehi. H e has cosme tic surgery thai increases his be auty index from o ne sta nda rd de viatio n belo w the ave rage to o nc standard deviation above the average.
What docs the codficient on the inte raction term measure? g. In(YeorsSclioo(). Neither of Mary's parents altended collcge.lIher did not.:_ tions. I n(Yt:{/r~. CUf!80 and Stll'm/ g80. Plot the regression relation between Disr and ED from (3) lind ( C) for Disr in th e range of 0 10 10 ( from 0 to 100 miks). E8. Both of Bonnie's pa rents attcnded col lege.s<:huol}.ool. Rt'v_Coups.havc for Dist > 10? Ho\\ many observat ion are lhere wilh Dis. What does the regression predict lo r the d iffere nce hetwee n Jane's and Mary 's yea rs oi education? ii.. H ispallic. Fill come. Construct a scatterplot of Groui/ll on YearsScI. \){hal does the regrcssion predicr fo r the diffe rence between Bon nie's and Mary's years of educa tion? h.:lIed regression funl. Janc's fat her anended college.4.ioll. Tra deS}lII n::' In ( YcorsSchoo(). De:tCribe the similarities and differences between the est im. Jane. > 10? f. Female. run the following fj\le regressions: GrUlI'l1z on ( 1) TradeS/iliff' and Ye(/rsSclwu i .w. and In(RC DP60). Using the regressions from (I): i.4 Using the da ra set Growth described in Empirical Ex~rcisc 4. Would your answer c. Afler run ning all of these regressions (and any others that you want to run). 810ck.uint/tion '>. but he r mother did not. Assassil1Glions. BYlest. Rev_Coups. Alexis. (3) TradeSha rt'. How does the regression (unction (c) b(. Whal dot::s the regression predict fo r the differe nce between A le xis's and Mary's yeats of educa rio n? iii. Add the interaction term DadColi x MomColi to th ~ regression in (c). (2 ) TradeSlwre and In(Yea rsSchou/). Rev_Coups.· Slum'. Is there any e\l idence Iha! the e rrect o f Di:)'l o n ED depend" on Ihe fnmi ly's income? I.. Use the plot to e ~pl ain wh\' regres.'iinn (2) fi ts bette r than regression (I).. and '{iI/dr Share x In(YeafJSchoo() . and (5) Tra deShare. Alex is's mother attended college. D oes the rdation ship look linenr or nonlinear? Explain. A mlS)'ill{l/iul1s and In(RGDP60): (4) '{rcllt. o.hange if yOll plotted the regression fu n ~ tion for a while male with lhe same characte ristics? ii. Mary. A ua. ex cluding tht" datu for Malta . Own/tol1le. Tlli. summarize the effect of Disr on years of ed ucation . In(RCDP60). and Bonnie have the same values of Di. . TrudeShare1 .306 CHAPTE R 8 Nonlinear R egreuion Function$ i. but htlr f.
The depe n.m: in different industries. Use regressio n (2) to predict the increase in Growth. In some applica tions.n \Vl1~ LQgi)·tic curve_ Suppose you a re studying the m:ukc! penetration of a technology. In 1960. Us ing regression (4). In 1960. Use regression (5) to predici the increase in Growth. a cOll ntry contemplates an educa tion policy that will increase average ye<l rs of schooling fr o m 4 years 10 6 years. ~st. 4) Trot/e t/eS}J//rt' ' .R egression Fuodions Thai Are Nonlineor in the Parameters 301 ~c) he ~mc · ~w fnc b. b Bon I the ~ want to ~uding the 'ut/l. This family of no n linear f(. Because they are lin e.] a re nonlinear fu nc tio ns of the X s but arc linear functions of Ihe un known para me te rs.. is th ere evidence that the effect of TradeShare on Growrh depends on the level of educa tion in the counlry ? In e.5 to 1. [ nd Tra de  Functions That Are Nonlinear in the Parameters We begin with two examples offunetions tha t are Ilo nlin.r in the unknown parame ters..'SiJort' df!S/J(1f t . te .1 Regression Functions That Are Nonlinear in the Parameters The nonli nea r regression functio ns conside red in Sections 8. economic rca~ni ng leads to regreSSion fu nctions that are not linear in the parameters. We then pro vide a general formul<ltion.. ca n be estimated by OLS after defining new regressors that a re nonli nea r Irans{ormaljons of the origi nal X's..He d. the..adtShare from 0. A llho ugh suc h regressio n functions cannOt he estimated by OLS. Use regression (3) to pre dict the increase in Growth. a • . a co untry contempla tes a trade policy that will increase the average value of T.:arin the parameter.'gression func tio ns is both rich and convenie nt to use. Usin g regression (5) is t here evidence of a nonli near relationship betwee n TradeShare and G rowth? f. ca lled nonlincllr least squares.y can be eSlimated using an ex te nsion of OLe. l rehl 1ion i. those ptl ramctc n.foT t:xmn pie the adoption of databasc management softw. . how eve r.2 a nd 8. Co Test whether the coefficients 0 11 Assassinmiom and Rev_Coups equal to zero using regression (3).. Use regressio n (I) to predict the increase in G rowth. den ! va riahl e is the fraction of firm s in the industry that have ado pted the softwa rc. bdcOl_ }anes APPENDIX 1 8.
·ep 1\)( lu \\ \ alu c~ of X .. .sion mood wi th it single X is Y.) + lip 1. G~neral functions Ih(Jt ore IlOnlinrar in the paramelers.lhl.. the polynomi.lU\. C'p'l' rcgr.e some udiciellcies.11l In~. (S. thl.lupo. As Cim be seen in the grdl'lJ \alu~or the logistic function has an elongated "S··shape.2 for some val ues of income.BtJ I + (I .... so for 50me incomes the predicl..r lI1~tead a function that produces prcdicted values octween II 'Ille logisti. . _ • •• Xh ..md the ~ I opc IS nat: the curve is stee per [or moot:ratc values of X .s to infinuy). hut the paf:ll11ctl!r' cl1Icred linearly. and fur large values of X. 130.. the prcdicted vtllues increase without bound. The nega ti\'e exponeDtLaI growth function is graphed III Figure S.. TIle negative exponential growth model is (K~II) Y. between test ~core~ and income ha. p.. the value of the (une ncitrly 0 . Nrgolil'r npoll clliia/ RrOtl'th.~ihle ~re value for a district will exceed Ihe maximum on the test The negallve exponential gro\\ th mooel providcs a nonlinear \pecification tha t hfl~ a prn.3'. m~ and decrea~e'i !l~ inCQme ri ~c:l.fJu.:1nd you h:1\'e data on II industnes. and hus an upr~er Ixm nd (that is. /J".c 10 usc mudd could produce predicted \u iucs less than 0 or gre.". 111e dependent variabk is between U ( 110 adopters) and 1 (100% adoption) Becaust a linear and 1.. regrc~it)n than l. fu nction smoothly increa~es from a minimum 01Uto a maximum of I.as income !o./l. l h~ X's ~nl~rcd this function nonlincarly. For eXllrnple. il makl.li moJo ~I ope i~ ds can produce <l ncgative vcr~ implausible.:s ~m.: function appru1lches I and the sl ope is n:H again ..308 CHAPTlIt 8 Nonlinear Regreslion Functions slOglc Imk_ pcndcnt \ armble X describes all industry ChMtl. has a slope that is gr~· .tcnstlc.:: [or 1111 values of income..2 and R3. which 10 mudcllhe rel:.( ' . The arithmic ~peci iicn t ioo has a positive slope lor all valuc!\(jf income: ho" c\er.:u pos...1 2b. but as :< increases it reaches an a5yrnptolc of f3v.t\c s t al low values of inc.1' cis of Sections 8.... TIle fu nctions lIsed in Seclilln 8.: paramctcn enter nonlinearly as \\. I n the: elCarnpll. d. For small i~ X..  AXI. nn asymptote re~rc ~s ion a~ inC\l!tlc increaSt...!$ of Ihis appendlX.t. The slope is stt.::h large. The tiOo logi~tic function with:1 si ngle X is graphed in Figure 8. Th" logi~tic regre.. I 1.':~lv(1 nential growth regrl's:.AJ 1 + Uj.. 111e 1 0gLstic and ncgall\'''. '" I + etA...11 ..iti\c :. /J..40) in which there arc k independent \ariablesandm'" I paHlmcte. p. .ion mode ls are special cases o( the general nonline:H model Y..Hl. .': I3tJl  e IJ. ln the 1111.
. ."".J p.". b.Regre$$ion Functions Thot ATe Nonlinear in the Porometers fI GURE 8. The OLS estimator minimiLes the sum of "<luared prcdic lion mistakes in Equation (5.38). and on asymptote 01 flo os X tend~ to in~nity."1 ~ " L ' _ _ _ _ _ __ _ __ graph. p..t. In appl icatIOns.X + + bkX/./Cltion (8 .. a of income as Hlcomt: Nonlinear Least Squares Estimation Nonlinear leasl squares is linearl~ It g~" n~"ral method for e~limating Ihe unknown paramcterl> of a regression function when those parameters entcr the populatioIl rcgres~oD function non (8.:lcrs are un known and must prl!dich:d tha t ha~ be estimated from tho.n conSlruellhc sum or squared prediction mistakes: f(X k ···· Xl.l()) h i> .12 Two Functions Thot Are Nonlinear in their Parameters y 309 !lOn 11 Ilion)..~(Il mr~s.{ic n rllTW x x (b ) A Il cglciw I 'XPOlltJlU.8).0<1' ... .rowth nun" he (une : and for I relation Pa rt (0) plOl$the logistic furn. Ih\.lian of EqlXllion (8 . Paraml. .(~ .qofl that minimize th~' sum of squared mistllles.41 ) C<lr\~ ""II .b " '. which ha~ predicted volues thai lie between 0 and 1. hmlf (8. the " \"aluesof b ..:lcrs that enter nonlinearly cannot be estimated by OLS. (~ . ~ ~.TIn: log are knolln..:rs of the general nonlincltr regression model in EqulItion (SAO). .sing (he method ornl! gcts described in Section R I. ! (y. (a) A lo~i.. method is called nonlin car ICllst ..2m RcclIJI the di~w... 'reatcr ~' ecn 0 ~ l.. . ~ i'" """.. Pori (b1 plOI> the negative exponential growth function of Eql. If the pa r amctcr~ . which hos 0 slope thot is always posilive and decreases os X increases. For a set of trial parameter values btl.)f.I..3 of the OLC) estimator of the cocfficientsof the lin ellr multiple regre . In principle. " param. then predkled effcct~ may be compu.:oef thi. but they can be c ~ timaled hy nonlinear least ~qu3rCs. :2:1) . (8.l3eciluse the regression model is nonlinear in the o.:d u."!!'" ~ ~ panllnl: l.: f ' ..~ion in Section 5.: daHl. u~d Thb ~ame approach can he ficient~ to estim:lIc the p:namct. b and scltlingon OLS o.n:::.~jon l is sle":p [of model.. howeve r.The \' 1'.391..:~ t imator can be computed hy checking many trial o l Ihe valllc~ ~ali\"e e.P''' Ir rcg. + b..
square. and . In rcg.1. squa re~ miz. One differe nce is that the negative cX jXlnential gro wth curve fl a ttens out .310 CHAPTER 8 Nonlinear Regression fUodions The nonlinear least squares estimators of f3v. consistent with ha\'ing an asymptote. so he teroskedaslicityrobust standan. Application to the Test ScoreIncome Relation A negati ve e xpone ntial growth model. lion of the data..md IheX's. infere nce concerning the parameters ciln pm.. mini_ Regressio n sortware incorporales algo rith ms for solving Ihe nonlinear least mato r in practIce.39) is positive1 and a n asymptote of f3u as income increases to infinity.0068) (4. mator sha res two key properties with lhe OLS estimator in the linear regression model: It is consistent a nd it is normally distri buted in large samples. a n:lativcl y simple formula expresses the OLS cstimalOr as a func..:~li.34.uio n proble m.lt In linear regressioD. . along wi th the logarithm" regression fu nction a nd a sca n e rplot of the data. .. the pararn· res/Score = 703. the error tcnn in the non· linear regression model can be heteroskedastic. Unfo rtunatel y. no such gene ral formula exislS for nonlinea r leMt .:! errol'> sho uld be used.44) (0. ceed as usual: in part icular.ThUS = 4. ports nonli nea r least squares estimation . • b. As a consequence.82 "" the estimated no nlinear regression function (with standard e rrors repo ned belo\>..! the highest levels of income.0 (5£ " 4 4S). in t hi~ cast· quite similar.(068).13. fit to distric t income (X) a nd test scores (Y). (8.l)5S2(IJI(""'.4l). • 13mare the values of /'0' b1• ··· minimize t he sum of squared predict ion mistakes in Equation (8. Istatistics can be constructed using the general approach III Key Concept 5. plus or min us 1. whic h simplifi es Ihe task o f comp ut ing the no nlinear least squares 1.% standa rd crro~ Just as in linear regressio n. f3 1'..44).39) using the California test score data yields robuststa ndard e lTo r e te r estimates) is ~ Po = 703. SO the no nlinear least squa res esti mator must be found nume rically using a compuler.0552(5£ = 0.81 = 0. the output typically reports sta ndard error~ for the esti mated paramete rs. The result of estimating fJ.~J4. th. Unde r gene ral conditions on the fun ction f.12) (4.n. and a 95"/Q confidence interval can be COllstructed as the estimated coeffic1enl.2 (hc te roskedastlcit) .ssion software that ~lI p.0) 1 .48) l11is estimated regression fun ction is plut ted in Figure 8. the nonlinear leas t squa res r~ll. The two spccificllt ions are.2[1  e~O.. ha~ the desirable fea tures of a slope that is always positive [if 131 in Equatio n (8.r 131' anJ 13~ in Equation (8.
o 20 • • '" Districi income '" .tsup 'fS fOf I lir.13 The Negative Exponential Gl"owttt and Linearlag Regression fund'ion s . c non crton ~~~I···'r.has lhe ivcl ~nd lInd fJ..t8)..42)) and the Test 'Cort 700 ~eSli d:h . bul the JioeorIog regreuion Junclion does nol. One difference between tne !"wo functions is that the negative exponential growth model has on asymptote o~ Income incrooW!S 10 infinity. no fl11m f cSli i ~e~.:ll''.garilh ll1iC Ihis .ThUS IC para m' (8.. • • ..~ ' . .42) ..eorIog regression function {Equation 61811 both copNre the OOI1lineor relo lion be!"ween 1 8st scores and district income. in us\icil~ .. J. ums oul <II .Regression Fuoctioni Tho! ATe Nonlinear in the Porometef"s hal 311 FIGURE 8.: . The negative uponentialSrowth regres' ~ function [Equation (8.lt usn  n pro mKey licicnl.t • • Negatlve tlj)OOeflti.
ats to internal and exh:rnai validity.plain how 1 0 use multiple rCgTcs:. wh en will mUltiple regression pro vide a useful estimate of the causal effect and. This framework relies o n the concepts of intern al and ex\c rnlli validity. this chapte r presents a tramework fo r clssessing stati stical studies in generaL whethe r or not they usc regrcssion analysis. such as cl<lss size..lliciit)'. Whal makes a study lhal use s mult iple regression reliabl\! or un reliable? We focus on statistical sludit:s Iha1 have th e objective of estimating the ca usa l effec t of a change in some ind ependent vari able..forecas ting.. In Sections 9. In Ihis chapler. .teacher ratio presented in Chapt ers 4. A st udy IS inlernally valid if its sta ti stiC<! 1infe rences about causal eiCec ls are valid for the populalion a nd se tt ing studied.:ll1dity of forecasts madc using regression models. list a variety of possible thre.and provides an introduction to The threats to the \.. As an illustrati on of th e framework of internal and exte rn al v. 'me discussion in Sec ti ons 9. such as lest scores. Secti on 9.1 and 9. on a depende nt variable. juSt as importHlllly.3 discusst:s a differt:nt use of regression models.2 . For such stud ies.: ff~C( un test scores of cutti ng the st udent.2 focuses on the estima tion of causa l effects fr om observati onal data. it i~ externally valid if its inferenct"S can be gene ralized to o ther popula ti<Jns and sc ltin gs.. when will it fail to do so? To a nswer this question.CHAP TER 9 Assessing Studies Based on Multiple Regression T he preced ing five chapteT s e>. · +1 .4 we assess th e internal and exte rnal validit y of the study of the ..ion [0 analyze the relationshi p among variables in a data set.. \: SIt'P back and ask. and discuss how to ide ntify those threa ts in practice. in Section 9. we discUSS inte rnal and exte rnal va lidity.1 and 9.
people.' . whe ther tllc o rganic me thods tha t work in the set ling of a labora tory also work in the selli ng of the rea l world... TIle popula tion to which the results are gene ralized.1 r 9.. {3STR. social. school dis· tricts. 1 Internal and External Validity 313 v INTERNAL AND EXTERNAL VALIDITY I ' "0' '. Internal va lidit y has two compo nents. legal.}.ased and consiste nt. th e n {3STR should be an unbi <1sed an d consistent estimator of the true population causa l effect of a change in th e studentteacher ratio. For example. For exam ple. 9. il would be important to know whe ther the find ings of a lab· o ra tory ex pe riment assessing me thods fo r growi ng o rganic to ma toes could be g. a nd econo mic environ· me nt.".' . By "selling.1. provide a framework for evalua ting whe the r a sta tistica l o r econome tric study is useful for a nswe ring a specific queslion o f in terest.eneralized to the field.la tistical analysis is internally valid if the statistical infe nm ce~ about ca usa l effects are valid for the population being studied. We provide other exa mples o f differenceS in popul<ltions and sett ings la te r in this section.scores of a urUt chaoge in the student. First . tha t is.teacher ratio in a certain regression . For exa mple. The analysis is externally ''alld if it~ inferences and conclusions can be generalized from the popu l<ltion and sel ting studierJ to other popu lations ami se ttings.ed. a high school (grades 912) prindpul might "'ani 10 gene ra lize our findin gs on class sizes a nd test scores in Ca lifornia e le me nta ry school districts (the popula tion stud ied) to the popUla tio n o f high schools (the popu l<l tion of inte resl). compa nies. The pOlmlation studied is the popula tion of e ntit ies. K£Y CoNcEPT I 1 "A :. the estim a tor o f the causa l effect sho uld be uobi..9. I" Threats to Internal Validity t' y L£.defined in Key eoncepI9. is the population of e nti lies to which the ca usa) infe re nces from the study a re to be applied ." we mea n the institutional. o r the populatio n of inte rest.1 Internal and External Validity 111e concepts of inte rnal a nd ex te rnal validitY. In te rnal a nd e xte rna l va lidity distinguish between the popula tjon a nd setting studied a nd the populaTion a nd setting to which Ihe results 3re gene rali7.. if ~STR is the OLS estimator of the effect on test.. aod so forth fro m which the sa mple W<lS drawn. .
in a study based on OlS regn:~. Section 9. which violates Ihe firs t least squares assumption.I~ chose n in way tbat mak es it di(fere nt from Iht. or becau<. Accordingly. rbcse threats lead to failures o f one or more of the leaSI S4ua res assumptions in Kt'y Concept 6A..2 provides a detai led discussion of the vario us threats 10 intt:rn. the true causa l effect might nOI be Ihe orome in the popu l ali (~n studied and the population of interesl. More generally. lat"!· oratory studies of the lOxic effects o f chemica1s typically use animal population' like mice (the population studied) .. and their standard errors.. For example.. J3s n~ .ion.96SE(~'I7"H) . Wheth N mice and men differ sufficiently to threaten the external va lid ity of such studie' L~ a mailer of debate.:ristics of the populations. . if a confidence interval is constructed as ~S1'R ::!:: I. and these reasons constitute th reats to internal va lidity.l threat to external valid ity.JI vnlidity in multiple regression analysis and sugges~ how to mitigate them.3 14 CHAPTER 9 Auessing Studie~ Based on Multiple Regression Second. W Ith probability 95% over repeated sa mples. and confidence imervals should have the desired confidence levd . In regression analysis. This could be because Ihe population \\. If data on the omitted vari.·r cnee. Differences be tween the population studied and the populat ion of inte re"t can po:. able arc available. Threats to External Validity Poten tial threats to extern:tl validity arise (rom differences between the popula lion and selling stud ied and the population and selli ng of interest. then th is threal can be avoided by including that variable as an additional regressor. confidence interval1Should contain the true population causal effect . hypothesi. For example.e i.. cl). <1nd tha t s tandard errors arc computed III a way that mah'S conh. There are various reasons Ihls might nOI happen. thi. one threa t thai we have discussed at length is omitted variable bias: it leads 10 corrC"ialion between one or mo re regressors and the I:!rror term . but the results are used to write healt b and safel) regulations fo r human populalions (the PO p ul ~lIion of interest). sian function and hypothesis tests nre performed using the estimated regression coe(ficienL<. ~ . . lhe requirements for internal validity are thaI the OLS estimator is unbiased and consistent. icnnce le . because of differences in charact. tests should have the desired significance level (the actuil] rejection rate o f the test under the null hypothesis sho uld equal its desired signil. dencc intervals ha\£: the desired confidence level.' popula tio n of inte rest. <. beca use of geographical di nt.e the study is OUi of date. For example. Differences in populations. causal e ffects are estimated using the estima ted regn.
the leg.lT)' school district'\. This analysis wa:. a study of the effect on college binge drinking of an an tidrinking ad\'enising campaign might not generalize 10 another identica l group of college students if the legal jXnalties for drinking at the two colleges dif fer. dif ferences in laws (differences in legal pcnalties). clemen t.ct of reducing. Application to test scores and the studentteacher ratio. Differences in settings.9. but substanli\'e ly small.. class sizes e!>limated lLsing the C"Ilifor nia elementary school d istrict dat a wou ld gene l' ~l h 7e to coll eges.5 Even if the population being st udied and the popula tion of in terest are identical. For example. elemen tary school students. I r so. More generally. On tht. and orgllnilfllion are broadly simi lar throughout Ihe United Slates. In th is ell!>/:.:. • . sim ilar findings ill two or more studies bolster claims to How to assess the external validity of a study. examples of differences in sCH ings include d ifferences in the institutional environment (public universities versus religious uni. Exte rnal validity mUl>t be Judged using specific knowledge of the populations and se n ings studied and those of inlerc.lIld. To what o ther populations and settings of int er est could th is finding be generalized? The closer arc the population and setting of the ~ tudy 10 tho.1I thc California resulb might gcneralil'c to performance on standardized tests in ot her U. curriculum.S. it might not be possible to generalize the '\tudy resu lts if th. For example.al setting in which the study was cond u et~d differs from the lega l setting to which its results are applied .ersilies).' other h.collegc students and collegc instruction are vcry different than elemen tary school slUden ls and instruction. In general. estim atcd improvements in test scorel> resulting from reducing the student. l mpurtant differences between the two will cast doubt on thc external validity of thc study. based on test result:. For example.sl.4 we (In<llyzc Icst score and cla'l> si/c dala fur de mentary school districts in Massachusetts and compare the MassachusetlS and Cal ifornia resul ts. so it is plausible th. the Mronger b the casc for external \'alidit~. Alas ka).e of interest.. the external validity of bot h st udies Clill bc checked by comparing thei r results.or differences in the physical em'! T on men! (tililga teparty binge drinking in southern California ve rSlls Fairbanks. so it is impill usibic that the effe. in Section 9.: settings differ.1 Inlemol and External Volidily 3 1. for California school districts. Sometimes Ihere are two or more studies on dirrerenl but related popula tions.teacher rat io. Suppose fo r the moment that these results are internally valid. Chapters 7 and 8 reportcd sta tistically signifi ca nt.
9. fur example. Cook. For eac h.lish. do you sort the: good studies from the t>aJ' 110100' do you cumpBre studies when Ibe dependent "ariabtcs differ" Should you put more ". fid e nce int e r va ls wilh lhe desired confidence leve l.anal~·51s. we disc. Om itted Variable Bias Recall tha i omitted variable bias arises when a variable that both determinl!s Y and is correlated with one or more of the included regressurs is omitted fro m the regression.2 Threats to Internal Validity of Multiple Regression Analysis $ wdies based on rcgrcssion a nalysis arc intt:rnally valid if the esLim al cd regres. I How to design an externally valid study. and sim uil . Perfonfl1nll' mellianalysis of many §iudics has its own ehaUenges. All fj\'e source) of bias arise beca use the regressor is correlated with the error term in the populatjo n regression.""n III tbe box 00 the ·· ~Ol:arl cffed" in Cbapte r 6 is ba. eve n in large samples: omin ed variables. mi sspeciIication of the functi onal form of Ihe regression function .316 CH A'TER 9 Asses..ed on II mCIIlanalySiS. viola ting the first least squa res assumption in Key Concept 6. and jJ their sta ndard t ITan. Because threats to extern al valid . Study design .s beyond the scope of this textbook. imprecise measurement of the independent vari· abies C"errors in variables"). an d Campbell (2002). these threal\ are best minimized at the early stages of a study. The inte rC!5led rellder is re(errtd 10 Hed¥cs and 01ldn (J91l~) and COOJ'Cf and H( W ' (l!n4}.meous causality.uss wha t can be do ne to reduce t bl~ bias. I A com pari5Ol\ o f many relatcd studies on tbe same to pie is callcd iI m~ta.eIght (lII' large ~tud) than a small siudy? A dl~sslon of nleta·anal~i~ and IIsch ~lIe n ges goes bc)'Ond ih ( "'.! to Shat. The section con· eludes with a discussion of circumsta nces Iha l lead to inconsistelll standard errors and what can bt: done about it. before the data are colleeted. . ity ste m from a lack of comparability of (Xlpulations a nd senings.4. whi le d iffe rences in their findi ngs that nre not readily expl ained cast doubt On their external validity. so that the OLS estima tor is inconsistent..sing Studies Based on Muhiple Regre~~ion external va lidi ty. and the interested reader is rdc rrl:t. sample selection. How best to minimize omitted variable bias depends on whe ther or not data are available for the pote ntial omitted variable. Thi s bias persists even in large samples.Ior' of I hi~ textbook. TIli s st":ctlon surveys five reasons why the OLS estima tor of the multiple regression coefficients might be biased. The disCu. !Im. sion coeffici e nLs are unbiased and consistent ... yield con.
Land 8.cs a tradeoff between bias and variance o f Ihe coefficie nts of interest. W~ could have presented o n I)' the regrcssion in co lumn (7) .1: mreaTS to InI8l'TlOl ValidIty of Multi~ Regreuion AooIysis 317 Solutions to omitted variable bias when the omitted variable is observed.3 are examples of this Slralegy.l step is 10 identify the key coefficien ts o f in terest in your regression. The th ird step is to augm ent your base specifi cation wit h the addit ional ques tionable variables identified i. then yOu can include this variable in a multiple regression. if ignored.ng the data. and a lisl o f additiona l "4ueslio nable" variables that might he lp 10 mitigate possible omitted va riable bias.could bias o ur estimato r o f the cl ass size e ffect. If not. However. the decision wh eth er to include a va riable invol\. Th e result of this step is a base regres. including the \'aria blc when it does Dot belong (t hat is. . wben its popu lation regression coefficient is zero) red uces the precision of the estimators of the ot her regression coefficients.2. this is refe rred to as a priQri (" before the fact") reasoni ng. In other wo rds. hecause the q ucstion o riginall y posed concerns th e effect o n test scores of reduc iog the sludcnllelleher ratio. H you have daHl on the omitted vuriable. Prese nting th e other regressions. O n the one hand.'hen the additi onal variables are incl uded.. permits the Skeptica l read er lO draw his o r her own conclusions. The seco nd ste p is to ask yourself: What are the mosl likcly sources of impor tant omi tt~d variable bi as in this regressio n? Answering this question requircs applying economic th eory and expert knowledge. beca use that regression summarizes the relevant effects and llonlinearitics in the other regressions in that table. add ing" new vari able has both costs and benefi ls. In practice. If the coefficients on the add itio nal variables are statistically signilicant.sion specifica tion. omi tling the variable could result in omitted variable bias. there are four steps that can help you decide whether 10 include a variable or set of variables in a regressio n. because Ibis is done before analyzi.. however. then these variables ca ll be excluded from the regression. These steps a re summarized in Key Conccpt 9.3. this is the coefficicni on the studenttea cher ralio. this step entails i ¢~ntjfy ing those determinants of test scores that. The fourt h step is to prese nt an accurate SUmmilrY of your results in lahular fo rm ..9. the starling poi nt for your empirica l regression a nalysis.n Ihe second ste p and to test the hypotheses that their coefficients are ze ro. o[ ir the estimated coefficien ts of imeresl change appreciably . who can Ihen draw his o r her own conclusions. On the ot ber hanq . In the test score regression s. For exam ple. In the test score example. The fir.. and should occur before you actually run any regressions. t h ere b~ addressing the problem. Tables 7. then the)' should remain in the specification and you should modify your base speci ficatio n. in Table 8. This provides "rull disclosure" to a polentia l skeptic..
31 8 CHAPTER 9 Assessing studies Based on Multiple Regression OMITTED VARIABLE BIAS:SHOUlD I INCLUDE MORE VARIABLES IN My REGRESSION? 9. you will eliminate the possibility of omitted variable bias from cxduding that variable but the variance of the estimator 01' the coefficients of interest can increase. As e xplained in Chapte r 1 0.lrol es regression is discussed in Chapter 12. tcst score ancl related data mighl be collected fo r the same dislric ts in 1995.. called an instrumental variable. The third solution is to use a study design in which the effect of intere<. Each of these three solu tions circu mve nts omitted variable bia5 th rough the use of difrerent types of da ta.lt a randomized controlled experiment. Te5t whether addit ional question able variables have nonzero cot!CficienlS. Still . .2 If you include another variable in your multiple regression. For example. the effect of reducing class size on student achievement) is studied usil\. Provide "full disclosure" represen tative tabulations of your resulis so thaI oth· e rs can see the effect of induding the questionable vuriables on the coem· ciel1t(s) of inte rest. there are three other ways to solve omill ~d variable bias. The first sol ution is to use data in wh ich the sam e obse rvat ion al unit is observed at differen t points in time.t (for example. The second solu tion is to usc instrumental vari ables regression_ Th is method re lies on a new variable. Be specific about the coefficient or wefficients of intercst. lines to help you decide whether to include un additionul variable: !. Instrumental vaei. 4. Data in this form an: calL ed panel daiS. Do your results change if you include a questionable variable? Solutions to omitted variable bias when the omitted variable is not observed. 3. Adding an omitt ed varia ble to a regression is not an option if )'Oll do nOI have dala on that variable. pane l data make it possible to con tro l for unobserved omi tted variables as lon g as those omilted var iables do not change over time. Use a priori reasoning to idcntify the most important potential sources of omitled variable hias.lcading to a base specification and some "ques t ion abJe~ variables. 2. Here are some guide. Randomized conlrollcd experi ments nrc: diS cussed in Chapter 13. tben aga in in 2000.
to:: ache r ratio fOr tenth graders in that d istrict. be biased. ErrorsinVariables Suppose that in ou r regression of test scoreS against the student. then the estimator oj' the partial effect of a change in one of the variables will.in·vari ables bias • .l io Chapler I I. A lthough tbe slUdent. if th~ population regression function is a quadratic pOlynomial. then Ihb functionaJ form misspecificlttion mak es the OLS estimaTor biased.. howeveI. 111is bias is a type of omitted va riable bias. Regression wIth a di screte de pendent variable is discusset. Wh en the depend ent vari able is continuo us (lik e lest scores). th ~'y are not the $<Ime. If the fu nctional form is misspeci fied .ed in Key Concept 9. (he depende ot variable b dis· crete or binary (for exa mple.1 Threats 10 Intemol Validity 01 Multiple Regression Anolyiis 31 9 FUNcnONAl FORM MISSPECIFICATION ~ KEY CONCEPt I' ' Functionnl form misspecificution arises when the functional form of the estimated regression function differs from the functional form of the population regression funclion.9.3 Misspecification ofthe Functional Form of the Regression Function If the True populati on regression function is nonlinear but th e estimated regres sio n is linear.I scores for rifth grad ers on the Sludent. For exam ple. this problem of potentia l nonlineari ty ca n be solved using the methods of ChapteI 8. in gene ral.3 .sing nonlinear aspecls of the regression function. and it can be corrected by using a different fu nctional form.teach er rario for elem entary sch ool st udents and te nth graders might be correlated . then a regression that omit s the square of the independent variabl e would suffer fr om omitted varia ble biii5.teacher ralio we had inadvertently mixed up our data . in which the o mitted variables are the terms that reflecllhe mi!'. Y. so that we ended up regressing t e~i. 9 .Dias a rising from fun c(ionall"onn misspeciJication is sum· merl'l. so thi ~ mixup would lead 10 bias in the estim<lted coefficie nt . Funct ional C onn misspecification often clln be detected hy plotting the data and the estimated regression function. things are more complicated. If. equals 1 if th e i'h person att ended college and equa ls 0 otherwise). Solutions to functional form misspeci/ication . Th is is an eXlIlllple of errors.
).ri + so ~L . suppose there is_3 single regressor X.~ mell~urcmcnt error assumption.2) luiJ(ui ..) + CO\( X.(X1_1<.!.oo. suppose that the survey respondent provides her best guess or recollection of the actua l va lue of the independent variable XiA conyen ic= nl way to represe nt this mathematically is 1 0 suppose thatlhe measured value of X. err Ihe have been typographica l errors when the da ta were first entered. . No" oj . and X...\'. (9. is X/ = Xi + hli .00\'(.... Th e precise size and direction o( the bias in P I depe nd on the correlation be tween Xi and ( Xi .X. c..w.) '" P.. PI .L. .. lion in the Current Population Survey in volves last year's earnings. ..so co..1 ) whe re v. "" 1J1(X.. 13 t Ux u . is obse rved. = 131 (X . lUlld£f th.. u!)!P. or he might misstate it for some other reason. As an example. 1<. Under thi s ass umption..X.P I"!J~...J + 11.II ..'(tri <r] '" cr. plus a pure ly random compone nt .. hi. a responden t might give the wrong answer. act ual income) but that X. is measun:d imprecisely by X.aI(X. we might suppose that Wi has mean ze ro and vari:1nce and i ~ uncorrelalcd with XI and the regression error 11 /.) .. Accord ingly. the measured value of the variable.. + Vi' (9. If instead the data are obtained from compute rized administrative rccor~ there mi~ht abl. a bit of algebra 2 ~hows th at ~l has the probability limit a.This correlation depends.X. II this diffe rence is corre lated with the measured value X. the regression equati oo actually t:stimated is the one ba~cd on X.'anable... Written in terms of the imprecisely measured varia bl~ Xi' the populution regress ion equalion Y. on the specific nature of th e measurement error. 111 . will be corre lated with the error term aDd P1 will be biased and inconsistent.This r E En bias persists even in very larg. the population regression equation written in 1ConS of XI has an error term that contains the differe nce be tween X. = Po + PIX.. = I3IJ + (3)(.. lr the data arc collected through a survey. ". P.320 CHAPTER 9 Assessing Studies Bo~ on Multiple Regr«wion because its source is a n error in the measu re ment of the illdepcndcll l .) 0 . in turn.j • co~ (X.u'.Xi) + u.. denoted by X. unmeasured va lue.. (say. uJ' 2 + . + 1 . equa ls the actual.( X.) _ P Ia::_ ll'll. For example. the n the regressor .• tld oov(X1_"'... + tli is len bi3 y. SO thai the OLS estimator is inconsistent if there is measurement error. II. Th us.. l'.) = 130 + f31 X .. .~ jnlfll EquAlion (61) . P.. A respondl::l1l might nOI know his exact earnings. (the respondent's estimate of income). /3 l • p '+ . Because: the error is purel)' random .) • n:.e samples... . To sec that errorsinvariables results in correlation between the regressor :...\'. one qUt:~. Because X" not X. There are many possible sources of measuremenlerror.:... ".tnd the error te rm.t "" "l" fJI".
econometri c methods can be used to mitigate errors.J!. In the e xtre me case that the measurement error is SO large that essen tially no informa tion about X ..:s when an independent varj· able is meas ured imprecisely. independently d islri buled measuremem error lerro. then she can use Equation (9. f31' A lthough the result in Equation (9.2) is specifi c to this particular type of mea sureme nt error.I al. 9. and its probability limit is given in Equation (9.2) to com pute an estima lor of {3j that corrects for the downwa rd bias. how ever. remains.i o ..variable ~ bias. In the o ther extrem~.Errors. to use 1he resulting fo rm ulas to adj ust the estimates.invariables bias in the OLS estimator aris.'This bias dt:!pends on the nature of the measurement error and persists even if the sample size is large.. . and if she knows or can estima T e the ratio Q'~. if a researcher belie ves that the measured variable is in fac t the sum of the actual value and a random measurement error term.. the de tails ty picall y are specific 10 a gi\len data set and its measurem ent probJems an d we shall not pursue this approach furth er in this textbook .. <41' Solutions to errorsinvariables bias. A second mel hod is to deve lop a math ematical model of lhe me asurement e rror and.9. if possible. This method is studied in Ch:lpter 12. If the measured variable equals the actual va lue plus a mcanzero. it illustrates the mo r~ general proposition that if the independent variab le is measured imprecisely th en the OLS estimator is biased.. Errorsin ·variables bias is summari 7cd in Key Concept 9. It relies on having anothe r va riable (the " inslrumenlaJ" variable) that is correla ted with the aemal value X.4. even in large samp les. but is uncorrelaled with the measureme nt error. Beca use the ratio <1! is less th an 1.2). when there is no measur~ment error. = 0 so ~\ . the ratio of the va riances in the final exprcssion in Equation (9. For example. If th is is impossible. then the OLS estimator in a regression with a single righthand variable is biased toward ze ro. even in large sa mples. ~1 will be biased toward 0.4 That is. if the meas ure ment imprecision has the effect of simply adding a random element 10 the actu al value of the independent variable. U'.. Because tRis approach requi res spe ci<tHzed knowledge about the nature of the measurement error.2 Threats to Intemol Validity of Multiple Regression Aoofysis 321  ERRORSiNVARiABLES BiAS Ke\' CONCEPT I' . One such method is instrumental varia bles regression . The best way to solve the errorsin variables problem is to get an accurate measure of X.".2) is 0 and ~l conve rges in probability to O . then ~1 is inco nsistent.
when e mployed . whe ther som. Sample se lection bias i~ summarized in Key Concept 9.. by definition .. because in 19. if data are collec ted from a population by simple random sampling. the fact that so meone has a job suggeST S that. and cou ld be coreela le d witb the regressors.:lion in econom ics a rises in using a regression of wages on educa tion to estimaT e the effect o n wages of an addiTionul year of ~du· ca ti on ..Thus.'~ nOI imroducc bias. all else equal . and thus appears in the data '1CI. the sampling method (being dnlwn at mndom from the popu. For example.ls Outperform the Market?'" provides an example of s<lmple sckclion bias in fi msncial economic$. The me thods (or estimating models with sa mr Je sel ect ion are beyond the scope of Ihis book. Sample selection that is unrelated to the value of the dependent variable dUI.tluc of th e depend.16 car owners with phones were more likely to be R ep ublican s. . In that example.5_ 11le box "Do Stock Mu tual Fun<. dent vari(lble (who The individual supponed for president in 1936). A n example of sa mple SCIC(. lation) has nOlhing to do with the value of the dependent variable. An example of sample selec tion bias in poll in r. provides infor ma1ion 1ha t 1he e rror term in l he regression. tIle error term in Ihe \\Jge eq uat ion for that pe rson is positi ve. where furthe r refere nces are providl!d.ol1 of sampling is related to Iht' \. On ly individuflls who have a job have wages. lnis seier. Th e factors (observable and unobservable) tha t de te rmine whe the r someone has a jobcdu· ca tion. TIl is 1 00 ca n le ad to b ia~ In the OLS estimator.:o ne has a job is in pa rt de t~ rmined by the omined \'ariabLes in the e rror te nn in the wilge regres sion.:nt variable. 'Orrclatio n becween tb e e rror te rm a nd the rcgrc~~or which leadlii to bias in the OLS estimator. l i t least on ave rage. was given in fl box in Cha pter 3.a re si milar to the factors thal de te rmine how much that person earm. Solutions to selection bias. B i a~ can be inuoduced whe n tht' mc th.ta is innuenccd b. I 322 CHAPTfR iii A!~~ing Sludies Based 0f1 Mukip\e Regression Sample Selection Sample selection bias occurs when the availability of the da. abilit )" luck . Sa id <.I. Such sam pling does not inlroduct bias. tbe :oimple fact that someone bas a job. is pOS il iv~. exper i~nce.Thus. wbere one lives.. domly se lected phone numbe rs o f automohile owne rs) was rela ted to the depen.Iiffercntly. The me thods we h ave d iscussed so fa r can nul eliminate sample selection bi'ls. a selection process that is rela ted 10 tbe value urlhe de pendent variablc. the sa mple se lection met hod (ran. (jo n process can inlroduce . and so fo rth. Th ose methods build on the tech niques introduced in Chapter 11.
• " • .
for convenience.. however. ity would rUIl in bo th directions : For the usual educational reasons low student.teacher ralio. while Eq uation (9.nlle .. In the test score problem . that a government inHm.:1[ as forward.. an OLS regression picks up both effects w tht OLS estima tor is biased dnd inconsiste nt.Ieacher ra tio 10 lest scores. In o the r words. o ur study o f lest scores focused on the eUect on test score~ of reducing tbe student.n X. the student .teacher ratio. we have assumed that causality runs from the regressors to the dependl. In t he tes t score example.. Accordingly. Equation (9A) represent s the reverse causal d fect of Yon X.3) is the fa miliar one in which /3.4) represents the rC\'I!f'C causal effect of test scores on class size induced by the go\'crn mc:nt progra m. il )'. Equation (9. so that causa lit ). and one in which Y ca uses X: ~ I Yi = f30 + (3tXi + u. where 1/ represents olher factors.leacher ra tios would arguabl y lead to high test scores. is the effec t on Yof a cho. This in turn lea d~ to si multaneous causality hi as and inconsistency of the OLS estimator. . there are (\vo equations.cly com:lated with the error term in the population regression.and (9'. this factor that produces low scores in turo resulls in a low student.1f so.\ . .3) represents the edllC a· tiona l effect of class size o n test scores.! to o ne o r more regressors (Y causes X)? If so. If there is simuhaneous cau'a]. This correlatio n betwee n the error te nn and the regressor can be made pre· cise ma thema tica lly by introduci ng an additional equatio n that descri bes the reverse causa l link . for exa mple. cau~. is presumed to run from the studeot. bUI because of the government program it also leHds to a decrea~e in U le student. Su ppose. = 'Yo + 'Yt Y . becauSt of the government program. but bccause of the govern ment program low test sco res would lead to low studenH eacher ratios. consider just th e two variables X and }' and ignore ot her poss ible regressors.. tive subsidized hiring teachers in school districts wilh poor tcstloCores.324 CHAPTER 9 Assessing Studies Based on Multiple Regression Simultaneous Causality So far.teacher ra tio is positi . suppose there is an omi tted fac tor that h::<lds 10 poor test scores. + VI' Equation (9. that is.I. a negative error term in the popu lation regression ot test scores on the studentteacher ratio reduces test scores. one in which X ca uses Y.:nt variable (X ca uses Y). there is simultaneous ca usHlity. But what if causality also runs from the dependent variai'll!.Ieache r ra tio. Sim ultan eous causa lity leads 10 correlat ion be tween th e regressor and the error teml. Thu s.) (9 ') x. causality runs "backward·· a.
u..To see Ihis. Assu ming that cov(•••1 1. and u.r.3). the to pic of C hapter l2..X .... E ve n if the OLS estima tor is co nsistent a nd the sam pIc is large. inconsiste nt standard e rrors will prod uce hypothesis tesls with size that diffcrs from tbe desire d signi ficance level and "95%" confidence intervals thaI fail to include the true value in 95% of repeated sa mples.. II.) ..:qualion (9.. '1h i hov.YIP ).) .). X.) = l'.. Solutions to simultaneous causality bias. u. rn3themlllically. the simultaneous causality bias is somc time~ called simultaneo us equations bias. It) .. YICID·( Y.. The re are two ways to mitigate si mult aneous ca usa lity bia1>.. + P .'.B ICO' ·(X . y .4 ) imp!ie~ I hal OOV(X. u. imagine tha t II. there is fI causal link from Y to X This reverst causality makes X correla ted with the e rro r term in the population regression of interest.9. o.. if 'YI is positive. II. + 1'. by I..) "" " 100' ( Y..) .) + COY (l'.and if 1'1 is posi tive. u. will be positively correlated. nme thaI Equation (1J. How eve r. hi. and such experim ents are di sc u s~e d in Chapter 13. 'Y I~ Soll'ing for oov(X.. The second is to design and 10 implement a randomized con tro ll ed experim e nt in which the r evcne causali ty channe l is nullified .U. Sources of Inconsistency of OLS Standard Errors Inconsisten t standard errors pose a different threa t to inte rnal validi ty. which decreases V.) Beca use thi s can be expressed mathematically using two simultaneous equa tio ns. in Eq ua tio n (9. also called simuhaneous equations bias.It. this lowe r value of Yi affects the va lue of XI tlHOUgh the second of these equations. 9.11. Il. .) . wiU lead to a low va lue of XI" Thus.CO\·(A. One is 10 use inslrume ntal variables regressio n.. Simultaneous causalit y bia~ is summarized in Key Concept 9. "Y1. a Jow va lue o r Y..6. Simuhaneous causalit y Leads ro correla tio n between X. Yl Y. is nega tive.) • oo~( Yn . a nd the e rro r te rm II. in addition to the causal link of interest from X to Y..2 Threats 10 Intemol Volidily of Multiple RegrM~ Ano/ysis 325 1  SIMULTANEOUS CAUSALITY BIAS KEy CONe"" SImultaneous causality bias.) then Yield) the result cov(X... ]) 'h i~ in curn implieS that oo' IX. L . / (1 . arises in a r~grt::ssio n of Yon X when.6 .
'Inc solution to this problem IS to use hetcroskedasricil y. lht. If. As discussed in Section 5. howe\'Cf. Serial corre laliOn in the error term can arise in panel data (d ata on multiple districts for multiple years) and in time series da ta (data on a single district for multiple yea rs). Correlation ofthe error term across observotions_ [n some . (he population regression error can be correlated across observa tions.hose standard errors are not a reliable ba~i. The consequence is U 13t the OLS standard e rro~hoth homoskedasticyonlya1rd heteroskedaslicityrobust. Correlation of the regression error across observations does no t make the OLS estimato].4. howevcr. this problem ca n be fi xed by using an alternative fo rmu la toT stand ard errors. The most common circumstance is when the dala are repeated observations on the same e nti ty o ver time.4. Ano ther situatio n in which the erro r te ml ca n be correlated across obser. t.7 summari zes the threats fa int ernal "alidity o f a muJtrrl . these omilled va riables could result in correlation of the regression errors for adjacent observations. Thi s wi ll not happen if th e data are obtained by SatnDling a t random from th e pop Ulation because the rando mness o f the sampli ng process ensures tha i th e e rrors are ind ependentl y distributed fro m one observa tion 10 the next.326 CHAPUR 9 Aueuing Studies Based on Multiple Regres~ion There are two main reasons for inconsisten( standard errors: improperly han dled beteroskedastici ty and correlat ion of the error term across observations. Heteroskedasticity. If th e omitted varia bles th at const itute the regression error are persistent (like district demographics). In many C(lses. setlin g~. but it does violate the second least squares assumr lion in Key Concept 6. the n this induces 'serial"correlalion in the regression error over time. sian wiLh panel da ta) and in Chapter 15 (regression with time series dat a). biased or inconsistent. fo r example. Key Concepl 9..a tjons is whe n sa mpling is based on a geographical unilo If there are omilled \'ari abies that reflect geographic influences. He teroskedasticityrobust standard errors are provided as an oplion in modern software packages. {or histories1 reasons some regression software report homosked3slicityonly standard errors. We provide such a fo rmula for comp uting standard e rr o r~ thaI are robust 10 both heteroskedasticity and se rial conelation in Cha pter 10 (regrc:s. the same school district for different years.are incorrect in the sense tba t they do not produce confidence intervals wit h the desired confidence level. Somet imes.robust standard errors and to construct FSt3tistics using a he teroskednsticityrobusl variance estimator.: regression ~rror is beteroskedaslic. sa mpling is onl y partially random. for bypOlhcsis tests and confidence intervals..: regression study.
When regression models are u ~ ed for forecasting. the discussion of multiple regression ana lysis has focus ed on the esti mation of causal effeCIs. If the variables arc not independent acro~ observa. how ever.3 Internal and External Validity When the Regression Is Used for Forecasting Up to now. concerns aboul external validity are very import ant. which in tum means that the OLS estimator is biased and '* inconsislCllt. Using Regression Models for Forecasting Cbapler 4 began by considering the problem of a school superintenden t who waDIS to know how much test scores would increase if she reduced class sizes in her . Apply ing this list o f threats to a multiple regression study provides a system atic way to assess the internal validity of tbat study. Re gr~s i on modds can be used for other purposes.9. E(Ui IXli"' " Xk . if presen l.tions. results in f& il ure of the first teast squares assumption. Simultaneous causality Each of these. 9.7 V THREATS TO THE INTERNAL VALIDITY OF A M ULTIPLE REGRESSION STUDY There are five prim ary threats to the internal validity of a multiple regression ~ KEYCONCEPT 9 . O mi tted variables 2.ts are not. including forecasting .3 Internol and External Validity When the Reg renion 15 used for Forecasting 3'2. Errorsinvariables (measurement error in the regressors) 4. Sam ple se lection S. rncorreci calculation of the standa rd errors also poses a threat to internal validity.) 0.. but concerns about unbiased estimation of causal effe.. as can arise in panel and time series data.7 1 study: l. Funct ional form misspccificalion 3. Homoskedastidtyonly standard errors are invalid if heteroskedaslicily is present. then a further adjustment LO the standard error for mula is needed to obtain valid sta ndartl errors.
S size. Now consider a different problem. Thai is.d tests.. however. TIle pareot interesled in Corecasting lest sco res does no t ca re whe the r the codficie nt in Equation (9. the pan:nt simply wants the regression to explain mueh 01 the varirll ion in test scores across districts and 10 be stable.e.".. Assessing the Validity of Regression Models for Forecasting Because the superin tendent's problem and the pl ren l's probli".J! bow well the different districts perfo rm on s tandardized tests based on a lim ih:d amount of information. elas!) size is not the only dete rmin ant of lest perfo rmance.328 CHAPTER 9 Aue55ing Studies Based on Multiple Regression school district lhat is. Equation (9.tha t is.. the parenfs problem is 10 forecast average tl!~t scores in a given district based on information re lated to lest scores. that rest score dala are not available (perhaps the) are confident ial) but data on d ass sizes arc.2R x STR .the requirements for the validity of the regression are different fof . the parent mu. More ge nerally. Although omitted va riabk hias renders Equation (1).CI plans ( 0 choose wh\!re ( 0 live based in part on the qua(jty o r the local schools. t h ~ superintendent wants to k.now the ca usal effect on tC~t scores of a change in cia5. Accordingly.usts.ressioD models [or fo recasting.evcn tf th~]r coefficients ha ve no causal interpretation.sl gu e~.' dis tricts to which the parent is considering moving. 5) We concluded that th is regressio n is no t useful fo r the superintendent :Th e OLS estim ator of th e slope is hiased beca use of omitted varia bles such as the composi· tion of the stud.teRcher rati o (STR ) (rom Chapter 4: TesrScnre = 698. hut from the parelll's perspel. To be sure. In th is situat ion.5) could he useful to the pare nt trying to choose a home. Suppose.nt body and SlUdents' other learning opportunities o utside school. regressio n models can produce reliable fore<.2. Olaplcrs 48 focused on using regrc.5) useless fo r an swering the causa l qu~ s tion .in panicu. Thl:: pa rent would like to know how different school districts perform on standardl/\. Rather. sion analysis 10 estimate causa l effects using observational data. (l). How can the paren t make this forecast? Recall the regression of test Scores on the stude nt. A parent moving (0 a metropolitan art. 10 apply to thl. it still ean be usc· ful for forecast ing purposes. lar. class si7. This reoogn ition underli es much ot 1M use ot" rcg.:tive what matler5 is whether it is a reliahle predict or of tesLperformance.'m are conceptu(lU~ very differcn l. Nevertheless.5) estimate!.9 . the ca usal effect on test ')C(lr~S of class size.
we return 10 the probkm of assessing the validity of <I regression model for forecasting. Both the Massachusetts and Ca lifornhl tests are broad measures ofslu dent knowledge and academic skills.7 .S. availa ble. In contrast. When a regrl:ssion model is used for forecasting. the (. I n Part IV. 9 . we con sider whether Ihe results can be ge nemli7cd to perrormance andardi zed tests in other el eme nt ary pu blic school distri cts in the o n o 'h ~r SL U nited Stal es. although the detaib dirfer. In this seetlon. ln the case o f test scores and class size. lbus. e leme ntary school di.ation is made.»\i matcd regression musl have good explanatory power. fi nding si mil ar resullS about the effect of the studen t. the orga nization of classroom instruction is broadly simila r at the elementary schoollc\el in the two states (as it is in most U. based on standard· ized les.that is. Conversely. results for fou rth grade rs in 220 puhl ic school distric ts in Massachusetts in 1998. we mus t add ress the threats to internal "alidity summarized in Key Concept 9. and il must be stable in the ~ nse that the regression esti mated on o ne set of data can be reliably used 10 make forecasts using ot her data. its coeffici ents must be estimated precisely. fi nding diffe re nt results in the two stales would ra ise ques ti ons aboulthe internal or c:= ternal validity of at least one o f the srudie!'l.9. we exami ne a dirferen t data set.a lthough aspects of elementary school fund ing and curriculum differ.1 nOled Ihat ha ving more than one study on the same topic provides an opportun ity to asseSS the exte rnal va litlity of both stud ie~ by comparing their rcsult!'l.!>lricts). To obtain credible eSlimates of causal effects. if we are to obtain reliable forecasts. in fac t. a paramount concern is lhat the model is ex ternally valid.teache r ratio on test perfo rmance in the Califor nia and Mas!\aehusens dilta would be evidence o f externa l validi ty of the fi ndings in Californ ia.4 Example: Test Scores and Class Size The uamcwork of internal and external \alidity helps us 10 take <I critical look at what we have learnedand what we ha\'e notfrom o ur analysis of the Ca)jfor nia t~s t score d ata. in the se. future va lues of time series dala. External Val idity Whether the Calirornia analysis can be gene ra lized. Section 9. Similarly.4 Example: TMI Scores and dau Size 329 their rcspcclive problems. Here.nse that it is stabk and quanti tatively ap plicable 10 the eirClIlllstancc in which the forecast is made. • . whelher il is e~te r nally va !iddepends on the populalion and ~I! tl in g to which the generalir. other COm para hIe data sets : nc.
me is grea ter in Cali fo rnia.1 I 19 18. it is in teresting to examine the relationship between T eST ~co r es lind avcr:lg.330 CHAPTE R 9 Assessing Studies Based on Multiple Regression TABU 9...2 for lhe C. is given in Append ix 9. the Massachuseu s data are althe school district leve l.lli fornia data.3% $I K747 220 199.6 15.c district in come in Massach usetts.. including definitions of the variables. Cubic and logari thm ic regr~" sion functions 3rt: also plou ed in Figure 9. Test scores and average distrt'ct income.7% 19. To save s p ac~. but the test is dif£erent. The average studentIeacher rat io is higher in Califomia (19.1 '0  .lifolll l<t data:The relal io nsh.9% 15.1 15. or nearly so. 2. that ill. Stondotd~ MauochuH1t.. The average perce nt age of students still learning Eng1i~h and th e ave rage perce ntage of s tudents rece iving subsidized lunches arc both much highe r in the Ca lifornia than in the Massachusetts d istricts.1 % % English learners 1. Ave rage distr ic t income is 20% higher in Massachusem" but the standard deviation of inco. Because il was 3 focus in Chaptcr 8.etts sam ples.'arit y.8 173 I ~ Standard Oeviot>o" Test scor~s StudeDtt ~. 7011.6 versus 17. Like the C'.8% 44. the re is 3 greater spread in average district inco mes in California than in Massach usetts. we do not pre )~nl sc3 tre rplots of all the Massachuseus data.1% $1 5.1 Summary StatistKs for CalifornKJ and Massochusetts Test Score Data Sets Colifornia A. Evidently. A_. Table 9.. The genera l pattern of Ihisscatterplot is similar to that in Figure &. The definit ions of the variables in lhe Massachusett s dala set arc the s!l me as those in the Cali fornia data se l. Th is ~ca tt crplot is presented in Figure 9.. so a direct comparison o f scores is not appropriat ~. 1.. 1 presents summary statistics fo r the California and Massachu:. The average test &core is higher in Massachuseus.. 2.1. The cubic regression function hJ ~ 8 . 1h e linear regression plOl' ted in the figure misses this apparent nonli m..1.317 420 1999 S7226  SS""" Comparison ofthe California and Massachusetts data .ip be1ween income and lest scores appears to be steep fo r 10\\ va lues of income and na lter for high values. More inform ation on the Mas sac hu s ~1lS data ~e t.3).:lchcr rat io  654.3 % Re ceiving lunc h su~idy Ave rage district income ($) Number o( obscrvlltion~ Year 15. ho\\ ever.1 19.3% 27.
Income fo.72 /050 :: .3. reported in column (1) in the table. 6&Ji r. also prescnt in the Massa chu se lh dfll3 .1 s how~ th at the ge nera l pallern o f nonl inearify found in the California income and (.Massachusetts Dota 78il The esti mated li rlCOr otiol'l r l«ar rrgression regreu lon function does no! capture the nonli near relotion between income and te~t scOte$ in the Mos$OChur. similer for district inc:omes between $13. however.2 than Ihe log<l ril bmic specificatio n (0.xample: Tesl5cofM and Clo~~ Size 33 1 FIGURE 9.e~t score data i:. The (jrst regressio n. with the cubic specification fin ing bes t in Massachusetts but lhe linear· log speciIicalion fitting best in Californ ia.. • .. ho W> Falifo rniJ ep for 10 \\ r pl(l~> ~IC regrc~> ilion ha~· • .64 in regression (3)..44). t.'. iOn paverJgl! [r S.e1ts dalo. 7<)(' l " . ".000 ond $30.Tht.0. hls o nly the stude ntteacher ratio as a regressor. the pe rcenlllge of students eligible for a free lunch .wf of the ~~ '~I e Cali· nitions c Cali· la ta SCi. huselt~ n 10 20 .. ". Controlling for the percentage of English learners. ~ .1. The estimated lineorIog and cubic regreuion functions or. from . Regression result s for the Massachusetts data are presented inlable 9.486 ve rsus 0. a nd average district income reduces the estimated cocfficienl on the student.69 in regressio n (2) and .L "''''.. the region conlolnll'9 mos! ob~I'YOIion$. The precise functiona l forms that best describe this non li nearity differ. The remainingoolumns report the results of including additional variables that control for student characteri stics and uf imroducing nonlinearities into the esti mated regression function.000.7 and 9.4 E. Comparing Figures 8.. and the hypOthesis that the coeffici ent is zero can be rejected at the I % signifi cance level tr :: .2.Ieacher ratio by 60% .9. T he slope is ncgalive (1.1 Test Scores V$. 7 ~t ~ 74() ~ " 710 f .1.72 in regr~s i on (1) 10 0.72). UbK regJ '... luc both Multiple regression results.. I() 50 Dim·jet income L _____________________________________ (Thousands of do Uars) rferent. rpresc nt F9.455) .cacher i~ 20% in Cali liforoi3 EngJ~h slightl y higher R. ""'•.
teacher rati o : The F· statislic if) regression (4) testing whether the population coefficients OIl STR2 and STR J are zero ha s a p.50) 0.l.2 Multiple Regression Estimates of the Sl\.7~ * (0.0.8) S£ R' . 220 obHrvorions.teacher ratio has a di[fere nl effect in districts wi th many En glish learners .4" (11.0.0023* (0.72) .IdentTeacher Ratio and Test Scores: Data from Mos5Q(husetts .091) Dislrict income (logarithm) District income District incorne l .value of 0..O .64* (0.085) .306) .641..·ut (0.53** (3. /li STR·' % English learn ers % English learne rs > median? (Binary.6 (9.303) .49) 0.582" (0." "ore in tne f(hool district.184* (0.37) 0.0023· (0.104) .5) 0.22 (2.434 (0 .0.521'" (0.. ..6&1 (0.0) . .a t isticall y significant evidence o f a nonlinear rela· tionsh ip between test scores and the st ude nt.0.4· ~  0 .3) 759.4 (1 4. even holding consta nt the :.0" (21.S5) 0. there is no evidence that a reduction in I h~ student. 332 CHAPTER.67* (027) s· '" .9" (23. Regressor (1 ) I' ) 0.2) (20.077) 16.090) 3. A ") 16) OependenT voriable: average combined English.56) .3.5· · (81.i2 u (0. fUll } (0.0010) 665.0010) District inwrn c. moth .0010) 7 4 7 .165 (O.1 74 (0.1) { TaMe 9.1 cm:ww rJi Comp<l ring the :R2's o f regressions (2) and (3) indicates thai the cubic specifi · cation (3) provid es a better mode l of the re lat ionship betwee n test scores and income than does (he logari thmic specificati.587" (0.31 ) 0.87· (2.6 (8.on (2).0.0. There is no st.0022 (0. 739..097) . O.164 (0.013) .12..1.[U · de ntteache r ratio. fourth grade.OS9) 0.80 (0.l 07 (2. HiE L ) o. and KierK.27) 12.27) (3 ) (4 ) Studentteacher ratio (STR) STR2 . q Aueuing Studie~ Bo~ on Multiple Regres~ion TABLE 9.o53 u (0. = Hi£L x 5TR % E ligi ble foc free lunch 0.02" (0.l Intercep.3. 437 (0. Similarly.15) .300) .3)  0.49) .CXHO) 744..0.737) '.35) 0..38 (2.6) 682.69* (0.0022· (o.
74 7. W· Comparison ofMassachusetts and Call/ornia results.86 (0.674 R' 0.001)  «0.80/0. Sland~t<.odficicnts.leache r ra lio based on the Massachusetts dat a. 2.75 « 0. (3 ) .1 rol1l.Statistics Clnd pValue.teacher ra tio does not change substanti ally when the percentage of English lea rners (which is insignifica nt in regression (3)1 is excluded.003) 1.0_73 ITable 8. Testing Exclusion of Groups of VClrioble.55 (0.4 .a ry :scboo] dl~lncl~ dncritx<.T ics.EL S£I< 7. The effect of cutting th e student.2.676 8.01 '" . . regression (2)1.Ieacher Tfltio is nonlinear.01 % le vel. Therefore we adopt regression (3) as o ur bafie esti male of Ih~ effect in test scores of a change in the siudent.038) 0. 2. Indi.9.. Adding variables Ihat con lrol for siude nibackgrou. There is some evidence that the relationship between test scores and the stu dent.l in Append»:: 9. Ihe res ulls in regression (3) are not sensitive to the changes in functional form and specification considered in regressions (4){6) in Table 9. 0 (0. even after adding variables tha t con trol (or Siude nt backgrou nd a nd diSlrict econo mic c haracleristics.. (I I All 5TH.l~~ ar~ given in parenTheses under th~ F.2.64 0.nd cha racle ristics red uced the coeffi cienl on the studentteilcher ratio from . 1.20S) 6.61 0.64 1) 4. are gwen in parentlle!>C's unde r the .2.3.63 0.~dual "'odfincrm arc Stallst lC4lly significanl al th e '5% level or . ~nd p'~' all..5) ~HO) I "~ tban wilh few (the Istalisticon HiEL x 5TR in regr~i on (5) is 0.teacher ratio did not depend in an important way on the percentage of English learn ers in Ihe distri cL 4. For the Ca lifo rnia data.'w~d) . variables and lIl t ~ract i on s .69  8. .020) STR!. .5S (O. The hypothesis that the true coefficie nt on the studentteacher ratio is zero was rejected at The 1 % significance level.001) X 5. In shOJi.431 Finally. a reduction of 68%. in n rela 3.'> • ..28 [Table 7.675 8. n!im~t~d USlIl! th« data on MassacbuseU$ e1emclll.:.64 8. 1. '_g1 are 1 in InC !arner. STRJ '" 0 Jllcomr.56 = 1. we found: I) " IHIII(d) J.62 8.002) STR 14. regression (1)] to .670 0..85 (0. regression (6) shows that the estimated coefficient on the student. H. pt:!cifi es and he stU i ~tic.063 0. 13 ~~ . income' JIiEL.45 (0.l errQ n.4 &omple: Tesl Scores ond Cion Size 333 ( TQh~ 9.675 TIwlot regressions wer".Tatt.
Because the two standardized testS are different. Mass1 Lmea~ Slandari .tandard deviation units. ru  catif nca CubiC" Rt'I/m Cubic: R. th e coeffi cients th emsi..2..69 [Tank 9. however. however.3. As io the Ca lifo rnia da ta. Includi ng the additional control variables reduces the coefficient on the student. divided by the standard deviation o f the tesi.'6/ 19. lliis comparison is undertaken in Ta ble 9. and second in ... divided by tht: 5lim· dard dc"i(l lion o f lest scores. The coefficien ts on the studcnt.7.h. the n the estimated class size effects can be compared.1 pomt. Beca use the standard de viation of test scores is 19. However. 'Tnu5 the coefficie nt on the studen t. O ne way to do th is tS to transform th e test scort:s by standardizing Ihem: Subtract the sample a\'crag.lbe standard error of this estimate is 0.ame unit s. The seco nd column report s the s ta nd ard deviation of the test sco res across districts.teache r ratio. ca n he cOOl'p<lJe<. the answer is } c$.teacher ratio from 1 .46 poin ts.. liw O LS coefficie nt est imate using C. first in the umts of the test. The slope cocfucienlS in the regression with the transformed test score equal the slope coefficie nts in the origi nal regressio o.!h·c~ cannot be compared directly: One point on the Massachuse tts test is not the s<lme as one point on the California lest. regression (2)1.26 X 2 / 19. The fi na l t\HJ columns report the estimated effect on test scores of red ucing the student._ lish learne rs in the d istrict.teacher ratio and Ihe binary v. so it is no t surprising that the Califor lllil cstimates are morc precise.aliforn ia dat a is . there is no statistically significant evide nce in the MassachuSCIIS da ta o f an inte raction bet wee n the studen t.c and divide by the standard deviation so thattbcy have a mea n of Uand a varianct: of 1..1 r TAB l. a Jeduction of 60%.. whereas they a r~ ').73. this corresponds to 1.teacher ratio by two students per teacher (our s uperintenden t's proposal). Those cocfficientl) arc only significant at the 5% kvel in the Massachuse l1s da ta. does oo t hold up in the Milssachusetts d ata: Th e h~ path· esis that the relationship betwee n the s tudelltt e~c h e r ratio and test sco reS is lin ear can not be rejected at the 5% sig nifi cance level whe n tested aga in st a cuhic specification. so cutt ing the st udent.teachl" ra tio re main signific:::a nt aft e r adding the eo nlrol variables. the lest scores are pur into the .72 [Table 9.334 CHAPTU 9 As~ssing Studies Based on Multiple Regression Do we fi nd the same things in MaSSlichusem? For fi ndings (1 ). If.1 X ( 2) = 1. the percentage of students e ligible fo r a free lu nr.!!" nificant at the 1% leve l in th e California data . regression (I )Ito 0..0.2.niable indicati ng a large percentage of Eng. s estimated to increase district test scores by _0. For the linear specification. and (3). there arc nearly t". Fi nding (4).076 standard deviations of the distribulton vi test scores across districts.d!/(. (2).teacher ratio by two. and the average district income incl uded as conlro l variables...1 = 0.11)c fi rst column re ports the OLS est ima tes of the coefficient on the studentteache r ratio in a regression with the perce ntage of English learne rs.l across the two data sets. i c~ ilS ma ny observations in the Califo rnia data.
1 j 1.26) 19. so:.2 / 19.1 19. Cutt ing the student. Based on the Massachusctts data. In the Cal ifornia data. while nonzero.9. arc small.076 standard deviation unit .. 0.. this c:slimaled dfecl is 0. with a standard error of 0. a red uction of two students est scores by 0.027.soc hu Mtt$ 19.lble 8 3(7) RtJII('t oS (R Irom 22 11120 Mo.lnd ..teacher ra lio by two is a large change for a district. but the predicted improvement is sma ll.3(2) .52) 1 0.4 Example: Test Scores and dau Size 335 TABLE 9.2H 0. with the specific effect depend ing on the in itial stu dent.thle 8..teac he r ratio.46 (0.' Deviotton* unear:Table 8. but the estim ated benefits shown in Ta ble 9. The estimated eff~c ls for the non linea r mode ls a nd thejr Slanda rd errOrs w~re computed using the method described in Section 8. fo r example. 1) standard deviations.54) !:>t. TIle non linea r models fo r Californ ia data suggest a somewhat larger effect.0.ralio by twO would move a district on ly one ·tent h of the way from the med ian to the 75 th percentile of the distribUlion of test scores across d istricts.027) 0 153 (0. or 0.·en In r arcnlh .037) ruN.085 (OJI36) to. r Standard Pain'.027. l.teacher ratio is predicted to raise tcS! scores. Based on the lin ea r model using Ca lifornia data.: T. on the Te.w~ Stvdenf$ pet' TeeKher.64 ( = 12.. the differe nce in test scores bet\\cen the medi an district and a diSlricl a t the 75 1h pe rcentile is 12.27) 15.69) 1 0.085 standard deviation uni t. In Units of: OLS Estimate 8Jr. Ratios and Test Scores: Comparing the Estimates from California and Mossochusetts htimoted Effed of T_ F.1 1. nJ ~nQn Ire &I. CaHtom io Standard Devtarion of T. Reducing the student.according to this esti mate.s' Stores Across Dislri<'. per T eacher is es lim at~d to increase T wilh B standard erro r ofO.3 StudentTeach.93 (0.036. in other words.1).2(3) .64 (0.0. The estimuted effect from the linear modd is just over oneIent h Ihis size. CUll ing the Slu dent tcacher.099 (0.73 (11.3(7) Rt rillCt srR from 2f)w 18 Cui"llc·1'. .1 2.076 (0.2 test score points (Table 4.70) 190 (0.036) Lme!lrTahlc 9. These estimates arc essen tially the samc.3.
We consIder these threats in turn. lO conduct an experime n1. such as other school Hnd student charac teristics. diSlriCIS with a low student teacher rat io migh t also offer man y extracurricular learning opportunities. A[~o.: did nOt substantially ah d the estimated effect of reducing the studcntteacher ratio. districts W iUl a low studentteacher ratio might attract fami.io n specificatioos.. We found that some of th e possible nonl incarities investigated were not statistically significant. at least when generalized to ekmenw ry school districts elsewherc in the United States.) Ihe coe ffi cie nt o n the ~lUdentt eac her ralio. and if t ea ch~r quality a {fecLS test scores.. at least in theory. and we examine it in Chap ter 13. if the studenHeacher ratio is correlated wit h teacher quality (perhaps beca use better teache rs are at tracted to schoob with smaller stude nt. Omitted variables.n the estimated effect on test scores on class size. while those th at wen. Similarl y.336 CHAPTER 9 Assessing Studies Based on Multiple Reg r~sion This ana lysis of Ma~sachusetts data suggests that the California results <Ire externally valid.2 listed five possible th reats to internal validity that could induce bias i.: m<liu findin gs of these studies are unlikely 10 be sensitive to using dirferent nonlinear regrc. this suggests tha t tho. i!. O ne way to eli minate omitted variable bias. students could be randoml y aSSigned to different siu classes. The analysis here and in Chapter 8 exp lored a varict) of hmct ional forms. Altho ugh furthcr rune' tion<l l fo rm ana lysis could be carried oul. Possible omitted variables remain. Such a study was in fact conducted in Tennessee. Section 9. Functional form . Such omi ued factors could lead 10 om il1ed variable bias. Internal Validity T1le sim ilarity of the resulls (or California and Massach usetts does not ensure their imemal va lidity. and their subsequent performance on standardi zed tests could be compared.then omission ot teache r quality could bia'. .teache r Hlli05). TIle multiple regressions reported in this and previ ou~ chapters control for a student characterist ic (the percent age of English learn ers ). For example. and their omission might cause omitted variables hias.. a family economic characteristic (the percentage ot students receiving a subsidi/t"d Il1nch) .1ies that are more com mi11ed to enhancing their children's learni ng at ho me.. For example.. and a broader measure of the affluence of the di suict (average district income).
Another variable with pote·nt ial measure ment error is average district incume. ct an size d be it in All o f lhc results repon ed here and in ea rl ier chapters use heteroskedasticrobust standard errors. Simultaneous causality. a series of court cases led to some equalization o f funding.. Those data wen~ taken from the 1990 census. Altbough there afe alt er nat ive standard error formulas th at could be applied to this sil u<l tion... '~om mance on standardized l es t~ affected the studentteacher ratio. This could bap· pen. al or Th e average studentteache r rati o in the distri ct is a broad and potentially inaccurate measure o ( class size.9. t. while the ot her data pertain to 1998 (Massachusclls) or 1999 (Califo rnia ). however.ng the time of these tests.:.'on to believe that sample se lection is a problem here. for example.. which in tum resulted in hir ing more teachers. the det ails are compl ica ted and specialized and we leave them to more advanced lex ts. u<. this would be an imprecise me<lsure of the actual average dislric1 income. in neither Massachusells nor Calirornia does simultaneous causality appear 10 be a problem. il1ed ~ fun c' Ings 01 ~essi o n Discussion and Implications The sim ilarity between l h~' Massachusetts and Cali fornia results suggest that these studies are externally valid. {n Massachusetts. because stu dents move in and OUI of districts. which in turn could lead to the estimated class size effect being biased towflrd zero. The California and the Massachusetts data cover a ll the puhlic ele mentary school dist ricts in the slate th aI satisfy minimum size restrictions. alter Iyof g. Thus. if there is a bureaucratic or political mechanism for increasing the fun ding of poorly pc rform ing schoub or districts. In California . Simultaneous ca usa lit y would arise if the perfo r ac the tter d if Ihe nl Iso. Correla tion of the error term across ob~ e r va li ons. so heleroskedasticit y does 0 01 Ihreaten in ternal va lidi ty. ~o th ere is no rea.l istri butjo n of funds was not based o n student achieveme nt . Errorsin ~. . ro r exam ple. :ould Heteroskedasticity and correlation ofthe error term across observations. could threaten Iheconsislency of the standard errors because simple random sampling was not used (t he sample cons ists of all elementary scJwol districts in the st ate) . s).4 C Example: Test xares and Closs Size 337 e . no such mechanism for cqualization of school fin ancing was in place duri. hut Ihis rel.ariables. II" the economic composition o f the district cbanged substantially over the 199Os. in th e sense thai the main find ings can be generalized to perform:lnce on standa rd ized tests at oth er elemen tary school districts in the Uniled Stales..the stude ntteacher ra lio might not accurately represen t the actual class sizes experienced by the students taking the lest. ed CI Selection.
Th e benefits include improved academic per formance . Thrc. and district affluence.nd Willms (200h. . CUlling Ihe student. but it is quite smaiL TIlls sm<lll eslim atcd effect is in li ne with the results o f the ma.. misspecific<1tiOn of funct ional form (nonlinearities). GaIDOran.. SLiIl. A leading candidall' is omitted va riable bias. some pote nti al threats to internal vali dity remain ..~ by "hrcnberg. she will need to wei gh the COStS of th¢ proposed reducti on against the bendit s. In making this dec ision. Brewe r. perhaps arising because the control variables do nut capture o ther characte ristics of the school districts or extracurricular learning opportunities.5 Conclusion The concepls of int ernal and external validi ty provide a framework for assessing what has been learned fr om an econometric study. we arc able to answer the superintendent 's quest ion fr om Section 4. and if standard errors 3re consisten t. imprecise measurement o f t he inde pend~nt ~J r you arc mterestro in learning more ahoulthe reilltion~hi p bel"'ccn cl a~ Site and Ic~l ~or~j. '\<'. and after mode ling nonljnea rilies in Ihe reg ression functi on.1 : After controlling for fam il y economic backg round. 2OO1b).Ieacher raLio by two students per teache r is predicled to increase lest scores by approxi malet)' 0. The costs in clude leacher salaries and expenses for additional classrooms. A stu dy based on multiple regression is inlernally valid if the estimated cod fid ent s are unbi ased and consistent .• 338 CHAPTER 9 Assessing Studies Based 00 Multiple Regression Some o f the most important potenli al threats to internal validi ty have been addressed by cOntrollin g for student background.: the revie. 3. and district afnuence. family economic back ground.ny stud ies that have investi ga ted the effects on test scores o f cla ss size reductions. which wt have measured by performa nce on standardized l e<. The eSlimated effeci of Ihe proposal o n sta ndardized lest pe rfo rmance is one imporlan l input into her ca lcu lation of costs and be ne fil s..08 sta ndard deviatio n o f Ihe dis tri bu tion of teSI scores across districts.lIs to the internal validity of such a study include omitted vari ables.t~ hUI there are o ther poten tial benefit s that we have not studied . includ ing lo\\er dropou t rates a nd enhanced future earni ngs. student characte ristics. Based on both th e California and the Massachuse liS data . 4 Th e superin tendent can now use lhis estimate to help her decide whether 10 reduce class ~izes. and by checking fo r oonlinearities in the regression funl"' tion. Thise ffeci is statistica lly significant. 9.
con fid ence int ervals and hypot hesis tests are not valid when the standard errors are incorrect. Second. one o r mOre of the regresso1'5 is measured wi th erro l'. Whether or no l the re a re two o r mo re such studies. o r if they are het eroskedaslic but the standard errors are comput ed using the homoskedasticilY o nly fo rmula. The next {Wo p. A study is externally valid if its infe rences and conclusions can be genera lized fro m the population and selling studied to o ther populations and sellings. Regressors and error terms roay be correlated when there are omitted variables. an incorrect functional form is used . Some times il can help to compare two or more studies on the sa me topic.r~ ss ion estimation of causal effects. A study is inte mally valid if the sta tistical in ferences about causa l effects are valid fo r the popUlation being :. which in tum makes OLS estimators biased and inconsiste nt. A study using regression ana lysis. In reg. These latter problems ca n be addressed by computing the stao· dard errors properly. o r there is sim ultaneo us causa lit y between the regressors and dependent variables. I'" . like any statistica l study. Each of these introduces correlation between the regressor and the error term. g fIS r. is ext ernally valid if its find ings can be generalized beyond the population and se lling studied . then internal validity is compromised because the standard errors wi l\ be inconsistent. 3.ms of this textbook develop ways 10 address threats to inter· na l valklity that canno t be mitigated by multiple regression analysis alone. Summary 1.t udied. OLS estimators will be inconsistent if the regressors and error terms are correlated . the sam ple is chose n oo nrandomly fro m the population . First. which are causal eUects th~H vary " I . however. there are two types of threats to inter· nal valid ity. Part III exte nds the multiple regressio n mode l in ways designed to mitigate all fi ve sources of potential bias in the OLS estimator.Summary 339 variables (errorsinvariables). 2.f over time. If tbe e rrors arc correlated across observation s. randomized controlled experimenlS. Statistical studies are evaluati!d by asking whe ther the analysis is inte rnally and externally val id. as they can be with time series data. Part IV deve lops me l hods for an alyzing time series da ta and fo r using lime series data 10 estimate socalled dynam ic causal effects. and simultaneous ca usalit y. assessing external validity requi res making judgments about thesimila rities of the population nod selling studit:d and the population and st: Hing to which the results are being generalized. Part JJI also discusses a diffe re nt approach to obtaining intern al validity. sample selection .
The seco nd uses the heteroskedasticity·robust formu la. il is not neces'iury for th.: b~ using citylevel da ta.6 A researcher esti mates II regression using two diffe rent softwar e packages. pUler software uses the bo moskcdasticiryonly stand ard errors. thallhe regressio n model be exte rn ally va lid (or the forecastin g applica tion at hand. 9.I. When regression models are used solely for fo recasting.5 A re. The first uses the homoskedasticityonly form ula for sta ndard errors. o r when the errOt term is correlated across differen t obse rvations.4 Suppose that a Slate offered vo lunta ry standardized lests to a ll of its thlrJ graders.3 Economic variables a re oft e n measure d wil h e rro r. !. The s tandard e rrors . how ewr.1 \\'hat is the difference between internal and external validity? Be lwcellihe populal ioo Sl udie d a nd the populatio n o f int e resl? 9. Standard errors arc incorrect when the errors are heteroskcdastic and the Com. Explain how sample se lection bias might inva lidate the resulls. and these data were used in a s tudy of class size on studem ~rr<1T' mance. It is critical. 5.<. regression coefficie nts to be unbiased esLinHlIes of causal e(fects. 9.carche r estimates the t: [fecl o n crime rales of spending o n polic..340 CHAPTER 9 Auening Studies Based on Mukip\e lI:egreuion • 4..2 Key Concept 9.2 describes the problem of vnria ble selectio n in t e rm ~ 01 a tradeoff belwee n bias nnd varia nce. Whicb should the researcher use? Why? . Explain how simul taneous causality migh t in vnlidJle tlw results.H" very different. Wha l is this tradeoff? Why could incltld· ing an additional regressor decrease bias? iocrease variance? 9. Key Terms internal vatidity (3l3) external vl.tlidity (313) population ~ Iudied (313) popul at ion o ( inte rest (313) setting (315) functional form m isspccificario n (311) errors· inva ria ble bklS (3 19) sample se lection bias (322) simuhaneous c:lU sality (324) simult aneous eq uati ons bias (325) Review the Concepts 9. D oes Ihis mean thai regression analysis is unreliahlc ? E xpla in .
9. Exp lain how sample se lectio n m ight be the cause of this result. Show that the regressio n V. and independe nt of Y.2 Consider the onevariable regression model: Y. + Wj' b.construct a table li ke Table 9. Th ey found thai wome n with mo re ch ildre n had higher wages. Y.1 Suppose that you have just read a careful sta tistica l stud y of the effect of adver1ising on the de mand for cigarelles.Ld .ure me nt e rro r which is i. Suppose that Yj is Y j + 11'. and so fOrlh).2. educa tio n. .sa lisfies the assumpt ions in Key Concept 4. Evalu ate these sta teme nt s: "Measure me nt erro r in the X's is a se rious proble m.) 9. = 130 f3IX.ubways was more effective than print advert ising.!ing empirical result. (Assume tha t W j is inde penden t o f Yj a nd Xj (or all values of i a ndj and has a finite fourth mo rne n!. .3 Labor econo mi sts stud ying the determ inant S of women's earnings discov ered a puv.conlroWng for these o lher fac tors. occ upation. ( Him: Notice that the sample includes only women who a re working. Usi ng data from New York during the 1970s. T = = a. so that the data arc Y. :... = f3u + f3LX.) (This empir ical pu zzle mOlivated James H eckman's research on sample selection t hat led tu his 2000 Nobel Prize in eco nomics. Measure. and Xi' Conside r the population rcgre ssio(l V i f3u + f31X i + v" ~here V.E.) c.3 to compare the estimated effects of (l to Ol" increase in distri ct income o n test scores in California a nd Massa · chuse tts. is the mea· measured with error..3. Can confidence inte rva ls be constructed in t}le usual way? e. nn d sup· pose that il satisfies the assumption in KeX Concept 4.· + v. New York in 2006. it concl uded th a t ad vertising on bu:. Using ra ndomly selected employed women. Use the concept of external validity to deter· mine if these results are likely to apply to B ~ t on in The 1970s: ~ Angeles in the 19705. is the regressio n e rro r us ing the mismeasured depende nt variable.3.ment error in Y is Dol : ' 9. they regressed ea rnings o n the women's number of children and a set of con tro l va ri ables (age.es a nd :.4 Using the (egressions shown in colunm (2) o fT:i bk s'3 a nd column (2) of Table ~. Show tha t Vi = It.x.erciSM 34 1 Exercises 9. + II. Are the OLS estimators consb ltnl? d. whe re w.
c.l[ de term.In e rro r when he e nters the d a ta into his regression program: H e ente~ eiu. fHint: Use Eq ua tio ns (4.ine dem a nd.. R' = 0.. A researche r uses the slo pe of Ihis regressio n as an estimate of the s lope of the demand function (f3d. where II de no te!.. Pdenates price. A no the r researche r is interested in (he same regressio n.l .2) 15.) 9. P.1. Use yO U( a nswe rs to (b) a. so he has 200 observalio ns (with observati on I entered twice. Solve the twO simulta neous e quations 10 show how Q a nd P de rcnd o n!1 and v. H. and t he covariance between Q and P. and Q. observations for (Y"X. i. Using these 200 o bservations. De ri ve the m ea ns of P and Q. Suppos..5 The demand fo r a commodity is given by Q = {Jo + {3I P + /I . Qi is th e regressa nd a nd P.'X. Deri .) y == _+ _ X. J ii. a.: lh at LI a nd v bo th have a mean of ze ro. .: regressor.l (..7) a nd (4. (. observatio n 2 e nlered twice. Y. d. i. but he make.) is C Ollected. U~ Ihese to dete rmi ne the regressio n slatistics. and so fort h). fac tors o th e r tha n p rice thai determine supply.ion coe ffic ie nts..1 + 66.) Suppose tha t the sa mple is very la rge.' a n d ~.6 Suppose n :: 100 i. have varia nces u. mOlUa lly uncorrc la led.s of Ya nd X as funcli o o ~ of the ·'cor· rect"' va lues. S upply fo r the commodity is given by Q == "Yo + "YIP . SER = _.:h observation twice. where Q denOTes quantity.nd (c) to de rive values of Ihe regres:. variances and covarianct..81. whilt results wi ll be produced by his regression program? (Hint· W rite the '·incorrect'· values of the sam pit" means.32 .R~ = _. and art. A rand om sample uf o bserva tions of (Qj. the variance of Q.d. Is the es timated slope too large o r 100 sm all ? (Him: Use the fac t tha l de mand cu rve~ slope down and supply curves slope up. b. and u de no tes fact ors olher than price th. e the variance o f P..1) (12.342 CHAPTIR 9 As~sing Studies Bo~ on Multiple Regression 9.. is regressed o n Pj' (That is.8). is tb.: s~i o n Y .SER = (1 5.) yie ld the following r~ s ult s : regrl.
11 Read the box "The De mand fo r EconOmics Jo uma ls'· in Section 8. 1 .1(1). "Ao ord inary least squares regressio n of Y on to X will be internally inconsiste nt if X is correlated with the e rro r te rm .1(1).3.3.n E mpirica l E:cefcise 8.1 0 Read the box "The Returns to Educatio n a nd Lhe Gender Gap" in SCClioo 8. 1(b). E mpir ica l E:xe rcise 4. 9. errorsin . "Each of the fi ve primar)' threats to inte rna l validity implit!s that X is correlated with the e rror te rm:' Would the regressio n in Equatio n (9. TIle da ta set C PS92_04 described in Empiocal Exercise 3.18).1inear regression of Te.uScore on Income shown in Figure 8. misspecificatio n of the functional fo rm of the regression. his ~amplc Ie ··cor· b.Empiricol &erci5e~ b.variables. Would either of these regressions provide a reliable estimate of the effect of income o n test scores? L) ion Would e it her of lhese regressions provide a re liable mc thod for forecasting lest scores? Explain." b. 1 10 a nswer Ihe fo ll owing questio ns. Include a discussion of possible om illed variable bias.1 includes data fro m 2004 and 1992. 9.wn 'ession Empirical Exercises £9. Discuss the intem a l and exte rna l valid ity o t the estim ate d e ffect of price per cit atio n on su bscriptio ns. [the large . simultaneous causality. sample selection . Discuss the inte rnal a nd exle rnal va lidi ty of the estima ted effect of ed u catio n o n e a rn ings. useful fo r predi cti ng test sco res in a school d istrict in M assachusettS? Why or why no t? Consider the.5) be.1 Use tht da ta se t CPS04 described in. 343 a. Which (if any) of the interna l validity cond itions are vio lated? Are the following state me nts true o r false? Expla in your ans\\eT . and inconsislency of the OLS standard errors. Use these data to investigate the (temporal) ex terna l validity of the conclusions that you reached . kes an rs each enterl!d " D iscuss th e inte rnal valid ity of the regressio ns th at you used L O answer Empirica l Exercise 8.2 and the nonlinear regression in Equation (8. [Nult~: Reme mber to adjust fo r inflat ion as explained in Empirical Exercise 3.
344 CHAPTE R. The data SC I ColtegeDistance excluded students from western s l atc~: data [ar these student s are included in the data ~e t Coll ege Dis· tanceWest.1 form of the regres:. c rrors·in va ri ables. APPENDIX 9.2 that has served as the basis for several Empirical Exercises in Part II of the lex\. so you mUSl base your recommendations on thl analysis of the datase t TeachingRatings described in Empirical Exercise 4. simullaneous causality. (This is legal as 10tl~ as doing so is blind to race.Thc test is sponsored by the M assachu~elts Department of Ed ucation .3(i) ..3 Use the data set Coliege Dishm ee de scribed in Emp irical answer the foll owi ng Ques ti a n ~ g.3 10 Discuss the iote rnal va lid ity o f the regressio ns that yo u used to answer Empirical Exercise 8. Based on yo ur analysis of these dala.) You do not have time to collecl your own data. t~·1TI (MCAS) lest administered to all fuurt h graders in Massach usclts public sehoob in the spring of 1995. and gender. and inconsistency of the OLS standard errof5. misspecificalion of the function. E9. 9 Asw»sing Studie~ Based on Mu~iple Regreuion E9.1 The Massachusetts Elementary School Testing Data The Massltchusells data arc distrietwide averages for public elementary school dislfld.. about whether your college should take physical appearance into account when hi ring teaching fucuil y. sample selection.ion. religion.3(i). \\Ih<ll is your advice? Justify your advice based on a care(ui and complete assessm ent of t he internal and C\II.. your help hefore reporting to the Dean.' Hl 1998. U~e these data to investigate the (geographic) ext ernal validity of thc conclusions that you reached in E mpirical Exer· ci:"e 8. Exere is~ 4.. as an econometric expen. age. b.2 A committee o n im proving undergraduate teaching at your college need.\lId . Include a d iscussio n of possible omitted vari able bias. The lest score is taken from the Massachusetts Campn:henswe Assessment Sr.:r_ nal val idity oC Iht: regressions that you carried o ut 10 answer the E mrirical Exercises using these data in earlier chapters. The committee seeks your advIce.
:e.al Ing me I Data on the ~ tu dentte3che r ratio.b' :ltiOn aoJ .199S school year. mat h.I(1ll Is in .ion. on the English. and were obtained from the Massachu setts Dep:trlme nl o f Education. Cen~u~ "< k al :\ to weT .. Data on avemge district Income were obtained from the I:~ Ihe 10< 1 ~ 1990 U.chool district for the 1997.The Mau ochuseth ElemenlQry School Testing Dota js 345 IS mandalury for all puhlic schools. The dala ana lp~t"d here nre the ow:rali lotal soon:. and science ponions ofl he test.S. lC ~: d ijst n ct~ III I Sy. which is the sum of the scoret. and Ihe percentage of stude nlS still ieaming English are ave rages for clIch elemen tary :. . lhe percentage of students receivi ng a subsidized lunch.
'i Exp eriments C H A PTE R 12 C II A I'TB R 13 • .~ RT TH REE I Further Topics in Regression Analysis C H A I'TEK 10 C Il A PTE R II Regression with Panel Data Regression with a Binary Dependent Variable Instrumental Va riables Regression Experiments and QUQ.
o r enL.S. 1 descri bes the structure ofpanel data ana introduces Ih" .~UIU addfiJS thi s questio~ using data on traffic fatalities..rivin.!! omilted variable bias.~ . allows e~ ~ ~~ to control for variables Ih al vary through lime. is an extension of muhiple regression that exploits panel data to control for variables that differ across entit ies but lire constanl over time.P _/J.". the)' cannot be included in the regression <lnuthe OLS estimators of the regression coefficients could ha.. however. .. is observed at twO or morc time periods..g.a!!d d.2 and 10. ". Fixed effect s regression is introduced in Sectio ns 10. ~ panel da ta.(ul~ U lti ~ l e regrcssio.. I . ~? Section 10.!coho! taxes.. .~~ f1"1'1* 4kc . rl'. • ~w.. it is possible 10 eliminaLe the·effect ofomitted variables that differ across ff ~ entitiesbut are constant" over time. .ily. the main tool for regression analysis of (lVV tt.'\flint! .~.. called pane l data.. the n for multiple time period:.~ CHA PTER 10 Regression rmw . In Seclioo 10..gM/~1 /wI) f Of data.e~_ . such liS preva iling cultural atlifudcs toward drinking. drunk driving ="> . (/~ dm 'fngdata ~. in which each obscrva tion<il unit. states for each of th e seven from 1982 10 1988. these • 349 .foLsome ofi t he varia bles.talilies?"We 1Ji. I( data are not 3vailable..t do not cbange over time.ud .lJ)' studying clltl1~geJ in the dependent variable over time... bU.. Th is chapter descri bes a mcthod fo r contro ll ing fo r some Iypes of omitted variables wilho ut actua ll y ubserving them. but do not vary across s tates.." is a powe rfull ~ 1 for controiJ ing tor fhe effect of vanables on which wc· have data. fi rst fo r the case of ani)' two lime periods...3.4. # variables th at differ (rom one state to the next. and related \'ariahles fo r tbe 48 contiguous U.. lualso ~ ~ ~t/ . ilke Improvements inj the safety of new cars.... This p'and data sci ~ts ~ears us cOlll. The em pirical application in this chapler concerns drunk driviQg: Whalare the effects of alcohol taxes and drunk driving laws on traffic fa.· ~ with Panel Data M ptx. This me thod requires a speci fic type I"""'" (" t/(A< ..Kee.faw s. ~..(.roJ for unobserved . Fixed effects regression...
. lf the data set contains observations on the variables X and Y. ~~ pt. Som e add ition al terminology associated wit h panel data describes ~ .Apand 'lhal b.... nandt= I.00 10 kCl'P Irack of both Ihe entity and the time period .~ observatio~. we use these methods to st udy (he effect of alcohol (axes and drun k driving laws on traffic deaths..' . which control (!UK.. refers to the entjty. ods T. refers to the entity being observed.ume pcnQd. 19M).1 k.).1) .. for e" amp!.h of T~nods.. and the second . ~ 10. then the data are denotcd (XIl'Y. me thods are extended to incorpora te socalled timc ~ed e ffects.i..T t<.1Ji'lS notallon IS summanzed III Key Concepl lO.1subscript.iJ QVU . . I.. refers to the ti me period of Ihe ~ ~P~j IJ ~ B~ /J . ..JtJ? ~me o... l.~ _ . we need some addit ional n0 13 1 . ...J. ffll e stale traffic fata lity data studied in this chapter are panel data.u.cs utlwo or more time peri.U / ~Pve/ for unobserved va riables that <lre constant across enl ities but change over ti11\f". for a total of 7 X 48 = 3 3(1 observations. Section 10.. T. When de!scribing crosssectional data it was usc fulto use a subscri pt to denote the e ntil y... 1O.bser"at io~ are mis_s~l~.5 d iscusses tbe panel data regression assumptions and standard crron for pfln el da ta regn:ssion.ohscrvalion~ that ~ .' Recall fro m Section 1....on.3 th at pa nel dala (also caJled longitudi nal data) rcfe~ to data for II differenl entities observed at T different time periods.SI o f observ. wh ere tbe fi n. A bala~ced pancl . 1982. ~I/AfiI.. on .. \\hen: each emity is observed in T = 7 time periods (each of the! year..{.vI ~ the vanablc~ arc ('~rved for each cntlty.i= l. When descri bing panel data.h~ . i.I! PI e A4 : J.:. t.v!e J 1tvd(1 .l..'h of II entities in lite I wht:lh~r . and tile second sub.nt iI Y.and each.he "men cn.. refen to the date al which it is observed..' ~ Panel Data = rl IJ ft.350 CHAPTER 10 Regression with Ponel Data '__ KeY CONal'! NOTATION FOR PANEl DATA ~ (10.aJl its . I... I n Section 10...... referred to the varia ble Y for rh ~ il h e. Y.ThuS !J deno~es I~e variabl~ Yo~served for the . . I /... script..6. ervdr: . 1 Panel dOl' cons. Those data arc for II 'R enli· ties (stales)..ln is is done by us ing two subscripts rather than ••JIle: 'm e fi rst.
200J) estimatcs tha t as many as 25% of dri\(:~ on the road belwee n I A. ho has not been drinki n!!:.I' . The . and th is fraction rises during peak drinking periods." ~~=2. hJd  . ~ Example: Traffic Deaths and Alcohol Taxes f we did not have d ata on fatalities for some s lat c~ in 1983). ...000 peo ple in tht' popula tio n in the st ate.. and) A.. The pan el data set \Jriables related to traffic fatalities and alcOhol.. Approx.. ]) ~ : In this cha pte r.!in& data for at least onc time period (or at least one entity is called an unbalanced p. _.. .. to 1 41'1 dr.=iT "I the 10% Ic\'. O ne stud y (Levitt and Porter. stales for all se\e n yea rs."""' . ' lne measure of traffic ". the Iype of drunk driving laws in each state in each year.. the ...!r of traffic fatalities in eacb state in each year.~ ..! "'''~ ~ / _ 'y _ a lo. .e depends on the regress.a. (.! The dala.cl.. however .:~firetcnt I 'fF. although pre cisely how 10 do so . A Roint in th is scatte rplot reprc ..01 . OIS8eerTax l1982da ta ) po!I.I ~ ~p wI~' if<. I Panel Dote 3 St some mis.  l7 10...1i...ltlVC. wruch is the beer tax .n pract. wc s ludy how effec tive va rious government polic ies d~sigDed to discourage drunk dri ving aClUally are in reducing traffi c deaths. some dala were missing (for example.000 highway traffic fatalities each year in the United Stales.M..lhe n the d a ta sel wo uld be unbalanced . put_into I~ dollars by adjusting for innation .. ng used.x: fat a lity ra te and the rea l tax on a case of bee r. se nts tne talUlily ra te III 1982 an d the real beer tax in 1982 for .2 ) . all t hese methods can be used with a n un balanced panel.The methods presented in t his chapter ar e d escribed for a balanced panel . but noutatist .. If. I' ~ ~ . have heen drinking. including tbe num .rc dc!'.  There a re a pproximately 40.~"/1c.f.. Ihey an:: pO I inlO"1988 Jollars~ Llsing Ihc r onsomer Price: i ndcx ICPI). I~?*' .' ln e measure of alcohol taxes we use is the " real" tax on a case of beer.M.S. ITo make the I~xt~ comp~r~ble ovCl" limc..o" softw"" he. Fig ure 10.M 7i. hcc:lo"C of inflation a IU of S 1 in 1m \:orresponds 10 a tax o f $1.o plotted in the figure: the estimated regression line IS ~t. a nd that a dri\c r who is legally drunk is a t least 13 times as lik elv to ca use a (a lai nash as a d river \.. ana the tax on beer in each slate.. I3J _ "'~ on the real beer tax is I V I~.} given sta te.10.. b. j.~ 1 _''~ ( IO.1.1a is a scatterpl ot of the d:l1a for 1982 on two of these varia bles... I'br e){nmpk.Appendix .e _PJ (015) (9.cri!"lcd in more detail in.im ately o nethird of fa tal crashes involve a driver who was drink· ing.delll hs wc usc is the fatahtv rate.t'.W~ ~ _ OLS rcgression line o btained by regre~sing the fatality ratc on the rea l bee r lax is contain~ .. The tra ffi c fa tality data set has data for a ll 48 U.nel. however.23 in 1988 dollars .. so il is ba la nced. if {.a bl. which is the number of a nn ua l traffic deaths per 10.
u I L _~'_~_~_ _ ~_~_~ 0.• '.. Both plots show 0 posi · tive relolion$hip between the fotol ity rote and the rear beet ' . *~ • •: ..000) 1 ..5 I0 1. ..0 15 1. 2. I58mTar .000) ' .liriu per 10. • ~ . 0·· • .5 • •• • • • • Fillilft)fYlt " 2 01 .0 fiJ~l€ = 1..ty Rate and the Tax on Bee. .5 @ (Y.44BNrr..2. 0.0 J5 3. on a case 01 beer (in 1988 cIoIlors) for 48 stole! in 1982..5 ' . (Y~ 25 \0 Bu r Ta~ (b ) 1988 mu (Dollars per case $1 988) ..1· 0.0 2..0 0. Ponel b shows the do to fo r 1988. 2..0 1. • 10 15 o 2. 1 The rroffk Fatot.86 .O. .352 CHAPUR 10 Regreuion with Ponel Data • FIGURE 10. • ·'.0 LI_~:_~_~_~_ _~_' 0. (9 .. Fuality Rate (Fatalities per 10. O .n .0 15 10 05 • • • • ••• .0 3.5 Panel 0 is a KOtterp/ot of lraf.0t.0 • ""' .o • • ~.• . fie fatality (O~$ and !he real lev<. • • • . .5 0. 5 3. ...5 l'J Betr Tax (Dollars per c ase 51988) Fa la li ty Rail' .~ o.
s.e lb~m. whether the state high\. Many fac tors affect the fatality rale.. some of these va riablcs:such as· the cullural acceptance of drinking and driving.2) and (10. J o/~ 10 these pote ntial sources of Olllllled va riable bia s would be to collect data o n a U p. Th e O LS regrc1'. .d. if they are..t we cannot meaSUf. 5' ~. Unfo rtunately.3) t~ ) In contrast to the regression using the 1982 data. J. One approac h .43) .\:he lher il is socially acceptable to drink a nd drive.1 b. If these factors remain constan t over time in a given sta te. because these regressions could ha ve substantial onutted variable bias.n the When da ta fo r each sta le a re obtained [or7 (Y J!e4. .~/.' =. the n anothe r route is available.) {isti c is 3.11) (0. To do s o..<. . it is possible to tV I Ei~ ~~ ~~uA: ~~~<. whethe most driving is rural or urban. . ~ I @a. A ny of these fac tors may be correlated \'lith IJ ..:IU. and \.. we use OLS o ._ ~ ." ~ ~ ([) / () ~. except tha t it uses the da ta fo r 1988.. \Y 1$ vmues of the depe:den~:abl. Should we conclude that an increase in the tax on be!:r leads to more traffic deaths? Not necessanly. This is done in Figure 10. we can in effect hold these ~ factors conSlan l."hese varia bles and add the m to [he annua l crosssectional regressions in Equa ITo ns ( 10. the estimated coefficie nt fo r t he 1982 and the 1988 data il.sio n line thro ugh Ihese dalil is FOiOtity~ = 1.~~~~WithTwoTime {'. (0..53 Because we have da ta fo r more than o ne year. to variable bias. including the quality o[the automobiles driven in the state. fJJ..t.r:$ ~ ~"".Periods: " Before and After " Comparisons ~ ¥o. Beca use we have panel data . trafficfatalitic. the density of cars "Oil the road.10.J:.e'. we can reexamine this rela lionship for a nother year.2 time per iods.\ays arc in good repair. =.86 + 0.2 Ponel Dota with Two TIme Period~: ·Before ond After' Comparisons 3.... not fewer.~ regression with fixed effects.. which is the same scatter plOI as before. Curiously.13) (10.. even thoug!.. PQsilife: Taken lite ral higher real beer taxes are associated with more. the coefficie nt on the real beer tax is statistically significant a t the 1% !eve I (the fst.. however. might he very bard or e\'en impossibJe to measure.3).44Beer"/(1X (1988 data).Jfl J T r tf} J O~ a~d the~ willlea~ o~illed Y ~ . alcohol taxes.
! fatality fat e i~ fir\t T ho 01 2: _[~~.d.") is the error term and. but t. might be thl.I')~:!) + 1/11Y1l11  /I'I~' (10.. 1n Equation (10. any changes in traffic falall " •.~ (10.":c\"e(. . in this regressio n model.. on~e slate tothc next but do not changc o"er lime within the state. . If.354 CHAPTER 10 Regression with Panel Dole period. "" flo + {3 tBeer"{(I. could be Foru.!.Ti"l:(.u. + 1I. . that determine traffic deaths). + {J1BeerTClxd'llt9!.I:I\'IK! + 132Z .therehy eliminating this sourcc: of omitted va ria ble bias. all: in /J • a stale. ~ ulation li near regression rela ting Z. the influence of Z.~eM stant over time..') Subt racting Equat ion (10.l9N:.Z. Z.p<c t..'li'11:0.ln3· .6) eli minates the effctl of 2..\ here ~~ ~.. they did not change between 1982 and 1988 theathey dill n~1 l... For exa mple.FiiiiiIUlRat.'"Rather. tion. and the rea l beer tax to [hI.j98S' ( 10.. local cultural attitude toward drinking and dii"Ving.·fJ.7).!> changes in other facton. IJ..!£rm 'which CilP ~ lUre. lA..  Be£'. k· and driving affect the IcveLof drunl:. Oc a varia ble tha t dctc nnine:s the fa tality ra te in the jlh state.1 ~ lies over time m ust bave arisen from other ~ources." ancJ I >= I•. fhu~ . n fatalities in the state.8mTa. II" . + lI.\ cen 1982 and 19SX.~ produce any c:1rang!J. l)JefJ ~ . 1982 and 1 9~: ' .lnq.! after" comparison in eIfeet holds constant the unobserved factors that differ from .: FlIWlity R(I//!.k fi.7) This specificat ion has an intuitive inte. ' Fatci/jfy Rml.. IQiI2 ' (IlLS) f~~ ~. Le i Z.l9AA . In othe. can be d iminatcd by a nalYling the cha nge in the fa tality ra te between th ~ two periods. ~ ~ Specifying the regression in changes in Equation (10. . not cha nge over time (so the I subscript is o mi tted ). To see th is mathemall(... . lhis "before anI.=fJ. Cultural altitudes toward dn.twee n 19M2 and 19R8:"'AccordingIy: th e p~p _ ~er".. J tftL r _ . + f3 2Z . ho.:a lly. T.. = 1.t .r word .7) eliminates the df!!c t f1~ of the unobse rved \ariables Zi Ihal arc eonstanl over Time.~ ~ Iyring changes in Y a nd X has Ihe cUect or controlling for variables Ihat arc (lIn . driving and the traffic fatality . consicJc..5) from Equation ( W. which changes siOwl~ and lbu..FtltlllilY Rmf'd9S2 = f3 1(HeerTflx.eSH 'f \.<.I'N.tvi"1r (p/?el P1 F(lta liry Rate..4) it will not prod uce a ny change in the ratali ty rate bc[\. ~... we/' Bccu use 2/ does not cha nge over time.:r Equa tio n (lOA) fo r each o f the IWO years.:/t ing . = ~."l I~ Jf..Ie ~ p.idered to be ~omi\a n t be. Tl1U5. these other ~urcts are changes in the lax on beer or changes in the error . B'y focusing on challges in the dependenl vari:J blc..wc. in the reg res~ i on model in Equution (10..
between changes the fatali ty role and changes in the beer lox . i : 0..10. .res .. ' . ~r Figure 10. is nonzero.2 0.8) . <l .. ! ().zero is rejected at Ihe 5% sig nificance level. :ing lIy.L 51. A point in Figure 10.2 presents a scatt~rplO I of the challge in the fatality rate between 1982 and 1988 against the ch(lllge in the real bee r tax belwt::cn 1982 and 1988 .2 represe nts the change in the fatality nile and the change inlhe real bee r {ax betwee n 1982 find 1988 fo r a given state.04(Beer ru·'·I~ .J.2 0.072 .065) (0.6 .l(os • 0. m e OLS regression line. for the 4$ states in our data set. 1.{)._~~_~.tti\·c..7) \ J.BeerTtLlt9W' (0..1.36) (10...1 ana ~ con whe re including an inte rcept allows for the possibil ity that the mean change in Ihe ~ fat ality ratc.5 OJ) between \982 and 1988 for 48 sfote1.~. IS 'Ilk· .. 1982.4 0..~ . The hypothesis that the populatio n slope coefricicnt is.II' ~ In cont rast to the crosssectional regreSSiOn results.¥ . This estimated effect is very largc:1ne average fatality rate is approximately 2 in these data (that is. the estimated effect of a change in fhe reall>eer lax is neg.$. According to lhis esti mated coefficient.s.l. c change in the C hange in Fatality R2 te (Fatalities per 10. .1988 This is 0 $COtterp/ot of the .on 1 88... 2 Chonges in Fatality Rates and Beer TOJ(es.R11tl962 • '¥ U~ IVII} .OOO members ofthe population")_ ~~ pl'l"" .uI9&l l and the chonge in reol beer Io.:t.Filldlit.~~ 0 I.J$~: ~#~  .~.0.g ~ FllllililyRatt'I'»J1 n = cap Idf.2 Panel Octo with Two Time Pericxk: '"Before and Aher" Comparisons 355 .[... • OA 0. • ralilfily Rme l')I'. as predicl~d by economic' theory.d fiGURE 10 .. an increase in the real beer tax by $1 per case reduces the tranic fatality nile by 1.. h : in no< 'atali· . in the absence of a change in the real beer tax. C hllonge in Beer Ta x (DoUan per cau $1988) ).u\1IIIII . estimated usi ng these data and plottctl in the fig ure. '.6) f 7..i1 .(her '. A) There is a negotive relotioluhip In .072 .BwT.000) traffic fololity rote Lu )It 1 FII~irJfiilltl 'JU .5) &.0 • n. 2 ratalities~r year pcrlO.1)4 deaths per IO'(X JO peo ple.04(BmT..
we use the method of fi xed effects regression. lion (10. and it seems foo lish to discard those potent ially m erul additional data . then their omission will pro duce oru illed variable bias. howc"er. an d XII' re~rcc · lively: v. fixed effects regression can be used when mere are two or more lime observations for each enti'ty.ch factors.2.<. This "berore and after" anal ysis works when the data arc obse rved in Iwo dll fe rent yea r ~ O Uf data set.356 CHAPTER 10 Regression with Ponel Data ~/:d ~ ?J' :t1 so the c. ing thl! rcnltax on beer by $1 per casco By cxamin ing changes i n the fatality rate over time.. Y.3 Fixed Effects Regression Fixed effects regreSSion is a met hod for controll ing for omit1 ed va ri ables in punel data when the omitted variables vary across entities (~ Iat es ) but do not change over lime. and if they chan~ over lime a nd arc correlated wilh the real beer lax. is an unobserved variable t hat varies fro m one state 10 the next but J<)t"$ no t change over time ( for example. Unlike the "be fo re and after" comparisons of Seclion 10.. that differ from one entity to the . can be c. we undert ake a more careful analy'lh that euntrals for several s.: .8) controls fo r fi xed ractors such as cult ural att itudes lo ward drinking anti driving. V. = /Jo + PIX" + /32Z ... The Fixed Effects Regression Model Consider the regression model in Equation (10. so fOJ now it is best to refrain from dfilwing a ny substa ntive conclus ions a bout the e ffect o f real beer taxes on traffic fa lalitlt'l.. tm 10. III Sectioo 10:5.next but are conSlimt over time. + " ii' (10. Z..9) where Z. These binary variablesabsorb the influences of all omitted variable. the regression in Equa. represents cult ural all ilud es towa rd dn(l\. But tht!re arc many fac tors that influence tra ffic safety...timllle suggests thaI trafficlatalitict. The fixcd effects regression model has 1/ different intercept~ one for each entity. To analyze all the observations in o ur pa nel data sct. These intercepts can be represented by a set of binary (or indicator) va n· abJes.4) wit h the de pendent yuriat'llc (Fa /(/fi ryRt/re) fin d observed regressor (BeerTtu) denoted a5 Yi..cont ains observations for seven different year\. "But the "belore and after " method does not apply directly when T > 12. half merely byincreas..
9) bUI doc<. Bt!c(luse Zj varies from one stale to the next but is constant over time..lO) can be written equiva lently as v)i1!'f ~. / O'j ~ . 1h~ in terprel<i lion ofa. . That pop ul ation regressiou line was expressed mathemat icall usi" a single binary variable indicating one of the groups (case # I in Key Concept R.9) can be interpreted as having n inter cepts. so we arbitrarily omit th e binary v~lriahl e Dli for the first group.. . rd drink treated as unk nown intercepts to be estimated.(lO. ~.Bl~ 'is the same ~~~". the pop ulation regression model in Equat ion (1 0. Because we have more than two slates.r each reach r) vari . .7). the terms .10) is the fixed errecls regression model. .10) a l '·· .10). entit ies are ~ lates ).he intercept a t in Equation (1 0. as a slalespecificinterceptin Equation. for if we do the regressors will bc perfect ly multi coll inear (thi s is th e ·'dum my vari able trap" of Section 6. the2Qpulation. that blOary vana I.~~. one for each s t ale. we need add itional binary variables 10 ca pture all the slalespecific intercepts in Equa tion (10. let 0 1. o di( years.9). vary across entities but not over time. in which Y" = {3/X" + 0'.lIe .4).3 coo sidered the case in which the observations belong to one of two groups and the populati on regression lin e has the same slope for both groups but different inter cepl ~ (see Figure K8a).w for all states. being in entity i (in the current applicat ion. '" arc known as <ntil)' fixed effects The variation in Ihe . rcspec ( 10. .10..10 ) comes from consideringr ~ .lcS' th<l1 ~' ar ia1.."n 'y fixed elfec's comes from omitted variables thaI . equal 1 when i = 2 and eq ual 0 othe rwise:and so on. If we had only two slates in our data set.9) :tlysis wing lities. The stalespecific illlercepis in the fi xed e ffects regression model also can be expressed using binary variahles to denote the indi vidual stales. To develop tbe fixed effects regression model using binary variables. "'~ Equation ( 10. ""~r. Accordingly. the fixe d effects regression model in Equation (to. but the intercept or the populatio n regression line varic::s from one _ stale to the next. tet a . Sec tio n 8.1. in Equation (10.regression_linc for the i!h state: this populatioru:egression line is . ~ Because t.3 Fixed Effects Regression 357 eas qua and ange pro ing and dri ving).BIX ir 'lne slope coeffici ent of the population regression line. Specifically. S~ . I the "II th t! 5S1 0 0. however. ~ . be a bi nary variable that equals I when i = 1 an d eq uals 0 ot herwise: let 02. + IIII' ( 10. '= f30 + IhZi' Then Equation ( 10. one for each state. . :0 sion model would apply here.10) can be thought of as the "crfecI" . a ll arc pant!l hange effects . . We c"nnOl include all 11 binary vari ables plus a conunon intercept. b. We want to estimale f3 1 ' (he effect on Y of X holding: constant (he unobserved state characteristics Z. like Z.
In both formul . the regresslOfl estimated uSihg "e. T n the first step. bUI are faster hccause they employ some mathemat ical simplifications Ihal arise in Ihe ulgebra o r fixed effects regression. )'2.ntitydC""meantd'· \"ariablCSl Specifically.:ni on X is the sa me from one state to the next The sta le specific intercepts in Equation (10. + /111' (10. so in practice this OLS regression is tedious or. wh Extension to multiple XS.13)1 can be estimated by OLS.l1) nnd the intercepts 10 Equation (10. Estimation and Inference In principle Ihe binary variable specification o f Ihe fixed eC fec ts regression model (Eq uation ( 10. + y/.f30 + )'.For the second and rcmai njng states.10) and the bi nary regressors in Equation (10. . the slope cocffici!. Doing so results in lhe fi xed errects regression mode l with multiple regressor". Ihe fixe d effecls regression model has a common intcrc~pt and II .11).·· ·.10). If there are o ther observed delemlinants of 1' Iha l are corre lated with X and tha t change over lime .iJI routines are equivalent to lIsing OLS on the (ull binary variable regref:osion. i. 11). impossible to imple· ment if the number of entities is large. there are two equiva lent ways to write the fixed effects regression model. J 1).. summarized in Kl'Y Concept 10.11) fT n w where {Jo' {3 \.Iud is .so a. l n Equation (lO. fo: an mn.. has k II regressors (the k X's. To derive the relationship between the coefficients in Equation (to. + y)D3.X II • so a 1 = /30. In Equation (10.I binar y regressors. the en tity· spccificaverage is sublrJcted from each variahle. ThUs. These spc. lhe J>OPulalio n regression equation for the first slale is Po + {3 . is fJlJ + PIX..2.! of a single rcgrC:»I. Ih at varies across slates but not over time. 1 5 . 11) have tbe sam e ~o u rce: the un observed vari able Z. Regression soflware typically Nm' putes the OLS fixed effects estimator in two s leps.111) . This regression..the " . In Equation (10.I binary \'ariables. for i ~ 2. com ider the cas.in some software packages. 'Y" are unknown cocfficienls 10 be esti mated. Equations (10. + Y2D2.Jr in the veThion of Ihe fixed efiects model in Equa tion (10. and the intercept ). + y"DII . compare the population regression lines for each state in the Iwoequations. The "entitydemeaned" OLS algorithm. In the second step.lIion!l.10).. it is wriucn in terms 01 11 sta tes[X!cific imercepts. however. 358 CHAPTER 10 Regression with Panel Dolo • Y" = f30 + PIXi. Ihen these shou ld al~o he included in the regression to uvoid omitted va riable bias. Econometric software therefore has spe cia l rou tines fOT OLS estimation of fix ed effects regression mode ls. + .10) and (10.
and II . + f3~.· ltcse special ~ressio n . .lO)an d O LS estimator {3] from th e bina ry variable specification anu fro m the "before and aft er " specificat ion are ident icnl if the intercept is excluded from the "before and after·' spt. where = Xi' . the where i = 1.11) I THE FIXED EffECTS REGRESSION MODEl ~ . 1 \ = . /111' (JO. . th ey pro duce id e ntical OLS estimates. . X" Ii. fixed effects estimation . and so forth.l2) grcssion terms o f on m odel lions. = 1 if i = 2 a nd D2i = 0 otherwise. . + . whe re D 2.. Y....~..: accordingly. ~~ . .= f3)(x..Dnj + II".. and the "ent itydemeaned·· spec ifinni on in Equa tion ( lO. + "/.and Let~. this estimator is identical to the OLS e&limator of /3.X kJ1 + a t T . . 'Cifications.10): then r !Ythat ~ = tha~Y. A lthough Equation (10.n(IO. tQuid also results in cd in Key take the avcrage o f both sides o f Equation (10.. .~.. Xz' il is the value of Iht: second regreswr. ll1uS Equationt10..:XII + U".( 10.7) (without an intercept).11) have ( no t over 130 + f3IX ). + cr. regression model can be written in temlS of a com mall intcrcep t..1 binary variables rc presenting all but one en tilY: Yu =: ~ espccific ..7).. + iI. there a re three ways to estimate {31 by OLS: the " before and a ft e r'· specificatio n in Equation (10.. .X. ThUs.14).. in the speci a l case that T = 2 the but 11 arise in the ~pically com p.. J)~~~~ 5. K£y CONCEPT The fued effects regression model is Yi/ = 10.. II rive the 'cepls in Ie in the ! f the fim :0 + f31 X ir .)... (10. = 13 ]X.D3.. that is. 11 and 1= 1. (If" Ii.Ie to imple Wusing 11  I ore has spe.( + . The "before and after " regression \IS. obtained byestimationo} th e·~xcd effects model in Equation . + .. U. (1014) !.2 f3 1X 1Jl + . tbe X's.x. + f3l. = Y" .r!. lhe fixed effccl<.:: of the fi rst regressor (or c ntity j in lime period r. are de fined ·si milarly. binary variables (Exercise HD'):': ) in practice . is the vaiu..xk j.t II"  =J ~:~L Yif' and Xi and Ii./~ D2. .11) with its binary va ria bles looks quite differen t than the "before and after" regression model in Equation ( 10. where Xu.. the binary variable specification in Equation (10.: :. whe n T 2.l0) implies 7.3 Fixed Effects Regression 359 ( 10. and so fonh.sion model :r. 13) + "/:.. Thcsc three methods a rc equivalent. t he entity he regression 'r the cast: of . T. and 0'1' •. hask+ n estimated by the OLS regression of the "entitydemeaned" vari ables y" on XI/" In fact . EquivalentlY. .11).10. • a" are en tityspecific intercepts....
8).. The fixed e ITects regression assumptions and . the estimated starei'ixedinle rccpb are not listed tOS<! I":' space anC:! bccau!'e"theyare norof primitry interest in this applica.h J ~\? ~ Application to Traffic Deaths The OLS estimate o f the fi xed effects regression lin e relating the real beer t.t:md ard errors for fi xed effects regressio n are discussed furt her in Section JU. the estimated elleHi· cient in the fi xe'd effects regressiorfincquation (l(X"~) is nt:gar~t hat.15) Ihan 11'\ EquatIon (ro. lion of the fixed effet·ts OLS estima tor is normal in large samples. : 'Irc~ . Like the "differences" specification in Eq uation (10.\ (0. standard errors. if a set of assumption'l_ called the fixed effects regrcs:.uQn.Jlecls. bifieneal neenaxeslfre associated withrewer fi affie death ~hc oppo. \) Qi ' X l.3).(lO.. in m ult iple regressio n with pa nel dat a.. whereas the {ixed eJfc. as pr.. O. then thc sampling distribution of the OLS estimator 1\ normal in large samples.! . (1I115) where. to the fatali ty rate.4 hold. '111e variance of this sampling distrihUlion can be CStl.!oite of what we roundTn the initial crosssection~eglession. de nce intervalsproceeds in exactly the same wa y as in multiple regression \\'Ith cross·sectional data.2Q) + S((lt(·Fi. the sw ndard crrorcan be used to tcst hypotheses using a (statistic and 10 cun.ltor of Lhe variance thOi t is.ion assum ptionshold.2) and (10.. and the square fool o f thh cstim. based on all seven years of data (336 observation s). · dieted by economic tbeory.''i. the s{j uare rOOI o f Ihat estimatur is the standard error.he sampling distribu. the difference between those I\\'O years). is hilaiitl~ . Similarly.15) u~s (he data for all seven years. .8). Given the standard error.umptit)O\ in Key Conccpt 6. and statistical inference. mated from Ihe data.8) uses only the li'!h.:t ~ r~gress i on in Equ:ltion (10. then !.5.. statistical inference.. Because (lIth\! addilional Obscrvation~thc !1I'andard e rror is smal ler in Eq uiltioo.. as is conventional. Slruct confidence intervals. (or 1'982 and Il)~ (specifically.. the variance u{ that distribu tion can be cSl il11uted from the data. In mu l1 iplc regression with cross·sectional data..H. The two reg.w Jl. of Equations (iO.te'lting hypotheses (incl uding joi nt hypotheses using FstatisLics) and construcling confi ." ..360 CHAPTER 10 Regression with Panel Dolo The sampling distribution. Cj .ressions are no t identical because the ··Jif· {erences·· regression in Equa tio n (10.66BeerTu. and the standurd error can be used to construct !·statistic\ ami confidence intervals. if the four least squares as.
then i1 could be picking up the effect of overall a uto· mobile sa fe ty imprO\Cmen ls.10.4 Regression with Time Fixed Effects l 1 r lax 10 lust as fixed effect s for each e ntity ca ll conlrol for variables thai are constanl over lime hut differ across entities.ng.9) can he modified to include the effect of automobile safet y. Still. If. and where Ihe single "t" subscript emphasizes thai safety ch anges over time bUI is constant across states. lcads to the fi xed effects regression model in Equation (10. rematns. Although 5. Beca use safelYimproveme nts in ne w cars a re inlroduced nationally. = ( Ill.. which we will de note by 5.! 5) 130 + {3\X" + fJ1Z. Because 13.\S. In the entity fixed effects model. is unobserved. if is plaus ible 10 think of au to mobil e safe ty as a n omitted va riable Iha t c ha nges over lime but has the same vsl ue for all states. is unohserved. a skeptic might suspect tha t there arc other raclors tha I could Icad to omi tt ed vtl riabJcs bias. + (3. however. in . l!. then we can elimina te the ir in fl uence hy includ jng lime fixed effects. so thai I~e te rm IJ2Zj can be dropped fr om Equation (10.li? {vJ 1/u!' "1.16)... suppose tha t t~e va riables Zf arc not prese nt. safely impro\·emenls evolvcd over time hut wcre the same for all sla tes. its influence can be eliminated because it varies over lime but nol across states. e where S. is correla ted with X". The populalion regression in Equalion ( 10. just as il is possihle 10 elim inate the effeci of Z. they serve to red uce traffic fa talities in all sta tes..tribu ce of Imator Including sta te fixed errects in the fatality rate regression " lets us avoid omit ted variables bias arising rrom omitted ractors.10 ).: Yi . k1 cO¢ffi Time Effects Only lhe ··dir· d 1988 e{icC~ sc of tlte 1 ) titan in For Ihe moment . For example. Our objective is 10 esti mate 13" conlrolling for 5. the presence o f Z. that vary across states but are constant over time within a state. 16) to sa. So. listies esting confi n with effccts 10. which varies across state!:! but not ove r lime.15.. then omitting 51from the regression leads to omi tted variable bias. so can time fixed effects control for variables that are constant across e ntities but evoh'e ove r time. represents va riables thai determine YII' if S. alt hough the te rm /335. over this period cars were gellin g safer and occupants we re increasingly wearing seal bel ts: if the rcal tax on bee r rose on ave r age during the mid1980s. + /fil' (10.4 Regression with Time Fixed Effecb 36 1 In 'lions or is csti ·that con ns . such as cultural attitudes loward drinking and dri\.
and time fixed effects regression model is Y... (lO.. in (his version o f (he rime e(ft:cts model tbe intercept is includeo. + II tr (1 0."" as lime fixed errcds. too. 17) b.. Ihe intercept A . "~ Th is model has a different intercept. varie" over lime but not over s tates. When tbe re <I re additio na l o bserved "X" regresso rs.a.\ ' regressor is ~ Yu· ""P~>'. + Ih8T. because S. leads to a regression model In which each time pe riod has iIS own intercept The time rh:ed effects regression model with a single . while others nrc constant across sta les hut vary over lime (such a~ national safety standards) ... Just as the ent ity fix ed effects r.. vary over lime bUI not across en tit ies.) is omilled 10 prevent perieci multicollinearity. ~ us to eliminate bias arising from omi tted variables like nationally in troduced safe l)' standards lhat change over lime but are ( h~ same across states in a given year.. l)rare unknown coefficients. in Equa tion (10. 10 Regression with Panel Data wh ich cilch state has its own intercept (or fi xed effecl). then It is appropriate to indude 1>01h en tit y (SCLtl!) (flit! tim e effccts.. Both Entity and Time Fixed Effects If some omitted variables <lfo! constant over time hut vary ~H:r os s states (such . and the firs t binary variahle (B l . .. the n Ibese regre~so rs appear in Eq ual ions ( 10. and so fortI) . like 5.::gression mode l cnn be represent cd using n .I binary indicators: t YII = Al + (31X . tne presence o f S.. Similarly. = {J IX u . Tht: combined entit)..\..1n c \<lri. r t () otherwise.e ~ f.18) as wel l.16). + 111/' ( lO I S) when! 8 2.. the lime fixed effecls specification a llo\. As in the fix ed e ({ects regression model in Equ ation (10.17) can be though t of as the"cffect"" on Yor )'ea r I (or. caD the time fixcd effects regression model he rep resented using T ..I + ~B21 + .so the terms " \ .I' cul tural norms )..IY) .. In Ihe lraffic fatalhies regression.1 bin ary indicators. A.362 CHAPTER.11 ). for each time pe riod.17) and ( 10. ATare kno. ation in the time fixed effects comes from omitted variables tha i... and where 82. time period /). more gencr_ a lly. so. in Equation ( 10.1 if t = 2 and 82. + AI + /1 //.
20) as..17) A. is the entity fl.I time bi nary indicators. Fin<llly. in 11 balanced panel the coeffici e nt s o n the Xs can be computed by first deviat ing Ya nd the X s from their ent ity lind timepe riod means. estimated using data for the two years 1982 and 1988. provides the same estimate of the slope coefficient as the OLS regression o f FmalilyRmt! on Beer Tax.~ + 12D2.l . is the time fixed dfecl. including entity and time fixe d effects.25) TimeFixe(IEffect ~. . including the intercept in the r~es: SiOD~ThUS Ihe "before and afte r" regression re ported in Equation (to. which is common ly implementcd in regrcssion so ft wa re. 'Ibis modelean equiva lent ly be represented using n . 131 TarC unknown coefficients. if T = 2. + . Thus their coefficients can bo.2 1) .= estimated by O LS by including the additional time binary variables.19) and (10.•• y".. Estimation. well. in ener \Can S.. along wi th an intercept : y~ = lJo + f3IX. Application to traffic deaths. in which the change in Fmalily Rate from 1982 to 1988 is regressed on lhe change in Beer Tax from 1982 to 1988 including an intercept. .uL~tion Ip... then cstimaling the mUltiple regression equation of deviated Yon the dc\'iaied X~ Th is algorithm. in where /30.8). The time fi xed eCfects model and the enti ty and lime fi xed e[fccts lO. + III/' (10. 19) ~ regression res ults in the OLS estimate o f Ihe regression line : " ityRme = O. ()2"" . IS) :2.I en ti ty binary indicators and T . I ill where a.20) 0 . )'2" . then the se appear in • using rep Equation (10. Addin g time effect s to the state fixed effects ( 10. + (0. + BrBT. el im in ates the need 10 construct the fu ll set of binary indica tors t hat appear in Equa tion (1 0.xed effect and A.10. the X's.a" When there are additional observed " X ' r~ g rcss ors.. "Ine combined Slate and time fix ed effects regressio n model eliminates omit ted variables bias arisiogboth from unobserved variables tbat are constant over time and from unobserved variables that are constant across slates.. + aiB2. .proach. = 0 ualion la nd the ressors lIows us d safel~' year. and the time indicators from their state (but not time) means and to estimate k + Tcoefficients by muhiple regres sion of the deviated Y OD the dcviulCd Xs and the deviated time indicators.ap. {such as Im~ {such ltV (state) model are bolh variants of the mulliple regression mode l. Aller native ly. ( 10. + Yn Dn. + .64BeerTtu + SlOteFi.20 ). the entity and time fixed effects n:gression can be estimated using the :'before and aftcL:. An equi valent approach is to devi ate Y.'«('(IEjjecf." Regression with Time Fixed Effect" 363 des.
L •• 5.15) and (10.4.~. ·In c fu st fo ur of these assumptions extend the fou r least squares assumptions for cTosssectiona l data ( Key Concept 6..56). and the coefficient on the real beer tax remains sig..0.364 CHAPTER 10 R egression with Ponel Dolo This specification includes the beer tax. To keer Iht. so this specificat ion could still be subj ect to omitted variable bias. in which there are no time effects.lI~d for crossscctional data in Key Concept 6.25 . the fifth assumption is impbu· sible.6 therefore undertakes a more complete empirical examination (If the effect o f the bee r tax and of laws aimed directly at eliminating drunk drivi ng. this section (ocuses on the entity fixed effects regres sion mode l of Sect ion 10.. c they are not of primary interest. 47 sla te binary variables (siale rued effects ). rh~ first four of these assumptions extend th e fO UT least squares assumptions. However. in wh ich case a different standard error formula should be useo. many import ant detcn nimlnts of tra ffi c dea ths do nOi fall into Ihis c. This esl imalcd relatio nshi p be lween the eeaJ beer tax and traffic fata l il ie~ is immune 10 omitted vari able bias from variables that are constant either over time or across states.4) to panel data.6 )'ear binary variables ( lime fixed effects). .21)]. a nd an in tercept. so tbal lhh regression aClUally has I + 47 + 6 + I = 55 righihand variablelo! The coc (ficienh • on the time and slate binary variables and the intercept are nOI reported becau. For m Xl. 2.64 /0. to panel data. £ Incl udi ng the lime e ffects has li llie impact on 1he estim ated relatio nship between the rea l beer lax aDd the fatality rale [compare Equations (l0. we fi rst di!K: u s~ the assumptions underlying pa nel data regression and the construction o f standard e rrors for fi xed e ffects estimators. These heteroskedasticityrobust standard errurs are valid in panel data when Tis moderate or large unde r a SCI of five ass um p t h)n~ cnl led the fi xed effects regrcssion assumptions.nificam at the 5% level (/ . ( di 3. Section 10. The Fixed Effects Reg ression Assumptions The fixed crfccts regression <:I ssu mplio ns are summarized in Key Concept I O. controlling for a variety of factors.2. In some panel data 5enings. 10.3.'1tcgory..Il" .ual heteroskedasticityrobust formul a.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects R~gression The standard errors reported so far in this chapter were com puted using the w. .. The fi ft h assumption req uires the errors II" to be uncorrcl:l!ed over time for each enti ty. sl. THE There erfect cion:!> I. Before turning 10 that study.' nOlal ioo as simple as possible.
l2" .4 .tand this assum pt ion is to recall thaI II" consists of timevary ing facto~ that are determinants of YI/ hut are not includcd as regre~ors.r lime. i = 1.1.i. Large outliers.. . . draws from their joint distribution.. which do not ha .. X u should be replaced by the full list X UI' X 2.d. ..I' X. S. The fifth assumption is thatl hc errors /I I' in the fi xed effects regression model arc uncorrdatcd ove.. . a winter with more snow thllO average for Minnesota. COV{1I11' li/tI X i! ' X !2 •. o) = O. Th is ass umption is new aD o docs not arise in crosssectional data. ~ The second assumption is that the variablcs for one entity Ilre distribuled 'iden tically 10. given all T values of X for that entity. This assumption plays the same role as the first least !iquares assumption in Key Concept 6. One wuy to unden. 11.2).tha t is. Stated for a single observed regressor. specifically.d.• X i"(' a il = 0 for / "* S. because there already is a "Minnesota" fixed effect in the rcgressioncould result .. 11 are i. ... In the traffic fat alities applicalion.). 3. . X j2• ··· • Xj"/". ···· X k. (X.. XiT• II/I ' !ll2" . condiljonal on the regressors. ~ There are fh'e assumptions for the panel data regression model with entity fixeo effects (Key Concept 10. The errors fo r a give n entity are uncorrelated over time.• II. Like the second least !iquares assumption in Key Concept 6.i. the five assumptions are as fo U ows: 10. There is no perfect multicollinearity. E(u"I X Il . The th ird and fou rth assumptions for fixed effects regression arc analogous to the third and fourth least squares a<. the variables for another en tity: that is.. across entities for i := 1.sumptions for crosssectional data in K e ~' Concept 6. 4.4 and implies that there is no omitted variable bias. arc unlikely: (Xlt • UI') ha\'e nonzero finit e fourth moments. c <I time dime nsion. conditional on the regressors.1I' The first assumption is that the error term has conditional mean lero. For m ultiple observed regressors. 2.. one such factor is the weather: J\ part icula rly snowy winter in Minnesota. but independently of. the vari ables arc i. the second assumption for fixed effects regression holds if e ntities are selected by simple random sampling from the population.10.3 ~ I. S The Fixed Effetts Regression A$$umptioos and Standard Errors for Fixed Effects Regression 365 THE FIXED EFFECTS REGRESSION ASSUMPTIONS KEY CONCEPT ..
If lI u is autocorrclated. thus reducing traffic fatalities for two or mOre ) cur.lrd errors will nUl be valid becllUM! they v. The fifth assumption migh t not hold in some applications.tP""iI" vU~~. then the errors It.llterns ~Iowly adju!l1. : II 'elJ#Y" . conditional o n the regresso ~ ( the bee r IHX) and the state (Min n e~ota ) fixed e Uect.p~(.. which rx:r. then lhis om il1ed variable (snowfall) is uncorrela lCd from one year to the next. Staled more gene rally. lhe error teml over lime . pM SLt r'1 JA4/'~ Y' J vJtY'~ I ~M:sWted as req uiring a lack of autocorrelation of /l lf' condit ional o n the A San d [he . If Ihe errors are aUlOcorrelated. then lIu is uneorre latcd [rom \car 10 year. St andard errors thal.4) the homoskedasticityonly standard errors are nOt valid becJ U~1. conditional on the regressors.:r time within an entity are referred to a . if Ii" is corre· laled ovc r time for a given en titytheulI" is said 10 be autocorrelal ed (correlated with ilself. Simi.are valid if llu is potentia II) heteroskcdllstic aod po~ o' tially correlated o\'. One way to see thisi':i to draw an analogy to hetero':ikedasticity.ge then ove r timc. ~ ~ntity fixed effects./'? r~e. ln a rCJlr. conditional on the regressors. Omitted fac tors like these.It' Y"""' .> in Mjnnesota tend 10 follow in !\uccession. Appeudix 10.> look for n~w jobs and commuting p.' they werc derived under tbeJalse assumption of homoskedaslicilY.> as the worker. will prod uce corre \atlon If u ti is correlated with UI~ for different values of oS (Ind Ithat is. In thiscase"':"ifT" the u':iual (hctc roskcdasfiCityrobusI) standard errors arc valid . th en (as discu_sed in Section 5. For exam.. Tht standard errllI" .. . Standard Errors fur Fixed Effects Regression }f Assumption #5 in Key Concept 10. Si milal'l j . .. jf the errors in panel data are autocorrelaled. if Uti COllSisLS of random factors (such as snowfall) that are unco rrelated from one year to the next . if tbe errors are hctcroskedast ic. v \' \ .366 CHAPTER 10 R egrenion with Panel Dota in unusually tr":<lcherous driving and unusually many fat(llllccid c nt ~ If the amount o f snow in Minnesota in one year is uncorrclate{) wit h tht! amounl o f snow in Ihe next year. and the fift h assumption holds.isl .2 provides a mathematical explanation of why tht' u ~ual standard errors arc nOl valid if the regression errors are autocorrclaled. over mu lIP e years.then thai omi tt ed factor would be corrclated.:' · sian with cross·sectional data . ·n )Us. pie. lariy. cre derived uodcr thejaiscassumplion that they are not aJlo" correlated.then the usual slandarderror formulft i ~ not valid. however.. at Jiffere nt dates) or serially correlat ed . a major road improvement project might reduc~ traffic accidents nOI only in the yt:!ar of completion but also in future years. Autocorrd a tion is 3n essential and pervasive feature of time se ries data and is dbcussed in detail in Part IV. if unusuaU y snowy winter.3 holds.tV.. hClerosked asfid t~ · and aulocorreilitionconsistent (HAC) slandard C!rron .A dowl\t urn in the local economy might produce layoffs and dim inish commuting traffic. then the usual stand. . . Assumption #5 can be lyJt~' . ~~ . m . then Assumption #5 fa/ Is.: are uncorrclated is moderate 'or lar.
loll.sttcand pvalue... the cluste r consists of the observatio ns for the: . I • . analysis to st udy the cffcct on traffic ~fatalitiesordrinKing law~ (im.36) and the column (1) t!<... lbe regression"R! jumps from 0.. .. suggesls Ih<lt the positive coefficient in rcgrc<.es inaeoses traffic fata lilies! Howe\'er. In addition.11\e resu lts in columns (1)..ions that include regressors rep ~. The formal of the table is Ihe same as that of th e ta bles of r e gr es~ i on re sults in C haptt:rs 7.P'll rcsuh in omi lled variable bias.. 1 presents result s for the 01.21)]..n..6 Drunk Driving Laws and Traffic Deaths A lco hol taxes arc on ly one way 10 discourage drink ing lind driving.\i not in the ~amc duster.AS time fixed effects. ~ srant' lbis is done by estimating pand d~lIa regrcs!l.. which includes stale fixed dfecis.'\ion (1) is the resuh of omit ted \'ariable bias (the coeflJdcnl on thl! rea l beer tax i~ .fIW'~: The result. the cocfftCtenron the real beer tax io.6 Drunk Driving lows and Traffic Deaths 367 presenlcd in Appendb: 10...(3) are e~ £IL.posiril'e (0.:rror. whiCh are one t}PC of HAC ~tandard error<. ~ . <1. ~cau"c they allow the errors to he correlated within a duo..5 % level: According 10 this esti mate. ~.cd~ In this section.!} arc uncorrelated [or . 8.\i.S regression of the fatal ity rate on th t! real beer tax without Siale and ti me fi xeJ effecls.ter..(Jmilting state economic conditions also cou ld frt.timate is sta I~tic. and a ~ tat l. clu~tercd standard errors should I"lc used. F~t.O. ... ..10.ally Signi ficantly diffe rent from zero at the .lme entity at all times I I._ resenling other drunk drh ing 13\\5 and state economic conditions..J .lhe regressio n in column (2) Ireported previousJ) as Equation (10.'luding beer taxe~J.·s when ti me effects are added.21 and (10. In lh~ cJ ntcxt of au!ocorrdatio n in pA nel dOi ta. the state fix ed effects accoun t for II large amount of the variation in Ihe data.311. 10... Column ( 1) in Ta ble 10. . Lill ie eh ungl.090 10 OJ*l9 when fixed cffecls arc included: evidently.1.15)]. .~ ' in their punishments for drun k d riving. I : When lInjscorrel hued over time. f r l. darderror.. OlS reported in colum n (3) [reported prev iou. arc calltfd clu."2. because vehicle use depends in part on whether dri· vcrs have jobs and becuusc tax changes can reneel economic condilions (a l)tale budget deficit can lead to tax hikl!~).. but assume that Ihl. If M).. hollJingcconomic conditjons con A _ . or grouping.ly as Equation ( 10.L!reporlS a differen~ r~grcssion and each row r~porlS a ~OCfficient e~lilllale a~d slan ... As in the cro~ sectional regressions for 1982 and l~AA rEquation!l (10. increasi ng I:k: ~r I. .66). and 9: Each colum n .tmitting these laws could produce omitted variable biAS in the O LS estimator of • ~~ h e cffeci of real beer taxes on tra ffic fa lalilies.! th:1I cr:'leks down on drunk dri ? \ ing could do ~ across the hoard by toughening laws a5 wdl as ra ising (axt:!).~ are ~ummarized in Table 10.lered standard erro. we exte nd the preceding.al. cven in regress ion ~ with stale and tI.. or othcr IIlformatlon aboUt the rcgres. States differ ~.r /tPe .~.
~ • v :.r::""T"" [ 08 ~~ d2..~S C'''  ':: >o~r. • • . r c • o .: % • • E ~ c o " ~ • o ~.
and so forthbeing important determinants of thl.0. [neluding theilddili~. the esti mate is quile imprecise: Because the standard error on this coefficient is 0. This said.o.historical Bnd cultural factors. . 'illesec ond set of legal variables is punishmen~ associated with the first conviction for driving under the influenceof alcohol.696. tax: because the average rcal beer tax in these data is approximatcly $0.50 increase (in 19M dollars) in the beer tax is a decrease in the expected fat ality rate by 0.imum. This estimated e ffect is la rge: Because the average fat.22.50 ::!: J. The minimum legal drinking age is estimated to have very lillie effect o n traf fic fatalities. a reduction of O. 19. relativc to the r. Ooe way to ~vtlluatc th c magnitude of the coeffi cie nt is to imag· ine a state with an average real beer tax uoubling it!.000.and the logarithm of real pmdollars) personal income .this en tails increasing the tax by SO. the 95% confidence interval for this effect is . The estimated coefficient (0. with a pvalue of 0. TIle next three regression s in Table 10.2).48. the effect of a $0.50 = 0. The three mea sures o[drivi ng and economic conditions ltre average vehiclemiles per driver. capita (using the logarithm of income permit! the coefficient to be interpreted in terms of percentage changes oi income: see Section 8. The firs t set of variables is the minimum legal drinking age..1 include additional polential delermi millts of falality rates.96 x 0. either mandatory jail lime or mandatory community service (the omitted group is less severe punishment).22 x 0. and 20 (so the omined group is a .! variation in traffic fatalities across 'Hates.45 X 0.ality rate is 2 per 10.50 = (0.50Icase.23 dCilih pcr 10. incl udes two SCIS of legal variables relaled to drunk driving plus variables that control for the amount of driving and overall stale eco nomic condilions. This wide 95% confidence interval includes values of the true effect that arc very nearly zero. According to the estimate in column (4) . Mo reover.lega\ drinking age of21 or higher). attitudes toward drinking and driving.50 /case.e.0 1).10. repo n ed in co lum n (4). 2. the unemployment ratc.2 ~ cor responds to decreasing the fatality rate to 1.44."al variables reduces the e$fimated coefficient on the real beer lax.6 cOnSis[~Dt Drunk Driving laws and Traffic Deaths 369 " with tbe omitted (ixed factors. rep resented by three binary variables for a minimum legal drinking age of 18. The joint hypothesis that the coefficients on the minimum legal drink ing age variahl es are zero cannot be rejected I I the 10% significance Icvcl:The Fstatistic test ing the joint hypothesis thm the threc coc(ficie nts are zero is 0. along with ti me and slate effects.000.0.000.i.45) continues to be negative and statistically significant at the 5% signif icance level.egression in column (3). population density.77 per 10.m. TIlC regressio n in column (4) has four interesting results. 1. TIlt! base specifica tion. the estimates arc small in mag / . general road conditions.45 x 0.
2':1 ). fOr inl~r prclatioll o r th is coeffici ent) .1 repon regressions th ai chec!::: the Se n"I\I\' ~ I /J I . _p [k.e m the .. High unemployme ll t ralcs a rc associa ted with rc\O.of O. ("()n::. TIle regression III 1:01· ~*". n ..:ancc Icvel '(the F·statlstlc IS 38. ('he other codficlents: the sensitIvity oflhe estimated beer tax coeffiCIen t 10 Indud ing the econom ic varillbles.1)63 death ~r IO.ltlon. For cltamplc.nificanll~ diffcr~nt from 1ero al th~ l O'li'. holding the other factors in the rcgn. lOdica tes that the e<:onomlc variables should rem. but no appr~ci<l blc chan 1!c In ~r.~u mn (5) drops the \ar iabks thai control for economic cond~tlons.. The two economic vari ables art: jOin tly '\ignificMt at Ihe0.1 is Ihc reg.hUercnt functional fonn for the drinking age (replacing the thr~C indicator variables ""ilh.er fa talities: An incrca\c in the unc. (4) and (7) art: the same. so a I ~ increase in real per capilli income is associalCU wilh sin increHsc 10 I!"df. ~ ~~ · ' 1111':: finol column in Tablc 10.[ 370 CHAPTER 10 Regression with Ponel Dolo n. "Ion constant.:au.5 and Appe ndix.lOg a l. The economic variables ha\e considerable explanatory power ror traffic r.r~ t.1 % gmlu.>e of im:rcused tfaffic density when theunt:mployment mle is low or greater alco hoi con!>umption . 4.ults 01 regrc~ion l4rare norscnSiti\'e to these chan!:\~:>.lImal cu to' reduce traffic (atnlittcs by 0. Jy of these conclusions to changes 111 the base specifica ti on. The coeffitknls on the firsroffcnse punishmcOl "ariublc~ are also cslima\qJ 10 be small and jointl~ insig.. good ceo· nomic coO(.17)..v/ r '.litions are associated with higher fata lities. ll1e estimated cocftJdl'nl~ ~] .02S deatb pcrlOJXXllhan a "t<..} uf lhe results to u::. hut wit h ~Iu" ~ /. pun· \/ ish menl \ariahk!>." Columns (5) and (6) aCTable 10 .ludc. in 1.'s_ . The rc::. .thc conclusion from regression (4) that the coefficients 011 V. a sto le wit h a minimum legal drinking age of IS is C\l j.equcntly. Th.lln In [ the ba'>C ::.81.OOO.ression ofcoTumn ( 4).ith high fatalities: The coefficient i!> 1.U~SSIOn in'column (6) examines the sen~itiv. ficicnts on those variablcs. perhaps ncc.'d with tbe sta tistica l signi fica nce of thc H~f. fie fat ulitics. ities. i?coilimm. com binl.000 (see Ca!>e I in Key Conccpt t! 2.. ~:IM . A ccording to these esti mat es.drinking age il::.df) and combining the two bina t.:.thc ...peclllc. ....V~ f.lIC \\Jlh aminimum legal drinking age of 21. 3.A ~{0 .:>:timateU effcct of the real beer tax. hen income is high.2.'Simiiarly.."'gnui.. rcrcd standard ~rrors thut allow foraulocorrelati~n of thc error term wilhin an entity. high vulues 01 r~al per capita income are associated \\. as d~ussed in Section 10.'01 ~_ .<llSI death per 10..llaJ.Therel!l... llle I ~ustered standard errOfS in c()iumn (1) arc larger than (he standard crron. the only difference is the slandn rd errof".mpioymeDt rate by one percentage point is e<. canee Ic\el (the Fstatistic is 0. mated to have a fatalilY rate higher byO. umn(4). J J . .: res ult IS an ..10. ~H/IJ~crea. ..
. there is somc cvidence thatlllcrcao.~u .. ..ing) OLdo.7 Coodusion 371 . hul. Ibese resu lts present a provocativ'e picture of mca~urc<.... As always..f so.tically significant also obtai ns u~ing the HAC standard errors in column (7)...<. A subtler po<...intly siti..0. the re"lta x o n beer.. pcrhaps in response to political pressure.e O~I g :md traffic fatalities. but there ure no qualitative differences in tht! \WO set!> of J"statislics and p ..'" s ugg.. One potential source of o mi tt ed variable bias is that thc measure of akohol taxes used here.ociated \\ ith puhlic eduC<lltoll campaigns.O.. as mcasurcd by the real tax o n beer.10. is imprecisely estimatcd . in the rcal heer ta\": could pick up the effect of a broader campai~n to red ucc drunk driving.. .<o:t:.~~) . pun ang~ In contrast.ing :llcohol taxes.. I. ard drinking and dri ._ M."1. neither stiff punishmen ts in tbe minimum legal drinking <lge haH' important cffccts on f:1lal itie~_ r~ cod· ain in oflhc three . According 10 thc~c c:.20). T11. 0.t 10 beer.1.. to.me Fstatislics in column (7) arc smaller than thost! in column (4). chan~e:.QpI.. ( .. Thc magnitude of Ihis effect. does reduce traffic death..~ 10.. .89.lVl!r lime (like culluml itlliIUd~:.si hihty IS that hlk~ III the real beer tax could be a<. could move with other alcohol taxes: th. ti· ith '\'!s drun k driving laws and legal d rinking age.!. it is import ant to think about possible threats to validity. and the interval computed u. In Io::i mmg JnOn' arout drunl dri\ on" .". to COll\rol drunk dri nor incrt! ase ~ aleo· ..: in co l nlS on be used to control for unobserved omilled \ariables thnt differ across entitics but : u ~ou lire mlcreqcd in seeing funher anatY\lI' of Ih~ da ra.ests interprding the results as pertaining more broadly than juo... ~ The Mrengtb of this analysis i~ that including Mate nnd time fixed effects mit igates thi:: threat of omitted v'ariable bia\ ari'1Ilg lrom unobserved "ariables that either do not change I.oJ).rs.ik.. wider than Ihe interval [rom column (4 )_ ( ." col i!lan gein dud· . ~ ~ ..09.valuc~_ One substantive difference between columns (4) and (7) arises because tht: IIAC standard error on the beer tax cod ficient i!l larger than the s tandard error in column (4)_ Consequently..limalet\..: Ruhm ...ing the II AC standard error includes L~ro_ led tfi· tal· An d 10 real I. ..."C JntcrCloh:d IOd .e of V ... io... however.so traf· for ceo I. 11~llh" <. the 95% con fidence interval for the e ffec t on fatalili e!> of II cha nge in the beer tax using the IIAC ~ I andard error.u~.7 Conclusion l b is cha pte r showe d how multiple observations ove r time on the same entily can ! h clm in an Icien l " . bm\c"cr. an: not stuti:."'(:onomlO of akvbot mOR !lC~rall~.not \ary across "'ah~s (like ":)lJfdy innovationsJ.
:111 1' ties bUI vary ove r lime. then any c. . stant acro~s entities. a nd so forthwhere eac h e ntity is observed a t two or more time periods (11. PC('lp1e. you need panel data . tiple regression mod el of Part 11 can be extended 10 include a full se t o( entity binary va riables.I e ntities. t¥>viously. the mul. 4. plus the observable independent va riables (the X's) and an in lt:r· cepl . Panel data consist of obser vations on multiple (n) entities.)1 lie elsewhe re. The key insight is Ihal if the uno bscncd variable docs O(lt change over time. which often are not available. A regression with time a nd e ntity fix ed effects ca n be estimated by incl uding binary variables for 11 . bina ry variables fo r T .j'mmar y r / J/ I .¢ are cons tant over li me.hanges in the dependent variable must he due In influ_ e nces Olher than these fixed characteris tics.\:pld na lions for c ha nges in the traffic fatality ralC over those se\'cn years mo. Time fix ed effects control for unobserved variables Ihal <Ire the same acros> .I 372 CH APTER 10 Regrenion with Ponel Doto g»?~ V<""~ gtJI. A twist on the fixe d effects regression model is 10 include lime l'i xed e ffccl~ which conlrol for unobserved variables thai change over lime but (lrc con. Whe n th ere <Ire two time periods. . 2.1 entities. Both en. Entity fix ed effects regression can be estimated by induding bina ry variable" for /I . Th us there remains a need for a me thcxi that call eliminate the influe nce of un observed omit· ted variables when panel data methods cannot do the job.t jty and time fixed effects can be included in the regression to control for variables that vary across entities but are constant over ti me and for variablesI hat vary . the topic of Cha pter 12. panel da ta methods requi re panel data . pl u~ the: }(' 5 and an inte rce pt. Regressio n with entity fixed effects controls (or unobserved variables Ih. And. With panel da ta. To e xploi t th is insight. then I!.A powerful and general method for doing so is instrumental va riables regressio n. over time but are constant across en lilies. which can be c ~t i m LlIcd by OLS. If cul tural a ttitudes toward drink ing a nd driving do no t change appreciably over seven years within a slale. you need data in wh ich the same enl ilY is observed cli two or more time periods.. fixed effect regression ca n be estimateo hy a " before and after" regression of the change in Y from the fir st pe ri od to the ~cc · ond on the cha nge in X.stutes.1 lime periods. 5. Despite T hese virtues. entity and lime fix ed effects regressio n cunnot control for o milled varia bles that vary hOlh across entities ami over time.lI diffe r froO) one e nt ity to lhe ne xt but remain constant over time. 6. 3. firms. this is the fixed effects regression model. ~~e'. tha t is.
. 1 W hy is it necessary to u. ' Ille rese. er.e two subscripts.m e specific vanabk:s tha t might be correlmed with education and earnings? How would you control fo r Ihe!>e p e rsollspeeific and tim e !>pecific I.2 be used to csti mme the effect of gender on an individ ua l 's earn ings? Can that regres · sion be USed to esti mate th e effect of th e Illtlional unemployment rale on <I n individual's earni ngs? Explain . N.mel (350) unbalanced pa ne l (351 ) fixed effects regression model (357) entily fixe d effects (357) time fi xed effects regression model (362) time fi xed effects (362) e ntity and time fixed effects regression mode l (362) a Uiocorrelatcd (366) seria lly corre lated (366) h e tc rosk cda~" icilY· a nd a utocorrelation consistent (HAC) standard e rrors (366) clustered sta nd<ITd errors (367) Review the Concepts 10.. e ducation.1 million people.ey has a populat io n of 8. and age.trchcr is inte rested in Ihe e ((ecI of education o n e arni ngs.3 Om the regres!>ion that you suggl!slc d in response 10 q uestion 10.1.. G ive so me exa mples of unobserved personspecific varia hies tha t are cor re lated with bot h education anJ earning'$. Conslruet a 95% confidence interval for you r an~ ..1 'Illis question re fers to the drunk d riving panel dal<l regression !>ummarized in Tal:llc 10.... to describe pane! data'! What 10.!ffc:c1S in a panel da1a regression? 10. Usc the results in col umn (4) 10 predict the number of lives Ihat would be saved over the next year. Suppose that New Jer!>Cy increased (he tax o n a case of beer by $1 (in $l988)...2 does i refer 10? Wha t dOel. ge nde r. Exercises 10. 1 refer to? A researc her is using a panel dala sct On n = 1000 workers over T = LO years (from 1996 to 20(5) tha t contains the worke rs' earnings. C1n yo u tbink of e xamples ofti.Exercises 373 Key Terms panel data (350) on lanced p.i and I. New Jer.
Entit y I in time period 3? c. Enlity J in time period 3? .. e red its d rinking age to l K Use the results in column (4) 10 predict tht. The es timate of the coefficient on bee r tax in column (5) i$ significant at the l % leveJ. 11 ). change in the number of traffic fatalitie!'. f. Entity 1 io time period I? b.. dence interva l for your answer. H ow wou ld you test th i!'.. except with an additional regressor. Should time e ffects be include d in the regression? Wh)' o r why not? c.(" On. + ". a nd Xop as a perfect li nea r func tio n of the others. Use the results in column (4) to predict the change In the nwnber o( traffic fa talities in the next year. expre ss one of tbe va ria bles 0 1" 02.t = I (or all i.4 Using the regression in Eq uation ( 10. It. what is the slope and intercept for a.374 CHAPTER 10 lI:ogreuionwi!hPoneIDoIo b. S uppose tbat 1/ = 3. hypo thesis"? (Be specific aboulthe specifica Lion of the regress ion and th~ statistica l test you WOuld use.3 Section 9 . + u". d. Su ppose tha t New Je rsey low.2 gave a list of five pOlelllialthreats to the interna l \lalidit~ l) ( a reg ress\on study. + . D Ii: that is. a different effect on traffic fatalities in the weste rn states than in the o the r ~tate~. c.6 and the re by draw conclu sion s about its internal validity.2 Conside r the binary variable version o f the fixed effects mode l in Equation ( IO. where XQ. Does this mean t ha t the estim ate in (5) is more reliable? f. that is. c.ll ) . A researcher conjectures that the une mploymen t nHe hm. b. D3i . The d rinking age in New Je rsey is 21."ll a na lysis in Sec tion 10. Show lhe result in (a) for genera l n .. Construct a 95% confidence ioterval [o r your a nswer. in the next yea r.. Entity 3 in time pe riod I? d. What will happe n if you try 10 estimate the coefficients of the regres sion by OLS? 10. let Yll ::: f30 + (31)(" + ylD l i + 'Y202.) 10.The estimate in cohJmn (4) is significant at the 5% level. Suppose that real income per capita in New Jersey increases by 1% in the nex t year. Construct a 90% cont i. Apply thi!'. S how lhal the binary regressors and the "con· stant " regressor are perfectly multicollinear. 10. list to the e mpiric.
+ Ujr') b. ) = 0 for I s in E qualion (10. The researc he r collects data o n (he snowfa ll in each s tale fo r each year in the sample (SIlOlV u) and adds this regressor to the regressio ns.. Show that cov(V. = X i/{JI + a j + A) + ifu.8 Conside r o bserva tio ns (Y. In the fixed effects reg ression model . i= I •. + 51 STI + "Y2D2.. union status. do you think tbat the estimated values of 0". ...u" . The researcher collects data o n the ave rage snowfall (o r eac h state a nd adds this reg ressor (A vaogeSIIQlVj) to the regressio ns given in Table 10. 10.PT)? 10. + y"Dn l + lI il .. 8 2_.7 A researcher beLie\'cs that traffic fataliti es inc rease when roads are icy. What happens if Assum ptio n 115 is not trul!? 10•..N. + 11/ model also can be written as 3 75 + un' This Y!/ = {JrJ + {JIXIJI + °2 82. so thai Slales with more snow will have more fata lities th an o the r states. + .1 .Exercises 10. * 10. = = ... whe re Ot. whe re 8 2 .. .5 Considerthc model with a single regres.co n· sistently es timated as n i.5 are sa tisfied.6 Suppose that the fixed eUeels regression ass umptions from Section 10...9 Exp la in why Assumption #5 gjve n in Section to.11 rn a study of the effect on earning~ of ed uca tion using panel dala on annual ea rni ngs for a large number of worke r". 1 = I . education. 0 2. + ilu' ) 10. + Air is an unobserved individua l·specific time tre nd. H ow would you estimate /31 ? 10. = 1 if f 2 and 0 othenvise. a" . + .r Xu) CroOl the linear pane l data mode l Yj . and so fOrlh . T = 4). If 11 is large (say..) related to the coeffi cients (al" . ai. .s small (say. are approximately normally dis trib· uted? Wby or why not? (Hint: Analyze the model Y II = a . and the worker's earnings in the .00 with T fi xed? (Hin t: Analyze the model with no X's: Y . T.Vi. (I re the fi xed e nti ty effects. _ on )'2" '" y.. b.. n = 2(00) but Ti. .. a researcher regresses earnings in a given year on age..or YII = J3I X IJ/ + a.5 is im portant fo r fixed e ffects regression. 1 if i = 2 and 0 otherwise. . Com· men t o n the (ollowing me thods designed to estim ate the d fect oCsnow o n fa ta lities: a.. H ow are the coefficients (13 0./ = for 0 .10 a.. .28).
l S!JDlC u..3 76 CHAPTER 10 Regrenionwith Panel Dolo previous year using [v. Will this regre'>sion gi\lc reliable esti males of the effccts of the regressors (age. 0 0 the re ~ ults change when you add fixe d state effects? If so.!'!II' mil ted eHect of a shall·carry law in regressio n (1). as measured by stat istical significance? As measured by Ihe '"realworld'" signifi cance of the estimated coefficient ? iii. you will analyze the effect of concea led weapons lawf> on violent crimes. available on the Web site.II<'" 2003: "5. In this exercise. st:\t~s. education. are men ta lly compelCn l.SIUII/rmll <I" R. il more people carry concealed wea pons.. stat cs have enacted laws that allow citizens to carry concCa kd weapons. "ShOOHIli 1.·jo) against shall.s. Proponcnts argue tbat. Estimate (I) a regression of In(vio) against shall and (2) a regressIOn af In( . These laws are known 3S "shallissue" la\los because they in'it ruct local authorities to issue a concealed weapons permit to all applicants who arc citizens. for the years 1977.5. (Him: Check the fixed cff ect. avgillc.!gressian results is ma rc credible. b. union statu~.. and Ihil l could Cllllse omi tted variabl!: bias in regression (2 ).o. pbl064.. Does adding the con lrol variables in regression (2) change t he <. On the textbook Web sitc www. and previous year's earnings) on earnings? Explain. Oppone nts argue lhat cri me .>0""" the 'More Gu~ Less Crime' HypoIhc$I'. density. illcurc]fIle. Interprel the cocfficient on shall in regression (2).. Is this I<lrgc or small in a " realworld" sense? ~'sti mal e Ii... a.awbccomlSlock_watson you will hnLl a data file G uns that CO/Hains a halanced panel of data from 50 U.: J HI hi' paper WIth Ian ' \\Tel. plus t he District of Columbia. and have not been convicted of a fd{l]1\ (some states have some add itional restrictio ns). pwlO<H. tI9_ ' t312. \\ hidl sct of rt. i.:ed cffects regression. crime will decline because crimilhlb are deterred from attack ing olher people. regression assumptions in Section 10. and pm 1029. P()P.rip tion is given in Guns_Description. and why? 'Thc~ UUI3 "cre provided by Professor John Donohue or Sianfonl Uni\'e~IIY and "~r( u<.. .. Suggest a variable that varies across sta tes but plausibly \nrics lil tl eor not at aJ1ove r tim e.S. m increase because of accide ntal or sponlaneous use of the wea pon.} A detailed dc.) Empirical Exercises EIO.lm.
2 Traffic crashes are Ihe leading causc of dcalh for American~ bCI\\l!cn Ihe ages of 5 and 32. ha08. Using the results in (e)..'11)97.pud65. spcC'lf7(). c. available on the Web site."cmUII/lief and SfaflSlICS 200.ion ill1al)'si~'! r. wh:1I co n clu~ions would you draw abou t the: effects of conccilled\\ capon laws on these crime rmcs? £10.1Il OC(:upant not ~earing .~ A detailed description is give n in Seatbelb_Dcscri ption .".discuss Ihe size of tht: col'fficienl on sIU1se(l1:c..0. Is it la rge? Small? How many lives \~ould he saved if seat be lt use increased fro m 52% to 90'10? f. plus the D iSlri ct of Co lu mbia. which set of regre~ion results is more credible.is most reliable'! Explai n why. e.rl\· and "'crc IIscd In hIS po'l!)(r with Alma Cohen.ou add time fixed errects plus sta te fixed effects? d. or (c).aYl·bc. Through various spending polici~s.. drill knge21..lhc federal government ha~ ~ ncouragc d states to institule mandalor~ scat hcltlaws to reduce the number of fatalities and serious injuries..: data were provIded by Pro{~ Lrran Einav ofStllnfonJ Um\co. wha t arc: the 1110"t importan t remaining threat<. (b) . Based on your analysis. Slales. Do the results change when ). 111ere are IwO ways that manda tory seatbelt law" are enforced: "Pri ma ry" en forceme nt means that a police officer can slap a car and ticke.' If so. hut must havc another '1'11'::<. In this exercise you will in\cstigate how effecti. a. and wh) "! d. DOI!s lhe estimated regress. 10 thc in t ~rna l validi ty of this regresc.' I. In your view.comJsloclC YI·ubo n }OU will find a data file Selllbclts that contains a panel of dilla from 50 U.I seat belt : "secondary" enforcement means that a police officer can write a ticket if an occupant is not wearing it sell helt."~ on T>mmg HchJ\ ror Hnd Truffic nnal! lI~~ Tit.. On the text book Web site l\ wl\. ··11n: Effccts of Mandalorv Seal Bcll U.ou add slate rixcd effccts? Provide an intuitive explanation for why the rc~u l ts changed. and age. Which re gress ion spccification(a ).S.!ffccts'.Empirical o:S'Cises 377 c. "f . In(llI(:ome)..': ~~(4) IQK_&n .t the driver if the officer observes. R epeat the analysis using In(rob) and In(m"r) in place of In(vio). Do the results change when you tltJd rixed tinH. Estim ate the effed of scat bell usc on fa tal it ies by regressing Fa/afifyRme on sb_lI ~eage. H. Do the results change when ). for the years 198. e.ion suggest that increased seal belt usc reduces fatali ties? h. <..ve these 103""'5 arc in increasing seat hell U~ and reducing fatali ties.
TIle tT{lffic fa tality rate is the number of traffic d ea\ h ~ in a given state a given year. New Jersey changed f rom secondary enforcement to primary e nforce ment. In(illcume). per 10. huOS. Ourea u o f Economic Anai}. annually for thro ugh 1988. State Traffic Fatality Data Set I~S2 In The (illia arc for the "lower 48" U. . Data on the total vehide miles traveled annually by Slate were obtained e(IU from the Department of Transport3tio n. speed65. De partment ofTransportiition Fatal Accident Re porting \~ ere Sy~tem. The beer tax is the tax on a case of beer. Bu reau of Labor Statistics.rnbc Ihe SI. and "Mandatory co mm uni ty service?" equals I i( the stale re qui re~ community se rvIce and aIS0 olherwise.. The two binary punishment va nabks in Table 10.378 CHAP TEfI. .5.1 The .ary is a binary variable for secolldary enforcement. and age.~ morc gcn erally.He 's minimum ~enlcncing rcquireme nts for an initial drun k dri ving con\·lctton: " Mandatory jail'!" equals I if the state requ ires jail time and equals 0 otherw l~c. or 20..S.. speed70. drinlwgc21. primary is a binary variable for primary e nforcement and second.sis. Personal incomt' was obtained from the u. APPENDIX 10. which is a measure of state alrohollaxe. Estimate the number of lives saved per year b) making Ih i~ c ha nge.. Traffic fatality datd obtained from the U. including fi :ted Sla le and time effects in the regressio n.s. . R uhm (If {he: Department or EconomiC!'. In the data set..19. and Ihe une mployme nt rale was obtained from the 1. In 2000.1 dc. stales (excluding Alaska and Hawllii). hether the legal dri nking age i~ 18. 111e~e data "ere graciously provided to us by ProCessor Christo pher 1.s. 1 are binary va ria bles indicating . Run a regression of sb _lIseage on primary. Does pri mary enforce me nt lead to more seal belt use? What about secondary enforcemenl'! g. allhe University of North Carolina. The drin king age variables in Table 10. 10 Regreuioo with Ponel Data reason to ~ I Op the car. ".ec(mdur).000 people \iving in tha t state in that year.
conditional on the X's. ~'..... + x : r ) I' • (X~I + x~:.. in Exercisc IIU5. fo r heteroskeda~ticity.:1 SIondord Errors lor Fixed Effects Regression with SerioIty ~ Errors 379 APPENDIX 10. <." L LX") .c OLS estimator I~ Y b~ obtai ned by replacing X i  X b) X" and Y. .~ . .3 does not hold..:ancd fixed cUects regression \Vith a singlc Tegres wr X..()rr. 1 ..dULIble ~ummalion i~ •• 1.~c t iona l da ta. f31 = . 'Ib is appendix explains why this is ~o and presents an al te rn:lIive form ula f(Jf standard errors thnt Me valid if there i~ bClt' roskcdasticity [lntl/or autocorrelationthat i).3 of the 5.7) by IWO SUnHll1l1 io ns.·· + '\ ~r) ..*1 L L X~ I~I The deri\'ation of tho: sampling distribulion of PI parallels the derivation in Appendix 4. ." + X u + . ... = XII . + . "(' LLX. First.n) and one over lime pcriod~ (/ '" 1•. is the OLS estimator obtained using the entitydemea ned regre.Y.... .... r __ (10.XI' Y. The Asymptotic Distribution of the Fixed Effects Estimator Th\! fixed effect~e~timntoTof fJ._tX". om: o\cr enlities (i '" I.this oorrcs ponds to treating II as increa~ ing to infim ty \\ hile T rcm~m~ fi "<ed. I the C:Xlcn~KJn 10 double sLl~rirl' or 11 ~ingle $\lmmlltiun: = L (X. Xi.~ X'l+ (Xl! • .: .. we focus on the ClIl>C in . h Vu in Equation (4 7) and by Tepkld ng the single SUIll Ill:l l i(l1l in Eq u lIlion (4... Throughout. . ./) + XI!" . ~recifica l1 y when Assumption 5 of Key Conc~p t 10. li nd Xi = rl l..\'..14) in which Y" is rcgrc~~cd (In XII' ~here Y" := Y" .2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors TIns appendix provides formulas for "tandard errors for fixed cffcclS regres~ion when the crrors are scriallycorrc l<1tcd. . I . it" CTQS5.. = T I '2.. ~ub '1h.1mpling distribution of the OLS estima1<lr .t 0. T)5. X lr) . [hen the u~U<t l standard erfliT fonnula is innppropri alc.re~<.. This appendix consider~ emitydcm. The fommlas g.s~ion or Equ(llion (10.and f1utocorrelalionconsi~ l ent (HAC) standard errors.22) LL X. The fo rmula for Lh. hich the numbe r of entit ks is large but the numbe r of time periods T IS small: mathcmaticalt y..lven in this aprendh: are extended to multiple reg. If " II is <1utocorrelateu. 1V.
(1025) IT! = var(q. '" }:. ~ L: L:X.IIT.3.) ]II..23) by nT. where fJ"~ IS VJ(iJl ..' ~ II II· r I LLX)iit _ .!. ~. Q.r.II. v nT{fJ .IX" U " "" I .II). = £7' ._. J~ the lotal nu mbe r of observations. a nd no te that :f. tne variance (If '1i' II follows [rom Equation (10.::!1) Under Assumption S of Key Concepl 1O. dJ"ide the de nominator of the figh l side of Equation (10...4.•• X.divide the num~r.X! Also. 0.. Qx L..21) where lill := and = n~ 1:.26) var(J31) .':6) simplifies. . mulliply Ihe _ T ./3. a riables U and V.. unde r AUumplions 1.26) t herefore can be wrinen !l~ the sum of va ria nce!\..fl1" ~ .14)1 inlo the numeralOr of Equation (10.(X"  W. The variance of1hc sum in Equalion (10.) is dis tribuled N(U.~ Q r 7 ( 1U. p lus covariances: .24) that. From Eq ulllio n (lO.22).. var(U + V) '" \·ar(lf) .r=' left side by X. by the cen tral lim it Iheorem. .1 I:_IX~.380 (H A PUR 10 Regrenion with Ponel Dolo stilUte = fJI X" + u " [Equallon (l0. whe rC: 1].1Q1) whe re (large II).lhc expression fo r u~ in Etjua!ion (10.) '" va r(fir..1. thell rearrange the result 10 o btain y. _\/7 fi £" ~~ "" Qx II (1 0. V ir). ...3.24).25).{). _I X"uj.. fo r two random . The n v.. . The scaling factor in Eq ua lion (lO.'. \'ar(V) + 200v(U..II" Ox Under the first fo ur assumptions o f Key Concept 10.. b /3 I I  .. the va ri ance of t he largesa mple dislribu tion of PI is ± ( lU•. alor by T POV J Nex t.) = fl· V~" ~".. X. ~rrI7"C 'T11 is distri buted N(O..O'~) for /I large. Re"alt tha t. II .
C!~~rmd (clustered standard error). n sho uld be large..]) + v:n (va.. C\' CD u..28) are... in a stud y of earnings. ..30) x The cluste red varia nceesll mator if2 l).) TIle clustered panel da ta standard e rrors are give n by (10.. BertrAnd... This va riance cstimtltor is ca lled the clustered variance estimator because the e rrors a rc grouped into cluste rs o f o bservatio ns.. so if II i) is au tocorre lated the usual heterosi::edasticityrobust variance estimator does no t consiste ntly estimate (10..and au tocorrela tio n· consiste nt. Fo r example.. Q (10. fo r diffen:nt ti me periods but the same entity).] :V.. + "ir ( 10.23) + 2cov(v. . (10.) + . SCl' ( 10.1 2'>")'..1 2).~ " " = r\'ar = + ". For the clustered va riance est imator in Equation ( 10. For cmpirkal examples us ing H AC stltnda rd errors in economic panel data.}' ~ fI .. ] . as n . The clu~tered varianc~' u!.28) nrc zero (Exe rcise 10.."2 _ _ I + .. then the covariances in EtlulIlkm (10./lT. 11'1 might be correlated across entities.T) (1 0..ls ..24) ally autocorrelated .6).) 381 ~ ) I ('i/I T..ith Serially C~ted Errors II . But if U II is autocorrelated.25) SEd).29) to be reliable. (Some soft "here O'~ i~ ware imple ments the clustered variance form ula with a degreesoffreedom adjustment. given the X'~so e numer· l Xirliu all the co\'arjances in Equation ( 10. In contrast.::d effects regression. nonzero..rI~j/""{ where 1 "(' .)  11T O'. + va r(ii.7 _I. 1 . that the sampl ing sche me selects f(lmilies by simple random The n tracks all siblings within 1I family...d~$Imd is a consiSlenteslilnalor of and T is a fixed COI1 SIIlOl ../ = Xi/u" a nd u" is the res idual from the OLS fix. lncn va r ( slondord Errors for Fixed Effecb Regression . Because the o mi tted factors thai en ter the e rro r te rm could have .26) ') + vi"lr( V) 'j!!cn n~ the Standard Errors when SllPPOSC U it Is Correlated Across Entities ~a mpling. + 2oov(ii.29) X'ii· &. the so·called clustered va ria nce estimator is valid eve n if I'll is condition· estimator i~ .I o:T~. thc erro rs a re uncorrela tcd across tJ me periods.24).Vn)l. where the errors can be correlaled wi thin the cluste r (he re. but are assumed to be uncorrelaled ac ross clusters.t <c if the re is heleroskedaslicily a nd/or autocorrelation (Exe r· ( 10.10. T n some cases. and Mullainathan (2004).28) f1var(V. . U nde r AS$um plion 5. Duflo.27) ion (10.1 5): tha t is. The usual heleroskedas· ticityrobu~t variance esti mator se ts these covariances to zero. in general. ) + . the variance estimato r is hcte roskedastidty./ ~ .26) cise 18..:.
. h ~ M.' \~ ~ ~ ~ ~ ~ ) ~ ~ 1 ~ ~ \." \ <0. = .. ~ o • • • w ~ u •  ... <> ~ r~ ~ " ~ ~ ~ ~ i ~ .
D oes the 44 bank treal them the sa me way'? Arc they Ool h e qually likely to have their mOrtgage applicatio n accepted? By law they must receive identical treat men I.crimin ation... /ia._~k evide nce .conta ined in la'rge da ta se l ~ sho\l. Bul whe ther Ihey actually do is a ma iler of great concern amo ng bank regulators. ~ ~h 28% of black»ppl. _~ .m payments take up most o r a ll of the applican t's monthly income. Many studies of d illc rimimllio n thus look fo r sta tistical e\ide ncc of { fOc.so<1 does not rcolly answer tbe questiun that in Ihl! mortgage market'? A start is to comp.nial.':'" d..f~: LpQ<H """w ~.cd.Jik ...nee of discriminatio n e~1 applica nt..L...J "identical but fo r their race:' Ima cad. Loans a re made and de nied for many Icgilimale reasons.. identical but for their race. MassachusellS..IP. ~ T wo people.. ' ed'g.)f ''''U : ~ ~t<.. area.? Regression with a !:::::~ Binary Dependent Variable~~ Wv.can<s .lrc the fraction of minority and \\hite r. fO r ~.l.~ '.lnd minorities are treated dl(fercl1Ily.~<e denjed mortg""" but o nly 9% of . fA..P.. the n a loan ollicer mig ht jusli(hlbly den) Ih e loan. ~ Matis~eal c\:ide." ~ pro posed lo. For exampk. huitJin•l(" Oilier ap(}lical1l c/wTl1cterist jcs CUIL\/llIlt. wa lk into a bank and apply fo r a m ortgage. ~. £0< " • 383 . a large loan ~o thaI each can an identic(li house.bHe "pp l. should o ne chtck. gn thered [rom mortgage applications in 1990 in the Boston ... B ut how.6 '~~ '1 di~rimination.s w~o \~~rc denied a mOr)g~ge. precisely.ve I IM#rtf1j(I_ ... canJs i1 j ~.. that is. because tbe black a nd while applicants we re not necessarily p . If the It.... SO Ihc de nia l of a singJe minority applican t docs nOI prove a nything abo ut di<.CHAP TE R 11 I'e ~ Ir~ fju. I~}~le d~ta cxam!ncd in this chapter. ~ Pi AJJe!~ opencd this chapter.. Butth.. e ven lORn officers arc huma n a nd they ca n make ho nest milllakes.rlng that whi les f. r ..n. Also..~ 1'~~/". wc need a method for comparing rates of r A .. ~uy C _. comp..·/l41~U>uJ..
discusses the method used to estimate the coe[[icieots of the pro bit and logit regressions. cal led " prohit" and ··togit·· regression. TIle binary depende nt va riable conside red in this chapter is an example of a dependeot variable with a limited range. imum likelihood estimation...384 CHAPTU 11 Regrenion with 0 Binary Dependent Varioble Th i~ sounds like a job for multiple regression analysisand it is. • 11.1 (0'. but with a twist. and it allows us to apply the multiple regression models fro m Part II {o binary dependent variabl es..1 (N)U~ and they ca used no particular proble ms.l (f'" U ... 11. al e surveyed in Appendi. { ~U~ ! ~~d". J..... things are morc difficult: What does it mean to fi t a line to a dependent va riable that can take 00 only two values. Th ese methods..1  . Th is in terpreta tion is discussed in Section I I. Binary Dependent Variables 'and the Linear Probability Model Whether a mortgage application is accepted or denied is one example of a binary variable... we apply these methods to the Boston mortgage application data set to see whether the re is evidence of racial bias in mo rtgage lending. r~~J!I PI A> ~ J . Section 11 . The twist is that the dependent variablewhether the applicant is denied./e.1 goes over this "linear probability modeL·' But the predicted prohabili ty interpretation also suggests that alternative.3.. 0 and 1? The answe r to tbis question is to interpret Ihe regression runc:lioJl as a predicted probability.I. for exampk .~ J) . in o ther words..t' ~~ ~.2. ~ k~ p~~  . Whal I:> the dfet:t of a tuition subsidy on an individual's decision to go to college? \Vha l .\. we regularly used binary variables as regre'\sors. In Section 11 ..4. But whe n the depende nt variable is binary.is bin~... 1''</1 Y1I<. 10 Part II . Many o ther important questions also conce rn binary outcomes. it is a limited dependent l'anable. which is op tional. dependent variables that take on mu ltiple discrete values.3. th~ method of ma. nonlinear regression models can do a better job modeling these probabilit ies.. are discussed in Section 11 .~} 'l Or"" . Section 11. Models for o lhe r Iypes of limited depende nt variables.
. area in 1990. S.3 b.onsh.p between deny and I t~'iiI ratio: Few applicant.t' 14'!:f/. ~ ':70  . Iht: lin ear probability l {i i47't&~ >uuo.j I ir the mortgage applicatio n was de nied and equals 0 if it was acce ptcd . and so is the proces!i by which the hank loan officer makes a dccision. ~.> sC3tt crplo t look') different th"'n the sc3ucrplots of Pan II because ~~ . J~nninc:.1. 7 . . As usual. tht! applicalll docs or does nol get a job. relm.1q b'/c..1 is much easie r to make pavntcnts that ~ ~c 10% o f \. ~'I orlgage applications are complic8lcd. ersus Pli rutio for 127 of the 2180 Dbscr\'arions in the data scI. 0.. a coun try does or docs nOl rccej \"c foreign aid.d rIvet{ ing...! to the applicant's income.I..IX".". Ali ~ anyone \I. The data are a subsct of a larger data sel compiled by researchers at the Federal Rescrve Ba nk o f Boston uml~ r the Home DisslmlJ[c Ast ( HMDA) . th e payment ·loincome ratio . t f . ole deny.obi. l h en turns 10 the ~ i mpi cs i model 10 use with binary dependent va ri ubles. I ' Illis positi ve relationship betwcen PII rario . ' t P.' .. O ne impo nanl pieee of infonnation 4 "[ is t he size o f the requi red loan paym ents Jc lath'l. applicant will make his or her 1 01m payments.I J ~ f. :f / MOrlnas .. e their . (Thl: scatterplot is casier to read using th is subset of ~he datu . up s moking? What detecuunes whether a J raU"e fh'~r'< ~. I S Figure 11. . this line plots thc pre dicted value o f tleny as a function o r the regressor. but most upplicants with a paymenttomcomc ratio exce!i. .) Thil.s b. and re late to mOrlgage a pplicatio ns fi led in the Boston..'£' 1 ~.lhe teenager docs or does not take up smoking.. This seclion discusses what distinguishes regression with a binary depende nt variable from regression with a cont in uous de pendent \'~riahle . Thc 130ston HM DA data are described in A ppcndix 11.f...' seems to show . ~ the greatcr the fraction of deniu ls) is su mmarizcd in Figure ILl by the OL5 regres sion linc estimated using these 127 ob~ervations.:. \\ ilh a paymenttoincome ratio ~ss than 0.1y Mode1+"!>ifK JvtA #<'1 country receives foreign aid? What determines whether a Lob applicant is sue ce~!)fur! In alllbe!>c examples...II.lke.. Massachusen s.."'.4 art: denied. The 101111 officer must forecast whethe r Ihe . and the ~ cont inuous variable PlI rmio .k~ /4~.l . p application denkd. ~etH T r) tis .our inC"Omc t ha n SO% ! We the refore begin by looking at the rela tionsrup be\\\ ccn two va riables: the binary depe nde nt va riable deny.:ond the 'u". wh ich b the ratio of the a pplica nt's anticipated 10lal 5 . mo nthly loa n pavmen ts to his or her month ly income.v'.# t 'Binary Dependent Variables The applica tion examined in this chapter is whe ther race is a factor in denying a mortgllge applicat ion: the binary dependent variable is whether a mortgage appli cation is denied..flo/ I .md deny (the higher the PI! mtio.1 presents a sC811erplot o f tlen). . the OUicomc of interest is hinary:The student does or does nOI go 10 college.m"y... wbich equals 1. P<obob.. ho has ~ rrowed 1ll0l1CV knO\:'rlo ..1 . JJ J 'I ' .. whether a teenager 1.
t (or mean) is the probability that Y = I.. thc predicted value f of 0. when I'll r(ltio = 0. First. if Y is a 01 binary. • X.' PI! ratio P. II I'l l rllIio "" 0.. ..2 II. Second.4 0.3. . . (~' ~ 4... E(Y IX 1. . Mortgage a~ 0..3. .and more generally 10 understa ndmg ~ cling the probabiliry that the dependent variable equals I.variable. that is..' ~~ P a~? 0/. given X. from Section 2.6 \Jnea( PtobabI'ity Mor!9* dfnltd ModtI IIA 0..0 U. In the re~rcs~ sia n context the expected va lue is conditional on the value of the regreSS(ll~.. condj· tional on the PI/ ratio .O is 0.2 0) 0. )..386 CHAPTf.. the popul ation er~egress ion function is the expected value of Y given the regressors. from Pan II . 1 Scorterplot of Mongoge Application Dental and the Paymentto·lncl Mortgage opplrconl1. But what.'tP« ( IUt7 .2 .lth {l.. then its expected value I.7 U. for a binary variable the predicted value fTOm th ~' popularion regression is the probability that Y = I. • •• • X~) =" Pr(Y = l IX I •.. 1 X. 0 il approved) The lin· ear probability model U~ a Mroight line 10 model the probability of denial./' ? .....'2 1. '/ U. Sa id differently. Th us for a binary va riable. . This interpretation follows from two facts.8 0 .. then 20% of them would he denied. deny . m . if there were ma ny applicali on~ y.20. c::d!? p nO L .3 is estimated to be 20% .. w ith a high rolio of debt payments 10 income (PII ratiol ore more d en y 1...0 ~ ••..20 is interpreted as meaning that.R " RegrO$sion with 0 Binary Dependent Vorioble fiGURE 11 .. •••• 0. . when PI] ral.2 V. ._NPL.. ~D ~ the probabil it y is conditional on X.tI~=1 regression with a binary dependent variableis to interpret th e regression a~ mod ~ The key 10 answe ring this q uestion. the probability of loan denial. £(Y) = Pr(Y '" 1). The lineur multiple regression model applied to a binary dependent vfln Jbl e is ca lled the lincar probability model: "l inear" because il is a straight lin e.Thus.••••••••• .. • . does it mean for the predicted va lue of the binary variable demoto be 0.. _ _ • • _ .• 0. the probability of demal L. 1. .. and "probability mode l" because it models the probability thai the dependent variJPlc cq ulI ls I. £( YIX l' .5 U6 0.= p.. precisely.V I. For exampl e. ~ ::. [n short. in our example...3. the pred icted val ue of de ny is 0. .4 ttkely to have their oppficclion denied (dtony = 1 if denied.)....20'1 i1.
~ p.'l a II 51" i. The OLS regressio n of th e binary P M l !. = Y ..1)."" 0 •••". yrobust standard errors be used for inference. P~b..e in X. This is impossible when the dependent van I I_ffble is binary. Simil arly. if the PI/ ratio Increases by 0.1Il be estimat ed by OLS.:scss ntialtbat hetcroskedasti . 1 8 : ~ y" _I 'I ~ variable equals 1. hypothcse~ concern ing several eoef fi .et w .. it is possible to ' agine a situation in which the R~ equals 1: All the (lala lie exactly on the rcgr ssion line. unless tbe gressors are also binary. and the population coerfi· I cient is statistically significantly different from Icro at the I % level (the lstatistic . compuled using ~ . and the OLS estimator ffi l estimates the change in the probabiJ ity that }' = I associated with a unit change in X.rted with a IInir chons.1) ".r ' . The population coefficient f3 1 on a regressor X is the cbQl!gp in 11ll' p ro bobjlic)' ~hM Y I associ tr..". When the dependent variable is co IOUOU S. and !he U. deny.. Iv' n variables can be modeled usi the methods of Section 8." '""0. r" '~ 11 ..3. given a change in the regres r. E! 0.. (.againsl the pa)'ment ·to·incom~ ratio. 1.~ /J X~I 0 e tooll hat does no t ca • over is the R2...l4t I' 7 I ~ le. fY~61:t "'*:"1t' ~ 387 f £R J. . is the predicted probability that the dependent 1. /J /} r' f J ff' ~ ~plicatjon to ~ 1d d~epcndc ~ nt van · . Y._ f) de.. l A lmost all of th e tools of Part II ca rryove r to th e linear probability model. pl1 then the probability of denia l incrcases by 0.0 per I !L __ .032) (0... B ecause the ~J:tors f thelinear probabili ty mod arc always heicroskedastic ( Exercise 11. The linear probability model is the Ilame ror t he muhiple regression modiFot'"Pa~ II when the dependent variable is bi nary rather than conti nuous.080 + O.. .11''1 e Boston HMDA data.•' ~w · 0. For cxample. nts can be teste using the Fstatlsllt seusse 10 Chapler 7: and IOterae Ions ).. The coe fficie nl s C.060.. A jp. The linea r prob ility model is summarized in Key Concept 1/.604 X 0.1. Ninetyfive percent confidence inter. Ir~yre likely to have their applicatio n denied.96 standard errors.J. 1 8.: vals can be formed as ± 1. Th Us.13). the popula tion regression function corresponds to the probabili ty thut the dependent variable equals one. (0.8). l. ~stimated Sing 311 2380 bservations in our data set is 1_i!~l'k 1. PI[ rafio.098) P ' ( 11 . Accord ingly. applicants with highcr debt payments as a frac tion of income are ".~ Model ~ The Linear Probability Model Y ~~y. We re turn to measures of fi t in the next sectio n. '.1 centage points.I. n i It ~ '.. This coefficie nt can be used to com .~ 1/1 The estimated coefficient on PII ratio is positive. . .O~ C.fw.. _ {. that is. Because the dependen t variable Y is binary.)~ ~0 ~ o f . given X. the OLS pred icted va lue. according to Equation (11. pute the predicted change in the probability of denial. the R2 is not a par ~ f"'~ ' ularly useful statisti he re.t' ~~ .604PII ratio../th e estimated regression fUnc lion . by 6.
089) "nl C coefficient on Mack.l77Mllck.:vet ( th e Istatistic is 7. + {J(X Ic The regression cocrficient {31 is the change in the probability tha t }' = 1 :l..101.SSociatcd with a unit change in X I' holding constam the other regressors. and th e usual ( hcte roskcllasticityrobust) O LS s tandard errors cnll he used for confide nce inter vuls and hypothesis tests. the n the PI} rallU IS 0.091 + O.388 CHAPUR 1 1 Regression with 0 Binary Dependent Variable • THE LINEAR PROBABILITY MODEL ~J 1 1. a 17. T f32Xb.. J) ca n be used tll compute prell ieted de nia l proba bili lies as a funclion of the PII rmio. ThaI is.o + O. .025) ( lUI (0. + fh x h + It. holding con.I XI' X l' .tant their pay mentIoincome ratio. holding constant thl: 1'// ralio.604 X 0. 1% that his or her aprli· ca tion \\ ill be den ied.stimate Equation ( 11.is is different than the proba bility of 20% based on the regn:ssion line in Figure ILL bcCOi USC that line was estima ted using only 127 III thl' 23SO obse rvations used to . f3A' TIle regression coefficients C an be est imated by OLS. ( 11. + . is binary_ so thm Pr( Y .080 + 0.3 and the predicled va lue from Eq uation ( 11. For e xample.1 ) is .cd th.ry:::: . _ X~) = f3(J + f3I X j + {32X2 + "..nJIJ ca nt al the t % h. llle estimated linea r probability model i.7% higher probabil ity o f \l<lving a mortgage a pplicmion ~l'n.0. 177 _indicates that an A frican 'American appll.! To keep th ings sim ple. r rn. : : (0. a n I'I pplicam whose projected debt payments arc 30% o f income has a prob<lbility of to. if project ed dcbt paymen ts are 30% of a n a pplican t's income. we focus o n differences hetween black and \\ hilt' a pplican ls.11 ). .i) with <l binary regressor that equals 1 if the applicant is I1lilek and equ:lls 0 if the applinmt i~ white..2) when: Y.: Int ha. The estima ted linea r pro ba bili ty model in E quat ion ( 11 . To estimale the effe ct of race. a nd so forth for {32" " . hold ing constant the I'l l rmi() . according to th is linear pro babiji ly model.1). 1 Yi = f1n + {Jrx l.3 = 0.lIl a \\ hile. '(k.029) (0. 'fllis coefl'icicnt I:' sif. 0.0. "' ~ aug me~t Equaliorr(l t ..559PI} rar.) Whm is the effect of race on Ihe probability of den ial.
1'1l) ~~~~. ICh as the applicant's carning potentia l and the individual's credit history. '. f~ 'nle probit regression model wllb a stugk regressor X .~rR7.c" "dJ.:..V. Logit regression. also cailed logb:fic .3. ~ 11. Shortcomings ofthe linear probability model. thiS estjmale suggests that (here might he racial bias in mortIt gage d('cision~ but such a conclusion would be premature. omission from Equation (11. Ihey are used in logit and probit regrcssions.~ able ~ models the probability th~t Y = \.' "d '0..\).ables'.·T1te estimnted line representing tbe predicted prollahilities drops below ofor very low \alues of the PII rtlt. Although the payment toincome ratio plays a role in tbe loan officer's decision. pl'eJ.c.3) will cause onl ined variable bias.1 .~:£~'d{!t. II any of these variables are correlated \\~lh the regres.ss~n:~ ~ 11. ._ u ulatlOn that forces (hc predIcted values to be between 0 and I. (114) .. ~"O"""d it I'r( Y ~ IIA1 ~ ">(/l. Look again al Fig ure 11.jti'V"'. j " j: Probit Regression " / ""' P irdIX)" ~(. s pro uct! probabilities betwee n 0 (Section 2.d...! Bullh..SI.is is non sense: A probability cannot be il'ss than 0 o r greater than I.on.k.o and exceeds 1 for high values.O'. + ~tA1.2 Probit and Logit Regression ( p . _.d J.rt? 0~ Thken literally..t1 r (" C/". To address this problem..2 ProbitondtogitRegression 389 f. it makes sense to adopt a non lincar for ~JJCl /2. we introduce new nonlinear models specifically designed for binary dependen t var. Becaust ptC~Umulati ve proba Islribul ion functions (e. so do many other factors.!. Pro /I4I't~ bit cssion uses Ihe standard normaL c.. p. then their.f..>.!>O" black. uses thc "log. This nonsensical fea lUre is an inevitable consequence of the linear regression. or PII n il/(}.. The linearity that makes the li near probabilily model easy 10 use is also its major nnw.J./t f" Prohi l and logit 1 r~gre~sion arc nonlinear regression l1looeb spccificallv de~igncd ~.robit regression with a single regressor. for binary dcpendent \'ariables" BecauM= a regressIon "'i tll a hinary dependent vari 'O.Is ~ egress.... TIms \\e must defer ~ny conclusions about discrimination in mortgage lendi ng un til we completc the more thorough analysis III Section 11.
2 plots the estimated regression fu nction produced by the probit regrc~ion of deny on Pil ratio for the 127 observations in the sca lle rptOI. . un._ . if /31is negative..2 %.•. MOOgaqt'denltd I. the calculatiun in the previou s paragraph can. That is. more generolly. 1he predicted probability that the application will be denied is 21. however. • . If /31 in Equa tion (11. 1IX)..::: .8 0.0 where <l> isthe cumulative standard normal distribut ion function (tabulated in AppeodixTablt! 1). the probil conditionol probabilities are olways between 0 and 1.2 .. Whe n lhere is just one regr. the coefficients are best interpreted indirectly by com' puting probabilirie:.4.tioo func tion 10 model the probability of denial given the paymentto income ratio or.6 Probit Model 004 0.3 OA " . In the prohlt model.2 and f3t "" 3. What then is the prob ability of deni al if PII ratio"" 0.2 Probil Model of the Probability of Denial. The estj· . then looking up tbe probability in the tail of the normal distribution to the kft of z . and f31 "" 3. f30 = .2 + 3PI I ratio) =: <1>( . 0.8) = Pr(Z ::.pu + PI X. suppose Y is the binary mort gag~ de nial variable. which is 21. Thus. Figu re 11. the easiesl way to interpret a probit regr c~ sion is to plot the probahilities.:_~:~~ 0. _ • • _ _ < 0.. deny.....::: $( . to model PriY . computed using the pro bit model wit h the coefficients f30 = . andlorchanges in probabilities.0 · 0. sor. Unlike the linear probability model.l' bility that Y = 1...2 0.. 0. equivalently." z = ~ + f3 \X . be done by fir st computing Ihe ·'zvalue..0. Instead .390 CHAPTER 11  Regression with a Binary Dependent Variable FIGUR£ 11 .2 %.2%..004 u .4? According to Equation (11.8.6 0. For example. an increase in X decreases the probability that Y = 1.. _ ________ __ • • • ____ _ Mortgao.4) is positi ve.2.~)..4).8) = 21. plays the TOle oC"z" in the cumula· tive standard normal distribution table in Appendi xTablc I .2 + 3 X 0..0.O .::: 2 + 3 X 0.0 _ .. According to the cumulative normal distribution table (A ppendix Table I)..S P/I ratio 0. the term .:. Given the P/I Ratio The probil model use the cumu lative normal distribl. 1 0. <t>( .4 = . this proba bili t~ is ~ (Po + 13IPI l ratio) . X is Iht' payme nlloincome ratio (PIf ralio). __ . when the PI! ratio is O. it is not easy 10 interpret the probit coefficients (3'1 and f31directly.2 1.:_~~:::.0.8.4) = (1)( . Beyond this.7 U.2 OS 0..l~ (lWIlW~ IL~_:. then an increase in X increases the prob. deny 104 1.
X. As emphasized in Section 8.5) 1 . tillS cxpected change is estimated in thre e steps: Firsl.. + (J. Probit regressio n is no excep lio n. the solution is 10 include the add it ional variable as a regresso r.1.3%. In general.3.3) = 38%. the de nial proba 'hilh)' is 98.5 x 1 = .3.able bias in probil regression.O. Effect of 0 change in X . l yields the estima ted effect on Ihe probabililY that Y = I of a cbange in X. is . Probit regression with multiple regressors_ In aJl th e regressio n problems we have studied so far. .he ility ba thaI :s 131' :onJ .5. the melhod of Key Con ce pt S. then the zva lu e is z = . and..1 %.4 + 0. is a lso the solutio n to o m itted vari. For small values of the payment toincom e ratio. the estimated proba bility of de nial based o n the estimated probit function in fig ure 11. Ibe prahil popululion regression model with two regressors. the probability oC den ia l increases sha rplv 10 SI.11 .4 and X 2 = I is Pr(Y = ll Xl = 0. ted ro . suppose. R ecall from SeCtion 8.B 2 = 0. the pro ba bi lity of denial is nearly I .ul a ion  For example.9%.n he b . X + ax. Accordingly. 1%. dlld when !he PJl ratIo IS 0. for PI/ Ttlfio = 0. the esti ma ted p robability of denia l is 16.81 = 2. compu te the predictcd value at the changed value of X. when the population regression function is a non lin ear fun ctio n of X.: : value. so the expected change in Y arising fro m a change in X is tlle change in the probability that Y = 1. Xz = 1) = <fJ(. the effect o n Y of a change in X is tb.4.X.) = <I>(Po + p..lit}' Prey = qx. Thh. When Y is bi nary. reS obit e. When the PIT Talio is 0. When appl ied to the probi ! model.1 that.\i .. its condi tional expectation is the conditional probability that it equ a ls 1. According 10 this estima ted probh model. the proba bility of denial is sma ll .8o = 1.. compute the pre dicted value at the original value o f X using the estimated regression fu nc tion.2 Probit and legit Regre~ion 391 nUlled p robit regression function has 11 stre tch ed "S " shape: 11 is nearl y 0 and flal for small val ues of Pi l rlllio: it turns and incrcasc ~ for in termediate values: and it flatte ns ou t ag<lin ond is neorly 1 for large values. X.). (11. If Xl = 0.1. next. tor example.0. Xl and X 2. for ~pplicanls with high paymcn H oincome ratios. This procedure is sum marized in Key Concept 8.4.6 + 2 x 0.2. SO. When the Pi l TtI/io is 0. leaving out a determina nt of Y that is corre lated with the included regressors results in om itte d va ria ble bias.6.2) = 2. this method always works for computing predicted effects of a change in X.e expected c hange in Yarisi ng from a change in X.4 an d X 2 = I..8)_ l ).1.2 is Pr(deny :: 11 PIT rOlio = 0. the probability that Y = 1 give n XI = 0. Tn li nea r regressio n.6. no matter how complicated the nonlinear model. The pro bit model wit h multiple regressors extends the si ngleregressor pro bit model by adding regressors to co mpute Ihe . then compute the dWerencc b!:lween the Iwo predicted values.
and esti mated effccts are sum marized in Key Conccpl ll .16) (0. X.6) where the de penden t variable Y is binary. The prohit coeffi cients i31J• {3 ).392 CHAPTER 1 1 Regressioo with a Binary Dependent Variable Key CONCEPT Ti:JE.::rencc..! va lue. X. 7) is tha t the PII ratio is positive ly related to proba bi lity of denial (tbe coefficient 011 the ('II ratio is posilive) and Ihis re1ationsb ip is statistica lly significant (t "" 2. (0.( Y = II X. z == f30 + {3IX I + f32X2 + .: is calcu_ l ated by computing the zval ue..~7 = 6... lbe predicted probability thai Y "'" I. What is th e change in the predicted prohability lhat a n application \\111 be denied when the payme ntto· income ratio increases fro m 0. AND ESTIMATED EFFECTS The po pu latiull pro bit mode l with multiple regre ssors i~ 11. .. + . + {3 kX j.pROBABILITIES.. I> is the cumulative standard normal distribution function and X I' X 2.toincome rat io (PII raTio): Pr(lieny =I PII ratio) = (N~2 .97. + ~. lndced.• 13k .97{'11 raTio). . preditted probabilities.£BOBIT MODEL. given \'a lues of X l' X 2•. the pre d icted probabi lity for the ini tial va lue of the regressors: (2) com puting the pre d icted probability for the new or changed value of the regrcssors: and (3) tak ing their ditT. 19 + 2.1 Y a nd 2.ln be rea dily concl ud ed from the estimated prohi! regression in E q uaLion (1 1...'1). As an illustration.2 P. X.3 to O. (11..X. the o nly thing that C . Application to the mortgage data .. .) = <t>(flo + ~ . . we fit a probit modcl to the 2380 obse rvations in our data set on m o rtgage denial (deny) and th e pay me nt.• XI. .2. PREDICTED . TIle probit regression model. do not h~\'e simple im erpret at io m. • . + ~ .47) (11.32). The effect of a change in a regressor is computed by ~I) computing.97 are d ifficult to interpret bC c<lU SC thcy affect the probability of denial via the . are regressors. c lc.7) The estimated coefficients of .J ..X.:\Ll~W<r Ihis q uestion . we follow the procedure in Key Concept K. The model is best interpreted by computing predicted probabilities a nd the e ffect of II change in a regressor.!' .4? To .... and then loo king up Ihis zvalue in the normal distribution table (Ap pe ndix Table I). 1: Compute th r: plob.2.
$ ( 1 .)0) 0.0.19 + 2.3 to 0. the values of the codIidents are difficult to interpret hut the sign and significance are not.lhe dfcct of race on the probability o f mortgage denial.rio + 0.:mls on Ihe starting value of X..4) = Ci)( . l:uger than the increase of 6.8) Agl1in.239 .5) = <1>( 0.t 1.7) is (1 )(2.97 x 0. 3%.3.2 percent age poims.0 percentage points.74PI/ ". (0.0.5).00) = 0. black) = cJ)( . the predicted denial probllbilit y is 7.062.2 Probil and logit Regreuion 393 bililY of denial for PII ratio . of applications.4 to 0. Wh al b.ticaliy significant at the I % leve l (the I . That is. y. the difference in denial probabili ties between ihese two hypothetical applic<llll s is 1 5. the effect of a ch a ng~ in X depl.3 it is 23.~ percentage poin ts.5 is 0.239.5. so this is a simple method to apply in pract ice.3 to 0.3 is (D( 2.3) . indie:lIillg that ao Africa n American "pplica nt has a higher prohahility of denial than a white applicant .7Iblack). "The estimated change in the probability of denial is 0.tant their paymenttoincome mtio.5%. Standard errors produced by sllch software can be used in the !>amc way as the ~tandard errors of regression coefficients: for example. and th en compute the difference. 71) "" 0.97 x 0.4 is associated with an incrense in the probabilit y of denial of 6. then the esti mated denial probabil ity based on Equation (1 1.97 x 0.19 + 2.4. if PII ratio = 0. an increase in the paymentlaincome ratio from 0. . including regres sion wit h a hinary dependent v'lriablc.44) (1 [. while for a black applicant with PI.9%. a 95% confidence in tcrval for the .1. The probit coefficients reported here were estimated using the method of muximum likelibood.4. from 9. The probability of denial \\ hen PI! ratio = 0. Thb coefficient is sta ti.~I<lti " tjc on Ma ck j. Thus the change io the predicted probability when PII ralio increases from 0.16) (0. 'Illc maximum likelihood estimator is con sistent .26 + 2. s. e estimate a probi! regressiun witb boTh PIT rario and l>Itu:k as rcgrcssors: = Pr(dellY  IJ PI! ratio..159. which produces efficient (minimum variance) estimalo~ in a wide variet).2.1 59 .097. For a white applicant with Pli ralio = 03. 2 percentage points when the PITratio increases from 0.097 = 0. Beca use the probit regression function is nonlinear. 'rne cocf(icicnt 0 11 black b positive. sta t i~tical Estimation ofthe probit coefficients..5. then for PI! rmio = 0.4 is (1)( 2.7% to 15. so thai (statistics and confide nce interva ' ~ for the coefficients can be constructed in the usual \~a) .0. ratio = 0.lnd llonnaUy distributed in large samples. Regression soft ware for estimating probit models typically uses maximu m likelihood estimation. For exam ple. 159. holding con stant the paymenttoincnmc ratio? To estimate this effect. The probability of uenial when PI! ratio = 0.. hold ing conJ.0"3) (0. or 8. 19 + 2.
2.3.3. t:XCCpl that the cumulal ivc distri bUlion fu nction is different.l he logit coeffic ients are best interpreted by computing pre· dicted probabilities and d iffere nces in predicted probabilities. r Logit regressio n is similar 10 probil regression .3 The po pulation logit model o f the binary dependent variable }' "'ilh mUltiple regressors is 1 ::: 1 + eCAl+.9).. Max imum li kelihood estimalion is discussed ~ ~ further in Section 11.~t ·p. As with probil. Logit regression is su mmarized in Key Concept 1l. true probi! coefficient can be conslJucted as the e ~timated coeffici ent ::!: 1.. which we de note by F.. This is illust rated in fig: ure 11.394 CHAPTfR 11 Regression with a Sioory Dependent Variable LOGIT REGRESSION 11.B. ~ }~ Logit Regression The logit rezression model.. which is given as the final expression in Equat ion (11. The Jogit regression mooel is similar to the pro bit regression mode l. The logistic cumulative distribution function has a specific functional form. estimated by maximum likelihOod .{.· tributio n fun crion . £ ·statistics computed usin g maximum likelih ood estimat ors ca n be used to test joint hypotheses.ich graphs the probit and logit regression functions fo r the depenM n! variable deny and the single regressor Pil TaljO . The maxim um likeli hood estimator is consistent and no nnally distributed in l ar!!~ sa mples. with additional del ails given in A ppendix 11. wh. The coefficients of the logit model can be esti mated by maximum likelih ood. Similarly. defined in te rms of the e xponent ial function.X" p'>.6) is replaced by the cumulative srandard logistic di. The logit and pro bit regression func tion s are simila r.. except that the cumula tive standard normal d istribut ion funccion (}) in Equation (11 .3. so that tstClt istics and confidence intervals for thc coeffici ents can be con structed in the usu a l way.96 stan· dard errors.
0 0. ..074. . black) = F( 4. Given the P/ 1Ratio Thew lagit and probit model. Ihis dis tinctio n is no longer important. .4%. . _ .• .47).1 and 11.".1O) (0.. With the adve nt of morl! efficie nt co mputers.37PI! ratio + (0. The diffe «nces be twee n the two functions are sm all.0 0.2 0.8 percentage points. ~ ~".8 0(.( ·UH 5. in Figu.21XOJJ = 1/ /1 + e2. using the 2380 observati ons in t he data set.11 ...2 03 OA 0. The pred icted de nia l probabilit y o f an African Ame rican applicant with PI! falin = 0. the main motiva tion for logil regression was that Ihe logist ic cum ulative distribution fu nction could he computed faster tn(lD the normal cumu lative distribution function. . or 7..2 _____ u _ _ _ .J1XO. _.52J = 0. yie lds the estimated regression function Vr(deny = l lP II ratio. Application to the Boston HMDA data _ A logit regression of dell)' against PI[ ratio and black .2 . produce nearly identicol esti motes of !he probability that a mortgage application will be denied. 11. _ .3 Probit and logit Models of the Probability of Denial.7 08 P/ I ra n o ~.. h I. .J' _ .3 is 1/[1 + e .27hfack ).'l Probil and lagit Regression 395 f iGURE: 11 . o r 22 .. Hi storically..3 is I / fl + e U5 J = 0. so th e di ffe rence between the two proba bilities is 14. .6 0. "''''~:~ Mortgag~ dffiied .%) 1..15) lbe coeffi cient on black is' positive a nd statistically signiUcant at the 1% level (the tstatistic is 8. _ _ _ _ _ _ _ _ _ • • _ .lion.I 0.2%."._.222.0 IJ.13 + 5.. 04 0. .35) (0.4 J OIl'" 0. . __ u _ u _ Mortgage approved 11. given the poymentkl income ratio.2 1 · l 1. (11.3t1. ( ~ using Ihe sam e 127 obse". .5 0. d eny 14 1. The pred icted denia l probability of a white applicant with PI! ratio = 0.
396 CHA PTlR " Regression with e Binary Dependent V enable Comparing the Linear Probability. Fo r practical purposes thl! two t. in which ca ~ e th e linear probability model still can provide an adeq us it: approx imation . based on Eq uation (1L10). O ne way to choose bdween logit and probi t is to pick the method that i~ easiest to use in your statistical soft ware. 11 . I 1. 7 pe rcent age points. In cont rast. car prolxl bil ity model is casiest to use and to inte rpret. Th aI is. So which shou ld you use in practi(:c? There is 0 0 o nc righ t a nswe r.:~ · si on fun ct ions are a nonlinear ~nc tio n of Ihe codricien!!>. and logitarc just approximati o n~ to the unknown popula tion regression funClion E(Y I X) ..: non linear nature o f the true populatio n regressio n fu nction .1lle only way to know this. 6) ap pear inside the cumulative <. For exa mpl e. th l! estima ted black/white gap from the li near p roba bili tv model is 17.111' dard no rmal tlist ribution function (I>.. but their regrc~ io n cocf f)Cients are more difficult to interpret. in some d ata sets there may be few ext reme va lu ~s of the regressors. Pr( Y .8) the dif(cre ncx' in denial probabili ties between a black upplicant and a white applica nt with Pi l rario = OJ was cstimatcd to be J5.m""Y .2 and 8.9 percentage points./3A' in Eq alion (11.. the unknown coefficients o r those nonlinear reg. or oo". and Logit Models All three mode lslinear probability. E ven so.If the in dependent varia bles hu t arc linear fu ncti ons of the unk nown coe ff ici cnl ~ (" parnmctcrs"). and Ihe logil c~ ci c nts in Equalioll ( J I.R percentage points.: stimates arc very similar. and differe nt researchers use diffe rent model\.3) .r<: s ion funclions can he estimated by O LS. Probit. .. ~(J~. TIle linear pro ba bili ty mode l provides t h~ast sen sible approx imalion to the nonli near population regre!>sion functi on. Probit and logit regressions freq uent ly produce si milar results.'l l ..3 Estimation and Inference in the Logit and Probit Models 2 The' non linea r models studied in Sections 8. the pTll\1i l coeffici ents {Ju' f31"" .\.\').. is 10 estimatl! both a hn ear and nonline ar model und 10 compare their predicted pro babil ities.. whereas the logit estima te of this gap. L n the denial probability regression in Eq ua tio n ( 11.. huger than the probi! and 10gil estimates but slill qU<l litatively sim ilar. Probit and logit regressio ns mode l (his no nlinearity in the probabil ities. w