James H. Stock
HARVARD UNIVERSITY

Mark W. Watson
PRINCETON UNIVERSITY

Boston  San Francisco  New York
London  Toronto  Sydney  Tokyo  Singapore  Madrid
Mexico City  Munich  Paris  Cape Town  Hong Kong  Montreal
Brief Contents

PART ONE  Introduction and Review
CHAPTER 1  Economic Questions and Data  3
CHAPTER 2  Review of Probability  17
CHAPTER 3  Review of Statistics  65

PART TWO  Fundamentals of Regression Analysis  109
CHAPTER 4  Linear Regression with One Regressor  111
CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals  148
CHAPTER 6  Linear Regression with Multiple Regressors  186
CHAPTER 7  Hypothesis Tests and Confidence Intervals in Multiple Regression  220
CHAPTER 8  Nonlinear Regression Functions  254
CHAPTER 9  Assessing Studies Based on Multiple Regression  312

PART THREE  Further Topics in Regression Analysis  347
CHAPTER 10  Regression with Panel Data  349
CHAPTER 11  Regression with a Binary Dependent Variable  383
CHAPTER 12  Instrumental Variables Regression  421
CHAPTER 13  Experiments and Quasi-Experiments  468

PART FOUR  Regression Analysis of Economic Time Series Data  523
CHAPTER 14  Introduction to Time Series Regression and Forecasting  525
CHAPTER 15  Estimation of Dynamic Causal Effects  591
CHAPTER 16  Additional Topics in Time Series Regression  637

PART FIVE  The Econometric Theory of Regression Analysis  675
CHAPTER 17  The Theory of Linear Regression with One Regressor  677
CHAPTER 18  The Theory of Multiple Regression  704
Contents

Preface  xxvii

PART ONE  Introduction and Review

CHAPTER 1  Economic Questions and Data  3
1.1  Economic Questions We Examine  4
  Question #1: Does Reducing Class Size Improve Elementary School Education?  4
  Question #2: Is There Racial Discrimination in the Market for Home Loans?  5
  Question #3: How Much Do Cigarette Taxes Reduce Smoking?  5
  Question #4: What Will the Rate of Inflation Be Next Year?  6
  Quantitative Questions, Quantitative Answers  7
1.2  Causal Effects and Idealized Experiments  8
  Estimation of Causal Effects  8
  Forecasting and Causality  9
1.3  Data: Sources and Types  10
  Experimental versus Observational Data  10
  Cross-Sectional Data  11
  Time Series Data  11
  Panel Data  13
CHAPTER 2  Review of Probability  17
2.1  Random Variables and Probability Distributions  18
  Probabilities, the Sample Space, and Random Variables  18
  Probability Distribution of a Discrete Random Variable  19
  Probability Distribution of a Continuous Random Variable  21
2.2  Expected Values, Mean, and Variance  23
  The Expected Value of a Random Variable  23
  The Standard Deviation and Variance  24
  Mean and Variance of a Linear Function of a Random Variable  25
  Other Measures of the Shape of a Distribution  26
2.3  Two Random Variables  29
  Joint and Marginal Distributions  29
  Conditional Distributions  30
  Independence  34
  Covariance and Correlation  34
  The Mean and Variance of Sums of Random Variables  35
2.4  The Normal, Chi-Squared, Student t, and F Distributions  39
  The Normal Distribution  39
  The Chi-Squared Distribution  43
  The Student t Distribution  44
  The F Distribution  44
2.5  Random Sampling and the Distribution of the Sample Average  45
  Random Sampling  45
  The Sampling Distribution of the Sample Average  46
2.6  Large-Sample Approximations to Sampling Distributions  48
  The Law of Large Numbers and Consistency  49
  The Central Limit Theorem  52
APPENDIX 2.1  Derivation of Results in Key Concept 2.3  63
CHAPTER 3  Review of Statistics  65
3.1  Estimation of the Population Mean  66
  Estimators and Their Properties  67
  Properties of Ȳ  68
  The Importance of Random Sampling  70
3.2  Hypothesis Tests Concerning the Population Mean  71
  Null and Alternative Hypotheses  72
  The p-Value  72
  Calculating the p-Value When σY Is Known  74
  The Sample Variance, Sample Standard Deviation, and Standard Error  75
  Calculating the p-Value When σY Is Unknown  76
  The t-Statistic  77
  Hypothesis Testing with a Prespecified Significance Level  78
  One-Sided Alternatives  80
3.3  Confidence Intervals for the Population Mean  81
3.4  Comparing Means from Different Populations  83
  Hypothesis Tests for the Difference Between Two Means  83
  Confidence Intervals for the Difference Between Two Population Means  84
3.5  Differences-of-Means Estimation of Causal Effects Using Experimental Data  85
  The Causal Effect as a Difference of Conditional Expectations  85
  Estimation of the Causal Effect Using Differences of Means  87
3.6  Using the t-Statistic When the Sample Size Is Small  88
  The t-Statistic and the Student t Distribution  88
  Use of the Student t Distribution in Practice  92
3.7  Scatterplot, the Sample Covariance, and the Sample Correlation  92
  Scatterplots  93
  Sample Covariance and Correlation  94
APPENDIX 3.1  The U.S. Current Population Survey  105
APPENDIX 3.2  Two Proofs That Ȳ Is the Least Squares Estimator of μY  106
APPENDIX 3.3  A Proof That the Sample Variance Is Consistent  107
PART TWO  Fundamentals of Regression Analysis  109

CHAPTER 4  Linear Regression with One Regressor  111
4.1  The Linear Regression Model  112
4.2  Estimating the Coefficients of the Linear Regression Model  116
  The Ordinary Least Squares Estimator  118
  OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio  120
  Why Use the OLS Estimator?  121
4.3  Measures of Fit  123
  The R²  123
  The Standard Error of the Regression  124
  Application to the Test Score Data  125
4.4  The Least Squares Assumptions  126
  Assumption #1: The Conditional Distribution of ui Given Xi Has a Mean of Zero  126
  Assumption #2: (Xi, Yi), i = 1, ..., n Are Independently and Identically Distributed  128
  Assumption #3: Large Outliers Are Unlikely  129
  Use of the Least Squares Assumptions  130
4.5  The Sampling Distribution of the OLS Estimators  131
  The Sampling Distribution of the OLS Estimators  132
4.6  Conclusion  135
APPENDIX 4.1  The California Test Score Data Set  143
APPENDIX 4.2  Derivation of the OLS Estimators  143
APPENDIX 4.3  Sampling Distribution of the OLS Estimator  144
CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals  148
5.1  Testing Hypotheses About One of the Regression Coefficients  149
  Two-Sided Hypotheses Concerning β1  149
  One-Sided Hypotheses Concerning β1  153
  Testing Hypotheses About the Intercept β0  155
5.2  Confidence Intervals for a Regression Coefficient  155
5.3  Regression When X Is a Binary Variable  158
  Interpretation of the Regression Coefficients  158
5.4  Heteroskedasticity and Homoskedasticity  160
  What Are Heteroskedasticity and Homoskedasticity?  160
  Mathematical Implications of Homoskedasticity  163
  What Does This Mean in Practice?  164
5.5  The Theoretical Foundations of Ordinary Least Squares  166
  Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem  167
  Regression Estimators Other Than OLS  168
5.6  Using the t-Statistic in Regression When the Sample Size Is Small  169
  The t-Statistic and the Student t Distribution  170
  Use of the Student t Distribution in Practice  170
5.7  Conclusion  171
APPENDIX 5.1  Formulas for OLS Standard Errors  180
APPENDIX 5.2  The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem  182
CHAPTER 6  Linear Regression with Multiple Regressors  186
6.1  Omitted Variable Bias  186
  Definition of Omitted Variable Bias  187
  A Formula for Omitted Variable Bias  189
  Addressing Omitted Variable Bias by Dividing the Data into Groups  191
6.2  The Multiple Regression Model  193
  The Population Regression Line  193
  The Population Multiple Regression Model  194
6.3  The OLS Estimator in Multiple Regression  196
  The OLS Estimator  197
  Application to Test Scores and the Student-Teacher Ratio  198
6.4  Measures of Fit in Multiple Regression  200
  The Standard Error of the Regression (SER)  200
  The R²  200
  The "Adjusted R²"  201
  Application to Test Scores  202
6.5  The Least Squares Assumptions in Multiple Regression  202
  Assumption #1: The Conditional Distribution of ui Given X1i, X2i, ..., Xki Has a Mean of Zero  203
  Assumption #2: (X1i, X2i, ..., Xki, Yi), i = 1, ..., n Are i.i.d.  203
  Assumption #3: Large Outliers Are Unlikely  203
  Assumption #4: No Perfect Multicollinearity  203
6.6  The Distribution of the OLS Estimators in Multiple Regression  205
6.7  Multicollinearity  206
  Examples of Perfect Multicollinearity  206
  Imperfect Multicollinearity  209
6.8  Conclusion  210
APPENDIX 6.1  Derivation of Equation (6.1)  218
APPENDIX 6.2  Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors  218
CHAPTER 7  Hypothesis Tests and Confidence Intervals in Multiple Regression  220
7.1  Hypothesis Tests and Confidence Intervals for a Single Coefficient  221
  Standard Errors for the OLS Estimators  221
  Hypothesis Tests for a Single Coefficient  221
  Confidence Intervals for a Single Coefficient  223
  Application to Test Scores and the Student-Teacher Ratio  223
7.2  Tests of Joint Hypotheses  225
  Testing Hypotheses on Two or More Coefficients  225
  The F-Statistic  227
  Application to Test Scores and the Student-Teacher Ratio  229
  The Homoskedasticity-Only F-Statistic  230
7.3  Testing Single Restrictions Involving Multiple Coefficients  232
7.4  Confidence Sets for Multiple Coefficients  234
7.5  Model Specification for Multiple Regression  235
  Omitted Variable Bias in Multiple Regression  236
  Model Specification in Theory and in Practice  236
  Interpreting the R² and the Adjusted R² in Practice  237
7.6  Analysis of the Test Score Data Set  239
7.7  Conclusion  244
APPENDIX 7.1  The Bonferroni Test of a Joint Hypothesis  251
CHAPTER 8  Nonlinear Regression Functions  254
8.1  A General Strategy for Modeling Nonlinear Regression Functions  256
  Test Scores and District Income  256
  The Effect on Y of a Change in X in Nonlinear Specifications  260
  A General Approach to Modeling Nonlinearities Using Multiple Regression  264
8.2  Nonlinear Functions of a Single Independent Variable  264
  Polynomials  265
  Logarithms  267
  Polynomial and Logarithmic Models of Test Scores and District Income  275
8.3  Interactions Between Independent Variables  277
  Interactions Between Two Binary Variables  277
  Interactions Between a Continuous and a Binary Variable  280
  Interactions Between Two Continuous Variables  286
8.4  Nonlinear Effects on Test Scores of the Student-Teacher Ratio  290
  Discussion of Regression Results  291
  Summary of Findings  295
8.5  Conclusion  296
APPENDIX 8.1  Regression Functions That Are Nonlinear in the Parameters  307
CHAPTER 9  Assessing Studies Based on Multiple Regression  312
9.1  Internal and External Validity  313
  Threats to Internal Validity  313
  Threats to External Validity  314
9.2  Threats to Internal Validity of Multiple Regression Analysis  316
  Omitted Variable Bias  316
  Misspecification of the Functional Form of the Regression Function  319
  Errors-in-Variables  319
  Sample Selection  322
  Simultaneous Causality  324
  Sources of Inconsistency of OLS Standard Errors  325
9.3  Internal and External Validity When the Regression Is Used for Forecasting  327
  Using Regression Models for Forecasting  327
  Assessing the Validity of Regression Models for Forecasting  328
9.4  Example: Test Scores and Class Size  329
  External Validity  329
  Internal Validity  336
  Discussion and Implications  337
9.5  Conclusion  338
APPENDIX 9.1  The Massachusetts Elementary School Testing Data  344
PART THREE  Further Topics in Regression Analysis  347

CHAPTER 10  Regression with Panel Data  349
10.1  Panel Data  350
  Example: Traffic Deaths and Alcohol Taxes  351
10.2  Panel Data with Two Time Periods: "Before and After" Comparisons  353
10.3  Fixed Effects Regression  356
  The Fixed Effects Regression Model  356
  Estimation and Inference  359
  Application to Traffic Deaths  360
10.4  Regression with Time Fixed Effects  361
  Time Effects Only  361
  Both Entity and Time Fixed Effects  362
10.5  The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression  364
  The Fixed Effects Regression Assumptions  364
  Standard Errors for Fixed Effects Regression  366
10.6  Drunk Driving Laws and Traffic Deaths  367
10.7  Conclusion  371
APPENDIX 10.1  The State Traffic Fatality Data Set  378
APPENDIX 10.2  Standard Errors for Fixed Effects Regression with Serially Correlated Errors  379
CHAPTER 11  Regression with a Binary Dependent Variable  383
11.1  Binary Dependent Variables and the Linear Probability Model  384
  Binary Dependent Variables  385
  The Linear Probability Model  387
11.2  Probit and Logit Regression  389
  Probit Regression  389
  Logit Regression  394
  Comparing the Linear Probability, Probit, and Logit Models  396
11.3  Estimation and Inference in the Logit and Probit Models  396
  Nonlinear Least Squares Estimation  397
  Maximum Likelihood Estimation  398
  Measures of Fit  399
11.4  Application to the Boston HMDA Data  400
11.5  Summary  407
APPENDIX 11.1  The Boston HMDA Data Set  415
APPENDIX 11.2  Maximum Likelihood Estimation  415
APPENDIX 11.3  Other Limited Dependent Variable Models  418
CHAPTER 12  Instrumental Variables Regression  421
12.1  The IV Estimator with a Single Regressor and a Single Instrument  422
  The IV Model and Assumptions  422
  The Two Stage Least Squares Estimator  423
  Why Does IV Regression Work?  424
  The Sampling Distribution of the TSLS Estimator  428
  Application to the Demand for Cigarettes  430
12.2  The General IV Regression Model  432
  TSLS in the General IV Model  433
  Instrument Relevance and Exogeneity in the General IV Model  434
  The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator  434
  Inference Using the TSLS Estimator  437
  Application to the Demand for Cigarettes  437
12.3  Checking Instrument Validity  439
  Assumption #1: Instrument Relevance  439
  Assumption #2: Instrument Exogeneity  443
12.4  Application to the Demand for Cigarettes  445
12.5  Where Do Valid Instruments Come From?  450
  Three Examples  451
12.6  Conclusion  455
APPENDIX 12.1  The Cigarette Consumption Panel Data Set  462
APPENDIX 12.2  Derivation of the Formula for the TSLS Estimator in Equation (12.4)  462
APPENDIX 12.3  Large-Sample Distribution of the TSLS Estimator  463
APPENDIX 12.4  Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid  464
APPENDIX 12.5  Instrumental Variables Analysis with Weak Instruments  466
CHAPTER 13  Experiments and Quasi-Experiments  468
13.1  Idealized Experiments and Causal Effects  470
  Ideal Randomized Controlled Experiments  470
  The Differences Estimator  471
13.2  Potential Problems with Experiments in Practice  472
  Threats to Internal Validity  472
  Threats to External Validity  475
13.3  Regression Estimators of Causal Effects Using Experimental Data  477
  The Differences Estimator with Additional Regressors  477
  The Differences-in-Differences Estimator  480
  Estimation of Causal Effects for Different Groups  484
  Estimation When There Is Partial Compliance  484
  Testing for Randomization  485
13.4  Experimental Estimates of the Effect of Class Size Reductions  486
  Experimental Design  486
  Analysis of the STAR Data  487
  Comparison of the Observational and Experimental Estimates of Class Size Effects  492
13.5  Quasi-Experiments  494
  Examples  495
  Econometric Methods for Analyzing Quasi-Experiments  497
13.6  Potential Problems with Quasi-Experiments  500
  Threats to Internal Validity  500
  Threats to External Validity  502
13.7  Experimental and Quasi-Experimental Estimates in Heterogeneous Populations  502
  Population Heterogeneity: Whose Causal Effect?  502
  OLS with Heterogeneous Causal Effects  503
  IV Regression with Heterogeneous Causal Effects  50
13.8  Conclusion  507
APPENDIX 13.1  The Project STAR Data Set  516
APPENDIX 13.2  Extension of the Differences-in-Differences Estimator to Multiple Time Periods  517
APPENDIX 13.3  Conditional Mean Independence  518
APPENDIX 13.4  IV Estimation When the Causal Effect Varies Across Individuals  520
PART FOUR  Regression Analysis of Economic Time Series Data  523

CHAPTER 14  Introduction to Time Series Regression and Forecasting  525
14.1  Using Regression Models for Forecasting  527
14.2  Introduction to Time Series Data and Serial Correlation  528
  The Rates of Inflation and Unemployment in the United States  528
  Lags, First Differences, Logarithms, and Growth Rates  528
  Autocorrelation  532
  Other Examples of Economic Time Series  533
14.3  Autoregressions  535
  The First Order Autoregressive Model  535
  The pth Order Autoregressive Model  538
14.4  Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model  541
  Forecasting Changes in the Inflation Rate Using Past Unemployment Rates  541
  Stationarity  544
  Time Series Regression with Multiple Predictors  545
  Forecast Uncertainty and Forecast Intervals  548
14.5  Lag Length Selection Using Information Criteria  549
  Determining the Order of an Autoregression  551
  Lag Length Selection in Time Series Regression with Multiple Predictors  553
14.6  Nonstationarity I: Trends  554
  What Is a Trend?  555
  Problems Caused by Stochastic Trends  557
  Detecting Stochastic Trends: Testing for a Unit AR Root  560
  Avoiding the Problems Caused by Stochastic Trends  564
14.7  Nonstationarity II: Breaks  565
  What Is a Break?  565
  Testing for Breaks  566
  Pseudo Out-of-Sample Forecasting  571
  Avoiding the Problems Caused by Breaks  576
14.8  Conclusion  577
APPENDIX 14.1  Time Series Data Used in Chapter 14  586
APPENDIX 14.2  Stationarity in the AR(1) Model  586
APPENDIX 14.3  Lag Operator Notation  588
APPENDIX 14.4  ARMA Models  589
APPENDIX 14.5  Consistency of the BIC Lag Length Estimator  589
CHAPTER 15  Estimation of Dynamic Causal Effects  591
15.1  An Initial Taste of the Orange Juice Data  593
15.2  Dynamic Causal Effects  595
  Causal Effects and Time Series Data  596
  Two Types of Exogeneity  598
15.3  Estimation of Dynamic Causal Effects with Exogenous Regressors  600
  The Distributed Lag Model Assumptions  601
  Autocorrelated ut, Standard Errors, and Inference  601
  Dynamic Multipliers and Cumulative Dynamic Multipliers  602
15.4  Heteroskedasticity- and Autocorrelation-Consistent Standard Errors  604
  Distribution of the OLS Estimator with Autocorrelated Errors  604
  HAC Standard Errors  606
15.5  Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors  608
  The Distributed Lag Model with AR(1) Errors  609
  OLS Estimation of the ADL Model  612
  GLS Estimation  613
  The Distributed Lag Model with Additional Lags and AR(p) Errors  615
15.6  Orange Juice Prices and Cold Weather  618
15.7  Is Exogeneity Plausible? Some Examples  624
  U.S. Income and Australian Exports  625
  Oil Prices and Inflation  626
  Monetary Policy and Inflation  626
  The Phillips Curve  627
15.8  Conclusion  627
APPENDIX 15.1  The Orange Juice Data Set  634
APPENDIX 15.2  The ADL Model and Generalized Least Squares in Lag Operator Notation  634
CHAPTER 16  Additional Topics in Time Series Regression  637
16.1  Vector Autoregressions  638
  The VAR Model  638
  A VAR Model of the Rates of Inflation and Unemployment  641
16.2  Multiperiod Forecasts  642
  Iterated Multiperiod Forecasts  643
  Direct Multiperiod Forecasts  645
  Which Method Should You Use?  647
16.3  Orders of Integration and the DF-GLS Unit Root Test  648
  Other Models of Trends and Orders of Integration  648
  The DF-GLS Test for a Unit Root  650
  Why Do Unit Root Tests Have Nonnormal Distributions?  653
16.4  Cointegration  655
  Cointegration and Error Correction  655
  How Can You Tell Whether Two Variables Are Cointegrated?  658
  Estimation of Cointegrating Coefficients  660
  Extension to Multiple Cointegrated Variables  661
  Application to Interest Rates  662
16.5  Volatility Clustering and Autoregressive Conditional Heteroskedasticity  664
  Volatility Clustering  665
  Autoregressive Conditional Heteroskedasticity  666
  Application to Stock Price Volatility  667
16.6  Conclusion  669
APPENDIX 16.1  U.S. Financial Data Used in Chapter 16  674
PART FIVE  The Econometric Theory of Regression Analysis  675

CHAPTER 17  The Theory of Linear Regression with One Regressor  677
17.1  The Extended Least Squares Assumptions and the OLS Estimator  678
  The Extended Least Squares Assumptions  678
  The OLS Estimator  680
17.2  Fundamentals of Asymptotic Distribution Theory  680
  Convergence in Probability and the Law of Large Numbers  681
  The Central Limit Theorem and Convergence in Distribution  683
  Slutsky's Theorem and the Continuous Mapping Theorem  685
  Application to the t-Statistic Based on the Sample Mean  685
17.3  Asymptotic Distribution of the OLS Estimator and t-Statistic  686
  Consistency and Asymptotic Normality of the OLS Estimators  686
  Consistency of Heteroskedasticity-Robust Standard Errors  686
  Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic  688
17.4  Exact Sampling Distributions When the Errors Are Normally Distributed  688
  Distribution of β̂1 with Normal Errors  688
  Distribution of the Homoskedasticity-only t-Statistic  690
17.5  Weighted Least Squares  691
  WLS with Known Heteroskedasticity  691
  WLS with Heteroskedasticity of Known Functional Form  692
  Heteroskedasticity-Robust Standard Errors or WLS?  695
APPENDIX 17.1  The Normal and Related Distributions and Moments of Continuous Random Variables  700
APPENDIX 17.2  Two Inequalities  702
CHAPTER 18  The Theory of Multiple Regression  704
18.1  The Linear Multiple Regression Model and OLS Estimator in Matrix Form  706
  The Multiple Regression Model in Matrix Notation  706
  The Extended Least Squares Assumptions  707
  The OLS Estimator  708
18.2  Asymptotic Distribution of the OLS Estimator and t-Statistic  710
  The Multivariate Central Limit Theorem  710
  Asymptotic Normality of β̂  710
  Heteroskedasticity-Robust Standard Errors  711
  Confidence Intervals for Predicted Effects  712
  Asymptotic Distribution of the t-Statistic  713
18.3  Tests of Joint Hypotheses  713
  Joint Hypotheses in Matrix Notation  713
  Asymptotic Distribution of the F-Statistic  714
  Confidence Sets for Multiple Coefficients  714
18.4  Distribution of Regression Statistics with Normal Errors  715
  Matrix Representations of OLS Regression Statistics  715
  Distribution of β̂ with Normal Errors  716
  Distribution of s²û  717
  Homoskedasticity-Only Standard Errors  717
  Distribution of the t-Statistic  718
  Distribution of the F-Statistic  718
18.5  Efficiency of the OLS Estimator with Homoskedastic Errors  719
  The Gauss-Markov Conditions for Multiple Regression  719
  Linear Conditionally Unbiased Estimators  719
  The Gauss-Markov Theorem for Multiple Regression  720
18.6  Generalized Least Squares  721
  The GLS Assumptions  722
  GLS When Ω Is Known  724
  GLS When Ω Contains Unknown Parameters  725
  The Zero Conditional Mean Assumption and GLS  725
18.7  Instrumental Variables and Generalized Method of Moments Estimation  727
  The IV Estimator in Matrix Form  728
  Asymptotic Distribution of the TSLS Estimator  729
  Properties of TSLS When the Errors Are Homoskedastic  730
  Generalized Method of Moments Estimation in Linear Models  733
APPENDIX 18.1  Summary of Matrix Algebra  743
APPENDIX 18.2  Multivariate Distributions  747
APPENDIX 18.3  Derivation of the Asymptotic Distribution of β̂  748
APPENDIX 18.4  Derivations of Exact Distributions of OLS Test Statistics with Normal Errors  749
APPENDIX 18.5  Proof of the Gauss-Markov Theorem for Multiple Regression  751
APPENDIX 18.6  Proof of Selected Results for IV and GMM Estimation  752
Appendix  755
References  763
Answers to "Review the Concepts" Questions  767
Glossary  775
Index  783
Key Concepts

PART ONE  Introduction and Review
1.1  Cross-Sectional, Time Series, and Panel Data  15
2.1  Expected Value and the Mean  24
2.2  Variance and Standard Deviation  25
2.3  Means, Variances, and Covariances of Sums of Random Variables  38
2.4  Computing Probabilities Involving Normal Random Variables  40
2.5  Simple Random Sampling and i.i.d. Random Variables  47
2.6  Convergence in Probability, Consistency, and the Law of Large Numbers  50
2.7  The Central Limit Theorem  55
3.1  Estimators and Estimates  67
3.2  Bias, Consistency, and Efficiency  68
3.3  Efficiency of Ȳ: Ȳ Is BLUE  70
3.4  The Standard Error of Ȳ  76
3.5  The Terminology of Hypothesis Testing  79
3.6  Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) ≠ μY,0  80
3.7  Confidence Intervals for the Population Mean  82
PART TWO  Fundamentals of Regression Analysis  109
4.1  Terminology for the Linear Regression Model with a Single Regressor  115
4.2  The OLS Estimator, Predicted Values, and Residuals  119
4.3  The Least Squares Assumptions  131
4.4  Large-Sample Distributions of β̂0 and β̂1  133
5.1  General Form of the t-Statistic  150
5.2  Testing the Hypothesis β1 = β1,0 Against the Alternative β1 ≠ β1,0  152
5.3  Confidence Interval for β1  157
5.4  Heteroskedasticity and Homoskedasticity  162
5.5  The Gauss-Markov Theorem for β̂1  168
6.1  Omitted Variable Bias in Regression with a Single Regressor  189
6.2  The Multiple Regression Model  196
6.3  The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model  198
6.4  The Least Squares Assumptions in the Multiple Regression Model  204
6.5  Large-Sample Distribution of β̂0, β̂1, ..., β̂k  206
7.1  Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0  222
7.2  Confidence Intervals for a Single Coefficient in Multiple Regression  223
7.3  Omitted Variable Bias in Multiple Regression  237
7.4  R² and R̄²: What They Tell You and What They Don't  238
8.1  The Expected Effect on Y of a Change in X1 in the Nonlinear Regression Model (8.3)  261
8.2  Logarithms in Regression: Three Cases  273
8.3  A Method for Interpreting Coefficients in Regressions with Binary Variables  279
8.4  Interactions Between Binary and Continuous Variables  282
8.5  Interactions in Multiple Regression  287
9.1  Internal and External Validity  313
9.2  Omitted Variable Bias: Should I Include More Variables in My Regression?  318
9.3  Functional Form Misspecification  319
9.4  Errors-in-Variables Bias  321
9.5  Sample Selection Bias  323
9.6  Simultaneous Causality Bias  325
9.7  Threats to the Internal Validity of a Multiple Regression Study  327
PART THREE  Further Topics in Regression Analysis  347
10.1  Notation for Panel Data  350
10.2  The Fixed Effects Regression Model  359
10.3  The Fixed Effects Regression Assumptions  365
11.1  The Linear Probability Model  388
11.2  The Probit Model, Predicted Probabilities, and Estimated Effects  392
11.3  Logit Regression  394
12.1  The General Instrumental Variables Regression Model and Terminology  433
12.2  Two Stage Least Squares  435
12.3  The Two Conditions for Valid Instruments  436
12.4  The IV Regression Assumptions  437
12.5  A Rule of Thumb for Checking for Weak Instruments  441
12.6  The Overidentifying Restrictions Test (the J-Statistic)  444
PART FOUR  Regression Analysis of Economic Time Series Data  523
14.1  Lags, First Differences, Logarithms, and Growth Rates  530
14.2  Autocorrelation (Serial Correlation) and Autocovariance  532
14.3  Autoregressions  539
14.4  Stationarity  544
14.5  The Autoregressive Distributed Lag Model  545
14.6  Time Series Regression with Multiple Predictors  546
14.7  Granger Causality Tests (Tests of Predictive Content)  547
14.8  The Augmented Dickey-Fuller Test for a Unit Autoregressive Root  562
14.9  The QLR Test for Coefficient Stability  569
14.10  Pseudo Out-of-Sample Forecasts  572
15.1  The Distributed Lag Model and Exogeneity  600
15.2  The Distributed Lag Model Assumptions  602
15.3  HAC Standard Errors  609
15.4  Estimation of Dynamic Multipliers Under Strict Exogeneity  617
16.1  Vector Autoregressions  639
16.2  Iterated Multiperiod Forecasts  645
16.3  Direct Multiperiod Forecasts  647
16.4  Orders of Integration, Differencing, and Stationarity  650
16.5  Cointegration  658
PART FIVE  The Econometric Theory of Regression Analysis  675
17.1  The Extended Least Squares Assumptions for Regression with a Single Regressor  680
18.1  The Extended Least Squares Assumptions in the Multiple Regression Model  707
18.2  The Multivariate Central Limit Theorem  710
18.3  Gauss-Markov Theorem for Multiple Regression  721
18.4  The GLS Assumptions  723
General Interest Boxes

The Distribution of Earnings in the United States in 2004  35
A Bad Day on Wall Street  42
Landon Wins!  71
The Gender Gap of Earnings of College Graduates in the United States  86
A Novel Way to Boost Retirement Savings  90
The "Beta" of a Stock  122
The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity?  165
The Mozart Effect: Omitted Variable Bias?  190
The Returns to Education and the Gender Gap  284
The Demand for Economics Journals  288
Do Stock Mutual Funds Outperform the Market?  323
James Heckman and Daniel McFadden, Nobel Laureates  407
Who Invented Instrumental Variable Regression?  425
A Scary Regression  426
The Externalities of Smoking  446
The Hawthorne Effect  474
What Is the Effect on Employment of the Minimum Wage?  498
Can You Beat the Market? Part I  540
The River of Blood  550
Can You Beat the Market? Part II  573
NEWS FLASH: Commodity Traders Send Shivers Through Disney World  625
Robert Engle and Clive Granger, Nobel Laureates  657
Preface
E
conometrics ca n be a fun course for both teacher a nd student. Th e real world of economics, business, and government is a complicated and messy place. full of competing ideas and quest ions tha t dem and answe rs. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can yo u make money in the stock market by buying when prices are historically low, relative to earnings. or should yo u just sit tight as the ran dom walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes. or should we simply have our children listen to Mozart for tcn min utes a day? Econometrics helps us to sort o ut sound ide as from crazy ones and to find qu an titative answers to important quantitative questions. E conometrics opens a wi n dow on our complicated world that lets us see the relationships on which people. businesses, and governments base their decisions. This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. TIlis simple principle represents a significant departure from the older generation of econ ometrics books, in which theoretical models and ass um ptions do not match the app lications. It is no wonder tha t some students question the rel evance of econometrics after they spend much of their time learni ng assumptions that th ey subsequently realize are unrealistic, so that they must then learn "sol u tions" to " problems" that arise when the applications do not match the assump t ions. We believe that it is far better to motivate the need for tools with a concrete application, and thell to provide a few simple assumptions that match the appli catio n. Because the the ory is immedia tely relevant to tbe applica tions, this approach can make econometrics come alive. 
The second edition benefits from the many constructive suggestions of teachers who used the first edition, while maintaining the philosophy that applications should drive the theory, not the other way around. The single greatest change in the second edition is a reorganization and expansion of the material on core regression analysis: Part II, which covers regression with cross-sectional data, has been expanded from four chapters to six. We have added new empirical examples (as boxes) drawn from economics and finance; some new optional sections on
classical regression theory; and many new exercises, both paper-and-pencil and computer-based empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be found on page xxxii.
Features of This Book
This textbook differs from others in three main ways. First, we integrate real-world questions and data into the development of the theory, and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of topics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.
Real-World Questions and Data
We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression, and functional form analysis in the context of estimating the effect of school inputs on school outputs. (Do smaller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.

We treat all our empirical applications seriously and in a way that shows students how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications and thereby to assess whether their substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook's companion Web site (www.aw-bc.com/stock_watson).
Contemporary Choice of Topics
Econometrics has come a long way in the past two decades. The topics we cover reflect the best of contemporary applied econometrics. One can only do so much in an introductory course, so we focus on procedures and tests that are commonly used in practice. For example:
• Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument, exogeneity and relevance, are given equal billing. We follow that presentation with an extended discussion of where instruments come from, and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.
• Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.
• Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.
• Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.
Theory That Matches Applications
Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and limitations of those tools. We provide a modern treatment in which the fit between theory and applications is as tight as possible, while keeping the mathematics at a level that requires only algebra.

Modern empirical applications share some common characteristics: the data sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are collected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic). These observations lead to important differences between the theoretical development in this textbook and other textbooks.
• Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. Our experience is that it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.

• Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data; it extends readily to panel and time series data; and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.

• Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a "problem" to be "solved"; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.
Skilled Producers, Sophisticated Consumers
We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis, but also how to assess the validity of empirical analyses presented to them.

Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity, and ways to recognize these threats in practice.

Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity of the analyses presented in the book.

Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason, the textbook Web site features data sets, software, and suggestions for empirical exercises of differing scopes. These Web resources have been expanded considerably for the second edition.
Approach to Mathematics and Level of Rigor
Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a "high" or a "low" level of mathematics. Parts I through IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I through IV have fewer equations, and more applications, than many introductory econometrics books, and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more sophisticated treatment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students.

This said, different students learn differently, and for the mathematically well-prepared students, learning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that is appropriate for students with a stronger mathematical background. We believe that, when the mathematical chapters in Part V are used in conjunction with the material in Parts I through IV, this book is suitable for advanced undergraduate or master's level econometrics courses.
Changes to the Second Edition
The changes introduced in the second edition fall into three categories: more empirical examples; expanded theoretical material, especially in the treatment of the core regression topics; and additional student exercises.
More empirical examples. The second edition retains the empirical examples from the first edition and adds a significant number of new ones. These additional examples include estimation of the returns to education; inference about the gender gap in earnings; the difficulty of forecasting the stock market; and modeling the volatility clustering in stock returns. The data sets for these empirical examples are posted on the course Web site. The second edition also includes more general-interest boxes, for example, how sample selection bias ("survivorship bias") can produce misleading conclusions about whether actively managed mutual funds actually beat the market.

Expanded theoretical material. The philosophy of this and the previous edition is that the modeling assumptions should be motivated by empirical applications. For this reason, our three basic least squares assumptions that underpin regression with a single regressor include neither normality nor homoskedasticity, both of which are arguably the exception in econometric applications. This leads directly to large-sample inference using heteroskedasticity-robust standard errors. Our experience is that students do not find this difficult; in fact, what they find difficult is the traditional approach of introducing the homoskedasticity and normality assumptions, learning how to use t and F tables, then being told that what they just learned is not reliable in applications because of the failure of these assumptions and that these "problems" must be "fixed." But not all instructors share this view, and some find it useful to introduce the homoskedastic normal regression model. Moreover, even if homoskedasticity is the exception instead of the rule, assuming homoskedasticity permits discussing the Gauss-Markov theorem, a key motivation for using ordinary least squares (OLS). For these reasons,
the treatment of the core regression material has been significantly expanded in the second edition, and now includes sections on the theoretical motivation for OLS (the Gauss-Markov theorem), small-sample inference in the homoskedastic normal model, and multicollinearity and the dummy variable trap. To accommodate these new sections, the new empirical examples, the new general-interest boxes, and the many new exercises, the core regression chapters have been expanded from two to four: the linear regression model with a single regressor and OLS (Chapter 4); inference in regression with a single regressor (Chapter 5); the multiple regression model and OLS (Chapter 6); and inference in the multiple regression model (Chapter 7). This expanded and reorganized treatment of the core regression material constitutes the single greatest change in the second edition.

The second edition also includes some additional topics requested by some instructors. One such addition is specification and estimation of models that are nonlinear in the parameters (Appendix 8.1). Another is how to compute standard errors in panel data regression when the error term is serially correlated for a given entity (clustered standard errors; Section 10.5 and Appendix 10.2). A third addition is an introduction to current best practices for detecting and handling weak instruments (Appendix 12.5), and a fourth addition is a treatment, in a new final section of the last chapter (Section 18.7), of efficient estimation in the heteroskedastic linear IV regression model using generalized method of moments.
Additional student exercises. The second edition contains many new exercises, both "paper-and-pencil" and empirical exercises that involve the use of databases, supplied on the course Web site, and regression software. The data section of the course Web site has been significantly enhanced by the addition of numerous databases.
Contents and Organization
There are five parts to the textbook. This textbook assumes that the student has had a course in probability and statistics, although we review that material in Part I. We cover the core material of regression analysis in Part II. Parts III, IV, and V present additional topics that build on the core treatment in Part II.
Part I
Chapter 1 introduces econometrics and stresses the importance of providing quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in econometrics. Material from probability and statistics is reviewed in Chapters 2 and 3, respectively; whether these chapters are taught in a given course, or simply provided as a reference, depends on the background of the students.
Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.
Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as "program evaluation."
Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions, such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.
Part V
Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text.
TABLE 1 Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

[Table 1 indicates, for each of Chapters 10 through 18, which of the following prerequisites apply: Part I; Chapters 4-7 and 9; Chapter 8; Sections 10.1-10.2; Sections 12.1-12.2; Sections 14.1-14.4; Sections 14.5-14.8; Chapter 15; and Chapter 17.]

This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1-14.4.

a. Chapters 10-16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.

b. Chapters 14-16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.
Prerequisites Within the Book
Because different instructors like to emphasize different material, we wrote this book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV, and V are "stand-alone" in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table 1. Although we have found that the sequence of topics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instructors to present topics in a different order if they so desire.
Sample Courses
This book accommodates several different course structures.
Standard Introductory Econometrics
This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional form analysis, and the evaluation of regression studies (all of Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable (Chapter 11), and/or instrumental variables regression (Chapter 12), as time permits. The course concludes with experiments and quasi-experiments in Chapter 13, topics that provide an opportunity to return to the questions of estimating causal effects raised at the beginning of the semester and to recapitulate core regression methods. Prerequisites: Algebra II and introductory statistics.
Introductory Econometrics with Time Series and Forecasting Applications
Like the standard introductory course, this course covers all of Part I (as needed) and all of Part II. Optionally, the course next provides a brief introduction to panel data (Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If time permits, the course can include some advanced topics in time series analysis such as volatility clustering and conditional heteroskedasticity (Section 16.5). Prerequisites: Algebra II and introductory statistics.
Applied Time Series Analysis and Forecasting
This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent reviewing the tools of basic regression analysis in Part II, depending on student preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced topics in time series analysis (Chapter 16), including vector autoregressions and conditional heteroskedasticity. An important component of this course is hands-on forecasting exercises, available to instructors on the book's accompanying Web site. Prerequisites: Algebra II and basic introductory econometrics or the equivalent.
Introduction to Econometric Theory
This book is also suitable for an advanced undergraduate course in which the students have a strong mathematical preparation, or for a master's level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through Section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and generalized method of moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), and/or the estimation of causal effects using time series data and generalized least squares (Chapter 15 and Section 18.6). Prerequisites: Calculus and introductory statistics. Chapter 18 assumes previous exposure to matrix algebra.
Pedagogical Features
The textbook has a variety of pedagogical features aimed at helping students to understand, to retain, and to apply the essential ideas. Chapter introductions provide a real-world grounding and motivation, as well as a brief road map highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General-interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A numbered Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage. The questions in the Review the Concepts section check students' understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow the students to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the References section lists sources for further reading, the Appendix provides statistical tables, and a Glossary conveniently defines all the key terms in the book.
Supplements to Accompany the Textbook
The on li ne suppleme nts accompanying the Second E d ition of Introduction to Econom etrics inclu de the Solutions Manual, Test Bank (by Manfred W. Keil of Claremont McKenn a College), and PowerPoint Lecture Notes with text figures.
B ruce Hansen (Uni versi ty of Wisconsin . and an Excel addin fo r OLS regressions. San Di ego). Eli Tamer taught from an early draft and also provided belpful commen ts o n the penultimate draft of the book . anti Christopher Sims (Prince ton ). OUI Instructor's Resource Disk. Gregory . cont ains the PowerPoint Lectu re Notes. FinaUy. Mad i son) and Bo Honore (Princeton) provided helpful fee dback on very ~arly o ut lines and preliminary versions of the core materia l in Part n. Graham Elliott (University of Calirornia. EViews amI STATA tutorials for students. she also helped to vet much of the material in this book while it was being developed for a requ ired course for maste r's studen ts at the Kenn edy School. These include data se ts [or all the text examples.comlstoclcwatson. provides a wide range of addi ti o nal resource s fo r students and faculty. Acknowledgments A great many people contributed to the first edition of this book. These resources are avaiJ able fo r download from th e Inst ructor's Resource Center at www. We are also indebted to two o the r Kennedy School colleagu es. Andrew Harvey (Cambridge Un iversity). for their patient explanations of quasiexperiments and the fi eld of program eval uation and for th eir deta iled comments on early drafts of the text. Our presentation of the mate rial on tim e series has benefited fro m di scussions with Yacin e Ai t Sal181ia (Princeton). offered in Test Ge nerator Software (Tesl Gen with Q uizMaster) . data se ts for the endofchapter Empirical Exercises. found at www. a Compa nion Web site. A lberto Abadie and Sue Dynarski. Lf instructors prefer their su pplements on a CDROM. and Key Concepts.3\Vbc. provides a rich supply of easily edited test problems and questions of various types to mee t specific course needs. In addjtio n. Suzanne Cooper provided invalu abl e suggestio ns and detailed comm ents on mul tip le drafts. avail able for Windows and Maci ntosh. 
many people maJe helpful suggestions on parts of the manuscript close to their area of expertise: Don Andrews (Yale).xxxviii PREFACE tables.com/irc. John Bound (Un iversity of Michigan). an d the Solut ions Manual. while the Test Bank. As a coteacher with one of the authors (Stock). Berkeley) provided thoughtful sugges tions abou t o ur treatmen t of mater ials on program evaluation. A t H arvard 's Ken nedy School of Government. rep licat ion fil es for empirical resul ts reported in the text. Joshua A ngris t (MlT) and Guido Imbens (U nivcrsi ty of Califo rnia. Our biggest debts of gra tit ude arc to our colleagues at Har va rd and Prin ceton who used early drafts of this book jn tJleir classrooms. the Test Bank. We also owe much to many of our frie nds and colleagues in econometrics who spent time talking with us about the su bstance of this book and who collectively made so ma ny helpfu l suggestions.awbc. A t Princeton. The Solutions Manua l includes solutions to alllhe end ofchapter exercises.
Student Assessment Services. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales. Many people were very generous in provicling us with data. G reensboro) graciously pro vided us with his data set on drunk driving la ws and traffic fa talities. Eric Ha nushe k (the Hoover Ins titu tion). Mi Iwa ukee Ch ristopher F.). U niversity of New Mexico ChiYoung Choi. Bauru . Massachusetts Departmen t of E ducation. Corp. David Neumark (Michigan State U niversity).Alan Krueger (Princeton). detailed. Kerry Griffin and Yair Listokin read the entire manu script. and Andrew Fraker. and Malt Watson worked through several chapters. Pie rre Perron (B os ton University). an d Lynn Brown e for explaining its policy context. Blackburn. U niversi ty of Delaware C lopper Almon. we particularly thank Geoffrey TooteU for providing us with the updated version of the data set we use in Chapter 9. David Dm kker (Stata . Massachusetts Institute of Technology Swarnjit S. which we analyze in Chapter 10. Caroline Hoxby (H ar vard). A lessand ro Tarozzi. Jean Baldwin G rossman (Princeton). Steven Levitt (University of Chicago).PREFAC E xxxix Chow (Princeton). Kenneth Warne r (U niversi ty of Michigan). Universi ty of Chicago Douglas Daienberg. and Richard Zeckhauser (Harvard). The research department at the Federal Reserve Bank of Boston deserves thanks for putting together their data on racial discrimination in mortgage lending. and AJa n Krueger (Princeton) for his hel p with th e Te nnessee STAR data that we analyze in Chapter 11. University of Maryland. and thoughtfu l com ments we received from those who reviewed various drafts for AddisonWesley: We t hank se veral people for carefu lly checking th e page proof for errors. Han Hong (Princeton). Un iversity of South Carolina A lok Bohara. We are also grateful for the many constructive. Agnello. Thomas Downes (Tufts). Richa rd Light (Harvard ). California Department of Education. 
for his help with aspects of the Massachusetts test score data set. Boston College McKinley L. Hong Li. The California test score data were constructed with the assista nce of Les Axelrod of the Standards and Assessments Division. University of New Hampshire Dennis Coates. Queen's University. University of WiscQnsin. We are grateful to Charlie DePascale. A rora . Ori Heffetz. Amber Henry. Canada Richa rd 1. Baltimore County Tim Conley. Ch ristopher Ruhm (U niversity o f North Carolina. Joseph Newhouse (Harvard) . University of Maryland Joshu a Angrist. University f Montana . G raduate School of Business. Michael Abbott. James Heckm an (Unive rsity of Chicago).
James Madison University D avid Eaton. Sh ippensburg University Tung Liu. University of San Francisco Edward 1. Truman State University Christopher Taber. F leissig. Vanderbilt University Mico Mrkaic. Berkeley john Xu Z heng. Ithaca College Manfred W. University of Pennsylvania Leslie S. Murray State U niversity A drian R. Ball State University Ken Matwiczak. Naci Mocan. The George Washington University Elia Kacapyr. Austin KimMarie McGoldrick. Syracuse U niversity Pierre Perron.xl PREFACE Antony D avies. LBJ School of Public Affairs. Middlebury College Zhenhui Xu. State University of New York . United States Naval A cademy Bruce E. University of Wisconsin. Georgia College and State University Yong Yin. Virginia Commonwealth U niversity Jane Sung. U niversity of Texas. UniverSity of Western Ontario Phanindra V. Keil. Northwestern University Petra Todd. Madison Peter Reinhard Hansen. California State University. University of Arizona Oscar Jorda. U niversity of California. Davis Frederick L. Claremont McKenna College Eugene Kroch. Columbia University William Horrace. Goodman. Joutz. State University of New York. Los Angeles Frank Schorfheide. University of California. D oyle. University of Melbourne. University of Colorado. University of Texas. Daniel Westbrook. Macalester College Kajal Lahiri. Wunnava. University of Colorado. A ustin . Hansen. Duquesne University Joanne M. Vytlacil. Boulder H . University of Minnesota Sunil Sapra. Australia Marc Henry. Duke University Serena Ng. Villanova University Gary Krueger. Stratton. Buffalo jjangfeng Zh ang. Fullerton Rae Jean B. Georgetown University Tiemen Woutersen. Stanford University M. Denver Mototsugu Shintani. University of Pennsylvania John Veitch. California State University. The George Washington University Simran Sahi. Johns Hopkins University Jan Ondrich. Henry. U niversity of Richmond Robert McNown. Brown University Ian T. Boston University Robert PhilEps. Albany Daniel Lee.
In the first edition we benefited from the help of an exceptional development editor, Jane Tufts, whose creativity, hard work, and attention to detail improved the book in many ways, large and small. Addison-Wesley provided us with first-rate support, starting with our excellent editor, Sylvia Mallory, and extending through the entire publishing team. Jane and Sylvia patiently taught us a lot about writing, organization, and presentation, and their efforts are evident on every page of this book. We extend our thanks to the superb Addison-Wesley team who worked with us on the second edition: Adrienne D'Ambrosio (senior acquisitions editor), Bridget Page (associate media producer), Charles Spaulding (senior designer), Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc., who handled the entire production process, Heather McNally (supplements coordinator), and Denise Clinton (editor-in-chief). Finally, we had the benefit of Kay Ueno's skilled editing in the second edition.

We also received a great deal of help preparing the second edition. We have been especially pleased by the number of instructors who contacted us directly with thoughtful suggestions for this edition. In particular, the changes made in the second edition incorporate or reflect suggestions, corrections, comments, and help provided by Michael Ash, Laura Chioda, Avinash Dixit, Tom Doan, Susan Dynarski, Graham Elliott, Chris Foote, Jalon Gardella, William Greene, Peter R. Hansen, Bo Honore, Weibin Huang, Michael Jansson, Manfred Keil, Jean-Francois Lamarche, Jeffrey Liebman, Hong Li, Ed McKenna, Chris Murray, Giovanni Oppenheim, Elena Pesavento, Ken Simons, Douglas Staiger, Steve Stauss, George Tauchen, and Samuel Thompson. Della Lee Sue helped with the exercises and solutions.

This edition (including the new exercises) uses data generously supplied by Marianne Bertrand, John Donohue, Liran Einav, William Evans, Daniel Hamermesh, Jeffrey Kling, Alan Krueger, Ross Levine, John List, Robert Porter, Harvey Rosen, Cecilia Rouse, and Motohiro Yogo.

We also benefited from thoughtful reviews for the second edition prepared for Addison-Wesley by: Necati Aydin, Florida A&M University; Jim Bathgate, Linfield College; James Cardon, Brigham Young University; I-Ming Chiu, Minot State University; R. Kim Craft, Southern Utah University; Brad Curs, University of Oregon; Jamie Emerson, Clarkson University; Scott England, California State University, Fresno; Bradley Ewing, Texas Tech University; Barry Palko, Iowa State University; Gary Ferrier, University of Arkansas;
Rudy Fichtenbaum, Wright State University; Brian Karl Finch, San Diego State University; Shelby Gerking, University of Central Florida; Edward Greenberg, Washington University; Carolyn J. Heinrich, University of Wisconsin, Madison; Christina Hilmer, Virginia Polytechnic Institute; Luojia Hu, Northwestern University; Tomomi Kumagai, Wayne State University; Tae-Hwy Lee, University of California, Riverside; Elena Pesavento, Emory University; Susan Porter-Hudak, Northern Illinois University; Louis Putterman, Brown University; Sharon Ryan, University of Missouri, Columbia; John Spitzer, SUNY at Brockport; Kyle Steigert, University of Wisconsin, Madison; Norman Swanson, Rutgers University; Justin Tobias, Iowa State University; Charles S. Wassell, Central Washington University; Rob Wassmer, California State University, Sacramento; Ron Warren, University of Georgia; and William Wood, James Madison University.

Above all, we are indebted to our families for their endurance throughout this project. Writing this book took a long time; for them, the project must have seemed endless. They more than anyone bore the burden of this commitment, and for their help and support we are deeply grateful.
CHAPTER 1 Economic Questions and Data

Ask a half dozen econometricians what econometrics is, and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables, such as a firm's sales, the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data. A fourth might tell you that it is the science and art of using historical data to make numerical, or quantitative, policy recommendations in government and business.

In fact, all these answers are right. At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology. This book introduces you to the core set of methods used by econometricians. We will use these methods to answer a variety of specific, quantitative questions taken from the world of business and government policy. This chapter poses four of those questions and discusses, in general terms, the econometric approach to answering them. The chapter concludes with a survey of the main types of data available to econometricians for answering these and other quantitative economic questions.
1.1 Economic Questions We Examine

Many decisions in economics, business, and government hinge on understanding relationships among variables in the world around us. These decisions require quantitative answers to quantitative questions. This book examines several quantitative questions taken from current issues in economics. Four of these questions concern education policy, racial bias in mortgage lending, cigarette consumption, and macroeconomic forecasting.

Question #1: Does Reducing Class Size Improve Elementary School Education?

Proposals for reform of the U.S. public education system generate heated debate. Many of the proposals concern the youngest students, those in elementary schools. Elementary school education has various objectives, such as developing social skills, but for many parents and educators the most important objective is basic academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools. With fewer students in the classroom, the argument goes, each student gets more of the teacher's attention, there are fewer class disruptions, learning is enhanced, and grades improve.

But what, precisely, is the effect on elementary school education of reducing class size? Reducing class size costs money: It requires hiring more teachers and, if the school is already at capacity, building more classrooms. A decision maker contemplating hiring more teachers must weigh these costs against the benefits. To weigh costs and benefits, the decision maker must have a precise quantitative understanding of the likely benefits. Is the beneficial effect on basic learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning?

Although common sense and everyday experience may suggest that more learning occurs when there are fewer students, common sense cannot provide a quantitative answer to the question of what exactly is the effect on basic learning of reducing class size. To provide such an answer, we must examine empirical evidence, that is, evidence based on data, relating class size to basic learning in elementary schools.

In this book, we examine the relationship between class size and basic learning using data gathered from 420 California school districts in 1998. In the California data, students in districts with small class sizes tend to perform better on standardized tests than students in districts with larger classes. While this fact is consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.

Question #2: Is There Racial Discrimination in the Market for Home Loans?

Most people buy their homes with the help of a mortgage, a large loan secured by the value of the home. By law, U.S. lending institutions cannot take race into account when deciding to grant or deny a request for a mortgage: Applicants who are identical in all ways but their race should be equally likely to have their mortgage applications approved. In theory, then, there should be no racial bias in mortgage lending.

In contrast to this theoretical conclusion, researchers at the Federal Reserve Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do these data indicate that, in practice, there is racial bias in mortgage lending? If so, how large is it? The fact that more black than white applicants are denied in the Boston Fed data does not by itself provide evidence of discrimination by mortgage lenders, because the black and white applicants differ in many ways other than their race, notably their ability to repay the loan. Before concluding that there is bias in the mortgage market, these data must be examined more closely to see if there is a difference in the probability of being denied for otherwise identical applicants and, if so, whether this difference is large or small. To do so, in Chapter 11 we introduce econometric methods that make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics.

Question #3: How Much Do Cigarette Taxes Reduce Smoking?

Cigarette smoking is a major public health concern worldwide. Many of the costs of smoking, such as the medical expenses of caring for those made sick by smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.

Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand. If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes?

Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity, we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices. The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state, then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12, we study methods for handling this "simultaneous causality" and use those methods to estimate the price elasticity of cigarette demand.
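The elasticity arithmetic just described can be sketched in a few lines. The elasticity value below is purely an assumed number for illustration; as the text explains, the actual elasticity must be estimated from data.

```python
# Back-of-the-envelope elasticity arithmetic (illustration only).
# Assumption: a hypothetical price elasticity of demand of -0.4;
# this is NOT an estimate from the cigarette data discussed in the text.

def required_price_increase(target_drop_pct, elasticity):
    """Percent price increase needed to cut quantity demanded by target_drop_pct,
    using the constant-elasticity approximation %dQ = elasticity * %dP."""
    return target_drop_pct / abs(elasticity)

# How large a price rise would cut smoking by 20% if the elasticity were -0.4?
print(required_price_increase(20.0, -0.4))  # 50.0, i.e. a 50% price increase
```

The point of the sketch is that the policy answer (the required tax-driven price increase) is only as good as the elasticity estimate fed into it.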
Question #4: What Will the Rate of Inflation Be Next Year?

It seems that people always want a sneak preview of the future. What will sales be next year at a firm considering investing in new equipment? Will the stock market go up next month and, if so, by how much? Will city tax receipts next year cover planned expenditures on city services? Will your microeconomics exam next week focus on externalities or monopolies? Will Saturday be a nice day to go to the beach?

One aspect of the future in which macroeconomists and financial economists are particularly interested is the rate of overall price inflation during the next year. A financial professional might advise a client whether to make a loan or to take one out at a given rate of interest, depending on her best guess of the rate of inflation over the coming year. Economists at central banks like the Federal Reserve Board in Washington, D.C., and the European Central Bank in Frankfurt are responsible for keeping the rate of price inflation under control, so their decisions about how to set interest rates rely on the outlook for inflation over the next year. If they think the rate of inflation will increase by a percentage point, then they might increase interest rates by more than that to slow down an economy that, in their view, risks overheating. If they guess wrong, they risk causing either an unnecessary recession or an undesirable jump in the rate of inflation.

Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster's job is to predict the future using the past, and econometricians do this by using economic theory and statistical techniques to quantify relationships in historical data. The data we use to forecast inflation are the rates of inflation and unemployment in the United States. An important empirical relationship in macroeconomic data is the "Phillips curve," in which a currently low value of the unemployment rate is associated with an increase in the rate of inflation over the next year. One of the inflation forecasts we develop and evaluate in Chapter 14 is based on the Phillips curve.

Quantitative Questions, Quantitative Answers

Each of these four questions requires a numerical answer. Economic theory provides clues about that answer (cigarette consumption ought to go down when the price goes up) but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.

The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant. For example, what effect does a change in class size have on test scores, holding constant student characteristics (such as family income) that a school district administrator cannot control? What effect does your race have on your chances of having a mortgage application granted, holding constant other factors such as your ability to repay the loan? What effect does a 1% increase in the price of cigarettes have on cigarette consumption,
holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions using data and for quantifying the uncertainty associated with those answers.

1.2 Causal Effects and Idealized Experiments

Like many questions encountered in econometrics, the first three questions in Section 1.1 concern causal relationships among variables. In common usage, an action is said to cause an outcome if the outcome is the direct result, or consequence, of that action. Touching a hot stove causes you to get burned; drinking water causes you to be less thirsty; putting air in your tires causes them to inflate; putting fertilizer on your tomato plants causes them to produce more tomatoes. Causality means that a specific action (applying fertilizer) leads to a specific, measurable consequence (more tomatoes).

Estimation of Causal Effects

How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per square meter? One way to measure this causal effect is to conduct an experiment. In that experiment, a horticultural researcher plants many plots of tomatoes. Each plot is tended identically, with one exception: Some plots get 100 grams of fertilizer per square meter, while the rest get none. Moreover, whether a plot is fertilized or not is determined randomly by a computer, ensuring that any other differences between the plots are unrelated to whether they receive fertilizer. At the end of the growing season, the horticulturalist weighs the harvest from each plot. The difference between the average yield per square meter of the treated and untreated plots is the effect on tomato production of the fertilizer treatment.

This is an example of a randomized controlled experiment. It is controlled in the sense that there are both a control group that receives no treatment (no fertilizer) and a treatment group that receives the treatment (100 g/m2 of fertilizer). It is randomized in the sense that the treatment is assigned randomly.
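The fertilizer experiment described above can be mimicked in a small simulation: plots are assigned to treatment at random, and the difference in mean yields between treated and untreated plots estimates the causal effect. All numbers here (baseline yield, plot-to-plot variation, and the true effect of 2 kg) are invented for illustration.

```python
import random

random.seed(0)  # reproducible illustration

true_effect = 2.0   # assumed causal effect: extra kg per square meter from fertilizer
n_plots = 2000

treated, control = [], []
for _ in range(n_plots):
    plot_yield = random.gauss(10.0, 1.0)   # idiosyncratic variation across plots
    if random.random() < 0.5:              # random assignment to treatment
        treated.append(plot_yield + true_effect)
    else:
        control.append(plot_yield)

# Difference in mean yields of treated and untreated plots
estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(round(estimate, 2))  # close to the true effect of 2.0
```

Because assignment is random, sunlight, soil quality, and every other plot characteristic are unrelated to treatment on average, so the difference in means recovers the causal effect.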
This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment. If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m2 of fertilizer).

The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.

It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size one can imagine randomly assigning "treatments" of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.

In practice, however, it is not possible to perform ideal experiments. In fact, experiments are rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.

Forecasting and Causality

Although the first three questions in Section 1.1 concern causal effects, the fourth, forecasting inflation, does not. You do not need to know a causal relationship to make a good forecast. A good way to "forecast" if it is raining is to observe whether pedestrians are using umbrellas, but the act of using an umbrella does not cause it to rain. Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting. As we see in Chapter 14, multiple regression analysis allows us to quantify historical relationships suggested by economic theory, to check whether those relationships have been stable over time, to make quantitative forecasts about the future, and to assess the accuracy of those forecasts.
1.3 Data: Sources and Types

In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world. This book examines both experimental and nonexperimental data sets.

Experimental versus Observational Data

Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect. For example, the state of Tennessee financed a large randomized controlled experiment examining class size in the 1980s. In that experiment, which we examine in Chapter 13, thousands of students were randomly assigned to classes of different sizes for several years and were given annual standardized tests.

The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are rare. Instead, most economic data are obtained by observing real-world behavior.

Data obtained by observing actual behavior outside an experimental setting are called observational data. Observational data are collected using surveys, such as a telephone survey of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions.

Observational data pose major challenges to econometric attempts to estimate causal effects. In the real world, levels of "treatment" (the amount of fertilizer in the tomato example, the student-teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the "treatment" from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for meeting the challenges encountered when real-world data are used to estimate causal effects.
Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book you will encounter all three types.

Cross-Sectional Data

Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross-sectional. Those data are for 420 entities (school districts) for a single time period (1998). In general, the number of entities on which we have observations is denoted by n; for example, in the California data set, n = 420.

The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district. For example, the average test score for the first district ("district #1") is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1998 on a standardized test (the Stanford Achievement Test). The average student-teacher ratio in that district is 17.89; that is, the number of students in district #1 divided by the number of classroom teachers in district #1 is 17.89. Average expenditure per pupil in district #1 is $6,385. The percentage of students in that district still learning English, that is, the percentage of students for whom English is a second language and who are not yet proficient in English, is 0%.

The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, the variables listed vary considerably.

With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.
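A cross-sectional data set can be thought of as one row per entity, all observed in the same time period. In the minimal sketch below, the values for district #1 are the ones reported in the text; the field names and layout are made up for this illustration.

```python
# One row per entity (school district), all observed in 1998.
# District #1's values are those reported in the text; the field names
# are hypothetical, invented for this sketch.

california_1998 = [
    {"district": 1, "test_score": 690.8, "student_teacher_ratio": 17.89,
     "spending_per_pupil": 6385, "pct_english_learners": 0.0},
    # ... one row for each of the remaining 419 districts
]

n = 420  # number of entities (districts) in the full data set
print(california_1998[0]["test_score"])  # 690.8
```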
Table 1.1 Selected Observations on Test Scores and Other Variables for California School Districts in 1998

Observation (District) Number | Average Test Score (Fifth Grade) | Student-Teacher Ratio | Expenditure per Pupil ($) | Percentage of Students Learning English
1 | 690.8 | 17.89 | $6,385 | 0.0%
... | ... | ... | ... | ...

Note: The California test score data set is described in Appendix 4.1.

Time Series Data

Time series data are data for a single entity (person, firm, country) collected at multiple time periods. Our data set on the rates of inflation and unemployment in the United States is an example of a time series data set. The data set contains observations on two variables (the rates of inflation and unemployment) for a single entity (the United States) for 183 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the second quarter of 1959, which is denoted 1959:II, and end in the fourth quarter of 2004 (2004:IV). The number of observations (that is, time periods) in a time series data set is denoted by T. Because there are 183 quarters from 1959:II to 2004:IV, this data set contains T = 183 observations.

In the second quarter of 1959, the rate of CPI inflation was 0.7% per year at an annual rate; that is, if inflation had continued for 12 months at its rate during the second quarter of 1959, the overall price level (as measured by the Consumer Price Index, CPI) would have increased by 0.7%. In the second quarter of 1959, the rate of unemployment was 5.1%; that is, 5.1% of the labor force reported that they did not have a job but were looking for work. In the third quarter of 1959, the rate of CPI inflation was 2.1%. Some observations in this data set are listed in Table 1.2.
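The observation count for the quarterly series can be checked directly: counting quarters from 1959:II through 2004:IV, inclusive of both endpoints, gives the T = 183 reported in the text.

```python
def n_quarters(start_year, start_quarter, end_year, end_quarter):
    """Number of quarters from (start_year, start_quarter) to
    (end_year, end_quarter), inclusive of both endpoints."""
    return (end_year - start_year) * 4 + (end_quarter - start_quarter) + 1

T = n_quarters(1959, 2, 2004, 4)
print(T)  # 183
```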
Table 1.2 Selected Observations on the Rates of Consumer Price Index (CPI) Inflation and Unemployment in the United States: Quarterly Data, 1959-2004

Observation Number | Date (Year:Quarter) | CPI Inflation Rate (% per year at an annual rate) | Unemployment Rate (%)
1 | 1959:II | 0.7% | 5.1%
... | ... | ... | ...

Note: The U.S. inflation and unemployment data set is described in Appendix 14.1.

By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.

Panel Data

Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and observations in that data set are listed in Table 1.3. The number of entities in a panel data set is denoted by n, and the number of time periods is denoted by T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n x T = 48 x 11 = 528 observations.

Some data from the cigarette consumption data set are listed in Table 1.3. The first block of 48 observations lists the data for each state in 1985, organized alphabetically from Alabama to Wyoming. The next block of 48 observations lists the data for 1986, and so forth. For example, in 1985, cigarette sales in Arkansas were 128.5 packs per capita (the total number of packs of cigarettes sold in Arkansas in 1985, divided by the total population of Arkansas in 1985, equals 128.5). The average price of a pack of cigarettes in Arkansas in 1985, including tax, was $1.015, of which 37 cents went to federal, state, and local taxes.

Table 1.3 Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S. States, 1985-1995

Observation Number | State | Year | Cigarette Sales (packs per capita) | Average Price per Pack (including taxes) | Total Taxes (cigarette excise tax + sales tax)
2 | Arkansas | 1985 | 128.5 | $1.015 | $0.37
... | ... | ... | ... | ... | ...

Note: The cigarette consumption data set is described in Appendix 12.1.

Panel data can be used to learn about economic relationships from the experiences of the many different entities in the data set and from the evolution over time of the variables for each entity.
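The panel's dimensions follow the same bookkeeping: each of the n = 48 states is observed in each of the T = 11 years, giving n x T rows. A sketch of the (entity, period) index of a balanced panel (the state names here are placeholders, not the actual state list):

```python
# Build the (state, year) index of a balanced panel: every entity is
# observed in every time period. State names are placeholders.

states = [f"state_{i}" for i in range(1, 49)]  # 48 entities
years = range(1985, 1996)                      # 1985 through 1995: 11 periods

panel_index = [(state, year) for year in years for state in states]

print(len(panel_index))  # 528 = 48 * 11
```

Listing year-by-year, state-by-state mirrors the blocks of 48 observations per year described in the text.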
The definitions of cross-sectional data, time series data, and panel data are summarized in Key Concept 1.1.

KEY CONCEPT 1.1 CROSS-SECTIONAL, TIME SERIES, AND PANEL DATA

• Cross-sectional data consist of multiple entities observed at a single time period.
• Time series data consist of a single entity observed at multiple time periods.
• Panel data (also known as longitudinal data) consist of multiple entities, where each entity is observed at two or more time periods.

Summary

1. Many decisions in business and economics require quantitative estimates of how a change in one variable affects another variable.
2. Conceptually, the way to estimate a causal effect is in an ideal randomized controlled experiment, but performing such experiments in economic applications is usually unethical, impractical, or too expensive.
3. Econometrics provides tools for estimating causal effects using either observational (nonexperimental) data or data from real-world, imperfect experiments.
4. Cross-sectional data are gathered by observing multiple entities at a single point in time; time series data are gathered by observing a single entity at multiple points in time; and panel data are gathered by observing multiple entities, each of which is observed at multiple points in time.

Key Terms

randomized controlled experiment (8)
control group (8)
treatment group (9)
causal effect (9)
experimental data (10)
observational data (10)
cross-sectional data (11)
observation number (11)
time series data (11)
panel data (13)
longitudinal data (13)
Review the Concepts

1.1 Design a hypothetical ideal randomized controlled experiment to study the effect of hours spent studying on performance on microeconomics exams. Suggest some impediments to implementing this experiment in practice.

1.2 Design a hypothetical ideal randomized controlled experiment to study the effect on highway traffic deaths of wearing seat belts. Suggest some impediments to implementing this experiment in practice.

1.3 You are asked to study the relationship between hours spent on employee training (measured in hours per worker per week) in a manufacturing plant and the productivity of its workers (output per worker per hour). Describe:
a. an ideal randomized controlled experiment to measure this causal effect;
b. an observational cross-sectional data set with which you could study this effect;
c. an observational time series data set for studying this effect; and
d. an observational panel data set for studying this effect.
CHAPTER 2 Review of Probability

This chapter reviews the core ideas of the theory of probability that are needed to understand regression analysis and econometrics. We assume that you have taken an introductory course in probability and statistics. If your knowledge of probability is stale, you should refresh it by reading this chapter. If you feel confident with the material, you still should skim the chapter and the terms and concepts at the end to make sure you are familiar with the ideas and notation.

Most aspects of the world around us have an element of randomness. The theory of probability provides mathematical tools for quantifying and describing this randomness. Section 2.1 reviews probability distributions for a single random variable, and Section 2.2 covers the mathematical expectation, mean, and variance of a single random variable. Most of the interesting problems in economics involve more than one variable, and Section 2.3 introduces the basic elements of probability theory for two random variables. Section 2.4 discusses three special probability distributions that play a central role in statistics and econometrics: the normal, chi-squared, and F distributions.

The final two sections of this chapter focus on a specific source of randomness of central importance in econometrics: the randomness that arises by randomly drawing a sample of data from a larger population. For example, suppose you survey ten recent college graduates selected at random, record (or "observe") their earnings, and compute the average earnings using these ten data points (or "observations"). Because you chose the sample at random,
you could have chosen ten different graduates by pure random chance; had you done so, you would have observed ten different earnings, and you would have computed a different sample average. Because the average earnings vary from one randomly chosen sample to the next, the sample average is itself a random variable. Therefore, the sample average has a probability distribution, which is referred to as its sampling distribution because this distribution describes the different possible values of the sample average that might have occurred had a different sample been drawn.

Section 2.5 discusses random sampling and the sampling distribution of the sample average. This sampling distribution is, in general, complicated. When the sample size is sufficiently large, however, the sampling distribution of the sample average is approximately normal, a result known as the central limit theorem, which is discussed in Section 2.6.

2.1 Random Variables and Probability Distributions

Probabilities, the Sample Space, and Random Variables

Probabilities and outcomes. The gender of the next new person you meet, your grade on an exam, and the number of times your computer will crash while you are writing a term paper all have an element of chance or randomness. In each of these examples, there is something not yet known that is eventually revealed.

The mutually exclusive potential results of a random process are called the outcomes. For example, your computer might never crash, it might crash once, it might crash twice, and so on. Only one of these outcomes will actually occur (the outcomes are mutually exclusive), and the outcomes need not be equally likely.

The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are writing a term paper is 80%, then over the course of writing many term papers you will complete 80% without a crash.
The sample space and events. The set of all possible outcomes is called the sample space. An event is a subset of the sample space; that is, an event is a set of one or more outcomes. The event "my computer will crash no more than once" is the set consisting of two outcomes: "no crashes" and "one crash."

Random variables. A random variable is a numerical summary of a random outcome. The number of times your computer crashes while you are writing a term paper is random and takes on a numerical value, so it is a random variable.

Some random variables are discrete and some are continuous. As their names suggest, a discrete random variable takes on only a discrete set of values, like 0, 1, 2, ..., whereas a continuous random variable takes on a continuum of possible values.

Probability Distribution of a Discrete Random Variable

Probability distribution. The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur. For example, let M be the number of times your computer crashes while you are writing a term paper. The probability distribution of the random variable M is the list of probabilities of each possible outcome: the probability that M = 0, denoted Pr(M = 0), is the probability of no computer crashes; Pr(M = 1) is the probability of a single computer crash; and so forth. An example of a probability distribution for M is given in the second row of Table 2.1; in this distribution, if your computer crashes four times, you will quit and write the paper by hand. According to this distribution, the probability of no crashes is 80%; the probability of one crash is 10%; and the probabilities of two, three, or four crashes are, respectively, 6%, 3%, and 1%. These probabilities sum to 100%. This probability distribution is plotted in Figure 2.1.

Probabilities of events. The probability of an event can be computed from the probability distribution. For example, the probability of the event of one or two crashes is the sum of the probabilities of the constituent outcomes. That is, Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2) = 0.10 + 0.06 = 0.16, or 16%.

Cumulative probability distribution. The cumulative probability distribution is the probability that the random variable is less than or equal to a particular value. The last row of Table 2.1 gives the cumulative probability distribution of the random variable M.
TABLE 2.1 Probability of Your Computer Crashing M Times

Outcome (number of crashes):            0     1     2     3     4
Probability distribution:             0.80  0.10  0.06  0.03  0.01
Cumulative probability distribution:  0.80  0.90  0.96  0.99  1.00

FIGURE 2.1 Probability Distribution of the Number of Computer Crashes
The height of each bar is the probability that the computer crashes the indicated number of times. The height of the first bar is 0.8, so the probability of 0 computer crashes is 80%. The height of the second bar is 0.1, so the probability of 1 computer crash is 10%, and so forth for the other bars.

For example, the probability of at most one crash, Pr(M <= 1), is 90%, which is the sum of the probabilities of no crashes (80%) and of one crash (10%). A cumulative probability distribution is also referred to as a cumulative distribution function, a c.d.f., or a cumulative distribution.

The Bernoulli distribution. An important special case of a discrete random variable is when the random variable is binary; that is, the outcomes are 0 or 1. A binary random variable is called a Bernoulli random variable (in honor of the seventeenth-century Swiss mathematician and scientist Jacob Bernoulli), and its probability distribution is called the Bernoulli distribution.
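The calculations based on Table 2.1 are easy to verify directly. A minimal sketch (the dictionary below simply transcribes the table):

```python
# Probability distribution of M (number of crashes) from Table 2.1.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

assert abs(sum(pmf.values()) - 1.0) < 1e-9  # probabilities sum to 1

# Probability of an event = sum of the probabilities of its outcomes:
# Pr(M = 1 or M = 2) = 0.10 + 0.06 = 0.16.
pr_one_or_two = pmf[1] + pmf[2]

# Cumulative probability distribution: Pr(M <= m) for each m.
cdf = {m: sum(p for k, p in pmf.items() if k <= m) for m in pmf}
```

Reading off `cdf[1]` reproduces the 0.90 in the last row of Table 2.1.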
For example, let G be the gender of the next new person you meet, where G = 0 indicates that the person is male and G = 1 indicates that she is female. The outcomes of G and their probabilities thus are

G = 1 with probability p, and G = 0 with probability 1 - p,  (2.1)

where p is the probability of the next new person you meet being a woman. The probability distribution in Equation (2.1) is the Bernoulli distribution.

Probability Distribution of a Continuous Random Variable

Cumulative probability distribution. The cumulative probability distribution for a continuous variable is defined just as it is for a discrete random variable. That is, the cumulative probability distribution of a continuous random variable is the probability that the random variable is less than or equal to a particular value.

For example, consider a student who drives from home to school. This student's commuting time can take on a continuum of values, and because it depends on random factors such as the weather and traffic conditions, it is natural to treat it as a continuous random variable. Figure 2.2a plots a hypothetical cumulative distribution of commuting times. For example, the probability that the commute takes less than 15 minutes is 20%, and the probability that it takes less than 20 minutes is 78%.

Probability density function. Because a continuous random variable can take on a continuum of possible values, the probability distribution used for discrete variables, which lists the probability of each possible value of the random variable, is not suitable for continuous variables. Instead, the probability is summarized by the probability density function. The area under the probability density function between any two points is the probability that the random variable falls between those two points. A probability density function is also called a p.d.f., a density function, or simply a density.

FIGURE 2.2 Cumulative Distribution and Probability Density Functions of Commuting Time
Figure 2.2a shows the cumulative probability distribution (c.d.f.) of commuting times. The probability that a commuting time is less than 15 minutes is 0.20 (20%), and the probability that it is less than 20 minutes is 0.78 (78%), so Pr(commuting time > 20) = 0.22. Figure 2.2b shows the probability density function (p.d.f.) of commuting times. Probabilities are given by areas under the p.d.f. The probability that a commuting time is between 15 and 20 minutes is 0.58 (58%) and is given by the area under the curve between 15 and 20 minutes.

Figure 2.2b plots the probability density function of commuting times corresponding to the cumulative distribution in Figure 2.2a. The probability that the commute takes between 15 and 20 minutes is given by the area under the p.d.f. between 15 minutes and 20 minutes, which is 0.58, or 58%. Equivalently, this probability can be seen on the cumulative distribution in Figure 2.2a as the difference between the probability that the commute is less than 20 minutes (78%) and the probability that it is less than 15 minutes (20%).
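The same area-under-the-density logic can be checked numerically. The commuting-time distribution in Figure 2.2 is hypothetical, so the sketch below substitutes a normal distribution with assumed parameters (mean 18.5 and standard deviation 4 minutes, values not from the text) and confirms that the area under the density between 15 and 20 equals the difference of the c.d.f. at those two points.

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def norm_pdf(x, mu, sigma):
    """Probability density function of a normal random variable."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 18.5, 4.0  # assumed illustrative parameters

# Pr(15 < T <= 20) as a difference of the cumulative distribution...
p_interval = norm_cdf(20, mu, sigma) - norm_cdf(15, mu, sigma)

# ...equals the area under the density between 15 and 20 (midpoint rule).
n = 10_000
width = (20 - 15) / n
area = sum(norm_pdf(15 + (i + 0.5) * width, mu, sigma) * width for i in range(n))
```

With any other density the same identity holds; that is what defines the p.d.f.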
2.2 Expected Values, Mean, and Variance

The Expected Value of a Random Variable

Expected value. The expected value of a random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of a discrete random variable is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of that outcome. The expected value of Y is also called the expectation of Y or the mean of Y and is denoted mu_Y.

For example, consider the number of computer crashes M with the probability distribution given in Table 2.1. The expected value of M is the average number of crashes over many term papers, weighted by the frequency with which a crash of a given size occurs. Accordingly,

E(M) = 0 x 0.80 + 1 x 0.10 + 2 x 0.06 + 3 x 0.03 + 4 x 0.01 = 0.35.  (2.2)

That is, the expected number of computer crashes while writing a term paper is 0.35. Of course, the actual number of crashes must always be an integer; it makes no sense to say that the computer crashed 0.35 times while writing a particular term paper! Rather, the calculation in Equation (2.2) means that the average number of crashes over many such term papers is 0.35.

As a second example, suppose you loan a friend $100 at 10% interest. If the loan is repaid, you get $110 (the principal of $100 plus interest of $10), but there is a risk of 1% that your friend will default and you will get nothing at all. Thus the amount you are repaid is a random variable that equals $110 with probability 0.99 and equals $0 with probability 0.01. Over many such loans, 99% of the time you would be paid back $110, but 1% of the time you would get nothing, so on average you would be repaid $110 x 0.99 + $0 x 0.01 = $108.90. Thus the expected value of your repayment (or the "mean repayment") is $108.90.

The formula for the expected value of a discrete random variable Y that can take on k different values is given as Key Concept 2.1.

KEY CONCEPT 2.1 Expected Value and the Mean

Suppose the random variable Y takes on k possible values, y_1, ..., y_k, where y_1 denotes the first value, y_2 denotes the second value, and so forth, and that the probability that Y takes on y_1 is p_1, the probability that Y takes on y_2 is p_2, and so forth. The expected value of Y, denoted E(Y), is

E(Y) = y_1 p_1 + y_2 p_2 + ... + y_k p_k = sum_{i=1}^{k} y_i p_i,  (2.3)

where the notation "sum_{i=1}^{k} y_i p_i" means "the sum of y_i p_i for i running from 1 to k." The expected value of Y is also called the mean of Y or the expectation of Y and is denoted mu_Y.

Expected value of a Bernoulli random variable. An important special case of the general formula in Key Concept 2.1 is the mean of a Bernoulli random variable. Let G be the Bernoulli random variable with the probability distribution in Equation (2.1). The expected value of G is

E(G) = 1 x p + 0 x (1 - p) = p.  (2.4)

Thus the expected value of a Bernoulli random variable is p, the probability that it takes on the value 1.

Expected value of a continuous random variable. The expected value of a continuous random variable is also the probability-weighted average of the possible outcomes of the random variable. Because a continuous random variable can take on a continuum of possible values, the formal mathematical definition of its expectation involves calculus and is given in Appendix 17.1.

The Standard Deviation and Variance

The variance and standard deviation measure the dispersion or the "spread" of a probability distribution. The variance of a random variable Y, denoted var(Y), is the expected value of the square of the deviation of Y from its mean: var(Y) = E[(Y - mu_Y)^2].

Because the variance involves the square of Y, the units of the variance are the units of the square of Y, which makes the variance awkward to interpret. It is therefore common to measure the spread by the standard deviation, which is the square root of the variance and is denoted sigma_Y. The standard deviation has the same units as Y. These definitions are summarized in Key Concept 2.2.
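Before turning to the variance, the expected-value calculations above, Equation (2.2) and the loan example, can be checked with a few lines (a sketch; the Bernoulli probability p = 0.3 is an arbitrary illustrative value):

```python
# Expected value of a discrete random variable: E(Y) = sum of y * Pr(Y = y).
pmf_M = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1
E_M = sum(m * p for m, p in pmf_M.items())  # 0.35, as in Equation (2.2)

# The loan example: repaid $110 with probability 0.99, $0 with probability 0.01.
E_repayment = 110 * 0.99 + 0 * 0.01  # $108.90

# Mean of a Bernoulli(p) random variable is p, as in Equation (2.4).
p = 0.3  # illustrative value
E_G = 1 * p + 0 * (1 - p)
```

The weighted-sum form is exactly Equation (2.3) written as a loop over outcomes.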
KEY CONCEPT 2.2 Variance and Standard Deviation

The variance of the discrete random variable Y, denoted sigma_Y^2, is

sigma_Y^2 = var(Y) = E[(Y - mu_Y)^2] = sum_{i=1}^{k} (y_i - mu_Y)^2 p_i.  (2.5)

The standard deviation of Y is sigma_Y, the square root of the variance. The units of the standard deviation are the same as the units of Y.

For example, the variance of the number of computer crashes M is the probability-weighted average of the squared difference between M and its mean, 0.35:

var(M) = (0 - 0.35)^2 x 0.80 + (1 - 0.35)^2 x 0.10 + (2 - 0.35)^2 x 0.06 + (3 - 0.35)^2 x 0.03 + (4 - 0.35)^2 x 0.01 = 0.6475.  (2.6)

The standard deviation of M is the square root of the variance, so sigma_M = sqrt(0.6475) = 0.80.

Variance of a Bernoulli random variable. The mean of the Bernoulli random variable G with probability distribution in Equation (2.1) is mu_G = p [Equation (2.4)], so its variance is

var(G) = sigma_G^2 = (0 - p)^2 x (1 - p) + (1 - p)^2 x p = p(1 - p).  (2.7)

Thus the standard deviation of a Bernoulli random variable is sigma_G = sqrt(p(1 - p)).
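These variance formulas can be verified numerically; a sketch (p = 0.3 is again an arbitrary illustrative choice):

```python
pmf_M = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1
E_M = sum(m * p for m, p in pmf_M.items())

# var(Y) = E[(Y - mu_Y)^2]: the probability-weighted squared deviation.
var_M = sum((m - E_M) ** 2 * p for m, p in pmf_M.items())  # 0.6475
sd_M = var_M ** 0.5  # about 0.80

# Bernoulli variance from the definition, checked against p(1 - p).
p = 0.3  # illustrative
var_G = (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p
```

Note that `sd_M` carries the same units as M (crashes), while `var_M` is in squared crashes, which is why the text measures spread by the standard deviation.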
Mean and Variance of a Linear Function of a Random Variable

This section discusses random variables (say, X and Y) that are related by a linear function. For example, consider an income tax scheme under which a worker is taxed at a rate of 20% on his or her earnings and then given a (tax-free) grant of $2000. Under this tax scheme, after-tax earnings Y are related to pre-tax earnings X by the equation

Y = 2000 + 0.8X.  (2.8)

That is, after-tax earnings Y is 80% of pre-tax earnings X, plus $2000.

Suppose an individual's pre-tax earnings next year are a random variable with mean mu_X and variance sigma_X^2. Because pre-tax earnings are random, so are after-tax earnings. What are the mean and standard deviation of her after-tax earnings under this tax scheme? After taxes, her earnings are 80% of the original pre-tax earnings, plus $2000. Thus the expected value of her after-tax earnings is

E(Y) = mu_Y = 2000 + 0.8 mu_X.  (2.9)

The variance of after-tax earnings is the expected value of (Y - mu_Y)^2. Because Y = 2000 + 0.8X, Y - mu_Y = 2000 + 0.8X - (2000 + 0.8 mu_X) = 0.8(X - mu_X). Therefore, E[(Y - mu_Y)^2] = E{[0.8(X - mu_X)]^2} = 0.64 E[(X - mu_X)^2], so

var(Y) = 0.64 sigma_X^2.  (2.10)

It follows that sigma_Y = 0.8 sigma_X, so, taking the square root of the variance, the standard deviation of the distribution of her after-tax earnings is 80% of the standard deviation of the distribution of pre-tax earnings.

This analysis can be generalized so that Y depends on X with an intercept a (instead of $2000) and a slope b (instead of 0.8), so that

Y = a + bX.  (2.11)

Then the mean and variance of Y are

mu_Y = a + b mu_X and  (2.12)
sigma_Y^2 = b^2 sigma_X^2,  (2.13)

and the standard deviation of Y is sigma_Y = b sigma_X. The expressions in Equations (2.9) and (2.10) are applications of the more general formulas in Equations (2.12) and (2.13) with a = 2000 and b = 0.8.
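A small simulation illustrates Equations (2.12) and (2.13) for the tax example. The pre-tax earnings distribution below (normal with mean $40,000 and standard deviation $10,000) is a hypothetical choice made only for this sketch:

```python
import random

random.seed(0)  # reproducible draws

a, b = 2000, 0.8  # the tax scheme: after-tax Y = 2000 + 0.8 X

# Hypothetical pre-tax earnings distribution, chosen for illustration.
mu_X, sigma_X = 40_000.0, 10_000.0
draws_X = [random.gauss(mu_X, sigma_X) for _ in range(200_000)]
draws_Y = [a + b * x for x in draws_X]

# Sample mean and variance of the simulated after-tax earnings.
mean_Y = sum(draws_Y) / len(draws_Y)
var_Y = sum((y - mean_Y) ** 2 for y in draws_Y) / len(draws_Y)

# Theoretical values from Equations (2.12) and (2.13).
theory_mean = a + b * mu_X    # 34,000
theory_sd = abs(b) * sigma_X  # 8,000
```

With 200,000 draws, the simulated mean and standard deviation land within a fraction of a percent of the theoretical values, whatever distribution is assumed for X; linearity is all that Equations (2.12) and (2.13) require.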
Other Measures of the Shape of a Distribution

The mean and standard deviation measure two important features of a distribution: its center (the mean) and its spread (the standard deviation). This section discusses measures of two other features of a distribution: the skewness, which measures the lack of symmetry of a distribution, and the kurtosis, which measures how thick, or "heavy," are its tails. The mean, variance, skewness, and kurtosis are all based on what are called the moments of a distribution. Figure 2.3 plots four distributions, two of which are symmetric and two of which are not.

Skewness. The skewness of the distribution of a random variable Y is

Skewness = E[(Y - mu_Y)^3] / sigma_Y^3,  (2.14)

where sigma_Y is the standard deviation of Y. For a symmetric distribution, a value of Y a given amount above its mean is just as likely as a value of Y the same amount below its mean. If so, then positive values of (Y - mu_Y)^3 will be offset on average (in expectation) by equally likely negative values. Thus, for a symmetric distribution, E[(Y - mu_Y)^3] = 0: the skewness of a symmetric distribution is zero. If a distribution is not symmetric, then a positive value of (Y - mu_Y)^3 generally is not offset on average by an equally likely negative value, so the skewness is nonzero for a distribution that is not symmetric. Visually, the distribution in Figure 2.3d appears to deviate more from symmetry than does the distribution in Figure 2.3c.

If a distribution has a long right tail, positive values of (Y - mu_Y)^3 are not fully offset by negative values, and the skewness is positive. If a distribution has a long left tail, its skewness is negative.

The skewness of a distribution provides a mathematical way to describe how much a distribution deviates from symmetry. Dividing by sigma_Y^3 in the denominator of Equation (2.14) cancels the units of Y^3 in the numerator, so the skewness is unit free; in other words, changing the units of Y does not change its skewness. Below each of the four distributions in Figure 2.3 is its skewness.

Kurtosis. The kurtosis of a distribution is a measure of how much mass is in its tails and, therefore, is a measure of how much of the variance of Y arises from extreme values. An extreme value of Y is called an outlier. The greater the kurtosis of a distribution, the more likely are outliers.

The kurtosis of the distribution of Y is

Kurtosis = E[(Y - mu_Y)^4] / sigma_Y^4.  (2.15)

If a distribution has a large amount of mass in its tails, then some extreme departures of Y from its mean are likely, and these very large values will lead to large values, on average (in expectation), of (Y - mu_Y)^4. Thus, for a distribution with a large amount of mass in its tails, the kurtosis will be large. Because (Y - mu_Y)^4 cannot be negative, the kurtosis cannot be negative.
FIGURE 2.3 Four Distributions with Different Skewness and Kurtosis
All of these distributions have a mean of 0 and a variance of 1. The distributions with skewness of zero (a and b) are symmetric; the distributions with nonzero skewness (c and d) are not symmetric. The distributions with kurtosis exceeding 3 (b-d) have heavy tails.

The kurtosis of a normally distributed random variable is 3, so a random variable with kurtosis exceeding 3 has more mass in its tails than a normal random variable. A distribution with kurtosis exceeding 3 is called leptokurtic or, more simply, heavy-tailed. The distributions in Figures 2.3b-d are heavy-tailed.

Like skewness, the kurtosis is unit free, so changing the units of Y does not change its kurtosis. Below each of the four distributions in Figure 2.3 is its kurtosis.
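Applying Equations (2.14) and (2.15) to the computer-crash distribution of Table 2.1 shows both features at once: the long right tail makes the skewness positive, and the mass far from the mean pushes the kurtosis above the normal benchmark of 3. A sketch:

```python
pmf_M = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1

# Mean and standard deviation (first moment and spread).
mu = sum(m * p for m, p in pmf_M.items())
sigma = sum((m - mu) ** 2 * p for m, p in pmf_M.items()) ** 0.5

# Equation (2.14): standardized third central moment.
skewness = sum((m - mu) ** 3 * p for m, p in pmf_M.items()) / sigma ** 3

# Equation (2.15): standardized fourth central moment.
kurtosis = sum((m - mu) ** 4 * p for m, p in pmf_M.items()) / sigma ** 4
```

Because both statistics divide by a power of sigma, rescaling M (say, counting crashes per page instead of per paper) leaves them unchanged, which is the unit-free property noted above.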
Moments. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y^2), is called the second moment of Y. In general, the expected value of Y^r is called the r-th moment of the random variable Y; that is, the r-th moment of Y is E(Y^r). The skewness is a function of the first, second, and third moments of Y, and the kurtosis is a function of the first through fourth moments of Y.

2.3 Two Random Variables

Most of the interesting questions in economics involve two or more variables. Are college graduates more likely to have a job than nongraduates? How does the distribution of income for women compare to that for men? These questions concern the distribution of two random variables, considered together (education and employment status in the first example, income and gender in the second). Answering such questions requires an understanding of the concepts of joint, marginal, and conditional probability distributions.

Joint and Marginal Distributions

Joint distribution. The joint probability distribution of two discrete random variables, say X and Y, is the probability that the random variables simultaneously take on certain values, say x and y. The probabilities of all possible (x, y) combinations sum to 1. The joint probability distribution can be written as the function Pr(X = x, Y = y).

For example, weather conditions (whether or not it is raining) affect the commuting time of the student commuter in Section 2.1. Let Y be a binary random variable that equals 1 if the commute is short (less than 20 minutes) and equals 0 otherwise, and let X be a binary random variable that equals 0 if it is raining and 1 if not. Between these two random variables, there are four possible outcomes: it rains and the commute is long (X = 0, Y = 0); rain and short commute (X = 0, Y = 1); no rain and long commute (X = 1, Y = 0); and no rain and short commute (X = 1, Y = 1). These four possible outcomes are mutually exclusive and constitute the sample space.

An example of a joint distribution of these two variables is given in Table 2.2. According to this distribution, over many commutes, 15% of the days have rain and a long commute; that is, Pr(X = 0, Y = 0) = 0.15. Also, Pr(X = 0, Y = 1) = 0.15, Pr(X = 1, Y = 0) = 0.07, and Pr(X = 1, Y = 1) = 0.63. These four probabilities sum to 1. The joint probability distribution is the frequency with which each of these four outcomes occurs over many repeated commutes.
TABLE 2.2 Joint Distribution of Weather Conditions and Commuting Times

                        Rain (X = 0)   No Rain (X = 1)   Total
Long Commute (Y = 0)        0.15            0.07          0.22
Short Commute (Y = 1)       0.15            0.63          0.78
Total                       0.30            0.70          1.00

Marginal probability distribution. The marginal probability distribution of a random variable Y is just another name for its probability distribution. This term is used to distinguish the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable.

The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. If X can take on l different values x_1, ..., x_l, then the marginal probability that Y takes on the value y is

Pr(Y = y) = sum_{i=1}^{l} Pr(X = x_i, Y = y).  (2.16)

For example, in Table 2.2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. The marginal distribution of commuting times is given in the final column of Table 2.2. Similarly, the marginal probability that it will rain is 30%; that is, over many commutes it rains 30% of the time, as shown in the final row of Table 2.2.

Conditional Distributions

Conditional distribution. The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes on the value y when X takes on the value x is written Pr(Y = y | X = x).

For example, what is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2.2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining, a long commute and a short commute are equally likely. Thus the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, or Pr(Y = 0 | X = 0) = 0.50.
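Equation (2.16) amounts to summing rows or columns of the joint distribution. A sketch using the numbers in Table 2.2:

```python
# Joint distribution of rain (X) and commute length (Y) from Table 2.2;
# keys are (x, y) with X = 0 rain / 1 no rain, Y = 0 long / 1 short.
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

# Marginal distributions, Equation (2.16): sum the joint over the other variable.
pr_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}
pr_X = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
```

`pr_Y` reproduces the final column of Table 2.2 (0.22 and 0.78) and `pr_X` the final row (0.30 and 0.70).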
In general, the conditional distribution of Y given X = x is

Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x).  (2.17)

For example, the conditional probability of a long commute given that it is rainy is Pr(Y = 0 | X = 0) = Pr(X = 0, Y = 0) / Pr(X = 0) = 0.15/0.30 = 0.50.

As a second example, consider a modification of the crashing computer example. Suppose you use a computer in the library to type your term paper and the librarian randomly assigns you a computer from those available, half of which are new and half of which are old. Because you are randomly assigned to a computer, the age of the computer you use, A (= 1 if the computer is new, = 0 if it is old), is a random variable. Suppose the joint distribution of the random variables M and A is given in Part A of Table 2.3. Then the conditional distribution of computer crashes, given the age of the computer, is given in Part B of the table. For example, the joint probability that M = 0 and A = 0 is 0.35; because half the computers are old, the conditional probability of no crashes, given that you are using an old computer, is Pr(M = 0 | A = 0) = Pr(M = 0, A = 0) / Pr(A = 0) = 0.35/0.50 = 0.70, or 70%. In contrast, the conditional probability of no crashes given that you are assigned a new computer is 90%. According to the conditional distributions in Part B of Table 2.3, the newer computers are less likely to crash than the old ones; for example, the probability of three crashes is 5% with an old computer but 1% with a new computer.

TABLE 2.3 Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)

A. Joint Distribution
                      M = 0   M = 1   M = 2   M = 3   M = 4   Total
Old computer (A = 0)   0.35   0.065    0.05   0.025    0.01    0.50
New computer (A = 1)   0.45   0.035    0.01   0.005    0.00    0.50
Total                  0.80    0.10    0.06    0.03     0.01    1.00

B. Conditional Distributions of M given A
                      M = 0   M = 1   M = 2   M = 3   M = 4   Total
Pr(M | A = 0)          0.70    0.13    0.10    0.05    0.02    1.00
Pr(M | A = 1)          0.90    0.07    0.02    0.01    0.00    1.00
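Equation (2.17) can be applied mechanically to any joint distribution; the same function that handles Table 2.2 would also recover Part B of Table 2.3 from Part A. A sketch using Table 2.2:

```python
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}  # Table 2.2

def conditional(joint, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x), Equation (2.17)."""
    pr_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return {y: p / pr_x for (xx, y), p in joint.items() if xx == x}

# Conditional distribution of commute length given rain:
# Pr(Y = 0 | X = 0) = 0.15 / 0.30 = 0.50.
given_rain = conditional(joint, 0)
```

Dividing by the marginal probability Pr(X = x) is what renormalizes the row of the joint distribution so that the conditional probabilities sum to 1.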
Conditional expectation. The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on k values y_1, ..., y_k, then the conditional mean of Y given X = x is

E(Y | X = x) = sum_{i=1}^{k} y_i Pr(Y = y_i | X = x).  (2.18)

For example, based on the conditional distributions in Table 2.3, the expected number of computer crashes, given that the computer is old, is E(M | A = 0) = 0 x 0.70 + 1 x 0.13 + 2 x 0.10 + 3 x 0.05 + 4 x 0.02 = 0.56. The expected number of computer crashes, given that the computer is new, is E(M | A = 1) = 0.14, less than for the old computers. The conditional expectation of Y given X = x is just the mean value of Y when X = x; stated differently, among the old computers the mean number of crashes is 0.56, and among the new computers it is 0.14.

The law of iterated expectations. The mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X. For example, the mean height of adults is the weighted average of the mean height of men and the mean height of women, weighted by the proportions of men and women. Stated mathematically, if X takes on the l values x_1, ..., x_l, then

E(Y) = sum_{i=1}^{l} E(Y | X = x_i) Pr(X = x_i).  (2.19)

Equation (2.19) follows from Equations (2.17) and (2.18) (see Exercise 2.14). Stated differently, the expectation of Y is the expectation of the conditional expectation of Y given X; that is,

E(Y) = E[E(Y | X)],  (2.20)

where the inner expectation on the right-hand side of Equation (2.20) is computed using the conditional distribution of Y given X and the outer expectation is computed using the marginal distribution of X. Equation (2.20) is known as the law of iterated expectations.

For example, the mean number of crashes M is the weighted average of the conditional expectation of M given that the computer is old and the conditional expectation of M given that it is new, weighted by the proportions of old and new computers: E(M) = E(M | A = 0) x Pr(A = 0) + E(M | A = 1) x Pr(A = 1) = 0.56 x 0.50 + 0.14 x 0.50 = 0.35. This is the mean of the marginal distribution of M, as calculated in Equation (2.2).
the variance of the distri· bu tion in the second row o f Panel H o f Table 2.1properties of conditioml l expectations with mult iple \ anahles.t . Z) is the conditional c'<pcctalion of Y given both X and Z.0. thaI is...d mathematically.. 111en the law of iterated expeC(.:d ex pectations implies thut if the conditional mean of Y given.3. For exa mple.56)! X 0. '1l1e law of iterated expccHllions also appl ies 10 expectations tha t arc condi· tional on multiple random v::trianlcs. then E(Y) = ElE(Y [XJI = £IOJ . the cond itional variance of the num ber of crabhes given that the compu ler is old is \ar(M IA 0) (0 . The \l ari ltDCe of Y conditional on X i ~ the variance of Ih~ cond it ion al distribut ion or Y gi ven X.20 provides some addition. P) i~ the expected nu mber of crashes for a computer with age .05 + (4 .:r wi th age A and number of progra ms P.cro. y. iY.99). For the conditional dis tribu tions in Table 2.56)'2 x 0. thc mean of Y must be zero.99 = 0. let X. 0::: 0::: E( YI X =x)1' PreY =). so the standard de\'iation of M for new computers is \1'0.22. X.:. kt P denote the number o f program~ installed o n the computer.35.3.56 X 0.! weighted avcrage of the expected number of crashes for a comput. the expected number of crashes for new com puters (O. 1) 0. [X = x) .\' IS z.99. if E( Y IX) = 0.22 0.. in the comput.47) th:m (or old (0. l~) is less than that for old complllers (0.( . The conditional vari:lOce of M given thaI Ii I j .20).0. t . Conditional variance.99.0. the condit ional variance of Y given X is .70 + (I .1lle expectl!d numbe r of c ra~h~s overall .n the mean of Y is i'cro. which is 0.56)2 X 0. (3 .S"id dirrer· cntly. i~ tht.2. (2.ero. :(M ). as measured by the condilional s tandard deviation. var(Y I X = x) = L .1 (I . Z ) 1. The law o f iteratt. and the spread of thc distribution of the num· ber of crashes.56)'2 X 0. then it must be th nt thc probahilityweighted avcrage of th ese conditional means is zero. as calculated in Equation (2. 
For example..02 ~ O. and Z be random vari· :lblcs that are jointly distributed.50 + 0..50 = 0.l tions say~ that E( }') = £{£( Y .0..1l1is is the mean of the ma rgina l tlistnhution (If M.3 TYI'O Random Variables JJ of M given that it is new. if the mean of Y given X is I.:r crash illustration of Table 2.56). Exercise 2.13 + (2 0.1be standard deviation of the contlitiona l distribution of M gi"en tha i A 0 is thu~ v'O. Stalc.14 X 0. the.47.t that has P programs installed. 0::: 0::: 0::: .21) For example.3. so £(/1'1) = E(M iA = 0) X Pr(A = 0) + £(MIA => 1) X (' r(A . then F.\1 .. weighted by the propo rtion o f computers with that value of both A and P. is smaller for new computers (0.O..lb is is an immcdintc consequence of 0::: E4 ualion (2.56)2 X 0.2). where E(Y ~ X.
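The conditional-mean and conditional-variance arithmetic above, and the law of iterated expectations in Equation (2.20), can be checked with a few lines of code. This is an illustrative sketch, not part of the text: the old-computer probabilities (0.70, 0.13, 0.10, 0.05, 0.02) are the ones used in the calculation above, while the new-computer probabilities are assumed values chosen to be consistent with the reported conditional mean of 0.14.

```python
# Conditional distributions of M (number of crashes) given computer age A.
# A = 0 (old): probabilities as used in the text's calculation.
# A = 1 (new): assumed values, consistent with E(M|A = 1) = 0.14.
pm_old = {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02}
pm_new = {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00}
pr_a = {0: 0.50, 1: 0.50}  # marginal distribution of A

def cond_mean(dist):
    # Equation (2.18): E(M|A = a) = sum over m of m * Pr(M = m|A = a)
    return sum(m * p for m, p in dist.items())

def cond_var(dist):
    # Equation (2.21): sum over m of (m - E(M|A = a))^2 * Pr(M = m|A = a)
    mu = cond_mean(dist)
    return sum((m - mu) ** 2 * p for m, p in dist.items())

e_old, e_new = cond_mean(pm_old), cond_mean(pm_new)
# Law of iterated expectations, Equation (2.20):
e_m = e_old * pr_a[0] + e_new * pr_a[1]
print(round(e_old, 2), round(e_new, 2), round(e_m, 2))  # 0.56 0.14 0.35
print(round(cond_var(pm_old), 2))                       # 0.99
```

The weighted average of the two conditional means reproduces the marginal mean E(M) = 0.35 computed above.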
Independence

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y. That is, X and Y are independently distributed if, for all values of x and y,

    Pr(Y = y|X = x) = Pr(Y = y) (independence of X and Y).    (2.22)

Substituting Equation (2.22) into Equation (2.17) gives an alternative expression for independent random variables in terms of their joint distribution: if X and Y are independent, then

    Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).    (2.23)

That is, the joint distribution of two independent random variables is the product of their marginal distributions.

Covariance and Correlation

Covariance. One measure of the extent to which two random variables move together is their covariance. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)], where μ_X is the mean of X and μ_Y is the mean of Y. The covariance is denoted by cov(X, Y) or by σ_XY. If X can take on l values and Y can take on k values, then the covariance is given by the formula

    cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)] = Σ_{i=1}^{k} Σ_{j=1}^{l} (x_j − μ_X)(y_i − μ_Y) Pr(X = x_j, Y = y_i).    (2.24)

To interpret this formula, suppose that when X is greater than its mean (so that X − μ_X is positive), then Y tends to be greater than its mean (so that Y − μ_Y is positive), and that when X is less than its mean (so that X − μ_X < 0), then Y tends to be less than its mean (so that Y − μ_Y < 0). In both cases, the product (X − μ_X) × (Y − μ_Y) tends to be positive, so the covariance is positive. In contrast, if X and Y tend to move in opposite directions (so that X is large when Y is small, and vice versa), then the covariance is negative. Finally, if X and Y are independent, then the covariance is zero (see Exercise 2.19).
Because the covariance is the product of X and Y, deviated from their means, its units are, awkwardly, the units of X times the units of Y. This "units" problem can make numerical values of the covariance difficult to interpret.

Correlation. The correlation is an alternative measure of dependence between X and Y that solves the "units" problem of the covariance. Specifically, the correlation between X and Y is the covariance between X and Y, divided by their standard deviations:

    corr(X, Y) = cov(X, Y)/√(var(X) var(Y)) = σ_XY/(σ_X σ_Y).    (2.25)

Because the units of the numerator in Equation (2.25) are the same as those of the denominator, the units cancel and the correlation is unitless. The random variables X and Y are said to be uncorrelated if corr(X, Y) = 0.

The correlation always is between −1 and 1; that is, as proven in Appendix 2.1,

    |corr(X, Y)| ≤ 1 (correlation inequality).    (2.26)

Correlation and conditional mean. If the conditional mean of Y does not depend on X, then Y and X are uncorrelated. That is,

    if E(Y|X) = μ_Y, then cov(Y, X) = 0 and corr(Y, X) = 0.    (2.27)

We now show this result. First suppose that Y and X have mean zero, so that cov(Y, X) = E[(Y − μ_Y)(X − μ_X)] = E(YX). By the law of iterated expectations [Equation (2.20)], E(YX) = E[E(Y|X)X] = 0 because E(Y|X) = 0, so cov(Y, X) = 0. Equation (2.27) follows by substituting cov(Y, X) = 0 into the definition of correlation in Equation (2.25). If Y and X do not have mean zero, first subtract off their means; then the preceding proof applies.

It is not necessarily true, however, that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X. Said differently, it is possible for the conditional mean of Y to be a function of X but for Y and X nonetheless to be uncorrelated. An example is given in Exercise 2.23.

The Mean and Variance of Sums of Random Variables

The mean of the sum of two random variables, X and Y, is the sum of their means:

    E(X + Y) = E(X) + E(Y) = μ_X + μ_Y.    (2.28)
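Equations (2.24) through (2.26) can be illustrated numerically. The joint distribution below is a made-up example for two binary random variables (it is not from Table 2.3); the sketch computes the covariance by direct summation over the joint distribution and confirms that the resulting correlation is unitless and lies between −1 and 1.

```python
from math import sqrt

# Hypothetical joint distribution Pr(X = x, Y = y) (illustrative values).
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.35}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())

# Equation (2.24): covariance as a probability-weighted sum of deviations.
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())

# Equation (2.25): correlation = covariance divided by the two standard
# deviations; the units cancel, so the result is a pure number.
corr_xy = cov_xy / (sqrt(var_x) * sqrt(var_y))
print(round(cov_xy, 3), round(corr_xy, 3))  # 0.075 0.302
```

Changing the units of X (say, multiplying every x by 100) rescales the covariance but leaves the correlation unchanged, which is exactly the point of Equation (2.25).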
Some parents tell their children that they will be able to get a better, higher-paying job if they get more education. Are they right? Does the distribution of earnings for men and women differ? For example, how much do earnings differ between workers whose highest degree is a high school diploma and workers who are college graduates, that is, workers with a four-year college degree? And are the best-paid college-educated women paid as well as the best-paid college-educated men?

One way to answer these questions is to examine the conditional distribution of average hourly earnings, conditional on the highest educational degree achieved (high school diploma or four-year college degree) and on gender. These four conditional distributions are shown in Figure 2.4, and the mean, standard deviation, and some percentiles of the conditional distributions are presented in Table 2.4.¹ For example, the conditional mean of earnings for women whose highest degree is a high school diploma, that is, E(Earnings|Highest degree = high school diploma, Gender = female), is $13.25 per hour.

The distribution of average hourly earnings for female college graduates (Figure 2.4b) is shifted to the right of the distribution for women with only a high school degree (Figure 2.4a); the same shift can be seen for the two groups of men (Figures 2.4c and 2.4d). For both men and women, mean earnings are higher for those with a college degree (Table 2.4, first numeric column). Interestingly, the spread of the distribution of earnings, as measured by the standard deviation, is greater for those with a college degree. Another feature of these distributions is that the ...

[Table 2.4: Summaries of the Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004 Given Education Level and Gender. For each of four groups, (a) women with a high school diploma, (b) women with a four-year college degree, (c) men with a high school diploma, and (d) men with a four-year college degree, the table reports the mean, standard deviation, and selected percentiles, including the median, of average hourly earnings.]

¹The distributions were computed using data from the March 2005 Current Population Survey. Average hourly earnings are the sum of annual pretax wages, salaries, tips, and bonuses, divided by the number of hours worked annually.
KEY CONCEPT 2.3
MEANS, VARIANCES, AND COVARIANCES OF SUMS OF RANDOM VARIABLES

Let X, Y, and V be random variables, let μ_X and σ_X² be the mean and variance of X, let σ_XY be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. The following facts follow from the definitions of the mean, variance, and covariance:

    E(a + bX + cY) = a + bμ_X + cμ_Y,    (2.29)
    var(a + bY) = b²σ_Y²,    (2.30)
    var(aX + bY) = a²σ_X² + 2abσ_XY + b²σ_Y²,    (2.31)
    E(Y²) = σ_Y² + μ_Y²,    (2.32)
    cov(a + bX + cV, Y) = bσ_XY + cσ_VY,    (2.33)
    E(XY) = σ_XY + μ_Xμ_Y,    (2.34)
    |corr(X, Y)| ≤ 1 and |σ_XY| ≤ √(σ_X²σ_Y²) (correlation inequality).    (2.35)

The variance of the sum of X and Y is the sum of their variances, plus twice their covariance:

    var(X + Y) = var(X) + var(Y) + 2cov(X, Y) = σ_X² + σ_Y² + 2σ_XY.    (2.36)

If X and Y are independent, then the covariance is zero and the variance of their sum is the sum of their variances:

    var(X + Y) = var(X) + var(Y) = σ_X² + σ_Y² (if X and Y are independent).    (2.37)

Useful expressions for means, variances, and covariances involving weighted sums of random variables are collected in Key Concept 2.3. The results in Key Concept 2.3 are derived in Appendix 2.1.
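Identities such as Equations (2.31) and (2.34) can be verified mechanically on any discrete joint distribution. The sketch below uses a made-up joint distribution and arbitrary constants a and b (illustrative choices, not from the text): the variance of aX + bY computed from first principles matches a²σ_X² + 2abσ_XY + b²σ_Y².

```python
# A small hypothetical joint distribution and two arbitrary constants.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.35}
a, b = 2.0, -3.0

def ev(f):
    # Expectation of f(X, Y) under the joint distribution.
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = ev(lambda x, y: x), ev(lambda x, y: y)
var_x = ev(lambda x, y: (x - mu_x) ** 2)
var_y = ev(lambda x, y: (y - mu_y) ** 2)
cov_xy = ev(lambda x, y: (x - mu_x) * (y - mu_y))

# Left side of Equation (2.31): var(aX + bY) computed directly.
mu_s = a * mu_x + b * mu_y
lhs = ev(lambda x, y: (a * x + b * y - mu_s) ** 2)
rhs = a ** 2 * var_x + 2 * a * b * cov_xy + b ** 2 * var_y
print(round(lhs, 6), round(rhs, 6))  # the two sides agree

# Equation (2.34): E(XY) equals cov(X, Y) + mu_X * mu_Y.
print(round(ev(lambda x, y: x * y), 3), round(cov_xy + mu_x * mu_y, 3))
```

Any joint distribution and any constants would do; the identities hold term by term, which is why they can be collected once in Key Concept 2.3 and reused throughout the book.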
2.4 The Normal, Chi-Squared, Student t, and F Distributions

The probability distributions most often encountered in econometrics are the normal, chi-squared, Student t, and F distributions.

The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped probability density shown in Figure 2.5. The specific function defining the normal probability density is given in Appendix 17.1. As Figure 2.5 shows, the normal density with mean μ and variance σ² is symmetric around its mean and has 95% of its probability between μ − 1.96σ and μ + 1.96σ.

Some special notation and terminology have been developed for the normal distribution. The normal distribution with mean μ and variance σ² is expressed concisely as "N(μ, σ²)." The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1 and is denoted N(0, 1). Random variables that have a N(0, 1) distribution are often denoted Z, and the standard normal cumulative distribution function is denoted by the Greek letter Φ; accordingly, Pr(Z ≤ c) = Φ(c), where c is a constant. Values of the standard normal cumulative distribution function are tabulated in Appendix Table 1.

To compute probabilities for a normal variable with a general mean and variance, it must be standardized by first subtracting the mean, then dividing the result by the standard deviation.

[Figure 2.5: The Normal Probability Density. The normal probability density function with mean μ and variance σ² is a bell-shaped curve, centered at μ. The area under the normal p.d.f. between μ − 1.96σ and μ + 1.96σ is 0.95. The normal distribution is denoted N(μ, σ²).]
KEY CONCEPT 2.4
COMPUTING PROBABILITIES INVOLVING NORMAL RANDOM VARIABLES

Suppose Y is normally distributed with mean μ and variance σ²; in other words, Y is distributed N(μ, σ²). Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing Z = (Y − μ)/σ.

Let c_1 and c_2 denote two numbers with c_1 < c_2 and let d_1 = (c_1 − μ)/σ and d_2 = (c_2 − μ)/σ. Then

    Pr(Y ≤ c_2) = Pr(Z ≤ d_2) = Φ(d_2),    (2.38)
    Pr(Y ≥ c_1) = Pr(Z ≥ d_1) = 1 − Φ(d_1),    (2.39)
    Pr(c_1 ≤ Y ≤ c_2) = Pr(d_1 ≤ Z ≤ d_2) = Φ(d_2) − Φ(d_1).    (2.40)

The normal cumulative distribution function Φ is tabulated in Appendix Table 1.

For example, suppose Y is distributed N(1, 4); that is, Y is normally distributed with a mean of 1 and a variance of 4. What is the probability that Y ≤ 2, that is, what is the shaded area in Figure 2.6a? The standardized version of Y is Y minus its mean, divided by its standard deviation, that is, (Y − 1)/√4 = ½(Y − 1). Accordingly, the random variable ½(Y − 1) is normally distributed with mean zero and variance one (see Exercise 2.8); it has the standard normal distribution shown in Figure 2.6b. Now Y ≤ 2 is equivalent to ½(Y − 1) ≤ ½(2 − 1), that is, ½(Y − 1) ≤ ½. Thus,

    Pr(Y ≤ 2) = Pr[½(Y − 1) ≤ ½] = Pr(Z ≤ ½) = Φ(0.5) = 0.691,    (2.41)

where the value 0.691 is taken from Appendix Table 1.

[Figure 2.6: Calculating the Probability that Y ≤ 2 When Y Is Distributed N(1, 4). To calculate Pr(Y ≤ 2), standardize Y, then use the standard normal distribution table. Y is standardized by subtracting its mean (μ = 1) and dividing by its standard deviation (σ = 2). The probability that Y ≤ 2 is shown in panel (a) for the N(1, 4) distribution, and the corresponding probability for the standardized variable is shown in panel (b) for the N(0, 1) distribution. Because the standardized random variable, (Y − 1)/2, is a standard normal (Z) random variable, Pr(Y ≤ 2) = Pr[(Y − 1)/2 ≤ (2 − 1)/2] = Pr(Z ≤ 0.5). From Appendix Table 1, Pr(Z ≤ 0.5) = Φ(0.5) = 0.691.]

The same approach can be applied to compute the probability that a normally distributed random variable exceeds some value or that it falls in a certain range. These steps are summarized in Key Concept 2.4. The box "A Bad Day on Wall Street" presents an unusual application of the cumulative normal distribution.

The normal distribution is symmetric, so its skewness is zero. The kurtosis of the normal distribution is 3.

The multivariate normal distribution. The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case, the distribution is called the multivariate normal distribution, or, if only two variables are being considered, the bivariate normal distribution. The formula for the bivariate normal p.d.f. is given in Appendix 17.1, and the formula for the general multivariate normal p.d.f. is given in Appendix 18.1.

The multivariate normal distribution has three important properties. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then aX + bY has the normal distribution

    aX + bY is distributed N(aμ_X + bμ_Y, a²σ_X² + b²σ_Y² + 2abσ_XY) (X, Y bivariate normal).    (2.42)

More generally, if n random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.
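The calculation in Equation (2.41) can be reproduced with the standard library's normal distribution. This is an illustrative sketch; statistics.NormalDist is available in Python 3.8 and later.

```python
from statistics import NormalDist

y = NormalDist(mu=1, sigma=2)   # Y ~ N(1, 4): a variance of 4 means sigma = 2
z = NormalDist(mu=0, sigma=1)   # the standard normal distribution

p_direct = y.cdf(2)                   # Pr(Y <= 2), computed directly
p_standardized = z.cdf((2 - 1) / 2)   # Pr(Z <= 0.5), after standardizing
print(round(p_direct, 3), round(p_standardized, 3))  # 0.691 0.691
```

The two numbers agree because standardizing is exactly the change of variables in Key Concept 2.4: probabilities for N(1, 4) are probabilities for N(0, 1) evaluated at the standardized cutoff.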
A BAD DAY ON WALL STREET

On a typical day the overall value of stocks traded on the U.S. stock market can rise or fall by 1% or even more. This is a lot, but nothing compared to what happened on Monday, October 19, 1987. On "Black Monday," the Dow Jones Industrial Average (an average of 30 large industrial stocks) fell by 25.6%! From January 1, 1980, to October 16, 1987, the standard deviation of daily percentage price changes on the Dow was 1.16%, so the drop of 25.6% was a negative return of 22 (= 25.6/1.16) standard deviations. The enormity of this drop can be seen in Figure 2.7, a plot of the daily returns on the Dow during the 1980s.

If daily percentage price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is Pr(Z ≤ −22) = Φ(−22). You will not find this value in Appendix Table 1, but you can calculate it using a computer (try it!). This probability is 1.4 × 10⁻¹⁰⁷, that is, 0.000...00014, where there are a total of 106 zeros!

[Figure 2.7: Daily Percentage Changes in the Dow Jones Industrial Average in the 1980s. During the 1980s, the average percentage daily change of "the Dow" index was 0.05% and its standard deviation was 1.16%. On October 19, 1987, "Black Monday," the index fell 25.6%, or more than 22 standard deviations.]
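The tail value quoted in the box can indeed be calculated with a computer. A sketch using only the standard library: the complementary error function erfc is used because computing the tail as 1 − Φ(22) would round the tiny remainder away in double precision, while Φ(−c) = erfc(c/√2)/2 keeps it.

```python
from math import erfc, sqrt

# Black Monday: a 25.6% drop when the daily standard deviation was 1.16%.
drop_in_sds = 25.6 / 1.16
print(round(drop_in_sds, 1))   # about 22 standard deviations

# Pr(Z <= -22) = Phi(-22) = erfc(22 / sqrt(2)) / 2.
p_tail = erfc(22 / sqrt(2)) / 2
print(p_tail)   # roughly 1.4e-107, the figure quoted in the box
```

The number is representable as a double (the smallest positive double is around 10⁻³⁰⁸), so the only numerical trap is the subtraction from 1, which erfc avoids.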
How small is 1.4 × 10⁻¹⁰⁷? Consider the following:

• The world population is about 6 billion, so the probability of winning a random lottery among all living people is about one in 6 billion, or 2 × 10⁻¹⁰.
• The universe is believed to have existed for 15 billion years, or about 5 × 10¹⁷ seconds, so the probability of choosing a particular second at random from all the seconds since the beginning of time is 2 × 10⁻¹⁸.
• There are approximately 10⁴³ molecules of gas in the first kilometer above the earth's surface. The probability of choosing one at random is 10⁻⁴³.

Although Wall Street did have a bad day, the fact that it happened at all suggests that its probability was more than 1.4 × 10⁻¹⁰⁷. In fact, there are more days with large positive or large negative percentage price changes than the normal distribution would suggest; that is, stock price percentage changes have a distribution with heavier tails than the normal distribution. For this reason, finance professionals use econometric models in which the variance of the percentage change in stock prices can evolve over time, so some periods have higher volatility than others. These models with changing variances are more consistent with the very bad, and very good, days we actually see on Wall Street.

Second, if a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal [this follows from Equation (2.42) by setting a = 1 and b = 0].

Third, if variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Thus, if X and Y have a bivariate normal distribution and σ_XY = 0, then X and Y are independent. In Section 2.3 it was stated that if X and Y are independent, then, regardless of their joint distribution, σ_XY = 0. If X and Y are jointly normally distributed, then the converse is also true. This result, that zero covariance implies independence, is a special property of the multivariate normal distribution that is not true in general.

The Chi-Squared Distribution

The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics. The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. This distribution depends on m, which is called the degrees of freedom of the chi-squared distribution. For example, let Z_1, Z_2, and Z_3 be independent standard normal random variables. Then Z_1² + Z_2² + Z_3² has a chi-squared distribution with 3 degrees of freedom.
The name for this distribution derives from the Greek letter used to denote it: a chi-squared distribution with m degrees of freedom is denoted χ²_m. Selected percentiles of the χ²_m distribution are given in Appendix Table 3. For example, Appendix Table 3 shows that the 95th percentile of the χ²_3 distribution is 7.81, so Pr(Z_1² + Z_2² + Z_3² ≤ 7.81) = 0.95.

The Student t Distribution

The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. That is, let Z be a standard normal random variable, let W be a random variable with a chi-squared distribution with m degrees of freedom, and let Z and W be independently distributed. Then the random variable Z/√(W/m) has a Student t distribution (also called the t distribution) with m degrees of freedom. This distribution is denoted t_m. Selected percentiles of the Student t distribution are given in Appendix Table 2.

The Student t distribution depends on the degrees of freedom m; thus the 95th percentile of the t_m distribution depends on m. The Student t distribution has a bell shape similar to that of the normal distribution, but when m is small (20 or less) it has more mass in the tails; that is, it is a "fatter" bell shape than the normal. When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution, and the t_∞ distribution equals the standard normal distribution.

The F Distribution

The F distribution with m and n degrees of freedom, denoted F_{m,n}, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom m, divided by m, to an independently distributed chi-squared random variable with degrees of freedom n, divided by n. To state this mathematically, let W be a chi-squared random variable with m degrees of freedom and let V be a chi-squared random variable with n degrees of freedom, where W and V are independently distributed. Then (W/m)/(V/n) has an F_{m,n} distribution, that is, an F distribution with numerator degrees of freedom m and denominator degrees of freedom n.

In statistics and econometrics, an important special case of the F distribution arises when the denominator degrees of freedom is large enough that the F_{m,n} distribution can be approximated by the F_{m,∞} distribution. In this limiting case, the denominator random variable V/n is the mean of infinitely many squared standard normal random variables, and that mean is 1 (see Exercise 2.24). Thus the F_{m,∞} distribution is the distribution of a chi-squared random variable with m degrees of freedom, divided by m: W/m is distributed F_{m,∞}.
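The 95th percentile of 7.81 quoted above for the χ²_3 distribution can be checked by simulation, building chi-squared draws from squared standard normals exactly as in the definition. This is a Monte Carlo sketch, so the estimate is only accurate to roughly two decimal places.

```python
import random

random.seed(0)
n_draws = 200_000
# Each draw is Z1^2 + Z2^2 + Z3^2 for independent standard normals, which
# by definition has a chi-squared distribution with 3 degrees of freedom.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3))
         for _ in range(n_draws)]
draws.sort()
q95 = draws[int(0.95 * n_draws)]
print(round(q95, 2))  # close to the tabulated value 7.81
```

The same construction extends directly to the other distributions in this section: dividing a fresh standard normal by √(draw/3) simulates t_3, and the ratio of two independent scaled chi-squared draws simulates an F distribution.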
For example, from Appendix Table 4, the 95th percentile of the F_{3,∞} distribution is 2.60, which is the same as the 95th percentile of the χ²_3 distribution, 7.81, divided by the degrees of freedom, which is 3 (7.81/3 = 2.60).

The 90th, 95th, and 99th percentiles of the F_{m,n} distribution are given in Appendix Table 5 for selected values of m and n. For example, the 95th percentile of the F_{3,30} distribution is 2.92, and the 95th percentile of the F_{3,90} distribution is 2.71. As the denominator degrees of freedom n increases, the 95th percentile of the F_{3,n} distribution tends to the F_{3,∞} limit of 2.60.

2.5 Random Sampling and the Distribution of the Sample Average

Almost all the statistical and econometric procedures used in this book involve averages or weighted averages of a sample of data. Characterizing the distributions of sample averages therefore is an essential step toward understanding the performance of econometric procedures.

This section introduces some basic concepts about random sampling and the distributions of averages that are used throughout the book. We begin by discussing random sampling. The act of random sampling, that is, randomly drawing a sample from a larger population, has the effect of making the sample average itself a random variable. Because the sample average is a random variable, it has a probability distribution, which is called its sampling distribution. This section concludes with some properties of the sampling distribution of the sample average.

Random Sampling

Simple random sampling. Suppose our commuting student from Section 2.1 aspires to be a statistician and decides to record her commuting times on various days. She selects these days at random from the school year, and her daily commuting time has the cumulative distribution function in Figure 2.2a. Because these days were selected at random, knowing the value of the commuting time on one of these randomly selected days provides no information about the commuting time on another of the days; that is, because the days were selected at random, the values of the commuting time on each of the different days are independently distributed random variables.
The situation described in the previous paragraph is an example of the simplest sampling scheme used in statistics, called simple random sampling, in which n objects are selected at random from a population (the population of commuting days) and each member of the population (each day) is equally likely to be included in the sample.

The n observations in the sample are denoted Y_1, ..., Y_n, where Y_1 is the first observation, Y_2 is the second observation, and so forth. In the commuting example, Y_1 is the commuting time on the first of her n randomly selected days and Y_i is the commuting time on the i-th of her randomly selected days.

Because the members of the population included in the sample are selected at random, the values of the observations Y_1, ..., Y_n are themselves random. If different members of the population are chosen, their values of Y will differ. Thus the act of random sampling means that Y_1, ..., Y_n can be treated as random variables. Before they are sampled, Y_1, ..., Y_n can take on many possible values; after they are sampled, a specific value is recorded for each observation.

i.i.d. draws. Because Y_1, ..., Y_n are randomly drawn from the same population, the marginal distribution of Y_i is the same for each i = 1, ..., n; this marginal distribution is the distribution of Y in the population being sampled. When Y_i has the same marginal distribution for i = 1, ..., n, then Y_1, ..., Y_n are said to be identically distributed.

Under simple random sampling, knowing the value of Y_1 provides no information about Y_2, so the conditional distribution of Y_2 given Y_1 is the same as the marginal distribution of Y_2. In other words, under simple random sampling, Y_1 is distributed independently of Y_2, ..., Y_n. When Y_1, ..., Y_n are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d.

Simple random sampling and i.i.d. draws are summarized in Key Concept 2.5.
KEY CONCEPT 2.5
SIMPLE RANDOM SAMPLING AND I.I.D. RANDOM VARIABLES

In a simple random sample, n objects are drawn at random from a population and each object is equally likely to be drawn. The value of the random variable Y for the i-th randomly drawn object is denoted Y_i. Because each object is equally likely to be drawn and the distribution of Y_i is the same for all i, the random variables Y_1, ..., Y_n are independently and identically distributed (i.i.d.); that is, the distribution of Y_i is the same for all i = 1, ..., n, and Y_1 is distributed independently of Y_2, ..., Y_n, and so forth.

The Sampling Distribution of the Sample Average

The sample average, Ȳ, of the n observations Y_1, ..., Y_n is

    Ȳ = (1/n)(Y_1 + Y_2 + ⋯ + Y_n) = (1/n) Σ_{i=1}^{n} Y_i.    (2.43)

An essential concept is that the act of drawing a random sample has the effect of making the sample average Ȳ a random variable. Because the sample was drawn at random, the value of each Y_i is random. Because Y_1, ..., Y_n are random, their average is random. Had a different sample been drawn, the observations and their sample average would have been different: the value of Ȳ differs from one randomly drawn sample to the next.

For example, suppose our student commuter selected five days at random to record her commute times, then computed the average of those five times. Had she chosen five different days, she would have recorded five different times, and thus would have computed a different value of the sample average.

Because Ȳ is random, it has a probability distribution. The distribution of Ȳ is called the sampling distribution of Ȳ, because it is the probability distribution associated with possible values of Ȳ that could be computed for different possible samples Y_1, ..., Y_n.

The sampling distribution of averages and weighted averages plays a central role in statistics and econometrics. We start our discussion of the sampling distribution of Ȳ by computing its mean and variance under general conditions on the population distribution of Y.

Mean and variance of Ȳ. Suppose that the observations Y_1, ..., Y_n are i.i.d., and let μ_Y and σ_Y² denote the mean and variance of Y_i (because the observations are i.i.d., the mean and variance are the same for all i = 1, ..., n). When n = 2, the mean of the sum Y_1 + Y_2 is given by applying Equation (2.28): E(Y_1 + Y_2) = μ_Y + μ_Y = 2μ_Y. Thus the mean of the sample average is E[½(Y_1 + Y_2)] = ½ × 2μ_Y = μ_Y. In general,

    E(Ȳ) = (1/n) Σ_{i=1}^{n} E(Y_i) = μ_Y.    (2.44)

The variance of Ȳ is found by applying Equation (2.37). For example, for n = 2, because Y_1 and Y_2 are independently distributed, var(Y_1 + Y_2) = 2σ_Y², so [by applying Equation (2.31) with a = b = ½ and cov(Y_1, Y_2) = 0] var(Ȳ) = ½σ_Y². For general n, because Y_1, ..., Y_n are i.i.d., Y_i and Y_j are independently distributed for i ≠ j, so cov(Y_i, Y_j) = 0. Thus,

    var(Ȳ) = var[(1/n) Σ_{i=1}^{n} Y_i] = (1/n²) Σ_{i=1}^{n} var(Y_i) + (1/n²) Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} cov(Y_i, Y_j) = σ_Y²/n.    (2.45)

The standard deviation of Ȳ is the square root of the variance, σ_Y/√n.

In summary, the mean, the variance, and the standard deviation of Ȳ are

    E(Ȳ) = μ_Y,    (2.46)
    var(Ȳ) = σ_Ȳ² = σ_Y²/n, and    (2.47)
    std.dev(Ȳ) = σ_Ȳ = σ_Y/√n.    (2.48)

These results hold whatever the distribution of Y_i is; that is, the distribution of Y_i does not need to take on a specific form, such as the normal distribution, for Equations (2.46) through (2.48) to hold.

The notation σ_Ȳ² denotes the variance of the sampling distribution of the sample average Ȳ. In contrast, σ_Y² is the variance of each individual Y_i, that is, the variance of the population distribution from which the observation is drawn. Similarly, σ_Ȳ denotes the standard deviation of the sampling distribution of Ȳ.

Sampling distribution of Ȳ when Y is normally distributed. Suppose that Y_1, ..., Y_n are i.i.d. draws from the N(μ_Y, σ_Y²) distribution. As stated following Equation (2.42), the sum of n normally distributed random variables is itself normally distributed. Because the mean of Ȳ is μ_Y and the variance of Ȳ is σ_Y²/n, this means that, if Y_1, ..., Y_n are i.i.d. draws from the N(μ_Y, σ_Y²) distribution, then Ȳ is distributed N(μ_Y, σ_Y²/n).
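The formulas in Equations (2.46) through (2.48) can be illustrated by simulation. The sketch below draws many samples of size n = 4 from a Bernoulli population (the success probability 0.78 is an arbitrary illustrative choice) and compares the simulated mean and variance of Ȳ with μ_Y and σ_Y²/n.

```python
import random

random.seed(1)
p, n, reps = 0.78, 4, 100_000   # Bernoulli(p) population, samples of size n

ybars = []
for _ in range(reps):
    sample = [1 if random.random() < p else 0 for _ in range(n)]
    ybars.append(sum(sample) / n)   # the sample average, Equation (2.43)

emp_mean = sum(ybars) / reps
emp_var = sum((yb - emp_mean) ** 2 for yb in ybars) / reps

var_y = p * (1 - p)   # population variance of a Bernoulli random variable
print(round(emp_mean, 3), p)                     # E(Y-bar) matches mu_Y
print(round(emp_var, 4), round(var_y / n, 4))    # var(Y-bar) matches sigma_Y^2 / n
```

Note that the population here is far from normal, yet the mean and variance formulas hold exactly, just as the text says: Equations (2.46) through (2.48) do not require normality.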
The sampling distribution that describes the distribution of Ȳ for any n is called the exact distribution or finite-sample distribution of Ȳ. For example, if Y is normally distributed and Y₁, …, Yₙ are i.i.d., then (as discussed in Section 2.5) the exact distribution of Ȳ is normal with mean μ_Y and variance σ_Y²/n. Unfortunately, if the distribution of Y is not normal, then in general the exact sampling distribution of Ȳ is very complicated and depends on the distribution of Y.

The "approximate" approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution, "asymptotic" because the approximations become exact in the limit that n → ∞. As we see in this section, these approximations can be very accurate even if the sample size is only n = 30 observations. Because sample sizes used in practice in econometrics typically number in the hundreds or thousands, these asymptotic distributions can be counted on to provide very good approximations to the exact sampling distribution.

This section presents the two key tools used to approximate sampling distributions when the sample size is large: the law of large numbers and the central limit theorem. The law of large numbers says that, when the sample size is large, Ȳ will be close to μ_Y with very high probability. The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, (Ȳ − μ_Y)/σ_Ȳ, is approximately normal. Although exact sampling distributions are complicated and depend on the distribution of Y, the asymptotic distributions are simple. Moreover, remarkably, the asymptotic normal distribution of (Ȳ − μ_Y)/σ_Ȳ does not depend on the distribution of Y. This normal approximate distribution provides enormous simplifications and underlies the theory of regression used throughout this book.

The Law of Large Numbers and Consistency

The law of large numbers states that, under general conditions, Ȳ will be near μ_Y with very high probability when n is large. This is sometimes called the "law of averages": when a large number of random variables with the same mean are averaged together, the large values balance the small values, and their sample average is close to their common mean.

For example, consider a simplified version of our student commuter's experiment, in which she simply records whether her commute was short (less than 20 minutes) or long. Let Yᵢ equal 1 if her commute was short on the iᵗʰ randomly selected day and equal 0 if it was long. Because she used simple random sampling, Y₁, …, Yₙ are i.i.d. draws of a Bernoulli random variable.
From Table 2.2, the probability that Yᵢ = 1 is 0.78. Because the expectation of a Bernoulli random variable is its success probability, E(Yᵢ) = μ_Y = 0.78. The sample average Ȳ is the fraction of days in her sample in which her commute was short.

Figure 2.8 shows the sampling distribution of Ȳ for various sample sizes n. When n = 2 (Figure 2.8a), Ȳ can take on only three values: 0, ½, and 1 (neither commute was short, one was short, and both were short), none of which is particularly close to the true proportion in the population, 0.78. As n increases, however (Figures 2.8b-d), Ȳ takes on more values and the sampling distribution becomes tightly centered on μ_Y.

The property that Ȳ is near μ_Y with increasing probability as n increases is called convergence in probability or, more concisely, consistency (see Key Concept 2.6). The law of large numbers states that, under certain conditions, Ȳ converges in probability to μ_Y or, equivalently, that Ȳ is consistent for μ_Y.

The conditions for the law of large numbers that we will use in this book are that Yᵢ, i = 1, …, n, are i.i.d. and that the variance of Yᵢ, σ_Y², is finite. The mathematical role of these conditions is made clear in Section 17.2, where the law of large numbers is proven. If the data are collected by simple random sampling, then the i.i.d. assumption holds. The assumption that the variance is finite says that extremely large values of Yᵢ, that is, outliers, are unlikely and observed infrequently; otherwise, these large values could dominate Ȳ and the sample average would be unreliable. This assumption is plausible for the applications in this book. For example, because there is an upper limit to our student's commuting time (she could park and walk if the traffic is dreadful), the variance of the distribution of commuting times is finite.

KEY CONCEPT 2.6
CONVERGENCE IN PROBABILITY, CONSISTENCY, AND THE LAW OF LARGE NUMBERS

The sample average Ȳ converges in probability to μ_Y (or, equivalently, Ȳ is consistent for μ_Y) if the probability that Ȳ is in the range μ_Y − c to μ_Y + c becomes arbitrarily close to one as n increases for any constant c > 0. This is written as Ȳ →ᵖ μ_Y.

The law of large numbers says that if Yᵢ, i = 1, …, n, are independently and identically distributed with E(Yᵢ) = μ_Y and if large outliers are unlikely (technically, if var(Yᵢ) = σ_Y² < ∞), then Ȳ →ᵖ μ_Y.
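Key Concept 2.6 can be illustrated numerically. The sketch below simulates the short-commute example (Bernoulli draws with p = 0.78) and estimates Pr(|Ȳ − μ_Y| ≤ c) for several sample sizes; the band half-width c = 0.05 and the number of replications are illustrative choices, not values from the text.

```python
import random

# Illustration of convergence in probability: for Bernoulli(p = 0.78)
# draws, the probability that Ybar lands within p +/- c rises toward 1
# as n grows. The constants c and reps are illustrative assumptions.
random.seed(1)
p, c, reps = 0.78, 0.05, 2_000

def prob_within_band(n):
    """Estimated Pr(|Ybar - p| <= c) for samples of size n."""
    hits = 0
    for _ in range(reps):
        ybar = sum(random.random() < p for _ in range(n)) / n
        hits += abs(ybar - p) <= c
    return hits / reps

coverage = {n: prob_within_band(n) for n in (2, 25, 100, 1_000)}
print(coverage)  # the estimated probability rises toward 1 as n grows
```

With n = 2 the average can only take the values 0, ½, and 1, so it never lands in the band 0.73 to 0.83; by n = 1,000 it almost always does, which is exactly the convergence the law of large numbers describes.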
FIGURE 2.8 Sampling Distribution of the Sample Average of n Bernoulli Random Variables

[Four panels: (a) n = 2, (b) n = 5, (c) n = 25, (d) n = 100; each plots probability against the value of the sample average.] The distributions are the sampling distributions of Ȳ, the sample average of n independent Bernoulli random variables with p = Pr(Yᵢ = 1) = 0.78 (the probability of a short commute is 78%). The variance of the sampling distribution of Ȳ decreases as n gets larger, so the sampling distribution becomes more tightly concentrated around its mean μ = 0.78 as the sample size n increases.
The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of Ȳ is well approximated by a normal distribution when n is large. Recall that the mean of Ȳ is μ_Y and its variance is σ_Ȳ² = σ_Y²/n. According to the central limit theorem, when n is large, the distribution of Ȳ is approximately N(μ_Y, σ_Ȳ²). As discussed at the end of Section 2.5, when the sample is drawn from a population with the normal distribution N(μ_Y, σ_Y²), the distribution of Ȳ is exactly N(μ_Y, σ_Ȳ²) for all n. The central limit theorem says that this same result is approximately true when n is large even if Y₁, …, Yₙ are not themselves normally distributed.

The convergence of the distribution of Ȳ to the bell-shaped, normal approximation can be seen (a bit) in Figure 2.8. However, because the distribution gets quite tight for large n, this requires some squinting. It would be easier to see the shape of the distribution of Ȳ if you used a magnifying glass or had some other way to zoom in or to expand the horizontal axis of the figure. One way to do this is to standardize Ȳ by subtracting its mean and dividing by its standard deviation so that it has a mean of 0 and a variance of 1. This leads to examining the distribution of the standardized version of Ȳ, (Ȳ − μ_Y)/σ_Ȳ. According to the central limit theorem, this distribution should be well approximated by a N(0, 1) distribution when n is large.

The distribution of the standardized average (Ȳ − μ_Y)/σ_Ȳ is plotted in Figure 2.9 for the distributions in Figure 2.8; the distributions in Figure 2.9 are exactly the same as in Figure 2.8, except that the scale of the horizontal axis is changed so that the standardized variable has a mean of 0 and a variance of 1. After this change of scale, it is easy to see that, if n is large enough, the distribution of Ȳ is well approximated by a normal distribution.

One might ask, how large is "large enough"? That is, how large must n be for the distribution of Ȳ to be approximately normal? The answer is "it depends." The quality of the normal approximation depends on the distribution of the underlying Yᵢ that make up the average. At one extreme, if the Yᵢ are themselves normally distributed, then Ȳ is exactly normally distributed for all n. In contrast, when the underlying Yᵢ themselves have a distribution that is far from normal, then this approximation can require n = 30 or even more.

This point is illustrated in Figure 2.10 for a population distribution, shown in Figure 2.10a, that is quite different from the Bernoulli distribution. This distribution has a long right tail (it is "skewed" to the right). The sampling distribution of Ȳ, after centering and scaling, is shown in Figures 2.10b-d for n = 5, 25, and 100, respectively. Although the sampling distribution is approaching the bell shape for n = 25, the normal approximation still has noticeable imperfections.
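As a minimal sketch of how the normal approximation is used in practice, the code below approximates Pr(Ȳ ≤ y) by standardizing and evaluating the standard normal CDF, computed here from math.erf rather than Appendix Table 1. The sample size n = 100 and the threshold 0.75 are illustrative choices attached to the commuting example (p = 0.78), not values used in the figures.

```python
import math

# CLT approximation: for i.i.d. draws with mean mu and variance sigma^2,
# Pr(Ybar <= y) is approximately Phi((y - mu) / (sigma / sqrt(n))).
def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Commuting example: Y is Bernoulli with p = 0.78, so mu_Y = 0.78 and
# sigma_Y^2 = p(1 - p). The values n = 100 and y = 0.75 are assumptions.
p, n, y = 0.78, 100, 0.75
sigma = math.sqrt(p * (1 - p))
z = (y - p) / (sigma / math.sqrt(n))  # standardize the sample average
p_below = phi(z)                      # approximate Pr(Ybar <= 0.75)
print(round(p_below, 3))
```

The approximate probability is a bit below one-quarter: even though each day's commute is short with probability 0.78, the chance that fewer than 75 of 100 sampled commutes are short is not negligible.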
FIGURE 2.9 Distribution of the Standardized Sample Average of n Bernoulli Random Variables with p = 0.78

[Four panels: (a) n = 2, (b) n = 5, (c) n = 25, (d) n = 100; each plots probability against the standardized value of the sample average.] The sampling distribution of Ȳ in Figure 2.8 is plotted here after standardizing Ȳ. This centers the distributions in Figure 2.8 and magnifies the scale on the horizontal axis by a factor of √n. When the sample size is large, the sampling distributions are increasingly well approximated by the normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.
FIGURE 2.10 Distribution of the Standardized Sample Average of n Draws from a Skewed Distribution

[Four panels: (a) n = 1, (b) n = 5, (c) n = 25, (d) n = 100; each plots probability against the standardized value of the sample average.] The figures show the sampling distribution of the standardized sample average of n draws from the skewed (asymmetric) population distribution shown in Figure 2.10a. When n is small (n = 5), the sampling distribution, like the population distribution, is skewed. But when n is large (n = 100), the sampling distribution is well approximated by a standard normal distribution (solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.
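The convergence shown in Figures 2.9 and 2.10 can be checked by simulation. The sketch below uses an exponential population as an illustrative skewed distribution (it is not the distribution plotted in Figure 2.10) and compares the distribution of the standardized sample average with the standard normal CDF; the sample size and replication count are assumptions.

```python
import math
import random
import statistics

# CLT check: standardized sample averages of i.i.d. draws from a skewed
# population are approximately N(0,1) when n is large. The exponential(1)
# population (mean 1, standard deviation 1) is an illustrative choice.
random.seed(2)
n, reps = 100, 10_000
mu = sigma = 1.0
sd_ybar = sigma / math.sqrt(n)  # std. dev. of the sample average

z = [
    (statistics.fmean(random.expovariate(1.0) for _ in range(n)) - mu) / sd_ybar
    for _ in range(reps)
]

# Compare the simulated Pr(Z <= 1.0) with the normal CDF Phi(1.0).
frac_below_1 = sum(zi <= 1.0 for zi in z) / reps
phi_1 = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))  # about 0.841
print(round(frac_below_1, 2), round(phi_1, 2))
```

Despite the strong right skew of the population, by n = 100 the simulated probability is close to the normal benchmark, which is the practical content of the central limit theorem.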
The central limit theorem is a remarkable result. While the "small n" distributions of Ȳ in parts b and c of Figures 2.9 and 2.10 are complicated and quite different from each other, the "large n" distributions in Figures 2.9d and 2.10d are simple and, amazingly, have a similar shape. Because the distribution of Ȳ approaches the normal as n grows large, Ȳ is said to be asymptotically normally distributed.

How good is the large-sample normal approximation? By n = 100, the approximation is quite good. In fact, for n ≥ 100 the normal approximation to the distribution of Ȳ typically is very good for a wide variety of population distributions. The convenience of the normal approximation, combined with its wide applicability because of the central limit theorem, makes it a key underpinning of modern applied econometrics. The central limit theorem is summarized in Key Concept 2.7.

KEY CONCEPT 2.7
THE CENTRAL LIMIT THEOREM

Suppose that Y₁, …, Yₙ are i.i.d. with E(Yᵢ) = μ_Y and var(Yᵢ) = σ_Y², where 0 < σ_Y² < ∞. As n → ∞, the distribution of (Ȳ − μ_Y)/σ_Ȳ (where σ_Ȳ² = σ_Y²/n) becomes arbitrarily well approximated by the standard normal distribution.

Summary

1. The probabilities with which a random variable takes on different values are summarized by the cumulative distribution function, the probability distribution function (for discrete random variables), and the probability density function (for continuous random variables).
2. The expected value of a random variable Y (also called its mean, μ_Y), denoted E(Y), is its probability-weighted average value. The variance of Y is σ_Y² = E[(Y − μ_Y)²], and the standard deviation of Y is the square root of its variance.
3. The joint probabilities for two random variables X and Y are summarized by their joint probability distribution. The conditional probability distribution of Y given X = x is the probability distribution of Y, conditional on X taking on the value x.
4. A normally distributed random variable has the bell-shaped probability density in Figure 2.5. To calculate a probability associated with a normal random variable, first standardize the variable, then use the standard normal cumulative distribution tabulated in Appendix Table 1.
5. Simple random sampling produces n random observations Y₁, …, Yₙ that are independently and identically distributed (i.i.d.).
6. The sample average, Ȳ, varies from one randomly chosen sample to the next and thus is a random variable with a sampling distribution. If Y₁, …, Yₙ are i.i.d., then:
a. the sampling distribution of Ȳ has mean μ_Y and variance σ_Ȳ² = σ_Y²/n;
b. the law of large numbers says that Ȳ converges in probability to μ_Y; and
c. the central limit theorem says that the standardized version of Ȳ, (Ȳ − μ_Y)/σ_Ȳ, has a standard normal distribution [N(0, 1) distribution] when n is large.

Key Terms

outcomes (18); probability (18); sample space (19); event (19); discrete random variable (19); continuous random variable (19); probability distribution (19); cumulative probability distribution (19); cumulative distribution function (c.d.f.) (20); Bernoulli random variable (20); Bernoulli distribution (20); probability density function (p.d.f.) (21); density function (21); density (21); expected value (23); expectation (23); mean (23); variance (24); standard deviation (24); moments of a distribution (27); skewness (27); kurtosis (27); outlier (27); leptokurtic (28); joint probability distribution (29); marginal probability distribution (30); conditional distribution (30); conditional expectation (32); conditional mean (32); law of iterated expectations (32); conditional variance (33); independence (34); covariance (34); correlation (35); uncorrelated (35); normal distribution (39); standard normal distribution (39); standardize a variable (39); multivariate normal distribution (41); bivariate normal distribution (41); chi-squared distribution (43); Student t distribution (44); F distribution (44); simple random sampling (46); population (46); identically distributed (46); independently and identically distributed (i.i.d.) (46); sampling distribution (47); exact (finite-sample) distribution (49); asymptotic distribution (49); law of large numbers (49); convergence in probability (50); consistency (50); central limit theorem (52); asymptotic normal distribution (55)

Review the Concepts

2.1 Examples of random variables used in this chapter included (a) the gender of the next person you meet, (b) the number of times a computer crashes, (c) the time it takes to commute to school, (d) whether the computer you are assigned in the library is new or old, and (e) whether it is raining or not. Explain why each can be thought of as random.

2.2 Suppose that the random variables X and Y are independent and you know their distributions. Explain why knowing the value of X tells you nothing about the value of Y.

2.3 Suppose that X denotes the amount of rainfall in your hometown during a given month and Y denotes the number of children born in Los Angeles during the same month. Are X and Y independent? Explain.

2.4 An econometrics class has 80 students, and the mean student weight is 145 lbs. A random sample of 4 students is selected from the class, and their average weight is calculated. Will the average weight of the students in the sample equal 145 lbs.? Why or why not? Use this example to explain why the sample average, Ȳ, is a random variable.

2.5 Suppose that Y₁, …, Yₙ are i.i.d. random variables with a N(1, 4) distribution. Sketch the probability density of Ȳ when n = 2. Repeat this for n = 10 and n = 100. In words, describe how the densities differ. What is the relationship between your answer and the law of large numbers?

2.6 Suppose that Y₁, …, Yₙ are i.i.d. random variables with the probability distribution given in Figure 2.10a. You want to calculate Pr(Ȳ ≤ 0.1). Would it be reasonable to use the normal approximation if n = 5? What about n = 25 or n = 100? Explain.

2.7 Y is a random variable with μ_Y = 0, σ_Y = 1, skewness = 0, and kurtosis = 100. Sketch a hypothetical probability distribution of Y. Explain why n random variables drawn from this distribution might have some large outliers.
Exercises

2.1 Let Y denote the number of "heads" that occur when two coins are tossed.
a. Derive the probability distribution of Y.
b. Derive the cumulative probability distribution of Y.
c. Derive the mean and variance of Y.

2.2 Use the probability distribution given in Table 2.2 to compute (a) E(Y) and E(X); (b) σ_X² and σ_Y²; and (c) σ_XY and corr(X, Y).

2.3 Using the random variables X and Y from Table 2.2, consider two new random variables W = 3 + 6X and V = 20 − 7Y. Compute (a) E(W) and E(V); (b) σ_W² and σ_V²; and (c) σ_WV and corr(W, V).

2.4 Suppose X is a Bernoulli random variable with P(X = 1) = p.
a. Show E(X³) = p.
b. Show E(X^k) = p for k > 0.
c. Suppose that p = 0.3. Compute the mean, variance, skewness, and kurtosis of X. (Hint: You might find it helpful to use the formulas given in Exercise 2.21.)

2.5 In September, Seattle's daily high temperature has a mean of 70°F and a standard deviation of 7°F. What are the mean, standard deviation, and variance in °C?

2.6 The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population, based on the 1990 U.S. Census.

Joint Distribution of Employment Status and College Graduation in the U.S. Population Aged 25-64, 1990

                            Unemployed (Y = 0)   Employed (Y = 1)   Total
Non-college grads (X = 0)         0.045               0.709         0.754
College grads (X = 1)             0.005               0.241         0.246
Total                             0.050               0.950         1.000

a. Compute E(Y).
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1 − E(Y).
c. Calculate E(Y | X = 1) and E(Y | X = 0).
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Explain.

2.7 In a given population of two-earner male/female couples, male earnings have a mean of $40,000 per year and a standard deviation of $12,000. Female earnings have a mean of $45,000 per year and a standard deviation of $18,000. The correlation between male and female earnings for a couple is 0.80. Let C denote the combined earnings for a randomly selected couple.
a. What is the mean of C?
b. What is the covariance between male and female earnings?
c. What is the standard deviation of C?
d. Convert the answers to (a) through (c) from $ (dollars) to € (euros).

2.8 The random variable Y has a mean of 1 and a variance of 4. Let Z = ½(Y − 1). Show that μ_Z = 0 and σ_Z² = 1.

2.9 X and Y are discrete random variables with the following joint distribution:

                         Value of Y
                   14     22     30     40     65
Value    1        0.02   0.05   0.10   0.03   0.01
of X     5        0.17   0.15   0.05   0.02   0.01
         8        0.02   0.03   0.15   0.10   0.09

That is, Pr(X = 1, Y = 14) = 0.02, and so forth.
a. Calculate the probability distribution, mean, and variance of Y.
b. Calculate the probability distribution, mean, and variance of Y given X = 8.
c. Calculate the covariance and correlation between X and Y.

2.10 Compute the following probabilities:
a. If Y is distributed N(1, 4), find Pr(Y ≤ 3).
b. If Y is distributed N(3, 9), find Pr(Y > 0).
c. If Y is distributed N(50, 25), find Pr(40 ≤ Y ≤ 52).
d. If Y is distributed N(5, 2), find Pr(6 ≤ Y ≤ 8).

2.11 Compute the following probabilities:
a. If Y is distributed χ²₄, find Pr(Y ≤ 7.78).
b. If Y is distributed χ²₁₀, find Pr(Y > 18.31).
c. If Y is distributed F₁₀,∞, find Pr(Y > 1.83).
d. Why are the answers to (b) and (c) the same?
e. If Y is distributed χ²₁, find Pr(Y ≤ 1.0). (Hint: Use the definition of the χ²₁ distribution.)

2.12 Compute the following probabilities:
a. If Y is distributed t₁₅, find Pr(Y > 1.75).
b. If Y is distributed t₉₀, find Pr(−1.99 ≤ Y ≤ 1.99).
c. If Y is distributed N(0, 1), find Pr(−1.99 ≤ Y ≤ 1.99).
d. Why are the answers to (b) and (c) approximately the same?
e. If Y is distributed F₇,₄, find Pr(Y > 4.12).
f. If Y is distributed F₇,₁₂₀, find Pr(Y > 2.79).

2.13 X is a Bernoulli random variable with Pr(X = 1) = 0.99, Y is distributed N(0, 1), W is distributed N(0, 100), and X, Y, and W are independent. Let S = XY + (1 − X)W. (That is, S = Y when X = 1, and S = W when X = 0.)
a. Show that E(Y²) = 1 and E(W²) = 100.
b. Show that E(Y³) = 0 and E(W³) = 0. (Hint: What is the skewness for a symmetric distribution?)
c. Show that E(Y⁴) = 3 and E(W⁴) = 3 × 100². (Hint: Use the fact that the kurtosis is 3 for a normal distribution.)
d. Derive E(S), E(S²), E(S³), and E(S⁴). (Hint: Use the law of iterated expectations, conditioning on X = 0 and X = 1.)
e. Derive the skewness and kurtosis for S.

2.14 In a population, μ_Y = 100 and σ_Y² = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Ȳ ≤ 101).
b. In a random sample of size n = 165, find Pr(Ȳ > 98).
c. In a random sample of size n = 64, find Pr(101 ≤ Ȳ ≤ 103).

2.15 Suppose Yᵢ, i = 1, …, n, are i.i.d. random variables, each distributed N(10, 4).
a. Compute Pr(9.6 ≤ Ȳ ≤ 10.4) when (i) n = 20, (ii) n = 100, and (iii) n = 1,000.
b. Suppose c is a positive number. Show that Pr(10 − c ≤ Ȳ ≤ 10 + c) becomes close to 1.0 as n grows large.
c. Use your answer in (b) to argue that Ȳ converges in probability to 10.

2.16 Y is distributed N(5, 100) and you want to calculate Pr(Y < 3.6). Unfortunately, you do not have your textbook and do not have access to a normal probability table like Appendix Table 1. However, you do have your computer and a computer program that can generate i.i.d. draws from the N(5, 100) distribution. Explain how you can use your computer to compute an accurate approximation for Pr(Y < 3.6).

2.17 Yᵢ, i = 1, …, n, are i.i.d. Bernoulli random variables with p = 0.4. Let Ȳ denote the sample mean.
a. Use the central limit theorem to compute approximations for
(i) Pr(Ȳ ≥ 0.43) when n = 100, and
(ii) Pr(Ȳ ≤ 0.37) when n = 400.
b. How large would n need to be to ensure that Pr(0.39 ≤ Ȳ ≤ 0.41) ≥ 0.95? (Use the central limit theorem to compute an approximate answer.)

2.18 In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let Y denote the dollar value of damage in any given year. Suppose that in 95% of the years Y = $0, but in 5% of the years Y = $20,000.
a. What are the mean and standard deviation of the damage in any year?
b. Consider an "insurance pool" of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let Ȳ denote the average damage to these 100 homes in a year. (i) What is the expected value of the average damage Ȳ? (ii) What is the probability that Ȳ exceeds $2,000?
2.19 Consider two random variables X and Y. Suppose that Y takes on k values y₁, …, y_k and that X takes on l values x₁, …, x_l.
a. Show that Pr(Y = y_j) = Σᵢ₌₁ˡ Pr(Y = y_j | X = xᵢ) Pr(X = xᵢ). [Hint: Use the definition of Pr(Y = y_j | X = xᵢ).]
b. Use your answer to (a) to verify Equation (2.19).
c. Explain how the marginal probability that Y = y can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]

2.20 Consider three random variables X, Y, and Z. Suppose that Y takes on k values y₁, …, y_k, that X takes on l values x₁, …, x_l, and that Z takes on m values z₁, …, z_m. The joint probability distribution of X, Y, and Z is Pr(X = x, Y = y, Z = z), and the conditional probability distribution of Y given X and Z is Pr(Y = y | X = x, Z = z) = Pr(Y = y, X = x, Z = z)/Pr(X = x, Z = z).
a. Explain how the marginal probability that Y = y can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]
b. Show that E(Y) = E[E(Y | X, Z)]. [Hint: This is a generalization of Equations (2.19) and (2.20).]

2.21 X is a random variable with moments E(X), E(X²), E(X³), and so forth.
a. Show E(X − μ)³ = E(X³) − 3[E(X²)][E(X)] + 2[E(X)]³.
b. Show E(X − μ)⁴ = E(X⁴) − 4[E(X)][E(X³)] + 6[E(X)]²[E(X²)] − 3[E(X)]⁴.

2.22 Suppose you have some money to invest, for simplicity $1, and you are planning to put a fraction w into a stock market mutual fund and the rest, 1 − w, into a bond mutual fund. Suppose that $1 invested in a stock fund yields R_s after one year and that $1 invested in a bond fund yields R_b. Suppose that R_s is random with mean 0.08 (8%) and standard deviation 0.07, and that R_b is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between R_s and R_b is 0.25. If you place a fraction w of your money in the stock fund and the rest, 1 − w, in the bond fund, then the return on your investment is R = wR_s + (1 − w)R_b.
a. Suppose that w = 0.5. Compute the mean and standard deviation of R.
b. Suppose that w = 0.75. Compute the mean and standard deviation of R.
c. What value of w makes the mean of R as large as possible? What is the standard deviation of R for this value of w?
d. (Harder) What is the value of w that minimizes the standard deviation of R? (You can show this using a graph, algebra, or calculus.)

2.23 This exercise provides an example of a pair of random variables X and Y for which the conditional mean of Y given X depends on X but corr(X, Y) = 0. Let X and Z be two independently distributed standard normal random variables, and let Y = X² + Z.
a. Show that E(Y | X) = X².
b. Show that μ_Y = 1.
c. Show that E(XY) = 0. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)
d. Show that cov(X, Y) = 0 and thus corr(X, Y) = 0.

2.24 Suppose Yᵢ is distributed i.i.d. N(0, σ²) for i = 1, 2, …, n.
a. Show that E(Yᵢ²/σ²) = 1.
b. Show that W = (1/σ²)Σᵢ₌₁ⁿ Yᵢ² is distributed χ²ₙ.
c. Show that E(W) = n. [Hint: Use your answer to (a).]
d. Show that V = Y₁/√[Σᵢ₌₂ⁿ Yᵢ²/(n − 1)] is distributed t_{n−1}.

APPENDIX 2.1

Derivation of Results in Key Concept 2.3

This appendix derives the equations in Key Concept 2.3. Equation (2.29) follows from the definition of the expectation.

To derive Equation (2.30), use the definition of the variance to write var(a + bY) = E{[a + bY − E(a + bY)]²} = E{[b(Y − μ_Y)]²} = b²E[(Y − μ_Y)²] = b²σ_Y².

To derive Equation (2.31), use the definition of the variance to write

var(aX + bY) = E{[(aX + bY) − (aμ_X + bμ_Y)]²}
= E{[a(X − μ_X) + b(Y − μ_Y)]²}
= E[a²(X − μ_X)²] + 2E[ab(X − μ_X)(Y − μ_Y)] + E[b²(Y − μ_Y)²]
= a²var(X) + 2ab·cov(X, Y) + b²var(Y)
= a²σ_X² + 2abσ_XY + b²σ_Y².   (2.49)
In Equation (2.49), the second equality follows by collecting terms, the third equality follows by expanding the quadratic, and the fourth equality follows by the definition of the variance and covariance.

To derive Equation (2.32), write E(Y²) = E{[(Y − μ_Y) + μ_Y]²} = E[(Y − μ_Y)²] + 2μ_Y·E(Y − μ_Y) + μ_Y² = σ_Y² + μ_Y², because E(Y − μ_Y) = 0.

To derive Equation (2.33), use the definition of the covariance to write

cov(a + bX + cV, Y) = E{[a + bX + cV − E(a + bX + cV)][Y − μ_Y]}
= E{[b(X − μ_X) + c(V − μ_V)][Y − μ_Y]}
= E{b(X − μ_X)(Y − μ_Y)} + E{c(V − μ_V)(Y − μ_Y)}
= bσ_XY + cσ_VY,   (2.50)

which is Equation (2.33).

To derive Equation (2.34), write E(XY) = E{[(X − μ_X) + μ_X][(Y − μ_Y) + μ_Y]} = E[(X − μ_X)(Y − μ_Y)] + μ_X·E(Y − μ_Y) + μ_Y·E(X − μ_X) + μ_Xμ_Y = σ_XY + μ_Xμ_Y.

We now prove the correlation inequality in Equation (2.35), that is, |corr(X, Y)| ≤ 1. Let a = −σ_XY/σ_X² and b = 1. Applying Equation (2.31), we have that

var(aX + Y) = a²σ_X² + σ_Y² + 2aσ_XY
= (−σ_XY/σ_X²)²σ_X² + σ_Y² + 2(−σ_XY/σ_X²)σ_XY
= σ_Y² − σ_XY²/σ_X².   (2.51)

Because var(aX + Y) is a variance, it cannot be negative, so from the final line of Equation (2.51) it must be that σ_Y² − σ_XY²/σ_X² ≥ 0. Rearranging this inequality yields

σ_XY² ≤ σ_X²σ_Y²   (covariance inequality).   (2.52)

The covariance inequality implies that σ_XY²/(σ_X²σ_Y²) ≤ 1 or, equivalently, |σ_XY/(σ_Xσ_Y)| ≤ 1, which (using the definition of the correlation) proves the correlation inequality, |corr(X, Y)| ≤ 1.
we can u~ this sample to reach tentat ive conclusio ns.ing thc da ta ta kcs te n yea rs. Census cost $ 10 billion.CHAP TER 3 Review of Statistics S IlIl is lic~ is the scie nce o f using d <lta 10 lea rn a bou l Ih e world around us.Io draw »tati!>tical inferences. lhe population d istribution of e arninw>.sur vey. differ for men and \\omen and. population is the dece nnial 20lX) U. howc\"cr.:. managing and cond uc ting the surve ys.S. such a co mprehe nsive !>urVl"~ would he extremely eXfXn~iw. Thus a di(fere nt.about characteristics of the full popululion.S. For examp1t:." of distributions in populations of interesl. hi!> cx trl.! to the distribulion of earn ings in the population o f worker!>. more practica l approllc h is needed. whill is the mean of the distribution of earnings of reCent college graduale~'! Do mean earning. and the process of design ing the ccn»us (orms. we might . by how mucb? 1l1t:~e qu eslion ~ rclatl. SuujSlicallools help 10 answer ques lions about unknow n c ha raclcrislic. 1000 members o f the 'Pula tio n. TIle key insight of st atistics is that o ne elm learn about a population d i~lribulio n by st:1ecting a random sa mple frO\lllhat population.5 . The o nly comprehensive censu~ l llC survey of the U. Des pi" .lordinary commitme nt. llIa ny me mbef100 of the po pulation slip through the crac ks a nd arc no t surveyed. Rather tha n . population. Usi ng statistical methods. and compiling a nd ana ly:t. .survey the e ntire U...S. selecte d at random hy simp le random sam pling.. say. O ne way to answer these questions wou ld be 1 0 perfo rm an exhausti ve slIrw y of the population of work ers.In practice. measuring the earni ngs of each worker and th us fi nding. if so.
Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals. Estimation entails computing a "best guess" numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data. Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true. Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic. Sections 3.1, 3.2, and 3.3 review estimation, hypothesis testing, and confidence intervals in the context of statistical inference about an unknown population mean.

Most of the interesting questions in economics involve relationships between two or more variables or comparisons between different populations. For example, is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1 through 3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2 through 3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large. In some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution; these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.

3.1 Estimation of the Population Mean

Suppose you want to know the mean value of Y (that is, μ_Y) in a population, such as the mean earnings of women recently graduated from college. A natural way to estimate this mean is to compute the sample average Ȳ from a sample of n independently and identically distributed (i.i.d.) observations Y₁, …, Yₙ (recall that Y₁, …, Yₙ are i.i.d. if they are collected by simple random sampling). This section discusses estimation of μ_Y and the properties of Ȳ as an estimator of μ_Y.
Estimators and Their Properties

Estimators. The sample average Ȳ is a natural way to estimate μ_Y, but it is not the only way. For example, another way to estimate μ_Y is simply to use the first observation, Y₁. Both Ȳ and Y₁ are functions of the data that are designed to estimate μ_Y; using the terminology in Key Concept 3.1, both are estimators of μ_Y. When evaluated in repeated samples, Ȳ and Y₁ take on different values (they produce different estimates) from one sample to the next. Thus the estimators Ȳ and Y₁ both have sampling distributions. There are, in fact, many estimators of μ_Y, of which Ȳ and Y₁ are two examples.

KEY CONCEPT 3.1  ESTIMATORS AND ESTIMATES
An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.

There are many possible estimators, so what makes one estimator "better" than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (a lack of bias), consistency, and efficiency.

Unbiasedness. Suppose you evaluate an estimator many times over repeated randomly drawn samples. It is reasonable to hope that, on average, you would get the right answer. Thus a desirable property of an estimator is that the mean of its sampling distribution equals μ_Y; if so, the estimator is said to be unbiased. To state this mathematically, let μ̂_Y denote some estimator of μ_Y, such as Ȳ or Y₁. The estimator μ̂_Y is unbiased if E(μ̂_Y) = μ_Y, where E(μ̂_Y) is the mean of the sampling distribution of μ̂_Y; otherwise, μ̂_Y is biased.

Consistency. Another desirable property of an estimator μ̂_Y is that, when the sample size is large, the uncertainty about the value of μ_Y arising from random variations in the sample is very small. Stated more precisely, a desirable property of μ̂_Y is that the probability that it is within a small interval of the true value μ_Y approaches 1 as the sample size increases; that is, μ̂_Y is consistent for μ_Y (Key Concept 2.6).

Variance and efficiency. Suppose you have two candidate estimators, μ̂_Y and μ̃_Y, both of which are unbiased. How might you choose between them? One way to do so is to choose the estimator with the tightest sampling distribution. This suggests choosing between μ̂_Y and μ̃_Y by picking the estimator with the smallest variance. If μ̂_Y has a smaller variance than μ̃_Y, then μ̂_Y is said to be more efficient than μ̃_Y. The terminology "efficiency" stems from the notion that if μ̂_Y has a smaller variance than μ̃_Y, then it uses the information in the data more efficiently than does μ̃_Y.

Bias, consistency, and efficiency are summarized in Key Concept 3.2.

KEY CONCEPT 3.2  BIAS, CONSISTENCY, AND EFFICIENCY
Let μ̂_Y be an estimator of μ_Y. Then:
• The bias of μ̂_Y is E(μ̂_Y) − μ_Y.
• μ̂_Y is an unbiased estimator of μ_Y if E(μ̂_Y) = μ_Y.
• μ̂_Y is a consistent estimator of μ_Y if μ̂_Y converges in probability to μ_Y.
• Let μ̃_Y be another estimator of μ_Y, and suppose that both μ̂_Y and μ̃_Y are unbiased. Then μ̂_Y is said to be more efficient than μ̃_Y if var(μ̂_Y) < var(μ̃_Y).

Properties of Ȳ. How does Ȳ fare as an estimator of μ_Y when judged by the three criteria of bias, consistency, and efficiency?

Bias and consistency. The sampling distribution of Ȳ has already been examined in Sections 2.5 and 2.6. As shown in Section 2.5, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. Similarly, the law of large numbers (Key Concept 2.6) states that Ȳ converges in probability to μ_Y; that is, Ȳ is consistent.
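Unbiasedness in this repeated-sampling sense can be illustrated with a short simulation. The sketch below (the population values μ_Y = 20 and σ_Y = 5 are made-up assumptions, not from the text) draws many samples and shows that the sampling distributions of both Ȳ and Y₁ are centered on μ_Y:

```python
import random

random.seed(1)
mu_y, sigma_y = 20.0, 5.0   # assumed population mean and standard deviation
n, reps = 50, 20_000        # sample size and number of repeated samples

ybar_draws, y1_draws = [], []
for _ in range(reps):
    sample = [random.gauss(mu_y, sigma_y) for _ in range(n)]
    ybar_draws.append(sum(sample)/n)   # the sample average, Y-bar
    y1_draws.append(sample[0])         # the first observation, Y_1

mean_ybar = sum(ybar_draws)/reps
mean_y1 = sum(y1_draws)/reps

# Both sampling distributions are centered on mu_Y: both estimators are unbiased
assert abs(mean_ybar - mu_y) < 0.05
assert abs(mean_y1 - mu_y) < 0.2
```

Both estimators pass the unbiasedness check; the difference between them, taken up next, is efficiency.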
Efficiency. What can be said about the efficiency of Ȳ? Because efficiency entails a comparison of estimators, we need to specify the estimator or estimators to which Ȳ is to be compared.

We start by comparing the efficiency of Ȳ to the estimator Y₁. Because Y₁, …, Yₙ are i.i.d., the mean of the sampling distribution of Y₁ is E(Y₁) = μ_Y; thus Y₁ is an unbiased estimator of μ_Y. Its variance is var(Y₁) = σ_Y². From Section 2.5, the variance of Ȳ is σ_Y²/n. Thus, for n ≥ 2, the variance of Ȳ is less than the variance of Y₁; that is, Ȳ is a more efficient estimator than Y₁, so, according to the criterion of efficiency, Ȳ should be used instead of Y₁. The estimator Y₁ might strike you as an obviously poor estimator (why would you go to the trouble of collecting a sample of n observations only to throw away all but the first?), and the concept of efficiency provides a formal way to show that Ȳ is a more desirable estimator than Y₁.

What about a less obviously poor estimator? Consider the weighted average in which the observations are alternately weighted by 1/2 and 3/2:

Ỹ = (1/n)[(1/2)Y₁ + (3/2)Y₂ + (1/2)Y₃ + (3/2)Y₄ + … + (1/2)Yₙ₋₁ + (3/2)Yₙ],   (3.1)

where the number of observations n is assumed to be even for convenience. The mean of Ỹ is μ_Y and its variance is var(Ỹ) = 1.25σ_Y²/n (this is shown in an end-of-chapter exercise). Thus Ỹ is unbiased and, because var(Ỹ) → 0 as n → ∞, Ỹ is consistent. However, Ỹ has a larger variance than Ȳ. Thus Ȳ is more efficient than Ỹ.
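These variance claims can be checked by simulation. The sketch below (population values are assumptions for illustration) computes Monte Carlo variances of Ȳ and of the weighted estimator Ỹ from Equation (3.1) and compares them with σ_Y²/n and 1.25σ_Y²/n:

```python
import random

random.seed(2)
sigma2 = 4.0             # assumed population variance
n, reps = 100, 20_000    # n is even, as Equation (3.1) requires

def y_tilde(sample):
    # alternating weights 1/2 and 3/2 on Y_1, Y_2, ..., as in Equation (3.1)
    return sum((0.5 if i % 2 == 0 else 1.5)*y
               for i, y in enumerate(sample))/len(sample)

ybar_draws, ytilde_draws = [], []
for _ in range(reps):
    s = [random.gauss(0.0, sigma2**0.5) for _ in range(n)]
    ybar_draws.append(sum(s)/n)
    ytilde_draws.append(y_tilde(s))

def mc_var(draws):
    m = sum(draws)/len(draws)
    return sum((d - m)**2 for d in draws)/len(draws)

# var(Y-bar) = sigma^2/n, while var(Y-tilde) = 1.25*sigma^2/n
assert abs(mc_var(ybar_draws) - sigma2/n) < 0.005
assert abs(mc_var(ytilde_draws) - 1.25*sigma2/n) < 0.006
assert mc_var(ybar_draws) < mc_var(ytilde_draws)
```

The last assertion is the efficiency ranking: Ȳ has the tighter sampling distribution.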
The estimators Ȳ, Y₁, and Ỹ have a common mathematical structure: They are weighted averages of Y₁, …, Yₙ. The comparisons in the previous two paragraphs show that the weighted averages Y₁ and Ỹ have larger variances than Ȳ. In fact, these conclusions reflect a more general result: Ȳ is the most efficient estimator of all unbiased estimators that are weighted averages of Y₁, …, Yₙ. Said differently, Ȳ is the Best Linear Unbiased Estimator (BLUE); that is, it is the most efficient (best) estimator among all estimators that are unbiased and are linear functions of Y₁, …, Yₙ. This result is stated in Key Concept 3.3 and is proven in Chapter 5.

Ȳ is the least squares estimator of μ_Y. The sample average Ȳ provides the best fit to the data in the sense that the average squared differences between the observations and Ȳ are the smallest of all possible estimators.
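This least squares property is easy to verify numerically. In the sketch below (the simulated sample is an assumption for illustration), the sum of squared gaps Σ(Yᵢ − m)² is evaluated over a fine grid of candidate values m, and the minimizing value coincides with the sample average:

```python
import random

random.seed(3)
sample = [random.gauss(20.0, 5.0) for _ in range(200)]
ybar = sum(sample)/len(sample)

def sum_squared_gaps(m):
    # total squared distance between the candidate m and the sample points
    return sum((y - m)**2 for y in sample)

# Trial and error over a fine grid of candidate values of m
grid = [15.0 + 0.001*k for k in range(10_000)]   # candidates in [15, 25)
best_m = min(grid, key=sum_squared_gaps)

# The minimizer is the sample average, up to the grid spacing
assert abs(best_m - ybar) < 0.001
```

The grid search is exactly the "trial and error" approach described next; calculus gives the same answer in closed form.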
KEY CONCEPT 3.3  EFFICIENCY OF Ȳ: Ȳ IS BLUE
Let μ̂_Y be an estimator of μ_Y that is a weighted average of Y₁, …, Yₙ, that is, μ̂_Y = (1/n)∑ᵢ₌₁ⁿ aᵢYᵢ, where a₁, …, aₙ are nonrandom constants. If μ̂_Y is unbiased, then var(Ȳ) < var(μ̂_Y) unless μ̂_Y = Ȳ. Thus Ȳ is the Best Linear Unbiased Estimator (BLUE); that is, Ȳ is the most efficient estimator of μ_Y among all unbiased estimators that are weighted averages of Y₁, …, Yₙ.

Consider the problem of finding the estimator m that minimizes

∑ᵢ₌₁ⁿ (Yᵢ − m)²,   (3.2)

which is a measure of the total squared gap or distance between the estimator m and the sample points. Because m is an estimator of E(Y), you can think of it as a prediction of the value of Yᵢ, so that the gap Yᵢ − m can be thought of as a prediction mistake. The sum of squared gaps in expression (3.2) can thus be thought of as the sum of squared prediction mistakes.

The estimator m that minimizes the sum of squared gaps in expression (3.2) is called the least squares estimator. One can imagine using trial and error to solve the least squares problem: Try many values of m until you are satisfied that you have the value that makes expression (3.2) as small as possible. Alternatively, as is done in Appendix 3.2, you can use algebra or calculus to show that choosing m = Ȳ minimizes the sum of squared gaps in expression (3.2), so that Ȳ is the least squares estimator of μ_Y.

The Importance of Random Sampling

We have assumed that Y₁, …, Yₙ are i.i.d. draws, such as those that would be obtained from simple random sampling. This assumption is important because nonrandom sampling can result in Ȳ being biased. Suppose that, to estimate the monthly national unemployment rate, a statistical agency adopts a sampling scheme in which interviewers survey working-age adults sitting in city parks at 10:00 A.M. on the second Wednesday of the month. Because most employed people are at work at that hour (not sitting in the park!), the unemployed are overly represented in the sample.
An estimate of the unemployment rate based on this sampling plan would therefore be biased. This bias arises because this sampling scheme overrepresents, or oversamples, the unemployed members of the population. This example is fictitious, but the "Landon Wins!" box gives a real-world example of biases introduced by sampling that is not entirely random. It is important to design sample selection schemes in a way that minimizes bias. Appendix 3.1 includes a discussion of what the Bureau of Labor Statistics actually does when it conducts the U.S. Current Population Survey (CPS), the survey it uses to estimate the monthly U.S. unemployment rate.

LANDON WINS!
Shortly before the 1936 Presidential election, the Literary Gazette published a poll indicating that Alf M. Landon would defeat the incumbent, Franklin D. Roosevelt, by a landslide: 57% to 43%. The Gazette was right that the election was a landslide, but it was wrong about the winner: Roosevelt won, 59% to 41%!

How could the Gazette have made such a big mistake? The Gazette's sample was chosen from telephone records and automobile registration files. But in 1936 many households did not have cars or telephones, and those that did tended to be richer and were also more likely to be Republican. Because the survey did not sample randomly from the population but instead undersampled Democrats, the estimator was biased and the Gazette made an embarrassing mistake. Do you think surveys conducted over the Internet might have a similar problem with bias?

3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. Do the mean hourly earnings of recent college graduates equal $20/hour? Are mean earnings the same for male and female college graduates? Both these questions embody specific hypotheses about the population distribution of earnings. The statistical challenge is to answer these questions based on a sample of evidence. This section describes hypothesis tests concerning the population mean (Does the population mean of hourly earnings equal $20?). Hypothesis tests involving two populations (Are mean earnings the same for men and women?) are taken up in Section 3.4.
Null and Alternative Hypotheses

The starting point of statistical hypothesis testing is specifying the hypothesis to be tested, called the null hypothesis. Hypothesis testing entails using data to compare the null hypothesis to a second hypothesis, called the alternative hypothesis, that holds if the null does not.

The null hypothesis is that the population mean, E(Y), takes on a specific value, denoted μ_Y,0. The null hypothesis is denoted H₀ and thus is

H₀: E(Y) = μ_Y,0.   (3.3)

For example, the conjecture that, on average in the population, college graduates earn $20/hour constitutes a null hypothesis about the population distribution of hourly earnings. Stated mathematically, if Y is the hourly earning of a randomly selected recent college graduate, then the null hypothesis is that E(Y) = 20; that is, μ_Y,0 = 20 in Equation (3.3).

The alternative hypothesis specifies what is true if the null hypothesis is not. The most general alternative hypothesis is that E(Y) ≠ μ_Y,0, which is called a two-sided alternative hypothesis because it allows E(Y) to be either less than or greater than μ_Y,0. The two-sided alternative is written as

H₁: E(Y) ≠ μ_Y,0 (two-sided alternative).   (3.4)

One-sided alternatives are also possible, and these are discussed later in this section.

The problem facing the statistician is to use the evidence in the sample to decide between these hypotheses; for this reason, statistical hypothesis testing can be posed as either rejecting the null hypothesis or failing to do so.

The p-Value

In any given sample, the sample average Ȳ will rarely be exactly equal to the hypothesized value μ_Y,0. Differences between Ȳ and μ_Y,0 can arise because the true mean in fact does not equal μ_Y,0 (the null hypothesis is false) or because the true mean equals μ_Y,0 (the null hypothesis is true) but Ȳ differs from μ_Y,0 because of random sampling.
. ingasfallSIIC31 caslasa versetol cnu y t sisa I comute In your sam Ie. . .24 by pure nmaom samp im ' 120 (Ihe esis is true.u. it the o rem. when th Ie size is lar 'e the sam li n dlSlri Imate by a normal dist ribution ." ~ .o ) · (3.y'(~ so unde r the n ull hypothesis Y is d istributed N(IJ. the p..the pull hYpothesis in a wa y thaI accounts (or §ampUrg unserlaint)'.u. IS pva ue IS sma .3. also called the signifiamce robabilih'..5) /JrI \ . in your sa m ple of recent college g ra u ate~ c r'fI • P"f1AIfAJ: but ion und /) fJ'''''... (Tt )../ is consiste n t with the null hypothesis."1 _~~( average wage is $22.~. A s d iscussed in Section 2.'.eon 73 is im possib le 1 0 distinguish betwe en these T WO possibilities with certainly.II.~Y.6.. " k• A h mpme t e p va ue ry 0 now the sampling distribu tion of Y under th e null hypothesis.ol > IV"" . ~"The pvalue. nder the nu Yl norma l d ist ri bu tion is . is the r . o r exa mple. AHhougha sample of data canno l pro vide conclusive e vide nce about the null hypothesis. .Y al leas! as far in the tails of its distrih si~ s sam Ie avera£e ou actually computed.. when the sam Ie size is small this dlslnbutJon is com licated .s calcula lio)involves using the da ta to compute the pva lue of the n ull hypothes is. assumin the null h ' thesis is correct. ~ That is.' ue IS I c probability of drawmg. "':~_ 'sel f _A' ''~ A. H ow v r accord in 0 the .. Thepva lue is the leas t as different (rom POPU [:l tIO value of $22.'. but if the value IS small. Thi..ol· If the p value is large.5" pl~ would have been drawn if the null hy  p value = PrHJI"V .24. is no . it is possible to do a pro babilistic ca lculation tha t permils tes ting V I" <>a'''I'I.... . the pvalue js the a rea in the ta ils of the distributio n of V under the null hypo thesis beyond IYIIe/ . suppose tha t. In the case at hand. then the observed value y uc.say .it ..2 Hypo~is Test~ Concerning the Population M.~ ..LlI5.)~o.
FIGURE 3.1  Calculating a p-Value
The p-value is the probability of drawing a value of Ȳ that differs from μ_Y,0 by at least as much as Ȳ^act. In large samples, Ȳ is distributed N(μ_Y,0, σ²_Ȳ) under the null hypothesis, so (Ȳ − μ_Y,0)/σ_Ȳ is distributed N(0, 1). Thus the p-value is the shaded standard normal tail probability outside ±|(Ȳ^act − μ_Y,0)/σ_Ȳ|.

Calculating the p-Value When σ_Y Is Known

The calculation of the p-value when σ_Y is known is summarized in Figure 3.1. If the sample size is large, then under the null hypothesis the sampling distribution of Ȳ is N(μ_Y,0, σ²_Ȳ), where σ²_Ȳ = σ²_Y/n. Thus, under the null hypothesis, the standardized version of Ȳ, (Ȳ − μ_Y,0)/σ_Ȳ, has a standard normal distribution. The p-value is the probability of obtaining a value of Ȳ farther from μ_Y,0 than Ȳ^act under the null hypothesis or, equivalently, the probability of obtaining (Ȳ − μ_Y,0)/σ_Ȳ larger in absolute value than (Ȳ^act − μ_Y,0)/σ_Ȳ. This is the tail probability shaded in Figure 3.1, so the p-value is

p-value = Pr_H₀[|(Ȳ − μ_Y,0)/σ_Ȳ| > |(Ȳ^act − μ_Y,0)/σ_Ȳ|] = 2Φ(−|(Ȳ^act − μ_Y,0)/σ_Ȳ|),   (3.6)
where Φ is the standard normal cumulative distribution function. That is, the p-value is the area in the tails of a standard normal distribution outside ±|(Ȳ^act − μ_Y,0)/σ_Ȳ|.

The formula for the p-value in Equation (3.6) depends on the variance of the population distribution, σ²_Y. In practice, this variance is typically unknown. [An exception is when Yᵢ is binary, so that its distribution is Bernoulli, in which case the variance is determined by the null hypothesis; see Equation (2.7).] Because in general σ²_Y must be estimated before the p-value can be computed, we now turn to the problem of estimating σ²_Y.

The Sample Variance, Sample Standard Deviation, and Standard Error

The sample variance s²_Y is an estimator of the population variance σ²_Y; the sample standard deviation s_Y is an estimator of the population standard deviation σ_Y; and the standard error of the sample average Ȳ is an estimator of the standard deviation of the sampling distribution of Ȳ.

The sample variance and standard deviation. The sample variance, s²_Y, is

s²_Y = (1/(n − 1)) ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)².   (3.7)

The sample standard deviation, s_Y, is the square root of the sample variance.

The formula for the sample variance is much like the formula for the population variance. The population variance, E(Y − μ_Y)², is the average value of (Y − μ_Y)² in the population distribution. Similarly, the sample variance is the sample average of (Yᵢ − μ_Y)², i = 1, …, n, with two modifications: First, μ_Y is replaced by Ȳ, and second, the average uses the divisor n − 1 instead of n.

The reason for the first modification, replacing μ_Y by Ȳ, is that μ_Y is unknown and thus must be estimated; the natural estimator of μ_Y is Ȳ. The reason for the second modification, dividing by n − 1 instead of n, is that estimating μ_Y by Ȳ introduces a small downward bias in (Yᵢ − Ȳ)². Specifically, as shown in an end-of-chapter exercise, E[(Yᵢ − Ȳ)²] = [(n − 1)/n]σ²_Y. Thus E[∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)²] = (n − 1)σ²_Y, and dividing by n − 1 in Equation (3.7) instead of n corrects for this small downward bias; as a result, s²_Y is unbiased.

Dividing by n − 1 in Equation (3.7) instead of n is called a degrees of freedom correction: Estimating the mean uses up some of the information, that is, uses up one "degree of freedom," in the data, so that only n − 1 degrees of freedom remain.
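The downward bias that the n − 1 divisor corrects can be seen in a simulation. This sketch (population values assumed for illustration) compares the average of the divide-by-n estimator with the average of s²_Y over many repeated samples:

```python
import random

random.seed(4)
sigma2 = 9.0            # assumed population variance
n, reps = 5, 100_000    # a small n makes the bias easy to see

biased_draws, unbiased_draws = [], []
for _ in range(reps):
    s = [random.gauss(0.0, sigma2**0.5) for _ in range(n)]
    ybar = sum(s)/n
    ssq = sum((y - ybar)**2 for y in s)
    biased_draws.append(ssq/n)          # divides by n
    unbiased_draws.append(ssq/(n - 1))  # divides by n - 1: the sample variance

mean_biased = sum(biased_draws)/reps
mean_unbiased = sum(unbiased_draws)/reps

# E[(1/n) * sum (Y_i - Ybar)^2] = ((n - 1)/n) * sigma^2, below sigma^2,
# while the n - 1 divisor makes the sample variance unbiased
assert abs(mean_biased - (n - 1)/n*sigma2) < 0.1
assert abs(mean_unbiased - sigma2) < 0.1
```

With n = 5 the divide-by-n estimator is centered near 0.8σ²_Y, exactly the (n − 1)/n shrinkage the degrees of freedom correction undoes.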
KEY CONCEPT 3.4  THE STANDARD ERROR OF Ȳ
The standard error of Ȳ is an estimator of the standard deviation of Ȳ. The standard error of Ȳ is denoted SE(Ȳ) or σ̂_Ȳ. When Y₁, …, Yₙ are i.i.d.,

SE(Ȳ) = σ̂_Ȳ = s_Y/√n.   (3.8)

Consistency of the sample variance. The sample variance is a consistent estimator of the population variance:

s²_Y converges in probability to σ²_Y.   (3.9)

In other words, the sample variance is close to the population variance with high probability when n is large. The result in Equation (3.9) is proven in Appendix 3.3 under the assumptions that Y₁, …, Yₙ are i.i.d. and Yᵢ has a finite fourth moment; that is, E(Yᵢ⁴) < ∞. Intuitively, the reason that s²_Y is consistent is that it is a sample average, so it obeys the law of large numbers. But for s²_Y to obey the law of large numbers in Key Concept 2.6, (Yᵢ − μ_Y)² must have a finite variance, which in turn requires Yᵢ to have a finite fourth moment.

The standard error of Ȳ. Because the standard deviation of the sampling distribution of Ȳ is σ_Ȳ = σ_Y/√n, Equation (3.9) justifies using s_Y/√n as an estimator of σ_Ȳ. The estimator of σ_Ȳ, s_Y/√n, is called the standard error of Ȳ and is denoted SE(Ȳ) or σ̂_Ȳ (the "^" over the symbol means that it is an estimator of σ_Ȳ). The standard error of Ȳ is summarized in Key Concept 3.4.

When Y₁, …, Yₙ are i.i.d. draws from a Bernoulli distribution with success probability p, the formula for the variance of Ȳ simplifies to p(1 − p)/n [see Equation (2.7)]. The formula for the standard error also takes on a simple form that depends only on Ȳ and n: SE(Ȳ) = √(Ȳ(1 − Ȳ)/n).

Calculating the p-Value When σ_Y Is Unknown

Because s²_Y is a consistent estimator of σ²_Y, the p-value can be computed by replacing σ_Ȳ in Equation (3.6) by the standard error, SE(Ȳ) = σ̂_Ȳ. That is, when σ_Y is unknown and Y₁, …, Yₙ are i.i.d., the p-value is calculated using the formula
p-value = 2Φ(−|(Ȳ^act − μ_Y,0)/SE(Ȳ)|).   (3.10)

The t-Statistic

The standardized sample average (Ȳ − μ_Y,0)/SE(Ȳ) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:

t = (Ȳ − μ_Y,0)/SE(Ȳ).   (3.11)

In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an important example of a test statistic.

Large-sample distribution of the t-statistic. When n is large, s²_Y is close to σ²_Y with high probability. Thus the distribution of the t-statistic is approximately the same as the distribution of (Ȳ − μ_Y,0)/σ_Ȳ, which in turn is well approximated by the standard normal distribution when n is large because of the central limit theorem (Key Concept 2.7). That is, under the null hypothesis,

t is approximately distributed N(0, 1) for large n.   (3.12)

The formula for the p-value in Equation (3.10) can be rewritten in terms of the t-statistic. Let t^act denote the value of the t-statistic actually computed:

t^act = (Ȳ^act − μ_Y,0)/SE(Ȳ).   (3.13)

Accordingly, when n is large, the p-value can be calculated using

p-value = 2Φ(−|t^act|).   (3.14)

As a hypothetical example, suppose that a sample of n = 200 recent college graduates is used to test the null hypothesis that the mean wage, E(Y), is $20/hour. The sample average wage is Ȳ^act = $22.64 and the sample standard deviation is s_Y = $18.14. Then the standard error of Ȳ is s_Y/√n = 18.14/√200 = 1.28. The value of the t-statistic is t^act = (22.64 − 20)/1.28 = 2.06. From Appendix Table 1, the p-value is 2Φ(−2.06) = 0.039, or 3.9%.
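The arithmetic of this hypothetical example can be reproduced in a few lines, with Φ implemented via the standard error function rather than looked up in a table:

```python
from math import erf, sqrt

def normal_cdf(x):
    # standard normal cumulative distribution function Phi(x)
    return 0.5*(1.0 + erf(x/sqrt(2.0)))

# Numbers from the hypothetical example in the text
n, ybar_act, s_y, mu_0 = 200, 22.64, 18.14, 20.0

se = s_y/sqrt(n)                        # standard error of Ybar, Equation (3.8)
t_act = (ybar_act - mu_0)/se            # t-statistic, Equation (3.13)
p_value = 2.0*normal_cdf(-abs(t_act))   # p-value, Equation (3.14)

assert abs(se - 1.28) < 0.01         # SE(Ybar) = 1.28
assert abs(t_act - 2.06) < 0.01      # t = 2.06
assert abs(p_value - 0.039) < 0.001  # p-value = 0.039, or 3.9%
```

The identity Φ(x) = [1 + erf(x/√2)]/2 is what makes the table lookup unnecessary here.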
That is, assuming the null hypothesis to be true, the probability of obtaining a sample average at least as different from the null as the one actually computed is 3.9%.

Hypothesis Testing with a Prespecified Significance Level

When you undertake a statistical hypothesis test, you can make two types of mistakes: You can incorrectly reject the null hypothesis when it is true, or you can fail to reject the null hypothesis when it is false. Hypothesis tests can be performed without computing the p-value if you are willing to specify in advance the probability you are willing to tolerate of making the first kind of mistake, that is, of rejecting the null hypothesis when it is in fact true. If you choose a prespecified probability of rejecting the null hypothesis when it is true (for example, 5%), then you will reject the null hypothesis if and only if the p-value is less than 0.05. This approach gives preferential treatment to the null hypothesis, but in many practical situations this preferential treatment is appropriate.

If n is large enough, then under the null hypothesis the t-statistic has a N(0, 1) distribution, so the probability of erroneously rejecting the null hypothesis (rejecting the null hypothesis when it is in fact true) is 5% if the test is

Reject H₀ if |t^act| > 1.96.   (3.15)

That is, reject if the absolute value of the t-statistic computed from the sample is greater than 1.96. The significance level of the test in Equation (3.15) is 5%, the critical value of this two-sided test is 1.96, and the rejection region is the set of values of the t-statistic outside ±1.96. If the test rejects at the 5% significance level, the population mean μ_Y is said to be statistically significantly different from μ_Y,0 at the 5% significance level.

Testing hypotheses using a prespecified significance level does not require computing p-values. In the previous example of testing the hypothesis that the mean earnings of recent college graduates is $20, the t-statistic was 2.06. This value exceeds 1.96, so the hypothesis is rejected at the 5% level. Although performing the test with a 5% significance level is easy, reporting only whether the null hypothesis is rejected at a prespecified significance level conveys less information than reporting the p-value.

This framework for testing statistical hypotheses has some specialized terminology, summarized in Key Concept 3.5.

KEY CONCEPT 3.5  THE TERMINOLOGY OF HYPOTHESIS TESTING
A statistical hypothesis test can make two types of mistakes: a type I error, in which the null hypothesis is rejected when in fact it is true, and a type II error, in which the null hypothesis is not rejected when in fact it is false. The prespecified rejection probability of a statistical hypothesis test when the null hypothesis is true, that is, the prespecified probability of a type I error, is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the set of values for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.

The p-value is the probability of obtaining a test statistic, by random sampling variation, at least as adverse to the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at which you can reject the null hypothesis.

What significance level should you use in practice? In many cases, statisticians and econometricians use a 5% significance level. If you were to test many statistical hypotheses at the 5% level, you would incorrectly reject the null on average once in 20 cases. Sometimes a more conservative significance level might be in order. For example, legal cases sometimes involve statistical evidence, and the null hypothesis could be that the defendant is not guilty; then one would want to be quite sure that a rejection of the null (a conclusion of guilt) is not just a result of random sample variation. In some legal settings the significance level used is 1% or even 0.1%, to avoid this sort of mistake.
Similarly, if a government agency is considering permitting the sale of a new drug, a very conservative standard might be in order so that consumers can be sure that the drugs available in the market actually work.
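The meaning of a 5% significance level, equivalently the size of the test, can be checked by simulation. In this sketch (population values assumed for illustration), every sample is drawn with the null hypothesis true, and the rule "reject if |t| > 1.96" fires in roughly 5% of samples:

```python
import random
from math import sqrt

random.seed(5)
mu_0, sigma_y = 20.0, 5.0   # the null is true: E(Y) = mu_0
n, reps = 100, 10_000

rejections = 0
for _ in range(reps):
    s = [random.gauss(mu_0, sigma_y) for _ in range(n)]
    ybar = sum(s)/n
    s2 = sum((y - ybar)**2 for y in s)/(n - 1)   # sample variance
    se = sqrt(s2/n)                              # standard error of Ybar
    t = (ybar - mu_0)/se
    if abs(t) > 1.96:                            # two-sided 5% rejection rule
        rejections += 1

size_mc = rejections/reps
# A true null is incorrectly rejected about 5% of the time (a type I error)
assert 0.03 < size_mc < 0.07
```

Lowering the significance level shrinks this rejection frequency, at the cost of power, which is the trade-off discussed next.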
Being conservative, in the sense of using a very low significance level, has a cost: The smaller the significance level, the larger the critical value, and the more difficult it becomes to reject the null when the null is false. In fact, the most conservative thing to do is never to reject the null hypothesis, but if that is your view, then you never need to look at any statistical evidence, for you will never change your mind! The lower the significance level, the lower the power of the test. Many economic and policy applications call for less conservatism than a legal case, so a 5% significance level is often considered to be a reasonable compromise.

Key Concept 3.6 summarizes hypothesis tests for the population mean against the two-sided alternative.

KEY CONCEPT 3.6  TESTING THE HYPOTHESIS E(Y) = μ_Y,0 AGAINST THE ALTERNATIVE E(Y) ≠ μ_Y,0
1. Compute the standard error of Ȳ, SE(Ȳ) [Equation (3.8)].
2. Compute the t-statistic [Equation (3.13)].
3. Compute the p-value [Equation (3.14)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 (equivalently, if |t^act| > 1.96).

One-Sided Alternatives

In some circumstances, the alternative hypothesis might be that the mean exceeds μ_Y,0. For example, one hopes that education helps in the labor market, so the relevant alternative to the null hypothesis that earnings are the same for college graduates and nongraduates is not just that their earnings differ, but rather that graduates earn more than nongraduates. This is called a one-sided alternative hypothesis and can be written

H₁: E(Y) > μ_Y,0 (one-sided alternative).   (3.16)

The general approach to computing p-values and to hypothesis testing is the same for one-sided alternatives as it is for two-sided alternatives, with the modification that only large positive values of the t-statistic reject the null hypothesis, rather than values that are large in absolute value. Specifically, to test the one-sided hypothesis in Equation (3.16), construct the t-statistic in Equation (3.13). The p-value is the area under the standard normal distribution to the right of the calculated t-statistic.
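For a one-sided test the only change is that the tail probability is taken on one side. A minimal sketch, reusing the hypothetical t-statistic of 2.06 from the two-sided example:

```python
from math import erf, sqrt

def normal_cdf(x):
    # standard normal CDF Phi(x)
    return 0.5*(1.0 + erf(x/sqrt(2.0)))

def one_sided_p_value(t_act):
    # area under the standard normal density to the right of t_act
    return 1.0 - normal_cdf(t_act)

p_one = one_sided_p_value(2.06)
# The one-sided p-value is half the two-sided value of 0.039
assert abs(p_one - 0.0197) < 0.0005
assert p_one < 0.05   # rejects at the 5% level, since 2.06 > 1.645
```

The halving reflects that only the right tail counts against the null under the one-sided alternative.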
u r.17) The N(O .16) concerns values of I'.3 Con ~dence Intervals for the PoptJlo~on Mean 8t calculated tstatistic. you can tell him whetber his hypothesis is rejected or not sim ply by lOOking up his number o n your handy list.lblc property: The pro bability Iha t it con tains the true value of the population mean is 95%.645. ' m e rejection regio n for th is test is all values of Ih e (statistic exceeding 1.o by com puting the {statistic... However. it is impossible to learn the exact value of the population mean of Y lL <..645. The confidence sct fo r fJy tu rns out to be all the possible vn lu ~s of the mean be tween a lower and an upper limi t.'). ing only the info rmation in a sample. I) critical value for a onesided test with a 5% significance leve l is 1.96. is p value = PrJ/a(Z > .o a nd test it . Tha t is. H e re is one way to construct a 95% confide nce set fo r the popul atio n mean. This list is useful becau~e it !'>ummarizes the set of hypotheses you can and can not reject (at the .):0. 3.. keep doing this for a ll possible val ues o f the pop ulation mean. and write down this nonrejected value 1'. so that the coo fid e nce set is an interval.t)  I . (3.0.o agains t the alterna tive tha t Jt)' .3. if you cannot reject it.3 Confidence Intervals for the Population Mean Because of rando m sampling error.. .« Jtr. the p va lue. fo r exam ple.1.645. Ihis hypothesized va lue . A bit of clever reasoning shows that this se t of values has a rc markl. write this value down o n your list. called a cOlllidence interval. Continuing this process yields the set of a ll va lues of the populatio n mean tha t cannot be rejected al tbe.u Y.y is conta ined in this set is called the confidence le\·el. Begin by pick ing some arbitrary va lue for the mean: ca11 this 11y. Such a set is called a confidence set . 
it is pos sible to use da ta fro m a rando m sample to construct a set of va lues that con ta ins the true popu la tion mean JJy with a certa in prespecifi cd probability.. I) approximation to the distri bution of the (statistic.o. the 5% rejection region consists o f values o f the (statistic less than .yexcceding I'y.ou with a specific numbe r in mind. . Do this again a nd a gain: indeed. Test the null hypo thesis tha t JAy = I'y..5% level by a twosided hypothesis test.o is no t rejected a t the 5% level.. If instead the alternative hypothesis is that E(Y) < .Now pick anothe r a rbitrary value of JAr. if il is less tha n 1.<1>(.O ' t he n the discussion of the previous paragraph applies exce pt that the signs a re switched. based on the N(O.5% level) based on your dat a: If someone walks up to ).. The o nesided hypothesis in Equa tio n (3. and the prespecified probability tha t JI.
consider the problem of constructing a 95% confidence intl' r val for the n K~an hourly earnings of recent co llege graduates using a hypothetical random SAmple of 200 recent college gmduatcs whe re Y = $22.13) .64 and SfTV) 1. I) d istri bution.5: Ihis means that in 95% of all sampl~ your list will contain the tr ue value o f Ily . Su ppose the true va lue o f J1. ] n 95% of all samples. 1. in particular you tested the true VAlue. Thus.) is 21. K~y Co ncept 3. Thu~ the set of "alues of J. According to the formula for the Istatistic in Equation (3. y that are not rejected a l Ihe 5% leve l consist.et uf values of Il r that cannot be rejected hy a onesided hypothesis test.5 (although we do nOI know this).I.5 at the 5% leve l is 5%.5 has a N(O. if n is large. Fortunately there is a much e.7 summaril.5!{S£(YH· 90% confidence in terval for Il) = [y ::!: 1.l. 95 % . That is..y.% x 1.es tb i~ A s an example. O m' could instead construct a onesided confidence interval as the .(Y). The 95% conUdenC(~ interval for mean hourly earn ings is 22.yas nu ll hypotheses. Although onesided confi . l11i s discussion so far has focused on twosided co nfidence in tervals.5 1 = [$20. [J ut bccau~e you tested al i possible values of the population mean in constructing your set . Then Y has a normal d istri bu tion centere d on 21.::!:.64 :: 2. ier approach.t:.96S£(Y) :s: Jll :S approach.15 ). Ill" = 21.. Y+ 1.7 A 95% twosided confidence inlervaJ fo r Il y is an inte rval constructed so that it contain s the true value of Il l' in !}5% of all pt)~sib1c random samples.96SE(y) of Y. a nd 99 % confidence intervnls for Jly are CONFIDENCE INTERVALS FOR THE POPULATION MEA N ')5 % confidence interva l for Jly = (y =.64 . fo r 11 requires you to test all p os~ible VAlues of lJ. a\\a~ from Y.l. of those values within :':" J. $25. (Y ::': 2. 99% confid ence inte rval for Jly "" TIle clever r e asoning goes li ke t his.82 CHAPTER 3 Review of Slothtic~ 3.64SfiY)}.5. 
and the Istatistic test in g the null hypothesis Ily "" 2 1.13.2:< = 22.28. Thus. a trial val ue of J1. the probability of rejecting the null hypothesis J1. 90% . When the sample size II is la rge.96SF. you will correctl y (Iccep l 21.y ::= 21. This met hod of constructing a confidence SCI is im practical. a 95% confidem:t: interval for Il ) is Y . the values on your liq co nstitute a 95% confidence set for J.1.%S£(V) j.5.):u is rejected at the 5% level if it is more than 1.96 stamiard error.
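The two-sided test in Key Concept 3.6, the one-sided p-value in Equation (3.17), and the 95% interval in Key Concept 3.7 can be sketched in a few lines of code. This is an illustrative sketch, not part of the text: the function names and the sample are invented here, and the standard normal CDF Φ is built from the error function.

```python
import math
import statistics

def phi(x):
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x/sqrt(2)))/2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_test(sample, mu_0):
    """t-statistic, two- and one-sided p-values, and 95% CI for E(Y)."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)    # SE(Ybar) = s_Y / sqrt(n)
    t = (y_bar - mu_0) / se                         # t-statistic
    p_two_sided = 2.0 * phi(-abs(t))                # reject at 5% if p < 0.05
    p_one_sided = 1.0 - phi(t)                      # for H1: E(Y) > mu_0
    ci_95 = (y_bar - 1.96 * se, y_bar + 1.96 * se)  # Key Concept 3.7
    return t, p_two_sided, p_one_sided, ci_95
```

For the hypothetical sample in the text with Ȳ = $22.64 and SE(Ȳ) = 1.28, the same arithmetic reproduces the interval 22.64 ± 1.96 × 1.28 = [$20.13, $25.15].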
Coverage probabilities. The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.

3.4 Comparing Means from Different Populations

Do recent male and female college graduates earn the same amount on average? This question involves comparing the means of two different population distributions. This section summarizes how to test hypotheses and how to construct confidence intervals for the difference in the means from two different populations.

Hypothesis Tests for the Difference Between Two Means

Let μ_m be the mean hourly earning in the population of men recently graduated from college and let μ_w be the mean hourly earning in the population of women recently graduated from college. Consider the null hypothesis that earnings for these two populations differ by a certain amount, say d_0. Then the null hypothesis and the two-sided alternative hypothesis are

H_0: μ_m − μ_w = d_0 vs. H_1: μ_m − μ_w ≠ d_0.   (3.18)

The null hypothesis that men and women in these populations have the same earnings corresponds to H_0 in Equation (3.18) with d_0 = 0.

Because these population means are unknown, they must be estimated from samples of men and women. Suppose we have samples of n_m men and n_w women drawn at random from their populations. Let the sample average annual earnings be Ȳ_m for men and Ȳ_w for women. Then an estimator of μ_m − μ_w is Ȳ_m − Ȳ_w.

To test the null hypothesis that μ_m − μ_w = d_0 using Ȳ_m − Ȳ_w, we need to know the distribution of Ȳ_m − Ȳ_w. Recall that Ȳ_m is, according to the central limit theorem, approximately distributed N(μ_m, σ²_m/n_m), where σ²_m is the population variance of earnings for men. Similarly, Ȳ_w is approximately distributed N(μ_w, σ²_w/n_w), where σ²_w is the population variance of earnings for women.
Also, recall from Section 2.4 that a weighted average of two normal random variables is itself normally distributed. Because Ȳ_m and Ȳ_w are constructed from different randomly selected samples, they are independent random variables. Thus Ȳ_m − Ȳ_w is distributed N[μ_m − μ_w, (σ²_m/n_m) + (σ²_w/n_w)].

If σ²_m and σ²_w are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis that μ_m − μ_w = d_0. In practice, however, these population variances are typically unknown, so they must be estimated. As before, they can be estimated using the sample variances, s²_m and s²_w, where s²_m is defined as in Equation (3.7), except that the statistic is computed only for the men in the sample, and s²_w is defined similarly for the women. Thus the standard error of Ȳ_m − Ȳ_w is

SE(Ȳ_m − Ȳ_w) = √(s²_m/n_m + s²_w/n_w).   (3.19)

The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean, by subtracting the null hypothesized value of μ_m − μ_w from the estimator Ȳ_m − Ȳ_w and dividing the result by the standard error of Ȳ_m − Ȳ_w:

t = [(Ȳ_m − Ȳ_w) − d_0] / SE(Ȳ_m − Ȳ_w) (t-statistic for comparing two means).   (3.20)

If both n_m and n_w are large, then this t-statistic has a standard normal distribution. Because the t-statistic in Equation (3.20) has a standard normal distribution under the null hypothesis when n_m and n_w are large, the p-value of the two-sided test is computed exactly as it was in the case of a single population; that is, the p-value is computed using Equation (3.14).

To conduct a test with a prespecified significance level, simply calculate the t-statistic in Equation (3.20) and compare it to the appropriate critical value. For example, the null hypothesis is rejected at the 5% significance level if the absolute value of the t-statistic exceeds 1.96. If the alternative is one-sided rather than two-sided (that is, if the alternative is that μ_m − μ_w > d_0), then the test is modified as outlined in Section 3.2: The rejection region consists of large positive values of the t-statistic, and a test with a 5% significance level rejects when t > 1.645.
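Equations (3.19) and (3.20), together with the interval in Equation (3.21), translate directly into code. The sketch below is illustrative only; the function name and the tiny samples are invented for the example.

```python
import math
import statistics

def diff_means_test(y_m, y_w, d_0=0.0):
    """Gap, SE [Eq. (3.19)], t-statistic [Eq. (3.20)], and 95% CI [Eq. (3.21)]."""
    n_m, n_w = len(y_m), len(y_w)
    gap = statistics.fmean(y_m) - statistics.fmean(y_w)
    se = math.sqrt(statistics.variance(y_m) / n_m +
                   statistics.variance(y_w) / n_w)   # Equation (3.19)
    t = (gap - d_0) / se                             # Equation (3.20)
    ci_95 = (gap - 1.96 * se, gap + 1.96 * se)       # Equation (3.21)
    return gap, se, t, ci_95
```

With large n_m and n_w, the returned t-statistic is compared with standard normal critical values (1.96 two-sided, 1.645 one-sided).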
Confidence Intervals for the Difference Between Two Population Means

The method for constructing confidence intervals summarized in Section 3.3 extends to constructing a confidence interval for the difference between the two population means, d = μ_m − μ_w. Because the hypothesized value d_0 is rejected at the 5% level if |t| > 1.96, d_0 will be in the confidence set if |t| ≤ 1.96. But |t| ≤ 1.96 means that the estimated difference, Ȳ_m − Ȳ_w, is less than 1.96 standard errors away from d_0. Thus the 95% two-sided confidence interval for d consists of those values of d within ±1.96 standard errors of Ȳ_m − Ȳ_w:

95% confidence interval for d = μ_m − μ_w is (Ȳ_m − Ȳ_w) ± 1.96SE(Ȳ_m − Ȳ_w).   (3.21)

With these formulas in hand, the box "The Gender Gap of Earnings of College Graduates in the United States" contains an empirical investigation of gender differences in earnings of U.S. college graduates.

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to a treatment group, which receives the experimental treatment, or to a control group, which does not receive the treatment. The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.

The Causal Effect as a Difference of Conditional Expectations

The causal effect of a treatment is the expected effect on the outcome of interest of the treatment as measured in an ideal randomized controlled experiment. This effect can be expressed as the difference of two conditional expectations. Specifically, the causal effect on Y of treatment level x is the difference in the conditional expectations, E(Y|X = x) − E(Y|X = 0), where E(Y|X = x) is the expected value of Y for the treatment group (which receives treatment level X = x) in an ideal randomized controlled experiment and E(Y|X = 0) is the expected value of Y for the control group (which receives treatment level X = 0). In the context of experiments, the causal effect is also called the treatment effect. If there are only two treatment levels (that is, if the treatment is binary), then we can let X = 0 denote the control group and X = 1 denote the treatment group. If the treatment is binary, then the causal effect (that is, the treatment effect) is E(Y|X = 1) − E(Y|X = 0) in an ideal randomized controlled experiment.
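A minimal simulation can make the differences-of-means estimator concrete. Nothing below comes from the text: the normal outcome distribution, the group sizes, and the true treatment effect of 2.0 are hypothetical choices for the sketch.

```python
import random
import statistics

random.seed(1)

# Hypothetical ideal randomized controlled experiment with a binary treatment:
# outcomes are normal, and the true causal (treatment) effect is 2.0.
TRUE_EFFECT = 2.0
control = [random.gauss(10.0, 3.0) for _ in range(5000)]                # X = 0
treated = [random.gauss(10.0 + TRUE_EFFECT, 3.0) for _ in range(5000)]  # X = 1

# Differences-of-means estimator of E(Y | X = 1) - E(Y | X = 0)
estimate = statistics.fmean(treated) - statistics.fmean(control)
```

Across repeated samples the estimate is centered on the true effect; its sampling uncertainty is quantified by the standard error in Equation (3.19).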
THE GENDER GAP OF EARNINGS OF COLLEGE GRADUATES IN THE UNITED STATES

The box in Chapter 2, "The Distribution of Earnings in the United States in 2004," shows that, on average, male college graduates earn more than female college graduates. What are the recent trends in this "gender gap" in earnings? Social norms and laws governing gender discrimination in the workplace have changed substantially in the United States. Is the gender gap in earnings of college graduates stable, or has it diminished over time?

Table 3.1 gives estimates of hourly earnings for college-educated full-time workers aged 25–34 in the United States in 1992, 1996, 2000, and 2004, using data collected by the Current Population Survey. Earnings for 1992, 1996, and 2000 were adjusted for inflation by putting them in 2004 dollars using the Consumer Price Index.¹ In 2004, the average hourly earnings of the 1,901 men surveyed was $21.99, and the standard deviation of earnings for men was $10.21. The average hourly earnings of the 1,739 women surveyed was $18.47, and the standard deviation of earnings for women was $8.30. Thus the estimate of the gender gap in earnings for 2004 is $3.52 (= $21.99 − $18.47), with a standard error of $0.31 (= √(10.21²/1901 + 8.30²/1739)). The 95% confidence interval for the gender gap in earnings in 2004 is 3.52 ± 1.96 × 0.31 = ($2.91, $4.13).

The results in Table 3.1 suggest four conclusions. First, the gender gap is large. An hourly gap of $3.52 might not sound like much, but over a year it adds up to $7,040, assuming a 40-hour work week and 50 paid weeks per year. Second, the estimated gender gap has increased by $0.79/hour in real terms over this sample, from $2.73/hour to $3.52/hour; however, the increase is not statistically significant at the 5% significance level (Exercise 3.17). Third, the gap is large if it is measured instead in percentage terms: According to the estimates in Table 3.1, in 2004 women earned 16% less per hour than men did ($3.52/$21.99), more than the gap of 13% seen in 1992 ($2.73/$20.33). Fourth, the gender gap is smaller for young college graduates (the group analyzed in Table 3.1) than it is for all college graduates (analyzed in Table 2.4): As reported in Table 2.4, the mean earnings for all college-educated women working full-time in 2004 was $21.12, while for men this mean was $27.83, which corresponds to a gender gap of 24% [= (27.83 − 21.12)/27.83] among all full-time college-educated workers.

This empirical analysis documents that the "gender gap" in hourly earnings is large and has been fairly stable (or perhaps increased slightly) over the recent past. The analysis does not, however, tell us why this gap exists. Does it arise from gender discrimination in the labor market? Does it reflect differences in skills, experience, or education between men and women? Does it reflect differences in choice of jobs? Or is there some other cause? We return to these questions once we have in hand the tools of multiple regression analysis, the topic of Part II.

TABLE 3.1 Trends in Hourly Earnings in the United States of Working College Graduates, Ages 25–34, 1992 to 2004, in 2004 Dollars

Year | Ȳ_m | s_m | n_m | Ȳ_w | s_w | n_w | Ȳ_m − Ȳ_w | SE(Ȳ_m − Ȳ_w) | 95% CI for d
1992 | 20.33 | … | 1592 | 17.60 | … | 1370 | 2.73** | … | …
1996 | … | … | 1377 | … | … | 1235 | … | … | …
2000 | … | … | 1300 | … | … | 1182 | … | … | …
2004 | 21.99 | 10.21 | 1901 | 18.47 | 8.30 | 1739 | 3.52** | 0.31 | 2.91–4.13

These estimates are computed using data on all full-time workers aged 25–34 surveyed in the Current Population Survey conducted in March of the next year (for example, the data for 2004 were collected in March 2005). **Significantly different from zero at the 1% significance level.

¹Because of inflation, a dollar in 1992 was worth more than a dollar in 2004, in the sense that a dollar in 1992 could buy more goods and services than a dollar in 2004 could. Thus earnings in 1992 cannot be directly compared to earnings in 2004 without adjusting for inflation. One way to make this adjustment is to use the Consumer Price Index (CPI), a measure of the price of a "market basket" of consumer goods and services constructed by the Bureau of Labor Statistics. Over the twelve years from 1992 to 2004, the price of the CPI market basket rose by 34.6%; in other words, the CPI basket of goods and services that cost $100 in 1992 cost $134.60 in 2004. To make earnings in 1992 and 2004 comparable in Table 3.1, 1992 earnings are inflated by the amount of overall CPI price inflation, that is, by multiplying 1992 earnings by 1.346 to put them into "2004 dollars."

Estimation of the Causal Effect Using Differences of Means

If the treatment in a randomized controlled experiment is binary, then the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups. The hypothesis that the treatment is ineffective is equivalent to the hypothesis that the two means are the same, which can be tested using the t-statistic for comparing two means, given in Equation (3.20). A 95% confidence interval for the difference in the means of the two groups is a 95% confidence interval for the causal effect, so a 95% confidence interval for the causal effect can be constructed using Equation (3.21).

A well-designed, well-run experiment can provide a compelling estimate of a causal effect. For this reason, randomized controlled experiments are commonly conducted in some fields, such as medicine. In economics, however, experiments tend to be expensive, difficult to administer, and, in some cases, ethically questionable, so they remain rare. For this reason, econometricians sometimes study "natural experiments," also called quasi-experiments, in which some event unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment.
The box "A Novel Way to Boost Retirement Savings" provides an example of such a quasi-experiment that yielded some surprising conclusions.

3.6 Using the t-Statistic When the Sample Size Is Small

In Sections 3.2 through 3.5, the t-statistic is used in conjunction with critical values from the standard normal distribution for hypothesis testing and for the construction of confidence intervals. The use of the standard normal distribution is justified by the central limit theorem, which applies when the sample size is large. When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution (that is, the finite-sample distribution; see Section 2.6) of the t-statistic testing the mean of a single population is the Student t distribution with n − 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-Statistic and the Student t Distribution

The t-statistic testing the mean. Consider the t-statistic used to test the hypothesis that the mean of Y is μ_{Y,0}, using data Y_1, …, Y_n. The formula for this statistic is given by Equation (3.10), where the standard error of Ȳ is given by Equation (3.8). Substitution of the latter expression into the former yields the formula for the t-statistic:

t = (Ȳ − μ_{Y,0}) / √(s²_Y/n),   (3.22)

where s²_Y is given in Equation (3.7).

As discussed in Section 3.2, under general conditions the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true [see Equation (3.12)]. Although the standard normal approximation to the t-statistic is reliable for a wide range of distributions of Y if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: If Y_1, …, Y_n are i.i.d. and the population distribution of Y is N(μ_Y, σ²_Y), then the t-statistic in Equation (3.22) has a Student t distribution with n − 1 degrees of freedom.

To verify this result, recall from Section 2.4 that the Student t distribution with n − 1 degrees of freedom is defined to be the distribution of Z/√(W/(n − 1)), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with n − 1 degrees of freedom, and Z and W are independently distributed. When Y_1, …, Y_n are i.i.d. and the population distribution of Y is N(μ_Y, σ²_Y), the t-statistic can be written as such a ratio. Specifically, let Z = (Ȳ − μ_{Y,0})/√(σ²_Y/n) and let W = (n − 1)s²_Y/σ²_Y; then some algebra¹ shows that the t-statistic in Equation (3.22) can be written as t = Z/√(W/(n − 1)). Recall that if Y_1, …, Y_n are i.i.d. and the population distribution of Y is N(μ_Y, σ²_Y), then the sampling distribution of Ȳ is exactly N(μ_Y, σ²_Y/n) for all n; thus, if the null hypothesis μ_Y = μ_{Y,0} is correct, then Z has a standard normal distribution for all n. In addition, W = (n − 1)s²_Y/σ²_Y has a χ²_{n−1} distribution for all n, and Ȳ and s²_Y are independently distributed. It follows that, if the population distribution of Y is normal, then under the null hypothesis the t-statistic given in Equation (3.22) has an exact Student t distribution with n − 1 degrees of freedom.

If the population distribution is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals. As an example, consider a hypothetical problem in which t^act = 2.15 and n = 20, so that the degrees of freedom is n − 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the t_19 distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for μ_Y, constructed using the t_19 distribution, would be Ȳ ± 2.09SE(Ȳ). This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96.

The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.

¹The desired expression is obtained by multiplying and dividing by √(σ²_Y) and collecting terms:

t = (Ȳ − μ_{Y,0})/√(s²_Y/n) = [(Ȳ − μ_{Y,0})/√(σ²_Y/n)] / √(s²_Y/σ²_Y) = Z/√{[(n − 1)s²_Y/σ²_Y]/(n − 1)} = Z/√(W/(n − 1)).
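A small Monte Carlo sketch illustrates the point of the hypothetical t_19 example. The population mean and standard deviation and the number of replications below are arbitrary choices for the illustration: with n = 20 normal observations, comparing |t| with the t_19 critical value 2.09 gives a test with size close to 5%, while the normal critical value 1.96 rejects too often.

```python
import math
import random
import statistics

random.seed(0)

def t_stat(sample, mu_0):
    n = len(sample)
    return (statistics.fmean(sample) - mu_0) / (statistics.stdev(sample) / math.sqrt(n))

# Null distribution of |t| for n = 20 draws from a normal population.
n, reps, mu = 20, 20000, 5.0
abs_t = [abs(t_stat([random.gauss(mu, 2.0) for _ in range(n)], mu))
         for _ in range(reps)]

reject_normal = sum(t > 1.96 for t in abs_t) / reps  # N(0,1) critical value
reject_t19 = sum(t > 2.09 for t in abs_t) / reps     # Student t_19 critical value
```

The simulated rejection rate using 2.09 should be close to the nominal 5%, while the rate using 1.96 should be noticeably larger.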
A NOVEL WAY TO BOOST RETIREMENT SAVINGS

Many economists think that workers tend not to save enough for their retirement. Conventional methods for encouraging retirement savings focus on financial incentives. Recently, however, there has been an upsurge in interest in unconventional ways to influence economic decisions.

In an important study published in 2001, Brigitte Madrian and Dennis Shea considered one such unconventional method for stimulating retirement savings. Many firms offer retirement savings plans in which the firm matches, in full or in part, savings taken out of the paycheck of participating employees. Enrollment in such plans, called 401(k) plans after the applicable section of the U.S. tax code, is always optional. However, at some firms employees are automatically enrolled in such a plan unless they choose to opt out; at other firms, employees are enrolled only if they choose to opt in. According to conventional economic models of behavior, the method of enrollment, opt out or opt in, should scarcely matter: An employee who wants to change his or her enrollment status simply fills out a form, and the dollar value of the time required to fill out the form is very small compared with the financial implications of this decision. But, Madrian and Shea wondered, could this conventional reasoning be wrong? Does the method of enrollment in a savings plan directly affect its enrollment rate?

To measure the effect of the method of enrollment, Madrian and Shea studied a large firm that changed the default option for its 401(k) plan from nonparticipation to participation. They compared two groups of workers: those hired the year before the change, who were not automatically enrolled (but could opt in), and those hired in the year after the change, who were automatically enrolled (but could opt out). The financial aspects of the plan were the same, and Madrian and Shea found no systematic differences between the workers hired before and after the change in the enrollment default. Thus, from an econometrician's perspective, the change was like a randomly assigned treatment, and the causal effect of the change could be estimated by the difference in means between the two groups.

Madrian and Shea found that the default enrollment rule made a huge difference: The enrollment rate for the "opt-in" (control) group was 37.4% (n = 4249), whereas the enrollment rate for the "opt-out" (treatment) group was 85.9% (n = 5801). The estimate of the treatment effect is 48.5% (= 85.9% − 37.4%). Because their sample is large, the 95% confidence interval for the treatment effect is tight, 46.8% to 50.2%.

How could the default choice matter so much? To economists sympathetic to the conventional view that the default enrollment scheme should not matter, Madrian and Shea's finding was astonishing. One potential explanation for their finding is that many workers find these plans so confusing that they simply treat the default option as if it were reliable advice. Another explanation is that young workers would simply rather not think about aging and plan for it. Although neither explanation is economically rational in a conventional sense, both are consistent with the predictions of "behavioral economics," and both could lead to accepting the default enrollment option. Thus, increasingly many economists are starting to think that such details might be as important as financial aspects for boosting enrollment in retirement savings plans. To learn more about behavioral economics and the design of retirement savings plans, see Thaler and Benartzi (2004).
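The treatment-effect arithmetic in the box can be reproduced from the reported enrollment rates. This sketch is an illustration, not the authors' code; it uses the fact that for a binary outcome the sample variance is approximately p(1 − p), so the standard error in Equation (3.19) applies with that plug-in.

```python
import math

# Enrollment rates and group sizes reported in the Madrian and Shea study
p_in, n_in = 0.374, 4249     # "opt-in" (control) group
p_out, n_out = 0.859, 5801   # "opt-out" (treatment) group

effect = p_out - p_in        # differences-of-means estimate: 0.485
# Equation (3.19) with the Bernoulli variance p(1 - p) plugged in
se = math.sqrt(p_out * (1 - p_out) / n_out + p_in * (1 - p_in) / n_in)
ci_95 = (effect - 1.96 * se, effect + 1.96 * se)
```

This reproduces the 48.5% point estimate and, to rounding, the 46.8% to 50.2% interval quoted in the box.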
A modified version of the differences-of-means t-statistic, based on a different standard error formula, the "pooled" standard error formula, has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19), so that the two groups are denoted as m and w. The pooled variance estimator is

s²_pooled = [1/(n_m + n_w − 2)] [ Σ_{i=1}^{n_m} (Y_i − Ȳ_m)² + Σ_{i=1}^{n_w} (Y_i − Ȳ_w)² ],   (3.23)

where the first summation is for the observations in group m and the second summation is for the observations in group w. The pooled standard error of the difference in means is SE_pooled(Ȳ_m − Ȳ_w) = s_pooled × √(1/n_m + 1/n_w), and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error.

If the population distribution of Y in group m is N(μ_m, σ²_m), if the population distribution of Y in group w is N(μ_w, σ²_w), and if the two group variances are the same (that is, σ²_m = σ²_w), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with n_m + n_w − 2 degrees of freedom.

The drawback of using the pooled variance estimator s²_pooled is that it applies only if the two population variances are the same (assuming n_m ≠ n_w). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.
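The pooled formula in Equation (3.23) and the unpooled formula in Equation (3.19) are easy to compare side by side. This sketch (the function names are invented for the illustration) also exhibits the special case noted above: with equal group sizes the two standard errors coincide, while with unequal sizes and unequal variances they differ.

```python
import math
import statistics

def se_unpooled(y_m, y_w):
    """Equation (3.19): allows the two groups to have different variances."""
    return math.sqrt(statistics.variance(y_m) / len(y_m) +
                     statistics.variance(y_w) / len(y_w))

def se_pooled(y_m, y_w):
    """Equation (3.23): appropriate only if the population variances are equal."""
    n_m, n_w = len(y_m), len(y_w)
    m_bar, w_bar = statistics.fmean(y_m), statistics.fmean(y_w)
    s2_pooled = (sum((y - m_bar) ** 2 for y in y_m) +
                 sum((y - w_bar) ** 2 for y in y_w)) / (n_m + n_w - 2)
    return math.sqrt(s2_pooled) * math.sqrt(1.0 / n_m + 1.0 / n_w)
```

When the samples are equal-sized, a little algebra shows the two formulas return the same number; when sizes and variances both differ, the pooled version misstates the sampling uncertainty.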
Use of the Student t Distribution in Practice

For the problem of testing the mean of Y, the Student t distribution is applicable if the underlying population distribution of Y is normal. For economic variables, however, normal distributions are the exception (for example, see the boxes in Chapter 2, "The Distribution of Earnings in the United States in 2004" and "A Bad Day on Wall Street"). Even if the underlying data are not normally distributed, the normal approximation to the distribution of the t-statistic is valid if the sample size is large. Therefore, inferences (hypothesis tests and confidence intervals) about the mean of a distribution should be based on the large-sample normal approximation.

Even though the Student t distribution is rarely applicable in economics, some software uses the Student t distribution to compute p-values and confidence intervals. In practice, this does not pose a problem because the difference between the Student t distribution and the standard normal distribution is negligible if the sample size is large. For n > 15, the difference in the p-values computed using the Student t and standard normal distributions never exceeds 0.01; for n > 80, the difference never exceeds 0.002. In most modern applications, and in all applications in this textbook, the sample sizes are in the hundreds or thousands, large enough for the difference between the Student t distribution and the standard normal distribution to be negligible.

When comparing two means, any economic reason for two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate, and the correct standard error formula, which allows for different group variances, is as given in Equation (3.19). Even if the population distributions are normal, the t-statistic computed using the standard error formula in Equation (3.19) does not have a Student t distribution. In practice, therefore, inferences about differences in means should be based on Equation (3.19), used in conjunction with the large-sample standard normal approximation.

3.7 Scatterplots, the Sample Covariance, and the Sample Correlation

What is the relationship between age and earnings? This question, like many others, relates one variable, X (age), to another, Y (earnings). This section reviews three ways to summarize the relationship between variables: the scatterplot, the sample covariance, and the sample correlation coefficient.
[Figure 3.2 - Scatterplot of Average Hourly Earnings vs. Age. Each point in the plot represents the age and average hourly earnings of one of the 200 workers in the sample. The colored dot corresponds to a 40-year-old worker who earns $31.25 per hour. The data are for technicians in the information industry from the March 2005 CPS.]

The scatterplot shows a positive relationship between age and earnings in this sample: Older workers tend to earn more than younger workers. This relationship is not exact, however, and earnings could not be predicted perfectly using only a person's age.
Sample Covariance and Correlation

The covariance and correlation were introduced in Section 2.3 as two properties of the joint probability distribution of the random variables X and Y. Because the population distribution is unknown, in practice we do not know the population covariance or correlation. The population covariance and correlation can, however, be estimated by taking a random sample of n members of the population and collecting the data (X_i, Y_i), i = 1, ..., n.

The sample covariance and correlation are estimators of the population covariance and correlation. Like the estimators discussed previously in this chapter, they are computed by replacing a population average (the expectation) with a sample average. The sample covariance, denoted s_XY, is

s_XY = [1/(n − 1)] Σ (X_i − X̄)(Y_i − Ȳ),  (3.24)

where the sum runs from i = 1 to n. Like the sample variance, the average in Equation (3.24) is computed by dividing by n − 1 instead of n; here, too, this difference stems from using X̄ and Ȳ to estimate the respective population means. When n is large, it makes little difference whether division is by n or n − 1.
The sample correlation coefficient, or sample correlation, is denoted r_XY and is the ratio of the sample covariance to the sample standard deviations:

r_XY = s_XY / (s_X s_Y).  (3.25)

The sample correlation measures the strength of the linear association between X and Y in a sample of n observations. Like the population correlation, the sample correlation is unitless and lies between −1 and 1: |r_XY| ≤ 1.

The sample correlation equals 1 if X_i = Y_i for all i and equals −1 if X_i = −Y_i for all i. More generally, the correlation is ±1 if the scatterplot is a straight line. If the line slopes upward, then there is a positive relationship between X and Y and the correlation is 1; if the line slopes down, then there is a negative relationship and the correlation is −1. The closer the scatterplot is to a straight line, the closer is the correlation to ±1. A high correlation coefficient does not necessarily mean that the line has a steep slope; rather, it means that the points in the scatterplot fall very close to a straight line.
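Equations (3.24) and (3.25) translate directly into code. The following sketch is illustrative only (Python, with made-up data; the function names are our own). It also confirms the claim above that an exactly upward-sloping line gives r = 1 and an exactly downward-sloping line gives r = −1:

```python
def sample_covariance(x, y):
    """Equation (3.24): average of cross products, dividing by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Equation (3.25): sample covariance divided by the sample standard deviations."""
    sd = lambda v: sample_covariance(v, v) ** 0.5  # s_X is the square root of s_XX
    return sample_covariance(x, y) / (sd(x) * sd(y))

x = [1.0, 2.0, 3.0, 4.0]
r_up = sample_correlation(x, [2.0 * xi for xi in x])     # upward-sloping line: r = 1
r_down = sample_correlation(x, [-2.0 * xi for xi in x])  # downward-sloping line: r = -1
```

Note that the steepness of the line (the factor 2 here) does not matter; only the tightness of the linear relationship does.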
Consistency of the sample covariance and correlation. Like the sample variance, the sample covariance is consistent. That is,

s_XY →p σ_XY.  (3.26)

In other words, in large samples the sample covariance is close to the population covariance with high probability. The proof of the result in Equation (3.26), under the assumption that (X_i, Y_i) are i.i.d. and that X_i and Y_i have finite fourth moments, is similar to the proof in Appendix 3.3 that the sample variance is consistent, and is left as an exercise (Exercise 3.20). Because the sample variance and sample covariance are consistent, the sample correlation coefficient is consistent as well; that is, r_XY →p corr(X_i, Y_i).
Example. As an example, consider the data on age and earnings in Figure 3.2. For these 200 workers, the sample standard deviation of age is s_X = 10.75 years and the sample standard deviation of earnings is s_Y = $13.79 per hour. The sample covariance between age and earnings is s_XY = 37.01 (the units are years × dollars per hour, not readily interpretable). Thus the sample correlation coefficient is r_XY = 37.01/(10.75 × 13.79) = 0.25, or 25%. The correlation of 0.25 means that there is a positive relationship between age and earnings, but as is evident in the scatterplot, this relationship is far from perfect.

To verify that the correlation does not depend on the units of measurement, suppose that earnings had been reported in cents, in which case the sample standard deviation of earnings is 1379 cents per hour and the covariance between age and earnings is 3701 (units are years × cents per hour); then the correlation is 3701/(10.75 × 1379) = 0.25, or 25%.

Figure 3.3 gives additional examples of scatterplots and correlation. Figure 3.3a shows a strong positive linear relationship between these variables. Figure 3.3b shows a strong negative relationship, with a sample correlation of −0.8. Figure 3.3c shows a scatterplot with no evident relationship, and the sample correlation is zero. Figure 3.3d shows a clear relationship: As X increases, Y initially increases but then decreases. Despite this discernable relationship between X and Y, the sample correlation is zero; the reason is that, for these data, small values of Y are associated with both large and small values of X. There is a relationship in Figure 3.3d, but it is not linear. This final example emphasizes an important point: The correlation coefficient is a measure of linear association.
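Two claims in this passage, that the correlation is unchanged by a change of units and that it measures only linear association, are easy to verify numerically. In the sketch below (Python, illustrative only; the small age/earnings sample is invented for the illustration, not the CPS data of Figure 3.2):

```python
def corr(x, y):
    """Sample correlation r_XY = s_XY / (s_X s_Y); the 1/(n-1) factors cancel."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# 1) Changing units leaves r unchanged: dollars per hour -> cents per hour.
ages = [25.0, 32.0, 40.0, 47.0, 55.0]          # invented data
earnings = [12.5, 18.0, 31.25, 27.0, 35.0]     # dollars per hour
r_dollars = corr(ages, earnings)
r_cents = corr(ages, [100.0 * e for e in earnings])  # same workers, cents per hour

# 2) r measures *linear* association only: a symmetric quadratic gives r = 0
#    even though Y is an exact function of X, as in Figure 3.3d.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
r_quad = corr(xs, [x * x for x in xs])
```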
[Figure 3.3 - Scatterplots for Four Hypothetical Data Sets: (a) correlation = +0.9; (b) correlation = −0.8; (c) correlation = 0.0; (d) correlation = 0.0 (quadratic). The scatterplots in Figures 3.3a and 3.3b show strong linear relationships between X and Y. In Figure 3.3c, X is independent of Y and the two variables are uncorrelated. In Figure 3.3d, the two variables also are uncorrelated even though they are related nonlinearly.]

Summary

1. The sample average, Ȳ, is an estimator of the population mean, μ_Y. When Y_1, ..., Y_n are i.i.d.,
a. the sampling distribution of Ȳ has mean μ_Y and variance σ²_Ȳ = σ²_Y/n;
b. Ȳ is unbiased;
c. by the law of large numbers, Ȳ is consistent; and
d. by the central limit theorem, Ȳ has an approximately normal sampling distribution when the sample size is large.
ample covariance (9~ ) sample currelation coeffieienl (sample correhlli<1I1) (94) eau~al .: ~landaf(1 de"iation (75) degrees of frecdom (75) .'a tblic...lard error of 1I. The sample correla lio n cue[ficic nl is an estimator of Ihe population correlation coe ffi cient and measures the lini:!ur relationship be tween IWO varia blesthat ill. 2.. by the central limit theorem.n cslim. 'i"he ' lltatislic is ust!o 10 test the null hypolht!sis Ihal the popuilltion mean takes on a part icular value.ht line. If II is large. The (stat islic can ~ used to calculate the pva lue associatt:d wit h the nu ll hypoth esis. and efficiency (68) critical value (79 ) rejection region (79) . con<:i<. A 95°/() con fidence in terval for J. 4 . how well their scallcrplot is approximated by a st ra ig.Key Terms 91 c.Iam.ly is a n interval conslructr:u so thot 11 conlainb the true value of p. 3.te ncy.'I.s i~l c nl: and d .Ieeeptanee region (79) size o r a Ic<:1 (79) powc r (79) onellided al terna tive hypothesis (80) confi d e llc~ sct (81) confidencc Ic\'el (S I) confi de nce interval (HI) coveragc prohahility (83) t~ 1 for t he difrerencc hclween two means BLUE (69) lea<:t <. 6.. Y has an approximatcly normal samrling distribution when the <.a5.ample size il> largc. H ypolhesis lesls ami confidence intervals fo r Ihe difference in the means of twO popu lnt ions arc conceptually similar to tests and intervals for the llli:!aD of a sin gle popu la tio n. Key Terms e"timator (68 ) cS li ma t~ (68) l.lior (76) I~talistic «(ralio) (77) te". by the law of large Dumber~ Y is cOIl. !. the 'statistic has a standard normal sampling dis tril:>ulion when the null hypothesis is true.y in 95% of rcpealed samples. 5. A sma ll p va rue is evide nce Ihrl t th e null hypothesis is falsc.q uares estima tor (70) hypothesis test (71) null and a lternative hypotheses (72) twol>ided alternati\'e hypothesis (72) I)value (significa nce pro ha hilily) (73) "ample variance (75) "ampit.. 
test statistic (77), type I error (79), type II error (79), significance level (79), critical value (79), rejection region (79), acceptance region (79), size of a test (79), power (79), one-sided alternative hypothesis (80), confidence set (81), confidence level (81), confidence interval (81), coverage probability (83), test for the difference between two means (85), causal effect (85), treatment effect (85), scatterplot (93), sample covariance (94), sample correlation coefficient (sample correlation) (94)
Review the Concepts

3.1 Explain the difference between the sample average Ȳ and the population mean.
3.2 Explain the difference between an estimator and an estimate. Provide an example of each.
3.3 A population distribution has a mean of 10 and a variance of 16. Determine the mean and variance of Ȳ from an i.i.d. sample from this population for (a) n = 10; (b) n = 100; and (c) n = 1000. Relate your answers to the law of large numbers.
3.4 What role does the central limit theorem play in statistical hypothesis testing? In the construction of confidence intervals?
3.5 What is the difference between a null and alternative hypothesis? Among size, significance level, and power? Between a one-sided and two-sided alternative hypothesis?
3.6 Why does a confidence interval contain more information than the result of a single hypothesis test?
3.7 Explain why the differences-of-means estimator, applied to data from a randomized controlled experiment, is an estimator of the treatment effect.
3.8 Sketch a hypothetical scatterplot for a sample of size 10 for two random variables with a population correlation of (a) 1.0; (b) −1.0; (c) 0.9; (d) −0.5; (e) 0.0.

Exercises

3.1 In a population, μ_Y = 100 and σ²_Y = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Ȳ < 101).
b. In a random sample of size n = 165, find Pr(Ȳ > 98).
c. In a random sample of size n = 64, find Pr(101 < Ȳ < 103).
3.2 Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y_1, ..., Y_n be i.i.d. draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample.
a. Show that p̂ = Ȳ.
b. Show that p̂ is an unbiased estimator of p.
c. Show that var(p̂) = p(1 − p)/n.
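For calculations in the style of Exercise 3.1, the central limit theorem says Ȳ is approximately N(μ_Y, σ²_Y/n), so each probability reduces to an evaluation of the standard normal CDF. The sketch below (Python, for illustration; not part of the text) works through the three parts:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, var = 100.0, 43.0  # population mean and variance from Exercise 3.1

def pr_mean_below(c, n):
    """Pr(Ybar < c) under the CLT approximation Ybar ~ N(mu, var/n)."""
    return phi((c - mu) / math.sqrt(var / n))

p_a = pr_mean_below(101, 100)                          # Pr(Ybar < 101), n = 100
p_b = 1.0 - pr_mean_below(98, 165)                     # Pr(Ybar > 98),  n = 165
p_c = pr_mean_below(103, 64) - pr_mean_below(101, 64)  # Pr(101 < Ybar < 103), n = 64
```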
0. Construct a 99% confidence interval for p.5 vs. nod let p be the fractio n of survey respondents who pre ferred the incumben t.3 In <t survey of 400 like ly vOIers. HI:p > 0.S uppose that yo u decide torejeci Ho if 1 ft . c.5 vs. Test Jio:p = 0. p denote the frac 3.S? d. HI: P j:.53. a. What is the p. HI :P > 0.5 Ys. H r: p "* 0. Construct a ~5<J/o confidence interval for p.02.5 vs. p= 0.54 . 3. What is the size of Ihis test? ii.51 > 0.Exercises 99 3. a. Let p denote the unction of ali likdy voters who preferred the incumbent at the time of the survey. and lei tion of Yoters in the sample who prefer Candidate A. In the survey j. Construct a 95% confidence interval for p.50 vs. .5 A survey of 105S registered voters is conducted . Let p denote Ihe fraction of voters in the popu lation who prefer candidate A .4 Using the data in Exercise 3. Use the estimator of the voriance of p. HI: P 0. iv.3: a.0(1 . 215 responded tbal tbey w(luld vote for the incumbent and 185 responded that they would yote for the challenge r. Construct a 50% confidence interval for p. . i. HI:P :#0. ii. Without doing any additional calculations.p)ln. Construct a 99% confidence inte rval for p. ={:.50 at the 5% significance level. an d the votcrs are asked to choose between candidate A and candidate B. b. b.5. Did the survey contain statistica lly significant evidence that the incum bent was ahe<ld of {he challenger at the time of the survey? E:\ plain. ". Why is the iolerval in ( b) wider than the in terva l in (a)? d. to calculate the standard error of your estimator.5 using it 5% significance level.value for (he test Ho:p = 0.5 using a 5% significance level. b. O. Test Ho:p = 0. test the hypothesis Ho: P = 0. You arc interested in the competing hypotheses: Ho: p = 0. Compute the power of this test if P = 0.5? e. iii. Why do th e rc~ ults from (c) and (d) differ? f. What is the pva lue for the leS1 Ho: P = 0. Use the sur vey results to estimate p. c.5 vs.
c. Suppose that the survey is carried out 20 times, using independently selected voters in each survey. For each of these 20 surveys, a 95% confidence interval for p is constructed.
i. What is the probability that the true value of p is contained in all 20 of these confidence intervals?
ii. How many of these confidence intervals do you expect to contain the true value of p?
d. In survey jargon, the "margin of error" is 1.96 × SE(p̂); that is, it is half the length of the 95% confidence interval. Suppose you wanted to design a survey that had a margin of error of at most 1%; that is, you want Pr(|p̂ − p| > 0.01) ≤ 0.05. How large should n be if the survey uses simple random sampling?
3.6 Let Y_1, ..., Y_n be i.i.d. draws from a distribution with mean μ. A test of H0: μ = 5 versus H1: μ ≠ 5 using the usual t-statistic yields a p-value of 0.03.
a. Does the 95% confidence interval contain μ = 5? Explain.
b. Can you determine if μ = 6 is contained in the 95% confidence interval? Explain.
3.7 In a given population, 11% of the likely voters are African American. A survey using a simple random sample of 600 land-line telephone numbers finds 8% African Americans. Is there evidence that the survey is biased? Explain.
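The survey-proportion calculations in Exercises 3.3 through 3.5 follow one pattern: estimate p̂, attach the standard error sqrt(p̂(1 − p̂)/n), and form t-statistics and confidence intervals. A sketch of the pattern (Python, illustrative only), using the numbers from Exercise 3.3 and the margin-of-error design question from Exercise 3.5(d):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 3.3: 215 of 400 likely voters favor the incumbent.
n, successes = 400, 215
p_hat = successes / n                          # 0.5375
se = math.sqrt(p_hat * (1.0 - p_hat) / n)      # estimated SE of p_hat
t = (p_hat - 0.5) / se                         # t-statistic for H0: p = 0.5
p_two_sided = 2.0 * (1.0 - phi(abs(t)))
ci_95 = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Exercise 3.5(d): smallest n with margin of error 1.96 * SE(p_hat) <= 1%,
# using the worst case p = 0.5 (which maximizes p(1 - p)). The bound binds
# with equality here, so rounding the exact solution suffices.
n_required = round((1.96 * 0.5 / 0.01) ** 2)   # 9604
```

The two-sided p-value comes out to roughly 0.13, so the 215-to-185 split is not statistically significant evidence that the incumbent was ahead.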
3.8 A new version of the SAT test is given to 1000 randomly selected high school seniors. The sample mean test score is 1110 and the sample standard deviation is 123. Construct a 95% confidence interval for the population mean test score for high school seniors.
3.9 Suppose that a lightbulb manufacturing plant produces bulbs with a mean life of 2000 hours and a standard deviation of 200 hours. An inventor claims to have developed an improved process that produces bulbs with a longer mean life and the same standard deviation. The plant manager randomly selects 100 bulbs produced by the process. She says that she will believe the inventor's claim if the sample mean life of the bulbs is greater than 2100 hours; otherwise, she will conclude that the new process is no better than the old process. Let μ denote the mean of the new process. Consider the null and alternative hypotheses H0: μ = 2000 vs. H1: μ > 2000.
a. What is the size of the plant manager's testing procedure?
b. Suppose the new process is in fact better and has a mean bulb life of 2150 hours. What is the power of the plant manager's testing procedure?
c. What testing procedure should the plant manager use if she wants the size of her test to be 5%?
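The size and power asked for in Exercise 3.9 are normal-tail probabilities for the sample mean, whose standard error is σ/√n = 200/√100 = 20 hours. A sketch of the two computations (Python, illustrative only, using the exercise's numbers):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma, n = 200.0, 100
se = sigma / math.sqrt(n)   # SE of the sample mean = 20 hours

# Size: probability of (wrongly) accepting the inventor's claim when mu = 2000.
size = 1.0 - phi((2100.0 - 2000.0) / se)   # Pr(Ybar > 2100 | mu = 2000)

# Power: probability of accepting the claim when the new process truly has mu = 2150.
power = 1.0 - phi((2100.0 - 2150.0) / se)  # Pr(Ybar > 2100 | mu = 2150)
```

The 2100-hour cutoff sits five standard errors above 2000, so the size is minuscule, while the power against μ = 2150 is about 0.99.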
3.10 Suppose a new standardized test is given to 100 randomly selected third-grade students in New Jersey. The sample average score Ȳ on the test is 58 points and the sample standard deviation, s_Y, is 8 points.
a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95% confidence interval for the mean score of all New Jersey third graders.
b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a sample average of 62 points and a sample standard deviation of 11 points. Construct a 90% confidence interval for the difference in mean scores between Iowa and New Jersey.
c. Can you conclude with a high degree of confidence that the population means for Iowa and New Jersey students are different? (What is the standard error of the difference in the two sample means? What is the p-value of the test of no difference in means versus some difference?)
3.11 Consider the estimator Ỹ defined in Equation (3.1). Show that (a) E(Ỹ) = μ_Y and (b) var(Ỹ) = 1.25σ²_Y/n.
3.12 To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries follows:

          Average Salary (Ȳ)   Standard Deviation (s_Y)   n
Men       $3100                $200                       100
Women     $2900                $320                       64

a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypothesis; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and finally, use the p-value to answer the question.)
b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.
3.13 Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation s_Y = 19.5.
a. Construct a 95% confidence interval for the mean test score in the population.
b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

Class Size   Average Score (Ȳ)   Standard Deviation (s_Y)   n
Small        657.4               19.4                       238
Large        650.0               17.9                       182

Is there statistically significant evidence that the districts with smaller classes have higher average test scores? Explain.
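The t-statistic asked for in Exercise 3.12 uses the standard error formula of Equation (3.19), which keeps each group's own variance rather than pooling. A sketch of the computation (Python, illustrative only; the same pattern applies to the class-size comparison in Exercise 3.13(b)):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 3.12 summary statistics.
mean_m, sd_m, n_m = 3100.0, 200.0, 100
mean_w, sd_w, n_w = 2900.0, 320.0, 64

# Standard error of the difference in means, Equation (3.19):
# each group keeps its own variance, with no pooling.
se_diff = math.sqrt(sd_m**2 / n_m + sd_w**2 / n_w)
t = (mean_m - mean_w) / se_diff
p_value = 2.0 * (1.0 - phi(abs(t)))
```

The t-statistic is about 4.5, so the $200 gap is highly statistically significant; whether it reflects discrimination is a separate question, which part (b) of the exercise probes.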
3.14 Values of height in inches (X) and weight in pounds (Y) are recorded from a sample of 300 male college students. The resulting summary statistics are X̄ = 70.5 inches, Ȳ = 158 lbs, s_X = 1.8 inches, s_Y = 14.2 lbs, s_XY = 21.73 inches × lbs, and r_XY = 0.85. Convert these statistics to the metric system (meters and kilograms).
3.15 The CNN/USA Today/Gallup poll conducted on September 3-5, 2004, surveyed 755 likely voters; 405 reported a preference for President George W. Bush, and 350 reported a preference for Senator John Kerry. The CNN/USA Today/Gallup poll conducted on October 1-3, 2004, surveyed 756 likely voters; 378 reported a preference for Bush, and 378 reported a preference for Kerry.
a. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early September 2004.
b. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early October 2004.
c. Was there a statistically significant change in voters' opinions across the two dates?
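Part (c) of Exercise 3.15 compares two independently estimated proportions, so the variance of the difference is the sum of the two sampling variances. A sketch of the computation (Python, illustrative only):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# September poll: 405 of 755 for Bush; October poll: 378 of 756.
p_sep, n_sep = 405 / 755, 755
p_oct, n_oct = 378 / 756, 756

# Independent samples, so the variances of the two estimators add.
se_diff = math.sqrt(p_sep * (1 - p_sep) / n_sep + p_oct * (1 - p_oct) / n_oct)
t = (p_sep - p_oct) / se_diff
p_value = 2.0 * (1.0 - phi(abs(t)))   # two-sided test of "no change in opinion"
```

The 3.6-percentage-point drop gives a t-statistic of about 1.4 and a two-sided p-value near 0.16, so the apparent change is not statistically significant at conventional levels.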
3.16 Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida; in this sample, the mean is 1013 and the standard deviation (s) is 108.
a. Construct a 95% confidence interval for the average test score for Florida students.
b. Is there statistically significant evidence that Florida students perform differently than other students in the United States?
c. Another 503 students are selected at random from Florida. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.
i. Construct a 95% confidence interval for the change in average test score associated with the prep course.
ii. Is there statistically significant evidence that the prep course helped?
d. The original 453 students are given the prep course and then asked to take the test a second time. The average change in their test scores is 9 points, and the standard deviation of the change is 60 points.
i. Construct a 95% confidence interval for the change in average test scores.
ii. Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
iii. Students may have performed better in their second attempt because of the prep course or because they gained test-taking experience in their first attempt. Describe an experiment that would quantify these two effects.
3.17 Read the box "The Gender Gap in Earnings of College Graduates in the United States."
a. Construct a 95% confidence interval for the change in men's average hourly earnings between 1992 and 2004.
b. Construct a 95% confidence interval for the change in women's average hourly earnings between 1992 and 2004.
c. Construct a 95% confidence interval for the change in the gender gap in average hourly earnings between 1992 and 2004. (Hint: Ȳ_m,1992 − Ȳ_w,1992 is independent of Ȳ_m,2004 − Ȳ_w,2004.)
3.18 This exercise shows that the sample variance is an unbiased estimator of the population variance when Y_1, ..., Y_n are i.i.d. with mean μ_Y and variance σ²_Y.
a. Use Equation (2.31) to show that E[(Y_i − Ȳ)²] = var(Y_i) − 2cov(Y_i, Ȳ) + var(Ȳ).
b. Use Equation (2.33) to show that cov(Ȳ, Y_i) = σ²_Y/n.
c. Use the results in parts (a) and (b) to show that E(s²_Y) = σ²_Y.
3.19 a. Ȳ is an unbiased estimator of μ_Y. Is Ȳ² an unbiased estimator of μ²_Y?
b. Ȳ is a consistent estimator of μ_Y. Is Ȳ² a consistent estimator of μ²_Y?
3.20 Suppose that (X_i, Y_i) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance, that is, s_XY →p σ_XY, where s_XY is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy-Schwarz inequality.)
3.21 Show that the pooled standard error [SE_pooled(Ȳ_m − Ȳ_w)] given following Equation (3.23) equals the usual standard error for the difference in means in Equation (3.19) when the two group sizes are the same (n_m = n_w).
Empirical Exercise

E3.1 On the text Web site you will find a data file CPS92_04 that contains an extended version of the dataset used in Table 3.1 of the text for the years 1992 and 2004. It contains data on full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS92_04_Description, available on the Web site. Use these data to answer the following questions.
a. Compute the sample mean for average hourly earnings (AHE) in 1992 and in 2004. Construct a 95% confidence interval for the population means of AHE in 1992 and 2004 and for the change between 1992 and 2004.
b. In 2004, the value of the Consumer Price Index (CPI) was 188.9. In 1992, the value of the CPI was 140.3. Repeat (a), but use AHE measured in real 2004 dollars ($2004); that is, adjust the 1992 data for the price inflation that occurred between 1992 and 2004.
c. If you were interested in the change in workers' purchasing power from 1992 to 2004, would you use the results from (a) or from (b)? Explain.
d. Use the 2004 data to construct a 95% confidence interval for the mean of AHE for high school graduates. Construct a 95% confidence interval for the mean of AHE for workers with a college degree. Construct a 95% confidence interval for the difference between the two means.
e. Repeat (d) using the 1992 data expressed in $2004.
f. Did real (inflation-adjusted) wages of high school graduates increase from 1992 to 2004? Explain. Did real wages of college graduates increase? Did the gap between earnings of college and high school graduates increase? Explain, using appropriate estimates, confidence intervals, and test statistics.
g. Table 3.1 presents information on the gender gap for college graduates. Prepare a similar table for high school graduates using the 1992 and 2004 data. Are there any notable differences between the results for high school and college graduates?

APPENDIX 3.1
The U.S. Current Population Survey
Each month the Bureau of Labor Statistics in the U.S. Department of Labor conducts the Current Population Survey (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. More than 50,000 U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database of addresses from the most recent decennial census, augmented with data on new housing units constructed after the last census. The exact random sampling scheme is rather complicated (first, small geographical areas are randomly selected, then housing units within these areas are randomly selected); details can be found in the Handbook of Labor Statistics and on the Bureau of Labor Statistics Web site (www.bls.gov).

The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The statistics in Table 3.1 were computed using the March surveys. The CPS earnings data are for full-time workers, defined to be somebody employed more than 35 hours per week for at least 48 weeks in the previous year.
APPENDIX 3.2
Two Proofs That Ȳ Is the Least Squares Estimator of μ_Y

This appendix provides two proofs, one using calculus and one not, that Ȳ minimizes the sum of squared prediction mistakes in Equation (3.2), that is, that Ȳ is the least squares estimator of E(Y).

Calculus Proof

To minimize the sum of squared prediction mistakes, take its derivative and set it to zero:

(d/dm) Σ (Y_i − m)² = −2 Σ (Y_i − m) = −2 Σ Y_i + 2nm = 0,  (3.27)

where the sums run from i = 1 to n. Solving the final equation for m shows that Σ (Y_i − m)² is minimized when m = Ȳ, so that Ȳ is the least squares estimator of E(Y).

Noncalculus Proof

The strategy is to show that the difference between the least squares estimator and Ȳ must be zero, from which it follows that Ȳ is the least squares estimator. Let d = Ȳ − m, so that m = Ȳ − d. Then (Y_i − m)² = (Y_i − [Ȳ − d])² = ([Y_i − Ȳ] + d)² = (Y_i − Ȳ)² + 2d(Y_i − Ȳ) + d². Thus the sum of squared prediction mistakes is

Σ (Y_i − m)² = Σ (Y_i − Ȳ)² + 2d Σ (Y_i − Ȳ) + nd² = Σ (Y_i − Ȳ)² + nd²,  (3.28)

where the second equality uses the fact that Σ (Y_i − Ȳ) = 0. Because both terms in the final line of Equation (3.28) are nonnegative and because the first term does not depend on d, Σ (Y_i − m)² is minimized by choosing d to make the second term, nd², as small as possible. This is done by setting d = 0, that is, by setting m = Ȳ, so that Ȳ is the least squares estimator of E(Y).
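Equation (3.28) can also be checked numerically: for any candidate prediction m = Ȳ − d, the sum of squared mistakes exceeds its value at Ȳ by exactly nd². A sketch (Python, with a small invented sample; not part of the text):

```python
def ssr(y, m):
    """Sum of squared prediction mistakes when every Y_i is predicted by m."""
    return sum((yi - m) ** 2 for yi in y)

y = [2.0, 5.0, 1.0, 8.0, 4.0]   # invented data
n = len(y)
ybar = sum(y) / n                # 4.0

# Scan a grid of candidate predictions; none beats m = ybar, and each
# candidate loses by exactly n * d**2, as Equation (3.28) says.
for m in [ybar + d / 10.0 for d in range(-30, 31)]:
    d = ybar - m
    assert ssr(y, m) >= ssr(y, ybar)
    assert abs(ssr(y, m) - (ssr(y, ybar) + n * d * d)) < 1e-9
```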
APPENDIX 3.3
A Proof That the Sample Variance Is Consistent

This appendix uses the law of large numbers to prove that the sample variance s²_Y is a consistent estimator of the population variance σ²_Y, as stated in Equation (3.9), when Y_1, ..., Y_n are i.i.d. and E(Y_i⁴) < ∞.

First, add and subtract μ_Y to write (Y_i − Ȳ)² = ([Y_i − μ_Y] − [Ȳ − μ_Y])² = (Y_i − μ_Y)² − 2(Y_i − μ_Y)(Ȳ − μ_Y) + (Ȳ − μ_Y)². Substituting this expression for (Y_i − Ȳ)² into the definition of s²_Y [Equation (3.7)] and collecting terms, we have that

s²_Y = [1/(n − 1)] Σ (Y_i − Ȳ)² = [n/(n − 1)] [(1/n) Σ (Y_i − μ_Y)²] − [n/(n − 1)] (Ȳ − μ_Y)²,  (3.29)

where the final equality follows from the definition of Ȳ [which implies that Σ (Y_i − μ_Y) = n(Ȳ − μ_Y)].

The law of large numbers can now be applied to the two terms in the final line of Equation (3.29). Define W_i = (Y_i − μ_Y)². Now E(W_i) = σ²_Y (by the definition of the variance). Because the random variables Y_1, ..., Y_n are i.i.d., the random variables W_1, ..., W_n are i.i.d. Also, E(W_i²) = E[(Y_i − μ_Y)⁴] < ∞ because, by assumption, E(Y_i⁴) < ∞. Thus W_1, ..., W_n are i.i.d. and var(W_i) < ∞, so W_i satisfies the conditions for the law of large numbers in Key Concept 2.6 and W̄ →p E(W_i). But W̄ = (1/n) Σ (Y_i − μ_Y)² and E(W_i) = σ²_Y, so (1/n) Σ (Y_i − μ_Y)² →p σ²_Y. Also, n/(n − 1) → 1. Because Ȳ →p μ_Y, (Ȳ − μ_Y)² →p 0, so the second term converges in probability to zero. Combining these results yields s²_Y →p σ²_Y.
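The conclusion s²_Y →p σ²_Y can be illustrated by simulation. The sketch below (Python, illustrative only) draws i.i.d. uniform(0, 1) observations, whose population variance is 1/12, and compares the sample variance at two sample sizes:

```python
import random

random.seed(12345)  # fixed seed so the illustration is reproducible

def sample_variance(y):
    """s_Y^2 with the n - 1 divisor, as in Equation (3.7)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

true_var = 1.0 / 12.0  # population variance of a uniform(0, 1) draw
s2_small = sample_variance([random.random() for _ in range(50)])
s2_large = sample_variance([random.random() for _ in range(200000)])
```

With n = 200,000 the sample variance lands within a few thousandths of 1/12, while at n = 50 it can easily miss by much more, which is consistency at work.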
PART TWO
Fundamentals of Regression Analysis

CHAPTER 4 Linear Regression with One Regressor
CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
CHAPTER 6 Linear Regression with Multiple Regressors
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression
CHAPTER 8 Nonlinear Regression Functions
CHAPTER 9 Assessing Studies Based on Multiple Regression
CHAPTER 4
Linear Regression with One Regressor

A state implements tough new penalties on drunk drivers: What is the effect on highway fatalities? A school district cuts the size of its elementary school classes: What is the effect on its students' standardized test scores? You successfully complete one more year of college classes: What is the effect on your future earnings?

All three of these questions are about the unknown effect of changing one variable, X (X being penalties for drunk driving, class size, or years of schooling), on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X, to another, Y. This model postulates a linear relationship between X and Y; the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X, using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random sample of data on X and Y. For instance, using data on class sizes and test scores from different school districts, we show how to estimate the expected effect on test scores of reducing class sizes by, say, one student per class. The slope and the intercept of the line relating X and Y can be estimated by a method called ordinary least squares (OLS).
4.1 The Linear Regression Model

The superintendent of an elementary school district must decide whether to hire additional teachers, and she wants your advice. If she hires the teachers, she will reduce the number of students per teacher (the student-teacher ratio) by two. She faces a trade-off. Parents want smaller classes so that their children can receive more individualized attention. But hiring more teachers means spending more money, which is not to the liking of those paying the bill! So she asks you: If she cuts class sizes, what will the effect be on student performance?
and the job status or pay of some administrators can depend in part on how welllheir student s do on these tests. Sill! Caces a tradeoCe. We th t:refore sharpltn the superintendent 's question: If sh.. rearrange Equution (4.:r = 0.6. which concerned changing class size by two st udents per class.2 points as a resu lt or the redll(' (iew in class sizc!> by two siudents per class.e reduces the average class size by twO student s. ..ssSi~r = change in TestScore _ .6.
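The arithmetic of Equation (4.2) can be sketched in a few lines of code. The slope value −0.6 is the hypothetical value used in the text, not an estimate:

```python
def predicted_change(beta_class_size: float, delta_class_size: float) -> float:
    """Predicted change in test scores: Delta TestScore = beta * Delta ClassSize, Equation (4.2)."""
    return beta_class_size * delta_class_size

# Hypothetical slope of -0.6; cutting class size by two students per class:
change = predicted_change(-0.6, -2.0)
print(change)  # 1.2: test scores are predicted to rise by 1.2 points
```

Note that both the slope and the change in class size are negative here, so their product, the predicted change in scores, is positive.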
Equation (4.1) is the definition of the slope of a straight line relating test scores and class size. This straight line can be written

TestScore = β₀ + βClassSize × ClassSize,   (4.3)

where β₀ is the intercept of this straight line and, as before, βClassSize is the slope. According to Equation (4.3), if you knew β₀ and βClassSize, not only would you be able to determine the change in test scores at a district associated with a change in class size, but you also would be able to predict the average test score itself for a given class size.

When you propose Equation (4.3) to the superintendent, she tells you that something is wrong with this formulation. She points out that class size is just one of many facets of elementary education, and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers, and districts differ in the background of their students. Test scores also differ for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district's unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important factors and to introduce them explicitly into Equation (4.3) (an idea we return to in Chapter 6). For now, however, we simply lump all these "other factors" together and write the relationship for a given district as

TestScore = β₀ + βClassSize × ClassSize + other factors.   (4.4)

Thus the test score for the district is written in terms of one component, β₀ + βClassSize × ClassSize, that represents the average effect of class size on scores in the population of school districts, and a second component that represents all other factors.

Although this discussion has focused on test scores and class size, the idea expressed in Equation (4.4) is much more general, so it is useful to introduce more general notation. Suppose you have a sample of n districts. Let Yᵢ be the average test score in the iᵗʰ district, let Xᵢ be the average class size in the iᵗʰ district, and let uᵢ denote the other factors influencing the test score in the iᵗʰ district. Then Equation (4.4) can be written more generally as

Yᵢ = β₀ + β₁Xᵢ + uᵢ.   (4.5)
KEY CONCEPT 4.1
TERMINOLOGY FOR THE LINEAR REGRESSION MODEL WITH A SINGLE REGRESSOR

The linear regression model is

Yᵢ = β₀ + β₁Xᵢ + uᵢ,

where
the subscript i runs over observations, i = 1, ..., n;
Yᵢ is the dependent variable, the regressand, or simply the left-hand variable;
Xᵢ is the independent variable, the regressor, or simply the right-hand variable;
β₀ + β₁X is the population regression line or population regression function;
β₀ is the intercept of the population regression line;
β₁ is the slope of the population regression line; and
uᵢ is the error term.

Figure 4.1 summarizes the linear regression model with a single regressor for seven hypothetical observations on test scores (Y) and class size (X). The population regression line is the straight line β₀ + β₁X. The population regression line slopes down (β₁ < 0), which means that districts with lower student-teacher ratios (smaller classes) tend to have higher test scores. The intercept β₀ has a mathematical meaning as the value of the Y axis intersected by the population regression line, but, as mentioned earlier, it has no real-world meaning in this example.

Because of the other factors that determine test performance, the hypothetical observations in Figure 4.1 do not fall exactly on the population regression line. For example, the value of Y for district #1, Y₁, is above the population regression line. This means that test scores in district #1 were better than predicted by the population regression line, so the error term for that district, u₁, is positive. In contrast, Y₂ is below the population regression line, so test scores for that district were worse than predicted, and u₂ < 0.

Now return to your problem as advisor to the superintendent: What is the expected effect on test scores of reducing the student-teacher ratio by two students per teacher? The answer is easy: The expected change is (−2) × βClassSize. But what is the value of βClassSize?
FIGURE 4.1 Scatterplot of Test Score vs. Student-Teacher Ratio (Hypothetical Data)
The scatterplot shows hypothetical observations for seven school districts. The population regression line is β₀ + β₁X. The vertical distance from the iᵗʰ point to the population regression line is Yᵢ − (β₀ + β₁Xᵢ), which is the population error term uᵢ for the iᵗʰ observation.

4.2 Estimating the Coefficients of the Linear Regression Model

In a practical situation such as the application to class size and test scores, the intercept β₀ and slope β₁ of the population regression line are unknown. Therefore, we must use data to estimate the unknown slope and intercept of the population regression line.

This estimation problem is similar to others you have faced in statistics. For example, suppose you want to compare the mean earnings of men and women who recently graduated from college. Although the population mean earnings are unknown, we can estimate the population means using a random sample of male and female college graduates. Then the natural estimator of the unknown population mean earnings for women, for example, is the average earnings of the female graduates in the sample.

The same idea extends to the linear regression model. We do not know the population value of βClassSize, the slope of the unknown population regression line relating X (class size) and Y (test scores). But just as it was possible to learn about the population mean using a sample of data drawn from that population, so is it possible to learn about the population slope βClassSize using a sample of data.
The data we analyze here consist of test scores and class sizes in 1999 in 420 California school districts that serve kindergarten through eighth grade. The test score is the districtwide average of reading and math scores for fifth graders. Class size can be measured in various ways. The measure used here is one of the broadest, which is the number of students in the district divided by the number of teachers; that is, the districtwide student-teacher ratio. These data are described in more detail in Appendix 4.1.

Table 4.1 summarizes the distribution of test scores and class sizes for these 420 districts. The average student-teacher ratio is 19.6 students per teacher, and the standard deviation is 1.9 students per teacher. The 10th percentile of the distribution of the student-teacher ratio is 17.3 (that is, only 10% of districts have student-teacher ratios below 17.3), while the district at the 90th percentile has a student-teacher ratio of 21.9.

TABLE 4.1 Summary of the Distribution of Student-Teacher Ratios and Fifth-Grade Test Scores for 420 K-8 Districts in California in 1999

                        Average   Standard Deviation   10%     25%     40%     50% (median)   60%     75%     90%
Student-teacher ratio     19.6         1.9            17.3    18.6    19.3    19.7           20.1    20.9    21.9
Test score               654.2        19.1           630.4   640.0   649.1   654.5          659.4   666.7   679.1

A scatterplot of these 420 observations on test scores and the student-teacher ratio is shown in Figure 4.2. The sample correlation is −0.23, indicating a weak negative relationship between the two variables. Although larger classes in this sample tend to have lower test scores, there are other determinants of test scores that keep the observations from falling perfectly along a straight line.

Despite this low correlation, if one could somehow draw a straight line through these data, then the slope of this line would be an estimate of βClassSize based on these data. One way to draw the line would be to take out a pencil and a ruler and "eyeball" the best line you could. While this method is easy, it is very unscientific, and different people will create different estimated lines.

How, then, should you choose among the many possible lines? By far the most common way is to choose the line that produces the "least squares" fit to these data; that is, to use the ordinary least squares (OLS) estimator.
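The entries in a summary table like Table 4.1 are sample means, sample standard deviations, and percentiles. A minimal sketch of how such a summary is computed, run here on simulated student-teacher ratios rather than the actual California data (the location and scale of the simulation are merely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated student-teacher ratios; not the actual California district data
str_ratio = rng.normal(loc=19.6, scale=1.9, size=420)

mean = str_ratio.mean()
sd = str_ratio.std(ddof=1)  # sample standard deviation (n - 1 divisor)
percentiles = np.percentile(str_ratio, [10, 25, 50, 75, 90])
print(round(mean, 1), round(sd, 1), np.round(percentiles, 1))
```

On real data the same three calls (mean, standard deviation, percentiles) produce every cell of a row of Table 4.1.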
FIGURE 4.2 Scatterplot of Test Score vs. Student-Teacher Ratio (California School District Data)
Data from 420 California school districts. There is a weak negative relationship between the student-teacher ratio and test scores: The sample correlation is −0.23.

The Ordinary Least Squares Estimator

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average Ȳ is the least squares estimator of the population mean, E(Y); that is, Ȳ minimizes the total squared estimation mistakes Σᵢ₌₁ⁿ(Yᵢ − m)² among all possible estimators m [see expression (3.2)].

The OLS estimator extends this idea to the linear regression model. Let b₀ and b₁ be some estimators of β₀ and β₁. The regression line based on these estimators is b₀ + b₁X, so the value of Yᵢ predicted using this line is b₀ + b₁Xᵢ. Thus the mistake made in predicting the iᵗʰ observation is Yᵢ − (b₀ + b₁Xᵢ). The sum of these squared prediction mistakes over all n observations is

Σᵢ₌₁ⁿ (Yᵢ − b₀ − b₁Xᵢ)².   (4.6)

The sum of the squared mistakes for the linear regression model in expression (4.6) is the extension of the sum of the squared mistakes for the problem of estimating the mean in expression (3.2). In fact, if there is no regressor, then b₁ does not enter expression (4.6) and the two problems are identical except for the different notation [m in expression (3.2), b₀ in expression (4.6)]. Just as there is a unique estimator, Ȳ, of the population mean, so is there a unique pair of estimators of β₀ and β₁ that minimize expression (4.6).
The estimators of the intercept and slope that minimize the sum of squared mistakes in expression (4.6) are called the ordinary least squares (OLS) estimators of β₀ and β₁. OLS has its own special notation and terminology. The OLS estimator of β₀ is denoted β̂₀, and the OLS estimator of β₁ is denoted β̂₁. The OLS regression line is the straight line constructed using the OLS estimators: β̂₀ + β̂₁X. The predicted value of Yᵢ given Xᵢ, based on the OLS regression line, is Ŷᵢ = β̂₀ + β̂₁Xᵢ. The residual for the iᵗʰ observation is the difference between Yᵢ and its predicted value: ûᵢ = Yᵢ − Ŷᵢ.

You could compute the OLS estimators β̂₀ and β̂₁ by trying different values of b₀ and b₁ repeatedly until you find those that minimize the total squared mistakes in expression (4.6); they are the least squares estimates. This method would be quite tedious, however. Fortunately, there are formulas, derived by minimizing expression (4.6) using calculus, that streamline the calculation of the OLS estimators. These formulas are implemented in virtually all statistical and spreadsheet programs, and they are derived in Appendix 4.2. The OLS formulas and terminology are collected in Key Concept 4.2.

KEY CONCEPT 4.2
THE OLS ESTIMATOR, PREDICTED VALUES, AND RESIDUALS

The OLS estimators of the slope β₁ and the intercept β₀ are

β̂₁ = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (Xᵢ − X̄)² = sXY / sX²,   (4.7)

β̂₀ = Ȳ − β̂₁X̄.   (4.8)

The OLS predicted values Ŷᵢ and residuals ûᵢ are

Ŷᵢ = β̂₀ + β̂₁Xᵢ, i = 1, ..., n,   (4.9)

ûᵢ = Yᵢ − Ŷᵢ, i = 1, ..., n.   (4.10)

The estimated intercept (β̂₀), slope (β̂₁), and residual (ûᵢ) are computed from a sample of n observations of Xᵢ and Yᵢ, i = 1, ..., n. These are estimates of the unknown true population intercept (β₀), slope (β₁), and error term (uᵢ).
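The formulas in Key Concept 4.2 translate directly into code. The sketch below implements Equations (4.7) through (4.10) on simulated data (the intercept, slope, and noise level of the simulation are made up, not the California estimates) and checks a property used later in the chapter: the OLS residuals average to zero.

```python
import numpy as np

def ols(x, y):
    """OLS estimates from Equations (4.7) and (4.8):
    beta1_hat = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2),
    beta0_hat = Ybar - beta1_hat * Xbar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

# Simulated district data (invented coefficients, not the California estimates)
rng = np.random.default_rng(1)
x = rng.uniform(14.0, 26.0, size=200)
y = 700.0 - 2.3 * x + rng.normal(0.0, 10.0, size=200)

beta0_hat, beta1_hat = ols(x, y)
y_hat = beta0_hat + beta1_hat * x  # predicted values, Equation (4.9)
u_hat = y - y_hat                  # residuals, Equation (4.10)
print(round(u_hat.mean(), 10))     # the OLS residuals average to zero
```

The same slope and intercept come out of any least squares routine (for example, `numpy.polyfit` with degree 1), since all of them minimize expression (4.6).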
OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio

When OLS is used to estimate a line relating the student-teacher ratio to test scores using the 420 observations in Figure 4.2, the estimated slope is −2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

T̂estScore = 698.9 − 2.28 × STR,   (4.11)

where TestScore is the average test score in the district and STR is the student-teacher ratio. The symbol "^" over TestScore in Equation (4.11) indicates that this is the predicted value based on the OLS regression line. Figure 4.3 plots this OLS regression line superimposed over the scatterplot of the data previously shown in Figure 4.2.

FIGURE 4.3 The Estimated Regression Line for the California Data
The estimated regression line shows a negative relationship between test scores and the student-teacher ratio. If class sizes fall by 1 student, the estimated regression predicts that test scores will increase by 2.28 points.

The slope of −2.28 means that an increase in the student-teacher ratio by one student per class is, on average, associated with a decline in districtwide test scores of 2.28 points on the test. A decrease in the student-teacher ratio by 2 students per class is, on average, associated with an increase in test scores of 4.56 points [= −2 × (−2.28)]. The negative slope indicates that more students per teacher (larger classes) is associated with poorer performance on the test.

It is now possible to predict the districtwide test score given a value of the student-teacher ratio. For example, for a district with 20 students per teacher, the predicted test score is 698.9 − 2.28 × 20 = 653.3. Of course, this prediction will not be exactly right because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of what test scores would be for that district, based on its student-teacher ratio, absent those other factors.

Is this estimate of the slope large or small? To answer this, we return to the superintendent's problem. Recall that she is contemplating hiring enough teachers to reduce the student-teacher ratio by 2. Suppose her district is at the median of the California districts. From Table 4.1, the median student-teacher ratio is 19.7 and the median test score is 654.5. A reduction of 2 students per class, from 19.7 to 17.7, would move her student-teacher ratio from the 50th percentile to very near the 10th percentile. This is a big change, and she would need to hire many new teachers. But how would it affect test scores?

According to Equation (4.11), cutting the student-teacher ratio by 2 is predicted to increase test scores by approximately 4.6 points; if her district's test scores are at the median, 654.5, they are predicted to increase to 659.1. Is this improvement large or small? According to Table 4.1, this improvement would move her district from the median to just short of the 60th percentile. Thus a decrease in class size that would place her district close to the 10% with the smallest classes would move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student-teacher ratio by a large amount (2 students per teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change, such as reducing the student-teacher ratio from 20 students per teacher to 5? Unfortunately, the estimates in Equation (4.11) would not be very useful to her. This regression was estimated using the data in Figure 4.2, and as the figure shows, the smallest student-teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone are not a reliable basis for predicting the effect of a radical move to such an extremely low student-teacher ratio.

Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators β̂₀ and β̂₁. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see the box), and the social sciences more generally. Presenting results using OLS (or its variants discussed later in this book) means that you are "speaking the same language" as other economists and statisticians.
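Plugging numbers into the estimated line, Equation (4.11), reproduces the predictions discussed above. A minimal sketch:

```python
def predicted_test_score(str_ratio: float) -> float:
    """Predicted districtwide test score from the estimated line in Equation (4.11)."""
    return 698.9 - 2.28 * str_ratio

# District with 20 students per teacher:
print(predicted_test_score(20))  # about 653.3

# The superintendent's plan: cut the ratio by 2, from the median of 19.7 to 17.7
gain = predicted_test_score(17.7) - predicted_test_score(19.7)
print(round(gain, 2))            # about 4.56 points
```

As the text cautions, this function is only reliable within the range of student-teacher ratios actually observed in the data (roughly 14 to 26); calling it with, say, 5 would be an extrapolation the data cannot support.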
The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.

THE "BETA" OF A STOCK

A fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Said differently, the expected return¹ on a risky investment, R, must exceed the return on a safe, or risk-free, investment, Rf. Thus the expected excess return, R − Rf, on a risky investment, like owning stock in a company, should be positive.

At first it might seem like the risk of a stock should be measured by its variance. Much of that risk, however, can be reduced by holding other stocks in a "portfolio"; in other words, by diversifying your financial holdings. This means that the right way to measure the risk of a stock is not by its variance but rather by its covariance with the market.

The capital asset pricing model (CAPM) formalizes this idea. According to the CAPM, the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that

R − Rf = β(Rm − Rf),   (4.12)

where Rm is the expected return on the market portfolio and β is the coefficient in the population regression of R − Rf on Rm − Rf. In practice, the risk-free return is often taken to be the rate of interest on short-term U.S. government debt. According to the CAPM, a stock with a β < 1 has less risk than the market portfolio and therefore has a lower expected excess return than the market portfolio. In contrast, a stock with a β > 1 is riskier than the market portfolio and thus commands a higher expected excess return.

The "beta" of a stock has become a workhorse of the investment industry, and you can obtain estimated β's for hundreds of stocks on investment firm Web sites. Those β's typically are estimated by OLS regression of the actual excess return on the stock against the actual excess return on a broad market index. Estimated β's for individual U.S. stocks, such as Kellogg (breakfast cereal), Wal-Mart (discount retailer), Waste Management (waste disposal), Sprint Nextel (telecommunications), Barnes and Noble (book retailer), Microsoft (software), Best Buy (electronic equipment retailer), and Amazon (online retailer), bear this pattern out: low-risk consumer products firms like Kellogg have stocks with low β's, while riskier stocks have high β's.

¹The return on an investment is the change in its price plus any payout (dividend) from the investment as a percentage of its initial price. For example, a stock bought on January 1 for $100, which then paid a $2.50 dividend during the year and sold on December 31 for $105, would have a return of R = [($105 − $100) + $2.50]/$100 = 7.5%.

The OLS estimators also have desirable theoretical properties. They are analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of the population mean.
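An estimated β of the kind reported on investment Web sites is just the OLS slope from regressing a stock's excess return on the market's excess return. A sketch on simulated monthly returns (the risk-free rate, the volatilities, and the "true" β of 1.5 are all invented for the simulation, not taken from any real stock):

```python
import numpy as np

rng = np.random.default_rng(2)
n_months = 120
risk_free = 0.003                                  # hypothetical monthly risk-free rate
market = risk_free + rng.normal(0.005, 0.04, size=n_months)  # simulated market returns
true_beta = 1.5                                    # invented beta for the simulation
stock = risk_free + true_beta * (market - risk_free) + rng.normal(0.0, 0.03, size=n_months)

x = market - risk_free  # market excess return, Rm - Rf
y = stock - risk_free   # stock excess return, R - Rf
# OLS slope from Equation (4.7): the estimated beta of the stock
beta_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
print(round(beta_hat, 2))  # close to the simulated beta of 1.5
```

With real data, `market` would be a broad index return and `stock` the individual stock's return over the same months; the formula is unchanged.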
Under the assumptions introduced in Section 4.4, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Section 5.5.

4.3 Measures of Fit

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The R² and the standard error of the regression measure how well the OLS regression line fits the data. The R² ranges between 0 and 1 and measures the fraction of the variance of Yᵢ that is explained by Xᵢ. The standard error of the regression measures how far Yᵢ typically is from its predicted value.

The R²

The regression R² is the fraction of the sample variance of Yᵢ explained by (or predicted by) Xᵢ. The definitions of the predicted value and the residual (see Key Concept 4.2) allow us to write the dependent variable Yᵢ as the sum of the predicted value, Ŷᵢ, plus the residual, ûᵢ:

Yᵢ = Ŷᵢ + ûᵢ.   (4.13)

In this notation, the R² is the ratio of the sample variance of Ŷᵢ to the sample variance of Yᵢ.

Mathematically, the R² can be written as the ratio of the explained sum of squares to the total sum of squares. The explained sum of squares (ESS) is the sum of squared deviations of the predicted values of Yᵢ, Ŷᵢ, from their average, and the total sum of squares (TSS) is the sum of squared deviations of Yᵢ from its average:

ESS = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²,   (4.14)

TSS = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)².   (4.15)

Equation (4.14) uses the fact that the sample average OLS predicted value equals Ȳ (proven in Appendix 4.3).
A n Rl near 1 indicales that the regressor is good at predicting Y" wh.
Because the regression errors u₁, ..., uₙ are unobserved, the SER is computed using their sample counterparts, the OLS residuals û₁, ..., ûₙ. The formula for the SER is

SER = sû, where sû² = (1/(n − 2)) Σᵢ₌₁ⁿ ûᵢ² = SSR / (n − 2),   (4.19)

where the formula for sû² uses the fact (proven in Appendix 4.3) that the sample average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the sample standard deviation of Y given in Equation (3.7) in Section 3.2, except that Yᵢ − Ȳ in Equation (3.7) is replaced by ûᵢ and the divisor in Equation (3.7) is n − 1, whereas here it is n − 2. The reason for using the divisor n − 2 here (instead of n) is the same as the reason for using the divisor n − 1 in Equation (3.7): It corrects for a slight downward bias introduced because two regression coefficients were estimated. This is called a "degrees of freedom" correction; because two coefficients were estimated (β₀ and β₁), two "degrees of freedom" of the data were lost, so the divisor in this factor is n − 2. (The mathematics behind this is discussed in Section 5.6.) When n is large, the difference between dividing by n, by n − 1, or by n − 2 is negligible.

Application to the Test Score Data

Equation (4.11) reports the regression line, estimated using the California test score data, relating the standardized test score (TestScore) to the student-teacher ratio (STR). The R² of this regression is 0.051, or 5.1%, and the SER is 18.6.

The R² of 0.051 means that the regressor STR explains 5.1% of the variance of the dependent variable TestScore. Figure 4.3 superimposes this regression line on the scatterplot of the TestScore and STR data. As the scatterplot shows, the student-teacher ratio explains some of the variation in test scores, but much variation remains unaccounted for.

The SER of 18.6 means that the standard deviation of the regression residuals is 18.6, where the units are points on the standardized test. Because the standard deviation is a measure of spread, the SER of 18.6 means that there is a large spread of the scatterplot in Figure 4.3 around the regression line as measured in points on the test. This large spread means that predictions of test scores made using only the student-teacher ratio for that district will often be wrong by a large amount.

What should we make of this low R² and large SER? The fact that the R² of this regression is low (and the SER is large) does not, by itself, imply that this regression is either "good" or "bad."
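The fit measures of this section can be sketched in code. The data below are simulated to loosely mimic a low-R², high-SER situation (the numbers produced are not the California results), and the sketch checks the identity TSS = ESS + SSR that underlies Equation (4.18):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 420
x = rng.uniform(14.0, 26.0, size=n)
y = 699.0 - 2.3 * x + rng.normal(0.0, 18.6, size=n)  # noisy simulated scores

# OLS fit via the formulas of Key Concept 4.2
beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x
u_hat = y - y_hat

tss = ((y - y.mean()) ** 2).sum()      # total sum of squares, Equation (4.15)
ess = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares, Equation (4.14)
ssr = (u_hat ** 2).sum()               # sum of squared residuals, Equation (4.17)

r2 = ess / tss                         # Equation (4.16); equals 1 - ssr/tss
ser = np.sqrt(ssr / (n - 2))           # Equation (4.19), with the n - 2 divisor
print(round(r2, 3), round(ser, 1))     # R^2 well below 1; SER near the noise s.d.
```

Because the simulated error standard deviation is 18.6, the computed SER lands close to 18.6, just as the SER of the actual test score regression estimates the spread of its residuals.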
What the low R² does tell us is that other important factors influence test scores. These factors could include differences in the student body across districts, differences in school quality unrelated to the student-teacher ratio, or luck on the test. The low R² and high SER do not tell us what these factors are, but they do indicate that the student-teacher ratio alone explains only a small part of the variation in test scores in these data.

4.4 The Least Squares Assumptions

This section presents a set of three assumptions on the linear regression model and the sampling scheme under which OLS provides an appropriate estimator of the unknown regression coefficients, β₀ and β₁. Initially these assumptions might appear abstract. They do, however, have natural interpretations, and understanding these assumptions is essential for understanding when OLS will, and will not, give useful estimates of the regression coefficients.

Assumption #1: The Conditional Distribution of uᵢ Given Xᵢ Has a Mean of Zero

The first least squares assumption is that the conditional distribution of uᵢ given Xᵢ has a mean of zero. This assumption is a formal mathematical statement about the "other factors" contained in uᵢ and asserts that these other factors are unrelated to Xᵢ in the sense that, given a value of Xᵢ, the mean of the distribution of these other factors is zero.

This is illustrated in Figure 4.4. The population regression is the relationship that holds on average between class size and test scores in the population, and the error term uᵢ represents the other factors that lead test scores at a given district to differ from the prediction based on the population regression line. As shown in Figure 4.4, at a given value of class size, say 20 students per class, sometimes these other factors lead to better performance than predicted (uᵢ > 0) and sometimes to worse performance (uᵢ < 0), but on average over the population the prediction is right. In other words, the mean of the conditional distribution of uᵢ given Xᵢ = 20 is zero. In Figure 4.4 this is shown as the distribution of uᵢ being centered on the population regression line at Xᵢ = 20 and, more generally, at other values x of Xᵢ as well; stated mathematically, E(uᵢ | Xᵢ = x) = 0 or, in somewhat simpler notation, E(uᵢ | Xᵢ) = 0.
As shown in Figure 4.4, the assumption that E(uᵢ | Xᵢ) = 0 is equivalent to assuming that the population regression line is the conditional mean of Yᵢ given Xᵢ (a mathematical proof of this is left as Exercise 4.6).

FIGURE 4.4 The Conditional Probability Distributions and the Population Regression Line
The figure shows the conditional probability distributions of test scores for districts with class sizes of 15, 20, and 25 students. The mean of the conditional distribution of test scores, given the student-teacher ratio, E(Y | X), is the population regression line β₀ + β₁X. At a given value of X, Y is distributed around the regression line and the error, u = Y − (β₀ + β₁X), has a conditional mean of zero for all values of X.

The conditional mean of u in a randomized controlled experiment. In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X = 1) or to the control group (X = 0). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that E(uᵢ | Xᵢ) = 0. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgment, and we return to this issue repeatedly.
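Assumption #1 can be illustrated by simulation: if the other factors u are generated with mean zero at every value of X, the sample analog of E(u | X = x) is close to zero at each class size, as in Figure 4.4. The population line and the error spread below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.choice([15.0, 20.0, 25.0], size=n)  # class sizes as in Figure 4.4
u = rng.normal(0.0, 10.0, size=n)           # other factors, mean zero at every x
y = 720.0 - 3.0 * x + u                     # hypothetical population regression line

# Sample analog of E(u | X = x) at each class size: all close to zero
cond_means = [u[x == v].mean() for v in (15.0, 20.0, 25.0)]
print([round(m, 2) for m in cond_means])
```

If instead u were generated to depend on x (for example, richer districts having both smaller classes and higher scores), these conditional means would drift away from zero, which is exactly the violation discussed in the observational-data paragraph above.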
Correlation and conditional mean. Recall from Section 2.3 that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)]. Thus, the conditional mean assumption E(u_i | X_i) = 0 implies that X_i and u_i are uncorrelated, or corr(X_i, u_i) = 0. Because correlation is a measure of linear association, this implication does not go the other way: even if X_i and u_i are uncorrelated, the conditional mean of u_i given X_i might be nonzero. However, if X_i and u_i are correlated, then it must be the case that E(u_i | X_i) is nonzero, so the conditional mean assumption is violated. It is therefore often convenient to discuss the conditional mean assumption in terms of possible correlation between X_i and u_i.

Assumption #2: (X_i, Y_i), i = 1, ..., n Are Independently and Identically Distributed

The second least squares assumption is that (X_i, Y_i), i = 1, ..., n are independently and identically distributed (i.i.d.) across observations. As discussed in Section 2.5 (Key Concept 2.5), this is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then (X_i, Y_i), i = 1, ..., n are i.i.d. For example, let X be the age of a worker and Y be his or her earnings, and imagine drawing a person at random from the population of workers. That randomly drawn person will have a certain age and earnings (that is, X and Y will take on some values). If a sample of n workers is drawn from this population, then (X_i, Y_i), i = 1, ..., n necessarily have the same distribution. If they are drawn at random, they are also distributed independently from one observation to the next. Thus (X_i, Y_i), i = 1, ..., n are i.i.d.

The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d. Not all sampling schemes produce i.i.d. observations, however. One example is when the values of X are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. For example, suppose a horticulturalist wants to study the effects of different organic weeding methods (X) on tomato production (Y) and accordingly grows different plots of tomatoes using different organic weeding techniques. If she picks the techniques (the level of X) to be used on the i-th plot and applies the same technique to the i-th plot in all repetitions of the experiment, then the value of X_i does not change from one sample to the next. Thus X_i is nonrandom (although the outcome Y_i is random), so the sampling scheme is not i.i.d.
The case of a nonrandom regressor is, however, quite special. The results presented in this chapter for i.i.d. regressors are also true if the regressors are nonrandom. In any event, modern experimental protocols would have the horticulturalist assign the level of X to the different plots using a computerized random number generator, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental protocol is used, the level of X is random and (X_i, Y_i) are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same unit of observation over time. For example, we might have data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where these data are collected over time from a specific firm; for example, they might be recorded four times a year (quarterly) for 30 years. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other: if interest rates are low now, they are likely to be low next quarter. This pattern of correlation violates the "independence" part of the i.i.d. assumption. Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values of X_i and/or Y_i far outside the usual range of the data, are unlikely. Large outliers can make OLS regression results misleading. In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments: 0 < E(X_i^4) < ∞ and 0 < E(Y_i^4) < ∞. Another way to state this assumption is that X and Y have finite kurtosis.

The assumption of finite kurtosis is used in the mathematics that justify the large-sample approximations to the distributions of the OLS test statistics. We encountered this assumption in Chapter 3 when discussing the consistency of the sample variance. Specifically, Equation (3.9) states that the sample variance s_Y² is a consistent estimator of the population variance σ_Y² (s_Y² →p σ_Y²). If Y_1, ..., Y_n are i.i.d. and the fourth moment of Y_i is finite, then the law of large numbers in Key Concept 2.6 applies to the average (1/n) Σ (Y_i − μ_Y)², a key step in the proof in Appendix 3.3 showing that s_Y² is consistent.

One source of large outliers is data entry errors, such as a typographical error or incorrectly using different units for different observations. Imagine collecting data on the height of students in meters but inadvertently recording one student's height in centimeters instead. This potential sensitivity of OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.
Figure 4.5: The Sensitivity of OLS to Large Outliers. This hypothetical data set has one outlier. The OLS regression line estimated with the outlier shows a strong positive relationship between X and Y, but the OLS regression line estimated without the outlier shows no relationship.

Data entry errors aside, the assumption of finite kurtosis is a plausible one in many applications with economic data. Class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right, and the worst you can do is to get all the questions wrong. Because class size and test scores have a finite range, they necessarily have finite kurtosis. More generally, commonly used distributions such as the normal distribution have four moments. Still, as a mathematical matter, some distributions have infinite fourth moments, and this assumption rules out those distributions. If this assumption holds, then it is unlikely that statistical inferences using OLS will be dominated by a few observations.

One way to find outliers is to plot your data. If you decide that an outlier is due to a data entry error, then you can either correct the error or, if that is impossible, drop the observation from your data set.

Use of the Least Squares Assumptions

The three least squares assumptions for the linear regression model are summarized in Key Concept 4.3. The least squares assumptions play twin roles, and we return to them repeatedly throughout this textbook.
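The sensitivity illustrated in Figure 4.5 is easy to reproduce numerically. The sketch below is not from the text: it uses hypothetical data in which a single observation is corrupted, as if one value had been recorded in the wrong units, and compares the OLS slope with and without that error. All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_slope(x, y):
    # beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    x_dev = x - x.mean()
    return (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()

# Hypothetical data with true slope -5
x = rng.uniform(30.0, 70.0, size=50)
y = 2000.0 - 5.0 * x + rng.normal(0.0, 30.0, size=50)
slope_clean = ols_slope(x, y)

# Corrupt one observation, as if it were entered in the wrong units
# (e.g., centimeters instead of meters); pick a point far from the mean of X.
y_bad = y.copy()
idx = np.argmax(x)
y_bad[idx] *= 100.0
slope_with_outlier = ols_slope(x, y_bad)
```

A single corrupted point moves the estimated slope far from its clean value, which is exactly why the third assumption rules out distributions prone to large outliers.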
Their first role is mathematical: if these assumptions hold, then, as shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. One reason why the first least squares assumption might not hold in practice is discussed in Chapter 6, and additional reasons are discussed in Section 9.2.

It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data.

The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

KEY CONCEPT 4.3: THE LEAST SQUARES ASSUMPTIONS

Y_i = β0 + β1X_i + u_i, i = 1, ..., n, where

1. The error term u_i has conditional mean zero given X_i: E(u_i | X_i) = 0;
2. (X_i, Y_i), i = 1, ..., n are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. Large outliers are unlikely: X_i and Y_i have nonzero finite fourth moments.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators β̂0 and β̂1 are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution, the sampling distribution, that describes the values they could take over different possible random samples.
In small samples, these distributions are complicated, but in large samples they are approximately normal because of the central limit theorem. This section presents these sampling distributions.

The Sampling Distribution of the OLS Estimators

Review of the sampling distribution of Ȳ. Recall the discussion in Sections 2.5 and 2.6 about the sampling distribution of the sample average, Ȳ, an estimator of the unknown population mean of Y, μ_Y. Because Ȳ is calculated using a randomly drawn sample, Ȳ is a random variable that takes on different values from one sample to the next; the probability of these different values is summarized in its sampling distribution. Although the sampling distribution of Ȳ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the mean of the sampling distribution is μ_Y, that is, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. If n is large, then more can be said about the sampling distribution. In particular, the central limit theorem (Section 2.6) states that this distribution is approximately normal.
The sampling distribution of β̂0 and β̂1. These ideas carry over to the OLS estimators β̂0 and β̂1 of the unknown intercept β0 and slope β1 of the population regression line. Because the OLS estimators are calculated using a random sample, β̂0 and β̂1 are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions.

Although the sampling distribution of β̂0 and β̂1 can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the means of the sampling distributions of β̂0 and β̂1 are β0 and β1; that is, under the least squares assumptions in Key Concept 4.3,

E(β̂0) = β0 and E(β̂1) = β1.   (4.20)

In other words, β̂0 and β̂1 are unbiased estimators of β0 and β1. The proof that β̂1 is unbiased is given in Appendix 4.3, and the proof that β̂0 is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling distribution of β̂0 and β̂1 is well approximated by the bivariate normal distribution (Section 2.4). This implies that the marginal distributions of β̂0 and β̂1 are normal in large samples.
KEY CONCEPT 4.4: LARGE-SAMPLE DISTRIBUTIONS OF β̂0 AND β̂1

If the least squares assumptions in Key Concept 4.3 hold, then in large samples β̂0 and β̂1 have a jointly normal sampling distribution. The large-sample normal distribution of β̂1 is N(β1, σ²_β̂1), where the variance of this distribution, σ²_β̂1, is

σ²_β̂1 = (1/n) · var[(X_i − μ_X) u_i] / [var(X_i)]².   (4.21)

The large-sample normal distribution of β̂0 is N(β0, σ²_β̂0), where

σ²_β̂0 = (1/n) · var(H_i u_i) / [E(H_i²)]², where H_i = 1 − [μ_X / E(X_i²)] X_i.   (4.22)

This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (4.7) for β̂1, you will see that it, too, is a type of average: not a simple average, like Ȳ, but an average of the product (Y_i − Ȳ)(X_i − X̄). As discussed further in Appendix 4.3, the central limit theorem applies to this average, so that, like the simpler average Ȳ, it is normally distributed in large samples. The normal approximation to the distribution of the OLS estimators in large samples is summarized in Key Concept 4.4. (Appendix 4.3 summarizes the derivation of these formulas.)

The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂0 and β̂1 will be close to the true population coefficients β0 and β1 with high probability. This is because the variances σ²_β̂0 and σ²_β̂1 of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, β0 and β1, when n is large.

A relevant question in practice is how large n must be for these approximations to be reliable. In Section 2.6 we suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.
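Two claims made above, that the OLS slope is unbiased and that its sampling variance shrinks as n grows, can be checked with a small Monte Carlo experiment. This sketch is not from the text; the population parameters, distributions, replication counts, and seed are all illustrative assumptions.

```python
import numpy as np

# Monte Carlo sketch: repeatedly draw samples from a population regression
# model satisfying the least squares assumptions and look at the resulting
# sampling distribution of the OLS slope estimator.
rng = np.random.default_rng(2)

beta0, beta1 = 5.0, 2.0

def draw_slope(n):
    x = rng.normal(10.0, 3.0, size=n)
    u = rng.normal(0.0, 4.0, size=n)   # E(u | X) = 0 by construction
    y = beta0 + beta1 * x + u
    x_dev = x - x.mean()
    return (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()

slopes_small = np.array([draw_slope(50) for _ in range(2000)])
slopes_large = np.array([draw_slope(500) for _ in range(2000)])

mean_small = slopes_small.mean()   # should be close to beta1 (unbiasedness)
sd_small = slopes_small.std()
sd_large = slopes_large.std()      # should shrink as n grows (consistency)
```

With the variance proportional to 1/n, moving from n = 50 to n = 500 should cut the standard deviation by roughly a factor of sqrt(10).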
Another implication of the distributions in Key Concept 4.4 is that, in general, the larger the variance of X_i, the smaller the variance σ²_β̂1 of β̂1. Mathematically, this arises because the variance of β̂1 in Equation (4.21) is inversely proportional to the square of the variance of X_i: the larger is var(X_i), the larger is the denominator in Equation (4.21), so the smaller is σ²_β̂1.

To get a better sense of why this is so, look at Figure 4.6, which presents a scatterplot of 150 artificial data points on X and Y. The data points indicated by the colored dots are the 75 observations closest to X̄. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots; which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of X, the more precise is β̂1.

Figure 4.6: The Variance of β̂1 and the Variance of X. The colored dots represent a set of X's with a small variance. The black dots represent a set of X's with a large variance. The regression line can be estimated more accurately with the black dots than with the colored dots.

The normal approximation to the sampling distribution of β̂0 and β̂1 is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.
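The point made by Figure 4.6, that a regressor with more spread pins down the slope more precisely, can also be seen by simulation. The sketch below is not from the text: it compares the Monte Carlo standard deviation of the OLS slope for a low-variance and a high-variance regressor, and every number in it is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def slope_sd(x_sd, reps=2000, n=100):
    # Monte Carlo standard deviation of the OLS slope when X has
    # standard deviation x_sd.
    slopes = []
    for _ in range(reps):
        x = rng.normal(100.0, x_sd, size=n)
        y = 1.5 * x + rng.normal(0.0, 5.0, size=n)
        x_dev = x - x.mean()
        slopes.append((x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum())
    return np.std(slopes)

sd_small_varx = slope_sd(x_sd=1.0)   # like the colored dots: little spread in X
sd_large_varx = slope_sd(x_sd=5.0)   # like the black dots: more spread in X
```

Since σ²_β̂1 falls with var(X_i), quintupling the standard deviation of X should cut the slope's standard deviation by roughly a factor of five.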
4.6 Conclusion

This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions. The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased. The second assumption is that (X_i, Y_i) are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator. The third assumption is that large outliers are unlikely. Stated more formally, X and Y have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂1 to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

Summary

1. The population regression line, β0 + β1X, is the mean of Y as a function of the value of X. The slope, β1, is the expected change in Y associated with a 1-unit change in X. The intercept, β0, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.

2. The population regression line can be estimated using sample observations (Y_i, X_i), i = 1, ..., n by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted β̂0 and β̂1.

3. The R² and standard error of the regression (SER) are measures of how close the values of Y_i are to the estimated regression line. The R² is between 0 and 1, with a larger value indicating that the Y_i's are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error.

4. There are three key assumptions for the linear regression model: (1) the regression errors, u_i, have a mean of zero conditional on the regressors X_i; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators β̂0 and β̂1 are (1) unbiased, (2) consistent, and (3) normally distributed when the sample is large.

Key Terms

linear regression model with a single regressor (114)
dependent variable (114)
independent variable (114)
regressor (114)
population regression line (114)
population regression function (114)
population intercept and slope (114)
population coefficients (114)
parameters (114)
error term (114)
ordinary least squares (OLS) estimator (119)
OLS regression line (119)
predicted value (119)
residual (119)
regression R² (123)
explained sum of squares (ESS) (123)
total sum of squares (TSS) (123)
sum of squared residuals (SSR) (124)
standard error of the regression (SER) (124)
least squares assumptions (126)

Review the Concepts

4.1 Explain the difference between β̂1 and β1; between the residual û_i and the regression error u_i; and between the OLS predicted value Ŷ_i and E(Y_i | X_i).

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.

4.3 Sketch a hypothetical scatterplot of data for an estimated regression with R² = 0.9. Sketch a hypothetical scatterplot of data for a regression with R² = 0.5.
Exercises

4.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore = 520.4 − 5.82 × CS, R² = 0.08, SER = 11.5.

a. A classroom has 22 students. What is the regression's prediction for that classroom's average test score?

b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression's prediction for the change in the classroom average test score?

c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)

d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the R² and SER.)

4.2 Suppose that a random sample of 200 twenty-year-old men is selected from a population and that these men's height and weight are recorded. A regression of weight on height yields

Weight = −99.41 + 3.94 × Height, R² = 0.81, SER = 10.2,

where Weight is measured in pounds and Height is measured in inches.

a. What is the regression's weight prediction for someone who is 70 inches tall? 65 inches tall? 74 inches tall?

b. A man has a late growth spurt and grows 1.5 inches over the course of a year. What is the regression's prediction for the increase in this man's weight?

c. Suppose that instead of measuring weight and height in pounds and inches, these variables are measured in centimeters and kilograms. What are the regression estimates from this new centimeter-kilogram regression? (Give all results: estimated coefficients, R², and SER.)

4.3 A regression of average weekly earnings (AWE, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

AWE = 696.7 + 9.6 × Age, R² = 0.023, SER = 624.1.

a. Explain what the coefficient values 696.7 and 9.6 mean.

b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER (dollars? years? or is SER unit-free)?

c. The regression R² is 0.023. What are the units of measurement for the R² (dollars? years? or is R² unit-free)?

d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?

e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?

f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample? (Hint: Review Key Concept 4.2.)

4.4 Read the box "The 'Beta' of a Stock" in Section 4.2.

a. Suppose that the value of β is greater than 1 for a particular stock. Show that the variance of (R − R_f) for this stock is greater than the variance of (R_m − R_f).

b. Suppose that the value of β is less than 1 for a particular stock. Is it possible that the variance of (R − R_f) for this stock is greater than the variance of (R_m − R_f)? (Hint: Don't forget the regression error.)

c. In a given year, the rate of return on 3-month Treasury bills is 3.5% and the rate of return on a large diversified portfolio of stocks (the S&P 500) is 7.3%. For each company listed in the table at the end of the box, use the estimated value of β to estimate the stock's expected rate of return.

4.5 A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Y_i denote the number of points scored on the exam by the i-th student (0 ≤ Y_i ≤ 100), let X_i denote the amount of time that the student has to complete the exam (X_i = 90 or 120), and consider the regression model Y_i = β0 + β1X_i + u_i.

a. Explain what the term u_i represents. Why will different students have different values of u_i?

b. Explain why E(u_i | X_i) = 0 for this regression model.

c. Are the other assumptions in Key Concept 4.3 satisfied? Explain.

d. The estimated regression is Y_i = 49 + 0.24 X_i.

i. Compute the estimated regression's prediction for the average score of students given 90 minutes to complete the exam; 120 minutes; and 150 minutes.

ii. Compute the estimated gain in score for a student who is given an additional 10 minutes on the exam.

4.6 Show that the first least squares assumption, E(u_i | X_i) = 0, implies that E(Y_i | X_i) = β0 + β1X_i.

4.7 Show that β̂0 is an unbiased estimator of β0. (Hint: Use the fact that β̂1 is unbiased, which is shown in Appendix 4.3.)

4.8 Suppose that all of the regression assumptions in Key Concept 4.3 are satisfied except that the first assumption is replaced with E(u_i | X_i) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is β̂1 normally distributed in large samples with mean and variance given in Key Concept 4.4? What about β̂0?)

4.9 a. A linear regression yields β̂1 = 0. Show that R² = 0.

b. A linear regression yields R² = 0. Does this imply that β̂1 = 0?

4.10 Suppose that Y_i = β0 + β1X_i + u_i, where (X_i, u_i) are i.i.d. and X_i is a Bernoulli random variable with Pr(X = 1) = 0.20. When X = 1, u_i is N(0, 4); when X = 0, u_i is N(0, 1).

a. Show that the regression satisfies the assumptions in Key Concept 4.3.

b. Derive an expression for the large-sample variance of β̂1. [Hint: Evaluate the terms in Equation (4.21).]

4.11 Consider the regression model Y_i = β0 + β1X_i + u_i.

a. Suppose you know that β0 = 0. Derive a formula for the least squares estimator of β1.

b. Suppose you know that β0 = 4. Derive a formula for the least squares estimator of β1.
H ow much do ea rnings increase as workers age by one yea r? b. 24(4): Pf'. show Ihal Rl::. leadi ng to highe r productivity and earnings. Show that the R2 from the regression of Yon X is the same as the R~ from the regression of X on y . August 2U05..c.com!stock_watso n). In this exercise yoo will investigate bow course eva luations :lre re lated to the professor's beaut )'. fuilye<lr workers.. Predict Bob's earnings using (he esti.1 for 2004. 1 A detailed description is given in Tcachi ngRat· ings_Dcscription . Alexis is a 30yea rold worker. That is. with 11 high school diploma or B.comlsfock_wafson).ity of'texas at AtlSun lind ".2 On th e text Web site (www.ilb Amy !'arke r..) n. Wha t is the esti ma ted intercept'! What is the estimated slope? Use the estimated regressio n 10 answe r this question: . Predict Alexis's earnings using the estimated regression . "Beauty in tbe Cl1IS)room: t llsiructor~' Pulehntude lind l'utJ' live Peda'O£. Empirical Exercises E4. One of tbe characteristiCS is an index of Ihe professor's " beauty" (I S rated by a panel of six judges. rho b.~. Does age account for a large fraction of the variance in earnings across individuals? Explain . (These are the same data as in CPS92_04 but an: limited to the year 2004 ) T n thili exercise you will investigate the re l ~lIion sh ir between a worker's ag~ and ea rnings.S.you will fi nd a data file CPS 04 that contains an extended version of the dat a set used in Ta ble 3. course characteri stics. also avail able on the Web site. mated regression.Economics of £dllCfJllOll Rt:v.1 On the text Web site (www. Show thai the r~gr ession Rl in the regression of Yon X is the squared va lue of the sa mple corre la tion between X a nd y.o::re u~. Bob is a 26yearold worker. It contains data for full·time .you will find a dahl fi le Teachin gRatin gs tha t contili ns data on course eva lu ation:>.aw·bc.369376.aw· bc. also ava ilable on tbe Web sil e. 
and professor characteristics for 463 courses at the Univt:f· sit y of Texas al A U51i n. age 2534. c. (Gene rally. £ 4.AJB..ree.ica1 Producli~ily.140 CHAPua" linear Regre~on with One Regressor 4.n hiS paper ". as their higheSTdeg. . A detailed description is given in CPS04_Dt:sl:ript ion . Ru n a regressio n of average hourly earn ings (A H £) on age (Age)..12 a. o lder worke rs ha\'e mort: job experience. l'I"htK dutll " ere provided by Professor Daniel Hamcrmesh of the Um.
a. Construct a scatterplot of average course evaluations (Course_Eval) on the professor's beauty (Beauty). Does there appear to be a relationship between the variables?

b. Run a regression of average course evaluations (Course_Eval) on the professor's beauty (Beauty). What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of Course_Eval. (Hint: What is the sample mean of Beauty?)

c. Professor Watson has an average value of Beauty, while Professor Stock's value of Beauty is one standard deviation above the average. Predict Professor Stock's and Professor Watson's course evaluations.

d. Comment on the size of the regression's slope. Is the estimated effect of Beauty on Course_Eval large or small? Explain what you mean by "large" and "small."

e. Does Beauty explain a large fraction of the variance in evaluations across courses? Explain.

E4.3 On the text Web site (www.aw-bc.com/stock_watson) you will find a data file CollegeDistance that contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.) A detailed description is given in CollegeDistance_Description, also available on the Web site. These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2), pp. 217-224.

a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. (For example, Dist = 2 means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?
b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?

c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.

d. What is the value of the standard error of the regression? What are the units for the standard error (meters, grams, years, dollars, or something else)?

E4.4 On the text Web site (www.aw-bc.com/stock_watson) you will find a data file Growth that contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description, also available on the Web site. In this exercise you will investigate the relationship between growth and trade. These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, "Finance and the Sources of Growth," Journal of Financial Economics, 2000, 58: 261-300.

a. Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare). Does there appear to be a relationship between the variables?

b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?

c. Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is the estimated intercept? Use the regression to predict the growth rate for a country with a trade share of 0.5 and with a trade share equal to 1.0.

d. Estimate the same regression excluding the data from Malta. Answer the same questions in (c).

e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?
APPENDIX 4.1
The California Test Score Data Set

The California Standardized Testing and Reporting data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

APPENDIX 4.2
Derivation of the OLS Estimators

This appendix uses calculus to derive the formulas for the OLS estimators given in Key Concept 4.2. To minimize the sum of squared prediction mistakes $\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2$ [Equation (4.6)], first take the partial derivatives with respect to $b_0$ and $b_1$:

$\frac{\partial}{\partial b_0}\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2 = -2\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)$ and  (4.23)

$\frac{\partial}{\partial b_1}\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2 = -2\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)X_i.$  (4.24)

The OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$, are the values of $b_0$ and $b_1$ that minimize $\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2$ or, equivalently, the values of $b_0$ and $b_1$ for which the derivatives in Equations (4.23) and (4.24) equal zero.
Accordingly, setting these derivatives equal to zero, collecting terms, and dividing by $n$ shows that the OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy the two equations

$\bar{Y} - \hat{\beta}_0 - \hat{\beta}_1\bar{X} = 0$ and  (4.25)

$\frac{1}{n}\sum_{i=1}^{n}X_i Y_i - \hat{\beta}_0\bar{X} - \hat{\beta}_1\frac{1}{n}\sum_{i=1}^{n}X_i^2 = 0.$  (4.26)

Solving this pair of equations for $\hat{\beta}_0$ and $\hat{\beta}_1$ yields

$\hat{\beta}_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n}X_i Y_i - \bar{X}\bar{Y}}{\frac{1}{n}\sum_{i=1}^{n}X_i^2 - (\bar{X})^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$ and  (4.27)

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}.$  (4.28)

Equations (4.27) and (4.28) are the formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ given in Key Concept 4.2; the formula $\hat{\beta}_1 = s_{XY}/s_X^2$ is obtained by dividing the numerator and denominator in Equation (4.27) by $n - 1$.

APPENDIX 4.3
Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator $\hat{\beta}_1$ is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

Representation of $\hat{\beta}_1$ in Terms of the Regressors and Errors

We start by providing an expression for $\hat{\beta}_1$ in terms of the regressors and errors. Because $Y_i = \beta_0 + \beta_1 X_i + u_i$, $Y_i - \bar{Y} = \beta_1(X_i - \bar{X}) + (u_i - \bar{u})$, so the numerator of the formula for $\hat{\beta}_1$ in Equation (4.27) is

$\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n}(X_i - \bar{X})\left[\beta_1(X_i - \bar{X}) + (u_i - \bar{u})\right] = \beta_1\sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}).$  (4.29)
Now $\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n}(X_i - \bar{X})u_i - \bar{u}\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n}(X_i - \bar{X})u_i$, where the final equality follows from the definition of $\bar{X}$, which implies that $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$. Substituting this into Equation (4.29) yields $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \beta_1\sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(X_i - \bar{X})u_i$. Substituting this expression in turn into the formula for $\hat{\beta}_1$ in Equation (4.27) yields

$\hat{\beta}_1 = \beta_1 + \dfrac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}.$  (4.30)

Proof That $\hat{\beta}_1$ Is Unbiased

The expectation of $\hat{\beta}_1$ is obtained by taking the expectation of both sides of Equation (4.30). Thus,

$E(\hat{\beta}_1) = \beta_1 + E\left[\dfrac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \beta_1 + E\left[\dfrac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})E(u_i \mid X_1,\ldots,X_n)}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \beta_1,$  (4.31)

where the second equality in Equation (4.31) follows by using the law of iterated expectations (Section 2.3). By the second least squares assumption, $u_i$ is distributed independently of $X$ for all observations other than $i$, so $E(u_i \mid X_1,\ldots,X_n) = E(u_i \mid X_i)$. By the first least squares assumption, however, $E(u_i \mid X_i) = 0$. It follows that the conditional expectation in large brackets in the second line of Equation (4.31) is zero, so that $E(\hat{\beta}_1 - \beta_1 \mid X_1,\ldots,X_n) = 0$. Equivalently, $\hat{\beta}_1$ is conditionally unbiased; that is, $E(\hat{\beta}_1 \mid X_1,\ldots,X_n) = \beta_1$. By the law of iterated expectations, $E(\hat{\beta}_1 - \beta_1) = E[E(\hat{\beta}_1 - \beta_1 \mid X_1,\ldots,X_n)] = 0$, so that $E(\hat{\beta}_1) = \beta_1$; that is, $\hat{\beta}_1$ is unbiased.

Large-Sample Normal Distribution of the OLS Estimator

The large-sample normal approximation to the limiting distribution of $\hat{\beta}_1$ (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).
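Both the unbiasedness result just derived and the idea of a sampling distribution can be illustrated numerically. The Python sketch below (not part of the text; the data-generating values are arbitrary choices for illustration) draws many samples from $Y_i = \beta_0 + \beta_1 X_i + u_i$ with $E(u_i \mid X_i) = 0$, computes $\hat{\beta}_1$ in each sample using Equation (4.27), and averages across samples; the average settles near the true $\beta_1$:

```python
import random

def beta1_hat(x, y):
    """OLS slope estimator, Equation (4.27)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    num = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    den = sum((xi - xb) ** 2 for xi in x)
    return num / den

def average_slope_estimate(beta0=1.0, beta1=0.5, n=50, reps=2000, seed=0):
    """Average beta1_hat over many simulated samples; approximates E(beta1_hat)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # E(u | X) = 0 by construction
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        total += beta1_hat(x, y)
    return total / reps

print(average_slope_estimate())  # settles near the true beta1 = 0.5
```

Each individual $\hat{\beta}_1$ varies from sample to sample (its sampling distribution), but the mean of those estimates is close to $\beta_1$, which is what unbiasedness asserts.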
First consider the numerator of the final term in Equation (4.30). Because $\bar{X}$ is consistent, $\bar{X}$ is nearly equal to $\mu_X$ if the sample size is large. Thus, to a close approximation, the term in the numerator of Equation (4.30) is the sample average $\bar{v}$, where $v_i = (X_i - \mu_X)u_i$. By the first least squares assumption, $v_i$ has a mean of zero. By the second least squares assumption, $v_i$ is i.i.d. The variance of $v_i$ is $\sigma_v^2 = \mathrm{var}[(X_i - \mu_X)u_i]$, which, by the third least squares assumption, is nonzero and finite. Therefore, $\bar{v}$ satisfies all the requirements of the central limit theorem (Key Concept 2.7). Thus $\bar{v}/\sigma_{\bar{v}}$ is, in large samples, distributed $N(0, 1)$, where $\sigma_{\bar{v}}^2 = \sigma_v^2/n$. Thus the distribution of $\bar{v}$ is well approximated by the $N(0, \sigma_v^2/n)$ distribution.

Next consider the expression in the denominator in Equation (4.30); this is the sample variance of $X$ (except dividing by $n$ rather than $n - 1$, which is inconsequential if $n$ is large). As discussed in Section 3.2 [Equation (3.8)], the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of $X$.

Combining these two results, we have that, in large samples, $\hat{\beta}_1 - \beta_1 \cong \bar{v}/\mathrm{var}(X_i)$, so that the sampling distribution of $\hat{\beta}_1$ is, in large samples, $N(\beta_1, \sigma^2_{\hat{\beta}_1})$, where $\sigma^2_{\hat{\beta}_1} = \mathrm{var}(\bar{v})/[\mathrm{var}(X_i)]^2 = \mathrm{var}[(X_i - \mu_X)u_i]/\{n[\mathrm{var}(X_i)]^2\}$, which is the expression in Equation (4.21).

Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy:

$\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0,$  (4.32)

$\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i = \bar{Y},$  (4.33)

$\sum_{i=1}^{n}\hat{u}_i X_i = 0$ and $s_{\hat{u}X} = 0$, and  (4.34)

$TSS = SSR + ESS.$  (4.35)

Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals $\bar{Y}$; the sample covariance $s_{\hat{u}X}$ between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares [the ESS, TSS, and SSR are defined in Equations (4.14), (4.15), and (4.17)].
To verify Equation (4.32), note that the definition of $\hat{\beta}_0$ lets us write the OLS residuals as $\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i = (Y_i - \bar{Y}) - \hat{\beta}_1(X_i - \bar{X})$; thus

$\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y}) - \hat{\beta}_1\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}).$

But the definitions of $\bar{Y}$ and $\bar{X}$ imply that $\sum_{i=1}^{n}(Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, so $\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0$.

To verify Equation (4.33), note that $Y_i = \hat{Y}_i + \hat{u}_i$, so $\sum_{i=1}^{n}Y_i = \sum_{i=1}^{n}\hat{Y}_i + \sum_{i=1}^{n}\hat{u}_i = \sum_{i=1}^{n}\hat{Y}_i$, where the second equality is a consequence of Equation (4.32).

To verify Equation (4.34), note that $\sum_{i=1}^{n}\hat{u}_i = 0$, so

$\sum_{i=1}^{n}\hat{u}_i X_i = \sum_{i=1}^{n}\hat{u}_i(X_i - \bar{X}) = \sum_{i=1}^{n}\left[(Y_i - \bar{Y}) - \hat{\beta}_1(X_i - \bar{X})\right](X_i - \bar{X}) = \sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X}) - \hat{\beta}_1\sum_{i=1}^{n}(X_i - \bar{X})^2 = 0,$  (4.36)

where the final equality in Equation (4.36) is obtained using the formula for $\hat{\beta}_1$ in Equation (4.27). This result, combined with the preceding results, implies that $s_{\hat{u}X} = 0$.

Equation (4.35) follows from the previous results and some algebra:

$TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + 2\sum_{i=1}^{n}(Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = SSR + ESS + 2\sum_{i=1}^{n}\hat{u}_i(\hat{Y}_i - \bar{Y}),$  (4.37)

and $\sum_{i=1}^{n}\hat{u}_i(\hat{Y}_i - \bar{Y}) = \sum_{i=1}^{n}\hat{u}_i(\hat{\beta}_0 + \hat{\beta}_1 X_i) - \bar{Y}\sum_{i=1}^{n}\hat{u}_i = \hat{\beta}_0\sum_{i=1}^{n}\hat{u}_i + \hat{\beta}_1\sum_{i=1}^{n}\hat{u}_i X_i - \bar{Y}\sum_{i=1}^{n}\hat{u}_i = 0$ by the previous results, so the final term in Equation (4.37) is zero and $TSS = SSR + ESS$.
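These algebraic facts hold exactly in any sample (up to floating-point rounding), so they make a useful check on any implementation of OLS. A minimal Python sketch (not part of the text; the data are made up for illustration) computes $\hat{\beta}_0$ and $\hat{\beta}_1$ from Equations (4.27) and (4.28) and evaluates the quantities in Equations (4.32) through (4.35):

```python
def ols_with_facts(x, y):
    """OLS via Equations (4.27)-(4.28), plus the quantities in (4.32)-(4.35)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
          / sum((xi - xb) ** 2 for xi in x))           # Equation (4.27)
    b0 = yb - b1 * xb                                   # Equation (4.28)
    yhat = [b0 + b1 * xi for xi in x]                   # predicted values
    uhat = [yi - yh for yi, yh in zip(y, yhat)]         # residuals
    tss = sum((yi - yb) ** 2 for yi in y)
    ess = sum((yh - yb) ** 2 for yh in yhat)
    ssr = sum(u ** 2 for u in uhat)
    return {
        "b0": b0, "b1": b1, "ybar": yb,
        "resid_mean": sum(uhat) / n,                            # (4.32): zero
        "pred_mean": sum(yhat) / n,                             # (4.33): equals ybar
        "resid_dot_x": sum(u * xi for u, xi in zip(uhat, x)),   # (4.34): zero
        "tss_minus_ssr_ess": tss - (ssr + ess),                 # (4.35): zero
    }

facts = ols_with_facts([1.0, 2.0, 4.0, 5.0], [2.0, 3.5, 3.0, 6.0])
```

Whatever data are supplied, the residual mean, the residual-regressor cross product, and the gap between TSS and SSR + ESS should all be zero to machine precision.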
CHAPTER 5
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

This chapter continues the treatment of linear regression with a single regressor. Chapter 4 explained how the OLS estimator $\hat{\beta}_1$ of the slope coefficient $\beta_1$ differs from one sample to the next, that is, how $\hat{\beta}_1$ has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to make statements about $\beta_1$ that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of $\hat{\beta}_1$. Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept), then shows how to use $\hat{\beta}_1$ and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for $\beta_1$. Section 5.3 takes up the special case of a binary regressor. Sections 5.1 through 5.3 assume that the three least squares assumptions of Chapter 4 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss-Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators. Section 5.6 discusses the distribution of the OLS estimator when the population distribution of the regression errors is normal.

5.1 Testing Hypotheses About One of the Regression Coefficients

Your client, the superintendent, calls you with a problem. She has an angry taxpayer in her office who asserts that cutting class size will not help boost test scores, so that reducing them further is a waste of money. Class size, the taxpayer claims, has no effect on test scores.

The taxpayer's claim can be rephrased in the language of regression analysis. Because the effect on test scores of a unit change in class size is $\beta_{ClassSize}$, the taxpayer is asserting that the population regression line is flat, that is, that the slope $\beta_{ClassSize}$ of the population regression line is zero. Is there, the superintendent asks, evidence in your sample of 420 observations on California school districts that this slope is nonzero? Can you reject the taxpayer's hypothesis that $\beta_{ClassSize} = 0$, or should you accept it, at least tentatively pending further new evidence?

This section discusses tests of hypotheses about the slope $\beta_1$ or intercept $\beta_0$ of the population regression line. We start by discussing two-sided tests of the slope $\beta_1$ in detail, then turn to one-sided tests and to tests of hypotheses regarding the intercept $\beta_0$.

Two-Sided Hypotheses Concerning $\beta_1$

The general approach to testing hypotheses about these coefficients is the same as to testing hypotheses about the population mean, so we begin with a brief review.

Testing hypotheses about the population mean. Recall from Section 3.2 that the null hypothesis that the mean of $Y$ is a specific value $\mu_{Y,0}$ can be written as $H_0\colon E(Y) = \mu_{Y,0}$, and the two-sided alternative is $H_1\colon E(Y) \neq \mu_{Y,0}$.

The test of the null hypothesis $H_0$ against the two-sided alternative proceeds as in the three steps summarized in Key Concept 3.6. The first is to compute the standard error of $\bar{Y}$, $SE(\bar{Y})$, which is an estimator of the standard deviation of the sampling distribution of $\bar{Y}$. The second step is to compute the t-statistic, which has the general form given in Key Concept 5.1; applied here, the t-statistic is $t = (\bar{Y} - \mu_{Y,0})/SE(\bar{Y})$.

KEY CONCEPT 5.1
GENERAL FORM OF THE t-STATISTIC

In general, the t-statistic has the form

$t = \dfrac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}.$  (5.1)

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed; equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is $2\Phi(-|t^{act}|)$, where $t^{act}$ is the value of the t-statistic actually computed and $\Phi$ is the cumulative standard normal distribution tabulated in Appendix Table 1. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level. For example, a two-sided test with a 5% significance level would reject the null hypothesis if $|t^{act}| > 1.96$. In this case, the population mean is said to be statistically significantly different than the hypothesized value at the 5% significance level.

Testing hypotheses about the slope $\beta_1$. At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of $\bar{Y}$ is approximately normal. Because $\hat{\beta}_1$ also has a normal sampling distribution in large samples, hypotheses about the true value of the slope $\beta_1$ can be tested using the same general approach.

The null and alternative hypotheses need to be stated precisely before they can be tested. The angry taxpayer's hypothesis is that $\beta_{ClassSize} = 0$. More generally, under the null hypothesis the true population slope $\beta_1$ takes on some specific value, $\beta_{1,0}$. Under the two-sided alternative, $\beta_1$ does not equal $\beta_{1,0}$. That is, the null hypothesis and the two-sided alternative hypothesis are

$H_0\colon \beta_1 = \beta_{1,0}$ vs. $H_1\colon \beta_1 \neq \beta_{1,0}$  (two-sided alternative).  (5.2)

To test the null hypothesis $H_0$, we follow the same three steps as for the population mean.

The first step is to compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$. The standard error of $\hat{\beta}_1$ is an estimator of $\sigma_{\hat{\beta}_1}$, the standard deviation of the sampling distribution of $\hat{\beta}_1$. Specifically,

$SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}},$  (5.3)

where

$\hat{\sigma}^2_{\hat{\beta}_1} = \dfrac{1}{n} \times \dfrac{\frac{1}{n-2}\sum_{i=1}^{n}(X_i - \bar{X})^2\hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2}.$  (5.4)

The estimator of the variance in Equation (5.4) is discussed in Appendix 5.1. Although the formula for $\hat{\sigma}^2_{\hat{\beta}_1}$ is complicated, in applications the standard error is computed by regression software so that it is easy to use in practice.

The second step is to compute the t-statistic,

$t = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}.$  (5.5)

The third step is to compute the p-value, the probability of observing a value of $\hat{\beta}_1$ at least as different from $\beta_{1,0}$ as the estimate actually computed ($\hat{\beta}_1^{act}$), assuming that the null hypothesis is correct. Stated mathematically,

p-value $= \Pr_{H_0}\left[\left|\hat{\beta}_1 - \beta_{1,0}\right| > \left|\hat{\beta}_1^{act} - \beta_{1,0}\right|\right] = \Pr_{H_0}\left(\left|\dfrac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right| > \left|\dfrac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right|\right) = \Pr_{H_0}\left(|t| > |t^{act}|\right),$  (5.6)

where $\Pr_{H_0}$ denotes the probability computed under the null hypothesis, the second equality follows by dividing by $SE(\hat{\beta}_1)$, and $t^{act}$ is the value of the t-statistic actually computed. Because $\hat{\beta}_1$ is approximately normally distributed in large samples, under the null hypothesis the t-statistic is approximately distributed as a standard normal random variable, so in large samples,

p-value $= \Pr(|Z| > |t^{act}|) = 2\Phi(-|t^{act}|).$  (5.7)

A small value of the p-value, say less than 5%, provides evidence against the null hypothesis in the sense that the chance of obtaining such a value of $\hat{\beta}_1$ by pure random variation from one sample to the next is less than 5% if, in fact, the null hypothesis is correct. If so, the null hypothesis is rejected at the 5% significance level.

Alternatively, the hypothesis can be tested at the 5% significance level simply by comparing the value of the t-statistic to $\pm 1.96$, the critical value for a two-sided test, and rejecting the null hypothesis at the 5% level if $|t^{act}| > 1.96$.

These steps are summarized in Key Concept 5.2.

KEY CONCEPT 5.2
TESTING THE HYPOTHESIS $\beta_1 = \beta_{1,0}$ AGAINST THE ALTERNATIVE $\beta_1 \neq \beta_{1,0}$

1. Compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$ [Equation (5.3)].
2. Compute the t-statistic [Equation (5.5)].
3. Compute the p-value [Equation (5.7)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if $|t^{act}| > 1.96$.

The standard error and (typically) the t-statistic and p-value testing $\beta_1 = 0$ are computed automatically by regression software.

Reporting regression equations and application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (4.11), yielded $\hat{\beta}_0 = 698.9$ and $\hat{\beta}_1 = -2.28$. The standard errors of these estimates are $SE(\hat{\beta}_0) = 10.4$ and $SE(\hat{\beta}_1) = 0.52$.

Because of the importance of the standard errors, by convention they are included when reporting the estimated OLS coefficients. One compact way to report the standard errors is to place them in parentheses below the respective coefficients of the OLS regression line:

$\widehat{TestScore} = 698.9 - 2.28 \times STR,\quad R^2 = 0.051,\ SER = 18.6.$  (5.8)
                      (10.4)  (0.52)

Equation (5.8) provides the estimated regression line, estimates of the sampling uncertainty of the slope and the intercept (the standard errors), and two measures of the fit of this regression line (the $R^2$ and the $SER$). This is a common format for reporting a single regression equation, and it will be used throughout the rest of this book.

Suppose you wish to test the null hypothesis that the slope $\beta_1$ is zero in the population counterpart of Equation (5.8) at the 5% significance level. To do so, construct the t-statistic and compare it to 1.96, the 5% (two-sided) critical value taken from the standard normal distribution. The t-statistic is constructed by substituting the hypothesized value of $\beta_1$ under the null hypothesis (zero), the estimated slope, and its standard error from Equation (5.8) into the general formula in Equation (5.5).
The result is $t^{act} = (-2.28 - 0)/0.52 = -4.38$. This t-statistic exceeds (in absolute value) the 5% two-sided critical value of 1.96, so the null hypothesis is rejected in favor of the two-sided alternative at the 5% significance level.

Alternatively, we can compute the p-value associated with $t^{act} = -4.38$. This probability is the area in the tails of the standard normal distribution, as shown in Figure 5.1. This probability is extremely small, approximately 0.00001, or 0.001%. That is, if the null hypothesis $\beta_{ClassSize} = 0$ is true, the probability of obtaining a value of $\hat{\beta}_1$ as far from the null as the value we actually obtained is extremely small, less than 0.001%. Because this event is so unlikely, it is reasonable to conclude that the null hypothesis is false.

FIGURE 5.1 Calculating the p-Value of a Two-Sided Test When $t^{act} = -4.38$. The p-value of a two-sided test is the probability that $|Z| > |t^{act}|$, where $Z$ is a standard normal random variable and $t^{act}$ is the value of the t-statistic calculated from the sample. When $t^{act} = -4.38$, the p-value is the combined area under the $N(0, 1)$ distribution to the left of $-4.38$ and to the right of $+4.38$.

One-Sided Hypotheses Concerning $\beta_1$

The discussion so far has focused on testing the hypothesis that $\beta_1 = \beta_{1,0}$ against the hypothesis that $\beta_1 \neq \beta_{1,0}$. This is a two-sided hypothesis test, because under the alternative $\beta_1$ could be either larger or smaller than $\beta_{1,0}$. Sometimes, however, it is appropriate to use a one-sided hypothesis test. For example, in the student-teacher ratio/test score problem, many people think that smaller classes provide a better learning environment. Under that hypothesis, $\beta_1$ is negative: Smaller classes lead to higher scores. It might make sense, therefore, to test the null hypothesis that $\beta_1 = 0$ (no effect) against the one-sided alternative that $\beta_1 < 0$.

For a one-sided test, the null hypothesis and the one-sided alternative hypothesis are

$H_0\colon \beta_1 = \beta_{1,0}$ vs. $H_1\colon \beta_1 < \beta_{1,0}$  (one-sided alternative),  (5.9)

where $\beta_{1,0}$ is the value of $\beta_1$ under the null (0 in the student-teacher ratio example) and the alternative is that $\beta_1$ is less than $\beta_{1,0}$. If the alternative is that $\beta_1$ is greater than $\beta_{1,0}$, the inequality in Equation (5.9) is reversed.

Because the null hypothesis is the same for a one- and a two-sided hypothesis test, the construction of the t-statistic is the same. The only difference between a one- and two-sided hypothesis test is how you interpret the t-statistic. For the one-sided alternative in Equation (5.9), the null hypothesis is rejected against the one-sided alternative for large negative, but not large positive, values of the t-statistic: Instead of rejecting if $|t^{act}| > 1.96$, the hypothesis is rejected at the 5% significance level if $t^{act} < -1.645$.

The p-value for a one-sided test is obtained from the cumulative standard normal distribution as

p-value $= \Pr(Z < t^{act})$  (p-value, one-sided left-tail test).  (5.10)

If the alternative hypothesis is that $\beta_1$ is greater than $\beta_{1,0}$, the inequalities in Equations (5.9) and (5.10) are reversed, so the p-value is the right-tail probability, $\Pr(Z > t^{act})$.

When should a one-sided test be used? In practice, one-sided alternative hypotheses should be used only when there is a clear reason for doing so. This reason could come from economic theory, prior empirical evidence, or both. However, even if it initially seems that the relevant alternative is one-sided, upon reflection this might not necessarily be so. A newly formulated drug undergoing clinical trials actually could prove harmful because of previously unrecognized side effects. In the class size example, we are reminded of the graduation joke that a university's secret of success is to admit talented students and then make sure that the faculty stays out of their way and does as little damage as possible. Such ambiguity often leads econometricians to use two-sided tests.
Application to test scores. The t-statistic testing the hypothesis that there is no effect of class size on test scores [so $\beta_{1,0} = 0$ in Equation (5.9)] is $t^{act} = -4.38$. This is less than $-2.33$ (the critical value for a one-sided test with a 1% significance level), so the null hypothesis is rejected against the one-sided alternative at the 1% level. In fact, the p-value is less than 0.0006%. Based on these data, you can reject the angry taxpayer's assertion that the negative estimate of the slope arose purely because of random sampling variation at the 1% significance level.

Testing Hypotheses About the Intercept $\beta_0$

This discussion has focused on testing hypotheses about the slope, $\beta_1$. Occasionally, however, the hypothesis concerns the intercept $\beta_0$. The null hypothesis concerning the intercept and the two-sided alternative are

$H_0\colon \beta_0 = \beta_{0,0}$ vs. $H_1\colon \beta_0 \neq \beta_{0,0}$  (two-sided alternative).  (5.11)

The general approach to testing this null hypothesis consists of the three steps in Key Concept 5.2, applied to $\beta_0$ (the formula for the standard error of $\hat{\beta}_0$ is given in Appendix 5.1). If the alternative is one-sided, this approach is modified as was discussed in the previous subsection for hypotheses about the slope.

Hypothesis tests are useful if you have a specific null hypothesis in mind (as did our angry taxpayer). Being able to accept or to reject this null hypothesis based on the statistical evidence provides a powerful tool for coping with the uncertainty inherent in using a sample to learn about the population. Yet, there are many times that no single hypothesis about a regression coefficient is dominant, and instead one would like to know a range of values of the coefficient that are consistent with the data. This calls for constructing a confidence interval.

5.2 Confidence Intervals for a Regression Coefficient

Because any statistical estimate of the slope $\beta_1$ necessarily has sampling uncertainty, we cannot determine the true value of $\beta_1$ exactly from a sample of data. It is, however, possible to use the OLS estimator and its standard error to construct a confidence interval for the slope $\beta_1$ or for the intercept $\beta_0$.

Confidence interval for $\beta_1$. Recall that a 95% confidence interval for $\beta_1$ has two equivalent definitions. First, it is the set of values that cannot be rejected using a two-sided hypothesis test with a 5% significance level. Second, it is an interval that has a 95% probability of containing the true value of $\beta_1$; that is, in 95% of possible samples that might be drawn, the confidence interval will contain the true value of $\beta_1$. Because this interval contains the true value in 95% of all samples, it is said to have a confidence level of 95%.

The reason these two definitions are equivalent is as follows. A hypothesis test with a 5% significance level will, by definition, reject the true value of $\beta_1$ in only 5% of all possible samples; that is, in 95% of all possible samples the true value of $\beta_1$ will not be rejected. Because the 95% confidence interval (as defined in the first definition) is the set of all values of $\beta_1$ that are not rejected at the 5% significance level, it follows that the true value of $\beta_1$ will be contained in the confidence interval in 95% of all possible samples.

In principle, a 95% confidence interval can be computed by testing all possible values of $\beta_1$ (that is, testing the null hypothesis $\beta_1 = \beta_{1,0}$ for all values of $\beta_{1,0}$) at the 5% significance level using the t-statistic. The 95% confidence interval is then the collection of all the values of $\beta_1$ that are not rejected. But constructing the t-statistic for all values of $\beta_1$ would take forever. An easier way to construct the confidence interval is to note that the t-statistic will reject the hypothesized value $\beta_{1,0}$ whenever $\beta_{1,0}$ is outside the range $\hat{\beta}_1 \pm 1.96SE(\hat{\beta}_1)$. That is, the 95% confidence interval for $\beta_1$ is the interval $[\hat{\beta}_1 - 1.96SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96SE(\hat{\beta}_1)]$. This argument parallels the argument used to develop a confidence interval for the population mean.

The construction of a confidence interval for $\beta_1$ is summarized as Key Concept 5.3. The construction of a confidence interval for $\beta_0$ is the same, with $\hat{\beta}_0$ and $SE(\hat{\beta}_0)$ replacing $\hat{\beta}_1$ and $SE(\hat{\beta}_1)$.

Application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (5.8), yielded $\hat{\beta}_1 = -2.28$ and $SE(\hat{\beta}_1) = 0.52$. The 95% two-sided confidence interval for $\beta_1$ is $\{-2.28 \pm 1.96 \times 0.52\}$, that is, $-3.30 \leq \beta_1 \leq -1.26$. The value $\beta_1 = 0$ is not contained in this confidence interval, so (as we knew already from Section 5.1) the hypothesis $\beta_1 = 0$ can be rejected at the 5% significance level.
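The interval just reported, $-2.28 \pm 1.96 \times 0.52 = [-3.30, -1.26]$, is simple enough to compute directly, and the same arithmetic extends to an interval for the predicted effect $\beta_1 \Delta x$ of changing $X$ by an amount $\Delta x$. A Python sketch (not part of the text; the numbers are the chapter's estimates):

```python
def conf_interval_slope(b1, se, z=1.96):
    """95% confidence interval for beta1: b1 +/- 1.96 * SE(b1)."""
    return (b1 - z * se, b1 + z * se)

def conf_interval_effect(b1, se, dx, z=1.96):
    """95% confidence interval for the predicted effect beta1 * dx."""
    lo, hi = conf_interval_slope(b1, se, z)
    ends = (lo * dx, hi * dx)   # multiplying by a negative dx flips the endpoints
    return (min(ends), max(ends))

print(conf_interval_slope(-2.28, 0.52))       # approximately (-3.30, -1.26)
print(conf_interval_effect(-2.28, 0.52, -2))  # approximately (2.52, 6.60)
```

Note the endpoint swap in conf_interval_effect: when $\Delta x$ is negative, the lower end of the slope interval maps to the upper end of the effect interval.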
For example. 60.teache r ralio by 2 is predicted to increase test sco res by between 2. it is th e set of values of {3 1 [hal can no t he rejected by a 5% twosided hypothesis test . our hypothetical superintendent is contempl ating reducing the student. j( is conslTuCted as 95% confide nce inte rval for (3 1 0:: 5.3.f pos true rtCs. p .52.13) l SITlIC1.p.30 X ( . Ie val al th.1 2) c n r~ l icullce . inter so (as we knew already from Section 5. x. as Key 95% confidence in te rva l fo r PltU: = IPl <l x .1.we can construcl a confi dence interval for the predicted effect f3 1tJ.2 Confi dence Intervals for 0 Regression Coefficient 151 ~ 1  .teacher ra1io by 2.. Confidence intervals for predicted effects ofchanging X . or as littl e as . with a 95 % confidence level.teacher ra1io by 2 could be as great as . ~ x. ) x .alue of /3\ with a 95% probability: that is.( P11 ( 0. Equivalentl y.dx • 1.96SE(~l) ' the predictcd effect of th e chan ge !l x using this estimate of (31 is [~I .> x l. Beca use Ihe 95% confidence in terval for /31 is [.nle rvai fo r Ill .) x ~x..96SE(jJ I). + 1.1 ) the hypothesis III = 0 can be rejected at th e 5% significance level.d ~a im' t \ h l! \d SE. rii&'I(lf*'4 KEYCONCEI'T I ~sing has erval ~ . (5.3.26].2) = 2. Tht..96SE(P .1. it A 95% twosided confidence interval for {Jl is an interval tha t con tains the t rue \.l. or e int erval.96SE(~1 ) 1 X ax. . • .3 ).60 points.)]..1." CONFIDENCE INTERVAL FOR (3.96SE(P. kruct 5.96SE(~I ) I x !l x.52 a nd 6.3 "is l l!SI only lut' of IP. When the sample size is large.30. Thus decreas ing Ihe student . 96SE(P.2) = 6. the eHect of reduci ng the st udent. The other end of the confideoce inter val is ~ I + 1. but because we can co nstruct a confidence i.26 x ( . (5.. it contai ns the true "'" Iue of f31 in 95% of aU possible randomly drawn s amples.1.1 . .! predicted change in Y ASS O cia ted with this cha nge ill X is f3lux. The pop ula tio n slope f31 is unknown.x can be expressed as u3.52\. 
Thus a 95% confidence interval fo r the effect of chang ing x by the amount tJ.965£(&1) ' and the predicted effect of the change using that esti mate is + 1.: is then lng the Istcll is I e ran ge used !O iin tt:rval fidence interval for (31 can be used to construct a 95 % confi dence interval for the pred icted effect o f fI general change in X Consider changing X by a given amo unt. Because one end of a 95% confidence interval for III is ~l . The 95% con· n 3.
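Equation (5.13) simply scales both endpoints of the interval for β₁ by Δx; note that when Δx is negative, the scaled endpoints swap order. A minimal sketch (function name ours), using β̂₁ = −2.28, SE(β̂₁) = 0.52, and Δx = −2 from the text:

```python
# Minimal sketch of Equation (5.13): a 95% confidence interval for the
# predicted effect beta_1 * delta_x of changing X by delta_x.

def effect_interval_95(beta_hat, se, delta_x):
    """Scale the endpoints of the CI for beta_1 by delta_x.

    When delta_x < 0 the endpoints swap order, so sort before returning.
    """
    ends = ((beta_hat - 1.96 * se) * delta_x,
            (beta_hat + 1.96 * se) * delta_x)
    return min(ends), max(ends)

lo, hi = effect_interval_95(-2.28, 0.52, -2.0)
print(f"Effect of reducing STR by 2: between {lo:.2f} and {hi:.2f} points")
```

The printed endpoints, 2.52 and 6.60, match the predicted range of test-score gains in the text.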
5.3 Regression When X Is a Binary Variable

The discussion so far has focused on the case in which the regressor is a continuous variable. Regression analysis can also be used when the regressor is binary, that is, when it takes on only two values, 0 or 1. For example, X might be a worker's gender (= 1 if female, = 0 if male), whether a school district is urban or rural (= 1 if urban, = 0 if rural), or whether the district's class size is small or large (= 1 if small, = 0 if large). A binary variable is also called an indicator variable or sometimes a dummy variable.

Interpretation of the Regression Coefficients

The mechanics of regression with a binary regressor are the same as if it is continuous. The interpretation of β₁, however, is different, and it turns out that regression with a binary variable is equivalent to performing a difference of means analysis, as described in Section 3.4.

To see this, suppose you have a variable Dᵢ that equals either 0 or 1, depending on whether the student–teacher ratio is less than 20:

Dᵢ = 1 if the student–teacher ratio in the iᵗʰ district < 20;
Dᵢ = 0 if the student–teacher ratio in the iᵗʰ district ≥ 20.  (5.14)

The population regression model with Dᵢ as the regressor is

Yᵢ = β₀ + β₁Dᵢ + uᵢ, i = 1, …, n.  (5.15)

This is the same as the regression model with the continuous regressor Xᵢ, except that now the regressor is the binary variable Dᵢ. Because Dᵢ is not continuous, it is not useful to think of β₁ as a slope; indeed, because Dᵢ can take on only two values, there is no "line," so it makes no sense to talk about a slope. Thus we will not refer to β₁ as the slope in Equation (5.15); instead we will simply refer to β₁ as the coefficient multiplying Dᵢ in this regression or, more compactly, the coefficient on Dᵢ.

If β₁ in Equation (5.15) is not a slope, then what is it? The best way to interpret β₀ and β₁ in a regression with a binary regressor is to consider, one at a time, the two possible cases, Dᵢ = 0 and Dᵢ = 1.
If the student–teacher ratio is high, then Dᵢ = 0 and Equation (5.15) becomes

Yᵢ = β₀ + uᵢ  (Dᵢ = 0).  (5.16)

If the student–teacher ratio is low, then Dᵢ = 1 and Equation (5.15) becomes

Yᵢ = β₀ + β₁ + uᵢ  (Dᵢ = 1).  (5.17)

Because E(uᵢ|Dᵢ) = 0, the conditional expectation of Yᵢ when Dᵢ = 0 is E(Yᵢ|Dᵢ = 0) = β₀; that is, β₀ is the population mean value of test scores when the student–teacher ratio is high. Similarly, when Dᵢ = 1, E(Yᵢ|Dᵢ = 1) = β₀ + β₁; that is, β₀ + β₁ is the population mean value of test scores when the student–teacher ratio is low.

Because β₀ + β₁ is the population mean of Yᵢ when Dᵢ = 1 and β₀ is the population mean of Yᵢ when Dᵢ = 0, the difference (β₀ + β₁) − β₀ = β₁ is the difference between these two means. In other words, β₁ is the difference between the conditional expectation of Yᵢ when Dᵢ = 1 and when Dᵢ = 0, or β₁ = E(Yᵢ|Dᵢ = 1) − E(Yᵢ|Dᵢ = 0). In the test score example, β₁ is the difference between the mean test score in districts with low student–teacher ratios and the mean test score in districts with high student–teacher ratios. Because β₁ is the difference in the population means, it makes sense that the OLS estimator β̂₁ is the difference between the sample averages of Yᵢ in the two groups, and in fact this is the case.

Hypothesis tests and confidence intervals. If the two population means are the same, then β₁ in Equation (5.15) is zero. Thus the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis β₁ = 0 against the alternative β₁ ≠ 0. This hypothesis can be tested using the procedure outlined in Section 5.1. Specifically, the null hypothesis can be rejected at the 5% level against the two-sided alternative when the OLS t-statistic t = β̂₁/SE(β̂₁) exceeds 1.96 in absolute value. Similarly, a 95% confidence interval for β₁, constructed as β̂₁ ± 1.96 SE(β̂₁) as described in Section 5.2, provides a 95% confidence interval for the difference between the two population means.

Application to test scores. As an example, a regression of the test score against the student–teacher ratio binary variable D defined in Equation (5.14), estimated by OLS using the 420 observations in Figure 4.2, yields

TestScore = 650.0 + 7.4D,  R² = 0.035, SER = 18.7,  (5.18)
           (1.3)   (1.8)

where the standard errors of the OLS estimates of the coefficients β₀ and β₁ are given in parentheses below the OLS estimates. Thus the average test score for the subsample with student–teacher ratios greater than or equal to 20 (that is, for which D = 0) is 650.0, and the average test score for the subsample with student–teacher ratios less than 20 (so D = 1) is 650.0 + 7.4 = 657.4. The difference between the sample average test scores for the two groups is 7.4. This is the OLS estimate of β₁, the coefficient on the student–teacher ratio binary variable D.
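The claim that OLS with a binary regressor reproduces a difference-of-means analysis can be checked numerically. In the sketch below the data are made up for illustration; the point is the algebraic identity, not the particular numbers:

```python
# Minimal sketch: with a binary regressor D, the OLS intercept equals the
# sample mean of Y in the D = 0 group, and the OLS coefficient on D equals
# the difference between the two group means. Data below are invented.

def ols(x, y):
    """Return (b0, b1) from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1

d = [0, 0, 0, 1, 1, 1, 1]
y = [640.0, 655.0, 648.0, 660.0, 650.0, 658.0, 664.0]

b0, b1 = ols(d, y)
mean0 = sum(yi for di, yi in zip(d, y) if di == 0) / d.count(0)
mean1 = sum(yi for di, yi in zip(d, y) if di == 1) / d.count(1)
print(b0, mean0)          # intercept equals the D = 0 group mean
print(b1, mean1 - mean0)  # coefficient on D equals the difference in means
```

The two printed pairs agree (up to floating-point rounding), mirroring the test-score example, where β̂₀ = 650.0 is the D = 0 group average and β̂₁ = 7.4 is the gap between the group averages.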
Is the difference in the population mean test scores in the two groups statistically significantly different from zero at the 5% level? To find out, construct the t-statistic on β₁: t = 7.4/1.8 = 4.04. This exceeds 1.96 in absolute value, so the hypothesis β₁ = 0 can be rejected at the 5% significance level.

The OLS estimator and its standard error can be used to construct a 95% confidence interval for the true difference in means. This is 7.4 ± 1.96 × 1.8 = (3.9, 10.9). This confidence interval excludes β₁ = 0, so that (as we know from the previous paragraph) the hypothesis β₁ = 0 can be rejected at the 5% significance level; that is, the hypothesis that the population mean test scores in districts with high and low student–teacher ratios are the same can be rejected at the 5% significance level.

5.4 Heteroskedasticity and Homoskedasticity

Our only assumption about the distribution of uᵢ conditional on Xᵢ is that it has a mean of zero (the first least squares assumption). If, furthermore, the variance of this conditional distribution does not depend on Xᵢ, then the errors are said to be homoskedastic. This section discusses homoskedasticity, its theoretical implications, the simplified formulas for the standard errors of the OLS estimators that arise if the errors are homoskedastic, and the risks you run if you use these simplified formulas in practice.

What Are Heteroskedasticity and Homoskedasticity?

Definitions of heteroskedasticity and homoskedasticity. The error term uᵢ is homoskedastic if the variance of the conditional distribution of uᵢ given Xᵢ is constant for i = 1, …, n and in particular does not depend on Xᵢ. Otherwise, the error term is heteroskedastic.
As an illustration, return to Figure 4.4. The distribution of the errors uᵢ is shown for various values of x. Because this distribution applies specifically to the indicated value of x, it is the conditional distribution of uᵢ given Xᵢ = x. As drawn in that figure, all these conditional distributions have the same spread; more precisely, the variance of these distributions is the same for the various values of x. That is, in Figure 4.4, the conditional variance of uᵢ given Xᵢ = x does not depend on x, so the errors illustrated in Figure 4.4 are homoskedastic.

In contrast, Figure 5.2 illustrates a case in which the conditional distribution of uᵢ spreads out as x increases. For small values of x, this distribution is tight, but for larger values of x, it has a greater spread. Thus, in Figure 5.2, the variance of uᵢ given Xᵢ = x increases with x, so the errors in Figure 5.2 are heteroskedastic. The definitions of heteroskedasticity and homoskedasticity are summarized in Key Concept 5.4.

Figure 5.2  An Example of Heteroskedasticity. Like Figure 4.4, this figure shows the conditional distribution of test scores for three different class sizes. Unlike Figure 4.4, these distributions become more spread out (have a larger variance) for larger class sizes. Because the variance of the distribution of u given X, var(u|X), depends on X, u is heteroskedastic.
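Heteroskedasticity of the kind drawn in Figure 5.2 is easy to simulate. In the sketch below, the error's conditional standard deviation is assumed to be proportional to x (the factor 0.5 is an arbitrary choice, not from the text), so the sampled spread of u given X = x grows with x:

```python
# Minimal sketch: simulate errors whose conditional variance grows with x,
# mimicking Figure 5.2. Here sd(u | X = x) = 0.5 * x by assumption.
import random

random.seed(0)

def draw_errors(x, n=10_000):
    """Draw n errors from the assumed conditional distribution u | X = x."""
    return [random.gauss(0.0, 0.5 * x) for _ in range(n)]

def sample_sd(u):
    m = sum(u) / len(u)
    return (sum((ui - m) ** 2 for ui in u) / (len(u) - 1)) ** 0.5

for x in (10, 20, 30):
    print(x, round(sample_sd(draw_errors(x)), 2))
# the printed spread increases with x: these errors are heteroskedastic
```

Replacing 0.5 * x with a constant would make the three printed spreads (approximately) equal, which is the homoskedastic case of Figure 4.4.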
Key Concept 5.4: Heteroskedasticity and Homoskedasticity

The error term uᵢ is homoskedastic if the variance of the conditional distribution of uᵢ given Xᵢ, var(uᵢ|Xᵢ = x), is constant for i = 1, …, n and in particular does not depend on x. Otherwise, the error term is heteroskedastic.

Example. These terms are a mouthful, and the definitions might seem abstract. To help clarify them with an example, we digress from the student–teacher ratio/test score problem and instead return to the example of the earnings of male versus female college graduates considered in the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States." Let MALEᵢ be a binary variable that equals 1 for male college graduates and equals 0 for female graduates. The binary variable regression model relating someone's earnings to his or her gender is

Earningsᵢ = β₀ + β₁MALEᵢ + uᵢ  (5.19)

for i = 1, …, n. Because the regressor is binary, β₁ is the difference in the population means of the two groups: in this case, the difference in mean earnings between men and women who graduated from college.

The definition of homoskedasticity states that the variance of uᵢ does not depend on the regressor. Here the regressor is MALEᵢ, so at issue is whether the variance of the error term depends on MALEᵢ. In other words, is the variance of the error term the same for men and for women? If so, the error is homoskedastic; if not, it is heteroskedastic.

Deciding whether the variance of uᵢ depends on MALEᵢ requires thinking hard about what the error term actually is. In this regard, it is useful to write Equation (5.19) as two separate equations, one for women and one for men:

Earningsᵢ = β₀ + uᵢ  (women) and  (5.20)
Earningsᵢ = β₀ + β₁ + uᵢ  (men).  (5.21)

Thus, for women, uᵢ is the deviation of the iᵗʰ woman's earnings from the population mean earnings for women (β₀), and for men, uᵢ is the deviation of the iᵗʰ man's earnings from the population mean earnings for men (β₀ + β₁). It follows that the statement, "the variance of uᵢ does not depend on MALE," is equivalent to the statement, "the variance of earnings is the same for men as it is for women."
In other words, in this example, the error term is homoskedastic if the variance of the population distribution of earnings is the same for men and women; if these variances differ, the error term is heteroskedastic.

Mathematical Implications of Homoskedasticity

The OLS estimators remain unbiased and asymptotically normal. Because the least squares assumptions in Key Concept 4.3 place no restrictions on the conditional variance, they apply to both the general case of heteroskedasticity and the special case of homoskedasticity. Therefore, the OLS estimators remain unbiased and consistent even if the errors are heteroskedastic. In addition, the OLS estimators have sampling distributions that are normal in large samples even if the errors are heteroskedastic.

Efficiency of the OLS estimator when the errors are homoskedastic. If the least squares assumptions in Key Concept 4.3 hold and the errors are homoskedastic, then the OLS estimators β̂₀ and β̂₁ are efficient among all estimators that are linear in Y₁, …, Yₙ and are unbiased, conditional on X₁, …, Xₙ. This result, which is called the Gauss–Markov theorem, is discussed in Section 5.5.

Homoskedasticity-only variance formula. If the error term is homoskedastic, then the formulas for the variances of β̂₀ and β̂₁ in Key Concept 4.4 simplify. Consequently, if the errors are homoskedastic, there is a specialized formula that can be used for the standard errors of β̂₀ and β̂₁. The homoskedasticity-only standard error of β̂₁, derived in Appendix 5.1, is SE(β̂₁) = √(σ̃²β̂₁), where σ̃²β̂₁ is the homoskedasticity-only estimator of the variance of β̂₁:

σ̃²β̂₁ = s²û / Σᵢ₌₁ⁿ (Xᵢ − X̄)²  (homoskedasticity-only),  (5.22)

where s²û is given in Equation (4.19). The homoskedasticity-only formula for the standard error of β̂₀ is given in Appendix 5.1. In the special case that X is a binary variable, the estimator of the variance of β̂₁ under homoskedasticity (that is, the square of the standard error of β̂₁ under homoskedasticity) is the so-called pooled variance formula for the difference in means, given in Equation (3.23).
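Equation (5.22) can be computed in a few lines. The sketch below runs OLS on made-up data, forms s²û as the sum of squared residuals divided by n − 2, and divides by Σ(Xᵢ − X̄)² to get the homoskedasticity-only variance of the slope:

```python
# Minimal sketch of the homoskedasticity-only standard error of the OLS
# slope, Equation (5.22): var_tilde(beta_1) = s_u^2 / sum((X_i - X_bar)^2),
# with s_u^2 = (sum of squared residuals) / (n - 2). Data are invented.

def homoskedasticity_only_se(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s_u2 = sum(e ** 2 for e in resid) / (n - 2)  # as in Equation (4.19)
    return (s_u2 / sxx) ** 0.5                   # Equation (5.22)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(round(homoskedasticity_only_se(x, y), 4))
```

This formula is valid only under homoskedasticity; the next paragraphs explain what goes wrong if it is used when the errors are heteroskedastic.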
Because these alternative formulas are derived for the special case that the errors are homoskedastic and do not apply if the errors are heteroskedastic, they will be referred to as the "homoskedasticity-only" formulas for the variance and standard error of the OLS estimators. Because the standard errors we have used so far [that is, those based on Equations (5.4) and (5.26)] lead to statistical inferences that are valid whether or not the errors are heteroskedastic, they are called heteroskedasticity-robust standard errors. Because such formulas were proposed by Eicker (1967), Huber (1967), and White (1980), they are also referred to as Eicker–Huber–White standard errors.

If the errors are heteroskedastic, then the homoskedasticity-only standard errors are inappropriate. Specifically, if the errors are heteroskedastic, then the t-statistic computed using the homoskedasticity-only standard error does not have a standard normal distribution, even in large samples. In fact, the correct critical values to use for this homoskedasticity-only t-statistic depend on the precise nature of the heteroskedasticity, so those critical values cannot be tabulated. Similarly, if the errors are heteroskedastic but a confidence interval is constructed as ±1.96 homoskedasticity-only standard errors, then in general the probability that this interval contains the true value of the coefficient is not 95%, even in large samples.

In contrast, because homoskedasticity is a special case of heteroskedasticity, the estimators of the variances of β̂₁ and β̂₀ given in Equations (5.4) and (5.26) produce valid statistical inferences whether the errors are heteroskedastic or homoskedastic. Thus hypothesis tests and confidence intervals based on those standard errors are valid whether or not the errors are heteroskedastic.

What Does This Mean in Practice?

Which is more realistic, heteroskedasticity or homoskedasticity? The answer to this question depends on the application. However, the issues can be clarified by returning to the example of the gender gap in earnings among college graduates. Familiarity with how people are paid in the world around us gives some clues as to which assumption is more sensible. For many years, and to a lesser extent today, women were not found in the top-paying jobs: There have always been poorly paid men, but there have rarely been highly paid women. This suggests that the distribution of earnings among women is tighter than among men (see the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States").
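Returning to the standard-error formulas: a heteroskedasticity-robust (Eicker–Huber–White) standard error for the slope can also be computed by hand. The sketch below follows the form of Equation (5.4) as we read it, with Σ(Xᵢ − X̄)²ûᵢ² in the numerator; the exact degrees-of-freedom correction should be treated as an assumption, since software packages differ in the small-sample adjustment they apply:

```python
# Sketch of a heteroskedasticity-robust (Eicker-Huber-White) SE for the OLS
# slope: the variance estimator uses sum((X_i - X_bar)^2 * u_hat_i^2)
# divided by n - 2, normalized by n and by [sum((X_i - X_bar)^2) / n]^2.
import random

random.seed(2)

def robust_se_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    num = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / (n - 2)
    return ((num / n) / (sxx / n) ** 2) ** 0.5

# Simulated heteroskedastic data: the error's sd grows with x (an assumed form).
x = [random.uniform(1.0, 10.0) for _ in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
print(round(robust_se_slope(x, y), 3))
```

Unlike the homoskedasticity-only formula, this estimator remains valid whether or not the conditional variance of u depends on X.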
On average, workers with more education have higher earnings than workers with less education. But if the best-paying jobs mainly go to the college educated, it might also be that the spread of the distribution of earnings is greater for workers with more education. Does the distribution of earnings spread out as education increases? This is an empirical question, so answering it requires analyzing data.

Figure 5.3 is a scatterplot of the hourly earnings and the number of years of education for a sample of 2950 full-time workers in the United States in 2004, ages 29 and 30, with between 6 and 18 years of education. The data come from the March 2005 Current Population Survey, which is described in Appendix 3.1.

Figure 5.3 has two striking features. The first is that the mean of the distribution of earnings increases with the number of years of education. This increase is summarized by the OLS regression line,

Earnings = −3.13 + 1.47 YearsEducation,  R² = 0.130, SER = 7.78.  (5.23)
          (0.93)  (0.07)

This line is plotted in Figure 5.3. The coefficient of 1.47 in the OLS regression line means that, on average, hourly earnings increase by $1.47 for each additional year of education. The 95% confidence interval for this coefficient is 1.47 ± 1.96 × 0.07, or 1.33 to 1.61.

The second striking feature of Figure 5.3 is that the spread of the distribution of earnings increases with the years of education. While some workers with many years of education have low-paying jobs, very few workers with low levels of education have high-paying jobs. In real-world terms, not all college graduates will be earning $50/hour by the time they are 29, but some will, and workers with only ten years of education have no shot at those jobs.

This can be stated more precisely by looking at the spread of the residuals around the OLS regression line. For workers with ten years of education, the standard deviation of the residuals is $5.46; for workers with a high school diploma, this standard deviation is $7.43; and for workers with a college degree, this standard deviation increases to $10.78. Because these standard deviations differ for different levels of education, the variance of the residuals in the regression of Equation (5.23) depends on the value of the regressor (the years of education); in other words, the regression errors are heteroskedastic.

Figure 5.3  Scatterplot of Hourly Earnings and Years of Education for 29- to 30-Year-Olds in the United States in 2004. Hourly earnings are plotted against years of education for 2950 full-time, 29- to 30-year-old workers. The spread around the regression line increases with the years of education, suggesting that the regression errors are heteroskedastic.
. then you should use the mo re re liable on. Unless there are compelling reasons 10 the con· nary. is a lways to use tht' hete roskedaslicily. It therefore is prudent to aSs ume that the errors mi ght be heteroskedastic unl ess you hu ve com pelli ng reasons to believe o therwise. so it is up to the user to s pecify the option of heteroskedaSticityrobust standard errors.·. 10 the Ii~t of lea~1 squarcs assumption '!. ho.Jiscussed.:li releva nce in this disell" sion is whet her one should use lleteroskedasticityrobust o r homoskedaslieityonly standard errors. J( the homoskedas.and we can think of noneit mak es sense to treat the error term in Ihi. economic theory rarely gi \'e~ any reason to believe \hal the errors are homoskedastic.ticity.:s that a llow for hctc roskedas. the presence of a "glass ceiling" for women's jobs and pay suggests that the error term in the binary variable regression model in Equation (5.omen is plausib ly less than the variance of the error tenn In Equation (5.s hcteroskedastic. Th e main issue ot practie.20) for ".5.ilh OIher I C ~I~ it might he helpful to TWtt: Ihm 5(lme le\I' booh add homO$ledo)hCII).. it is usefu l to im agine compu ting both . is consistent.robu. example as heteroskedastic. Tn thi!> rega rd. The de tails of how to implemcnI bctcroskedasticilyrobusl standard errors depend on the software package you u.:Iri anec th at is in versely proportional to n.robust standard errors.uIIC'I)robu~1 )tandard crmrs are U50ed.nd is not used in I.21) for men.~r.166 CHAPTER 5 Linear Regression with One Regressor tion (5. The simplest tbing. . \. has a \'. As this example of modeling earnings illustrates.ticityonly and he tc roskedastidlyrobusl standard errors are the sa me. and has a nonnal sampling dislributhln lI n casc: Ihis book . then c hoo~ ing be tween tbem.ter chap1en. Practical implications.:cd.t9). Thus.1 used In conjunction .t he n. 
many software programs usc the ho moskedasticity·only standard e rrors as their default se lling. ibIS iC!CIIOn IS optional .. For tlistorical reasons. Ih. All of the empirica l examples in this book em ploy heteroskedasticityrobusl standard errors unless explicitly staled otherwise.$ addit lonlll IIs~umpuon is nOI needed for the validil)' of OLS regression analysis I" lon~ J' hctemd. I '5..... howeveJ.c. hcteroskedasticity arise!> III many econometric applicalions. AI a general level .."t standard errors: if they diffe r.. A ~ just .. nothing is lost by using the heteroskedaslicit).5 The Theoretical Foundations of Ordinary Least Squares As discussed in Section 4. the OLS estimator is unbiased.
In addition, under certain conditions the OLS estimator is more efficient than some other candidate estimators. Specifically, if the least squares assumptions hold and if the errors are homoskedastic, then the OLS estimator has the smallest variance of all conditionally unbiased estimators that are linear functions of Y₁, …, Yₙ. This section explains and discusses this result, which is a consequence of the Gauss–Markov theorem. The section concludes with a discussion of alternative estimators that are more efficient than OLS when the conditions of the Gauss–Markov theorem do not hold.

Linear Conditionally Unbiased Estimators and the Gauss–Markov Theorem

If the three least squares assumptions (Key Concept 4.3) hold and if the error is homoskedastic, then the OLS estimator has the smallest variance, conditional on X₁, …, Xₙ, among all estimators in the class of linear conditionally unbiased estimators; that is, the OLS estimator is the Best Linear conditionally Unbiased Estimator, or BLUE. This result extends to regression the result, summarized in Key Concept 3.3, that the sample average Ȳ is the most efficient estimator of the population mean among the class of all estimators that are unbiased and are linear functions (weighted averages) of Y₁, …, Yₙ.

Linear conditionally unbiased estimators. The class of linear conditionally unbiased estimators consists of all estimators of β₁ that are linear functions of Y₁, …, Yₙ and that are unbiased, conditional on X₁, …, Xₙ. That is, if β̃₁ is a linear estimator, then it can be written as

β̃₁ = Σᵢ₌₁ⁿ aᵢYᵢ  (β̃₁ is linear),  (5.24)

where the weights a₁, …, aₙ can depend on X₁, …, Xₙ but not on Y₁, …, Yₙ. The estimator β̃₁ is conditionally unbiased if the mean of its conditional sampling distribution, given X₁, …, Xₙ, is β₁. That is, the estimator β̃₁ is conditionally unbiased if

E(β̃₁|X₁, …, Xₙ) = β₁  (β̃₁ is conditionally unbiased).  (5.25)

The estimator β̃₁ is a linear conditionally unbiased estimator if it can be written in the form of Equation (5.24) (it is linear) and if Equation (5.25) holds (it is conditionally unbiased).
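The OLS slope itself is linear in the sense of Equation (5.24): it puts weight aᵢ = (Xᵢ − X̄)/Σⱼ(Xⱼ − X̄)² on Yᵢ, and these weights depend only on the X's. This can be verified directly (the data below are invented):

```python
# Minimal sketch: the OLS slope is a weighted average of the Y's with
# weights a_i = (X_i - X_bar) / sum((X_j - X_bar)^2) that depend only on X.

x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [1.2, 2.3, 3.9, 5.1, 7.2]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1_ols = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

a = [(xi - xbar) / sxx for xi in x]   # weights depend on the X's only
b1_linear = sum(ai * yi for ai, yi in zip(a, y))

print(b1_ols, b1_linear)  # identical up to floating-point rounding
```

The two expressions agree because the weights sum to zero, so subtracting Ȳ from each Yᵢ leaves the weighted sum unchanged.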
The Gauss–Markov theorem. The Gauss–Markov theorem states that, under a set of conditions known as the Gauss–Markov conditions, the OLS estimator β̂₁ has the smallest conditional variance, given X₁, …, Xₙ, of all linear conditionally unbiased estimators of β₁; that is, the OLS estimator is BLUE. The Gauss–Markov conditions, which are stated in Appendix 5.2, are implied by the three least squares assumptions plus the assumption that the errors are homoskedastic. Consequently, if the three least squares assumptions hold and the errors are homoskedastic, then OLS is BLUE. The theorem is stated in Key Concept 5.5 and proven in Appendix 5.2; it is also shown in Appendix 5.2 that the OLS estimator is linear and conditionally unbiased.

Key Concept 5.5: The Gauss–Markov Theorem for β̂₁

If the three least squares assumptions in Key Concept 4.3 hold and if the errors are homoskedastic, then the OLS estimator β̂₁ is the Best (most efficient) Linear conditionally Unbiased Estimator (is BLUE).

Limitations of the Gauss–Markov theorem. The Gauss–Markov theorem provides a theoretical justification for using OLS. However, the theorem has two important limitations. First, its conditions might not hold in practice. In particular, if the error term is heteroskedastic, as it often is in economic applications, then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below.

The second limitation of the Gauss–Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.
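A small Monte Carlo experiment illustrates the Gauss–Markov result. The "endpoint" estimator below, (Yₙ − Y₁)/(Xₙ − X₁), is linear in the Y's with weights that depend only on the X's, and it is conditionally unbiased; yet under homoskedastic errors its sampling variance exceeds that of OLS. All numbers here are simulated, not from the text:

```python
# Sketch: under homoskedastic errors, the OLS slope has a smaller sampling
# variance than another linear conditionally unbiased estimator (here the
# "endpoint" slope through the first and last observations).
import random

random.seed(3)
x = [float(i) for i in range(1, 21)]   # fixed regressors, n = 20

def ols_slope(y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

def endpoint_slope(y):
    # Linear in Y, weights depend only on X, and conditionally unbiased.
    return (y[-1] - y[0]) / (x[-1] - x[0])

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

ols_draws, end_draws = [], []
for _ in range(2000):
    y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]  # homoskedastic
    ols_draws.append(ols_slope(y))
    end_draws.append(endpoint_slope(y))

print(variance(ols_draws) < variance(end_draws))  # prints True
```

Making the error variance depend on x instead would break the Gauss–Markov conditions, and OLS would no longer be guaranteed to win such a comparison.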
The second limitation of the Gauss-Markov theorem is that, even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.

Regression Estimators Other Than OLS

If the errors are heteroskedastic, or if extreme outliers are not rare, then other estimators can be more efficient than OLS and can produce inferences that are more reliable. (This section is optional and is not used in later chapters.)

The weighted least squares estimator. If the nature of the heteroskedasticity is known, specifically, if the conditional variance of u_i given X_i is known up to a constant factor of proportionality, then it is possible to construct an estimator that has a smaller variance than the OLS estimator. This method, called weighted least squares (WLS), weights the ith observation by the inverse of the square root of the conditional variance of u_i given X_i. Because of this weighting, the errors in the weighted regression are homoskedastic, so OLS, when applied to the weighted data, is BLUE. Although theoretically elegant, the practical problem with weighted least squares is that you must know how the conditional variance of u_i depends on X_i, something that is rarely known in applications.

The least absolute deviations estimator. As discussed in Section 4.3, the OLS estimator can be sensitive to outliers. If extreme outliers are not rare, then other estimators can be more reliable. One such estimator is the least absolute deviations (LAD) estimator, in which the regression coefficients β0 and β1 are obtained by solving a minimization like that in Equation (4.6), except that the absolute value of the prediction "mistake" is used instead of its square. That is, the LAD estimators of β0 and β1 are the values of b0 and b1 that minimize Σ|Y_i − b0 − b1X_i|. Because of this, the LAD estimator is less sensitive to large outliers in u than is OLS. In many economic data sets, severe outliers in u are rare, so use of the LAD estimator, or of other estimators with reduced sensitivity to outliers, is uncommon in applications. Thus the treatment of linear regression throughout the remainder of this text focuses exclusively on least squares methods.

5.6 Using the t-Statistic in Regression When the Sample Size Is Small

When the sample size is small, the exact distribution of the t-statistic is complicated and depends on the unknown population distribution of the data.
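The weighting idea can be sketched in a few lines of Python. Everything below is hypothetical: the data are simulated, and the variance function var(u_i|X_i) ∝ X_i² is assumed known, which is exactly the strong requirement the text warns is rarely met in practice.

```python
import numpy as np

# Illustrative sketch of weighted least squares (WLS), assuming the
# conditional variance of u_i is known up to proportionality:
# here var(u_i | X_i) is proportional to X_i^2, so we weight by 1/X_i.
# The model, coefficients, and variance function are all hypothetical.

rng = np.random.default_rng(0)
n = 10_000
X = rng.uniform(1.0, 5.0, n)
u = rng.normal(0.0, 1.0, n) * X          # heteroskedastic: sd(u|X) = X
Y = 2.0 + 3.0 * X + u                    # true beta0 = 2, beta1 = 3

def ols(y, x):
    """OLS slope and intercept via the usual covariance formulas."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# WLS: divide Y and BOTH regressors (the constant and X) by sqrt(var) = X,
# then run OLS without an intercept on the weighted data.
w = 1.0 / X
Xw = np.column_stack([w, np.ones(n)])    # weighted constant (1/X), weighted X (X/X = 1)
Yw = Y * w
beta_wls = np.linalg.lstsq(Xw, Yw, rcond=None)[0]  # [beta0, beta1]

b0_ols, b1_ols = ols(Y, X)
print("OLS :", b0_ols, b1_ols)
print("WLS :", beta_wls)
```

The weighted errors u_i/X_i are homoskedastic by construction, which is what makes OLS on the weighted data efficient here.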
If, however, the three least squares assumptions hold, the regression errors are homoskedastic, and the regression errors are normally distributed, then the OLS estimator is normally distributed and the homoskedasticity-only t-statistic has a Student t distribution. These five assumptions, that is, the three least squares assumptions, that the errors are homoskedastic, and that the errors are normally distributed, are collectively called the homoskedastic normal regression assumptions.

The t-Statistic and the Student t Distribution

Recall from Section 2.4 that the Student t distribution with m degrees of freedom is defined to be the distribution of Z/√(W/m), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with m degrees of freedom, and Z and W are independent. Under the null hypothesis, the t-statistic computed using the homoskedasticity-only standard error can be written in this form. The homoskedasticity-only t-statistic testing β1 = β1,0 is t = (β̂1 − β1,0)/σ̃_β̂1, where σ̃²_β̂1 is defined in Equation (5.22).

Under the homoskedastic normal regression assumptions, Y_i has a normal distribution, conditional on X_1,..., X_n. As discussed in Section 5.5, the OLS estimator is a weighted average of Y_1,..., Y_n, where the weights depend on X_1,..., X_n [see Equation (5.32) in Appendix 5.2]. Because a weighted average of independent normal random variables is normally distributed, β̂1 has a normal distribution, conditional on X_1,..., X_n. Thus (β̂1 − β1,0) has a normal distribution under the null hypothesis, conditional on X_1,..., X_n. In addition, the (normalized) homoskedasticity-only variance estimator has a chi-squared distribution with n − 2 degrees of freedom, divided by n − 2, and σ̃²_β̂1 and β̂1 are independently distributed. Consequently, the homoskedasticity-only t-statistic has a Student t distribution with n − 2 degrees of freedom.

This result is closely related to a result discussed in Section 3.5 in the context of testing for the equality of the means in two samples. In that problem, if the two population distributions are normal with the same variance and if the t-statistic is constructed using the pooled standard error formula [Equation (3.23)], then the (pooled) t-statistic has a Student t distribution. When X is binary, the homoskedasticity-only standard error for β̂1 simplifies to the pooled standard error formula for the difference of means. It follows that the result of Section 3.5 is a special case of the result that, if the homoskedastic normal regression assumptions hold, then the homoskedasticity-only regression t-statistic has a Student t distribution (see Exercise 5.10).
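The practical content of this result is that small-sample critical values come from the Student t distribution with n − 2 degrees of freedom rather than from the standard normal. A quick numerical sketch (sample sizes chosen arbitrarily for illustration) shows how quickly the two coincide:

```python
from scipy import stats

# Two-sided 5% critical values from the Student t distribution for a test
# based on n - 2 degrees of freedom, compared with the standard normal
# critical value of about 1.96. Sample sizes are illustrative only.
for n in (5, 10, 30, 100, 1000):
    crit_t = stats.t.ppf(0.975, df=n - 2)
    print(f"n = {n:5d}: t critical value = {crit_t:.3f}")
print(f"standard normal: {stats.norm.ppf(0.975):.3f}")
```

For n = 5 the t critical value is above 3, but by n = 100 it is already within a few hundredths of 1.96, which is why the distinction matters only in small samples.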
Use of the Student t Distribution in Practice

If the regression errors are homoskedastic and normally distributed and if the homoskedasticity-only t-statistic is used, then critical values should be taken from the Student t distribution (Appendix Table 2) instead of the standard normal distribution. Because the difference between the Student t distribution and the normal distribution is negligible if n is moderate or large, this distinction is relevant only if the sample size is small. In econometric applications, there is rarely a reason to believe that the errors are homoskedastic and normally distributed. Because sample sizes typically are large, inference can proceed as described in Sections 5.1 and 5.2, that is, by first computing heteroskedasticity-robust standard errors and then using the standard normal distribution to compute p-values, hypothesis tests, and confidence intervals.

5.7 Conclusion

Return for a moment to the problem that started Chapter 4: the superintendent who is considering hiring additional teachers to cut the student-teacher ratio. What have we learned that she might find useful?

Our regression analysis, based on the 420 observations for 1998 in the California test score data set, showed that there was a negative relationship between the student-teacher ratio and test scores: Districts with smaller classes have higher test scores. The coefficient is moderately large, in a practical sense: Districts with 2 fewer students per teacher have, on average, test scores that are 4.6 points higher. This corresponds to moving a district at the 50th percentile of the distribution of test scores to approximately the 60th percentile.

The coefficient on the student-teacher ratio is statistically significantly different from 0 at the 5% significance level. The population coefficient might be 0, and we might simply have estimated our negative coefficient by random sampling variation. However, the probability of doing so (and of obtaining a t-statistic on β1 as large as we did) purely by random variation over potential samples is exceedingly small, approximately 0.001%. A 95% confidence interval for β1 is −3.30 ≤ β1 ≤ −1.26.

These results represent considerable progress toward answering the superintendent's question. Yet a nagging concern remains. Districts with lower student-teacher ratios have, on average, higher test scores. There is a negative relationship between the student-teacher ratio and test scores, but is this relationship necessarily the causal one that the superintendent needs to make her decision? Does reducing the student-teacher ratio in fact increase scores?
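The arithmetic behind these numbers is simple; a sketch using the slope estimate of −2.28 and the heteroskedasticity-robust standard error of 0.52 reported earlier in the chapter for the California test score regression:

```python
# Reproducing the chapter's confidence-interval arithmetic from the slope
# estimate (-2.28) and heteroskedasticity-robust standard error (0.52)
# reported earlier in the chapter for the California test score regression.

beta1_hat = -2.28
se_beta1 = 0.52

# 95% confidence interval: estimate +/- 1.96 standard errors
ci_lower = beta1_hat - 1.96 * se_beta1
ci_upper = beta1_hat + 1.96 * se_beta1
print(f"95% CI for beta1: [{ci_lower:.2f}, {ci_upper:.2f}]")  # [-3.30, -1.26]

# Predicted effect of hiring enough teachers to cut the ratio by 2 students
effect_of_minus_2 = -2 * beta1_hat
print(f"Predicted test score change: {effect_of_minus_2:.2f} points")  # 4.56, about 4.6
```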
There is, in fact, reason to worry that it might not. Hiring more teachers, after all, costs money, so wealthier school districts can better afford smaller classes. But students at wealthier schools also have other advantages over their poorer neighbors, including better facilities, newer books, and better-paid teachers. Moreover, students at wealthier schools tend themselves to come from more affluent families and thus have other advantages not directly associated with their school. For example, California has a large immigrant community; these immigrants tend to be poorer than the overall population, and, in many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test scores and the student-teacher ratio is a consequence of large classes being found in conjunction with many other factors that are, in fact, the real cause of the lower test scores. These other factors, or "omitted variables," could mean that the OLS analysis done so far has little value to the superintendent. Indeed, it could be misleading: Changing the student-teacher ratio alone would not change these other factors that determine a child's performance at school. To address this problem, we need a method that will allow us to isolate the effect on test scores of changing the student-teacher ratio, holding these other factors constant. That method is multiple regression analysis, the topic of Chapters 6 and 7.

Summary

1. Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: Use the t-statistic to calculate the p-values and either accept or reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator ± 1.96 standard errors.

2. When X is binary, the regression model can be used to estimate and test hypotheses about the difference between the population means of the "X = 0" group and the "X = 1" group.
3. In general the error u_i is heteroskedastic; that is, the variance of u_i at a given value of X_i, var(u_i|X_i = x), depends on x. A special case is when the error is homoskedastic; that is, var(u_i|X_i = x) is constant. Homoskedasticity-only standard errors do not produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.

4. If the three least squares assumptions hold and if the regression errors are homoskedastic, then, as a result of the Gauss-Markov theorem, the OLS estimator is BLUE.

5. If the three least squares assumptions hold, if the regression errors are homoskedastic, and if the regression errors are normally distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null hypothesis is true. The difference between the Student t distribution and the normal distribution is negligible if the sample size is moderate or large.

Key Terms

null hypothesis (150)
two-sided alternative hypothesis (150)
standard error of β̂1 (151)
t-statistic (151)
p-value (151)
confidence interval for β1 (156)
confidence level (156)
indicator variable (158)
dummy variable (158)
coefficient multiplying variable D_i (158)
coefficient on D_i (158)
heteroskedasticity and homoskedasticity (163)
heteroskedasticity-robust standard error (164)
homoskedasticity-only standard errors (164)
best linear unbiased estimator (BLUE) (168)
Gauss-Markov theorem (168)
weighted least squares (169)
homoskedastic normal regression assumptions (170)
Gauss-Markov conditions (182)
Review the Concepts

5.1 Outline the procedures for computing the p-value of a two-sided test of H0: μ_Y = 0 using an i.i.d. set of observations Y_i, i = 1,..., n. Outline the procedures for computing the p-value of a two-sided test of H0: β1 = 0 in a regression model using an i.i.d. set of observations (Y_i, X_i), i = 1,..., n.

5.2 Explain how you could use a regression model to estimate the wage gender gap using the data on earnings of men and women. What are the dependent and independent variables?

5.3 Define homoskedasticity and heteroskedasticity. Provide a hypothetical empirical example in which you think the errors would be heteroskedastic, and explain your reasoning.

Exercises

5.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore = 520.4 − 5.82 × CS, R² = 0.08, SER = 11.5.
            (20.4)  (2.21)

a. Construct a 95% confidence interval for β1, the regression slope coefficient.
b. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = 0. Do you reject the null hypothesis at the 5% level? At the 1% level?
c. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = −5.6. Without doing any additional calculations, determine whether −5.6 is contained in the 95% confidence interval for β1.
d. Construct a 99% confidence interval for β0.

5.2 Suppose that a researcher, using wage data on 250 randomly selected male workers and 280 female workers, estimates the OLS regression

Wage = 12.52 + 2.12 × Male, R² = 0.06, SER = 4.2,
       (0.23)  (0.36)

where Wage is measured in $/hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings between men and women.

a. What is the estimated gender gap?
b. Is the estimated gender gap significantly different from zero? (Compute the p-value for testing the null hypothesis that there is no gender gap.)
c. Construct a 95% confidence interval for the gender gap.
d. In the sample, what is the mean wage of women? Of men?
e. Another researcher uses these same data but regresses Wages on Female, a variable that is equal to 1 if the person is female and 0 if the person is male. What are the regression estimates calculated from this regression?

Wage = ___ + ___ × Female, R² = ___, SER = ___.

5.3 Suppose that a random sample of 200 twenty-year-old men is selected from a population and their heights and weights are recorded. A regression of weight on height yields

Weight = −99.41 + 3.94 × Height, R² = 0.81, SER = 10.2,
         (2.15)   (0.31)

where Weight is measured in pounds and Height is measured in inches. A man has a late growth spurt and grows 1.5 inches over the course of a year. Construct a 99% confidence interval for the person's weight gain.

5.4 Read the box "The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity?" in Section 5.4. Use the regression reported in Equation (5.23) to answer the following.

a. A randomly selected 30-year-old worker reports an education level of 16 years. What is the worker's expected average hourly earnings?
b. A high school graduate (12 years of education) is contemplating going to a community college for a two-year degree. How much is this worker's average hourly earnings expected to increase?
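For problems like Exercise 5.1, the large-sample p-value procedure asked for in Review Question 5.1 reduces to a few lines of code; the coefficient (−5.82) and standard error (2.21) below are taken from the Exercise 5.1 output purely for illustration:

```python
from math import erfc, sqrt

# Large-sample two-sided p-value for H0: beta1 = beta_null, computed from a
# reported coefficient and its standard error. The values used below are the
# Exercise 5.1 estimates (slope -5.82, SE 2.21), for illustration only.

def two_sided_p_value(beta_hat, se, beta_null=0.0):
    """Return (t, p) with p = 2 * Phi(-|t|), the standard normal approximation."""
    t = (beta_hat - beta_null) / se
    return t, erfc(abs(t) / sqrt(2.0))   # erfc(|t|/sqrt(2)) = 2 * (1 - Phi(|t|))

t_stat, p = two_sided_p_value(-5.82, 2.21)
print(f"t = {t_stat:.3f}, p = {p:.4f}")  # reject H0 at the 5% level if p < 0.05
```

Here t is about −2.63 and p is about 0.008, so the null would be rejected at the 5% (and 1%) level under the normal approximation.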
5.5 In the 1980s, Tennessee conducted an experiment in which kindergarten students were randomly assigned to "regular" and "small" classes and given standardized tests at the end of the year. (Regular classes contained approximately 24 students, and small classes contained approximately 15 students.) Suppose that, in the population, the standardized tests have a mean score of 925 points and a standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields

TestScore = 918.0 + 13.9 × SmallClass, R² = 0.01, SER = 74.6.
            (1.6)   (2.5)

a. Do small classes improve test scores? By how much? Is the effect large? Explain.
b. Is the estimated effect of class size on test scores statistically significant? Carry out a test at the 5% level.
c. Construct a 99% confidence interval for the effect of SmallClass on test score.

5.6 Refer to the regression described in Exercise 5.5.

a. Do you think that the regression errors plausibly are homoskedastic? Explain.
b. SE(β̂1) was computed using Equation (5.3). Suppose that the regression errors were homoskedastic: Would this affect the validity of the confidence interval constructed in Exercise 5.5(c)? Explain.

5.7 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3. A random sample of size n = 250 is drawn and yields

Ŷ = 5.4 + 3.2X, R² = 0.26, SER = 6.2.
    (3.1) (1.5)

a. Test H0: β1 = 0 vs. H1: β1 ≠ 0 at the 5% level.
b. Construct a 95% confidence interval for β1.
c. Suppose you learned that Y_i and X_i were independent. Would you be surprised? Explain.
d. Suppose that Y_i and X_i are independent and many samples of size n = 250 are drawn, regressions estimated, and (a) and (b) answered. In what fraction of the samples would H0 from (a) be rejected? In what fraction of samples would the value β1 = 0 be included in the confidence interval from (b)?

5.8 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3 and, in addition, u_i is N(0, σ²_u) and is independent of X_i. A sample of size n = 30 yields

Ŷ = 43.2 + 61.5X, R² = 0.54, SER = 1.52,
    (10.2) (7.4)

where the numbers in parentheses are the homoskedastic-only standard errors for the regression coefficients.

a. Construct a 95% confidence interval for β0.
b. Test H0: β1 = 55 vs. H1: β1 ≠ 55 at the 5% level.
c. Test H0: β1 = 55 vs. H1: β1 > 55 at the 5% level.

5.9 Consider the regression model

Y_i = βX_i + u_i,

where u_i and X_i satisfy the assumptions in Key Concept 4.3. Let β̃ denote an estimator of β that is constructed as β̃ = Ȳ/X̄, where Ȳ and X̄ are the sample means of Y_i and X_i, respectively.

a. Show that β̃ is a linear function of Y_1,..., Y_n.
b. Show that β̃ is conditionally unbiased.
5.10 Let X_i denote a binary variable, and consider the regression Y_i = β0 + β1X_i + u_i. Let Ȳ_0 denote the sample mean for observations with X = 0, and let Ȳ_1 denote the sample mean for observations with X = 1. Show that β̂0 = Ȳ_0, that β̂0 + β̂1 = Ȳ_1, and that β̂1 = Ȳ_1 − Ȳ_0.

5.11 A random sample of workers contains n_m = 120 men and n_w = 131 women. The sample average of men's weekly earnings [Ȳ_m = (1/n_m) Σ Y_{m,i}] is $523.10, and the sample standard deviation {s_m = √[(1/(n_m − 1)) Σ (Y_{m,i} − Ȳ_m)²]} is $68.10. The corresponding values for women are Ȳ_w = $485.10 and s_w = $51.10. Let Women denote an indicator variable that is equal to 1 for women and 0 for men, and suppose that all 251 observations are used in the regression Y_i = β0 + β1 Women_i + u_i. Find the OLS estimates of β0 and β1 and their corresponding standard errors.

5.12 Starting from Equation (4.22), derive the variance of β̂0 under homoskedasticity given in Equation (5.28) in Appendix 5.1.

5.13 Suppose that (Y_i, X_i) satisfy the assumptions in Key Concept 4.3 and, in addition, u_i is N(0, σ²_u) and is independent of X_i.

a. Is β̂1 conditionally unbiased?
b. Is β̂1 the best linear conditionally unbiased estimator of β1?
c. How would your answers to (a) and (b) change if you assumed only that (Y_i, X_i) satisfied the assumptions in Key Concept 4.3 and var(u_i|X_i = x) is constant?
d. How would your answers to (a) and (b) change if you assumed only that (Y_i, X_i) satisfied the assumptions in Key Concept 4.3?

5.14 Suppose that Y_i = βX_i + u_i, where (u_i, X_i) satisfy the Gauss-Markov conditions given in Equation (5.31).

a. Derive the least squares estimator of β and show that it is a linear function of Y_1,..., Y_n.
b. Show that the estimator is conditionally unbiased.
c. Derive the conditional variance of the estimator.
d. Prove that the estimator is BLUE.
5.15 A researcher has two independent samples of observations on (Y_i, X_i). To be specific, suppose that Y_i denotes earnings, X_i denotes years of schooling, and the independent samples are for men and women. Write the regression for men as Y_{m,i} = β_{m,0} + β_{m,1}X_{m,i} + u_{m,i} and the regression for women as Y_{w,i} = β_{w,0} + β_{w,1}X_{w,i} + u_{w,i}. Let β̂_{m,1} denote the OLS estimator constructed using the sample of men, let β̂_{w,1} denote the OLS estimator constructed from the sample of women, and let SE(β̂_{m,1}) and SE(β̂_{w,1}) denote the corresponding standard errors. Show that the standard error of β̂_{m,1} − β̂_{w,1} is given by SE(β̂_{m,1} − β̂_{w,1}) = √{[SE(β̂_{m,1})]² + [SE(β̂_{w,1})]²}.

Empirical Exercises

E5.1 Using the data set CPS04 described in Empirical Exercise 4.1, run a regression of average hourly earnings (AHE) on Age and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Repeat (a) using only the data for high school graduates.
d. Repeat (a) using only the data for college graduates.
e. Is the effect of age on earnings different for high school graduates than for college graduates? Explain. (Hint: See Exercise 5.15.)

E5.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, run a regression of Course_Eval on Beauty. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

E5.3 Using the data set CollegeDistance described in Empirical Exercise 4.3, run a regression of years of completed education (ED) on distance to the nearest college (Dist) and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: β1 = 0 versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Run the regression using data only on females and repeat (b).
d. Run the regression using data only on males and repeat (b).
e. Is the effect of distance on completed years of education different for men than for women? (Hint: See Exercise 5.15.)

APPENDIX 5.1 Formulas for OLS Standard Errors

This appendix discusses the formulas for OLS standard errors. These are first presented under the least squares assumptions in Key Concept 4.3, which allow for heteroskedasticity; these are the "heteroskedasticity-robust" standard errors. Formulas for the variances of the OLS estimators and the associated standard errors are then given for the special case of homoskedasticity.

Heteroskedasticity-Robust Standard Errors

The estimator σ̂²_β̂1 defined in Equation (5.4) is obtained by replacing the population variances in Equation (4.21) by the corresponding sample variances, with a modification. The variance in the numerator of Equation (4.21) is estimated by (1/(n − 2)) Σ (X_i − X̄)²û_i², where the divisor n − 2 (instead of n) incorporates a degrees-of-freedom adjustment to correct for downward bias, analogously to the degrees-of-freedom adjustment used in the definition of the SER in Section 4.3. The variance in the denominator is estimated by (1/n) Σ (X_i − X̄)². Replacing var[(X_i − μ_X)u_i] and var(X_i) in Equation (4.21) by these two estimators yields σ̂²_β̂1 in Equation (5.4). The consistency of heteroskedasticity-robust standard errors is discussed in Section 17.3.

The estimator of the variance of β̂0 is

σ̂²_β̂0 = (1/n) × [(1/(n − 2)) Σ Ĥ_i²û_i²] / [(1/n) Σ Ĥ_i²]²,   (5.26)

where Ĥ_i = 1 − [X̄/((1/n) Σ X_j²)]X_i. The standard error of β̂0 is SE(β̂0) = √σ̂²_β̂0. The reasoning behind the estimator σ̂²_β̂0 is the same as behind σ̂²_β̂1 and stems from replacing population expectations with sample averages.
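The formulas above translate directly into code. A sketch on simulated (hypothetical) data, computing the robust standard error of the slope from Equation (5.4) alongside the homoskedasticity-only variant for comparison:

```python
import numpy as np

# Direct computation of the heteroskedasticity-robust standard error of the
# OLS slope: the numerator uses the degrees-of-freedom divisor n - 2, the
# denominator the sample variance of X. Data are simulated for illustration,
# with homoskedastic errors so the two standard errors should nearly agree.

rng = np.random.default_rng(1)
n = 5_000
X = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 1.0, n)              # homoskedastic errors
Y = 1.0 + 2.0 * X + u

dX = X - X.mean()
beta1 = np.sum(dX * (Y - Y.mean())) / np.sum(dX**2)
beta0 = Y.mean() - beta1 * X.mean()
resid = Y - beta0 - beta1 * X

# Robust variance: (1/n) * [ (1/(n-2)) sum dX^2 uhat^2 ] / [ (1/n) sum dX^2 ]^2
var_robust = (np.sum(dX**2 * resid**2) / (n - 2)) / (np.sum(dX**2) / n) ** 2 / n
se_robust = np.sqrt(var_robust)

# Homoskedasticity-only variant: s_u^2 / sum dX^2, with s_u^2 = SSR / (n - 2)
s2_u = np.sum(resid**2) / (n - 2)
se_homo = np.sqrt(s2_u / np.sum(dX**2))

print(f"slope = {beta1:.3f}, robust SE = {se_robust:.4f}, "
      f"homoskedasticity-only SE = {se_homo:.4f}")
```

With heteroskedastic errors (for example, error variance growing with X), the two standard errors would diverge and only the robust one would remain valid.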
. The GaussMarkov Conditions The three Ga uss·Markov conditions are (i) £(u. £(u. IX ..: square roots of Di.1( cond ition (iii) is implied by Ihe tellst square~ U!>suml" tions..c E('I..i. . .. so condition ( L ii) . 0 < O" ~ < (iii) £(11.d. The GaussMarkov conditions are implied by the th ree least sq uares assum ptIon) (Kc~ Concept 4.) = 0 for all i.2 a Proofofthe GaussMarkov Theorem As discussed in Section 5.).. .) arc i.. . i 00 (5 .) .. To show Ih.). by Assum ptio n 2.I X I . . notc that E( uju/I X I .. . The three condi tions. lind by Alisump lion I. var(u. . XII) "" 0 (ii) va r (u.X~) "" 0 for all i . respect ively.. that I' .182 CHAPTER S linear R egreuion wi"" One Regressor where ~ is given in Equatio n ( 4. . so condition (il) holds. The homoskedasticityonly SlandaroJ errors are Ih. X. Xl "'" X. XI " " . va r( II... ..1'1 ). This appe ndix begins by s til ting the Gauss·Markov condI tions and showing that they are implied by the three least sq uares c(lOdi lion pl us homoskcdasticity. which is conslant ..11. has mean olCro.O: thuscond ilio n (i) ho lds.31 ) "* j whe re Ihe condi tions !:Iold for i.) . X.1 X I' .X~).E(/I.3). and a t. the Ga uss·Mil rkov theo rem siaies Ihal if the GaussMarkm condilion§ hold .II/I X.1 X.efficient) condi tionally lifl1!dr unbiased estimator (is BLUE). .) "" ~ .. j . il fo llows that E(II/II.~·d Xs (Xl.I X I . .. (Assum ption 2). Similarly.• X. APPENDIX The GaussMarkov Conditions and 5. Finally.. )C.. we turn 10 the proof of the theorem.var(u. Assumption 2 also implics Ihm E(II.) = £(II... XI) bec.i.! X.. XJ .. hilS a CODSlant va ria nce. £(uil X.) .I. n. where aillhese ~tate mc nts hold coodition:tl1y on all ob scr\..0. .) = E(II/I X') £(II/I Xj ) for i ? j: bc:cnu<..d. an d lhat the e rrnTS !'I re unco rrelated for Jiffe re m observlllions. .I X !. ..IIJIX" X. hy Assump tion 2. stall' thJI u. and becau~ the errors arc assumed to be homoskcdastic. 
plus the addit io nal assumptions thai the the ob~rvations erro~ are homoskedastic..\Usc (Xi' Y.5. Because arc i.) .I X. the n the OLS estimator is the bcsl (mo~t .Assumplion 3 (nonlero fini te fo urth moments) ensures that 0 < ~ <. j. We next show tha t the OLS estimator is a linear condit io nally un hia wd estimator..
The OLS Estimator β̂1 Is a Linear Conditionally Unbiased Estimator

To show that β̂1 is linear, first note that, because Σ (X_i − X̄) = 0 (by the definition of X̄), Σ (X_i − X̄)(Y_i − Ȳ) = Σ (X_i − X̄)Y_i − Ȳ Σ (X_i − X̄) = Σ (X_i − X̄)Y_i. Substituting this result into the formula for β̂1 in Equation (4.7) yields

β̂1 = [Σ (X_i − X̄)Y_i] / [Σ (X_i − X̄)²] = Σ â_iY_i,  where  â_i = (X_i − X̄) / Σ_{j=1}^{n} (X_j − X̄)².   (5.32)

Because the weights â_i, i = 1,..., n, in Equation (5.32) depend on X_1,..., X_n but not on Y_1,..., Y_n, the OLS estimator β̂1 is a linear estimator. Under the Gauss-Markov conditions, β̂1 is conditionally unbiased, and the variance of the conditional distribution of β̂1, given X_1,..., X_n, is

var(β̂1|X_1,..., X_n) = σ²_u / Σ (X_i − X̄)².   (5.33)

The result that β̂1 is conditionally unbiased was previously shown in Appendix 4.3.

Proof of the Gauss-Markov Theorem

We start by deriving some facts that hold for all linear conditionally unbiased estimators, that is, for all estimators β̃1 satisfying β̃1 = Σ a_iY_i, where the weights a_i, i = 1,..., n, depend on X_1,..., X_n but not on Y_1,..., Y_n. Substituting Y_i = β0 + β1X_i + u_i into β̃1 = Σ a_iY_i and collecting terms, we have that

β̃1 = β0 Σ a_i + β1 Σ a_iX_i + Σ a_iu_i.   (5.34)

By the first Gauss-Markov condition, E(Σ a_iu_i|X_1,..., X_n) = 0; thus taking conditional expectations of both sides of Equation (5.34) yields E(β̃1|X_1,..., X_n) = β0 Σ a_i + β1 Σ a_iX_i. Because β̃1 is conditionally unbiased by assumption, it must be that β0 Σ a_i + β1 Σ a_iX_i = β1, but for this equality to hold for all values of β0 and β1 it must be the case that, for β̃1 to be conditionally unbiased,

Σ a_i = 0  and  Σ a_iX_i = 1.   (5.35)
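The weight representation in Equation (5.32) and the constraints in Equation (5.35) are easy to check numerically; a small sketch with simulated (hypothetical) data:

```python
import numpy as np

# Numerical check of Equations (5.32) and (5.35): the OLS weights
# a_hat_i = (X_i - Xbar) / sum_j (X_j - Xbar)^2 reproduce the OLS slope
# and satisfy sum(a_hat) = 0 and sum(a_hat * X) = 1. Data are simulated.

rng = np.random.default_rng(42)
n = 200
X = rng.normal(5.0, 2.0, n)
Y = 1.0 + 0.7 * X + rng.normal(0.0, 1.0, n)

dX = X - X.mean()
a_hat = dX / np.sum(dX**2)               # the OLS weights of Equation (5.32)

beta1_weights = np.sum(a_hat * Y)        # slope as a linear function of Y
beta1_formula = np.sum(dX * (Y - Y.mean())) / np.sum(dX**2)

print(np.isclose(beta1_weights, beta1_formula))  # same estimator
print(np.isclose(np.sum(a_hat), 0.0))            # weights sum to 0
print(np.isclose(np.sum(a_hat * X), 1.0))        # weighted sum of X is 1
```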
Under the Gauss-Markov conditions, the variance of β̃1, conditional on X_1,..., X_n, is var(β̃1|X_1,..., X_n) = σ²_u Σ a_i² (5.36). Writing a_i = â_i + d_i, where the â_i are the OLS weights, the constraints in Equation (5.35) imply that Σ a_i² = Σ â_i² + Σ d_i², so the variance of β̃1 equals or exceeds the variance of the OLS estimator β̂1, with equality only if d_i = 0 for all i, that is, only if a_i = â_i (5.37). Thus, under the Gauss-Markov assumptions, the OLS estimator is BLUE. The theorem also applies when the regressors are nonrandom, that is, when X_1,..., X_n take on the same values from one sample to the next; in that case all of the "conditional on X_1,..., X_n" statements are unnecessary.

The Sample Average Ȳ Is the Efficient Linear Estimator of E(Y)

An implication of the Gauss-Markov theorem is that the sample average, Ȳ, is the most efficient linear estimator of E(Y_i) when Y_1,..., Y_n are i.i.d. To see this, consider the case of regression without an "X," so that the only regressor is the constant regressor X_{0i} = 1. Then the OLS estimator is β̂0 = Ȳ. It follows that, under the Gauss-Markov assumptions, Ȳ is BLUE. Note that the Gauss-Markov requirement that the error be homoskedastic is irrelevant in this case because there is no regressor, so it follows that Ȳ is BLUE if Y_1,..., Y_n are i.i.d. This result was stated previously in Key Concept 3.3.
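This efficiency claim can be verified directly: any linear estimator Σ w_iY_i with Σ w_i = 1 is unbiased for E(Y_i) and, for i.i.d. observations, has variance σ² Σ w_i², which the equal weights w_i = 1/n minimize. A small numeric comparison (the alternative weights are made up):

```python
import numpy as np

# For i.i.d. Y_i with variance sigma^2, a linear estimator sum(w_i * Y_i)
# with sum(w_i) = 1 is unbiased and has variance sigma^2 * sum(w_i^2).
# The sample average (w_i = 1/n) minimizes sum(w_i^2). Weights are made up.

n = 10
sigma2 = 1.0

w_avg = np.full(n, 1.0 / n)              # the sample average
w_other = np.linspace(1, 2, n)           # some other positive weights...
w_other /= w_other.sum()                 # ...rescaled to sum to 1

var_avg = sigma2 * np.sum(w_avg**2)
var_other = sigma2 * np.sum(w_other**2)

print(f"var(Ybar)          = {var_avg:.4f}")    # sigma^2 / n = 0.1
print(f"var(other weights) = {var_other:.4f}")  # strictly larger
```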
CHAPTER 6

Linear Regression with Multiple Regressors

Chapter 5 ended on a worried note. Although school districts with lower student-teacher ratios tend to have higher test scores in the California data set, perhaps students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have produced misleading results and, if so, what can be done?

Omitted factors, such as student characteristics, can in fact make the ordinary least squares (OLS) estimator of the effect of class size on test scores misleading or, more precisely, biased. This chapter explains this "omitted variable bias" and introduces multiple regression, a method that can eliminate omitted variable bias. The key idea of multiple regression is that, if we have data on these omitted variables, then we can include them as additional regressors and thereby estimate the effect of one regressor (the student-teacher ratio) while holding constant the other variables (such as student characteristics).

This chapter explains how to estimate the coefficients of the multiple linear regression model. Many aspects of multiple regression parallel those of regression with a single regressor, studied in Chapters 4 and 5. The coefficients of the multiple regression model can be estimated from data using OLS; the OLS estimators in multiple regression are random variables because they depend on data from a random sample; and in large samples the sampling distributions of the OLS estimators are approximately normal.

6.1 Omitted Variable Bias

By focusing only on the student-teacher ratio, the empirical analysis in Chapters 4 and 5 ignored some potentially important determinants of test scores by collecting their influences in the regression error term. These omitted factors include school characteristics, such as teacher quality and computer usage, and student characteristics, such as family background. We begin by considering an omitted student characteristic that is particularly relevant in California because of its large immigrant population: the prevalence in the school district of students who are still learning English.

Students who are still learning English might perform worse on standardized tests than native English speakers. If districts with large classes also have many students still learning English, then the OLS regression of test scores on the student-teacher ratio could erroneously find a correlation and produce a large estimated coefficient, when in fact the true causal effect of cutting class sizes on test scores is small, even zero. Accordingly, based on the analysis of Chapters 4 and 5, the superintendent might hire enough new teachers to reduce the student-teacher ratio by two, but her hoped-for improvement in test scores will fail to materialize if the true coefficient is small or zero.

A look at the California data lends credence to this concern. The correlation between the student-teacher ratio and the percentage of English learners (students who are not native English speakers and who have not yet mastered English) in the district is 0.19. This small but positive correlation suggests that districts with more English learners tend to have a higher student-teacher ratio (larger classes). If the student-teacher ratio were unrelated to the percentage of English learners, then it would be safe to ignore English proficiency in the regression of test scores against the student-teacher ratio. But because the student-teacher ratio and the percentage of English learners are correlated, it is possible that the OLS coefficient in the regression of test scores on the student-teacher ratio is in part picking up the effect of English proficiency.
By ignoring the percentage of English learners in the district, the OLS estimator of the slope in the regression of test scores on the student-teacher ratio could be biased; that is, the mean of the sampling distribution of the OLS estimator might not equal the true effect on test scores of a unit change in the student-teacher ratio.

Definition of Omitted Variable Bias

If the regressor (the student-teacher ratio) is correlated with a variable that has been omitted from the analysis (the percentage of English learners) and that determines, in part, the dependent variable (test scores), then the OLS estimator will have omitted variable bias.

Omitted variable bias occurs when two conditions are true: (1) the omitted variable is correlated with the included regressor; and (2) the omitted variable is a determinant of the dependent variable. To illustrate these conditions, consider three examples of variables that are omitted from the regression of test scores on the student-teacher ratio.

Example #1: Percentage of English learners. Because the percentage of English learners is correlated with the student-teacher ratio, the first condition for omitted variable bias holds. It is plausible that students who are still learning English will do worse on standardized tests than native English speakers, in which case the percentage of English learners is a determinant of test scores and the second condition for omitted variable bias holds. Thus, omitting the percentage of English learners may introduce omitted variable bias.

Example #2: Time of day of the test. Another variable omitted from the analysis is the time of day that the test was administered. For this omitted variable, it is plausible that the first condition for omitted variable bias does not hold but the second condition does. For example, if the time of day of the test varies from one district to the next in a way that is unrelated to class size, then the time of day and class size would be uncorrelated, so the first condition does not hold. Conversely, the time of day of the test could affect scores (alertness varies through the school day), so the second condition holds. However, because in this example the time that the test is administered is uncorrelated with the student-teacher ratio, the student-teacher ratio could not be incorrectly picking up the "time of day" effect. Thus omitting the time of day of the test does not result in omitted variable bias.

Example #3: Parking lot space per pupil. Another omitted variable is parking lot space per pupil (the area of the teacher parking lot divided by the number of students). This variable satisfies the first but not the second condition for omitted variable bias. Specifically, schools with more teachers per pupil probably have more teacher parking space, so the first condition would be satisfied. However, under the assumption that learning takes place in the classroom, not the parking lot, parking lot space has no direct effect on learning; thus the second condition does not hold. Because parking lot space per pupil is not a determinant of test scores, omitting it from the analysis does not lead to omitted variable bias.

Omitted variable bias is summarized in Key Concept 6.1.

Omitted variable bias and the first least squares assumption. Omitted variable bias means that the first least squares assumption, that E(ui | Xi) = 0 as listed in Key Concept 4.3, is incorrect. To see why, recall that the error term ui in the linear regression model with a single regressor represents all factors, other than Xi, that are determinants of Yi. If one of these other factors is correlated with Xi, this means that the error term (which contains this factor) is correlated with Xi. In other words, if an omitted variable is a determinant of Yi, then it is in the error term, and if it is correlated with Xi, then the error term is correlated with Xi. Because ui and Xi are correlated, the conditional mean of ui given Xi is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent.
KEY CONCEPT 6.1
OMITTED VARIABLE BIAS IN REGRESSION WITH A SINGLE REGRESSOR

Omitted variable bias is the bias in the OLS estimator that arises when the regressor, X, is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be true:

1. X is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent variable, Y.

A Formula for Omitted Variable Bias

The discussion of the previous section about omitted variable bias can be summarized mathematically by a formula for this bias. Let the correlation between Xi and ui be corr(Xi, ui) = ρXu. Suppose that the second and third least squares assumptions hold, but the first does not because ρXu is nonzero. Then the OLS estimator has the limit (derived in Appendix 6.1)

β̂1 →p β1 + ρXu (σu / σX).    (6.1)

That is, as the sample size increases, β̂1 is close to β1 + ρXu(σu/σX) with increasingly high probability. The term ρXu(σu/σX) in Equation (6.1) is the bias in β̂1 that persists even in large samples. Because β̂1 does not converge in probability to the true value β1, β̂1 is inconsistent; that is, β̂1 is not a consistent estimator of β1 when there is omitted variable bias.

The formula in Equation (6.1) summarizes several of the ideas discussed above about omitted variable bias:

1. Omitted variable bias is a problem whether the sample size is large or small. Because β̂1 does not converge in probability to the true value β1, the bias does not vanish as the sample size increases.
2. Whether this bias is large or small in practice depends on the correlation ρXu between the regressor and the error term. The larger is |ρXu|, the larger is the bias.
3. The direction of the bias in β̂1 depends on whether X and u are positively or negatively correlated.
THE MOZART EFFECT: OMITTED VARIABLE BIAS?

A study published in Nature in 1993 (Rauscher, Shaw, and Ky, 1993) suggested that listening to Mozart for 10-15 minutes could temporarily raise your IQ by 8 or 9 points. That study made big news, and politicians and parents saw an easy way to make their children smarter. For a while, the state of Georgia even distributed classical music CDs to all infants in the state.

What is the evidence for the "Mozart effect"? A review of dozens of studies found that students who take optional music or arts courses in high school do in fact have higher English and math test scores than those who don't.¹ A closer look at these studies, however, suggests that the real reason for the better test performance has little to do with those courses. Instead, the authors of the review suggested that the correlation between testing well and taking art or music could arise from any number of things. For example, the academically better students might have more time to take optional music courses or more interest in doing so, or those schools with a deeper music curriculum might just be better schools across the board.

In the terminology of regression, the estimated relationship between test scores and taking optional music courses appears to have omitted variable bias. By omitting factors such as the student's innate ability or the overall quality of the school, studying music appears to have an effect on test scores when in fact it has none.

So is there a Mozart effect? One way to find out is to do a randomized controlled experiment. (As discussed in Chapter 4, randomized controlled experiments eliminate omitted variable bias by randomly assigning participants to "treatment" and "control" groups.) Taken together, the many controlled experiments on the Mozart effect fail to show that listening to Mozart improves IQ or general test performance. For reasons not fully understood, however, it seems that listening to classical music does help temporarily in one narrow area: folding paper and visualizing shapes. So the next time you cram for an origami exam, try to fit in a little Mozart.

¹See the Journal of Aesthetic Education 34:3-4 (Fall/Winter 2000), especially the article by Ellen Winner and Monica Cooper (pp. 11-76) and the one by Lois Hetland (pp. 105-148).

For example, in the test score example, we speculated that the percentage of students learning English has a negative effect on district test scores (students still learning English have lower scores), so that the percentage of English learners enters the error term with a negative sign. In our data, the fraction of English learners is positively correlated with the student-teacher ratio (districts with more English learners have larger classes). Thus the student-teacher ratio (X) would be negatively correlated with the error term (u), so ρXu < 0 and the coefficient on the student-teacher ratio β̂1 would be biased toward a negative number. In other words, having a small percentage of English learners is associated both with high test scores and low student-teacher ratios, so one reason that the OLS estimator suggests that small classes improve test scores may be that the districts with small classes have fewer English learners.

Addressing Omitted Variable Bias by Dividing the Data into Groups

What can you do about omitted variable bias? Our superintendent is considering increasing the number of teachers in her district, but she has no control over the fraction of immigrants in her community. As a result, she is interested in the effect of the student-teacher ratio on test scores, holding constant other factors, including the percentage of English learners. This new way of posing her question suggests that, instead of using data for all districts, perhaps we should focus on districts with percentages of English learners comparable to hers. Among this subset of districts, do those with smaller classes do better on standardized tests?

Table 6.1 reports evidence on the relationship between class size and test scores within districts with comparable percentages of English learners. Districts are divided into eight groups. First, the districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts. Second, within each of these four categories, districts are further broken down into two groups, depending on whether the student-teacher ratio is small (STR < 20) or large (STR >= 20).

TABLE 6.1  Differences in Test Scores for California School Districts with Low and High
Student-Teacher Ratios, by the Percentage of English Learners in the District

                                Student-Teacher         Student-Teacher         Difference in Test Scores,
                                Ratio < 20              Ratio >= 20             Low vs. High STR
                                Avg. Test Score    n    Avg. Test Score    n    Difference    t-statistic
All districts                        657.4        238        650.0        182       7.4           4.04

Percentage of English learners
  < 1.9%                             664.5         76        665.4         27      -0.9          -0.30
  1.9-8.8%                           665.2         64        661.8         44       3.3           1.13
  8.8-23.0%                          654.9         54        649.7         50       5.2           1.72
  > 23.0%                            636.7         44        634.8         51       1.9           0.68

The first row in Table 6.1 reports the overall difference in average test scores between districts with low and high student-teacher ratios, that is, the difference in test scores between these two groups without breaking them down further into the quartiles of English learners. (Recall that this difference was previously reported in regression form in Equation (5.18) as the OLS estimate of the coefficient on Di in the regression of TestScore on Di, where Di is a binary regressor that equals 1 if STRi < 20 and equals 0 otherwise.) Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student-teacher ratio than a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

The final four rows in Table 6.1 report the difference in test scores between districts with low and high student-teacher ratios, broken down by the quartile of the percentage of English learners. This evidence presents a different picture. Of the districts with the fewest English learners (< 1.9%), the average test score for the 76 districts with low student-teacher ratios is 664.5 and the average for the 27 with high student-teacher ratios is 665.4. Thus, for the districts with the fewest English learners, test scores were on average 0.9 points lower in the districts with low student-teacher ratios! In the second quartile, districts with low student-teacher ratios had test scores that averaged 3.3 points higher than those with high student-teacher ratios; this gap was 5.2 points for the third quartile and only 1.9 points for the quartile of districts with the most English learners. Once we hold the percentage of English learners constant, the difference in performance between districts with high and low student-teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

At first this finding might seem puzzling. How can the overall effect on test scores be twice the effect within any quartile? The answer is that the districts with the most English learners tend to have both the highest student-teacher ratios and the lowest test scores. The difference in the average test score between districts in the lowest and highest quartile of the percentage of English learners is large, approximately 30 points. The districts with few English learners tend to have lower student-teacher ratios: 74% (76 of 103) of the districts in the first quartile of English learners have small classes (STR < 20), while only 42% (44 of 105) of the districts in the quartile with the most English learners have small classes. So, the districts with the most English learners have both lower test scores and higher student-teacher ratios than the other districts.
This analysis reinforces the superintendent's worry that omitted variable bias is present in the regression of test scores against the student-teacher ratio.

By looking within quartiles of the percentage of English learners, the test score differences in the second part of Table 6.1 improve upon the simple difference-of-means analysis in the first line of Table 6.1. Still, this analysis does not yet provide the superintendent with a useful estimate of the effect on test scores of changing class size, holding constant the fraction of English learners. Such an estimate can be provided, however, using the method of multiple regression.

6.2 The Multiple Regression Model

The multiple regression model extends the single variable regression model of Chapters 4 and 5 to include additional variables as regressors. This model permits estimating the effect on Yi of changing one variable (X1i) while holding the other regressors (X2i, X3i, and so forth) constant. In the class size problem, the multiple regression model provides a way to isolate the effect on test scores (Yi) of the student-teacher ratio (X1i) while holding constant the percentage of students in the district who are English learners (X2i).

The Population Regression Line

Suppose for the moment that there are only two independent variables, X1i and X2i. In the linear multiple regression model, the average relationship between these two independent variables and the dependent variable, Y, is given by the linear function

E(Yi | X1i = x1, X2i = x2) = β0 + β1x1 + β2x2,    (6.2)

where E(Yi | X1i = x1, X2i = x2) is the conditional expectation of Yi given that X1i = x1 and X2i = x2. That is, if the student-teacher ratio in the i-th district (X1i) equals some value x1, and the percentage of English learners in the i-th district (X2i) equals x2, then the expected value of Yi, given the student-teacher ratio and the percentage of English learners, is given by Equation (6.2).

Equation (6.2) is the population regression line or population regression function in the multiple regression model. The coefficient β0 is the intercept, the coefficient β1 is the slope coefficient of X1i or, more simply, the coefficient on X1i, and the coefficient β2 is the slope coefficient of X2i or, more simply, the coefficient on X2i. One or more of the independent variables in the multiple regression model are sometimes referred to as control variables.
The interpretation of the coefficient β1 in Equation (6.2) is different than it was when X1i was the only regressor: In Equation (6.2), β1 is the effect on Y of a unit change in X1, holding X2 constant or controlling for X2.

This interpretation of β1 follows from the definition that the expected effect on Y of a change in X1, ΔX1, holding X2 constant, is the difference between the expected value of Y when the independent variables take on the values X1 + ΔX1 and X2 and the expected value of Y when the independent variables take on the values X1 and X2. Accordingly, write the population regression function in Equation (6.2) as Y = β0 + β1X1 + β2X2, and imagine changing X1 by the amount ΔX1 while not changing X2, that is, while holding X2 constant. Because X1 has changed, Y will change by some amount, say ΔY. After this change, the new value of Y, Y + ΔY, is

Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2.    (6.3)

An equation for ΔY in terms of ΔX1 is obtained by subtracting the equation Y = β0 + β1X1 + β2X2 from Equation (6.3), yielding ΔY = β1ΔX1. That is,

β1 = ΔY/ΔX1, holding X2 constant.    (6.4)

The coefficient β1 is the effect on Y (the expected change in Y) of a unit change in X1, holding X2 fixed. Another phrase used to describe β1 is the partial effect on Y of X1, holding X2 fixed.

The interpretation of the intercept in the multiple regression model, β0, is similar to the interpretation of the intercept in the single-regressor model: It is the expected value of Yi when X1i and X2i are zero. Simply put, the intercept β0 determines how far up the Y axis the population regression line starts.
The Population Multiple Regression Model

The population regression line in Equation (6.2) is the relationship between Y and X1 and X2 that holds on average in the population. Just as in the case of regression with a single regressor, however, this relationship does not hold exactly because many other factors influence the dependent variable. In addition to the student-teacher ratio and the fraction of students still learning English, test scores are influenced by school characteristics, other student characteristics, and luck. Thus the population regression function in Equation (6.2) needs to be augmented to incorporate these additional factors. Accordingly, we include an "error" term ui. This error term is the deviation of a particular observation (test scores in the i-th district in our example) from the average population relationship. We then have

Yi = β0 + β1X1i + β2X2i + ui, i = 1, ..., n,    (6.5)

where the subscript i indicates the i-th of the n observations (districts) in the sample. Equation (6.5) is the population multiple regression model when there are two regressors, X1i and X2i.

In regression with binary regressors it can be useful to treat β0 as the coefficient on a regressor that always equals 1; that is, think of β0 as the coefficient on X0i, where X0i = 1 for i = 1, ..., n. Accordingly, the population multiple regression model in Equation (6.5) can alternatively be written as

Yi = β0X0i + β1X1i + β2X2i + ui, where X0i = 1, i = 1, ..., n.    (6.6)

The variable X0i is sometimes called the constant regressor because it takes on the same value, the value 1, for all observations. Similarly, the intercept, β0, is sometimes called the constant term in the regression. The two ways of writing the population regression model, Equations (6.5) and (6.6), are equivalent.

The discussion so far has focused on the case of a single additional variable, X2. In practice, however, there might be multiple factors omitted from the single-regressor model. For example, ignoring the students' economic background might result in omitted variable bias, just as ignoring the fraction of English learners did. This reasoning leads us to consider a model with three regressors or, more generally, a model that includes k regressors. The multiple regression model with k regressors, X1i, X2i, ..., Xki, is summarized as Key Concept 6.2.

The definitions of homoskedasticity and heteroskedasticity in the multiple regression model are extensions of their definitions in the single-regressor model. The error term ui in the multiple regression model is homoskedastic if the variance of the conditional distribution of ui given X1i, ..., Xki, var(ui | X1i, ..., Xki), is constant for i = 1, ..., n and thus does not depend on the values of X1i, ..., Xki. Otherwise, the error term is heteroskedastic.

The multiple regression model holds out the promise of providing just what the superintendent wants to know: the effect of changing the student-teacher ratio, holding constant other factors that are beyond her control. These factors include not just the percentage of English learners, but other measurable factors that might affect test performance, including the economic background of the students.
KEY CONCEPT 6.2
THE MULTIPLE REGRESSION MODEL

The multiple regression model is

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui, i = 1, ..., n,    (6.7)

where

• Yi is the i-th observation on the dependent variable; X1i, X2i, ..., Xki are the i-th observations on each of the k regressors; and ui is the error term.

• The population regression line is the relationship that holds between Y and the X's on average in the population: E(Y | X1i = x1, X2i = x2, ..., Xki = xk) = β0 + β1x1 + β2x2 + ... + βkxk.

• β1 is the slope coefficient on X1, β2 is the coefficient on X2, and so on. The coefficient β1 is the expected change in Yi resulting from changing X1i by one unit, holding constant X2i, ..., Xki. The coefficients on the other X's are interpreted similarly.

• The intercept β0 is the expected value of Y when all the X's equal 0. The intercept can be thought of as the coefficient on a regressor, X0i, that equals 1 for all i.

To be of practical help to the superintendent, however, we need to provide her with estimates of the unknown population coefficients β0, ..., βk of the population regression model calculated using a sample of data. Fortunately, these coefficients can be estimated using ordinary least squares.
6.3 The OLS Estimator in Multiple Regression

This section describes how the coefficients of the multiple regression model can be estimated using OLS.

The OLS Estimator

Section 4.2 shows how to estimate the intercept and slope coefficients in the single-regressor model by applying OLS to a sample of observations of Y and X. The key idea is that these coefficients can be estimated by minimizing the sum of squared prediction mistakes, that is, by choosing the estimators b0 and b1 so as to minimize Σ(Yi − b0 − b1Xi)². The estimators that do so are the OLS estimators, β̂0 and β̂1.

The method of OLS also can be used to estimate the coefficients β0, β1, ..., βk in the multiple regression model. Let b0, b1, ..., bk be estimators of β0, β1, ..., βk. The predicted value of Yi given X1i, ..., Xki, based on these estimators, is b0 + b1X1i + ... + bkXki, so the mistake in predicting Yi is Yi − (b0 + b1X1i + ... + bkXki) = Yi − b0 − b1X1i − ... − bkXki. The sum of these squared prediction mistakes over all n observations thus is

Σi=1,...,n (Yi − b0 − b1X1i − ... − bkXki)².    (6.8)

The sum of the squared mistakes for the linear regression model in expression (6.8) is the extension of the sum of the squared mistakes given in Equation (4.6) for the linear regression model with a single regressor.

The estimators of the coefficients β0, β1, ..., βk that minimize the sum of squared mistakes in expression (6.8) are called the ordinary least squares (OLS) estimators of β0, β1, ..., βk. The OLS estimators are denoted β̂0, β̂1, ..., β̂k.

The terminology of OLS in the linear multiple regression model is the same as in the linear regression model with a single regressor. The OLS regression line is the straight line constructed using the OLS estimators: β̂0 + β̂1X1 + ... + β̂kXk. The predicted value of Yi given X1i, ..., Xki, based on the OLS regression line, is Ŷi = β̂0 + β̂1X1i + ... + β̂kXki. The OLS residual for the i-th observation is the difference between Yi and its OLS predicted value; that is, the OLS residual is ûi = Yi − Ŷi.

The OLS estimators could be computed by trial and error, repeatedly trying different values of b0, ..., bk until you are satisfied that you have minimized the total sum of squares in expression (6.8). It is far easier, however, to use explicit formulas for the OLS estimators that are derived using calculus. The formulas for the OLS estimators in the multiple regression model are similar to those in Key Concept 4.2 for the single-regressor model. These formulas are incorporated into modern statistical software. In the multiple regression model, the formulas are best expressed and discussed using matrix notation, so their presentation is deferred to Section 18.1.
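In practice the minimizer of expression (6.8) is computed from the normal equations rather than by trial and error. The sketch below uses simulated data with invented coefficients and assumes NumPy; it is illustrative, not the software the text refers to.

```python
import numpy as np

# Simulated data from a two-regressor model with known coefficients.
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix including the constant regressor X0i = 1; solving the
# normal equations (X'X) b = X'y gives the OLS estimates.
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)

def sum_sq_mistakes(coefs):
    # the objective in expression (6.8)
    resid = y - X @ coefs
    return resid @ resid

# Perturbing any coefficient can only increase the sum of squared mistakes,
# since b is the minimizer.
print(sum_sq_mistakes(b))
```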
The definitions and terminology of OLS in multiple regression are summarized in Key Concept 6.3.

KEY CONCEPT 6.3
THE OLS ESTIMATORS, PREDICTED VALUES, AND RESIDUALS IN THE MULTIPLE REGRESSION MODEL

The OLS estimators β̂0, β̂1, ..., β̂k are the values of b0, b1, ..., bk that minimize the sum of squared prediction mistakes Σi=1,...,n (Yi − b0 − b1X1i − ... − bkXki)². The OLS predicted values Ŷi and residuals ûi are

Ŷi = β̂0 + β̂1X1i + ... + β̂kXki, i = 1, ..., n, and    (6.9)
ûi = Yi − Ŷi, i = 1, ..., n.    (6.10)

The OLS estimators β̂0, β̂1, ..., β̂k and residuals ûi are computed from a sample of n observations of (X1i, ..., Xki, Yi), i = 1, ..., n. These are estimators of the unknown true population coefficients β0, β1, ..., βk and error term ui.

Application to Test Scores and the Student-Teacher Ratio

In Section 4.2, we used OLS to estimate the intercept and slope coefficient of the regression relating test scores (TestScore) to the student-teacher ratio (STR), using our 420 observations for California school districts; the estimated OLS regression line, reported in Equation (4.11), is

TestScore = 698.9 − 2.28 × STR.    (6.11)

Our concern has been that this relationship is misleading because the student-teacher ratio might be picking up the effect of having many English learners in districts with large classes. That is, it is possible that the OLS estimator is subject to omitted variable bias.

We are now in a position to address this concern by using OLS to estimate a multiple regression in which the dependent variable is the test score (Yi) and there are two regressors: the student-teacher ratio (X1i) and the percentage of English
We have reached the same conclusion that the re is omitted variable bias in the relationship between lest scores and th e student. th e OLS estima te of the coef ficient on the sL ud ent. and the OLS estim ate of tlte coeff icient on the percentage E nglish le arn ers lfh) is .11) ). If the fractio n o f E nglish It'a rne rs is o mi lled fro m the regressio n. which is what the superintendent need!.1.1 2) f the us~ ng t SSIOn (6. . so we will focus on that which is new wit h multi ple regression.6. This difference occurs hecause the coeffici ent on STR in the mu lti ple regression is tbe effect of a change in STR.12)]. The OLS estimate of the intercept St3o) is 686. to make her decision. but in the mu l tiple regression equation (Equation (6.10.0.teacher ratio by two differ ent pa ths: tile tabul ar approach of dividing lh e data into groups (Section 6. it readily ex te nds to mo re than two regressors.1. red ucing the stu dentteache r ra ljo is estimated to have a larger e ffect o n test scores. Much of ".65. • . TIle es ti· mated OLS regression line for litis multiple regression is TesrScQre = 6M.) for our 420 distri cts (i = 1.
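The mechanics of Key Concept 6.3 can be sketched in a few lines of code. The snippet below is an illustration only: it fits a two-regressor model by least squares on simulated data whose population coefficients are set to mimic Equation (6.12), not on the actual California data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated districts; the "true" coefficients below mimic Equation (6.12)
# for illustration (this is NOT the actual California data).
n = 420
str_ = rng.uniform(14, 26, size=n)      # student-teacher ratio
pct_el = rng.uniform(0, 86, size=n)     # percentage of English learners
u = rng.normal(0, 14, size=n)           # error term
test_score = 686.0 - 1.10 * str_ - 0.65 * pct_el + u

# OLS chooses (b0, b1, b2) to minimize sum_i (Y_i - b0 - b1*X1i - b2*X2i)^2.
X = np.column_stack([np.ones(n), str_, pct_el])   # constant regressor first
beta_hat, *_ = np.linalg.lstsq(X, test_score, rcond=None)

y_hat = X @ beta_hat            # OLS predicted values
u_hat = test_score - y_hat      # OLS residuals
print(beta_hat)                 # roughly (686.0, -1.10, -0.65)
```

Because the fit includes a constant, the residuals sum to zero and are orthogonal to each regressor, the sample analogues of the first least squares assumption.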
6.4 Measures of Fit in Multiple Regression

Three commonly used summary statistics in multiple regression are the standard error of the regression, the regression R², and the adjusted R² (also known as R̄²). All three statistics measure how well the OLS estimate of the multiple regression line describes, or "fits," the data.

The Standard Error of the Regression (SER)

The standard error of the regression (SER) estimates the standard deviation of the error term ui. Thus, the SER is a measure of the spread of the distribution of Y around the regression line. In multiple regression, the SER is

    SER = sû, where sû² = [1/(n − k − 1)] Σ ûi² = SSR/(n − k − 1),    (6.13)

where SSR is the sum of squared residuals, SSR = Σ ûi². The only difference between the definition in Equation (6.13) and the definition of the SER in Section 4.3 for the single-regressor model is that here the divisor is n − k − 1 rather than n − 2. In Section 4.3, the divisor n − 2 (rather than n) adjusts for the downward bias introduced by estimating two coefficients (the slope and intercept of the regression line); here, the divisor n − k − 1 adjusts for the downward bias introduced by estimating k + 1 coefficients (the k slope coefficients plus the intercept). As in Section 4.3, using n − k − 1 rather than n is called a degrees-of-freedom adjustment. When n is large, the effect of the degrees-of-freedom adjustment is negligible.

The R²

The regression R² is the fraction of the sample variance of Yi explained by (or predicted by) the regressors. Equivalently, the R² is 1 minus the fraction of the variance of Yi not explained by the regressors. The mathematical definition of the R² is the same as for regression with a single regressor:

    R² = ESS/TSS = 1 − SSR/TSS,    (6.14)

where the explained sum of squares is ESS = Σ(Ŷi − Ȳ)² and the total sum of squares is TSS = Σ(Yi − Ȳ)².
In multiple regression, the R² increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly zero. To see this, think about starting with one regressor and then adding a second. When you use OLS to estimate the model with both regressors, OLS finds the values of the coefficients that minimize the sum of squared residuals. If OLS happens to choose the coefficient on the new regressor to be exactly zero, then the SSR will be the same whether or not the second variable is included in the regression. But if OLS chooses any value other than zero, then it must be that this value reduced the SSR relative to the regression that excludes this regressor. In practice it is extremely unusual for an estimated coefficient to be exactly zero, so in general the SSR will decrease when a new regressor is added. But this means that the R² generally increases (and never decreases) when a new regressor is added.

The "Adjusted R²"

Because the R² increases when a new variable is added, an increase in the R² does not mean that adding a variable actually improves the fit of the model. In this sense, the R² gives an inflated estimate of how well the regression fits the data. One way to correct for this is to deflate or reduce the R² by some factor, and this is what the adjusted R², or R̄², does.

The adjusted R², or R̄², is a modified version of the R² that does not necessarily increase when a new regressor is added. The R̄² is

    R̄² = 1 − [(n − 1)/(n − k − 1)] × SSR/TSS = 1 − sû²/sY².    (6.15)

The difference between this formula and the second definition of the R² in Equation (6.14) is that the ratio of the sum of squared residuals to the total sum of squares is multiplied by the factor (n − 1)/(n − k − 1). As the second expression in Equation (6.15) shows, this means that the adjusted R² is 1 minus the ratio of the sample variance of the OLS residuals [with the degrees-of-freedom correction in Equation (6.13)] to the sample variance of Y.

There are three useful things to know about the R̄². First, (n − 1)/(n − k − 1) is always greater than 1, so R̄² is always less than R². Second, adding a regressor has two opposite effects on the R̄². On the one hand, the SSR falls, which increases the R̄². On the other hand, the factor (n − 1)/(n − k − 1) increases. Whether the R̄² increases or decreases depends on which of these two effects is stronger. Third, the R̄² can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor (n − 1)/(n − k − 1).
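These formulas are straightforward to compute directly from the OLS residuals. The sketch below uses simulated data (an assumption for illustration, with one relevant and one pure-noise regressor) and shows the fact just discussed: adding even an irrelevant regressor cannot lower the R², while the R̄² applies the (n − 1)/(n − k − 1) penalty.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_stats(X, y):
    """SER, R^2, and adjusted R^2 for an OLS fit of y on X.

    X must contain a column of ones; k = number of slope coefficients."""
    n, k = X.shape[0], X.shape[1] - 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta
    ssr = u_hat @ u_hat                        # sum of squared residuals
    tss = ((y - y.mean()) ** 2).sum()          # total sum of squares
    ser = np.sqrt(ssr / (n - k - 1))           # degrees-of-freedom adjustment
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return ser, r2, adj_r2

n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # irrelevant regressor
y = 5.0 + 2.0 * x1 + rng.normal(size=n)

_, r2_one, adj_one = fit_stats(np.column_stack([np.ones(n), x1]), y)
_, r2_two, adj_two = fit_stats(np.column_stack([np.ones(n), x1, x2]), y)
print(r2_two >= r2_one)    # True: adding a regressor never lowers the R^2
```

The adjusted R̄², by contrast, can move in either direction when the noise regressor is added, depending on which of the two opposing effects dominates in the sample.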
Application to Test Scores

Equation (6.12) reports the estimated regression line for the multiple regression relating test scores (TestScore) to the student–teacher ratio (STR) and the percentage of English learners (PctEL). The R² for this regression line is R² = 0.426, the adjusted R² is R̄² = 0.424, and the standard error of the regression is SER = 14.5.

Comparing these measures of fit with those for the regression in which PctEL is excluded [Equation (6.11)] shows that including PctEL in the regression increased the R² from 0.051 to 0.426. When the only regressor is STR, only a small fraction of the variation in TestScore is explained; when PctEL is added to the regression, more than two-fifths (42.6%) of the variation in test scores is explained. In this sense, including the percentage of English learners substantially improves the fit of the regression. Because n is large and only two regressors appear in Equation (6.12), the difference between the R² and the adjusted R² is very small (R² = 0.426 versus R̄² = 0.424).

The SER for the regression excluding PctEL is 18.6; this value falls to 14.5 when PctEL is included as a second regressor. The units of the SER are points on the standardized test. The reduction in the SER tells us that predictions about standardized test scores are substantially more precise if they are made using the regression with both STR and PctEL than if they are made using the regression with only STR as a regressor.

Using the R² and adjusted R². The R² is useful because it quantifies the extent to which the regressors account for, or explain, the variation in the dependent variable. Nevertheless, heavy reliance on the R² (or R̄²) can be a trap. In applications, "maximize the R̄²" is rarely the answer to any economically or statistically meaningful question. Instead, the decision about whether to include a variable in a multiple regression should be based on whether including that variable allows you better to estimate the causal effect of interest. We return to the issue of how to decide which variables to include, and which to exclude, in Chapter 7. First, however, we need to develop methods for quantifying the sampling uncertainty of the OLS estimator. The starting point for doing so is extending the least squares assumptions of Chapter 4 to the case of multiple regressors.

6.5 The Least Squares Assumptions in Multiple Regression

There are four least squares assumptions in the multiple regression model.
The first three are those of Chapter 4 for the single-regressor model (Key Concept 4.3), extended to allow for multiple regressors, and these are discussed only briefly. The fourth assumption is new and is discussed in more detail.

Assumption #1: The Conditional Distribution of ui Given X1i, X2i, …, Xki Has a Mean of Zero

The first assumption is that the conditional distribution of ui given X1i, …, Xki has a mean of zero. This assumption extends the first least squares assumption with a single regressor to multiple regressors. It means that sometimes Yi is above the population regression line and sometimes Yi is below the population regression line, but on average over the population Yi falls on the population regression line. Therefore, for any value of the regressors, the expected value of ui is zero. As is the case for regression with a single regressor, this is the key assumption that makes the OLS estimators unbiased. We return to omitted variable bias in multiple regression in Section 7.5.

Assumption #2: (X1i, X2i, …, Xki, Yi), i = 1, …, n Are i.i.d.

The second assumption is that (X1i, …, Xki, Yi), i = 1, …, n are independently and identically distributed (i.i.d.) random variables. This assumption holds automatically if the data are collected by simple random sampling. The comments on this assumption appearing in Section 4.3 for a single regressor also apply to multiple regressors.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values far outside the usual range of the data, are unlikely. This assumption serves as a reminder that, as in the single-regressor case, the OLS estimator of the coefficients in the multiple regression model can be sensitive to large outliers. The assumption that large outliers are unlikely is made mathematically precise by assuming that X1i, …, Xki and Yi have nonzero finite fourth moments: 0 < E(X1i⁴) < ∞, …, 0 < E(Xki⁴) < ∞ and 0 < E(Yi⁴) < ∞. Another way to state this assumption is that the dependent variable and regressors have finite kurtosis. This assumption is used to derive the properties of OLS regression statistics in large samples.

Assumption #4: No Perfect Multicollinearity

The fourth assumption is new to the multiple regression model. It rules out an inconvenient situation, called perfect multicollinearity, in which it is impossible to compute the OLS estimator. The regressors are said to be perfectly multicollinear (or to exhibit perfect multicollinearity) if one of the regressors is a perfect linear function of the other regressors. The fourth least squares assumption is that the regressors are not perfectly multicollinear.

KEY CONCEPT 6.4
THE LEAST SQUARES ASSUMPTIONS IN THE MULTIPLE REGRESSION MODEL

    Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + ui,  i = 1, …, n, where

1. ui has conditional mean zero given X1i, X2i, …, Xki; that is, E(ui | X1i, X2i, …, Xki) = 0;
2. (X1i, X2i, …, Xki, Yi), i = 1, …, n are independently and identically distributed (i.i.d.) draws from their joint distribution;
3. large outliers are unlikely: X1i, …, Xki and Yi have nonzero finite fourth moments; and
4. there is no perfect multicollinearity.

Why does perfect multicollinearity make it impossible to compute the OLS estimator? Suppose you want to estimate the coefficient on STR in a regression of TestScorei on STRi and PctELi, except that you make a typographical error and accidentally type in STRi a second time instead of PctELi; that is, you regress TestScorei on STRi and STRi. This is a case of perfect multicollinearity because one of the regressors (the first occurrence of STR) is a perfect linear function of another regressor (the second occurrence of STR). Depending on how your software package handles perfect multicollinearity, if you try to estimate this regression the software will do one of three things: (1) It will drop one of the occurrences of STR; (2) it will refuse to calculate the OLS estimates and give an error message; or (3) it will crash the computer. The mathematical reason for this failure is that perfect multicollinearity produces division by zero in the OLS formulas.

At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense, and OLS cannot answer this illogical question.
The solution to perfect multicollinearity in this hypothetical regression is simply to correct the typo and to replace one of the occurrences of STR with the variable you originally wanted to include. This example is typical: When perfect multicollinearity occurs, it often reflects a logical mistake in choosing the regressors or some previously unrecognized feature of the data set. In general, the solution to perfect multicollinearity is to modify the regressors to eliminate the problem. Additional examples of perfect multicollinearity are given in Section 6.7, which also defines and discusses imperfect multicollinearity.

The least squares assumptions for the multiple regression model are summarized in Key Concept 6.4.

6.6 The Distribution of the OLS Estimators in Multiple Regression

Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation across possible samples gives rise to the uncertainty associated with the OLS estimators of the population regression coefficients β0, β1, …, βk. Just as in the case of regression with a single regressor, this variation is summarized in the sampling distribution of the OLS estimators.

Recall from Section 4.4 that, under the least squares assumptions, the OLS estimators (β̂0 and β̂1) are unbiased and consistent estimators of the unknown coefficients (β0 and β1) in the linear regression model with a single regressor. In addition, in large samples, the sampling distribution of β̂0 and β̂1 is well approximated by a bivariate normal distribution.

These results carry over to multiple regression analysis. That is, under the least squares assumptions of Key Concept 6.4, the OLS estimators β̂0, β̂1, …, β̂k are unbiased and consistent estimators of β0, β1, …, βk in the linear multiple regression model. In large samples, the joint sampling distribution of β̂0, β̂1, …, β̂k is well approximated by a multivariate normal distribution, which is the extension of the bivariate normal distribution to the general case of two or more jointly normal random variables (Section 2.4).

Although the algebra is more complicated when there are multiple regressors, the central limit theorem applies to the OLS estimators in the multiple regression model for the same reason that it applies to Ȳ and to the OLS estimators when there is a single regressor: The OLS estimators β̂0, β̂1, …, β̂k are averages of the randomly sampled data, and if the sample size is sufficiently large the sampling distribution of those averages becomes normal.
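A small Monte Carlo experiment illustrates these large-sample results. The data-generating process below is an assumption for illustration (true coefficients 1, 2, −3 and correlated normal regressors): across repeated samples, the OLS estimate of β1 centers on its true value, with sampling variation summarized by its standard deviation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
b1_draws = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)        # correlated regressors
    y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    b1_draws[r] = beta_hat[1]                 # keep the estimate of beta_1

print(b1_draws.mean())    # close to the true value 2.0 (unbiasedness)
print(b1_draws.std())     # spread of the sampling distribution
```

A histogram of `b1_draws` would look approximately normal, consistent with the multivariate normal approximation described above.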
Because the multivariate normal distribution is best handled mathematically using matrix algebra, the expressions for the joint distribution of the OLS estimators are deferred to Chapter 18. Key Concept 6.5 summarizes the result that, in large samples, the distribution of the OLS estimators in multiple regression is approximately jointly normal. In general, the OLS estimators are correlated; this correlation arises from the correlation between the regressors. The joint sampling distribution of the OLS estimators is discussed in more detail for the case that there are two regressors and homoskedastic errors in Appendix 6.2, and the general case is discussed in Section 18.2.

KEY CONCEPT 6.5
LARGE-SAMPLE DISTRIBUTION OF β̂0, β̂1, …, β̂k

If the least squares assumptions (Key Concept 6.4) hold, then in large samples the OLS estimators β̂0, β̂1, …, β̂k are jointly normally distributed and each β̂j is distributed N(βj, σ²β̂j), j = 0, 1, …, k.

6.7 Multicollinearity

As discussed in Section 6.5, perfect multicollinearity arises when one of the regressors is a perfect linear combination of the other regressors. This section provides some examples of perfect multicollinearity and discusses how perfect multicollinearity can arise, and can be avoided, in regressions with multiple binary regressors. Imperfect multicollinearity arises when one of the regressors is very highly correlated, but not perfectly correlated, with the other regressors. Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. However, it does mean that one or more regression coefficients could be estimated imprecisely.

Examples of Perfect Multicollinearity

We continue the discussion of perfect multicollinearity from Section 6.5 by examining three additional hypothetical regressions. In each, a third regressor is added to the regression of TestScorei on STRi and PctELi.
Example #1: Fraction of English learners. Let FracELi be the fraction of English learners in the ith district, which varies between 0 and 1. If the variable FracELi were included as a third regressor in addition to STRi and PctELi, the regressors would be perfectly multicollinear. The reason is that PctEL is the percentage of English learners, so that PctELi = 100 × FracELi for every district. Thus one of the regressors (PctELi) can be written as a perfect linear function of another regressor (FracELi).

Because of this perfect multicollinearity, it is impossible to compute the OLS estimates of the regression of TestScorei on STRi, PctELi, and FracELi. At an intuitive level, OLS fails because you are asking, What is the effect of a unit change in the percentage of English learners, holding constant the fraction of English learners? Because the percentage of English learners and the fraction of English learners move together in a perfect linear relationship, this question makes no sense and OLS cannot answer it.

Example #2: "Not very small" classes. Let NVSi be a binary variable that equals 1 if the student–teacher ratio in the ith district is "not very small"; specifically, NVSi equals 1 if STRi ≥ 12 and equals 0 otherwise. This regression also exhibits perfect multicollinearity, but for a more subtle reason than the regression in the previous example. There are in fact no districts in our data set with STRi < 12; as you can see in the scatterplot in Figure 4.2, the smallest value of STR is 14. Thus, NVSi = 1 for all observations. Now recall that the linear regression model with an intercept can equivalently be thought of as including a regressor, X0i, that equals 1 for all i, as is shown in Equation (6.6). Thus we can write NVSi = 1 × X0i for all the observations in our data set; that is, NVSi can be written as a perfect linear combination of the regressors; specifically, it equals X0i.

This illustrates two important points about perfect multicollinearity. First, when the regression includes an intercept, one of the regressors that can be implicated in perfect multicollinearity is the constant regressor X0i. Second, perfect multicollinearity is a statement about the data set you have on hand. While it is possible to imagine a school district with fewer than 12 students per teacher, there are no such districts in our data set, so we cannot analyze them in our regression.

Example #3: Percentage of English speakers. Let PctESi be the percentage of "English speakers" in the ith district, defined to be the percentage of students who are not English learners. Again the regressors will be perfectly multicollinear. Like the previous example, the perfect linear relationship among the regressors involves the constant regressor X0i: For every district, PctESi = 100 − PctELi = 100 × X0i − PctELi, because X0i = 1 for every district.
This example illustrates another point: Perfect multicollinearity is a feature of the entire set of regressors. If either the intercept (i.e., the regressor X0i) or PctELi were excluded from this regression, the regressors would not be perfectly multicollinear.

The dummy variable trap. Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors. For example, suppose you have partitioned the school districts into three categories: rural, suburban, and urban. Each district falls into one (and only one) category. Let these binary variables be Rurali, which equals 1 for a rural district and equals 0 otherwise; Suburbani; and Urbani. If you include all three binary variables in the regression along with a constant, the regressors will be perfectly multicollinear: Because each district belongs to one and only one category, Rurali + Suburbani + Urbani = 1 = X0i, where X0i denotes the constant regressor introduced in Equation (6.6). Thus, to estimate the regression, you must exclude one of these four variables, either one of the binary indicators or the constant term. By convention, the usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only G − 1 of the G binary variables are included as regressors. In this case, the coefficients on the included binary variables represent the incremental effect of being in that category, relative to the base case of the omitted category, holding constant the other regressors. For example, if Rurali were excluded, then the coefficient on Suburbani would be the average difference between test scores in suburban and rural districts, holding constant the other variables in the regression. Alternatively, all G binary regressors can be included if the intercept is omitted from the regression.

In general, if there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap.

Solutions to perfect multicollinearity. Perfect multicollinearity typically arises when a mistake has been made in specifying the regression. Sometimes the mistake is easy to spot (as in the first example), but sometimes it is not (as in the second example). In one way or another, your software will let you know if you make such a mistake, because it cannot compute the OLS estimator if you have perfect multicollinearity. When your software lets you know that you have perfect multicollinearity, it is important that you modify your regression to eliminate it. Some software is unreliable when there is perfect multicollinearity, and at a minimum you will be ceding control over your choice of regressors to your computer if your regressors are perfectly multicollinear.
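The dummy variable trap can be seen directly in the rank of the regressor matrix. The sketch below uses hypothetical district categories (assumed for illustration): a constant plus all three category dummies is perfectly multicollinear, while dropping one dummy restores full column rank.

```python
import numpy as np

n = 90
cat = np.arange(n) % 3                 # hypothetical: each district is
rural = (cat == 0).astype(float)       # rural, suburban, or urban
suburban = (cat == 1).astype(float)
urban = (cat == 2).astype(float)

# Trap: constant + all three dummies. Since every district is in exactly
# one category, rural + suburban + urban = 1 = the constant regressor.
X_trap = np.column_stack([np.ones(n), rural, suburban, urban])
print(np.linalg.matrix_rank(X_trap))   # 3, not 4: perfect multicollinearity

# Conventional fix: omit one dummy (rural becomes the base category);
# the coefficients on suburban and urban are then differences from rural.
X_ok = np.column_stack([np.ones(n), suburban, urban])
print(np.linalg.matrix_rank(X_ok))     # 3: full column rank
```

Omitting the constant and keeping all three dummies would work equally well; what fails is including all four columns at once.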
Imperfect Multicollinearity

Despite its similar name, imperfect multicollinearity is conceptually quite different from perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, however, then the coefficients on at least one individual regressor will be imprecisely estimated. For example, consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage of the district's residents who are first-generation immigrants. First-generation immigrants often speak English as a second language, so the variables PctEL and percentage immigrants will be highly correlated: Districts with many recent immigrants will tend to have many students who are still learning English. Because these two variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase in PctEL, holding constant the percentage immigrants. In other words, the data set provides little information about what happens to test scores when the percentage of English learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.

The effect of imperfect multicollinearity on the variance of the OLS estimators can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which is the variance of β̂1 in a multiple regression with two regressors (X1 and X2) for the special case of a homoskedastic error. In this case, the variance of β̂1 is inversely proportional to 1 − ρ², where ρ is the correlation between X1 and X2. The larger the correlation between the two regressors, the closer this term is to zero and the larger is the variance of β̂1. More generally, when multiple regressors are imperfectly multicollinear, the coefficients on one or more of these regressors will be imprecisely estimated; that is, they will have a large sampling variance.

Perfect multicollinearity is a problem that often signals the presence of a logical error. In contrast, imperfect multicollinearity is not necessarily an error, but rather just a feature of OLS, your data, and the question you are trying to answer. If the variables in your regression are the ones you meant to include, the ones you chose to address the potential for omitted variable bias, then imperfect multicollinearity implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand.
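The 1/(1 − ρ²) variance inflation can be checked by simulation. The design below is an assumption for illustration (homoskedastic errors, two standardized regressors with correlation ρ): as ρ rises toward 1, the sampling standard deviation of β̂1 grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 1000

def sampling_sd_of_b1(rho):
    """Monte Carlo sampling s.d. of the OLS coefficient on X1 when
    corr(X1, X2) = rho and the error is homoskedastic."""
    draws = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = 1.0 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        draws[r] = beta_hat[1]
    return draws.std()

sd_uncorrelated = sampling_sd_of_b1(0.0)
sd_correlated = sampling_sd_of_b1(0.9)
print(sd_correlated / sd_uncorrelated)   # near 1/sqrt(1 - 0.81), about 2.3
```

The estimator remains unbiased in both designs; only its precision differs, which is exactly the distinction between imperfect and perfect multicollinearity drawn above.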
6.8 Conclusion

Regression with a single regressor is vulnerable to omitted variable bias: If an omitted variable is a determinant of the dependent variable and is correlated with the regressor, then the OLS estimator of the slope coefficient will be biased and will reflect both the effect of the regressor and the effect of the omitted variable. Multiple regression makes it possible to mitigate omitted variable bias by including the omitted variable in the regression. The coefficient on a regressor, X1, in multiple regression is the partial effect of a change in X1, holding constant the other included regressors. In the test score example, including the percentage of English learners as a regressor made it possible to estimate the effect on test scores of a change in the student–teacher ratio, holding constant the percentage of English learners. Doing so reduced by half the estimated effect on test scores of a change in the student–teacher ratio.

The statistical theory of multiple regression builds on the statistical theory of regression with a single regressor. The least squares assumptions for multiple regression are extensions of the three least squares assumptions for regression with a single regressor, plus a fourth assumption ruling out perfect multicollinearity. Because the regression coefficients are estimated using a single sample, the OLS estimators have a joint sampling distribution and, therefore, have sampling uncertainty. This sampling uncertainty must be quantified as part of an empirical study, and the ways to do so in the multiple regression model are the topic of the next chapter.

Summary

1. Omitted variable bias occurs when an omitted variable (1) is correlated with an included regressor and (2) is a determinant of Y.
2. The multiple regression model is a linear regression model that includes multiple regressors, X1, X2, …, Xk. Associated with each regressor is a regression coefficient, β1, β2, …, βk. The coefficient β1 is the expected change in Y associated with a one-unit change in X1, holding the other regressors constant. The other regression coefficients have an analogous interpretation.
3. The coefficients in multiple regression can be estimated by OLS. When the four least squares assumptions in Key Concept 6.4 are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples.
4. Perfect multicollinearity, which occurs when one regressor is an exact linear function of the other regressors, usually arises from a mistake in choosing which regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of regressors.
5. The standard error of the regression, the R², and the R̄² are measures of fit for the multiple regression model.
3 Explain why two perfectly multicollinea r regressors cannot be incl uded in a linear mull iple regression. ~t lineal g whid' l " .2 A mulliple regressio n includes IWO regressors: YI :: f30 + I3\X •• + ~X2J + 11. 5._ Wha t is the expected c hange in Y if X I increases by 3 un its a nd X 2 is uncha nged? What is the expected cha nge in Y if X l decreases by 5 units and X I is unc hanged? What is Ihe e xpecled change in Y if X I increases by 3 units a nd X 2 decreases by 5 unilS? 6. ~he fou f IO rS ~Ire I ..(Jk (197) OLS regres!iion line (197) predic ted value (197) O LS residual (197) R! and adjusted RZ(RZ ) (200._ . Will (3 . 201 ) perfect multicollinearity or to exhibil pe rfect muh icollinea rilY (204) dummy va riable tra p (208) sam have ofa n \he Review the Concepts 6. (193) coefficient on X l i ( 193) slope coefficient o f X2i ( liJ3) coefficie nt on X 2i ( 193 ) con tro l varia ble (193) holding Xl conSta nt (194) cont rolling for X (194) pa rtial effect (1 94) popu lation mul tiple regression model (195) conslliat regressor constant term (195) homoskedast ic (195) he te roskt:dastic (195) OLS estim ators or f3o.. Solvin g perfect m ulticollinearit y requires changing the set of regressors. . she rcgrcsse~ di strict average test scores on the num be r of compu ters per student. the R2. be an unbiased esti mator of the effect on test scores of increasing tbe number of computers per student? Why o r why nol? If you think 13 1 is biased.1.1 A researcher is in terested in the effect o n test scores of co mputer usage. G ive two examples of a pair of perfectly multi collinear regressors.nclude in a m ulliple regfes~ i o o . Key Terms omitted variable biaS ( UP) multiple regression model ( 193) popula tion regression line (193) population regression fu nction (193) inte rcept (19 3) slope coerticient o f XI. a nd the J?2a re measures o f fi t fo r the muhiple regression mode l.Review 1M Concepts 2 11 an regresso rs to i. is it biased up or down? Why? 6. 
The standa rd error o f Ihe regression. regrCC. {3.. . an coe W· cd with Using school district data like that used in this chapter.
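Review question 6.1 turns on the two omitted-variable-bias conditions stated in the Summary. The simulation below is a sketch of my own (made-up coefficients and variable names, not from the text): an omitted variable x2 that is both correlated with x1 and a determinant of y biases the short regression of y on x1 alone, while the long regression recovers the partial effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)      # omitted variable, correlated with x1
u = rng.normal(size=n)
y = 10.0 + 1.0 * x1 + 2.0 * x2 + u      # true partial effect of x1 is 1.0

# Long regression: y on (1, x1, x2) satisfies the least squares assumptions
X_long = np.column_stack([np.ones(n), x1, x2])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

# Short regression: y on (1, x1) omits x2, so both bias conditions hold
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_long[1])   # close to the true partial effect, 1.0
print(b_short[1])  # close to 1.0 + 2.0 * 0.5 = 2.0, biased upward
```

The size of the bias, 2.0 × 0.5 here, is the omitted coefficient times cov(x1, x2)/var(x1), which is the structure of Equation (6.1).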
Exercises

The first four exercises refer to the table of estimated regressions on page 213, computed using data for 1998 from the Current Population Survey. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

6.1 Compute R² for each of the regressions.

6.2 Using the regression results in column (1):
a. Do workers with college degrees earn more, on average, than workers with only high school degrees? How much more?
b. Do men earn more than women on average? How much more?

6.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Explain.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Predict Sally's and Betsy's earnings.

6.4 Using the regression results in column (3):
a. Do there appear to be important regional differences?
b. Why is the regressor West omitted from the regression? What would happen if it was included?
c. Jennifer is a 28-year-old female college graduate from the Midwest. Juanita is a 28-year-old female college graduate from the South. Calculate the expected difference in earnings between Juanita and Jennifer.

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor            (1)      (2)      (3)
College (X1)        5.46     5.48     5.44
Female (X2)        -2.64    -2.62    -2.62
Age (X3)                     0.29     0.29
Northeast (X4)                        0.69
Midwest (X5)                          0.60
South (X6)                           -0.27
Intercept          12.69     4.40     3.75

Summary Statistics
SER                 6.27     6.22     6.21
R̄²                 0.176    0.190    0.194
n                   4000     4000     4000

6.5 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as "poor." An estimated regression yields

Price = 119.2 + 0.485BDR + 23.4Bath + 0.156Hsize + 0.002Lsize + 0.090Age − 48.8Poor, R̄² = 0.72, SER = 41.5.
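The predicted price changes asked about in Exercise 6.5 are linear combinations of the reported coefficients. A minimal sketch of the arithmetic (my own, not from the text; Price is in $1000, and the last step inverts the adjusted R-squared formula with n = 220 and k = 6 regressors):

```python
# Coefficients from the estimated regression; Price is measured in $1000
b_bath, b_hsize, b_poor = 23.4, 0.156, -48.8

# Converting existing space into a bathroom: Bath rises by 1, Hsize unchanged
delta_a = b_bath * 1                     # 23.4, i.e. a $23,400 expected increase

# Adding a bathroom that also adds 100 square feet to the house
delta_b = b_bath * 1 + b_hsize * 100     # 23.4 + 15.6 = 39.0, i.e. $39,000

# Letting the condition deteriorate to "poor"
delta_c = b_poor * 1                     # -48.8, i.e. a $48,800 loss in value

# Recover R^2 from adjusted R^2: R2 = 1 - (n-k-1)/(n-1) * (1 - R2bar)
r2bar, n, k = 0.72, 220, 6
r2 = 1 - (n - k - 1) / (n - 1) * (1 - r2bar)

print(delta_a, delta_b, delta_c, round(r2, 3))
```

Because Poor enters with a negative coefficient, the deterioration in part (c) is a loss, not a gain.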
a. Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house?
b. Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?
c. What is the loss in value if a homeowner lets his house run down so that its condition becomes "poor"?
d. Compute the R² for the regression.

6.6 A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force.
a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?
b. Use your answer to (a) and the expression for omitted variable bias given in Equation (6.1) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, do you think that β̂1 > β1 or β̂1 < β1?)

6.7 Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved. Include a discussion of any additional data that need to be collected and the appropriate statistical techniques for analyzing the data.
a. A researcher is interested in determining whether a large aerospace firm is guilty of gender bias in setting wages. To determine potential bias, the researcher collects salary and gender information for all of the firm's engineers.
The researcher then plans to conduct a "difference in means" test to determine whether the average salary for women is significantly less than the average salary for men.
b. A researcher is interested in determining whether time spent in prison has a permanent effect on a person's wage rate. He collects data on a random sample of people who have been out of prison for at least fifteen years. He collects similar data on a random sample of people who have never served time in prison. The data set includes information on each person's current wage, education, age, ethnicity, gender, tenure (time in current job), occupation, and union status, as well as whether the person was ever incarcerated.
The researcher plans to estimate the effect of incarceration on wages by regressing wages on an indicator variable for incarceration, including in the regression the other potential determinants of wages (education, tenure, union status, and so on).

6.8 A recent study found that the death rate for people who sleep six to seven hours per night is lower than the death rate for people who sleep eight or more hours, and higher than the death rate for people who sleep five or fewer hours. The 1.1 million observations used for this study came from a random survey of Americans aged 30 to 102. Each survey respondent was tracked for four years. The death rate for people sleeping seven hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping seven hours to the total number of survey respondents who slept seven hours. (This calculation was then repeated for people sleeping six hours, and so on.) Based on this summary, would you recommend that Americans who sleep nine hours per night consider reducing their sleep to six or seven hours if they want to prolong their lives? Why or why not? Explain.

6.9 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4. You are interested in β1, the causal effect of X1 on Y. Suppose that X1 and X2 are uncorrelated. You estimate β1 by regressing Y onto X1 (so that X2 is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.

6.10 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4; in addition, var(ui | X1i, X2i) = 4 and var(X1i) = 6. A random sample of size n = 400 is drawn from the population.
a. Assume that X1 and X2 are uncorrelated. Compute the variance of β̂1. [Hint: Look at Equation (6.17) in Appendix 6.2.]
b. Assume that cor(X1, X2) = 0.5. Compute the variance of β̂1.
c. Comment on the following statements: "When X1 and X2 are correlated, the variance of β̂1 is larger than it would be if X1 and X2 were uncorrelated. Thus, if you are interested in β1, it is best to leave X2 out of the regression if it is correlated with X1."

6.11 (Requires calculus) Consider the regression model Yi = β1X1i + β2X2i + ui for i = 1, ..., n. (Notice that there is no constant term in the regression.) Following analysis like that used in Appendix 4.2:
a. Specify the least squares function that is minimized by OLS.
b. Compute the partial derivatives of the objective function with respect to b1 and b2.
c. Suppose that ΣX1iX2i = 0. Show that β̂1 = ΣX1iYi / ΣX1i².
d. Suppose that ΣX1iX2i ≠ 0. Derive an expression for β̂1 as a function of the data (Yi, X1i, X2i), i = 1, ..., n.
e. Suppose that the model includes an intercept: Yi = β0 + β1X1i + β2X2i + ui. Show that the least squares estimators satisfy β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2.

Empirical Exercises

E6.1 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. What is the estimated slope?
b. Run a regression of Course_Eval on Beauty, including some additional variables to control for the type of course and professor characteristics. In particular, include as additional regressors Intro, OneCredit, Female, Minority, and NNEnglish. What is the estimated effect of Beauty on Course_Eval? Does the regression in (a) suffer from important omitted variable bias?
c. Professor Smith is a black male with average beauty and is a native English speaker. He teaches a three-credit upper-division course. Predict Professor Smith's course evaluation.

E6.2 Using the data set CollegeDistance described in Empirical Exercise 4.3, carry out the following exercises.
a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is the estimated slope?
b. Run a regression of ED on Dist, but include some additional regressors to control for characteristics of the student, the student's family, and the local labor market. In particular, include as additional regressors Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. What is the estimated effect of Dist on ED?
c. Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?
d. Compare the fit of the regression in (a) and (b) using the regression standard errors, R² and R̄². Why are the R² and R̄² so similar in regression (b)?
e. The value of the coefficient on DadColl is positive. What does this coefficient measure?
f. Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients (+ or −) what you would have believed? Interpret the magnitudes of these coefficients.
g. Bob is a black male. His high school was 20 miles from the nearest college. His base-year composite test score (Bytest) was 58. His family income in 1980 was $26,000, and his family owned a home. His mother attended college, but his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing hourly wage was $9.75. Predict Bob's years of completed schooling using the regression in (b).
h. Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college. Predict Jim's years of completed schooling using the regression in (b).

E6.3 Using the data set Growth described in Empirical Exercise 4.1, but excluding the data for Malta, carry out the following exercises.
a. Construct a table that shows the sample mean, standard deviation, and minimum and maximum values for the series Growth, TradeShare, YearsSchool, Oil, Rev_Coups, Assassinations, and RGDP60. Include the appropriate units for all entries.
b. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. What is the value of the coefficient on Rev_Coups? Interpret the value of this coefficient. Is it large or small in a real-world sense?
c. Use the regression to predict the average annual growth rate for a country that has average values for all regressors.
d. Repeat (c) but now assume that the country's value for TradeShare is one standard deviation above the mean.
e. Why is Oil omitted from the regression? What would happen if it were included?

APPENDIX 6.1 Derivation of Equation (6.1)

This appendix presents a derivation of the formula for omitted variable bias in Equation (6.1). Equation (4.30) in Appendix 4.3 states that

β̂1 = β1 + [(1/n) Σ (Xi − X̄)ui] / [(1/n) Σ (Xi − X̄)²].   (6.16)

Under the last two assumptions in Key Concept 4.3, (1/n) Σ (Xi − X̄)² →p σ²X and (1/n) Σ (Xi − X̄)ui →p cov(ui, Xi) = ρXu σu σX. Substitution of these limits into Equation (6.16) yields Equation (6.1).

APPENDIX 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors

Although the general formula for the variance of the OLS estimators in multiple regression is complicated, if there are two regressors (k = 2) and the errors are homoskedastic, then the formula simplifies enough to provide some insights into the distribution of the OLS estimators. Because the errors are homoskedastic, the conditional variance of ui can be written var(ui | X1i, X2i) = σ²u. When there are two regressors, X1i and X2i, and the errors are homoskedastic, in large samples the sampling distribution of β̂1 is N(β1, σ²β̂1), where the variance σ²β̂1 of this distribution is

σ²β̂1 = (1/n) [1 / (1 − ρ²X1,X2)] (σ²u / σ²X1),   (6.17)

where ρX1,X2 is the population correlation between the two regressors X1 and X2 and σ²X1 is the population variance of X1.

The variance σ²β̂1 of the sampling distribution of β̂1 depends on the squared correlation between the regressors. If X1 and X2 are highly correlated, either positively or negatively, then ρ²X1,X2 is close to 1, so the term 1 − ρ²X1,X2 in the denominator of Equation (6.17) is small and the variance of β̂1 is larger than it would be if ρX1,X2 were close to 0.

Another feature of the joint normal large-sample distribution of the OLS estimators is that β̂1 and β̂2 are in general correlated, either positively or negatively. When the errors are homoskedastic, the correlation between the OLS estimators β̂1 and β̂2 is the negative of the correlation between the regressors:

corr(β̂1, β̂2) = −ρX1,X2.   (6.18)
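Equation (6.17) can be checked numerically. The sketch below is my own construction (the parameter values σ²u = 4, σ²X1 = 6, and n = 400 are the ones used in Exercise 6.10); it compares the formula with the Monte Carlo variance of β̂1 for ρX1,X2 equal to 0 and 0.5:

```python
import numpy as np

def var_formula(rho, n=400, var_u=4.0, var_x1=6.0):
    # Equation (6.17): variance of beta1-hat = (1/n) * [1/(1-rho^2)] * (var_u/var_x1)
    return (1.0 / n) * (1.0 / (1.0 - rho ** 2)) * var_u / var_x1

def mc_variance(rho, reps=4000, n=400, seed=0):
    # Draw many samples, run OLS of y on (1, x1, x2), and record beta1-hat each time
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(scale=np.sqrt(6.0), size=n)                   # var(x1) = 6
        x2 = rho * x1 / np.sqrt(6.0) + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
        u = rng.normal(scale=2.0, size=n)                             # homoskedastic, var(u) = 4
        y = 1.0 + x1 + x2 + u
        X = np.column_stack([np.ones(n), x1, x2])
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return estimates.var()

v0, v5 = mc_variance(0.0), mc_variance(0.5, seed=1)
print(var_formula(0.0), v0)   # both near 1/600, about 0.00167
print(var_formula(0.5), v5)   # both near 1/450, about 0.00222
```

Raising the correlation from 0 to 0.5 inflates the variance by the factor 1/(1 − 0.25) = 4/3, exactly as the denominator of Equation (6.17) predicts.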
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression

As discussed in Chapter 6, multiple regression analysis provides a way to mitigate the problem of omitted variable bias by including additional regressors, thereby controlling for the effects of those additional regressors. The coefficients of the multiple regression model can be estimated by OLS. Like all estimators, the OLS estimator has sampling uncertainty because its value differs from one sample to the next.

This chapter presents methods for quantifying the sampling uncertainty of the OLS estimator through the use of standard errors, statistical hypothesis tests, and confidence intervals. Section 7.1 extends the methods for statistical inference in regression with a single regressor to multiple regression. Sections 7.2 and 7.3 show how to test hypotheses that involve two or more regression coefficients. One new possibility that arises in multiple regression is a hypothesis that simultaneously involves two or more regression coefficients. The general approach to testing such "joint" hypotheses involves a new test statistic, the F-statistic. Section 7.4 extends the notion of confidence intervals for a single coefficient to confidence sets for multiple coefficients. Deciding which variables to include in a regression is an important practical issue, so Section 7.5 discusses ways to approach this problem. In Section 7.6, we apply multiple regression analysis to obtain improved estimates of the effect on test scores of a reduction in the student-teacher ratio using the California test score data set.
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient

This section describes how to compute the standard error, how to test hypotheses, and how to construct confidence intervals for a single coefficient in a multiple regression equation.

Standard Errors for the OLS Estimators

Recall that, in the case of a single regressor, it was possible to estimate the variance of the OLS estimator by substituting sample averages for expectations. Under the least squares assumptions, the law of large numbers implies that these sample averages converge to their population counterparts, so that the resulting variance estimator is consistent.

All this extends directly to multiple regression. The OLS estimator β̂j of the jth regression coefficient has a standard deviation, and this standard deviation is estimated by its standard error, SE(β̂j). The formula for the standard error is most easily stated using matrices (see Section 18.2). The important point is that, as far as standard errors are concerned, there is nothing conceptually different between the single- and multiple-regressor cases. The key ideas, the large-sample normality of the estimators and the ability to estimate consistently the standard deviation of their sampling distribution, are the same whether one has one, two, or 12 regressors.

Hypothesis Tests for a Single Coefficient

Suppose that you want to test the hypothesis that a change in the student-teacher ratio has no effect on test scores, holding constant the percentage of English learners in the district. This corresponds to hypothesizing that the true coefficient β1 on the student-teacher ratio is zero in the population regression of test scores on STR and PctEL. More generally, we might want to test the hypothesis that the true coefficient βj on the jth regressor takes on some specific value, βj,0. The null value βj,0 comes either from economic theory or from the decision-making context of the application. If the alternative hypothesis is two-sided, then the two hypotheses can be written mathematically as

H0: βj = βj,0 vs. H1: βj ≠ βj,0 (two-sided alternative).   (7.1)
KEY CONCEPT 7.1
Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0

1. Compute the standard error of β̂j, SE(β̂j).
2. Compute the t-statistic,

   t = (β̂j − βj,0) / SE(β̂j).   (7.2)

3. Compute the p-value,

   p-value = 2Φ(−|t_act|),   (7.3)

   where t_act is the value of the t-statistic actually computed. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if |t_act| > 1.96.

The standard error and (typically) the t-statistic and p-value testing βj = 0 are computed automatically by regression software.

The theoretical underpinning of this procedure is that the OLS estimator has a large-sample normal distribution which, under the null hypothesis, has as its mean the hypothesized true value, and that the variance of this distribution can be estimated consistently. This underpinning is present in multiple regression as well. As stated in Key Concept 6.5, the sampling distribution of β̂j is approximately normal in large samples, and the variance of this distribution can be estimated consistently. Therefore we can simply follow the same procedure as in the single-regressor case to test the null hypothesis in Equation (7.1). Key Concept 5.2 gives a procedure for testing this null hypothesis when there is a single regressor. The first step in the procedure is to calculate the standard error of the coefficient. The second step is to calculate the t-statistic using the general formula in Key Concept 5.1. The third step is to compute the p-value of the test using the cumulative normal distribution in Appendix Table 1. For example, if the first regressor is STR, then the null hypothesis that changing the student-teacher ratio has no effect on test scores corresponds to the null hypothesis that β1 = 0. The procedure for testing a hypothesis on a single coefficient in multiple regression is summarized as Key Concept 7.1.
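The three steps in Key Concept 7.1 amount to a few lines of arithmetic. A sketch using only the standard library (my own construction; for the standard normal, 2Φ(−|t|) equals erfc(|t|/√2), and the example numbers are illustrative):

```python
from math import erfc, sqrt

def t_statistic(beta_hat, beta_null, se):
    # Step 2 of Key Concept 7.1
    return (beta_hat - beta_null) / se

def p_value_two_sided(t):
    # Step 3: p = 2 * Phi(-|t|) for a standard normal Phi
    return erfc(abs(t) / sqrt(2.0))

# Example: estimated coefficient -1.10 with standard error 0.43, testing H0: beta = 0
t = t_statistic(-1.10, 0.0, 0.43)
p = p_value_two_sided(t)
print(t, p)   # t is about -2.56; p is about 0.0105, so reject at the 5% level
```

A check of the 5% rule: p_value_two_sided(1.96) returns essentially 0.05, which is why |t| > 1.96 and p < 0.05 are equivalent rejection rules.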
When the sample size is large, it is customary to denote t_act simply as t, and we adopt this simplified notation for the rest of the book.

Confidence Intervals for a Single Coefficient

The method for constructing a confidence interval in the multiple regression model is also the same as in the single-regressor model. This method is summarized as Key Concept 7.2.

KEY CONCEPT 7.2
Confidence Intervals for a Single Coefficient in Multiple Regression

A 95% two-sided confidence interval for the coefficient βj is an interval that contains the true value of βj with a 95% probability; that is, it contains the true value of βj in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of βj,0 that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, the 95% confidence interval is

95% confidence interval for βj = [β̂j − 1.96SE(β̂j), β̂j + 1.96SE(β̂j)].   (7.4)

A 90% confidence interval is obtained by replacing 1.96 in Equation (7.4) with 1.645.

The method for conducting a hypothesis test in Key Concept 7.1 and the method for constructing a confidence interval in Key Concept 7.2 rely on the large-sample normal approximation to the distribution of the OLS estimator β̂j. Accordingly, it should be kept in mind that these methods for quantifying the sampling uncertainty are only guaranteed to work in large samples.

Application to Test Scores and the Student-Teacher Ratio

Can we reject the null hypothesis that a change in the student-teacher ratio has no effect on test scores, once we control for the percentage of English learners in the district? What is a 95% confidence interval for the effect on test scores of a change in the student-teacher ratio, controlling for the percentage of English learners? We are now able to find out. The regression of test scores against STR and PctEL, estimated by OLS, was given in Equation (6.12) and is restated here with standard errors in parentheses below the coefficients:
TestScore = 686.0 − 1.10 × STR − 0.650 × PctEL.   (7.5)
             (8.7)  (0.43)        (0.031)

To test the hypothesis that the true coefficient on STR is 0, we first need to compute the t-statistic in Equation (7.2). Because the null hypothesis says that the true value of this coefficient is zero, the t-statistic is t = (−1.10 − 0)/0.43 = −2.54. The associated p-value is 2Φ(−2.54) = 1.1%; that is, the smallest significance level at which we can reject the null hypothesis is 1.1%. Because the p-value is less than 5%, the null hypothesis can be rejected at the 5% significance level (but not quite at the 1% significance level).

A 95% confidence interval for the population coefficient on STR is −1.10 ± 1.96 × 0.43 = (−1.95, −0.26); that is, we can be 95% confident that the true value of the coefficient is between −1.95 and −0.26. Interpreted in the context of the superintendent's interest in decreasing the student-teacher ratio by 2, the 95% confidence interval for the effect on test scores of this reduction is (−1.95 × 2, −0.26 × 2) = (−3.90, −0.52).

Adding expenditures per pupil to the equation. Your analysis of the multiple regression in Equation (7.5) has persuaded the superintendent that, based on the evidence so far, reducing class size will help test scores in her district. Now, however, she moves on to a more nuanced question. If she is to hire more teachers, she can pay for those teachers either through cuts elsewhere in the budget (no new computers, reduced maintenance, and so on), or by asking for an increase in her budget, which taxpayers do not favor. What, she asks, is the effect on test scores of reducing the student-teacher ratio, holding expenditures per pupil (and the percentage of English learners) constant?

This question can be addressed by estimating a regression of test scores on the student-teacher ratio, total spending per pupil, and the percentage of English learners. The OLS regression line is

TestScore = 649.6 − 0.29 × STR + 3.87 × Expn − 0.656 × PctEL,   (7.6)
            (15.5)  (0.48)       (1.59)         (0.032)

where Expn is total annual expenditures per pupil in the district in thousands of dollars.

The result is striking. Holding expenditures per pupil and the percentage of English learners constant, changing the student-teacher ratio is estimated to have a very small effect on test scores: The estimated coefficient on STR is −1.10 in Equation (7.5) but, after adding Expn as a regressor in Equation (7.6), it is only −0.29. Moreover, the t-statistic for testing that the coefficient on STR is zero is now t = (−0.29 − 0)/0.48 = −0.60, so the hypothesis that the population value of this coefficient is indeed zero cannot be rejected even at the 10% significance level (|−0.60| < 1.645). Thus Equation (7.6) provides no evidence that hiring more teachers improves test scores if overall expenditures per pupil are held constant.

Note that the standard error on STR increased when Expn was added, from 0.43 in Equation (7.5) to 0.48 in Equation (7.6). This illustrates the general point, introduced in Section 6.7 in the context of imperfect multicollinearity, that correlation between regressors (the correlation between STR and Expn is −0.62) can make the OLS estimators less precise.
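The statistics reported for Equations (7.5) and (7.6) can be reproduced from the printed coefficients and standard errors. A sketch (my own; because the printed values are rounded, it gives t = −2.56 and a confidence interval of (−1.94, −0.26) where the text, using unrounded estimates, reports −2.54 and (−1.95, −0.26)):

```python
from math import erfc, sqrt

def summarize(beta_hat, se, label):
    t = beta_hat / se                           # t-statistic for H0: coefficient = 0
    p = erfc(abs(t) / sqrt(2.0))                # two-sided p-value, 2*Phi(-|t|)
    ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
    print(f"{label}: t = {t:.2f}, p = {p:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
    return t, p, ci

t75, p75, ci75 = summarize(-1.10, 0.43, "STR in (7.5)")    # rejected at the 5% level
t76, p76, ci76 = summarize(-0.29, 0.48, "STR in (7.6)")    # cannot reject beta = 0
tex, pex, ciex = summarize(3.87, 1.59, "Expn in (7.6)")
```

The contrast between the first two lines is the point of the application: controlling for expenditures per pupil, the evidence against a zero STR coefficient disappears.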
One interpretation of the regression in Equation (7.6) is that, in these California data, school administrators allocate their budgets efficiently. Suppose, counterfactually, that the coefficient on STR in Equation (7.6) were negative and large. If so, school districts could raise their test scores simply by decreasing funding for other purposes (textbooks, technology, and so on) and transferring those funds to hire more teachers, thereby reducing class sizes while holding expenditures constant. However, the small and statistically insignificant coefficient on STR in Equation (7.6) indicates that this transfer would have little effect on test scores. Put differently, districts are already allocating their funds efficiently.

What about our angry taxpayer? He asserts that the population values of both the coefficient on the student-teacher ratio (β1) and the coefficient on spending per pupil (β2) are zero; that is, he hypothesizes that both β1 = 0 and β2 = 0. Although it might seem that we can reject this hypothesis because the t-statistic testing β2 = 0 in Equation (7.6) is t = 3.87/1.59 = 2.43, this reasoning is flawed. The taxpayer's hypothesis is a joint hypothesis, and to test it we need a new tool, the F-statistic.

7.2 Tests of Joint Hypotheses

This section describes how to formulate joint hypotheses on multiple regression coefficients and how to test them using an F-statistic.
Testing Hypotheses on Two or More Coefficients

Joint null hypotheses. Consider the regression in Equation (7.6) of the test score against the student-teacher ratio, expenditures per pupil, and the percentage of English learners. Our angry taxpayer hypothesizes that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores; that is, he hypothesizes that both β1 = 0 and β2 = 0. Because STR is the first regressor in Equation (7.6) and Expn is the second, we can write this hypothesis mathematically as

H0: β1 = 0 and β2 = 0 vs. H1: β1 ≠ 0 and/or β2 ≠ 0.   (7.7)

The hypothesis that both the coefficient on the student-teacher ratio (β1) and the coefficient on expenditures per pupil (β2) are zero is an example of a joint hypothesis on the coefficients in the multiple regression model. In this case, the null hypothesis restricts the value of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in Equation (7.7) imposes two restrictions on the multiple regression model: β1 = 0 and β2 = 0.

In general, a joint hypothesis is a hypothesis that imposes two or more restrictions on the regression coefficients. We consider joint null and alternative hypotheses of the form

H0: βj = βj,0, βm = βm,0, ..., for a total of q restrictions, vs. H1: one or more of the q restrictions under H0 does not hold,   (7.8)

where βj, βm, ..., refer to different regression coefficients and βj,0, βm,0, ..., refer to the values of these coefficients under the null hypothesis. The null hypothesis in Equation (7.7) is an example of Equation (7.8). Another example is that, in a regression with k = 11 regressors, the null hypothesis is that the coefficients on the 2nd, 4th, and 5th regressors are zero; that is, β2 = 0, β4 = 0, and β5 = 0, so that there are q = 3 restrictions. In general, under the null hypothesis H0 there are q such restrictions.

If any one (or more than one) of the equalities under the null hypothesis H0 in Equation (7.8) is false, then the joint null hypothesis itself is false. Thus the alternative hypothesis is that at least one of the equalities in the null hypothesis H0 does not hold.

Why can't I just test the individual coefficients one at a time? Although it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this approach is unreliable. Specifically, suppose that you are interested in testing the joint null hypothesis in Equation (7.6) that β1 = 0 and β2 = 0. Let t1 be the t-statistic for testing the null hypothesis that β1 = 0, and let t2 be the t-statistic for testing the null hypothesis that β2 = 0. What happens when you use the "one at a time" testing procedure: Reject the joint null hypothesis if either t1 or t2 exceeds 1.96 in absolute value?
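The question just posed can be answered by simulation before working through the algebra. The sketch below is my own construction, assuming independent standard normal t-statistics under the joint null; for comparison it also applies a rule based on (t1² + t2²)/2, which is what the F-statistic discussed in this section reduces to when the t-statistics are uncorrelated, with a 5% large-sample critical value of about 3.00:

```python
import random

random.seed(42)
draws = 200_000
reject_one_at_a_time = 0
reject_f_style = 0
for _ in range(draws):
    t1, t2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    if abs(t1) > 1.96 or abs(t2) > 1.96:      # the "one at a time" rule
        reject_one_at_a_time += 1
    if (t1 ** 2 + t2 ** 2) / 2.0 > 2.996:     # F-style rule at its 5% critical value
        reject_f_style += 1

size_one = reject_one_at_a_time / draws
size_f = reject_f_style / draws
print(size_one)   # about 0.0975: the true null is rejected far too often
print(size_f)     # about 0.05: the intended significance level
```

With correlated t-statistics the "one at a time" rejection rate changes with the correlation, which is one reason a purpose-built joint test statistic is needed.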
that is.226 CHAPTER 1 ~is Tests end Confidence Intervals in M ultiple Regrenion we com ro l for the percentage o f English learners.c. en. restricts the value of two of t he eoeffick·nts.~{J1 = Oandp2 = Ovs.8) where P" 13". and pj.c.Jlh. Il l: onc or more of the q reslriclions unde r H o docs not hold. (""'"'J 11\c hypothesis lilal bolll the coefficie nt on the studen t.. suppose that you are interested in test in!! t~~ joint null hypothesis in Equation (7.p". minology wc can say thatt hc null hypot hesis in Equation (7.tric. Let I I be the 1_!>t 9' tistic fo r lesting the null hypotbesis tbat 131 = O.z +.o' P". refer to differe nt regression coe fficil! ll ts. . S pecifically.tric· lions on th e regrt!ssion coefficients. not hold.o. . hypothe.
What is the size of the "one at a time" testing procedure; that is, what is the probability that you will reject the null hypothesis when it is true? More than 5%! Because this question involves the two random variables t1 and t2, answering it requires characterizing the joint sampling distribution of t1 and t2. In large samples, β̂1 and β̂2 have a joint normal distribution, so under the joint null hypothesis the t-statistics t1 and t2 have a bivariate normal distribution, where each t-statistic has mean equal to 0 and variance equal to 1.

First consider the special case in which the t-statistics are uncorrelated and thus are independent. In this special case we can calculate the rejection probability of this method exactly. The null is not rejected only if both |t1| ≤ 1.96 and |t2| ≤ 1.96. Because the t-statistics are independent, Pr(|t1| ≤ 1.96 and |t2| ≤ 1.96) = Pr(|t1| ≤ 1.96) × Pr(|t2| ≤ 1.96) = 0.95² = 0.9025 = 90.25%. So the probability of rejecting the null hypothesis when it is true is 1 − 0.95² = 9.75%. This "one at a time" method rejects the null too often because it gives you too many chances: If you fail to reject using the first t-statistic, you get to try again using the second.

If the regressors are correlated, the situation is even more complicated. The size of the "one at a time" procedure depends on the value of the correlation between the regressors. Because the "one at a time" testing approach has the wrong size (that is, its rejection rate under the null hypothesis does not equal the desired significance level), a new approach is needed.

One approach is to modify the "one at a time" method so that it uses different critical values that ensure that its size equals its significance level. This method, called the Bonferroni method, is described in Appendix 7.1. The advantage of the Bonferroni method is that it applies very generally. Its disadvantage is that it can have low power; it frequently fails to reject the null hypothesis when in fact the alternative hypothesis is true, especially when the regressors are highly correlated.

Fortunately, there is another approach to testing joint hypotheses that is more powerful. That approach is based on the F-statistic.
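The exact 9.75% size of the "one at a time" procedure with independent t-statistics can be checked numerically. The sketch below (standard-library Python only; the number of draws and the seed are arbitrary choices, not from the text) computes the exact probability and confirms it with a small Monte Carlo simulation.

```python
import random

# Exact size of the "one at a time" test with independent t-statistics:
# the joint null survives only if both |t1| <= 1.96 and |t2| <= 1.96.
exact_size = 1 - 0.95**2
print(f"exact size = {exact_size:.4f}")  # 0.0975, i.e., 9.75%

# Monte Carlo check: under the null, in large samples, the t-statistics
# are (here, independent) standard normal random variables.
random.seed(0)
n_draws = 200_000
rejections = 0
for _ in range(n_draws):
    t1 = random.gauss(0.0, 1.0)
    t2 = random.gauss(0.0, 1.0)
    if abs(t1) > 1.96 or abs(t2) > 1.96:
        rejections += 1
print(f"simulated size = {rejections / n_draws:.4f}")  # close to 0.0975
```

The simulation makes the "too many chances" intuition concrete: each extra t-test adds another opportunity to reject by chance.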
The F-Statistic

The F-statistic is used to test joint hypotheses about regression coefficients. We first discuss the case of two restrictions, then turn to the general case of q restrictions.

The F-statistic with q = 2 restrictions. When the joint null hypothesis has the two restrictions that β1 = 0 and β2 = 0, the F-statistic combines the two t-statistics t1 and t2 using the formula

F = (1/2) × (t1² + t2² − 2ρ̂_{t1,t2} t1 t2) / (1 − ρ̂²_{t1,t2}),   (7.9)

where ρ̂_{t1,t2} is an estimator of the correlation between the two t-statistics.

To understand the F-statistic in Equation (7.9), first suppose that we know that the t-statistics are uncorrelated, so we can drop the terms involving ρ̂_{t1,t2}. If so, Equation (7.9) simplifies and F = (1/2)(t1² + t2²); that is, the F-statistic is the average of the squared t-statistics. Under the null hypothesis, t1 and t2 are independent standard normal random variables (because the t-statistics are uncorrelated by assumption), so under the null hypothesis F has an F2,∞ distribution (Section 2.4). Under the alternative hypothesis that either β1 is nonzero or β2 is nonzero (or both), then either t1² or t2² (or both) will be large, leading the test to reject the null hypothesis.

In general the t-statistics are correlated, and the formula for the F-statistic in Equation (7.9) adjusts for this correlation. This adjustment is made so that, under the null hypothesis, the F-statistic has an F2,∞ distribution in large samples whether or not the t-statistics are correlated.

The F-statistic with q restrictions. The formula for the heteroskedasticity-robust F-statistic testing the q restrictions of the joint null hypothesis in Equation (7.8) is given in Section 18.3. This formula is incorporated into modern regression software, making the F-statistic easy to compute in practice. Under the null hypothesis, the F-statistic has a sampling distribution that, in large samples, is given by the Fq,∞ distribution. That is, in large samples, under the null hypothesis,

the F-statistic is distributed Fq,∞.   (7.10)

Thus the critical values for the F-statistic can be obtained from the tables of the Fq,∞ distribution in Appendix Table 4 for the appropriate value of q and the desired significance level.

Computing the heteroskedasticity-robust F-statistic in statistical software. If the F-statistic is computed using the general heteroskedasticity-robust formula, its large-n distribution under the null hypothesis is Fq,∞ regardless of whether the errors are homoskedastic or heteroskedastic. As discussed in Section 5.4, for historical reasons most statistical software computes homoskedasticity-only standard errors by default. Consequently, in some software packages you must select a "robust" option so that the F-statistic is computed using heteroskedasticity-robust standard errors (and, more generally, a heteroskedasticity-robust estimate of the "covariance matrix"). The homoskedasticity-only version of the F-statistic is discussed at the end of this section.
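Equation (7.9) is simple enough to code directly. The sketch below is a minimal standard-library implementation; the t-statistic values and correlation in the usage lines are made-up numbers for illustration, not taken from the text.

```python
def f_stat_two_restrictions(t1: float, t2: float, rho: float) -> float:
    """F-statistic of Equation (7.9) for a joint null with q = 2 restrictions;
    rho is the estimated correlation between the two t-statistics."""
    return 0.5 * (t1**2 + t2**2 - 2 * rho * t1 * t2) / (1 - rho**2)

# With uncorrelated t-statistics the formula collapses to the average of
# the squared t-statistics, F = (t1^2 + t2^2) / 2:
print(f_stat_two_restrictions(2.0, 1.0, 0.0))  # → 2.5

# Correlation between the t-statistics changes the value of F:
print(f_stat_two_restrictions(2.0, 1.0, 0.5))  # → 2.0
```

Note how the adjustment terms involving rho drop out when the t-statistics are uncorrelated, exactly as in the discussion above.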
Computing the p-value using the F-statistic. The p-value of the F-statistic can be computed using the large-sample Fq,∞ approximation to its distribution. Let F^act denote the value of the F-statistic actually computed. Because the F-statistic has a large-sample Fq,∞ distribution under the null hypothesis, the p-value is

p-value = Pr[Fq,∞ > F^act].   (7.11)

The p-value in Equation (7.11) can be evaluated using a table of the Fq,∞ distribution (or, alternatively, a table of the χ²q distribution, because a χ²q-distributed random variable is q times an Fq,∞-distributed random variable). Alternatively, the p-value can be evaluated using a computer, because formulas for the cumulative chi-squared and F distributions have been incorporated into most modern statistical software.

The "overall" regression F-statistic. The "overall" regression F-statistic tests the joint hypothesis that all the slope coefficients are zero. That is, the null and alternative hypotheses are

H0: β1 = 0, β2 = 0, ..., βk = 0  vs.  H1: βj ≠ 0, at least one j, j = 1, ..., k.   (7.12)

Under this null hypothesis, none of the regressors explains any of the variation in Yi, although the intercept (which under the null hypothesis is the mean of Yi) can be nonzero. The null hypothesis in Equation (7.12) is a special case of the general null hypothesis in Equation (7.8), and the overall regression F-statistic is the F-statistic computed for the null hypothesis in Equation (7.12). In large samples, the overall regression F-statistic has an Fk,∞ distribution when the null hypothesis is true.

The F-statistic when q = 1. When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic.

Application to Test Scores and the Student-Teacher Ratio

We are now able to test the null hypothesis that the coefficients on both the student-teacher ratio and expenditures per pupil are zero, against the alternative that at least one coefficient is nonzero, controlling for the percentage of English learners in the district.

To test this hypothesis, we need to compute the heteroskedasticity-robust F-statistic of the test that β1 = 0 and β2 = 0 using the regression of TestScore on STR, Expn, and PctEL reported in Equation (7.6).
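Equation (7.11) is easy to evaluate numerically. Because q × F is distributed χ²q in large samples, and the χ² survival function with 2 degrees of freedom has the closed form exp(−x/2), the p-value of a two-restriction F-statistic is simply exp(−F). The sketch below (standard library only) uses this identity; 5.43 is the heteroskedasticity-robust F-statistic reported for the test-score application below.

```python
import math

def pvalue_f_two_restrictions(f_act: float) -> float:
    """p-value of Equation (7.11) for q = 2 restrictions.

    In large samples q*F is distributed chi-squared(q); for q = 2 the
    chi-squared survival function is exp(-x/2), so
    Pr[F_{2,inf} > f] = exp(-(2*f)/2) = exp(-f).
    """
    return math.exp(-f_act)

# Sanity checks against the tabulated critical values cited in the text:
print(round(pvalue_f_two_restrictions(3.00), 3))  # 5% critical value → 0.05
print(round(pvalue_f_two_restrictions(4.61), 3))  # 1% critical value → 0.01
# The application's F-statistic of 5.43 gives a p-value well below 1%:
print(round(pvalue_f_two_restrictions(5.43), 4))  # → 0.0044
```

The first two lines confirm that the 5% and 1% critical values of the F2,∞ distribution are 3.00 and 4.61, as stated in the text.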
This F-statistic is 5.43. The 5% critical value of the F2,∞ distribution is 3.00, and the 1% critical value is 4.61 (Appendix Table 4). The value of the F-statistic computed from the data, 5.43, exceeds 4.61, so the null hypothesis is rejected at the 1% level. It is very unlikely that we would have drawn a sample that produced an F-statistic as large as 5.43 if the null hypothesis really were true (the p-value is 0.005). Based on the evidence in Equation (7.6) as summarized in this F-statistic, we can reject the taxpayer's hypothesis that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores (holding constant the percentage of English learners).

The Homoskedasticity-Only F-Statistic

One way to restate the question addressed by the F-statistic is to ask whether relaxing the q restrictions that constitute the null hypothesis improves the fit of the regression by enough that this improvement is unlikely to be the result merely of random sampling variation if the null hypothesis is true. This restatement suggests that there is a link between the F-statistic and the regression R²: A large F-statistic should, it seems, be associated with a substantial increase in the R². In fact, if the error term is homoskedastic, this intuition has an exact mathematical expression. That is, if the error term is homoskedastic, the F-statistic can be written in terms of the improvement in the fit of the regression as measured either by the sum of squared residuals or by the regression R². The resulting F-statistic is referred to as the homoskedasticity-only F-statistic, because it is valid only if the error term is homoskedastic. In contrast, the heteroskedasticity-robust F-statistic computed using the formula in Section 18.3 is valid whether the error term is homoskedastic or heteroskedastic. Despite this significant limitation of the homoskedasticity-only F-statistic, its simple formula sheds light on what the F-statistic is doing. In addition, the simple formula can be computed using standard regression output, such as might be reported in a table that includes regression R²'s but not F-statistics.

The homoskedasticity-only F-statistic is computed using a simple formula based on the sum of squared residuals from two regressions. In the first regression, called the restricted regression, the null hypothesis is forced to be true. When the null hypothesis is of the type in Equation (7.8), where all the hypothesized values are zero, the restricted regression is the regression in which those coefficients are set to zero; that is, the relevant regressors are excluded from the regression. In the second regression, called the unrestricted regression, the alternative hypothesis is allowed to be true. If the sum of squared residuals is sufficiently smaller in the unrestricted than the restricted regression, then the test rejects the null hypothesis.
The homoskedasticity-only F-statistic is given by the formula

F = [(SSR_restricted − SSR_unrestricted) / q] / [SSR_unrestricted / (n − k_unrestricted − 1)],   (7.13)

where SSR_restricted is the sum of squared residuals from the restricted regression, SSR_unrestricted is the sum of squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis, and k_unrestricted is the number of regressors in the unrestricted regression. An alternative, equivalent formula for the homoskedasticity-only F-statistic is based on the R² of the two regressions:

F = [(R²_unrestricted − R²_restricted) / q] / [(1 − R²_unrestricted) / (n − k_unrestricted − 1)].   (7.14)

If the errors are homoskedastic, then the difference between the homoskedasticity-only F-statistic computed using Equation (7.13) or (7.14) and the heteroskedasticity-robust F-statistic vanishes as the sample size n increases. Thus, if the errors are homoskedastic, the sampling distribution of the rule-of-thumb F-statistic under the null hypothesis is, in large samples, Fq,∞.

These rule-of-thumb formulas are easy to compute and have an intuitive interpretation in terms of how well the unrestricted and restricted regressions fit the data. Unfortunately, they are valid only if the errors are homoskedastic. Because homoskedasticity is a special case that cannot be counted on in applications with economic data, or more generally with data sets typically found in the social sciences, in practice the homoskedasticity-only F-statistic is not a satisfactory substitute for the heteroskedasticity-robust F-statistic.

Using the homoskedasticity-only F-statistic when n is small. If the errors are homoskedastic and are i.i.d. normally distributed, then the homoskedasticity-only F-statistic defined in Equations (7.13) and (7.14) has an Fq,n−k_unrestricted−1 distribution under the null hypothesis. Critical values for this distribution, which depend on both q and n − k_unrestricted − 1, are given in Appendix Table 5. As discussed in Section 2.4, the Fq,n−k−1 distribution converges to the Fq,∞ distribution as n increases; for large sample sizes, the differences between the two distributions are negligible. For small samples, however, the two sets of critical values differ.

Application to Test Scores and the Student-Teacher Ratio. To test the null hypothesis that the population coefficients on STR and Expn are zero, controlling for PctEL, we need to compute the SSR (or R²) for the restricted and unrestricted regression. The unrestricted regression has the regressors STR, Expn, and PctEL, and is given in Equation (7.6); its R² is 0.4366; that is, R²_unrestricted = 0.4366.
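Equation (7.14) amounts to a one-line function. The sketch below is standard-library Python; the R² values in the usage example are made-up numbers for illustration (a drop in R² from 0.40 to 0.36 after imposing two restrictions), not the test-score figures.

```python
def homoskedasticity_only_f(r2_unrestricted: float, r2_restricted: float,
                            q: int, n: int, k_unrestricted: int) -> float:
    """Rule-of-thumb F-statistic of Equation (7.14); valid only if the
    errors are homoskedastic. q is the number of restrictions, n the
    sample size, and k_unrestricted the number of regressors in the
    unrestricted regression."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# Illustrative (hypothetical) values: dropping two regressors lowers R^2
# from 0.40 to 0.36 in a sample of 104 observations with k = 3 regressors.
print(round(homoskedasticity_only_f(0.40, 0.36, q=2, n=104, k_unrestricted=3), 2))  # → 3.33
```

Because the formula uses only q, n, k, and the two R²'s, it can be applied to published regression tables that report R² but not F-statistics, as noted above.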
The restricted regression imposes the joint null hypothesis that the true coefficients on STR and Expn are zero; that is, under the null hypothesis STR and Expn do not enter the population regression, although PctEL does (the null hypothesis does not restrict the coefficient on PctEL). The restricted regression, estimated by OLS, is

TestScore^ = 664.7 − 0.671 × PctEL,  R² = 0.4149,   (7.15)
             (1.0)   (0.032)

so R²_restricted = 0.4149. The number of restrictions is q = 2, the number of observations is n = 420, and the number of regressors in the unrestricted regression is k = 3. The homoskedasticity-only F-statistic, computed using Equation (7.14), is

F = [(0.4366 − 0.4149)/2] / [(1 − 0.4366)/(420 − 3 − 1)] = 8.01.

Because 8.01 exceeds the 1% critical value of 4.61, the hypothesis is rejected at the 1% level using this rule-of-thumb approach.

This example illustrates the advantages and disadvantages of the homoskedasticity-only F-statistic. Its advantage is that it can be computed using a calculator. Its disadvantage is that the values of the homoskedasticity-only and heteroskedasticity-robust F-statistics can be very different: The heteroskedasticity-robust F-statistic testing this joint hypothesis is 5.43, quite different from the less reliable homoskedasticity-only rule-of-thumb value of 8.01.
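The rule-of-thumb arithmetic for the test-score application (unrestricted R² = 0.4366, restricted R² = 0.4149, q = 2, n = 420, k = 3) can be verified in one line:

```python
# Homoskedasticity-only F-statistic for the test-score application,
# computed directly from Equation (7.14).
f_rule_of_thumb = ((0.4366 - 0.4149) / 2) / ((1 - 0.4366) / (420 - 3 - 1))
print(round(f_rule_of_thumb, 2))  # → 8.01, matching the text
```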
7.3 Testing Single Restrictions Involving Multiple Coefficients

Sometimes economic theory suggests a single restriction that involves two or more regression coefficients. For example, theory might suggest a null hypothesis of the form β1 = β2; that is, the effects of the first and second regressor are the same. In this case, the task is to test this null hypothesis against the alternative that the two coefficients differ:

H0: β1 = β2  vs.  H1: β1 ≠ β2.   (7.16)

This null hypothesis has a single restriction, so q = 1, but that restriction involves multiple coefficients (β1 and β2). We need to modify the methods presented so far to test this hypothesis. There are two approaches; which one will be easiest depends on your software.
Approach #1: Test the restriction directly. Some statistical packages have a specialized command designed to test restrictions like Equation (7.16), and the result is an F-statistic that, because q = 1, has an F1,∞ distribution under the null hypothesis. (Recall from Section 2.4 that the square of a standard normal random variable has an F1,∞ distribution, so the 95% percentile of the F1,∞ distribution is 1.96² = 3.84.)

Approach #2: Transform the regression. If your statistical package cannot test the restriction directly, the hypothesis in Equation (7.16) can be tested using a trick in which the original regression equation is rewritten to turn the restriction in Equation (7.16) into a restriction on a single regression coefficient. To be concrete, suppose there are only two regressors, X1i and X2i, in the regression, so the population regression has the form

Yi = β0 + β1X1i + β2X2i + ui.   (7.17)

Here is the trick: By subtracting and adding β2X1i, we have that β1X1i + β2X2i = β1X1i − β2X1i + β2X1i + β2X2i = (β1 − β2)X1i + β2(X1i + X2i) = γ1X1i + β2Wi, where γ1 = β1 − β2 and Wi = X1i + X2i. Thus, the population regression in Equation (7.17) can be rewritten as

Yi = β0 + γ1X1i + β2Wi + ui.   (7.18)

Because the coefficient γ1 in this equation is γ1 = β1 − β2, under the null hypothesis in Equation (7.16), γ1 = 0, while under the alternative, γ1 ≠ 0. Thus, by turning Equation (7.17) into Equation (7.18), we have turned a restriction on two regression coefficients into a restriction on a single regression coefficient.

Because the restriction now involves the single coefficient γ1, the null hypothesis in Equation (7.16) can be tested using the t-statistic method of Section 7.1. In practice, this is done by first constructing the new regressor Wi as the sum of the two original regressors, then estimating the regression of Yi on X1i and Wi. A 95% confidence interval for the difference in the coefficients β1 − β2 can be calculated as γ̂1 ± 1.96 SE(γ̂1).

The two methods (Approaches #1 and #2) are equivalent, in the sense that the F-statistic from the first method equals the square of the t-statistic from the second method. This method can be extended to other restrictions on regression equations using the same trick (see Exercise 7.9).
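The transformation trick is easy to verify numerically: because the regression of Y on (X1, W) with W = X1 + X2 is a nonsingular linear reparameterization of the regression of Y on (X1, X2), the OLS coefficient on X1 in the transformed regression equals β̂1 − β̂2 exactly. The sketch below (standard-library Python, simulated data; all coefficient values are illustrative) fits both regressions by solving the normal equations.

```python
import random

def ols(y, columns):
    """OLS via the normal equations (X'X) b = X'y; `columns` are the
    regressor columns, and an intercept column is added automatically."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Solve the k-by-k system by Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= m * A[col][c]
            b[r] -= m * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef  # [intercept, slope_1, slope_2]

random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.5 * x1[i] + 0.5 * x2[i] + random.gauss(0, 1) for i in range(n)]

_, b1, b2 = ols(y, [x1, x2])            # Equation (7.17): Y on X1, X2
w = [x1[i] + x2[i] for i in range(n)]
_, g1, _ = ols(y, [x1, w])              # Equation (7.18): Y on X1, W
print(abs(g1 - (b1 - b2)) < 1e-9)       # gamma_1 equals beta_1 - beta_2
```

Running the t-test on γ̂1 in the second regression is therefore exactly a test of β1 = β2 in the first.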
Extension to q > 1. In general it is possible to have q restrictions under the null hypothesis in which some or all of these restrictions involve multiple coefficients. The F-statistic of Section 7.2 extends to this type of joint hypothesis. The F-statistic can be computed by either of the two methods just discussed for q = 1. Precisely how best to do this in practice depends on the specific regression software being used.

7.4 Confidence Sets for Multiple Coefficients

This section explains how to construct a confidence set for two or more regression coefficients. The method is conceptually similar to the method in Section 7.1 for constructing a confidence set for a single coefficient using the t-statistic, except that the confidence set for multiple coefficients is based on the F-statistic.

A 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples. Thus, a confidence set is the generalization to two or more coefficients of a confidence interval for a single coefficient.

Recall that a 95% confidence interval is computed by finding the set of values of the coefficient that are not rejected using a t-statistic at the 5% significance level. This approach can be extended to the case of multiple coefficients. To make this concrete, suppose you are interested in constructing a confidence set for two coefficients, β1 and β2. Section 7.2 showed how to use the F-statistic to test a joint null hypothesis that β1 = β1,0 and β2 = β2,0. Suppose you were to test every possible value of β1,0 and β2,0 at the 5% level. For each pair of candidates (β1,0, β2,0), you construct the F-statistic and reject it if it exceeds the 5% critical value of 3.00. Because the test has a 5% significance level, the true population values of β1 and β2 will not be rejected in 95% of all samples. Thus, the set of values not rejected at the 5% level by this F-statistic constitutes a 95% confidence set for β1 and β2.

Although this method of trying all possible values of β1,0 and β2,0 works in theory, in practice it is much simpler to use an explicit formula for the confidence set. This formula for the confidence set for an arbitrary number of coefficients is based on the formula for the F-statistic. When there are two coefficients, the resulting confidence sets are ellipses.

As an illustration, Figure 7.1 shows a 95% confidence set (confidence ellipse) for the coefficients on the student-teacher ratio and expenditure per pupil, holding constant the percentage of English learners, based on the estimated regression in Equation (7.6). This ellipse does not include the point (0, 0). This means that the null hypothesis that these two coefficients are both zero is rejected using the F-statistic at the 5% significance level, which we already knew from Section 7.2.
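The "test every candidate pair" description can be sketched directly: a point (β1,0, β2,0) belongs to the 95% confidence set exactly when the q = 2 F-statistic of Equation (7.9), computed from t1 = (β̂1 − β1,0)/SE(β̂1) and t2 = (β̂2 − β2,0)/SE(β̂2), does not exceed the 5% critical value of 3.00. The estimates, standard errors, and t-statistic correlation below are hypothetical numbers for illustration, not the ones from the test-score regression.

```python
def in_confidence_set(beta1_0, beta2_0, b1, se1, b2, se2, rho, crit=3.00):
    """True if (beta1_0, beta2_0) is not rejected by the q = 2 F-test;
    crit = 3.00 is the 5% critical value of the F_{2,inf} distribution,
    and rho is the estimated correlation between the two t-statistics."""
    t1 = (b1 - beta1_0) / se1
    t2 = (b2 - beta2_0) / se2
    f = 0.5 * (t1**2 + t2**2 - 2 * rho * t1 * t2) / (1 - rho**2)
    return f <= crit

# Hypothetical estimates: b1 = -0.60 (SE 0.30), b2 = 3.0 (SE 1.5), rho = 0.4.
print(in_confidence_set(-0.60, 3.0, -0.60, 0.30, 3.0, 1.5, 0.4))  # → True (the point estimate)
print(in_confidence_set(0.0, 0.0, -0.60, 0.30, 3.0, 1.5, 0.4))    # → False ((0, 0) is rejected)
```

Sweeping this membership check over a grid of candidate pairs traces out the confidence ellipse described in the text.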
"Ine stan ing poin t for choosing a regression specification is thinking through the po~sible sources of omitted " a riable bias." 1. statistic. . . To make idence se t for tWO istic to test a joint o test eve ry possi Th e co nfi dence e llipse is a fal sausage with the long part of the sa usage oriented in the lowerIeflfupperrigh t directio n.1 for except .. the resull iM (confidence e llipS. .5 Coeffici ent o n STR ({JI) : ng the set of vat e 5% significance fficie nls. 7 (. The F· .7.. The elIip~ contoin$ the poir~ of vol· . hat contains the ~ drawn samples.1. egression sofl· Model Specifico~on lor Multiple Regression 235 FIGURE 7.\ \ .:t. which in tum arises because Ihe corn: la tion b ~t wee n the regressors SiR and Exp ll is negative (schools tha i spend mo re pe r pupil te nd to have fe wer stude nts per teache r).sloli~tic 01 <Je~ the 5% significance leoeL . :t fo r /31an d 132 ' Id i32. 1 95% Confidence Set for Coefficients on STR and Expn from Equation (7.'aria blcs to include in multi ple regressio nthat is. m values o f {3 \ and ues not rejected al 7. .S JOS under the lultipl e coeffi.ts iTIore re gression I + (11 1.5 0.8.1l 0 .:s of fit such as the Rl o r "H2.U 0. But do not des pai r. ~7) .) and Expn iJ311 j~ on ellipse.00. (iJ2) 95% Confidence Set of t31 ond 132thot COnnot be relected u ~i ng the F. 3. othesis.0 1. jidales (J3 l. . because some useful guidelines lire available.5 I. " 2. he estimated rcgre~0.11 . ' me re ason fo r this o rien ta tion is th : H the estim ated corre latio n between P I a nd {3: is positi ve. lho)· :Iie"l value 00. th e problem of choosi ng a regressio n specifica tioncan be quite challen ging. and no single rule applies in all si tu atio ns.S Model Specification for Multiple Regression The jo b of detemli ning "hich .o. cie nts of a eonfi· ~stat i st ic.sed for q = 1.0). . :oefficicn ts is ba~l:d :ients.0 wo rks in tM' the con[ide ncl: s.:l ~ndi\ure per pupil. 
This mea nS tllJI C is rejected using tll ~ " from Section • .29.11:t) = (0. Section 7.6) Coeffici elll on El:p" The 95% confidence set lor the coeff icients on STR (. It is important to rely on your expert knowledge of the empirical problem and to focus o n o btain ing an unbiased est imate of the causal effect of interest:do no t rely St)lely on purely statistical meilsun.
Omitted Variable Bias in Multiple Regression

The OLS estimators of the coefficients in multiple regression will have omitted variable bias if an omitted determinant of Yi is correlated with at least one of the regressors. For example, students from affluent families often have more learning opportunities than do their less affluent peers, which could lead to better test scores. Moreover, if the district is a wealthy one, then the schools will tend to have larger budgets and lower student-teacher ratios. If so, the affluence of the students and the student-teacher ratio would be negatively correlated, and the OLS estimate of the coefficient on the student-teacher ratio would pick up the effect of average district income, even after controlling for the percentage of English learners. In short, omitting the students' economic background could lead to omitted variable bias in the regression of test scores on the student-teacher ratio and the percentage of English learners.

The general conditions for omitted variable bias in multiple regression are similar to those for a single regressor: If an omitted variable is a determinant of Yi, and if it is correlated with at least one of the regressors, then the OLS estimators will have omitted variable bias. The two conditions for omitted variable bias in multiple regression are summarized in Key Concept 7.3.

At a mathematical level, if the two conditions for omitted variable bias are satisfied, then at least one of the regressors is correlated with the error term. This means that the conditional expectation of ui given X1i, ..., Xki is nonzero, so that the first least squares assumption is violated. As a result, the omitted variable bias persists even if the sample size is large; that is, omitted variable bias implies that the OLS estimators are inconsistent. Moreover, because the OLS estimators are correlated, in general the OLS estimators of all the coefficients will be biased.

Model Specification in Theory and in Practice

In theory, when data are available on the omitted variable, the solution to omitted variable bias is to include the omitted variable in the regression. In practice, however, deciding whether to include a particular variable can be difficult and requires judgment.

Our approach to the challenge of potential omitted variable bias is twofold. First, a core or base set of regressors should be chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected; the regression using this base set of regressors is sometimes referred to as a base specification. This base specification should contain the variables of primary interest and the control variables suggested by expert judgment and economic theory.
KEY CONCEPT 7.3: OMITTED VARIABLE BIAS IN MULTIPLE REGRESSION

Omitted variable bias is the bias in the OLS estimator that arises when one or more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true:

1. At least one of the included regressors must be correlated with the omitted variable.
2. The omitted variable must be a determinant of the dependent variable, Y.

Expert judgment and economic theory are rarely decisive, however, and often the variables suggested by economic theory are not the ones on which you have data. Therefore the next step is to develop a list of candidate alternative specifications, that is, alternative sets of regressors. If the estimates of the coefficients of interest are numerically similar across the alternative specifications, then this provides evidence that the estimates from your base specification are reliable. If, on the other hand, the estimates of the coefficients of interest change substantially across specifications, this often provides evidence that the original specification had omitted variable bias. We elaborate on this approach to model specification in Section 9.2 after studying some tools for specifying regressions.

Interpreting the R² and the Adjusted R² in Practice

An R² or an R̄² near 1 means that the regressors are good at predicting the values of the dependent variable in the sample, and an R² or an R̄² near 0 means they are not. This makes these statistics useful summaries of the predictive ability of the regression. However, it is easy to read more into them than they deserve. There are four potential pitfalls to guard against when using the R² or R̄²:

1. An increase in the R² or R̄² does not necessarily mean that an added variable is statistically significant. The R² increases whenever you add a regressor, whether or not it is statistically significant. The R̄² does not always increase, but if it does, this does not necessarily mean that the coefficient on that added regressor is statistically significant. To ascertain whether an added variable is statistically significant, you need to perform a hypothesis test using the t-statistic.
111us the regression of test scores on parking lot area per pupil could have a high R2 and R2. Cun verse ly. economiC theory and the nature of the ! mhstantive questions being addressed . An incl uded variab le is . Pa rki ng lot are{l is correlated with the studentteacher ratio.nors. You have chosen the most appropriate set ot" regressors. Decisions about the regressors must weigh issues of omitted vari able bi <ls.but the relations hip is not causal (try telling the s uperintendent that the way to increase test scoreS is to increase parking space!).4 Tht: Rl and R2 tell YOEI whether the regressors are good at predicting. 4. The re is omilled variable bias: or 4. then the regressors produce good predictions of the dependent variable io that s<lmplc. data quality. A high R2 or li. The regressors are a true cause of the movements in the dependent Variable: 3. The R2 and R2 do NOT tell ). Non e of . mean that tlie regressors are a (rue cause oj the dependent variable. most importantly. Recall (he l10t 2.1. a moderate }<2.1 does discussion of Section 6. wi th whether the school is in a suburb or a city. A high HZ or /i 2 does not mean there is no omirted variable bias. lIor does a low Rl or lil necessarily mean you have an il/ap' propriate set of regressors.statistically significant: 2. which concerned omitted variable bias in the re gre ~ sion of test scores on the student. the opposite i ~ true. The R2 of the regression ne\ ~r came up because it played no logical role in this disc ussion. The question of what const itutes the right se t (If regressors in multiple regression is difficult and we return to II throughout this textbOok .011 whether: 1. or "explain_ iog. If the R2 (or l?2) is nearly 0. and possibly wi th district income all things that arc correla ted with test scores. a low R2does not imply that there necessarily is omitted variable bi. 
da ta avail ability.lS.238 CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regre~sion WHAT THEY TEll AND WHAT THEY DON 'T R2 AND 7?2: You ~~ 7. A high R l or liz does not necessarily mean you have (he mosr app ropria rt sel oJregre. O mitted variable bias can occur m regressions with a low R2." the v(llues of the dependent varia hie in the sample of dat a on hand . If the R~ (or 7?2) is nearly 1. or a high R2 . and. 1.Ieacher ra tio. Imagine regressing test scores against park ing lot area per pupil. in the sense t hat the variance afth c OLS residual is small compared to the variance of the dependent varia hie.
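The parking-lot point, that a regression can fit well without being causal, is easy to reproduce in a small simulation. The sketch below uses invented numbers (none come from the California data): the outcome depends on an unobserved variable, and the regressor is merely a proxy correlated with that variable, with zero causal effect of its own. The fitted slope is far from zero and the R² is high anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
income = rng.normal(0.0, 1.0, n)                   # unobserved determinant of scores
parking = 0.9 * income + rng.normal(0.0, 0.44, n)  # proxy: correlated with income,
                                                   # zero causal effect on scores
score = 650.0 + 10.0 * income + rng.normal(0.0, 3.0, n)

# OLS of score on parking alone.
X = np.column_stack([np.ones(n), parking])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
resid = score - X @ beta
r2 = 1.0 - resid.var() / score.var()

print(round(beta[1], 1), round(r2, 2))  # slope near 9 and R-squared near 0.74,
                                        # though the true causal effect is 0
```

The high R² reflects only that the proxy predicts the outcome well in sample; it says nothing about causality or omitted variable bias.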
None of these questions can be answered simply by having a high (or low) regression R² or R̄². These points are summarized in the accompanying Key Concept box.

7.6 Analysis of the Test Score Data Set

This section presents an analysis of the effect on test scores of the student-teacher ratio using the California data set. Our primary purpose is to provide an example in which multiple regression analysis is used to mitigate omitted variable bias. Our secondary purpose is to demonstrate how to use a table to summarize regression results.

Many factors potentially affect the average test score in a district. Some of the factors that could affect test scores are correlated with the student-teacher ratio, so omitting them from the regression will result in omitted variable bias. If data are available on these omitted variables, the solution to this problem is to include them as additional regressors in the multiple regression. When we do this, the coefficient on the student-teacher ratio is the effect of a change in the student-teacher ratio, holding constant these other factors.

Here we consider three variables that control for background characteristics of the students that could affect test scores. One of these control variables is the one we have used previously, the fraction of students who are still learning English. The two other variables are new and control for the economic background of the students. There is no perfect measure of economic background in the data set, so instead we use two imperfect indicators of low income in the district. The first new variable is the percentage of students who are eligible for receiving a subsidized or free lunch at school. Students are eligible for this program if their family income is less than a certain threshold (approximately 150% of the poverty line). The second new variable is the percentage of students in the district whose families qualify for a California income assistance program. Families are eligible for this income assistance program depending in part on their family income, but the threshold is lower (stricter) than the threshold for the subsidized lunch program. These two variables thus measure the fraction of economically disadvantaged children in the district; although they are related, they are not perfectly correlated (their correlation coefficient is 0.74). Scatterplots of test scores and these variables are presented in Figure 7.2. Each of these variables exhibits a negative correlation with test scores: the correlation between test scores and the percentage of English learners is -0.64, between test scores and the percentage eligible for a subsidized lunch is -0.87, and between test scores and the percentage qualifying for income assistance is -0.63.

Discussion of the base and alternative specifications. This analysis focuses on estimating the effect on test scores of a change in the student-teacher ratio, holding constant student characteristics that the superintendent cannot control. Although theory suggests that economic background could be an important omitted factor, theory and expert judgment do not really help us decide which of the two economic background variables (percentage eligible for a subsidized lunch or percentage eligible for income assistance) is a better measure of background. For our base specification, we choose the percentage eligible for a subsidized lunch as the economic background variable, but we consider an alternative specification that includes the other variable as well.
What scale should we use for the regressors? A practical question that arises in regression analysis is what scale you should use for the regressors. In the test score application, the natural unit for the dependent variable is the score of the test itself. In the regression of TestScore on STR and PctEL reported in Equation (7.5), the units of the variables are percent, so the maximum possible range of the data is 0 to 100. Alternatively, we could have defined these variables to be a decimal fraction rather than a percent; for example, PctEL could be replaced by the fraction of English learners, FracEL (= PctEL/100), which would range between 0 and 1 instead of between 0 and 100. More generally, in regression analysis some decision usually needs to be made about the scale of both the dependent and independent variables. How, then, should you choose the scale, or units, of the variables?

The general answer is to make the regression results easy to read and to interpret. In the regression of TestScore on STR and PctEL, the coefficient on PctEL is -0.650. If instead the regressor had been FracEL, the regression would have had an identical R² and SER, but the coefficient on FracEL would have been -65.0. In the specification with PctEL, the coefficient is the predicted change in test scores for a one-percentage-point increase in English learners, holding STR constant; in the specification with FracEL, the coefficient is the predicted change in test scores for an increase by 1 in the fraction of English learners, that is, a 100-percentage-point increase, holding STR constant. Although these two specifications are mathematically equivalent, for the purposes of interpretation the one with PctEL seems, to us, more natural.

Another consideration when deciding on a scale is to choose the units of the regressors so that the resulting regression coefficients are easy to read. For example, if a regressor is measured in dollars and has a coefficient of 0.00000356, it is easier to read if the regressor is converted to millions of dollars, in which case the coefficient is 3.56.
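The PctEL-versus-FracEL point can be checked mechanically: dividing a regressor by 100 multiplies its coefficient by 100 and leaves the SER and R² unchanged. A minimal sketch with simulated data (the numbers below are made up; only the invariance is the point):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420
pct_el = rng.uniform(0.0, 80.0, n)                       # hypothetical percent English learners
score = 700.0 - 0.65 * pct_el + rng.normal(0.0, 9.0, n)  # hypothetical test scores

def ols(y, x):
    """Slope, SER, and R^2 from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    ssr = e @ e
    tss = (y - y.mean()) @ (y - y.mean())
    return b[1], np.sqrt(ssr / (len(y) - 2)), 1.0 - ssr / tss

slope_pct, ser_pct, r2_pct = ols(score, pct_el)
slope_frac, ser_frac, r2_frac = ols(score, pct_el / 100.0)  # FracEL = PctEL/100

print(round(slope_frac / slope_pct))                        # 100: coefficient scales by 100
print(bool(np.isclose(ser_pct, ser_frac)),
      bool(np.isclose(r2_pct, r2_frac)))                    # True True: fit is unchanged
```

Rescaling a regressor is a pure change of units; the fitted values, residuals, and all fit statistics are identical, which is why the choice of scale is purely about readability.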
FIGURE 7.2 Scatterplots of Test Scores vs. Three Student Characteristics

[Three scatterplots appear here: (a) test scores vs. percentage of English learners; (b) test scores vs. percentage qualifying for reduced-price lunch; (c) test scores vs. percentage qualifying for income assistance.]

The scatterplots show a negative relationship between test scores and (a) the percentage of English learners (correlation = -0.64), (b) the percentage of students qualifying for a subsidized lunch (correlation = -0.87), and (c) the percentage qualifying for income assistance (correlation = -0.63).

Tabular presentation of results. We are now faced with a communication problem. What is the best way to show the results from several multiple regressions that contain different subsets of the possible regressors? So far, we have presented regression results by writing out the estimated regression equations, as in Equation (7.5). This works well when there are only a few regressors and only a few equations, but with more regressors and equations this method of presentation can be confusing. A better way to communicate the results of several regressions is in a table. Table 7.1 summarizes the results of regressions of the test score on various sets of regressors.
TABLE 7.1 Results of Regressions of Test Scores on the Student-Teacher Ratio and Student Characteristic Control Variables Using California Elementary School Districts

Dependent variable: average test score in the district.

Regressor                                     (1)        (2)        (3)        (4)        (5)
Student-teacher ratio (X1)                    -2.28**    -1.10**    -1.00**    -1.31**    -1.01**
                                              (0.52)     (0.43)     (0.27)     (0.34)     (0.27)
Percent English learners (X2)                            -0.650**   -0.122**   -0.488**   -0.130**
                                                         (0.031)    (0.033)    (0.030)    (0.036)
Percent eligible for subsidized lunch (X3)                          -0.547**              -0.529**
                                                                    (0.024)               (0.038)
Percent on public income assistance (X4)                                       -0.790**   0.048
                                                                               (0.068)    (0.059)
Intercept                                     698.9**    686.0**    700.2**    698.0**    700.4**
                                              (10.4)     (8.7)      (5.6)      (6.9)      (5.5)

Summary Statistics
SER                                           18.58      14.46      9.08       11.65      9.08
R̄²                                            0.049      0.424      0.773      0.626      0.773
n                                             420        420        420        420        420

These regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.

Each column of Table 7.1 summarizes a separate regression. Each regression has the same dependent variable, test score, and the same 420 observations. The entries in the first five rows are the estimated regression coefficients, with their standard errors below them in parentheses. The asterisks indicate whether the t-statistic, testing the hypothesis that the relevant coefficient is zero, is significant at the 5% level (one asterisk) or the 1% level (two asterisks). The final three rows contain summary statistics for the regression (the standard error of the regression, SER, and the adjusted R², R̄²) and the sample size (which is the same for all of the regressions, 420 observations).

All the information that we have presented so far in equation format appears as a column of this table. For example, consider the regression of the test score against the student-teacher ratio, with no control variables. In equation form, this regression is

TestScore = 698.9 - 2.28 × STR, R̄² = 0.049, SER = 18.58, n = 420,
            (10.4)  (0.52)

with standard errors in parentheses.
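The coefficient and standard-error entries in such a table are enough to recover each t-statistic and confidence interval. A one-line check of that arithmetic, using the column (1) values for the student-teacher ratio from Table 7.1:

```python
# Coefficient and standard error on the student-teacher ratio, Table 7.1, column (1).
beta_hat, se = -2.28, 0.52

t = beta_hat / se                # t-statistic for H0: coefficient = 0
ci_low = beta_hat - 1.96 * se    # 95% confidence interval,
ci_high = beta_hat + 1.96 * se   # beta_hat +/- 1.96 * SE(beta_hat)

print(round(t, 2))                                 # -4.38: |t| > 2.58, so reject at 1%
print(round(ci_low, 2), round(ci_high, 2))         # -3.3 -1.26
```

This is exactly the computation described in the text: a reader with only the published table can reproduce any individual t-test or confidence interval, though not joint F-tests, which need the covariance between estimated coefficients.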
All this information appears in column (1) of Table 7.1. The estimated coefficient on the student-teacher ratio (-2.28) appears in the first row of numerical entries, and its standard error (0.52) appears in parentheses just below the estimated coefficient. The intercept (698.9) and its standard error (10.4) are given in the row labeled "Intercept." (Sometimes you will see this row labeled "constant," because, as discussed in Section 6.2, the intercept can be viewed as the coefficient on a regressor that is always equal to 1.) Similarly, the SER (18.58), the R̄² (0.049), and the sample size n (420) appear in the final rows. The blank entries in the rows of the other regressors indicate that those regressors are not included in this regression.

Although the table does not report t-statistics, these can be computed from the information provided. For example, the t-statistic testing the hypothesis that the coefficient on the student-teacher ratio in column (1) is zero is -2.28/0.52 = -4.38. This hypothesis is rejected at the 1% level, which is indicated by the double asterisk next to the estimated coefficient in the table.

Discussion of empirical results. Regressions that include the control variables measuring student characteristics are reported in columns (2)-(5) of Table 7.1.
Column (2), which reports the regression of test scores on the student-teacher ratio and the percentage of English learners, was previously stated as Equation (7.5). Column (3) presents the base specification, in which the regressors are the student-teacher ratio and two control variables, the percentage of English learners and the percentage of students eligible for a free lunch. Columns (4) and (5) present alternative specifications that examine the effect of changes in the way the economic background of the students is measured. In column (4), the percentage of students on income assistance is included as a regressor, and in column (5) both of the economic background variables are included.

These results suggest three conclusions:

1. Controlling for these student characteristics cuts the estimated effect of the student-teacher ratio on test scores approximately in half. This estimated effect is not very sensitive to which specific control variables are included in the regression. In all cases the coefficient on the student-teacher ratio remains statistically significant at the 5% level. In the four specifications with control variables, regressions (2)-(5), reducing the student-teacher ratio by one student per teacher is estimated to increase average test scores by approximately one point, holding constant student characteristics.
2. The student characteristic variables are very useful predictors of test scores. The student-teacher ratio alone explains only a small fraction of the variation in test scores: The R̄² in column (1) is 0.049. The R̄² jumps, however, when the student characteristic variables are added. For example, the R̄² in the base specification, regression (3), is 0.773. The signs of the coefficients on the student demographic variables are consistent with the patterns seen in Figure 7.2: Districts with many English learners and districts with many poor children have lower test scores.

3. The control variables are not always individually statistically significant: In specification (5), the hypothesis that the coefficient on the percentage qualifying for income assistance is zero is not rejected at the 5% level (the t-statistic is 0.048/0.059 ≈ 0.81). Because adding this control variable to the base specification (3) has a negligible effect on the estimated coefficient on the student-teacher ratio and its standard error, and because the coefficient on this control variable is not significant in specification (5), this additional control variable is redundant, at least for the purposes of this analysis.

7.7 Conclusion

Chapter 6 began with a concern: In the regression of test scores against the student-teacher ratio, omitted student characteristics that influence test scores might be correlated with the student-teacher ratio in the district, and if so the student-teacher ratio in the district would pick up the effect on test scores of these omitted student characteristics. Thus the OLS estimator would have omitted variable bias. To mitigate this potential omitted variable bias, we augmented the regression by including variables that control for various student characteristics (the percentage of English learners and two measures of student economic background). Doing so cuts the estimated effect of a unit change in the student-teacher ratio in half, although it remains possible to reject the null hypothesis that the population effect on test scores, holding these control variables constant, is zero at the 5% significance level. Because they eliminate omitted variable bias arising from these student characteristics, these multiple regression estimates, hypothesis tests, and confidence intervals are much more useful for advising the superintendent than the single-regressor estimates of Chapters 4 and 5.

The analysis in this and the preceding chapter has presumed that the population regression function is linear in the regressors, that is, that the conditional expectation of Yi given the regressors is a straight line. There is, however, no particular reason to think this is so. In fact, the effect of reducing the student-teacher ratio might be quite different in districts with large classes than in districts that already have small classes. If so, the population regression line is not linear in the X's but rather is a nonlinear function of the X's. To extend our analysis to regression functions that are nonlinear in the X's, we need the tools developed in the next chapter.
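The homoskedasticity-only F-statistic used in this chapter can be computed directly from the restricted and unrestricted R², as in Equation (7.14). A sketch of that arithmetic (the R² inputs below are invented placeholders, not values from Table 7.1):

```python
def f_stat_from_r2(r2_unres, r2_res, q, n, k_unres):
    """Homoskedasticity-only F-statistic, Equation (7.14):
    F = [(R2_u - R2_r) / q] / [(1 - R2_u) / (n - k_u - 1)],
    where q is the number of restrictions and k_u is the number of
    regressors in the unrestricted regression."""
    return ((r2_unres - r2_res) / q) / ((1.0 - r2_unres) / (n - k_unres - 1))

# Hypothetical example: dropping 2 regressors lowers the R^2 from 0.775 to 0.773
# in a regression with n = 420 observations and 4 regressors.
f = f_stat_from_r2(0.775, 0.773, q=2, n=420, k_unres=4)
print(round(f, 2))  # 1.84, below the 5% critical value of 3.00 for q = 2
```

Because this formula needs only the two R² values and the sample size, it is usable from a published table of results whenever the restricted regression happens to be one of the reported columns; it is valid, however, only under homoskedasticity.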
Summary

1. Hypothesis tests and confidence intervals for a single regression coefficient are carried out using essentially the same procedures that were used in the one-variable linear regression model of Chapter 5. For example, a 95% confidence interval for β1 is given by β̂1 ± 1.96 SE(β̂1).

2. Hypotheses involving more than one restriction on the coefficients are called joint hypotheses. Joint hypotheses can be tested using an F-statistic.

3. Regression specification proceeds by first determining a base specification chosen to address concern about omitted variable bias. The base specification can be modified by including additional regressors that address other potential sources of omitted variable bias. Simply choosing the specification with the highest R² can lead to regression models that do not estimate the causal effect of interest.

Key Terms

restrictions (226), joint hypothesis (226), F-statistic (227), restricted regression (230), unrestricted regression (230), homoskedasticity-only F-statistic (231), 95% confidence set (234), base specification (236), alternative specifications (237), Bonferroni test (251)

Review the Concepts

7.1 Explain how you would test the null hypothesis that β1 = 0 in the multiple regression model Yi = β0 + β1X1i + β2X2i + ui. Explain how you would test the null hypothesis that β2 = 0. Explain how you would test the joint hypothesis that β1 = 0 and β2 = 0. Why isn't the result of the joint test implied by the results of the first two tests?

7.2 Provide an example of a regression that arguably would have a high value of R² but would produce biased and inconsistent estimators of the regression coefficient(s). Explain why the R² is likely to be high. Explain why the OLS estimators would be biased and inconsistent.
Exercises

The first six exercises refer to the table of estimated regressions below, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor        (1)            (2)            (3)
College (X1)     5.46 (0.21)    5.48 (0.21)    5.44 (0.21)
Female (X2)      -2.64 (0.20)   -2.62 (0.20)   -2.62 (0.20)
Age (X3)                        0.29 (0.04)    0.29 (0.04)
Northeast (X4)                                 0.69 (0.30)
Midwest (X5)                                   0.60 (0.28)
South (X6)                                     -0.27 (0.26)
Intercept        12.69 (0.14)   4.40 (1.05)    3.75 (1.06)

Summary Statistics and Joint Tests
F-statistic for regional effects = 0             6.10
SER              6.27           6.10           6.10
R̄²               0.176          0.190          0.194
n                4000           4000           4000

7.1 Add "*" (5%) and "**" (1%) to the table to indicate the statistical significance of the coefficients.

7.2 Using the regression results in column (1):
a. Is the college-high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval of the difference.
b. Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

7.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

7.4 Using the regression results in column (3):
a. Do there appear to be important regional differences? Use an appropriate hypothesis test to explain your answer.
b. Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old female college graduate from the West. Jennifer is a 28-year-old female college graduate from the Midwest.
i. Construct a 95% confidence interval for the difference in expected earnings between Juanita and Molly.
ii. Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juanita and Jennifer. (Hint: What would happen if you included West and excluded Midwest from the regression?)

7.5 The regression shown in column (2) was estimated again, this time using data from 1992 (4000 observations selected at random from the March 1993 CPS, converted into 1998 dollars using the consumer price index). The results are:
AHE = 0.77 + 5.29College - 2.59Female + 0.40Age, SER = 5.85, R̄² = 0.21
      (0.98) (0.20)        (0.18)        (0.03)

Comparing this regression to the regression for 1998 shown in column (2) of the table, was there a statistically significant change in the coefficient on College?

7.6 Evaluate the following statement: "In all of the regressions, the coefficient on Female is negative, large, and statistically significant. This provides strong statistical evidence of gender discrimination in the U.S. labor market."

7.7 Question 6.5 reported the following regression (where standard errors have been added):

Price = 119.2 + 0.485BDR + 23.4Bath + 0.156Hsize + 0.002Lsize + 0.090Age - 48.8Poor,
        (23.9)  (2.61)      (8.94)     (0.011)      (0.00048)    (0.311)     (10.5)
R̄² = 0.72, SER = 41.5

a. Is the coefficient on BDR statistically significantly different from zero?
b. Typically five-bedroom houses sell for much more than two-bedroom houses. Is this consistent with your answer to (a) and with the regression more generally?
c. A homeowner purchases 2000 square feet from an adjacent lot. Construct a 99% confidence interval for the change in the value of her house.
d. Lot size is measured in square feet. Do you think that another scale might be more appropriate? Why or why not?
e. The F-statistic for omitting BDR and Age from the regression is F = 0.08. Are the coefficients on BDR and Age statistically different from zero at the 10% level?

7.8 Referring to Table 7.1 in the text:
a. Construct the R² for each of the regressions.
b. Construct the homoskedasticity-only F-statistic for testing β3 = β4 = 0 in the regression shown in column (5). Is the statistic significant at the 5% level?
c. Test β3 = β4 = 0 in the regression shown in column (5) using the Bonferroni test discussed in Appendix 7.1.
d. Construct a 99% confidence interval for β1 for the regression in column (5).

7.9 Consider the regression model Yi = β0 + β1X1i + β2X2i + ui. Use "Approach #2" from Section 7.3 to transform the regression so that you can use a t-statistic to test
a. β1 = β2;
b. β1 + aβ2 = 0, where a is a constant;
c. β1 + β2 = 1. (Hint: You must redefine the dependent variable in the regression.)

7.10 Equations (7.13) and (7.14) show two formulas for the homoskedasticity-only F-statistic. Show that the two formulas are equivalent.

Empirical Exercises

E7.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Run a regression of AHE on Age. What is the estimated intercept? What is the estimated slope?
b. Run a regression of AHE on Age, gender (Female), and education (Bachelor). What is the estimated effect of Age on earnings? Construct a 95% confidence interval for the coefficient on Age in the regression.
c. Are the results from the regression in (b) substantively different from the results in (a) regarding the effects of Age on AHE? Does the regression in (a) seem to suffer from omitted variable bias?
d. Bob is a 26-year-old male worker with a high school diploma. Predict Bob's earnings using the estimated regression in (b). Alexis is a 30-year-old female worker with a college degree. Predict Alexis's earnings using the regression.
e. Compare the fit of the regression in (a) and (b) using the regression standard errors, R² and R̄². Why are the R² and R̄² so similar in regression (b)?
f. Are gender and education determinants of earnings? Test the null hypothesis that Female can be deleted from the regression. Test the null hypothesis that Bachelor can be deleted from the regression. Test the null hypothesis that both Female and Bachelor can be deleted from the regression.
E7.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. Construct a 95% confidence interval for the effect of Beauty on Course_Eval.
b. Consider the various control variables in the data set. Which do you think should be included in the regression? Using a table like Table 7.1, examine the robustness of the confidence interval that you constructed in (a). What is a reasonable 95% confidence interval for the effect of Beauty on Course_Eval?

E7.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 year if distance to the nearest college is decreased by 10 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.
b. Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? To answer this question, construct a table like Table 7.1. Include a simple specification [constructed in (a)], a base specification (that includes a set of important control variables), and several modifications of the base specification. Discuss how the estimated effect of Dist on ED changes across the specifications.
c. It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this result consistent with the regressions that you constructed in part (b)?
d. A regression will suffer from omitted variable bias when two conditions hold. What are these two conditions? Do these conditions seem to hold here?

E7.4 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.
a. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. Construct a 95% confidence interval for the coefficient on TradeShare. Is the coefficient statistically significant at the 5% level?
However. then you will not be able to compute the F·sliltistic of Sec!1on 7. and RGDP60can be omitted from the regression. Bonkrroni's ~ on's edu a r [((lis rest col estimated TIle Bonfe rroni test is a test of a joint h ypo Lheses based on the Istatistics for the indi vidual hypothe ~s. Because Pr(A n B) ~ O. where ') and'2 are t he Istatistics tha t (esllhe reSlrictions on f3 L and f31. Rn'_Coups.1.. Let A n a be the even t " both A and B" (the intcrsection of A and R). What is the p. ~~ van rriSCUSS hO\\' irtcalions. it foll ows tha t Pr(A U B). Does C l of dis conslrUct 8 . and let A U R be the event I Rev _Coul'~ ~ inter. the Bonferroni test i~ the o neatatime Istatistic test of Section 7._ 7_.sisten! with kbut cxduJ· Ily significant Bonferroni's Inequality Bonfcrroni's ineq ualit y is a basic resull of probability theory. 011 legreSSion This method is an appli catio n of it very general testing approach based inequality. Let A and B be evcnts.20) (Bonfe rroni o ne·atlltime (statistic test).ion in which you are interested . APPENDIX b.value of (he Fslatislic? .. respectfully. ·ntis is done by using Bonferron i·s ineq uality to choose the critical Y <ll uc c to <lllow both for the fac t th at two restrictions arc being tested and for any possible cor· relation between I) and 12' = /310 and {32 "" f3z.The Bonfe rroni test of the jnint null hypothesis /3! based o n tlte critical value c > 0 uses t he fo llowing rule: 11 s c: olhe .o tes. Pr(A) + Pr(B). if the author of a study pres~ lIt s regression results but did nollest a joint rcstricl.2.2 do ne prope rly.2.. structed The method of Section 7. rejecl Acce pt if li t I S (" a nd if 1 (7. Yea rsSchool..al fOf ·· A or B or both" (the union of A and B) . 
TIle trick i~ lo choose the critical value c in such a way that the probability that the one· atatime test rejects when the nu ll h ypothesis is true is no more than the desired signifi· cance level.The Bonlerrooi Tesl of a Joint Hypotheses 2S 1 Indi seem 'ise 4..ted in ((1)1.(Pr(A) + Pr{B)J. Assassill(l (ions. Lei N and W be the complements of .. Then Pr(A U B) = Pr(A) + Pr( B)  Pr(A n B). antiYOU _ . s and His ._111The Bonferroni Test of a Joint Hypotheses able 7. This inequality in turn ~ implies that I . This appendix describes a way to test Joint hypothest's that can be used whe n you on1)' have II tisc 4.3 to table of regression res ults.2 is Ihe preferred way to test joint hypotheses in multiple regre:.Pr(A UB) I ..· r eel of sian. Test whether.vise. and you do nOThave the o riginal data. say 5%.. 1\ . that is. taken as a group.
Let Aᶜ and Bᶜ be the complements of A and B, that is, the events "not A" and "not B." Because the complement of A ∪ B is Aᶜ ∩ Bᶜ, Pr(Aᶜ ∩ Bᶜ) = 1 − Pr(A ∪ B) ≥ 1 − [Pr(A) + Pr(B)], which yields Bonferroni's inequality, Pr(Aᶜ ∩ Bᶜ) ≥ 1 − [Pr(A) + Pr(B)].

Bonferroni Tests

Because the event "|t1| > c or |t2| > c or both" is the rejection region of the one-at-a-time test, Bonferroni's inequality provides a way to choose the critical value c so that the one-at-a-time t-statistic test has the desired significance level in large samples. Let A be the event that |t1| > c and B be the event that |t2| > c. Then the inequality Pr(A ∪ B) ≤ Pr(A) + Pr(B) yields

Pr_H0(|t1| > c or |t2| > c or both) ≤ Pr(|t1| > c) + Pr(|t2| > c).    (7.21)

Under the null hypothesis in large samples, Pr(|t1| > c) = Pr(|t2| > c) = Pr(|Z| > c). Thus Equation (7.21) implies that, in large samples, the probability that the one-at-a-time test rejects under the null is

Pr_H0(one-at-a-time test rejects) ≤ 2Pr(|Z| > c).    (7.22)

The inequality in Equation (7.22) provides a way to choose the critical value c so that the probability of rejection under the null hypothesis equals the desired significance level. The Bonferroni approach can be extended to more than two coefficients: If there are q restrictions under the null, the factor of 2 on the right-hand side in Equation (7.22) is replaced by q.

Table 7.3 presents critical values c for the one-at-a-time Bonferroni test for various significance levels and q = 2, 3, and 4. For example, suppose the desired significance level is 5% and q = 2. According to Table 7.3, the critical value is c = 2.241. This critical value is the upper 1.25% critical value of the standard normal distribution (its 98.75th percentile), so Pr(|Z| > 2.241) = 2.5%. Thus Equation (7.22) tells us that, in large samples, the one-at-a-time test in Equation (7.20) will reject at most 5% of the time under the null hypothesis.

The critical values in Table 7.3 are larger than the critical values for testing a single restriction. For example, with q = 2 the one-at-a-time test rejects if at least one t-statistic exceeds 2.241 in absolute value. This critical value is greater than 1.96 because it properly corrects for the fact that, by looking at two t-statistics, you get a second chance to reject the joint null hypothesis, as discussed in Section 7.2.

If the individual t-statistics are based on heteroskedasticity-robust standard errors, then the Bonferroni test is valid whether or not there is heteroskedasticity; but if the t-statistics are based on homoskedasticity-only standard errors, the Bonferroni test is valid only under homoskedasticity.
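The choice of c implied by Equation (7.22) can be sketched numerically: setting q · Pr(|Z| > c) equal to the desired significance level makes c a quantile of the standard normal distribution. The short script below is an illustrative sketch (not part of the text) that reproduces the Bonferroni critical values using only the Python standard library.

```python
from statistics import NormalDist

def bonferroni_critical_value(alpha, q):
    """Critical value c such that q * Pr(|Z| > c) = alpha,
    i.e., the (1 - alpha/(2q)) quantile of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / (2 * q))

# The 5%-level critical value for q = 2 restrictions (c = 2.241):
c = bonferroni_critical_value(0.05, 2)
print(round(c, 3))  # 2.241
```

With q = 1 this reduces to the familiar two-sided critical value (1.96 at the 5% level), which makes clear that the Bonferroni correction simply tightens the per-statistic significance level from α to α/q.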
TABLE 7.3 Bonferroni Critical Values c for the One-at-a-time t-Statistic Test of a Joint Hypothesis

                              Significance Level
Number of Restrictions (q)    10%      5%       1%
2                             1.960    2.241    2.807
3                             2.128    2.394    2.935
4                             2.241    2.498    3.023

Application to Test Scores

The t-statistics testing the joint null hypothesis that the true coefficients on the student-teacher ratio and expenditures per pupil in Equation (7.6) are zero are, respectively, t1 = -0.60 and t2 = 2.43. Although |t1| < 2.241, because |t2| > 2.241 we can reject the joint null hypothesis at the 5% significance level using the Bonferroni test. However, both t1 and t2 are less than 2.807 in absolute value, so we cannot reject the joint null hypothesis at the 1% significance level using the Bonferroni test. In contrast, using the F-statistic in Section 7.2, we were able to reject this hypothesis at the 1% significance level.
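The Bonferroni decision rule in Equation (7.20) is mechanical enough to state in a few lines of code. The sketch below (illustrative, not part of the text) applies the rule to the two t-statistics from the test score application, using the q = 2 critical values from Table 7.3.

```python
def bonferroni_rejects(t_stats, c):
    """Bonferroni one-at-a-time test: reject the joint null
    hypothesis if any individual |t| exceeds the critical value c."""
    return any(abs(t) > c for t in t_stats)

t1, t2 = -0.60, 2.43  # t-statistics from the test score application

print(bonferroni_rejects([t1, t2], c=2.241))  # True:  reject at the 5% level
print(bonferroni_rejects([t1, t2], c=2.807))  # False: cannot reject at the 1% level
```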
CHAPTER 8 Nonlinear Regression Functions

In Chapters 4-7, the population regression function was assumed to be linear; that is, the slope of the population regression function was constant, so that the effect on Y of a unit change in X does not itself depend on the value of X. But what if the effect on Y of a change in X does depend on the value of one or more of the independent variables? If so, the population regression function is nonlinear.

This chapter develops two groups of methods for detecting and modeling nonlinear population regression functions. The methods in the first group are useful when the effect on Y of a change in one independent variable, X1, depends on the value of X1 itself. For example, reducing class sizes by one student per teacher might have a greater effect if class sizes are already manageably small than if they are so large that the teacher can do little more than keep the class under control. If so, the test score (Y) is a nonlinear function of the student-teacher ratio (X1), where this function is steeper when X1 is small. An example of a nonlinear regression function with this feature is shown in Figure 8.1: Whereas the linear population regression function in Figure 8.1a has a constant slope, the nonlinear population regression function in Figure 8.1b has a steeper slope when X1 is small than when it is large. This first group of methods is presented in Section 8.2.

The methods in the second group are useful when the effect on Y of a change in X1 depends on the value of another independent variable, say X2. For example, students still learning English might especially benefit from having more one-on-one attention; if so, the effect on test scores of reducing the student-teacher ratio will be greater in districts with many students still learning English than in districts with few English learners.
"" Population regressjon jU r'K tioo whfn X2 = 0 x. the slope of the populotion regress ion fu nc tion dep.ioo ftmctioo wheo Xl eo 1 " Ri~r. I Xl " . on the v:11u<" of X l luc of y ' ''1 ''. the pop ulation regression function is a X..\ _____ '"" x. these models are linear funct ions of the unknown coe.>nds on the value of X2 . tbe \ \carning. the popvlation regre~sion fu ndioo has a constant sk>pe. ng To Population regres.. . J ('P ~n d~ 0 <1 the val\l ( uf X 2 ~" nction s In Figure 8 .. 1a . howO 8.fficicnts (or parameters) of th e populat ion regression model and th us arc versions of the multiple regression model of Chapters 6 and 7. the slope of Ihis Lype of population regression func tion depends on the value of X l' Th is second group of methods is presented in Sect ion R3.3. 01 a In th~ models of Sections 8.. Altho ugh they are nonlinea r in the X's. 1 YI Population Regression Functions with Different Slopes Y .~ '"" R i~ '"" x\ (a) C omta m . the slope of the population regre~i on function depen ds on the value of Xl In Figu re 8 . e ffe ct oil .1b percentage of English learn ers in the district (X2 ). lo pe (b) Slope J".2 and 8." F()r ving c nonlinear function o r the independent vari ables.. .<.As shown in Figure S. ta test scores ( Y ) of a red uction in the st udentteacher ratio (X l ) depends on the re S. Therefore. In Figure 8..Nonlineor Regression Functions 255 FIGURE 8.{~ aL n t. 1c. (c) Slup".1 b.le.p~nd. the conditional expectation E(Y.) is a nonlinear functi on of one or more of the X's. X 1. that is..
Therefore, the unknown parameters of these nonlinear regression functions can be estimated and tested using OLS and the methods of Chapters 6 and 7. In some applications, the regression function is a nonlinear function of the X's and of the parameters. If so, the parameters cannot be estimated by OLS, but they can be estimated using nonlinear least squares. Appendix 8.1 provides examples of such functions and describes the nonlinear least squares estimator.

Sections 8.1 and 8.2 introduce nonlinear regression functions in the context of regression with a single independent variable, and Section 8.3 extends this to two independent variables. To keep things simple, additional control variables are omitted in the empirical examples of Sections 8.1-8.3. In practice, however, it is important to analyze nonlinear regression functions in models that control for omitted variable bias by including control variables as well. In Section 8.4, we combine nonlinear regression functions and additional control variables when we take a close look at possible nonlinearities in the relationship between test scores and the student-teacher ratio, holding student characteristics constant.

8.1 A General Strategy for Modeling Nonlinear Regression Functions

This section lays out a general strategy for modeling nonlinear population regression functions. In this strategy, the nonlinear models are extensions of the multiple regression model and therefore can be estimated and tested using the tools of Chapters 6 and 7. First, we return to the California test score data and consider the relationship between test scores and district income.

Test Scores and District Income

In Chapter 7, we found that the economic background of the students is an important factor in explaining performance on standardized tests. That analysis used two economic background variables (the percentage of students qualifying for a subsidized lunch and the percentage of district families qualifying for income assistance) to measure the fraction of students in the district coming from poor families. A different, broader measure of economic background is the average annual per capita income in the school district ("district income").
The California data set includes district income, measured in thousands of 1998 dollars. The sample contains a wide range of income levels: For the 420 districts in our sample, the median district income is 13.7 (that is, $13,700 per person), and it ranges from 5.3 ($5,300 per person) to 55.3 ($55,300 per person).

Figure 8.2 shows a scatterplot of fifth-grade test scores against district income for the California data set, along with the OLS regression line relating these two variables. Test scores and average income are strongly positively correlated, with a correlation coefficient of 0.71; students from affluent districts do better on the tests than students from poor districts. But this scatterplot has a peculiarity: Most of the points are below the OLS line when income is very low (under $10,000) or very high (over $40,000), but are above the line when income is between $15,000 and $30,000. There seems to be some curvature in the relationship between test scores and income that is not captured by the linear regression.

In short, it seems that the relationship between district income and test scores is not a straight line. Rather, it is nonlinear. A nonlinear function is a function with a slope that is not constant: The function f(X) is linear if the slope of f(X) is the same for all values of X, but if the slope depends on the value of X, then f(X) is nonlinear.

FIGURE 8.2 Scatterplot of Test Score vs. District Income with a Linear OLS Regression Function. There is a positive correlation between test scores and district income (correlation = 0.71), but the linear OLS regression line does not adequately describe the relationship between these variables. [Axes: test score, 600 to 740; district income, $0 to $60 thousand.]
If a straight line is not an adequate description of the relationship between district income and test scores, what is? Imagine drawing a curve that fits the points in Figure 8.2. This curve would be steep for low values of district income and then would flatten out as district income gets higher. One way to approximate such a curve mathematically is to model the relationship as a quadratic function; that is, we could model test scores as a function of income and the square of income.

A quadratic population regression model relating test scores and income is written mathematically as

TestScorei = β0 + β1Incomei + β2Incomei² + ui,    (8.1)

where β0, β1, and β2 are coefficients, Incomei is the income in the ith district, Incomei² is the square of income in the ith district, and ui is an error term that, as usual, represents all the other factors that determine test scores. Equation (8.1) is called the quadratic regression model because the population regression function, E(TestScorei | Incomei) = β0 + β1Incomei + β2Incomei², is a quadratic function of the independent variable, Income.
If you knew the population coefficients β0, β1, and β2 in Equation (8.1), you could predict the test score of a district based on its average income. But these population coefficients are unknown and therefore must be estimated using a sample of data.

At first, it might seem difficult to find the coefficients of the quadratic function that best fits the data in Figure 8.2. If you compare Equation (8.1) with the multiple regression model in Key Concept 6.2, however, you will see that Equation (8.1) is in fact a version of the multiple regression model with two regressors: The first regressor is Income, and the second regressor is Income². Thus, after defining the regressors as Income and Income², the nonlinear model in Equation (8.1) is simply a multiple regression model with two regressors!

Because the quadratic regression model is a variant of multiple regression, its unknown population coefficients can be estimated and tested using the OLS methods described in Chapters 6 and 7. Estimating the coefficients of Equation (8.1) using OLS for the 420 observations in Figure 8.2 yields

TestScore^ = 607.3 + 3.85 Income − 0.0423 Income²,  R̄² = 0.554,    (8.2)
            (2.9)   (0.27)         (0.0048)

where, as usual, standard errors of the estimated coefficients are given in parentheses. The estimated regression function (8.2) is plotted in Figure 8.3.
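With the OLS estimates in Equation (8.2) in hand, predicting a district's test score from its average income is a one-line calculation. The snippet below is an illustrative sketch (not part of the text) using the rounded coefficients reported above.

```python
def predicted_test_score(income):
    """Fitted quadratic regression function from Equation (8.2);
    income is district income in thousands of 1998 dollars."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

print(round(predicted_test_score(10), 2))  # 641.57
print(round(predicted_test_score(11), 2))  # 644.53
```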
Because Equation (8. Indeed the pvalue for the Istatistic is less than 0.1) holds with /32 = O. so we can reject the hypothesis th at /32 = 0 at aU conventional significance levels. . If the relationship is lineaL then th e regression function is correctly specified as Equation (8. We C(ln go one step beyond this visual comparison and formally test the hypothesis that the relationship between income and test scores is linear.O. ~ te rm tl. its ~ using the OLS knts of Equation f = U. th en mate such a :tion.81.. this exceeds the 5% critical val ue of this test (which is 1.5 regre~ion function fits the dolo better than the linear 01.M IJ 10 20 30 61) 40 so DiHTicl income (thousands of dollars) Itic (unction of ~tion (R.3: The quadra tic modd fit s the data better than the linear mode l.n is ssion fu nction. . ::w 700 680 660 (8.1). you bille.554. Bul theSi: ed using a sam l I .0423/0.aL (IS luation (8 . the quadratic regression function seems to fit the data better than the linear one.1"11at j~ f income. Thus. In absolute value. Thus thi s formal hypothesis test supports our informal inspection of Figures 8. In short. we can test the null hypothesi s that the population regression fu ncti on is linear against the alternative that it is quadratic by testing the nu ll hypothesis that {32 = 0 against the alternative that /32 *.1 \ (.1) is just a variant of the multiple regression model. This {statistic is ( = (~l .0043 = 8.1 FIGURE 8. District Income with linear and Quadranc Regression Functions Test score 740 Li near fE"9 ff'S>ion The quadratic 01.. The quadratic function captures the curvature in the scatterplot: It is steep for low values of district income but flat tens out when district income is high.) fllts are given iTl I ted in Fi~ure s}· superimposed over the scatterplot of the data. 1) .3 A General Strategy for Modeling Nonlinear Regression Functions 259 ip between t!> the points lcollle. (It. if the relationship is linear. 
the null hypothesis that /32 = 0 can be tested by constructing the tstatistic for Lhis hypothesis.tu " Quadrati<: regressil)l1 he r ll district.8.01%.1 ) with tho: i'see that Equa' two regn::sson. \. the n Equation (8.2) is ( = 0. that is.96). against the alternative that it is nonlinear.: m~ .2 and 8. Thus.0)/ SE(ffi2)' which from Equation (8. except th at the regressor Inc:ome 2 is absent.unc : q ua d ratlc ~ (8. after del in E4uation ~Ie regrc%ion. ntl income is Scatterplot of Test Score n. 620 (.5 regression function.
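The t-test of linearity just described uses only two numbers from the regression output, the estimated coefficient on Income² and its standard error. As an illustrative sketch (not part of the text):

```python
beta2_hat = -0.0423  # estimated coefficient on Income^2, Equation (8.2)
se_beta2 = 0.0048    # its standard error

t_stat = (beta2_hat - 0) / se_beta2
print(round(t_stat, 2))    # -8.81
print(abs(t_stat) > 1.96)  # True: reject linearity at the 5% level
```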
The Effect on Y of a Change in X in Nonlinear Specifications

Put aside the test score example for a moment and consider a general problem. You want to know how the dependent variable Y is expected to change when the independent variable X1 changes by the amount ΔX1, holding constant other independent variables X2, ..., Xk. When the population regression function is linear, this effect is easy to calculate: As shown in Equation (6.4), the expected change in Y is ΔY = β1ΔX1, where β1 is the population regression coefficient multiplying X1. When the regression function is nonlinear, however, the expected change in Y is more complicated to calculate because it can depend on the values of the independent variables.

A general formula for a nonlinear population regression function.¹ The nonlinear population regression models considered in this chapter are of the form

Yi = f(X1i, X2i, ..., Xki) + ui,  i = 1, ..., n,    (8.3)

where f(X1i, X2i, ..., Xki) is the population nonlinear regression function, a possibly nonlinear function of the independent variables X1i, X2i, ..., Xki, and ui is the error term. For example, in the quadratic regression model in Equation (8.1), only one independent variable is present, so X1 is Income and the population regression function is f(Incomei) = β0 + β1Incomei + β2Incomei².

Because the population regression function is the conditional expectation of Yi given X1i, X2i, ..., Xki, in Equation (8.3) we allow for the possibility that this conditional expectation is a nonlinear function of X1i, X2i, ..., Xki; that is, E(Yi | X1i, X2i, ..., Xki) = f(X1i, X2i, ..., Xki), where f can be a nonlinear function. If the population regression function is linear, then f(X1i, ..., Xki) = β0 + β1X1i + ... + βkXki, and Equation (8.3) becomes the linear regression model in Key Concept 6.2. Equation (8.3), however, allows for nonlinear regression functions as well.

¹The term "nonlinear regression" applies to two conceptually different families of models. In the first family, the population regression function is a nonlinear function of the X's but is a linear function of the unknown parameters (the β's). In the second family, the population regression function is a nonlinear function of the unknown parameters and may or may not be a nonlinear function of the X's. The models in the body of this chapter are all in the first family; Appendix 8.1 takes up models in the second family.
The effect on Y of a change in X1. The expected change in Y, say ΔY, is what happens to Y on average in the population when X1 changes by an amount ΔX1, holding constant the other variables X2, ..., Xk. Because the regression function f is unknown, this population effect on Y of a change in X1 is also unknown. To estimate the population effect, first estimate the population regression function. At a general level, denote this estimated function by f̂; an example of such an estimated function is the estimated quadratic regression function in Equation (8.2). The estimated effect on Y (denoted ΔŶ) of the change in X1 is the difference between the predicted value of Y when the independent variables take on the values X1 + ΔX1, X2, ..., Xk and the predicted value of Y when they take on the values X1, X2, ..., Xk. The method for calculating the expected effect on Y of a change in X1 is summarized in Key Concept 8.1.

KEY CONCEPT 8.1
THE EXPECTED EFFECT ON Y OF A CHANGE IN X1 IN THE NONLINEAR REGRESSION MODEL (8.3)

The expected change in Y, ΔY, associated with the change in X1, ΔX1, holding X2, ..., Xk constant, is the difference between the value of the population regression function before and after changing X1, holding X2, ..., Xk constant. That is, the expected change in Y is the difference

ΔY = f(X1 + ΔX1, X2, ..., Xk) − f(X1, X2, ..., Xk).    (8.4)

The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let f̂(X1, X2, ..., Xk) be the predicted value of Y based on the estimator f̂ of the population regression function. Then the predicted change in Y is

ΔŶ = f̂(X1 + ΔX1, X2, ..., Xk) − f̂(X1, X2, ..., Xk).    (8.5)

Application to test scores and income. What is the predicted change in test scores associated with a change in district income of $1,000, based on the estimated quadratic regression function in Equation (8.2)? Because that regression function is quadratic, this effect depends on the initial district income.
We therefore consider two cases: an increase in district income from 10 to 11 (i.e., from $10,000 per capita to $11,000) and an increase in district income from 40 to 41 (i.e., from $40,000 to $41,000).

To compute ΔŶ associated with the change in income from 10 to 11, we can apply the general formula in Equation (8.5) to the quadratic regression model. Doing so yields

ΔŶ = (β̂0 + β̂1 × 11 + β̂2 × 11²) − (β̂0 + β̂1 × 10 + β̂2 × 10²),    (8.6)

where β̂0, β̂1, and β̂2 are the OLS estimators. The term in the first set of parentheses in Equation (8.6) is the predicted value of Y when Income = 11, and the term in the second set of parentheses is the predicted value of Y when Income = 10. These predicted values are calculated using the OLS estimates of the coefficients in Equation (8.2). When Income = 11, the predicted value of test scores is 607.3 + 3.85 × 11 − 0.0423 × 11² = 644.53. When Income = 10, the predicted value is 607.3 + 3.85 × 10 − 0.0423 × 10² = 641.57. The difference in these two predicted values is ΔŶ = 644.53 − 641.57 = 2.96 points; that is, the predicted difference in test scores between a district with average income of $11,000 and one with average income of $10,000 is 2.96 points.

In the second case, when income changes from $40,000 to $41,000, the difference in the predicted values in Equation (8.6) is ΔŶ = (607.3 + 3.85 × 41 − 0.0423 × 41²) − (607.3 + 3.85 × 40 − 0.0423 × 40²) = 694.04 − 693.62 = 0.42 points. Thus a change of income of $1,000 is associated with a larger change in predicted test scores if the initial income is $10,000 than if it is $40,000 (the predicted changes are 2.96 points versus 0.42 points). Said differently, the slope of the estimated quadratic regression function in Figure 8.3 is steeper at low values of income (like $10,000) than at higher values of income (like $40,000).

Standard errors of estimated effects. The estimator of the effect on Y of changing X1 depends on the estimator of the population regression function, which varies from one sample to the next. Therefore the estimated effect contains sampling error. One way to quantify the sampling uncertainty associated with the estimated effect is to compute a confidence interval for the true population effect. To do so, we need to compute the standard error of ΔŶ in Equation (8.5).

It is easy to compute a standard error for ΔŶ when the regression function is linear. The estimated effect of a change in X1 is β̂1ΔX1, so a 95% confidence interval for the estimated change is β̂1ΔX1 ± 1.96 SE(β̂1)ΔX1.
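Key Concept 8.1 applied to the quadratic model reduces to differencing the fitted function at two income levels. The sketch below (illustrative, using the rounded OLS estimates from Equation (8.2)) reproduces both cases.

```python
def fitted(income):
    # Estimated quadratic regression function, Equation (8.2)
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

def predicted_change(x1, dx1):
    # Predicted change in Y from Equation (8.5), one-regressor case
    return fitted(x1 + dx1) - fitted(x1)

print(round(predicted_change(10, 1), 2))  # 2.96  (income $10,000 -> $11,000)
print(round(predicted_change(40, 1), 2))  # 0.42  (income $40,000 -> $41,000)
```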
In the nonlinear regression models of this chapter, the standard error of ΔŶ can be computed using the tools introduced in Section 7.3 for testing a single restriction involving multiple coefficients. To illustrate this method, consider the estimated change in test scores associated with a change in income from 10 to 11 in Equation (8.6), which is ΔŶ = β̂1 × (11 − 10) + β̂2 × (11² − 10²) = β̂1 + 21β̂2. The standard error of the predicted change therefore is

SE(ΔŶ) = SE(β̂1 + 21β̂2).    (8.7)

Thus, if we can compute the standard error of β̂1 + 21β̂2, then we have computed the standard error of ΔŶ. There are two methods for doing this using standard regression software, which correspond to the two approaches in Section 7.3.

The first method is to use "approach #1" of Section 7.3, which is to compute the F-statistic testing the hypothesis that β1 + 21β2 = 0. The standard error of ΔŶ is then given by²

SE(ΔŶ) = |ΔŶ| / √F.    (8.8)

When applied to the quadratic regression in Equation (8.2), the F-statistic testing the hypothesis that β1 + 21β2 = 0 is F = 299.94. Because ΔŶ = 2.96, applying Equation (8.8) gives SE(ΔŶ) = 2.96/√299.94 = 0.17. Thus a 95% confidence interval for the change in the expected value of Y is 2.96 ± 1.96 × 0.17, or (2.63, 3.29).

The second method is to use "approach #2" of Section 7.3, which entails transforming the regressors so that, in the transformed regression, one of the coefficients is β1 + 21β2. Doing this transformation is left as an exercise (Exercise 8.9).

A comment on interpreting coefficients in nonlinear specifications. In the multiple regression model of Chapters 6 and 7, the regression coefficients had a natural interpretation. For example, β1 is the expected change in Y associated with a change in X1, holding the other regressors constant. But, as we have seen, this is not generally the case in a nonlinear model. That is, it is not very helpful to think of β1 in Equation (8.1) as being the effect of changing the district's income, holding the square of the district's income constant. In nonlinear models, the regression function is best interpreted by graphing it and by calculating the predicted effect on Y of changing one or more of the independent variables.

²Equation (8.8) is derived by noting that the F-statistic is the square of the t-statistic testing this hypothesis, that is, F = t² = [ΔŶ/SE(ΔŶ)]², and solving for SE(ΔŶ).
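Equation (8.8) turns a reported F-statistic into a standard error and confidence interval with simple arithmetic. An illustrative sketch (not part of the text) using the numbers above:

```python
import math

delta_y = 2.96  # predicted change in test scores, income 10 -> 11
F = 299.94      # F-statistic testing beta1 + 21*beta2 = 0

se = abs(delta_y) / math.sqrt(F)  # Equation (8.8)
ci = (delta_y - 1.96 * se, delta_y + 1.96 * se)

print(round(se, 2))                    # 0.17
print(tuple(round(v, 2) for v in ci))  # (2.63, 3.29)
```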
A General Approach to Modeling Nonlinearities Using Multiple Regression

The general approach to modeling nonlinear regression functions taken in this chapter has five elements:

1. Identify a possible nonlinear relationship. The best thing to do is to use economic theory and what you know about the application to suggest a possible nonlinear relationship. Before you even look at the data, ask yourself whether the slope of the regression function relating Y and X might reasonably depend on the value of X or on another independent variable. Why might such nonlinear dependence exist? What nonlinear shapes does this suggest?
For example, thinking about classroom dynamics with 11-year-olds suggests that cutting class size from 18 students to 17 could have a greater effect than cutting it from 30 to 29.

2. Specify a nonlinear function and estimate its parameters by OLS. Sections 8.2 and 8.3 contain various nonlinear regression functions that can be estimated by OLS. After working through these sections you will understand the characteristics of each of these functions.

3. Determine whether the nonlinear model improves upon a linear model. Just because you think a regression function is nonlinear does not mean it really is! You must determine empirically whether your nonlinear model is appropriate. Most of the time you can use t-statistics and F-statistics to test the null hypothesis that the population regression function is linear against the alternative that it is nonlinear.

4. Plot the estimated nonlinear regression function. Does the estimated regression function describe the data well? Looking at Figures 8.2 and 8.3 suggested that the quadratic model fit the data better than the linear model.

5. Estimate the effect on Y of a change in X. The final step is to use the estimated regression to calculate the effect on Y of a change in one or more regressors X using the method in Key Concept 8.1.

8.2 Nonlinear Functions of a Single Independent Variable

This section provides two methods for modeling a nonlinear regression function. To keep things simple, we develop these methods for a nonlinear regression function that involves only one independent variable.
As we see in Section 8.3, these models can be modified to include multiple independent variables.

The first method discussed in this section is polynomial regression, an extension of the quadratic regression used in the last section to model the relationship between test scores and income. The second method uses logarithms of X and/or Y. Although these methods are presented separately, they can be used in combination.

Polynomials

One way to specify a nonlinear regression function is to use a polynomial in X. In general, let r denote the highest power of X that is included in the regression. The polynomial regression model of degree r is

Yi = β0 + β1Xi + β2Xi² + ... + βrXi^r + ui.    (8.9)

When r = 2, Equation (8.9) is the quadratic regression model discussed in Section 8.1. When r = 3, so that the highest power of X included is X³, Equation (8.9) is called the cubic regression model.

The polynomial regression model is similar to the multiple regression model of Chapter 6, except that in Chapter 6 the regressors were distinct independent variables, whereas here the regressors are powers of the same independent variable, X; that is, the regressors are X, X², X³, and so on. Thus the techniques for estimation and inference developed for multiple regression can be applied here. In particular, the unknown coefficients β0, β1, ..., βr in Equation (8.9) can be estimated by OLS regression of Yi against Xi, Xi², ..., Xi^r.
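Estimating Equation (8.9) by OLS just means regressing Y on the constructed regressors X, X², ..., X^r. The sketch below (illustrative, standard library only) builds that design matrix and solves the normal equations by Gaussian elimination; on noiseless data generated from a known quadratic, OLS recovers the coefficients exactly.

```python
def polynomial_ols(y, x, r):
    """OLS estimates for Y = b0 + b1*X + ... + br*X^r + u,
    computed by solving the normal equations (X'X)b = X'y."""
    n, k = len(x), r + 1
    X = [[xi ** p for p in range(k)] for xi in x]  # columns: 1, X, ..., X^r
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    A = [row[:] + [v] for row, v in zip(XtX, Xty)]
    for col in range(k):
        piv = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[piv] = A[piv], A[col]
        for row in range(col + 1, k):
            m = A[row][col] / A[col][col]
            for j in range(col, k + 1):
                A[row][j] -= m * A[col][j]
    beta = [0.0] * k
    for row in range(k - 1, -1, -1):
        resid = A[row][k] - sum(A[row][j] * beta[j] for j in range(row + 1, k))
        beta[row] = resid / A[row][row]
    return beta

# Noiseless data from Y = 2 + 3*X - 0.5*X^2: OLS recovers the coefficients.
xs = [float(v) for v in range(10)]
ys = [2 + 3 * xi - 0.5 * xi ** 2 for xi in xs]
b0, b1, b2 = polynomial_ols(ys, xs, r=2)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 2.0 3.0 -0.5
```

In practice one would use a regression package rather than hand-rolled linear algebra; the point of the sketch is only that the polynomial model is ordinary multiple regression on constructed powers of X.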
ted regres ~ suggested I .sible f rtcutting ~g \'hether depend uch non exam· Polynomials O ne wa y to specify a nonlinear regression functio n is 10 use a polyno mia l in X.• X.X.iOfl The nuH hypothesis tha t the popula tio n r egression function is linear can be tested against the alternative thai it is a polyno m ial of degree r by testin g No against H I in Eq ua tion (8... {3. that is. When r = 3. . In ge ne ral. As we see in Section 8. ..9) I Sections t be esti stand the When r = 2.1 restrictions on the coeffici e nts of the population po lynomial regression model. If th e popula tion regression func tion is linear. so tltat the highest power of X included is X 3. F. Just is appro sl the null Init really t the alter· called the cubic regression m odel. in Equa tio n (8. .. tbe regressors are x . = OV5. {3.
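To make the F-test of linearity concrete, here is a minimal simulation sketch (the data and all names are hypothetical, not the California test-score data set); it fits a restricted (linear) and an unrestricted (cubic) model by OLS and forms the homoskedasticity-only F-statistic with q = 2 restrictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.5, 5.5, n)                                  # hypothetical regressor
y = 600.0 + 50.0 * x - 5.0 * x**2 + rng.normal(0.0, 5.0, n)   # truly quadratic

def ssr(X, y):
    """Sum of squared residuals from OLS of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Unrestricted model: cubic (r = 3); restricted model: linear.
ssr_u = ssr(np.column_stack([x**p for p in range(4)]), y)
ssr_r = ssr(np.column_stack([np.ones(n), x]), y)

q = 2   # restrictions under H0: beta2 = beta3 = 0
k = 3   # regressors in the unrestricted model (excluding the intercept)
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
print(F > 3.0)   # H0 of linearity is rejected for this curved data
```

A heteroskedasticity-robust F-statistic (as used in the text) would replace the SSR formula with a Wald statistic based on robust standard errors; the homoskedasticity-only version above is kept for brevity.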
Which degree polynomial should I use? That is, how many powers of X should be included in a polynomial regression? The answer balances a trade-off between flexibility and statistical precision. Increasing the degree r introduces more flexibility into the regression function and allows it to match more shapes; a polynomial of degree r can have up to r − 1 bends (that is, inflection points) in its graph. But increasing r means adding more regressors, which can reduce the precision of the estimated coefficients.

Thus the answer to the question of how many terms to include is that you should include enough to model the nonlinear regression function adequately, but no more. Unfortunately, this answer is not very useful in practice! A practical way to determine the degree of the polynomial is to ask whether the coefficients in Equation (8.9) associated with the largest values of r are zero. If so, then these terms can be dropped from the regression. This procedure, which is called sequential hypothesis testing because individual hypotheses are tested sequentially, is summarized in the following steps:

1. Pick a maximum value of r and estimate the polynomial regression for that r.
2. Use the t-statistic to test the hypothesis that the coefficient on X^r [βr in Equation (8.9)] is zero. If you reject this hypothesis, then X^r belongs in the regression, so use the polynomial of degree r.
3. If you do not reject βr = 0 in step 2, eliminate X^r from the regression and estimate a polynomial regression of degree r − 1. Test whether the coefficient on X^(r−1) is zero. If you reject, use the polynomial of degree r − 1.
4. If you do not reject βr−1 = 0 in step 3, continue this procedure until the coefficient on the highest power in your polynomial is statistically significant.

This recipe has one missing ingredient: the initial degree r of the polynomial. In many applications involving economic data, the nonlinear functions are smooth; that is, they do not have sharp jumps, or "spikes." If so, then it is appropriate to choose a small maximum order for the polynomial, such as 2, 3, or 4 (that is, to begin with r = 2 or 3 or 4 in step 1).

Application to district income and test scores. The estimated cubic regression function relating district income to test scores is

TestScore = 600.1 + 5.02 Income − 0.096 Income² + 0.00069 Income³,  R̄² = 0.555.  (8.11)
            (5.1)   (0.71)        (0.029)         (0.00035)

The t-statistic on Income³ is 1.97, so the null hypothesis that the regression function is a quadratic is rejected against the alternative that it is a cubic at the 5% level.
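The sequential testing recipe above can be sketched in a few lines of code (hypothetical simulated data; homoskedasticity-only standard errors are used purely for brevity).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.5, 5.5, n)
y = 600.0 + 50.0 * x - 5.0 * x**2 + rng.normal(0.0, 5.0, n)   # true degree is 2

def t_highest(x, y, r):
    """t-statistic on the coefficient of x**r in a degree-r polynomial regression
    (homoskedasticity-only standard error, for illustration only)."""
    X = np.column_stack([x**p for p in range(r + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - r - 1)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[r, r])
    return beta[r] / se

r = 4                                        # step 1: pick a maximum degree
while r > 1 and abs(t_highest(x, y, r)) < 1.96:
    r -= 1                                   # steps 3-4: drop insignificant highest powers
print("selected degree:", r)
```

Because the simulated relationship is quadratic, the procedure typically stops at r = 2: the coefficient on x² is strongly significant, while the cubic and quartic terms usually are not.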
The F-statistic testing the joint null hypothesis that the coefficients on Income² and Income³ are both zero is 37.7, with a p-value less than 0.01%, so the null hypothesis that the regression function is linear is rejected against the alternative that it is either a quadratic or a cubic.

Interpretation of coefficients in polynomial regression models. The coefficients in polynomial regressions do not have a simple interpretation. The best way to interpret polynomial regressions is to plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.

Logarithms

Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes, and many relationships are naturally expressed in terms of percentages. Here are some examples:

• The box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States," examined the wage gap between male and female college graduates. In that discussion, the wage gap was measured in terms of dollars. However, it is easier to compare wage gaps across professions and over time when they are expressed in percentage terms.

• In Section 8.1, we found that district income and test scores were nonlinearly related. Might it be that a change in district income of 1%, rather than $1,000, is associated with a change in test scores that is approximately constant for different values of income?

• In the economic analysis of consumer demand, it is often assumed that a 1% increase in price leads to a certain percentage decrease in the quantity demanded. The percentage decrease in demand resulting from a 1% increase in price is called the price elasticity.

Regression specifications that use natural logarithms allow regression models to estimate percentage relationships such as these. Before introducing those specifications, we review the exponential and natural logarithm functions.

The exponential function and the natural logarithm. The exponential function and its inverse, the natural logarithm, play an important role in modeling nonlinear regression functions. The exponential function of x is e^x (that is, e raised to the power x), where e is the constant 2.71828...; the exponential function is also written as exp(x).
The natural logarithm is the inverse of the exponential function; that is, the natural logarithm is the function for which x = ln(e^x) or, equivalently, x = ln[exp(x)]. The base of the natural logarithm is e. Although there are logarithms in other bases, such as base 10, in this book we consider only logarithms in base e, the natural logarithm, so when we use the term "logarithm" we always mean "natural logarithm."

The logarithm function, y = ln(x), is graphed in Figure 8.4. Note that the logarithm function is defined only for positive values of x. The logarithm function has a slope that is steep at first and then flattens out (although the function continues to increase). The slope of the logarithm function ln(x) is 1/x.

[Figure 8.4 The Logarithm Function, y = ln(x): The logarithmic function y = ln(x) is steeper for small than for large values of x, is defined only for x > 0, and has slope 1/x.]

The logarithm function has the following useful properties:

ln(1/x) = −ln(x);  (8.12)
ln(ax) = ln(a) + ln(x);  (8.13)
ln(x/a) = ln(x) − ln(a);  (8.14)
ln(x^a) = a ln(x).  (8.15)

Logarithms and percentages. The link between the logarithm and percentages relies on a key fact: When Δx is small, the difference between the logarithm of x + Δx and the logarithm of x is approximately Δx/x, the percentage change in x divided by 100.
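This key fact is easy to check numerically; a quick sketch at x = 100:

```python
import math

x = 100.0
for dx in (1.0, 5.0):
    exact = math.log(x + dx) - math.log(x)   # ln(x + dx) - ln(x)
    approx = dx / x                          # the percentage change divided by 100
    print(dx, round(exact, 5), approx)
# -> 1.0 0.00995 0.01
# -> 5.0 0.04879 0.05
```

The approximation is nearly exact for a 1% change and still close for a 5% change; it deteriorates as the percentage change grows.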
That is,

ln(x + Δx) − ln(x) ≅ Δx/x  (when Δx/x is small),  (8.16)

where "≅" means "approximately equal to." The derivation of this approximation relies on calculus, but it is readily demonstrated by trying out some values of x and Δx. For example, when x = 100 and Δx = 1, then Δx/x = 1/100 = 0.01 (or 1%), while ln(x + Δx) − ln(x) = ln(101) − ln(100) = 0.00995 (or 0.995%). Thus Δx/x (which is 0.01) is very close to ln(x + Δx) − ln(x) (which is 0.00995). When Δx = 5, Δx/x = 5/100 = 0.05, while ln(x + Δx) − ln(x) = ln(105) − ln(100) = 0.04879.

The three logarithmic regression models. There are three different cases in which logarithms might be used: when X is transformed by taking its logarithm but Y is not; when Y is transformed to its logarithm but X is not; and when both Y and X are transformed to their logarithms. The interpretation of the regression coefficients is different in each case. We discuss these three cases in turn.

Case I: X is in logarithms, Y is not. In this case, the regression model is

Yi = β0 + β1 ln(Xi) + ui,  i = 1, ..., n.  (8.17)

Because Y is not in logarithms but X is, this is sometimes referred to as a linear-log model.

In the linear-log model, a 1% change in X is associated with a change in Y of 0.01β1. To see this, consider the difference between the population regression function at values of X that differ by ΔX: This is [β0 + β1 ln(X + ΔX)] − [β0 + β1 ln(X)] = β1[ln(X + ΔX) − ln(X)] ≅ β1(ΔX/X), where the final step uses the approximation in Equation (8.16). If X changes by 1%, then ΔX/X = 0.01; thus in this model a 1% change in X is associated with a change of Y of 0.01β1.

The only difference between the regression model in Equation (8.17) and the regression model of Chapter 4 with a single regressor is that the right-hand variable is now the logarithm of X rather than X itself. To estimate the coefficients β0 and β1 in Equation (8.17), first compute a new variable, ln(X), which is readily done using a spreadsheet or statistical software. Then β0 and β1 can be estimated by the OLS regression of Yi on ln(Xi), hypotheses about β1 can be tested using the t-statistic, and a 95% confidence interval for β1 can be constructed as β̂1 ± 1.96 SE(β̂1).

As an example, return to the relationship between district income and test scores. Instead of the quadratic specification, we could use the linear-log specification in Equation (8.17). Estimating this regression by OLS yields
TestScore = 557.8 + 36.42 ln(Income),  R̄² = 0.561.  (8.18)
            (3.8)   (1.40)

According to Equation (8.18), a 1% increase in income is associated with an increase in test scores of 0.01 × 36.42 = 0.36 points.

To estimate the effect on Y of a change in X in its original units of thousands of dollars (not in logarithms), we can use the method in Key Concept 8.1. For example, what is the predicted difference in test scores for districts with average incomes of $10,000 versus $11,000? The estimated value of ΔY is the difference between the predicted values: ΔY = [557.8 + 36.42 ln(11)] − [557.8 + 36.42 ln(10)] = 36.42 × [ln(11) − ln(10)] = 3.47. Similarly, the predicted difference between a district with average income of $41,000 and a district with average income of $40,000 is 36.42 × [ln(41) − ln(40)] = 0.90. Thus, like the quadratic specification, this regression predicts that a $1,000 increase in income has a larger effect on test scores in poor districts than it does in affluent districts.

The estimated linear-log regression function in Equation (8.18) is plotted in Figure 8.5. Because the regressor in Equation (8.18) is the natural logarithm of income rather than income, the estimated regression function is not a straight line. Like the quadratic regression function in Figure 8.3, it is initially steep but then flattens out for higher levels of income.

[Figure 8.5 The Linear-Log Regression Function: The estimated linear-log regression function Ŷ = β̂0 + β̂1 ln(X) captures much of the nonlinear relation between test scores and district income.]
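The Key Concept 8.1 calculation for the linear-log fit is a one-liner; the coefficients below are the estimates reported in Equation (8.18).

```python
import math

b0, b1 = 557.8, 36.42          # estimates from the linear-log regression in Eq. (8.18)

def predicted_score(income):
    """Predicted test score; income is in thousands of dollars."""
    return b0 + b1 * math.log(income)

# Predicted gain from a $1,000 increase in district income:
print(round(predicted_score(11) - predicted_score(10), 2))   # -> 3.47 (poorer district)
print(round(predicted_score(41) - predicted_score(40), 2))   # -> 0.9  (richer district)
```

The intercepts cancel in each difference, so the predicted effect of a $1,000 change depends only on β̂1 and on where in the income distribution the change occurs.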
Case II: Y is in logarithms, X is not. In this case, the regression model is

ln(Yi) = β0 + β1Xi + ui.  (8.19)

Because Y is in logarithms but X is not, this is referred to as a log-linear model.

In the log-linear model, a one-unit change in X (ΔX = 1) is associated with a 100 × β1% change in Y. To see this, compare the expected values of ln(Y) for values of X that differ by ΔX. The expected value of ln(Y) given X is ln(Y) = β0 + β1X. When X is X + ΔX, the expected value is given by ln(Y + ΔY) = β0 + β1(X + ΔX). Thus the difference between these expected values is ln(Y + ΔY) − ln(Y) = [β0 + β1(X + ΔX)] − [β0 + β1X] = β1ΔX. From the approximation in Equation (8.16), however, if β1ΔX is small, then ln(Y + ΔY) − ln(Y) ≅ ΔY/Y. Thus ΔY/Y ≅ β1ΔX. If ΔX = 1, so that X changes by one unit, then ΔY/Y changes by β1, that is, by 100 × β1%.

As an illustration, we return to the empirical example of Section 3.7, the relationship between age and earnings of college graduates. Many employment contracts specify that, for each additional year of service, a worker gets a certain percentage increase in his or her wage. This percentage relationship suggests estimating the log-linear specification in Equation (8.19) so that each additional year of age (X) is, on average in the population, associated with some constant percentage increase in earnings (Y). By first computing the new dependent variable, ln(Earnings_i), the unknown coefficients β0 and β1 can be estimated by the OLS regression of ln(Earnings_i) against Age_i. When estimated using the 12,777 observations on college graduates in the 2005 Current Population Survey (the data are described in Appendix 3.1), this relationship is

ln(Earnings) = 2.655 + 0.0086 Age,  R̄² = 0.030.  (8.20)
              (0.010)  (0.0005)

According to this regression, earnings are predicted to increase by 0.86% [100 × 0.0086 = 0.86%] for each additional year of age.

Case III: Both X and Y are in logarithms. In this case, the regression model is

ln(Yi) = β0 + β1 ln(Xi) + ui.  (8.21)

Because both Y and X are specified in logarithms, this is referred to as a log-log model.
In the log-log model, a 1% change in X is associated with a β1% change in Y. Thus in this specification β1 is the elasticity of Y with respect to X. To see this, again apply Key Concept 8.1; thus ln(Y + ΔY) − ln(Y) = [β0 + β1 ln(X + ΔX)] − [β0 + β1 ln(X)] = β1[ln(X + ΔX) − ln(X)]. Application of the approximation in Equation (8.16) to both sides of this equation yields ΔY/Y ≅ β1(ΔX/X), or

β1 = (ΔY/Y) / (ΔX/X) = [100 × (ΔY/Y)] / [100 × (ΔX/X)] = percentage change in Y / percentage change in X.  (8.22)

Thus in the log-log specification β1 is the ratio of the percentage change in Y associated with the percentage change in X. If the percentage change in X is 1% (that is, if ΔX = 0.01X), then β1 is the percentage change in Y associated with a 1% change in X. That is, β1 is the elasticity of Y with respect to X.

As an illustration, return to the relationship between income and test scores. When this relationship is specified in this form, the unknown coefficients are estimated by a regression of the logarithm of test scores against the logarithm of income. The resulting estimated equation is

ln(TestScore) = 6.336 + 0.0554 ln(Income),  R̄² = 0.557.  (8.23)
               (0.006)  (0.0021)

According to this estimated regression function, a 1% increase in income is estimated to correspond to a 0.0554% increase in test scores.

[Figure 8.6 The Log-Linear and Log-Log Regression Functions: In the log-linear regression function, ln(Y) is a linear function of X. In the log-log regression function, ln(Y) is a linear function of ln(X).]
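A simulated sketch of estimating an elasticity with a log-log regression (all data here are hypothetical; the true elasticity is set to 0.055 to echo the magnitude above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(5.0, 55.0, n)                  # hypothetical "income" regressor
true_elasticity = 0.055
log_y = 6.34 + true_elasticity * np.log(x) + rng.normal(0.0, 0.01, n)

# Log-log regression: regress ln(Y) on a constant and ln(X);
# the slope coefficient estimates the elasticity of Y with respect to X.
X = np.column_stack([np.ones(n), np.log(x)])
beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
print(round(beta[1], 3))   # close to the true elasticity of 0.055
```

Because the model is linear in ln(X) and ln(Y), ordinary OLS applied to the transformed variables recovers the constant-elasticity parameter directly.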
The estimated log-log regression function in Equation (8.23) is plotted in Figure 8.6. Because the vertical axis is in logarithms, the vertical axis in Figure 8.6 is the logarithm of the test score, and the scatterplot is the logarithm of test scores versus district income. As you can see in Figure 8.6, the log-log specification fits slightly better than the log-linear specification. This is consistent with the higher R̄² for the log-log regression (0.557) than for the log-linear regression (0.497). Even so, the log-log specification does not fit the data especially well: At the lower values of income, most of the observations fall below the log-log curve, while in the middle income range most of the observations fall above the estimated regression function.

For comparison purposes, Figure 8.6 also shows the estimated regression function for a log-linear specification, which is

ln(TestScore) = 6.439 + 0.00284 Income,  R̄² = 0.497.  (8.24)
               (0.003)  (0.00018)

Because the vertical axis is in logarithms, the regression function in Equation (8.24) is the straight line in Figure 8.6.

The three logarithmic regression models are summarized in Key Concept 8.2.

KEY CONCEPT 8.2: Logarithms in Regression: Three Cases

Logarithms can be used to transform the dependent variable Y, an independent variable X, or both (but they must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient β1. In each case, β1 can be estimated by applying OLS after taking the logarithm of the dependent and/or independent variable.

Case   Regression Specification          Interpretation of β1
I      Yi = β0 + β1 ln(Xi) + ui          A 1% change in X is associated with a change in Y of 0.01β1.
II     ln(Yi) = β0 + β1Xi + ui           A change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.
III    ln(Yi) = β0 + β1 ln(Xi) + ui      A 1% change in X is associated with a β1% change in Y, so β1 is the elasticity of Y with respect to X.
A difficulty with comparing logarithmic specifications. Which of the log regression models best fits the data? As we saw in the discussion of Equations (8.23) and (8.24), the R̄² can be used to compare the log-linear and log-log models; as it happened, the log-log model had the higher R̄². Similarly, the R̄² can be used to compare the linear-log regression in Equation (8.18) and the linear regression of Y against X. In the test score and income regression, the linear-log regression has an R̄² of 0.561 while the linear regression has an R̄² of 0.508, so the linear-log model fits the data better. How can we compare the linear-log model and the log-log model? Unfortunately, the R̄² cannot be used to compare these two regressions because their dependent variables are different [one is Y, the other is ln(Y)]. Recall that the R̄² measures the fraction of the variance of the dependent variable explained by the regressors. Because the dependent variables in the log-log and linear-log models are different, it does not make sense to compare their R̄²'s.

Because of this problem, the best thing to do in a particular application is to decide, using economic theory and either your or other experts' knowledge of the problem, whether it makes sense to specify Y in logarithms. For example, labor economists typically model earnings using logarithms because wage comparisons, contract wage increases, and so forth are often most naturally discussed in percentage terms. In modeling test scores, it seems (to us, anyway) natural to discuss test results in terms of points on the test rather than percentage increases in the test scores, so we focus on models in which the dependent variable is the test score rather than its logarithm.

Computing predicted values of Y when Y is in logarithms.³ If the dependent variable Y has been transformed by taking logarithms, the estimated regression can be used to compute directly the predicted value of ln(Y). However, it is a bit trickier to compute the predicted value of Y itself. To see this, consider the log-linear regression model in Equation (8.19) and rewrite it so that it is specified in terms of Y rather than ln(Y). To do so, take the exponential function of both sides of Equation (8.19); the result is

Yi = exp(β0 + β1Xi + ui) = e^(β0 + β1Xi) e^(ui).

If ui is distributed independently of Xi, then the expected value of Yi given Xi is E(Yi|Xi) = e^(β0 + β1Xi) E(e^(ui)). The problem is that, even if E(ui) = 0, E(e^(ui)) ≠ 1. Thus the appropriate predicted value of Yi is not simply obtained by taking the exponential function of β̂0 + β̂1Xi, that is, by setting Ŷi = e^(β̂0 + β̂1Xi): This predicted value is biased because of the missing factor E(e^(ui)).

³This material is more advanced and can be skipped without loss of continuity.
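The bias from simply exponentiating the fitted log-linear values can be seen in a small simulation. Multiplying by the sample mean of exp(residuals), one simple estimator of E(e^u) (sometimes called the smearing estimator; this particular estimator is an illustration, not a method prescribed by the text), largely removes the bias.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 0.5, n)
y = np.exp(1.0 + 0.2 * x + u)            # log-linear model: ln(Y) = 1 + 0.2 X + u

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta

naive = np.exp(X @ beta)                 # omits the factor E(e^u), so it is biased down
smear = np.exp(resid).mean()             # estimates E(e^u); here about exp(0.5**2 / 2) = 1.13
corrected = naive * smear

print(round(naive.mean() / y.mean(), 2), round(corrected.mean() / y.mean(), 2))
```

The first ratio falls noticeably short of 1 (the naive predictions underestimate Y on average), while the corrected ratio is close to 1.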
One solution to this problem is to estimate the factor E(e^(ui)) and to use this estimate when computing the predicted value of Y. This gets complicated, however, and we do not pursue it further. Another solution, which is the approach used in this book, is to compute predicted values of the logarithm of Y but not to transform them to their original units. In practice, this is often acceptable because when the dependent variable is specified as a logarithm, it is often most natural just to use the logarithmic specification (and the associated percentage interpretations) throughout the analysis.

Polynomial and Logarithmic Models of Test Scores and District Income

In practice, economic theory or expert judgment might suggest a functional form to use, but in the end the true form of the population regression function is unknown. In practice, fitting a nonlinear function therefore entails deciding which method or combination of methods works best. As an illustration, we compare logarithmic and polynomial models of the relationship between district income and test scores.

Polynomial specifications. We considered two polynomial specifications, specified using powers of Income: quadratic [Equation (8.2)] and cubic [Equation (8.11)]. Because the coefficient on Income³ in Equation (8.11) was significant at the 5% level, the cubic specification provided an improvement over the quadratic, so we select the cubic model as the preferred polynomial specification.

Logarithmic specifications. The logarithmic specification in Equation (8.18) seemed to provide a good fit to these data, but we did not test this formally. One way to do so is to augment it with higher powers of the logarithm of income. If these additional terms are not statistically different from zero, then we can conclude that the specification in Equation (8.18) is adequate in the sense that it cannot be rejected against a polynomial function of the logarithm. Accordingly, the estimated cubic regression (specified in powers of the logarithm of income) is

TestScore = 486.1 + 113.4 ln(Income) − 26.9 [ln(Income)]² + 3.06 [ln(Income)]³,  R̄² = 0.560.  (8.26)
            (79.4)  (87.9)             (31.7)               (3.74)
The t-statistic on the coefficient on the cubic term is 0.818, so the null hypothesis that the true coefficient is zero is not rejected at the 10% level. The F-statistic testing the joint hypothesis that the true coefficients on the quadratic and cubic terms are both zero is 0.44, with a p-value of 0.64, so this joint null hypothesis is not rejected at the 10% level. Thus the cubic logarithmic model in Equation (8.26) does not provide a statistically significant improvement over the model in Equation (8.18), which is linear in the logarithm of income.

Comparing the cubic and linear-log specifications. Figure 8.7 plots the estimated regression functions from the cubic specification in Equation (8.11) and the linear-log specification in Equation (8.18). The two estimated regression functions are quite similar. One statistical tool for comparing these specifications is the R̄². The R̄² of the logarithmic regression is 0.561, and for the cubic regression it is 0.555. Because the logarithmic specification has a slight edge in terms of the R̄², and because this specification does not need higher-order polynomials in the logarithm of income to fit these data, we adopt the logarithmic specification in Equation (8.18).

[Figure 8.7 The Linear-Log and Cubic Regression Functions: The estimated cubic regression function [Equation (8.11)] and the estimated linear-log regression function [Equation (8.18)] are nearly identical in this sample.]
8.3 Interactions Between Independent Variables

In the introduction to this chapter we wondered whether reducing the student-teacher ratio might have a bigger effect on test scores in districts where many students are still learning English than in those with few still learning English. This could arise, for example, if students who are still learning English benefit differentially from one-on-one or small-group instruction. If so, the presence of many English learners in a district would interact with the student-teacher ratio in such a way that the effect on test scores of a change in the student-teacher ratio would depend on the fraction of English learners.

This section explains how to incorporate such interactions between two independent variables into the multiple regression model. The possible interaction between the student-teacher ratio and the fraction of English learners is an example of the more general situation in which the effect on Y of a change in one independent variable depends on the value of another independent variable. We consider three cases: when both independent variables are binary, when one is binary and the other is continuous, and when both are continuous.

Interactions Between Two Binary Variables

Consider the population regression of log earnings [Yi, where Yi = ln(Earnings_i)] against two binary variables: the individual's gender (D1i, which = 1 if the ith person is female) and whether he or she has a college degree (D2i, where D2i = 1 if the ith person graduated from college). The population linear regression of Yi on these two binary variables is

Yi = β0 + β1D1i + β2D2i + ui.  (8.27)

In this regression model, β1 is the effect on log earnings of being female, holding schooling constant, and β2 is the effect of having a college degree, holding gender constant.

The specification in Equation (8.27) has an important limitation: The effect of having a college degree in this specification, holding constant gender, is the same for men and women. There is, however, no reason that this must be so. Phrased mathematically, the effect on Yi of D2i, holding D1i constant, could depend on the value of D1i. In other words, there could be an interaction between gender and having a college degree so that the value in the job market of a degree is different for men and women.
Although the specification in Equation (8.27) does not allow for this interaction between gender and acquiring a college degree, it is easy to modify the specification so that it does by introducing another regressor, the product of the two binary variables, D1i × D2i. The resulting regression is

Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui.  (8.28)

The new regressor, the product D1i × D2i, is called an interaction term or an interacted regressor, and the regression model in Equation (8.28) is called a binary variable interaction regression model.

The interaction term in Equation (8.28) allows the population effect on log earnings (Yi) of having a college degree (changing D2i from D2i = 0 to D2i = 1) to depend on gender (D1i). To show this mathematically, calculate the population effect of a change in D2i using the general method laid out in Key Concept 8.1. The first step is to compute the conditional expectation of Yi for D2i = 0, given a value of D1i; this is E(Yi|D1i = d1, D2i = 0) = β0 + β1 × d1 + β2 × 0 + β3 × (d1 × 0) = β0 + β1d1. The next step is to compute the conditional expectation of Yi after the change, that is, for D2i = 1, given the same value of D1i; this is E(Yi|D1i = d1, D2i = 1) = β0 + β1 × d1 + β2 × 1 + β3 × (d1 × 1) = β0 + β1d1 + β2 + β3d1. The effect of this change is the difference of expected values [that is, the difference in Equation (8.4)], which is

E(Yi|D1i = d1, D2i = 1) − E(Yi|D1i = d1, D2i = 0) = β2 + β3d1.  (8.29)

Thus, in the binary variable interaction specification in Equation (8.28), the effect of acquiring a college degree (a unit change in D2i) depends on the person's gender [the value of D1i, which is d1 in Equation (8.29)]. If the person is male (d1 = 0), the effect of acquiring a college degree is β2, but if the person is female (d1 = 1), the effect is β2 + β3. The coefficient β3 on the interaction term is the difference in the effect of acquiring a college degree for women versus men.

Although this example was phrased using log earnings, gender, and acquiring a college degree, the point is a general one. The binary variable interaction regression allows the effect of changing one of the binary independent variables to depend on the value of the other binary variable.

The method we used here to interpret the coefficients was, in effect, to work through each possible combination of the binary variables. This method, which applies to all regressions with binary variables, is summarized in Key Concept 8.3.
30) 7<2 = 0.2HiEL .9 . Each coefficient can then be expressed c ither as an expectcd value or as the difference between two or more expected \'alues. "'hie Key Concept 8} to one with a high slUde nt.tell<. and let HiEL I be a binary variable that equals I if lhe percentage of E nglish learners is 10% o r more and e q uals 0 other wise. with estimated coef ficie nts replacing th e population coefficients.18.o ~Jdl' (8. and HiEL.9 ("" 664.ios (HiS TR. The difference io (1.1.1 .ng from a district with a low srude nHeacher rati.2).9 points.1.30)."ti ll s method. According to the estima les in Equa li o n (8.28).9) (3. interac th~ ~pec If the two fOR INTERPRETING COEffiCIENTS IN REGRESSIONS WITH BINARY VARIABLES ~ A METHOD KEY CONa/'T II (8. this effect thus is . if the fra ction of English learners is low (HiEL = 0) .and acquiring e ractjon regre~' ent variables 10 n effect.1) (8.2 .9 + 3. The inte racled regression of test sco re~ aga inst HiSTR.290.2 ( = 664 .1 .3 Inleroction~ Between Independent Voriobles 279 r.8. Accordingly. the sample aver age lest score for districts with low studen t. 8. Next compare these expected values. This is done using the procedure in Key Concept R.1.The rive n a value f(dl x O)~ r'j after the TesrScore ~ = 664.1.:ht:r rat. then the effect Oll lest scores of moving from HiSTR = 0 to HiSTR = I is fo r lest scores to decline by 1.28) )f First compute the expecled values of Y for each possible e<lse described by the set "fbinary variables.5 ) . = 0) and low fr actions of Engli sh learn ers (HiEL. the sample lIveragc is 640. Let HiSTR . i33d\.3.5(HiSTR x HiEL) . is givc n by Equa tion (8.9HiSTR .3.18.29).29) )Il ion (8.5 ::< 5. The estimated regressio n in Equatio n (8.30) also can be used 10 estimate the me8n test scores for each of the four possible combinations of the binary variables. to \\. If the fraction of Eng lish lea rners is high.1. Whe n HiSTR . 
The estimated regression in Equation (8.30) also can be used to estimate the mean test scores for each of the four possible combinations of the binary variables, using the procedure in Key Concept 8.3. Accordingly, the sample average test score for districts with low student-teacher ratios (HiSTRi = 0) and low fractions of English learners (HiELi = 0) is 664.1. For districts with HiSTRi = 1 (high student-teacher ratios) and HiELi = 0 (low fractions of English learners), the sample average is 662.2 (= 664.1 − 1.9). When HiSTRi = 0 and HiELi = 1, the sample average is 662.6 (= 664.1 − 1.5), and when HiSTRi = 1 and HiELi = 1, the sample average is 657.2 (= 664.1 − 1.9 − 1.5 − 3.5).
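The cell-mean reading of Equation (8.30) is a general property of saturated binary regressions: the OLS coefficients on two binary variables and their product exactly reproduce the four within-cell sample means. A minimal sketch on simulated data (the group means, noise, and sample size below are invented for illustration; this is not the California district data):

```python
import numpy as np

# Simulated data: outcome y in four cells defined by binary d1 and d2
# (all numbers here are hypothetical).
rng = np.random.default_rng(0)
d1 = rng.integers(0, 2, size=400)
d2 = rng.integers(0, 2, size=400)
y = 660.0 - 2.0 * d1 - 1.5 * d2 - 3.0 * d1 * d2 + rng.normal(0.0, 5.0, size=400)

# OLS of y on a constant, d1, d2, and the interaction d1*d2.
X = np.column_stack([np.ones_like(y), d1, d2, d1 * d2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Key Concept 8.3: each fitted cell mean is a sum of coefficients ...
fitted_means = {
    (0, 0): b0,
    (1, 0): b0 + b1,
    (0, 1): b0 + b2,
    (1, 1): b0 + b1 + b2 + b3,
}
# ... and, because the regression is saturated in the binary variables,
# each one equals the sample average of y within that cell.
sample_means = {
    cell: y[(d1 == cell[0]) & (d2 == cell[1])].mean() for cell in fitted_means
}
```

Because predicted values and within-cell averages coincide here, an estimated equation like (8.30) can be read directly as a two-by-two table of group means.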
280 CHAPTER 8 Nonlinear Regression Functions

FIGURE 8.8 Regression Functions Using Binary and Continuous Variables
[Interactions of binary and continuous variables can produce three different population regression functions: (a) β0 + β1X + β2D allows for different intercepts but has the same slope; (b) β0 + β1X + β2D + β3(X × D) allows for different intercepts and different slopes; and (c) β0 + β1X + β2(X × D) has the same intercept but allows for different slopes.]

Interactions Between a Continuous and a Binary Variable

Next consider the population regression of log earnings [Yi = ln(Earningsi)] against one continuous variable, the individual's years of work experience (Xi), and one binary variable, whether the worker has a college degree (Di, where Di = 1 if the ith person is a college graduate). As shown in Figure 8.8, the population regression line relating Y and the continuous variable X can depend on the binary variable D in three different ways. In Figure 8.8a, the two regression lines differ only in their intercept. The corresponding population regression model is
Yi = β0 + β1Xi + β2Di + ui. (8.31)

This is the familiar multiple regression model with a population regression function that is linear in Xi and Di. When Di = 0, the population regression function is β0 + β1Xi, so the intercept is β0 and the slope is β1. When Di = 1, the population regression function is β0 + β1Xi + β2, so the slope remains β1 but the intercept is β0 + β2. Thus β2 is the difference between the intercepts of the two regression lines, which have the same slope, as shown in Figure 8.8a. Stated in terms of the earnings example, β1 is the effect on log earnings of an additional year of work experience, holding college degree status constant, and β2 is the effect of a college degree on log earnings, holding years of experience constant. In this specification, the effect of an additional year of work experience is the same for college graduates and nongraduates.

To allow for different slopes, add an interaction term to Equation (8.31):

Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui, (8.32)

where Xi × Di is a new variable, the product of Xi and Di. This specification allows for two different population regression functions relating Yi and Xi, depending on the value of Di, as shown in Figure 8.8b. To interpret the coefficients of this regression, apply the procedure in Key Concept 8.3. Doing so shows that, when Di = 0, the population regression function is β0 + β1Xi, and when Di = 1, it is (β0 + β2) + (β1 + β3)Xi. Thus the two lines have different slopes and intercepts: The difference between the two intercepts is β2, and the difference between the two slopes is β3. In the earnings example, β1 is the effect of an additional year of work experience for nongraduates (Di = 0) and β1 + β3 is this effect for graduates, so β3 is the difference in the effect of an additional year of work experience for college graduates versus nongraduates. The different slopes permit the effect of an additional year of work to differ for college graduates and nongraduates.

A third possibility, shown in Figure 8.8c, is that the two lines have different slopes but the same intercept. The interacted regression model for this case is

Yi = β0 + β1Xi + β2(Xi × Di) + ui. (8.33)

The coefficients of this specification also can be interpreted using Key Concept 8.3. In terms of the earnings example, this specification allows for different effects of experience on log earnings between college graduates and nongraduates, but requires that expected log earnings be the same for both groups when they have no prior experience; that is, it corresponds to the population mean entry-level wage being the same for college graduates and nongraduates. This does not make much sense in this application, and in practice this specification is used less frequently than Equation (8.32), which allows for different intercepts and slopes.
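One way to see what Equation (8.32) does is to compare the pooled interacted regression with fitting the two groups separately: when both the intercept and the slope are interacted, the pooled OLS fit reproduces the two group-specific lines exactly. A sketch on simulated earnings-style data (all coefficient values are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 30, n)      # years of experience (hypothetical)
d = rng.integers(0, 2, n)      # 1 = college degree
# True model: different intercepts and slopes for the two groups.
y = 2.0 + 0.03 * x + 0.5 * d + 0.02 * x * d + rng.normal(0, 0.1, n)

# One interacted regression: y on (1, x, d, x*d), as in Equation (8.32).
X = np.column_stack([np.ones(n), x, d, x * d])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Implied lines for the two groups.
line0 = (b0, b1)             # intercept, slope for d = 0 (nongraduates)
line1 = (b0 + b2, b1 + b3)   # intercept, slope for d = 1 (graduates)

# Fitting each group separately gives identical lines.
X0 = np.column_stack([np.ones((d == 0).sum()), x[d == 0]])
a0 = np.linalg.lstsq(X0, y[d == 0], rcond=None)[0]
X1 = np.column_stack([np.ones((d == 1).sum()), x[d == 1]])
a1 = np.linalg.lstsq(X1, y[d == 1], rcond=None)[0]
```

The advantage of the single interacted regression over two separate fits is that it nests the restricted specifications (8.31) and (8.33), so hypotheses about equal slopes or equal intercepts can be tested directly.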
KEY CONCEPT 8.4
INTERACTIONS BETWEEN BINARY AND CONTINUOUS VARIABLES

Through the use of the interaction term Xi × Di, the population regression line relating Yi and the continuous variable Xi can have a slope that depends on the binary variable Di. There are three possibilities:

1. Different intercept, same slope (Figure 8.8a): Yi = β0 + β1Xi + β2Di + ui;
2. Different intercept and slope (Figure 8.8b): Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui;
3. Same intercept, different slope (Figure 8.8c): Yi = β0 + β1Xi + β2(Xi × Di) + ui.

All three specifications are versions of the multiple regression model of Chapter 6 and, once the new variable Xi × Di is created, the coefficients of all three can be estimated by OLS. The three regression models with a binary and a continuous independent variable are summarized in Key Concept 8.4.

Application to the student-teacher ratio and the percentage of English learners. Does the effect on test scores of cutting the student-teacher ratio depend on whether the percentage of students still learning English is high or low? One way to answer this question is to use a specification that allows for two different regression lines, depending on whether there is a high or a low percentage of English learners. This is achieved using the different intercept/different slope specification:

TestScore^ = 682.2 − 0.97 STR + 5.6 HiEL − 1.28 (STR × HiEL), R̄² = 0.305, (8.34)
                  (11.9)  (0.59)          (19.5)           (0.97)
where the binary variable HiELi equals 1 if the percentage of students still learning English in the district is greater than 10% and equals 0 otherwise.

For districts with a low fraction of English learners (HiELi = 0), the estimated regression line is 682.2 − 0.97 STR. For districts with a high fraction of English learners (HiELi = 1), the estimated regression line is 682.2 + 5.6 − (0.97 + 1.28) STR = 687.8 − 2.25 STR. According to these estimates, reducing the student-teacher ratio by 1 is predicted to increase test scores by 0.97 point in districts with low fractions of English learners but by 2.25 points in districts with high fractions of English learners. The difference between these two effects, 1.28 points, is the coefficient on the interaction term in Equation (8.34).

The interacted regression in Equation (8.34) can be used to test several hypotheses about the population regression line. First, the hypothesis that the two lines are in fact the same can be tested by computing the F-statistic testing the joint hypothesis that the coefficient on HiEL and the coefficient on the interaction term STR × HiEL are both zero. This F-statistic is 89.9, which is significant at the 1% level.

Second, the hypothesis that the two lines have the same slope can be tested by testing whether the coefficient on the interaction term is zero. The t-statistic, t = −1.28/0.97 = −1.32, is less than 1.645 in absolute value, so the null hypothesis that the two lines have the same slope cannot be rejected using a two-sided test at the 10% significance level.

Third, the hypothesis that the lines have the same intercept can be tested by testing whether the population coefficient on HiEL is zero. The t-statistic, t = 5.6/19.5 = 0.29, is less than 1.96 in absolute value, so the hypothesis that the lines have the same intercept cannot be rejected at the 5% level.

These three tests produce seemingly contradictory results: The joint test using the F-statistic rejects the joint hypothesis that the slope and the intercept are the same, but the tests of the individual hypotheses using the t-statistics fail to reject. The reason is that the regressors HiEL and STR × HiEL are highly correlated, which results in large standard errors on the individual coefficients. Even though it is impossible to tell which of the coefficients is nonzero, there is strong evidence against the hypothesis that both are zero.

Finally, the hypothesis that the student-teacher ratio does not enter this specification can be tested by computing the F-statistic for the joint hypothesis that the coefficients on STR and on the interaction term are both zero. This F-statistic is 5.64, which has a p-value of 0.004. Thus the coefficients on the student-teacher ratio are statistically significant at the 1% significance level.
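The joint test that the two lines coincide can be sketched by comparing restricted and unrestricted sums of squared residuals. The code below computes the homoskedasticity-only F-statistic on simulated data; note that this is a simplification, since the statistics reported above (such as F = 89.9) are computed from the California data, typically with heteroskedasticity-robust variance estimators.

```python
import numpy as np

# Simulated district-style data (all numbers invented).
rng = np.random.default_rng(2)
n = 400
stratio = rng.uniform(14.0, 26.0, n)               # student-teacher ratio
hiel = (rng.uniform(0.0, 1.0, n) < 0.4).astype(float)
y = 680.0 - 1.0 * stratio + 5.0 * hiel - 1.2 * stratio * hiel \
    + rng.normal(0.0, 9.0, n)

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

# Unrestricted model: intercept, STR, HiEL, STR x HiEL.
X_u = np.column_stack([np.ones(n), stratio, hiel, stratio * hiel])
# Restricted model: drop the q = 2 coefficients under test
# (HiEL and the interaction), so the two lines are forced to coincide.
X_r = np.column_stack([np.ones(n), stratio])
q = 2
k = X_u.shape[1] - 1
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / (n - k - 1))
```

With the intercept and slope shifts built into the simulated data, the joint F-statistic is large even when the individual t-statistics are weak, which mirrors the seemingly contradictory results above when the two regressors are highly correlated.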
The Return to Education and the Gender Gap

In addition to its intellectual pleasures, education has economic rewards. As the boxes in Chapters 3 and 5 show, workers with more education tend to earn more than their counterparts with less education. The analysis in those boxes was incomplete, however, for at least three reasons. First, it failed to control for other determinants of earnings that might be correlated with educational achievement, so the OLS estimator of the coefficient on education could have omitted variable bias. Second, the functional form used in Chapter 5, a simple linear relation, implies that earnings change by a constant dollar amount for each additional year of education, whereas one might suspect that the dollar change in earnings is actually larger at higher levels of education. Third, the box in Chapter 5 ignores the gender differences in earnings highlighted in the box in Chapter 3.

TABLE 8.1 The Return to Education and the Gender Gap: Regression Results for the United States in 2004
[Dependent variable: logarithm of hourly earnings. Four OLS regressions, (1) through (4); regressors: Years of education, Female, Female × Years of education, Potential experience, Potential experience², Midwest, South, West, and an intercept, with R̄² reported for each regression and standard errors given in parentheses below the estimated coefficients.]
Notes: The data are from the March 2005 Current Population Survey. Female is an indicator variable that equals 1 for women and 0 for men. Midwest, South, and West are indicator variables denoting the region of the United States in which the worker lives; the omitted region is Northeast. Individual coefficients are statistically significant at the *5% or **1% significance level.
All these limitations can be addressed by a multiple regression analysis that includes those determinants of earnings which, if omitted, could cause omitted variable bias, and that uses a nonlinear functional form relating education and earnings. Table 8.1 summarizes regressions estimated using data on full-time workers, ages 30 through 64, from the Current Population Survey (the CPS data are described in Appendix 3.1). The dependent variable is the logarithm of hourly earnings, so another year of education is associated with a constant percentage increase (not dollar increase) in earnings.

Table 8.1 has four salient results. First, the omission of gender in regression (1) does not result in omitted variable bias: Even though gender enters regression (2) significantly and with a large coefficient, gender and years of education are uncorrelated; that is, on average men and women have nearly the same levels of education. Second, the returns to education are economically and statistically significantly different for men and women: In regression (3), the t-statistic testing the hypothesis that they are the same is 11.25 (= 0.0180/0.0016). Third, regression (4) controls for the region of the country in which the individual lives, thereby addressing potential omitted variable bias that might arise if years of education differ systematically by region. Controlling for region makes a small difference to the estimated coefficients on the education terms, relative to those reported in regression (3). Fourth, regression (4) controls for the potential experience of the worker, as measured by years since completion of schooling. The estimated coefficients imply a declining marginal value for each year of potential experience.

Because the regression functions for men and women have different slopes, the gender gap, measured in percentage terms, depends on the years of education. For 12 years of education, the gender gap is estimated to be 27.3%; for 16 years of education, the gender gap is less in percentage terms.

The estimates in Table 8.1 still have limitations, including the possibility of other omitted variables, notably the native ability of the worker, and potential problems associated with the way the variables are measured in the CPS. Nevertheless, the estimates in Table 8.1 are consistent with those obtained by economists who carefully address these limitations. A survey by the econometrician David Card (1999) of dozens of empirical studies concludes that labor economists' best estimates of the return to education generally fall between 8% and 11%, and that the return depends on the quality of the education. If you are interested in learning more about the economic return to education, see Card (1999).
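Because the female dummy and the female-by-education interaction enter with opposite signs, the implied gap shrinks with education. A minimal sketch of that calculation, using illustrative coefficient values (treat them as hypothetical; the exact entries are in Table 8.1):

```python
# Illustrative coefficients from a log-earnings regression with a female
# dummy and a female-by-education interaction (hypothetical values).
b_female = -0.521
b_female_x_educ = 0.0207

def gender_gap(years_of_education):
    """Gender gap implied by the log specification, in log points
    (approximately the percentage gap when the gap is not too large)."""
    return -(b_female + b_female_x_educ * years_of_education)

gap_12 = gender_gap(12)   # 0.521 - 0.0207 * 12 = 0.2726, about 27.3%
gap_16 = gender_gap(16)   # 0.521 - 0.0207 * 16 = 0.1898, about 19.0%
```

This is the same interaction logic as Equation (8.32): the dummy shifts the intercept of the log-earnings function, and the interaction shifts its slope, so the gap between the two functions varies with the regressor.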
Interactions Between Two Continuous Variables

Now suppose that both independent variables (X1i and X2i) are continuous. An example is when Yi is log earnings of the ith worker, X1i is his or her years of work experience, and X2i is the number of years he or she went to school. If the population regression function is linear, the effect on wages of an additional year of experience does not depend on the number of years of education or, equivalently, the effect of an additional year of education does not depend on the number of years of work experience. In reality, however, there might be an interaction between these two variables, so that the effect on wages of an additional year of experience depends on the number of years of education. This interaction can be modeled by augmenting the linear regression model with an interaction term that is the product of X1i and X2i:

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui. (8.35)

The interaction term allows the effect of a unit change in X1 to depend on X2. To see this, apply the general method for computing effects in nonlinear regression models in Key Concept 8.1. The difference in Equation (8.4), computed for the interacted regression function in Equation (8.35), is ΔY = (β1 + β3X2)ΔX1 [Exercise 8.10(a)]. Thus the effect on Y of a change in X1, holding X2 constant, is

ΔY/ΔX1 = β1 + β3X2, (8.36)

which depends on X2. For example, in the earnings example, if β3 is positive, then the effect on log earnings of an additional year of experience is greater, by the amount β3, for each additional year of education the worker has.

A similar calculation shows that the effect on Y of a change ΔX2 in X2, holding X1 constant, is ΔY/ΔX2 = β2 + β3X1. Putting these two effects together shows that the coefficient β3 is the effect of a unit increase in X1 and X2, above and beyond the sum of the effects of a unit increase in X1 alone and a unit increase in X2 alone. That is, if X1 changes by ΔX1 and X2 changes by ΔX2, then the expected change in Y is ΔY = (β1 + β3X2)ΔX1 + (β2 + β3X1)ΔX2 + β3ΔX1ΔX2 [Exercise 8.10(c)]. The first term is the effect from changing X1 holding X2 constant; the second term is the effect from changing X2 holding X1 constant; and the final term, β3ΔX1ΔX2, is the extra effect from changing both X1 and X2.

Interactions between two variables are summarized as Key Concept 8.5. When interactions are combined with logarithmic transformations, they can be used to estimate price elasticities when the price elasticity depends on the characteristics of the good (see the box "The Demand for Economic Journals" for an example).
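The three-term decomposition of ΔY is a purely algebraic identity, so it can be verified for any coefficient values. A quick numeric check with arbitrary (invented) coefficients:

```python
# Arbitrary coefficients for Y = b0 + b1*X1 + b2*X2 + b3*X1*X2.
b0, b1, b2, b3 = 1.0, 0.4, 0.25, 0.07

def f(x1, x2):
    """Population regression function with an X1*X2 interaction."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1, x2 = 10.0, 12.0
dx1, dx2 = 1.0, 2.0

# Exact change in the regression function ...
dy_exact = f(x1 + dx1, x2 + dx2) - f(x1, x2)
# ... equals the three-term decomposition in the text:
dy_terms = (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2
# Holding X2 fixed, the per-unit effect of X1 is (b1 + b3*x2).
effect_x1 = (f(x1 + dx1, x2) - f(x1, x2)) / dx1
```

The same check shows why the interaction coefficient b3 is the "extra" effect of moving both variables: it multiplies the cross term dx1*dx2 that neither one-variable change produces on its own.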
KEY CONCEPT 8.5
INTERACTIONS IN MULTIPLE REGRESSION

The interaction term between the two independent variables X1 and X2 is their product X1 × X2. Including this interaction term allows the effect on Y of a change in X1 to depend on the value of X2 and, conversely, allows the effect of a change in X2 to depend on the value of X1. The coefficient on X1 × X2 is the effect of a unit increase in X1 and X2, above and beyond the sum of the individual effects of a unit increase in X1 alone and a unit increase in X2 alone. This is true whether X1 and/or X2 are continuous or binary.

Application to the student-teacher ratio and the percentage of English learners. The previous examples considered interactions between the student-teacher ratio and a binary variable indicating whether the percentage of English learners is large or small. A different way to study this interaction is to examine the interaction between the student-teacher ratio and the continuous variable, the percentage of English learners (PctEL). The estimated interaction regression is

TestScore^ = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL), R̄² = 0.422. (8.37)
                  (11.8)  (0.59)          (0.37)             (0.019)

When the percentage of English learners is at the median (PctEL = 8.85), the slope of the line relating test scores and the student-teacher ratio is estimated to be −1.11 (= −1.12 + 0.0012 × 8.85). When the percentage of English learners is at the 75th percentile (PctEL = 23.0), this line is estimated to be flatter, with a slope of −1.09 (= −1.12 + 0.0012 × 23.0). That is, for a district with 8.85% English learners, the estimated effect of a unit reduction in the student-teacher ratio is to increase test scores by 1.11 points, but for a district with 23.0% English learners, reducing the student-teacher ratio by one unit is predicted to increase test scores by only 1.09 points. The difference between these estimated effects is not statistically significant, however: The t-statistic testing whether the coefficient on the interaction term is zero is t = 0.0012/0.019 = 0.06, which is not significant at the 10% level.
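The two slopes just computed are Equation (8.36) applied to the point estimates in Equation (8.37): the effect of STR on test scores is β1 + β3 × PctEL. Reproducing the arithmetic:

```python
# Point estimates from the interacted regression in Equation (8.37).
b_str = -1.12            # coefficient on STR
b_str_x_pctel = 0.0012   # coefficient on STR x PctEL

def slope_wrt_str(pctel):
    """Estimated effect on test scores of a one-unit increase in STR
    in a district with the given percentage of English learners."""
    return b_str + b_str_x_pctel * pctel

slope_median = slope_wrt_str(8.85)   # PctEL at its median
slope_p75 = slope_wrt_str(23.0)      # PctEL at its 75th percentile
```

Rounded to two decimals these are −1.11 and −1.09, matching the slopes reported above; the tiny difference between them is exactly what the insignificant t-statistic on the interaction term reflects.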
The Demand for Economic Journals

Professional economists follow the most recent research in their areas of specialization. Most research in economics first appears in economics journals, so economists, or their libraries, subscribe to economics journals. How elastic is the demand by libraries for economics journals? To find out, we analyzed the relationship between the number of subscriptions to a journal at U.S. libraries (Yi) and its library subscription price using data for the year 2000 for 180 economics journals. Because the product of a journal is not the paper on which it is printed but rather the ideas it contains, its price is logically measured not in dollars per year or dollars per page but instead in dollars per idea. Although we cannot measure "ideas" directly, a good indirect measure is the number of times that articles in a journal are subsequently cited by other researchers.

FIGURE 8.9 Library Subscriptions and Prices of Economics Journals
[(a) Subscriptions and price per citation: there is a nonlinear inverse relation between the number of U.S. library subscriptions (quantity) and the library price per citation (price), shown for 180 economics journals in 2000. (b) ln(Subscriptions) and ln(Price per citation): the relation between log quantity and log price appears to be approximately linear. (c) Demand is more elastic for young journals (Age = 5) than for old journals (Age = 80).]
Accordingly, we measure price as the "price per citation" in the journal. The price per citation varies enormously across journals. Some journals are expensive per citation because they have few citations, others because their library subscription price per year is very high: In 2006, a library subscription to the Journal of Econometrics cost more than $2,700, many times the price of a library subscription to the American Economic Review. In contrast, some of the oldest and most prestigious journals are the cheapest per citation.

Because we are interested in estimating elasticities, we use a log-log specification (Key Concept 8.2). The scatterplots in Figures 8.9a and 8.9b provide empirical support for this transformation: Although the relation between subscriptions and price per citation is nonlinear, the relation between their logarithms is approximately linear.

TABLE 8.2 Estimates of the Demand for Economic Journals
[Dependent variable: logarithm of subscriptions at U.S. libraries in the year 2000; 180 observations. Four OLS regressions, (1) through (4); regressors: ln(Price per citation), [ln(Price per citation)]², [ln(Price per citation)]³, ln(Age), ln(Age) × ln(Price per citation), ln(Characters ÷ 1,000,000), and an intercept; F-statistics testing the coefficients on the quadratic and cubic terms, and R̄², are reported below the coefficients.]
Notes: The F-statistic tests the hypothesis that the coefficients on [ln(Price per citation)]² and [ln(Price per citation)]³ are both zero. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.

The regression results are summarized in Table 8.2. They yield the following conclusions (see if you can find the basis for these conclusions in the table!):

1. Demand is less elastic for older than for newer journals.
2. The evidence supports a linear, rather than a cubic, function of log price.
3. Demand is greater for journals with more characters, holding price and age constant.

So what is the elasticity of demand for economics journals? It depends on the age of the journal. Demand curves for an 80-year-old journal and a 5-year-old upstart are superimposed on the scatterplot in Figure 8.9c; the older journal's demand elasticity is −0.28 (SE = 0.06), while the younger journal's is −0.67 (SE = 0.08).

This demand is very inelastic: Demand is very insensitive to price, especially for older journals. For libraries, it seems, economics journals are a necessity, not a luxury. By way of comparison, experts estimate the demand elasticity for cigarettes to be in the range of −0.3 to −0.5. Economics journals are, it seems, as addictive as cigarettes, but a lot better for your health!

[Footnotes: These data were graciously provided by Professor Theodore Bergstrom of the Department of Economics at the University of California, Santa Barbara. If you are interested in learning more about the economics of economics journals, see Bergstrom (2001).]

8.4 Nonlinear Effects on Test Scores of the Student-Teacher Ratio

This section addresses three specific questions about test scores and the student-teacher ratio. First, after controlling for differences in economic characteristics of different districts, does the effect on test scores of reducing the student-teacher ratio depend on the fraction of English learners? Second, does this effect depend on the value of the student-teacher ratio? Third, and most important, after taking economic factors and nonlinearities into account, what is the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher, as our superintendent from Chapter 4 proposes?

To keep the discussion focused on nonlinear models, the specifications in Sections 8.1 through 8.3 excluded additional control variables such as the students' economic background. Consequently, those results arguably are subject to omitted variable bias. To draw substantive conclusions about the effect on test scores of reducing the student-teacher ratio, the nonlinear specifications must be augmented with control variables, and it is to such an exercise that we now turn.
We answer these questions by considering nonlinear regression specifications of the type discussed in Sections 8.2 and 8.3, extended to include two measures of the economic background of the students: the percentage of students eligible for a subsidized lunch and the logarithm of average district income. The logarithm of income is used because the empirical analysis of Section 8.2 suggests that this specification captures the nonlinear relationship between test scores and income. As in Section 7.6, we do not include expenditures per pupil as a regressor; in so doing, we are considering the effect of decreasing the student-teacher ratio while allowing expenditures per pupil to increase (that is, we are not holding expenditures per pupil constant).

Discussion of Regression Results

The OLS regression results are summarized in Table 8.3. The columns labeled (1) through (7) each report separate regressions, and the entries in the table are the coefficients, standard errors, certain F-statistics and their p-values, and summary statistics, as indicated by the description in each row.

The first column of regression results, labeled regression (1) in the table, is regression (3) in Table 7.1, repeated here for convenience. This regression does not control for income, so the first thing we do is check whether the results change substantially when log income is included as an additional economic control variable. The results are given in regression (2) in Table 8.3. The log of income is statistically significant at the 1% level, and the coefficient on the student-teacher ratio becomes somewhat closer to zero, falling from −1.00 to −0.73, although it remains statistically significant at the 1% level. The change in the coefficient on STR between regressions (1) and (2) is large enough to warrant including the logarithm of income in the remaining regressions as a deterrent to omitted variable bias.

Regression (3) in Table 8.3 is the interacted regression in Equation (8.34), with the binary variable for a high or low percentage of English learners but with no economic control variables. When the economic control variables (percentage eligible for subsidized lunch and log income) are added [regression (4) in the table], the coefficients change, but in neither case is the coefficient on the interaction term significant at the 5% level. Based on the evidence in regression (4), the hypothesis that the effect of STR is the same for districts with low and high percentages of English learners cannot be rejected at the 5% level (the t-statistic on the interaction term is less than 1.96 in absolute value).

Regression (5) examines whether the effect of changing the student-teacher ratio depends on the value of the student-teacher ratio by including a cubic specification in STR in addition to the other control variables in regression (4) [the interaction term HiEL × STR was dropped because it was not significant in regression (4) at the 10% level]. The estimates in regression (5) are consistent with the student-teacher ratio having a nonlinear effect: The null hypothesis that the relationship is linear is rejected at the 1% significance level against the alternative that it is cubic (the F-statistic testing the hypothesis that the true coefficients on STR² and STR³ are zero is 6.17, with a p-value of less than 0.001).

TABLE 8.3 Nonlinear Regression Models of Test Scores
[Dependent variable: average test score in district; 420 observations. Seven OLS regressions, (1) through (7); regressors: student-teacher ratio (STR), STR², STR³, % English learners, a binary indicator that equals 1 if the percentage of English learners is 10% or more (HiEL), HiEL × STR, HiEL × STR², HiEL × STR³, % eligible for subsidized lunch, average district income (logarithm), and an intercept. Also reported: F-statistics and p-values on the joint hypotheses that (a) all STR variables and interactions are zero, (b) the coefficients on STR² and STR³ are zero, and (c) the coefficients on HiEL × STR, HiEL × STR², and HiEL × STR³ are zero, along with SER and R̄².]
Notes: These regressions were estimated using the data on K-8 districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.
The estimates in regression (5) are consistent with the student-teacher ratio having a nonlinear effect. The null hypothesis that the relationship is linear is rejected at the 1% significance level against the alternative that it is cubic (the F-statistic testing the hypothesis that the true coefficients on STR squared and STR cubed are zero is 6.17, with a p-value of < 0.001).

Regression (6) further examines whether the effect of the student-teacher ratio depends not just on the value of the student-teacher ratio but also on the fraction of English learners. By including interactions between HiEL and STR, STR squared, and STR cubed, we can check whether the (possibly cubic) population regression functions relating test scores and STR are different for low and high percentages of English learners. To do so, we test the restriction that the coefficients on the three interaction terms are zero. The resulting F-statistic is 2.69, which has a p-value of 0.046 and thus is significant at the 5% but not the 1% significance level. This provides some evidence that the regression functions are different for districts with high and low percentages of English learners; however, comparing regressions (6) and (4) makes it clear that these differences are associated with the quadratic and cubic terms.

Regression (7) is a modification of regression (5), in which the continuous variable PctEL is used instead of the binary variable HiEL to control for the percentage of English learners in the district. The coefficients on the other regressors do not change substantially when this modification is made, indicating that the results in regression (5) are not sensitive to which measure of the percentage of English learners is actually used in the regression. In all the specifications, the hypothesis that the student-teacher ratio does not enter the regressions is rejected at the 1% level.

The nonlinear specifications in Table 8.3 are most easily interpreted graphically. Figure 8.10 graphs the estimated regression functions relating test scores and the student-teacher ratio for the linear specification (2) and the cubic specifications (5) and (7), along with a scatterplot of the data.4 These estimated regression functions show the predicted value of test scores as a function of the student-teacher ratio, holding fixed other values of the independent variables in the regression. The estimated regression functions are all close to each other, although the cubic regressions flatten out for large values of the student-teacher ratio.

4 For each curve, the predicted value was computed by setting each independent variable, other than STR, to its sample average value and computing the predicted value by multiplying these fixed values of the independent variables by the respective estimated coefficients from Table 8.3. This was done for various values of STR, and the graph of the resulting adjusted predicted values is the estimated regression line relating test scores and STR, holding the other variables constant at their sample averages.
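The joint tests of exclusion restrictions used above can be sketched by comparing restricted and unrestricted sums of squared residuals. The version below is the homoskedasticity-only F-statistic (the text uses heteroskedasticity-robust statistics, which would instead be computed from a robust covariance matrix), and the data are simulated, not the California data set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 420
str_ = rng.uniform(14, 26, n)                     # student-teacher ratio
hiel = (rng.uniform(0, 1, n) > 0.5).astype(float)  # high share of English learners
y = 700 - 1.0 * str_ + 5.0 * hiel + rng.normal(0, 10, n)

def ssr(y, X):
    """Sum of squared OLS residuals for regressors X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return float(e @ e)

ones = np.ones(n)
# Unrestricted: cubic in STR plus interactions of HiEL with STR, STR^2, STR^3
Xu = np.column_stack([ones, str_, str_**2, str_**3, hiel,
                      hiel * str_, hiel * str_**2, hiel * str_**3])
# Restricted: the three interaction coefficients are set to zero
Xr = np.column_stack([ones, str_, str_**2, str_**3, hiel])

q = 3                    # number of restrictions being tested
k = Xu.shape[1]          # unrestricted regressors, including the intercept
F = ((ssr(y, Xr) - ssr(y, Xu)) / q) / (ssr(y, Xu) / (n - k))
print(round(F, 3))
```

Because the interactions are truly irrelevant in this simulated data, the F-statistic should be small; rejecting at the 5% level would require comparing it with the critical value of the F(3, n - k) distribution.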
FIGURE 8.10 Three Regression Functions Relating Test Scores and Student-Teacher Ratio

The cubic regressions from columns (5) and (7) of Table 8.3 are nearly identical. They indicate a small amount of nonlinearity in the relation between test scores and student-teacher ratio. [The figure plots test scores (vertical axis, roughly 600 to 720) against the student-teacher ratio (horizontal axis, 12 to 28), with a scatterplot of the districts and the linear regression (2) and cubic regressions (5) and (7) superimposed.]

The two cubic regression functions are different for student-teacher ratios below 16.5, but the districts with STR < 16.5 constitute only 6% of the observations, so the differences between the nonlinear regression functions are reflecting differences in these very few districts with very low student-teacher ratios.

Regression (6) indicates a statistically significant difference in the cubic regression functions relating test scores and STR, depending on whether the percentage of English learners in the district is large or small. Figure 8.11 graphs these two estimated regression functions so that we can see whether this difference, in addition to being statistically significant, is of practical importance. As Figure 8.11 shows, for student-teacher ratios between 17 and 23, a range that includes 88% of the observations, the two functions are separated by approximately ten points but otherwise are very similar; that is, for STR between 17 and 23, districts with a lower percentage of English learners do better, holding constant the student-teacher ratio, but the effect of a change in the student-teacher ratio is essentially the same for the two groups. Thus we conclude that the effect on test scores of a change in the student-teacher ratio does not depend on the percentage of English learners for the range of student-teacher ratios for which we have the most data, but we must be careful not to read more into this than is justified.
FIGURE 8.11 Regression Functions for Districts with High and Low Percentages of English Learners

Districts with low percentages of English learners (HiEL = 0) are shown by gray dots, and districts with HiEL = 1 are shown by colored dots. The cubic regression function for HiEL = 1 from regression (6) in Table 8.3 is approximately 10 points below the cubic regression function for HiEL = 0 for 17 <= STR <= 23, but otherwise the two functions have similar shapes and slopes in this range. The slopes of the regression functions differ most for very large and small values of STR, for which there are few observations. [The figure plots test scores (vertical axis, roughly 600 to 720) against the student-teacher ratio (horizontal axis, 12 to 28).]

Summary of Findings

These results let us answer the three questions raised at the start of this section.

First, after controlling for economic background, there is evidence of a nonlinear effect on test scores of the student-teacher ratio. This effect is statistically significant at the 1% level (the coefficients on STR squared and STR cubed are always significant at the 1% level).

Second, after controlling for economic background, whether there are many or few English learners in the district does not have a substantial influence on the effect on test scores of a change in the student-teacher ratio. The cubic specification in regression (6) provides statistically significant evidence (at the 5% level) that the regression functions are different for districts with high and low percentages of English learners; however, the estimated regression functions have similar slopes in the range of student-teacher ratios containing most of our data, as shown in Figure 8.11. In the linear specifications, there is no statistically significant evidence of such a difference.

Third, we now can return to the superintendent's problem that opened Chapter 4. She wants to know the effect on test scores of reducing the student-teacher ratio by two students per teacher. In the linear specification (2), this effect does not depend on the student-teacher ratio itself, and the estimated effect of this reduction is to improve test scores by 1.46 (= 0.73 x 2) points.
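Predicted effects like these are differences between two predicted values, so under a cubic specification the answer depends on the starting ratio. A minimal sketch, using hypothetical coefficients rather than the estimates in Table 8.3 (terms in the other regressors cancel in the difference, so they are omitted):

```python
def predicted_change(b1, b2, b3, str_from, str_to):
    """Change in the predicted test score when STR moves from str_from to
    str_to, holding the other regressors fixed (their terms cancel)."""
    f = lambda s: b1 * s + b2 * s**2 + b3 * s**3
    return f(str_to) - f(str_from)

# Linear case: a 2-student reduction has the same effect everywhere.
linear = predicted_change(-0.73, 0.0, 0.0, 20, 18)

# Hypothetical cubic coefficients: the effect is larger when the
# starting student-teacher ratio is already small.
cubic_20_to_18 = predicted_change(-10.0, 0.3, -0.003, 20, 18)
cubic_22_to_20 = predicted_change(-10.0, 0.3, -0.003, 22, 20)
print(linear, round(cubic_20_to_18, 2), round(cubic_22_to_20, 2))
```

With these made-up coefficients the cut from 20 to 18 improves predicted scores by more than the cut from 22 to 20, which is the qualitative pattern the cubic estimates in the text display.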
The estimates from the nonlinear specifications suggest that cutting the student-teacher ratio has a somewhat greater effect if this ratio is already small. If her district currently has a student-teacher ratio of 20 and she is considering cutting it to 18, then based on regression (5) the estimated effect of this reduction is to improve test scores by 3.00 points, while based on regression (7) this estimate is 2.93. If her district currently has a student-teacher ratio of 22 and she is considering cutting it to 20, then based on regression (5) the estimated effect of this reduction is to improve test scores by 1.93 points, while based on regression (7) this estimate is 1.90.

8.5 Conclusion

This chapter presented several ways to model nonlinear regression functions. Because these models are variants of the multiple regression model, the unknown coefficients can be estimated by OLS, and hypotheses about their values can be tested using t- and F-statistics as described in Chapter 7. In these models, the expected effect on Y of a change in one of the independent variables, X1, holding the other independent variables X2, ..., Xk constant, in general depends on the values of X1, X2, ..., Xk.

There are many different models in this chapter, and you could not be blamed for being a bit bewildered about which to use in a given application. How should you analyze possible nonlinearities in practice? Section 8.1 laid out a general approach for such an analysis, but this approach requires you to make decisions and exercise judgment along the way. It would be convenient if there were a single recipe you could follow that would always work in every application, but in practice data analysis is rarely that simple.

The single most important step in specifying nonlinear regression functions is to "use your head." Before you look at the data, can you think of a reason, based on economic theory or expert judgment, why the slope of the population regression function might depend on the value of that, or another, independent variable? If so, what sort of dependence might you expect? And, most importantly, which nonlinearities (if any) could have major implications for the substantive issues addressed by your study? Answering these questions carefully will focus your analysis. In the test score application, such reasoning led us to investigate whether hiring more teachers might have a greater effect in districts with a large percentage of students still learning English, perhaps because those students would differentially benefit from more personal attention. By making the question precise, we were able to find a precise answer: After controlling for the economic background of the students, we found no statistically significant evidence of such an interaction.
Summary

1. In a nonlinear regression, the slope of the population regression function depends on the value of one or more of the independent variables.

2. The effect on Y of a change in the independent variable(s) can be computed by evaluating the regression function at two values of the independent variable(s). The procedure is summarized in Key Concept 8.1.

3. A polynomial regression includes powers of X as regressors. A quadratic regression includes X and X squared, and a cubic regression includes X, X squared, and X cubed.

4. Small changes in logarithms can be interpreted as proportional or percentage changes in a variable. Regressions involving logarithms are used to estimate proportional changes and elasticities.

5. The product of two variables is called an interaction term. When interaction terms are included as regressors, they allow the regression slope of one variable to depend on the value of another variable.

Key Terms

quadratic regression model (258)
nonlinear regression function (260)
polynomial regression model (265)
cubic regression model (265)
elasticity (267)
exponential function (267)
natural logarithm (268)
linear-log model (269)
log-linear model (271)
log-log model (271)
interaction term (278)
interacted regressor (278)
interaction regression model (278)
nonlinear least squares (309)
nonlinear least squares estimators (309)

Review the Concepts

8.1 Sketch a regression function that is increasing (has a positive slope) and is steep for small values of X but less steep for large values of X. Explain how you would specify a nonlinear regression to model this shape. Can you think of an economic relationship with a shape like this?
8.2 A "Cobb-Douglas" production function relates production (Q) to factors of production, capital (K), labor (L), and raw materials (M), and an error term u using the equation Q = A K^b1 L^b2 M^b3 e^u, where A, b1, b2, and b3 are production parameters. Suppose you have data on production and the factors of production from a random sample of firms with the same Cobb-Douglas production function. How would you use regression analysis to estimate the production parameters?

8.3 A standard "money demand" function used by macroeconomists has the form ln(m) = b0 + b1 ln(GDP) + b2 R, where m is the quantity of (real) money, GDP is the value of (real) gross domestic product, and R is the value of the nominal interest rate measured in percent per year. Suppose that b1 = 1.0 and b2 = -0.02. What will happen to the value of m if GDP increases by 2%? What will happen to m if the interest rate increases from 4% to 5%?

8.4 You have estimated a linear regression model relating Y to X. Your professor says, "I think that the relationship between Y and X is nonlinear." Explain how you would test the adequacy of your linear regression.

8.5 Suppose that in problem 8.2 you thought that the value of b2 was not constant, but rather increased when K increased. How could you use an interaction term to capture this effect?

Exercises

8.1 Sales in a company are $196 million in 2001 and increase to $198 million in 2002.

a. Compute the percentage increase in sales using the usual formula 100 x (Sales2002 - Sales2001)/Sales2001. Compare this value to the approximation 100 x [ln(Sales2002) - ln(Sales2001)].

b. Repeat (a) assuming Sales2002 = 205; Sales2002 = 250; Sales2002 = 500.

c. How good is the approximation when the change is small? Does the quality of the approximation deteriorate as the percentage change increases?
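The comparison in Exercise 8.1 can be computed directly; the log difference is close to the percentage change when the change is small and deteriorates as the change grows:

```python
import math

def pct_change(s0, s1):
    """Exact percentage change from s0 to s1."""
    return 100 * (s1 - s0) / s0

def log_approx(s0, s1):
    """Log-difference approximation to the percentage change."""
    return 100 * (math.log(s1) - math.log(s0))

# Sales of $196 million in 2001 versus alternative 2002 values.
for sales_2002 in (198, 205, 250, 500):
    exact = pct_change(196, sales_2002)
    approx = log_approx(196, sales_2002)
    print(sales_2002, round(exact, 2), round(approx, 2))
```

For the 196-to-198 case the two numbers agree to within about 0.005 percentage point; for the 196-to-500 case the approximation understates the true percentage change by a wide margin.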
8.2 Suppose that a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.

Regression Results for Exercise 8.2

Dependent variable: ln(Price). Columns (1)-(5) report OLS estimates, with standard errors in parentheses, for regressors chosen from: Size, ln(Size), ln(Size) squared, Bedrooms, Pool, View, Pool x View, Condition, and an intercept; the adjusted R-squared and SER are also reported. Variable definitions: Price = sale price ($); Size = house size (in square feet); Bedrooms = number of bedrooms; Pool = binary variable (1 if house has a swimming pool, 0 otherwise); View = binary variable (1 if house has a nice view, 0 otherwise); Condition = binary variable (1 if realtor reports house is in excellent condition, 0 otherwise). [Table entries not reproduced.]

a. Using the results in column (1), what is the expected change in price of building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.

b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?

c. Using column (2), what is the estimated effect of Pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.

d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)
e. Is the quadratic term ln(Size) squared important?

f. Use the regression in column (5) to compute the expected change in price when a pool is added to a house without a view. Repeat the exercise for a house with a view. Is there a large difference? Is the difference statistically significant?

8.3 After reading this chapter's analysis of test scores and class size, an educator comments, "In my experience, student performance depends on class size, but not in the way your regressions say. Rather, students do well when class size is less than 20 students and do very poorly when class size is greater than 25. There are no gains from reducing class size below 20 students, the relationship is constant in the intermediate region between 20 and 25 students, and there is no loss to increasing class size when it is already greater than 25." The educator is describing a "threshold effect" in which performance is constant for class sizes less than 20, then jumps and is constant for class sizes between 20 and 25, and then jumps again for class sizes greater than 25. To model these threshold effects, define the binary variables

STRsmall = 1 if STR < 20, and STRsmall = 0 otherwise;
STRmoderate = 1 if 20 <= STR <= 25, and STRmoderate = 0 otherwise; and
STRlarge = 1 if STR > 25, and STRlarge = 0 otherwise.

a. Consider the regression TestScore_i = b0 + b1 STRsmall_i + b2 STRmoderate_i + b3 STRlarge_i + u_i. A researcher tries to estimate this regression and finds that her computer crashes. Why?

b. Consider the regression TestScore_i = b0 + b1 STRsmall_i + b2 STRlarge_i + u_i. Sketch the regression function relating TestScore to STR for hypothetical values of the regression coefficients that are consistent with the educator's statement.
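The crash in part (a) is the dummy variable trap: STRsmall + STRmoderate + STRlarge = 1 for every district, so the three dummies plus the intercept are perfectly multicollinear and OLS has no unique solution. The rank deficiency can be checked directly (the class sizes below are made up):

```python
import numpy as np

str_values = np.array([18.0, 19.5, 21.0, 24.0, 26.5, 30.0])
small = (str_values < 20).astype(float)
moderate = ((str_values >= 20) & (str_values <= 25)).astype(float)
large = (str_values > 25).astype(float)

# Intercept plus all three dummies: the columns are linearly dependent,
# because small + moderate + large equals the intercept column.
X = np.column_stack([np.ones(len(str_values)), small, moderate, large])
print(np.linalg.matrix_rank(X))   # 3, not 4: perfect multicollinearity

# Dropping one dummy (the omitted category) restores full column rank.
X_ok = np.column_stack([np.ones(len(str_values)), small, large])
print(np.linalg.matrix_rank(X_ok))
```

Dropping STRmoderate makes the moderate group the base category, which is exactly the specification in part (b).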
c. Explain why the answers to (a) and (b) are different.

d. Is the difference in the answers to (a) and (b) statistically significant at the 5% level? Explain.

e. Would your answers to (a)-(d) change if the person was a woman? From the South? Explain.

f. How would you change the regression if you suspected that the effect of experience on earnings was different for men than for women?

8.5 Read the box "The Demand for Economics Journals" in Section 8.3.

a. The box reaches three conclusions. Looking at the results in the table, what is the basis for each of these conclusions?

b. Using the results in regression (4), the box reports that the elasticity of demand for an 80-year-old journal is -0.28.

i. How was this value determined from the estimated regression?

ii. The box reports that the standard error for the estimated elasticity is 0.06. How would you calculate this standard error? (Hint: See the discussion "Standard errors of estimated effects" below Key Concept 8.1.)

c. Suppose that the variable Characters had been divided by 1,000 instead of 1,000,000. How would the results in column (4) change?

8.6 Refer to Table 8.3.

a. A researcher suspects that the effect of % Eligible for subsidized lunch has a nonlinear effect on test scores. In particular, he conjectures that increases in this variable from 10% to 20% have little effect on test scores, but that changes from 50% to 60% have a much larger effect.

i. Describe a nonlinear specification that can be used to model this form of nonlinearity.

ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?

b. A researcher suspects that the effect of income on test scores is different in districts with small classes than in districts with large classes.

i. Describe a nonlinear specification that can be used to model this form of nonlinearity.

ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?
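When a log price term is interacted with a function of journal age, the price elasticity varies with age; for instance, with an ln(P) x ln(Age) interaction, the elasticity at age A is b1 + b2 ln(A), and a single number such as the elasticity for an 80-year-old journal can be read off the estimates. A sketch with hypothetical coefficients (not the estimates in the box):

```python
import math

def price_elasticity(b_lnp, b_interact, age):
    """Elasticity of demand at a given age in a log-log model with an
    ln(price) x ln(age) interaction: d ln(Q) / d ln(P) = b1 + b2 * ln(age)."""
    return b_lnp + b_interact * math.log(age)

# Hypothetical values: demand is less elastic for older journals.
b1, b2 = -0.9, 0.14
print(round(price_elasticity(b1, b2, 80), 2))   # close to -0.3 for an old journal
print(round(price_elasticity(b1, b2, 5), 2))    # more elastic for a young journal
```

The standard error of such an elasticity would combine the standard errors of b1 and b2 and their covariance, which is the calculation part b(ii) asks about.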
8.7 This problem is inspired by a study of the "gender gap" in earnings in top corporate jobs [Bertrand and Hallock (2001)]. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.)

a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings onto Female yields

ln(Earnings) = 6.48 - 0.44 Female, SER = 2.65,
with standard errors of (0.01) on the intercept and (0.05) on Female.

i. The estimated coefficient on Female is -0.44. Explain what this value means.

ii. The SER is 2.65. Explain what this value means.

iii. Does this regression suggest that female top executives earn less than top male executives? Explain.

iv. Does this regression suggest that there is gender discrimination? Explain.

b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:

ln(Earnings) = 3.86 - 0.28 Female + 0.37 ln(MarketValue) + 0.004 Return,
with standard errors of (0.03), (0.04), (0.004), and (0.003), respectively; n = 46,670 and adjusted R-squared = 0.345.

i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.

ii. The coefficient on Female is now -0.28. Explain why it has changed from the regression in (a).

c. Are large firms more likely to have female top executives than small firms? Explain.

8.8 X is a continuous variable that takes on values between 5 and 100. Z is a binary variable. Sketch the following regression functions (with values of X between 5 and 100 on the horizontal axis and values of Y on the vertical axis):

a. Y = 2.0 + 3.0 x ln(X).

b. Y = 2.0 + 3.0 x ln(X) + 4.0Z, with Z = 1.

c. Same as (b), but with Z = 0.

d. Y = 2.0 + 3.0 x ln(X) + 4.0 x Z x ln(X), with Z = 1.

e. Same as (d), but with Z = 0.

f. Y = 1.0 + 125.0X - 0.01X squared.
8.9 Explain how you would use "Approach #1" of Section 7.3 to calculate the confidence interval discussed below Equation (8.8). [Hint: This requires estimating a new regression using a different definition of the regressors and the dependent variable. See Exercise (7.9).]

8.10 Consider the regression model Y_i = b0 + b1 X_1i + b2 X_2i + b3 (X_1i x X_2i) + u_i. Use Key Concept 8.1 to show:

a. dY/dX_1 = b1 + b3 X_2 (effect of change in X_1 holding X_2 constant).

b. dY/dX_2 = b2 + b3 X_1 (effect of change in X_2 holding X_1 constant).

c. If X_1 changes by dX_1 and X_2 changes by dX_2, then dY = (b1 + b3 X_2) dX_1 + (b2 + b3 X_1) dX_2 + b3 dX_1 dX_2.
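The identity in Exercise 8.10(c) can be verified numerically: the decomposition of the change in Y into the two partial effects plus the cross term b3 dX_1 dX_2 matches the direct difference in predicted values (the coefficient values below are arbitrary):

```python
def y(b0, b1, b2, b3, x1, x2):
    """Regression function with an interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b0, b1, b2, b3 = 1.0, 2.0, -1.5, 0.5
x1, x2 = 3.0, 4.0
dx1, dx2 = 0.7, -0.2

# Direct difference in predicted values when both regressors change.
direct = y(b0, b1, b2, b3, x1 + dx1, x2 + dx2) - y(b0, b1, b2, b3, x1, x2)
# Decomposition from Exercise 8.10(c).
decomposed = (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2
print(direct, decomposed)
```

The cross term b3 dX_1 dX_2 is what distinguishes the interacted model from simply adding the two partial effects.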
Empirical Exercises

E8.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.

a. Run a regression of average hourly earnings (AHE) on age (Age), gender (Female), and education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Age squared, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

e. Do you prefer the regression in (c) to the regression in (b)? Explain.

f. Do you prefer the regression in (d) to the regression in (b)? Explain.

g. Do you prefer the regression in (d) to the regression in (c)? Explain.

h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a high school diploma. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for females with college degrees?

i. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Age squared, Female, Bachelor, and the interaction term Female x Bachelor. What does the coefficient on the interaction term measure? Alexis is a 30-year-old female with a bachelor's degree. What does the regression predict for her value of ln(AHE)? Jane is a 30-year-old female with a high school degree. What does the regression predict for her value of ln(AHE)? What is the predicted difference between Alexis's and Jane's earnings? Bob is a 30-year-old male with a bachelor's degree. What does the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school degree. What does the regression predict for his value of ln(AHE)? What is the predicted difference between Bob's and Jim's earnings?

j. Is the effect of Age on earnings different for males than for females? Specify and estimate a regression that you can use to answer this question.

k. Is the effect of Age on earnings different for high school graduates than college graduates? Specify and estimate a regression that you can use to answer this question.

l. After running all of these regressions (and any others that you want to run), summarize the effect of age on earnings for young workers.
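Under a log-quadratic specification like the one in part (d), the expected percentage change in earnings from one more year of age is approximately 100 x [f(Age + 1) - f(Age)], which differs across ages. A sketch with hypothetical coefficients, not estimates from the CPS04 data:

```python
def ln_ahe(age, b_age=0.10, b_age2=-0.0012, other=1.5):
    """Hypothetical log earnings as a quadratic in age; the Female and
    Bachelor terms are held fixed and absorbed into `other`."""
    return other + b_age * age + b_age2 * age ** 2

# Approximate percent change in earnings for a one-year increase in age.
pct_25_to_26 = 100 * (ln_ahe(26) - ln_ahe(25))
pct_33_to_34 = 100 * (ln_ahe(34) - ln_ahe(33))
print(round(pct_25_to_26, 2), round(pct_33_to_34, 2))
```

With a negative coefficient on the squared term, the age profile of earnings growth flattens: the predicted percentage gain is smaller at 33 than at 25.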
E8.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.

a. Estimate a regression of Course_Eval on Beauty, Intro, OneCredit, Female, Minority, and NNEnglish.

b. Add Age and Age squared to the regression. Is there evidence that Age has a nonlinear effect on Course_Eval? Is there evidence that Age has any effect on Course_Eval?

c. Modify the regression in (a) so that the effect of Beauty on Course_Eval is different for men and women. Is the male-female difference in the effect of Beauty statistically significant?

d. Professor Smith is a man. He has cosmetic surgery that increases his beauty index from one standard deviation below the average to one standard deviation above the average. What is his value of Beauty before the surgery? After the surgery? Using the regression in (c), construct a 95% confidence interval for the increase in his course evaluation.

e. Repeat (d) for Professor Jones, who is a woman.

E8.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. Run a regression of ED on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (that is, from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?

b. Run a regression of ln(ED) on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?

c. Run a regression of ED on Dist, Dist squared, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?

d. Do you prefer the regression in (c) to the regression in (a)? Explain.

e. Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg = $10.06.

i. Plot the regression relation between Dist and ED from (a) and (c) for Dist in the range of 0 to 10 (from 0 to 100 miles). Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?

ii. How does the regression function (c) behave for Dist > 10? How many observations are there with Dist > 10?
f. Add the interaction term DadColl x MomColl to the regression in (c). What does the coefficient on the interaction term measure?

g. Mary, Jane, Alexis, and Bonnie have the same values of Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, Cue80, and Stwmfg80. Neither of Mary's parents attended college. Jane's father attended college, but her mother did not. Alexis's mother attended college, but her father did not. Both of Bonnie's parents attended college. Using the regressions from (f):

i. What does the regression predict for the difference between Jane's and Mary's years of education?

ii. What does the regression predict for the difference between Alexis's and Mary's years of education?

iii. What does the regression predict for the difference between Bonnie's and Mary's years of education?

h. Is there any evidence that the effect of Dist on ED depends on the family's income?

i. After running all of these regressions (and any others that you want to run), summarize the effect of Dist on years of education.

E8.4 Using the data set Growth described in Empirical Exercise 4.4, excluding the data for Malta, run the following five regressions: Growth on (1) TradeShare and YearsSchool; (2) TradeShare and ln(YearsSchool); (3) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60); (4) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, ln(RGDP60), and TradeShare x ln(YearsSchool); and (5) TradeShare, TradeShare squared, TradeShare cubed, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60).

a. Construct a scatterplot of Growth on YearsSchool. Does the relationship look linear or nonlinear? Explain. Use the plot to explain why regression (2) fits better than regression (1).
y can be eSlimated using an ex te nsion of OLe.2 a nd 8. a • . Use regression (5) to predici the increase in Growth. Us ing regression (4). Co Test whether the coefficients 0 11 Assassinmiom and Rev_Coups equal to zero using regression (3). Use regressio n (2) to predict the increase in Growth. Because they are lin e. In some applica tions. economic rca~ni ng leads to regreSSion fu nctions that are not linear in the parameters... te .foT t:xmn pie the adoption of databasc management softw. . Use regression (3) to pre dict the increase in Growth. b Bon I the ~ want to ~uding the 'ut/l.n \Vl1~ LQgi)·tic curve_ Suppose you a re studying the m:ukc! penetration of a technology. how eve r.m: in different industries. The depe n.r in the unknown parame ters. Use regressio n (I) to predict the increase in G rowth. den ! va riahl e is the fraction of firm s in the industry that have ado pted the softwa rc. those ptl ramctc n. bdcOl_ }anes APPENDIX 1 8. a co untry contempla tes a trade policy that will increase the average value of T.5 to 1. This family of no n linear f(..1 Regression Functions That Are Nonlinear in the Parameters The nonli nea r regression functio ns conside red in Sections 8.'gression func tio ns is both rich and convenie nt to use.:arin the parameter. a cOll ntry contemplates an educa tion policy that will increase average ye<l rs of schooling fr o m 4 years 10 6 years.R egression Fuodions Thai Are Nonlineor in the Parameters 301 ~c) he ~mc · ~w fnc b. A llho ugh suc h regressio n functions cannOt he estimated by OLS. ca lled nonlincllr least squares.adtShare from 0. We then pro vide a general formul<ltion. ~st. the. is th ere evidence that the effect of TradeShare on Growrh depends on the level of educa tion in the counlry ? In e.. l rehl 1ion i.He d.] a re nonlinear fu nc tio ns of the X s but arc linear functions of Ihe un known para me te rs. In 1960. 
ca n be estimated by OLS after defining new regressors that a re nonli nea r Irans{ormaljons of the origi nal X's. In 1960.'SiJort' df!S/J(1f t . [ nd Tra de  Functions That Are Nonlinear in the Parameters We begin with two examples offunetions tha t are Ilo nlin. 4) Trot/e t/eS}J//rt' ' . Usin g regression (5) is t here evidence of a nonli near relationship betwee n TradeShare and G rowth? f...
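The first family described in this appendix, functions that are nonlinear in the X's but linear in the parameters, can be estimated with any OLS routine once the transformed regressors are created. The sketch below is our own illustration (simulated data and made-up coefficients, not an example from the text): it fits a quadratic by treating X and X² as two separate regressors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from a quadratic population regression:
# Y = 2 + 1.5*X - 0.5*X^2 + u   (coefficients chosen for illustration)
n = 1000
X = rng.uniform(0, 4, n)
u = rng.normal(0, 0.1, n)
Y = 2 + 1.5 * X - 0.5 * X**2 + u

# The function is nonlinear in X but linear in the parameters, so OLS
# applies after defining the new regressor X^2.
design = np.column_stack([np.ones(n), X, X**2])
beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)

print(beta_hat)  # estimates close to (2, 1.5, -0.5)
```

The same device handles logarithmic specifications such as ln(X): compute the transformed column first, then run ordinary least squares on the augmented design matrix.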
Because a linear regression model could produce predicted values less than 0 or greater than 1, it makes sense to use instead a function that produces predicted values between 0 and 1. The logistic function smoothly increases from a minimum of 0 to a maximum of 1. The logistic regression model with a single X is

Y_i = 1/[1 + e^{−(β0 + β1X_i)}] + u_i.   (8.38)

The logistic function with a single X is graphed in Figure 8.12a. As can be seen in the graph, the logistic function has an elongated "S" shape. For small values of X, the value of the function is nearly 0 and the slope is flat; the curve is steeper for moderate values of X; and for large values of X, the function approaches 1 and the slope is flat again.

Negative exponential growth. The functions used in Section 8.2 to model the relation between test scores and income have some deficiencies. For example, the polynomial models can produce a negative slope for some values of income, which is implausible. The logarithmic specification has a positive slope for all values of income; however, as income increases, the predicted values increase without bound, so for some income the predicted value for a district will exceed the maximum possible score on the test. The negative exponential growth model provides a nonlinear specification that has a positive slope for all values of income, has a slope that is greatest at low values of income and decreases as income rises, and has an upper bound (that is, an asymptote as income increases to infinity). The negative exponential growth model is

Y_i = β0[1 − e^{−β1(X_i − β2)}] + u_i.   (8.39)

The negative exponential growth function is graphed in Figure 8.12b. The slope is steep for low values of X, but as X increases it reaches an asymptote of β0.

General functions that are nonlinear in the parameters. The logistic and negative exponential growth regression models are special cases of the general nonlinear regression model

Y_i = f(X_{1i}, …, X_{ki}; β1, …, βm) + u_i,   (8.40)

in which there are k independent variables and m parameters. In the models of Sections 8.2 and 8.3, the X's entered this function nonlinearly, but the parameters entered linearly. In the examples of this appendix, the parameters enter nonlinearly. If the parameters are known, then predicted effects can be computed using the method of Section 8.1. In applications, however, the parameters are unknown and must be estimated from the data. Parameters that enter nonlinearly cannot be estimated by OLS, but they can be estimated by nonlinear least squares.
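The two functional forms in Equations (8.38) and (8.39) can be written directly in code. The sketch below is our own illustration (the function names and parameter values are invented for the example); it simply evaluates both functions and exhibits the shape properties described above: the logistic function stays strictly between 0 and 1, and the negative exponential growth function rises toward, but never reaches, the asymptote β0.

```python
import numpy as np

def logistic(x, b0, b1):
    # Equation (8.38) without the error term: values always lie in (0, 1).
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

def neg_exp_growth(x, b0, b1, b2):
    # Equation (8.39) without the error term: for b1 > 0 the slope is
    # positive and the function approaches the asymptote b0 as x grows.
    return b0 * (1.0 - np.exp(-b1 * (x - b2)))

x = np.linspace(0, 60, 7)
print(logistic(x, -4.0, 0.2))               # S-shaped values in (0, 1)
print(neg_exp_growth(x, 700.0, 0.05, 0.0))  # rises toward the asymptote 700
```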
n conSlruellhc sum or squared prediction mistakes: f(X k ···· Xl..".l3eciluse the regression model is nonlinear in the o. .(~ . which hos 0 slope thot is always posilive and decreases os X increases.X + + bkX/.. Paraml.:d u. the " \"aluesof b .41 ) C<lr\~ ""II .t. .b " '.: daHl.12 Two Functions Thot Are Nonlinear in their Parameters y 309 !lOn 11 Ilion).: f ' .:~ t imator can be computed hy checking many trial o l Ihe valllc~ ~ali\"e e.rowth nun" he (une : and for I relation Pa rt (0) plOl$the logistic furn. :2:1) . . (~ .3 of the OLC) estimator of the cocfficientsof the lin ellr multiple regre ."".l()) h i> . howeve r. p.)f..:oef thi. method is called nonlin car ICllst ... hmlf (8. Ih\. If the pa r amctcr~ ...Regre$$ion Functions Thot ATe Nonlinear in the Porometers fI GURE 8. .qofl that minimize th~' sum of squared mistllles..../Cltion (8 ..... In principle. . and on asymptote 01 flo os X tend~ to in~nity. b. For a set of trial parameter values btl.sing (he method ornl! gcts described in Section R I.. b and scltlingon OLS o.:rs of the general nonlincltr regression model in EqulItion (SAO). 'reatcr ~' ecn 0 ~ l.~ion in Section 5.The \' 1'.I.J p."!!'" ~ ~ panllnl: l.~(Il mr~s. which ha~ predicted volues thai lie between 0 and 1. The OLS estimator minimiLes the sum of "<luared prcdic lion mistakes in Equation (5..{ic n rllTW x x (b ) A Il cglciw I 'XPOlltJlU.TIn: log are knolln.~jon l is sle":p [of model.lian of EqlXllion (8 .. (a) A lo~i.8).n:::.:lcrs that enter nonlinearly cannot be estimated by OLS.2m RcclIJI the di~w.391. " param. ~ ~.. In appl icatIOns. u~d Thb ~ame approach can he ficient~ to estim:lIc the p:namct. (8.38). a of income as Hlcomt: Nonlinear Least Squares Estimation Nonlinear leasl squares is linearl~ It g~" n~"ral method for e~limating Ihe unknown paramcterl> of a regression function when those parameters entcr the populatioIl rcgres~oD function non (8."1 ~ " L ' _ _ _ _ _ __ _ __ graph. + b.0<1' ..P''' Ir rcg. 
Pori (b1 plOI> the negative exponential growth function of Eql. but they can be c ~ timaled hy nonlinear least ~qu3rCs.:lcrs are un known and must prl!dich:d tha t ha~ be estimated from tho. ~ i'" """. then predkled effcct~ may be compu. ! (y.
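The "try many trial parameter values" idea can be sketched directly. The code below is a deliberately crude illustration of our own (not what production software does): it estimates the two parameters of a simple negative exponential growth model by searching a grid for the values that minimize the sum of squared prediction mistakes in Equation (8.41). Real nonlinear least squares routines use much smarter numerical optimizers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from Y = b0*(1 - exp(-b1*X)) + u with b0 = 10, b1 = 0.5
n = 200
X = rng.uniform(0, 10, n)
Y = 10 * (1 - np.exp(-0.5 * X)) + rng.normal(0, 0.2, n)

def ssr(b0, b1):
    # Sum of squared prediction mistakes, Equation (8.41).
    return np.sum((Y - b0 * (1 - np.exp(-b1 * X))) ** 2)

# Crude nonlinear least squares: evaluate the SSR over a grid of trial
# parameter values and keep the pair that minimizes it.
b0_grid = np.linspace(5, 15, 101)
b1_grid = np.linspace(0.1, 1.0, 91)
best = min((ssr(b0, b1), b0, b1) for b0 in b0_grid for b1 in b1_grid)
_, b0_hat, b1_hat = best

print(b0_hat, b1_hat)  # close to the true values 10 and 0.5
```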
Regression software incorporates algorithms for solving the nonlinear least squares minimization problem, which simplifies the task of computing the nonlinear least squares estimator in practice. Unfortunately, no general formula expresses the nonlinear least squares estimator as a function of the data, the way a relatively simple formula does for the OLS estimator in linear regression, so the nonlinear least squares estimator must be found numerically using a computer.

Under general conditions on the function f, the nonlinear least squares estimator shares two key properties with the OLS estimator in the linear regression model: It is consistent, and it is normally distributed in large samples. In regression software that supports nonlinear least squares estimation, the output typically reports standard errors for the estimated parameters. As a consequence, inference concerning the parameters can proceed as usual; in particular, t-statistics can be constructed using the general approach in Key Concept 5.2, and a 95% confidence interval can be constructed as the estimated coefficient, plus or minus 1.96 standard errors. Just as in linear regression, the error term in the nonlinear regression model can be heteroskedastic, so heteroskedasticity-robust standard errors should be used.

Application to the Test Score-Income Relation

A negative exponential growth model, fit to district income (X) and test scores (Y), has the desirable features of a slope that is always positive [if β1 in Equation (8.39) is positive] and an asymptote of β0 as income increases to infinity. The result of estimating β0, β1, and β2 in Equation (8.39) using the California test score data yields β̂0 = 703.2 (heteroskedasticity-robust standard error = 4.44), β̂1 = 0.0552 (SE = 0.0068), and β̂2 = −34.0 (SE = 4.48). Thus the estimated nonlinear regression function (with standard errors reported below the parameter estimates) is

TestScore = 703.2[1 − e^{−0.0552(Income + 34.0)}].   (8.42)
            (4.44)       (0.0068)       (4.48)

This estimated regression function is plotted in Figure 8.13, along with the linear-log regression function and a scatterplot of the data. The two specifications are, in this case, quite similar. One difference is that the negative exponential growth curve flattens out at the highest levels of income, consistent with having an asymptote.
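To see the flattening numerically, one can evaluate the fitted negative exponential growth function at low and high incomes. The snippet below is our own check, using rounded versions of the parameter estimates printed above (treat the exact figures as approximate in this reproduction): predicted scores rise quickly at low incomes and then level off beneath the asymptote of about 703.

```python
import math

# Approximate parameter values from the estimated function above;
# b2 is taken as roughly -34 (see the estimates reported in the text).
b0, b1, b2 = 703.2, 0.0552, -34.0

def predicted_test_score(income):
    # Fitted negative exponential growth regression function, Equation (8.42).
    return b0 * (1 - math.exp(-b1 * (income - b2)))

for income in (5, 10, 20, 40, 55):
    print(income, round(predicted_test_score(income), 1))
```

The marginal gain from an extra $10,000 of district income is much larger at low incomes than at high incomes, which is exactly the "slope greatest at low values, upper bound at high values" shape the model was chosen to capture.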
FIGURE 8.13 The Negative Exponential Growth and Linear-Log Regression Functions. The negative exponential growth regression function [Equation (8.42)] and the linear-log regression function [Equation (8.18)] both capture the nonlinear relation between test scores and district income. One difference between the two functions is that the negative exponential growth model has an asymptote as income increases to infinity, but the linear-log regression function does not.
CHAPTER 9 Assessing Studies Based on Multiple Regression

The preceding five chapters explain how to use multiple regression to analyze the relationship among variables in a data set. In this chapter, we step back and ask: When will multiple regression provide a useful estimate of the causal effect and, just as importantly, when will it fail to do so?

To answer this question, this chapter presents a framework for assessing statistical studies in general, whether or not they use regression analysis. This framework relies on the concepts of internal and external validity. A study is internally valid if its statistical inferences about causal effects are valid for the population and setting studied; it is externally valid if its inferences can be generalized to other populations and settings. What makes a study that uses multiple regression reliable or unreliable? We focus on statistical studies that have the objective of estimating the causal effect of a change in some independent variable, such as class size, on a dependent variable, such as test scores.

In Sections 9.1 and 9.2, we discuss internal and external validity, list a variety of possible threats to internal and external validity, and discuss how to identify those threats in practice. The discussion in Sections 9.1 and 9.2 focuses on the estimation of causal effects from observational data. Section 9.3 discusses a different use of regression models, forecasting, and provides an introduction to the threats to the validity of forecasts made using regression models. As an illustration of the framework of internal and external validity, in Section 9.4 we assess the internal and external validity of the study of the effect on test scores of cutting the student-teacher ratio presented in Chapters 4 through 8.
9.1 Internal and External Validity

The concepts of internal and external validity, defined in Key Concept 9.1, provide a framework for evaluating whether a statistical or econometric study is useful for answering a specific question of interest.

KEY CONCEPT 9.1
INTERNAL AND EXTERNAL VALIDITY

A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied. The analysis is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

Internal and external validity distinguish between the population and setting studied and the population and setting to which the results are generalized. The population studied is the population of entities (people, companies, school districts, and so forth) from which the sample was drawn. The population to which the results are generalized, or the population of interest, is the population of entities to which the causal inferences from the study are to be applied. For example, a high school (grades 9-12) principal might want to generalize our findings on class sizes and test scores in California elementary school districts (the population studied) to the population of high schools (the population of interest).

By "setting," we mean the institutional, legal, social, and economic environment. For example, it would be important to know whether the findings of a laboratory experiment assessing methods for growing organic tomatoes could be generalized to the field, that is, whether the organic methods that work in the setting of a laboratory also work in the setting of the real world. We provide other examples of differences in populations and settings later in this section.

Threats to Internal Validity

Internal validity has two components. First, the estimator of the causal effect should be unbiased and consistent. For example, if β̂_STR is the OLS estimator of the effect on test scores of a unit change in the student-teacher ratio in a certain regression, then β̂_STR should be an unbiased and consistent estimator of the true population causal effect of a change in the student-teacher ratio, β_STR.
Second, hypothesis tests should have the desired significance level (the actual rejection rate of the test under the null hypothesis should equal its desired significance level), and confidence intervals should have the desired confidence level. For example, if a confidence interval is constructed as β̂_STR ± 1.96 SE(β̂_STR), this confidence interval should contain the true population causal effect, β_STR, with probability 95% over repeated samples.

In regression analysis, causal effects are estimated using the estimated regression function, and hypothesis tests are performed using the estimated regression coefficients and their standard errors. Accordingly, in a study based on OLS regression, the requirements for internal validity are that the OLS estimator is unbiased and consistent, and that standard errors are computed in a way that makes confidence intervals have the desired confidence level. There are various reasons these requirements might not be met, and these reasons constitute threats to internal validity. These threats lead to failures of one or more of the least squares assumptions in Key Concept 6.4. For example, one threat that we have discussed at length is omitted variable bias; it leads to correlation between one or more regressors and the error term, which violates the first least squares assumption. If data on the omitted variable are available, then this threat can be avoided by including that variable as an additional regressor. Section 9.2 provides a detailed discussion of the various threats to internal validity in multiple regression analysis and suggests how to mitigate them.

Threats to External Validity

Potential threats to external validity arise from differences between the population and setting studied and the population and setting of interest.

Differences in populations. Differences between the population studied and the population of interest can pose a threat to external validity. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest). Whether mice and men differ sufficiently to threaten the external validity of such studies is a matter of debate.

More generally, the true causal effect might not be the same in the population studied and the population of interest. This could be because the population was chosen in a way that makes it different from the population of interest, because of differences in characteristics of the populations, because of geographical differences, or because the study is out of date.
Differences in settings. Even if the population being studied and the population of interest are identical, it might not be possible to generalize the study results if the settings differ. For example, a study of the effect on college binge drinking of an anti-drinking advertising campaign might not generalize to another identical group of college students if the legal penalties for drinking at the two colleges differ. In this case, the legal setting in which the study was conducted differs from the legal setting to which its results are applied. More generally, examples of differences in settings include differences in the institutional environment (public universities versus religious universities), differences in laws (differences in legal penalties), or differences in the physical environment (tailgate-party binge drinking in southern California versus Fairbanks, Alaska).

Application to test scores and the student-teacher ratio. Chapters 7 and 8 reported statistically significant, but substantively small, estimated improvements in test scores resulting from reducing the student-teacher ratio. This analysis was based on test results for California school districts. Suppose for the moment that these results are internally valid. To what other populations and settings of interest could this finding be generalized?

The closer are the population and setting of the study to those of interest, the stronger is the case for external validity. For example, college students and college instruction are very different than elementary school students and instruction, so it is implausible that the effect of reducing class sizes estimated using the California elementary school district data would generalize to colleges. On the other hand, elementary school students, curriculum, and organization are broadly similar throughout the United States, so it is plausible that the California results might generalize to performance on standardized tests in other U.S. elementary school districts.

How to assess the external validity of a study. External validity must be judged using specific knowledge of the populations and settings studied and those of interest. Important differences between the two will cast doubt on the external validity of the study.

Sometimes there are two or more studies on different but related populations. If so, the external validity of both studies can be checked by comparing their results. For example, in Section 9.4 we analyze test score and class size data for elementary school districts in Massachusetts and compare the Massachusetts and California results. In general, similar findings in two or more studies bolster claims to external validity.
Differences in their findings that are not readily explained, however, cast doubt on their external validity.¹

How to design an externally valid study. Because threats to external validity stem from a lack of comparability of populations and settings, these threats are best minimized at the early stages of a study, before the data are collected. Study design is beyond the scope of this textbook, and the interested reader is referred to Shadish, Cook, and Campbell (2002).

¹A comparison of many related studies on the same topic is called a meta-analysis. The discussion in the box on the "Mozart effect" in Chapter 6 is based on a meta-analysis. Performing a meta-analysis of many studies has its own challenges: How do you sort the good studies from the bad? How do you compare studies when the dependent variables differ? Should you put more weight on a large study than a small study? A discussion of meta-analysis and its challenges goes beyond the scope of this textbook. The interested reader is referred to Hedges and Olkin (1985) and Cooper and Hedges (1994).

9.2 Threats to Internal Validity of Multiple Regression Analysis

Studies based on regression analysis are internally valid if the estimated regression coefficients are unbiased and consistent, and if their standard errors yield confidence intervals with the desired confidence level. This section surveys five reasons why the OLS estimator of the multiple regression coefficients might be biased, even in large samples: omitted variables, misspecification of the functional form of the regression function, imprecise measurement of the independent variables ("errors in variables"), sample selection, and simultaneous causality. All five sources of bias arise because the regressor is correlated with the error term in the population regression, violating the first least squares assumption in Key Concept 6.4. This bias persists even in large samples, so the OLS estimator is inconsistent. For each, we discuss what can be done to reduce the bias. The section concludes with a discussion of circumstances that lead to inconsistent standard errors and what can be done about it.

Omitted Variable Bias

Recall that omitted variable bias arises when a variable that both determines Y and is correlated with one or more of the included regressors is omitted from the regression. How best to minimize omitted variable bias depends on whether or not data are available for the potential omitted variable.
Solutions to omitted variable bias when the omitted variable is observed. If you have data on the omitted variable, then you can include that variable in a multiple regression, thereby addressing the problem. In the test score example, this step entails identifying those determinants of test scores that, if ignored, could bias our estimator of the class size effect.

However, the decision whether to include a variable involves a trade-off between bias and variance of the coefficients of interest. On the one hand, omitting the variable could result in omitted variable bias. On the other hand, including the variable when it does not belong (that is, when its population regression coefficient is zero) reduces the precision of the estimators of the other regression coefficients. In other words, adding a new variable has both costs and benefits. In practice, there are four steps that can help you decide whether to include a variable or set of variables in a regression.

The first step is to identify the key coefficients of interest in your regression. In the test score regressions, this is the coefficient on the student-teacher ratio, because the question originally posed concerns the effect on test scores of reducing the student-teacher ratio.

The second step is to ask yourself: What are the most likely sources of important omitted variable bias in this regression? Answering this question requires applying economic theory and expert knowledge, and should occur before you actually run any regressions; because this is done before analyzing the data, it is referred to as a priori ("before the fact") reasoning. The result of this step is a base regression specification, the starting point for your empirical regression analysis, and a list of additional "questionable" variables that might help to mitigate possible omitted variable bias.

The third step is to augment your base specification with the additional questionable variables identified in the second step and to test the hypotheses that their coefficients are zero. If the coefficients on the additional variables are statistically significant, or if the estimated coefficients of interest change appreciably when the additional variables are included, then they should remain in the specification and you should modify your base specification. If not, then these variables can be excluded from the regression.

The fourth step is to present an accurate summary of your results in tabular form. This provides "full disclosure" to a potential skeptic, who can then draw his or her own conclusions. Tables 7.1 and 8.3 are examples of this strategy. For example, in Table 8.3 we could have presented only the regression in column (7), because that regression summarizes the relevant effects and nonlinearities in the other regressions in that table. Presenting the other regressions, however, permits the skeptical reader to draw his or her own conclusions.

These steps are summarized in Key Concept 9.2.

KEY CONCEPT 9.2
OMITTED VARIABLE BIAS: SHOULD I INCLUDE MORE VARIABLES IN MY REGRESSION?

If you include another variable in your multiple regression, you will eliminate the possibility of omitted variable bias from excluding that variable, but the variance of the estimator of the coefficients of interest can increase. Here are some guidelines to help you decide whether to include an additional variable:

1. Be specific about the coefficient or coefficients of interest.
2. Use a priori reasoning to identify the most important potential sources of omitted variable bias, leading to a base specification and some "questionable" variables.
3. Test whether additional questionable variables have nonzero coefficients.
4. Provide "full disclosure" representative tabulations of your results so that others can see the effect of including the questionable variables on the coefficient(s) of interest. Do your results change if you include a questionable variable?

Solutions to omitted variable bias when the omitted variable is not observed. Adding an omitted variable to a regression is not an option if you do not have data on that variable. Still, there are three other ways to solve omitted variable bias. Each of these three solutions circumvents omitted variable bias through the use of different types of data.

The first solution is to use data in which the same observational unit is observed at different points in time. For example, test score and related data might be collected for the same districts in 1995, then again in 2000. Data in this form are called panel data. As explained in Chapter 10, panel data make it possible to control for unobserved omitted variables as long as those omitted variables do not change over time.

The second solution is to use instrumental variables regression. This method relies on a new variable, called an instrumental variable. Instrumental variables regression is discussed in Chapter 12.

The third solution is to use a study design in which the effect of interest (for example, the effect of reducing class size on student achievement) is studied using a randomized controlled experiment. Randomized controlled experiments are discussed in Chapter 13.
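The benefit of including an observed confounder, discussed above, can be seen in a small simulation of our own (an invented example, not from the text). The data are generated so that a variable W both determines Y and is correlated with X: the short regression of Y on X alone is biased, while the long regression that also includes W recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# W determines Y and is correlated with X, so omitting it biases OLS.
W = rng.normal(0, 1, n)
X = 0.8 * W + rng.normal(0, 1, n)
Y = 1.0 + 2.0 * X + 3.0 * W + rng.normal(0, 1, n)

def ols(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

ones = np.ones(n)
b_short = ols(np.column_stack([ones, X]), Y)     # omits W: biased upward
b_long = ols(np.column_stack([ones, X, W]), Y)   # includes W: consistent

print(b_short[1])  # noticeably above the true coefficient of 2
print(b_long[1])   # close to 2
```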
Misspecification of the Functional Form of the Regression Function

If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased. For example, if the population regression function is a quadratic polynomial, then a regression that omits the square of the independent variable would suffer from omitted variable bias. This bias is a type of omitted variable bias, in which the omitted variables are the terms that reflect the missing nonlinear aspects of the regression function. Bias arising from functional form misspecification is summarized in Key Concept 9.3.

KEY CONCEPT 9.3
FUNCTIONAL FORM MISSPECIFICATION

Functional form misspecification arises when the functional form of the estimated regression function differs from the functional form of the population regression function. If the functional form is misspecified, then the estimator of the partial effect of a change in one of the variables will, in general, be biased.

Solutions to functional form misspecification. When the dependent variable is continuous (like test scores), this problem of potential nonlinearity can be solved using the methods of Chapter 8. Functional form misspecification often can be detected by plotting the data and the estimated regression function, and it can be corrected by using a different functional form. If, however, the dependent variable is discrete or binary (for example, Y_i equals 1 if the ith person attended college and equals 0 otherwise), things are more complicated. Regression with a discrete dependent variable is discussed in Chapter 11.

Errors-in-Variables

Suppose that in our regression of test scores against the student-teacher ratio we had inadvertently mixed up our data, so that we ended up regressing test scores for fifth graders on the student-teacher ratio for tenth graders in that district. Although the student-teacher ratio for elementary school students and tenth graders might be correlated, they are not the same, so this mix-up would lead to bias in the estimated coefficient. This is an example of errors-in-variables bias
because its source is an error in the measurement of the independent variable. This bias persists even in very large samples, so the OLS estimator is inconsistent if there is measurement error.

There are many possible sources of measurement error. If the data are collected through a survey, a respondent might give the wrong answer. For example, one question in the Current Population Survey involves last year's earnings. A respondent might not know his exact earnings, or he might misstate it for some other reason. If instead the data are obtained from computerized administrative records, there might have been typographical errors when the data were first entered.

To see that errors-in-variables results in correlation between the regressor and the error term, suppose there is a single regressor X_i (say, actual income) but that X_i is measured imprecisely by X̃_i (the respondent's estimate of income). Because X̃_i, not X_i, is observed, the regression equation actually estimated is the one based on X̃_i. Written in terms of the imprecisely measured variable X̃_i, the population regression equation Y_i = β0 + β1X_i + u_i is

Y_i = β0 + β1X̃_i + [β1(X_i − X̃_i) + u_i] = β0 + β1X̃_i + v_i,   (9.1)

where v_i = β1(X_i − X̃_i) + u_i. Thus the population regression equation written in terms of X̃_i has an error term that contains the difference between X̃_i and X_i. If this difference is correlated with the measured value X̃_i, then the regressor X̃_i will be correlated with the error term and β̂1 will be biased and inconsistent.

The precise size and direction of the bias in β̂1 depend on the correlation between X̃_i and (X_i − X̃_i). This correlation depends, in turn, on the specific nature of the measurement error. For example, suppose that the survey respondent provides her best guess or recollection of the actual value of the independent variable. A convenient way to represent this mathematically is to suppose that the measured value of X, X̃_i, equals the actual, unmeasured value, X_i, plus a purely random component, w_i, so X̃_i = X_i + w_i. Because the error is purely random, we might suppose that w_i has mean zero and variance σ²_w and is uncorrelated with X_i and the regression error u_i. Under this measurement error assumption, a bit of algebra² shows that β̂1 has the probability limit

β̂1 →p [σ²_X / (σ²_X + σ²_w)] β1.   (9.2)

²Under this assumption, v_i = −β1w_i + u_i, so cov(X̃_i, v_i) = cov(X_i + w_i, −β1w_i + u_i) = −β1σ²_w. Because var(X̃_i) = σ²_X + σ²_w, it follows that β̂1 →p β1 + cov(X̃_i, v_i)/var(X̃_i) = β1 − β1σ²_w/(σ²_X + σ²_w) = [σ²_X/(σ²_X + σ²_w)]β1.
I al. but is uncorrelaled with the measureme nt error. One such method is instrumental varia bles regression . The best way to solve the errorsin variables problem is to get an accurate measure of X. In the e xtre me case that the measurement error is SO large that essen tially no informa tion about X .i o .2) to com pute an estima lor of {3j that corrects for the downwa rd bias. f31' A lthough the result in Equation (9. ~1 will be biased toward 0. If the measured variable equals the actual va lue plus a mcanzero.'This bias dt:!pends on the nature of the measurement error and persists even if the sample size is large. Beca use the ratio <1! is less th an 1. 9. independently d islri buled measuremem error lerro. and its probability limit is given in Equation (9. remains. Because tRis approach requi res spe ci<tHzed knowledge about the nature of the measurement error. even in large samp les. even in large sa mples. the de tails ty picall y are specific 10 a gi\len data set and its measurem ent probJems an d we shall not pursue this approach furth er in this textbook . how ever. = 0 so ~\ .4 That is. and if she knows or can estima T e the ratio Q'~.2) is 0 and ~l conve rges in probability to O . the ratio of the va riances in the final exprcssion in Equation (9. U'. then the OLS estimator in a regression with a single righthand variable is biased toward ze ro.. if the meas ure ment imprecision has the effect of simply adding a random element 10 the actu al value of the independent variable. if possible... then ~1 is inco nsistent. to use 1he resulting fo rm ulas to adj ust the estimates. Errorsin ·variables bias is summari 7cd in Key Concept 9.2) is specifi c to this particular type of mea sureme nt error. econometri c methods can be used to mitigate errors. For example.2)..J!. This method is studied in Ch:lpter 12.9. then she can use Equation (9.2 Threats to Intemol Validity of Multiple Regression Aoofysis 321  ERRORSiNVARiABLES BiAS Ke\' CONCEPT I' .. 
It relies on having anothe r va riable (the " inslrumenlaJ" variable) that is correla ted with the aemal value X.:s when an independent varj· able is meas ured imprecisely.. when there is no measur~ment error. A second mel hod is to deve lop a math ematical model of lhe me asurement e rror and.variable ~ bias.4.Errors.". it illustrates the mo r~ general proposition that if the independent variab le is measured imprecisely th en the OLS estimator is biased. In the o ther extrem~. .invariables bias in the OLS estimator aris. If th is is impossible.. if a researcher belie ves that the measured variable is in fac t the sum of the actual value and a random measurement error term. <41' Solutions to errorsinvariables bias.
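The attenuation result in Equation (9.2) can be checked with a short simulation. The sketch below is illustrative only (the parameter values and variable names are our own, not from the text): it generates data with classical measurement error and shows that the OLS slope shrinks by the factor σ²X/(σ²X + σ²w).

```python
import numpy as np

# Illustrative simulation of classical measurement error:
# X_tilde = X + w, with w mean-zero and independent of X and u,
# attenuates the OLS slope by var(X) / (var(X) + var(w)), as in Eq. (9.2).
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 2.0
sigma_X, sigma_w = 1.0, 1.0            # equal variances -> attenuation factor 0.5

X = rng.normal(0.0, sigma_X, n)        # true regressor
u = rng.normal(0.0, 1.0, n)            # regression error
Y = beta0 + beta1 * X + u
X_tilde = X + rng.normal(0.0, sigma_w, n)  # mismeasured regressor

# OLS slope = sample cov(X_tilde, Y) / sample var(X_tilde)
beta1_hat = np.cov(X_tilde, Y)[0, 1] / np.var(X_tilde, ddof=1)
attenuation = sigma_X**2 / (sigma_X**2 + sigma_w**2)
# beta1_hat is close to attenuation * beta1 = 1.0, not to beta1 = 2.0
```

With equal signal and noise variances the estimated slope is roughly half the true slope, which is exactly the bias toward zero described in Key Concept 9.4.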
Sample Selection

Sample selection bias occurs when the availability of the data is influenced by a selection process that is related to the value of the dependent variable. This selection process can introduce correlation between the error term and the regressor, which leads to bias in the OLS estimator.

An example of sample selection bias in polling was given in a box in Chapter 3. In that example, the sample selection method (randomly selected phone numbers of automobile owners) was related to the dependent variable (who the individual supported for president in 1936), because in 1936 car owners with phones were more likely to be Republicans. The box "Do Stock Mutual Funds Outperform the Market?" provides an example of sample selection bias in financial economics.

An example of sample selection in economics arises in using a regression of wages on education to estimate the effect on wages of an additional year of education. Only individuals who have a job have wages. The factors (observable and unobservable) that determine whether someone has a job, such as education, experience, ability, luck, and where one lives, are similar to the factors that determine how much that person earns, when employed. Thus whether someone has a job is in part determined by the omitted variables in the error term in the wage regression. Said differently, the simple fact that someone has a job suggests that, all else equal, the error term in the wage equation for that person is positive, at least on average. Thus the fact that someone has a job, and thus appears in the data set, provides information that the error term in the regression is positive and could be correlated with the regressors. This too can lead to bias in the OLS estimator. Sample selection bias is summarized in Key Concept 9.5.

Solutions to selection bias. If data are collected from a population by simple random sampling, the sampling method (being drawn at random from the population) has nothing to do with the value of the dependent variable, and such sampling does not introduce bias. Sample selection that is unrelated to the value of the dependent variable does not introduce bias. The methods we have discussed so far cannot, however, eliminate sample selection bias that is related to the dependent variable. The methods for estimating models with sample selection are beyond the scope of this book. Those methods build on the techniques introduced in Chapter 11, where further references are provided.
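Sample selection bias can likewise be illustrated by simulation. In this hypothetical sketch (our own construction, not from the text), the population slope is 1, but observations are retained only when the dependent variable exceeds a threshold, a selection rule related to the value of Y; OLS on the selected sample is biased toward zero.

```python
import numpy as np

# Hypothetical illustration: selecting observations based on the value of
# the dependent variable induces correlation between the regressor and the
# error term in the selected sample, biasing the OLS slope.
rng = np.random.default_rng(3)
n = 200_000
X = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)
Y = 1.0 * X + u                          # population slope is 1

def ols_slope(x, y):
    """OLS slope = sample cov(x, y) / sample var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_full = ols_slope(X, Y)             # consistent: close to 1

keep = Y > 0                             # selection related to the value of Y
slope_selected = ols_slope(X[keep], Y[keep])  # biased toward zero
```

Conditioning on Y > 0 keeps, for low values of X, only observations with large positive errors, so X and u are negatively related in the retained sample and the fitted slope is flattened.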
Simultaneous Causality

So far, we have assumed that causality runs from the regressors to the dependent variable (X causes Y). But what if causality also runs from the dependent variable to one or more regressors (Y causes X)? If so, causality runs "backward" as well as forward; that is, there is simultaneous causality. If there is simultaneous causality, an OLS regression picks up both effects, so the OLS estimator is biased and inconsistent.

For example, our study of test scores focused on the effect on test scores of reducing the student-teacher ratio, so that causality is presumed to run from the student-teacher ratio to test scores. Suppose, however, that a government initiative subsidized hiring teachers in school districts with poor test scores. If so, causality would run in both directions: For the usual educational reasons, low student-teacher ratios would arguably lead to high test scores, but because of the government program, low test scores would lead to low student-teacher ratios. In this case, a negative error term in the population regression of test scores on the student-teacher ratio reduces test scores, but because of the government program it also leads to a decrease in the student-teacher ratio. Thus the student-teacher ratio is positively correlated with the error term in the population regression. This in turn leads to simultaneous causality bias and inconsistency of the OLS estimator.

This correlation between the error term and the regressor can be made precise mathematically by introducing an additional equation that describes the reverse causal link. For convenience, consider just the two variables X and Y and ignore other possible regressors. Accordingly, there are two equations, one in which X causes Y and one in which Y causes X:

Yi = β0 + β1Xi + ui,   (9.3)
Xi = γ0 + γ1Yi + vi.   (9.4)

Equation (9.3) is the familiar one in which β1 is the effect on Y of a change in X, where u represents other factors. Equation (9.4) represents the reverse causal effect of Y on X, where v represents other factors. In the test score example, Equation (9.3) represents the educational effect of class size on test scores, while Equation (9.4) represents the reverse causal effect of test scores on class size induced by the government program.
Simultaneous causality leads to correlation between Xi and the error term ui in Equation (9.3). To see this, imagine that ui is negative, which decreases Yi. However, this lower value of Yi affects the value of Xi through the second of these equations, and if γ1 is positive, a low value of Yi will lead to a low value of Xi. Thus Xi and ui will be positively correlated.

In terms of the mathematics, Equation (9.4) implies that cov(Xi, ui) = cov(γ0 + γ1Yi + vi, ui) = γ1cov(Yi, ui) + cov(vi, ui). Assuming that cov(vi, ui) = 0, by Equation (9.3) this in turn implies that cov(Xi, ui) = γ1cov(Yi, ui) = γ1cov(β0 + β1Xi + ui, ui) = γ1β1cov(Xi, ui) + γ1σ²u. Solving for cov(Xi, ui) then yields the result

cov(Xi, ui) = γ1σ²u / (1 − γ1β1).

Because this correlation can be expressed mathematically using two simultaneous equations, the simultaneous causality bias is sometimes called simultaneous equations bias. Simultaneous causality bias is summarized in Key Concept 9.6.

KEY CONCEPT 9.6  SIMULTANEOUS CAUSALITY BIAS

Simultaneous causality bias, also called simultaneous equations bias, arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X correlated with the error term in the population regression of interest.

Solutions to simultaneous causality bias. There are two ways to mitigate simultaneous causality bias. One is to use instrumental variables regression, the topic of Chapter 12. The second is to design and to implement a randomized controlled experiment in which the reverse causality channel is nullified, and such experiments are discussed in Chapter 13.

Sources of Inconsistency of OLS Standard Errors

Inconsistent standard errors pose a different threat to internal validity. Even if the OLS estimator is consistent and the sample is large, inconsistent standard errors will produce hypothesis tests with size that differs from the desired significance level and "95%" confidence intervals that fail to include the true value in 95% of repeated samples.
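The covariance formula above can be verified by simulation. In this illustrative sketch (parameter values are our own choices, not from the text), data are generated from the reduced form of the two-equation system, and both the implied cov(Xi, ui) and the resulting bias in the OLS slope appear as predicted.

```python
import numpy as np

# Illustrative simulation of simultaneous causality:
#   Y = b0 + b1*X + u   and   X = g0 + g1*Y + v.
# Substituting one equation into the other gives the reduced form for X,
# and cov(X, u) = g1*var(u) / (1 - g1*b1), so OLS on Y against X is biased.
rng = np.random.default_rng(1)
n = 400_000
b0, b1, g0, g1 = 0.0, 0.5, 0.0, 0.5

u = rng.normal(0.0, 1.0, n)
v = rng.normal(0.0, 1.0, n)
denom = 1.0 - b1 * g1
X = (g0 + g1 * b0 + v + g1 * u) / denom    # reduced form for X
Y = b0 + b1 * X + u                        # structural equation for Y

cov_Xu = np.cov(X, u)[0, 1]
cov_Xu_formula = g1 * 1.0 / (1.0 - g1 * b1)        # = 2/3 with these values
beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # plim is 0.8, not b1 = 0.5
```

The OLS slope converges to b1 + cov(X, u)/var(X) = 0.8 rather than to the structural coefficient 0.5, picking up both the forward and the reverse causal channels.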
There are two main reasons for inconsistent standard errors: improperly handled heteroskedasticity and correlation of the error term across observations.

Heteroskedasticity. As discussed in Section 5.4, for historical reasons some regression software report homoskedasticity-only standard errors. If, however, the regression error is heteroskedastic, those standard errors are not a reliable basis for hypothesis tests and confidence intervals. The solution to this problem is to use heteroskedasticity-robust standard errors and to construct F-statistics using a heteroskedasticity-robust variance estimator. Heteroskedasticity-robust standard errors are provided as an option in modern software packages.

Correlation of the error term across observations. In some settings, the population regression error can be correlated across observations. This will not happen if the data are obtained by sampling at random from the population, because the randomness of the sampling process ensures that the errors are independently distributed from one observation to the next. Sometimes, however, sampling is only partially random. The most common circumstance is when the data are repeated observations on the same entity over time, for example, the same school district for different years. If the omitted variables that constitute the regression error are persistent (like district demographics), then this induces "serial" correlation in the regression error over time. Serial correlation in the error term can arise in panel data (data on multiple districts for multiple years) and in time series data (data on a single district for multiple years). Another situation in which the error term can be correlated across observations is when sampling is based on a geographical unit. If there are omitted variables that reflect geographic influences, these omitted variables could result in correlation of the regression errors for adjacent observations.

Correlation of the regression error across observations does not make the OLS estimator biased or inconsistent, but it does violate the second least squares assumption in Key Concept 6.4. The consequence is that the OLS standard errors, both homoskedasticity-only and heteroskedasticity-robust, are incorrect in the sense that they do not produce confidence intervals with the desired confidence level. In many cases, this problem can be fixed by using an alternative formula for standard errors. We provide such a formula for computing standard errors that are robust to both heteroskedasticity and serial correlation in Chapter 10 (regression with panel data) and in Chapter 15 (regression with time series data).

Key Concept 9.7 summarizes the threats to internal validity of a multiple regression study.
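The gap between the two standard error formulas can be seen directly by computing both on heteroskedastic data. In this illustrative sketch (our own construction; the design var(u|X) = X² is chosen to make the discrepancy large), the homoskedasticity-only formula understates the sampling uncertainty by a factor of roughly √3.

```python
import numpy as np

# Illustrative comparison of homoskedasticity-only vs. heteroskedasticity-
# robust (Eicker-Huber-White) standard errors for a single-regressor OLS.
rng = np.random.default_rng(2)
n = 100_000
X = rng.normal(0.0, 1.0, n)
u = np.abs(X) * rng.normal(0.0, 1.0, n)   # heteroskedastic: var(u|X) = X**2
Y = 5.0 + 1.0 * X + u

x = X - X.mean()                          # demeaned regressor
beta1_hat = (x * Y).sum() / (x**2).sum()
resid = Y - Y.mean() - beta1_hat * x

# Homoskedasticity-only formula: s_u^2 / sum((X - Xbar)^2)
se_homosk = np.sqrt(resid.var(ddof=2) / (x**2).sum())
# Heteroskedasticity-robust formula: sum(x^2 * resid^2) / (sum(x^2))^2
se_robust = np.sqrt((x**2 * resid**2).sum() / (x**2).sum() ** 2)
```

Here the robust standard error is about 1.7 times the homoskedasticity-only one, so tests based on the latter would reject far too often.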
KEY CONCEPT 9.7  THREATS TO THE INTERNAL VALIDITY OF A MULTIPLE REGRESSION STUDY

There are five primary threats to the internal validity of a multiple regression study:

1. Omitted variables
2. Functional form misspecification
3. Errors-in-variables (measurement error in the regressors)
4. Sample selection
5. Simultaneous causality

Each of these, if present, results in failure of the first least squares assumption, E(ui | X1i, ..., Xki) ≠ 0, which in turn means that the OLS estimator is biased and inconsistent.

Incorrect calculation of the standard errors also poses a threat to internal validity. Homoskedasticity-only standard errors are invalid if heteroskedasticity is present. If the variables are not independent across observations, as can arise in panel and time series data, then a further adjustment to the standard error formula is needed to obtain valid standard errors.

Applying this list of threats to a multiple regression study provides a systematic way to assess the internal validity of that study.

9.3 Internal and External Validity When the Regression Is Used for Forecasting

Up to now, the discussion of multiple regression analysis has focused on the estimation of causal effects. Regression models can be used for other purposes, however, including forecasting. When regression models are used for forecasting, concerns about external validity are very important, but concerns about unbiased estimation of causal effects are not.

Using Regression Models for Forecasting

Chapter 4 began by considering the problem of a school superintendent who wants to know how much test scores would increase if she reduced class sizes in her school district; that is, the superintendent wants to know the causal effect on test scores of a change in class size. Accordingly, Chapters 4-8 focused on using regression analysis to estimate causal effects using observational data.

Now consider a different problem. A parent moving to a metropolitan area plans to choose where to live based in part on the quality of the local schools. The parent would like to know how different school districts perform on standardized tests. Suppose, however, that test score data are not available (perhaps they are confidential), but data on class sizes are. In this situation, the parent must guess at how well the different districts perform on standardized tests based on a limited amount of information. That is, the parent's problem is to forecast average test scores in a given district based on information related to test scores, in particular, class size.

How can the parent make this forecast? Recall the regression of test scores on the student-teacher ratio (STR) from Chapter 4:

TestScore = 698.9 − 2.28 × STR.   (9.5)

We concluded that this regression is not useful for the superintendent: The OLS estimator of the slope is biased because of omitted variables such as the composition of the student body and students' other learning opportunities outside school. Nevertheless, Equation (9.5) could be useful to the parent trying to choose a home. To be sure, class size is not the only determinant of test performance, but from the parent's perspective what matters is whether it is a reliable predictor of test performance. The parent interested in forecasting test scores does not care whether the coefficient in Equation (9.5) estimates the causal effect on test scores of class size. Rather, the parent simply wants the regression to explain much of the variation in test scores across districts and to be stable, that is, to apply to the districts to which the parent is considering moving. Although omitted variable bias renders Equation (9.5) useless for answering the causal question, it still can be useful for forecasting purposes.

More generally, regression models can produce reliable forecasts even if their coefficients have no causal interpretation. This recognition underlies much of the use of regression models for forecasting.

Assessing the Validity of Regression Models for Forecasting

Because the superintendent's problem and the parent's problem are conceptually very different, the requirements for the validity of the regression are different for their respective problems. To obtain credible estimates of causal effects, we must address the threats to internal validity summarized in Key Concept 9.7. In contrast, if we are to obtain reliable forecasts, the estimated regression must have good explanatory power, its coefficients must be estimated precisely, and it must be stable in the sense that the regression estimated on one set of data can be reliably used to make forecasts using other data. When a regression model is used for forecasting, a paramount concern is that the model is externally valid in the sense that it is stable and quantitatively applicable to the circumstance in which the forecast is made. In Part IV, we return to the problem of assessing the validity of a regression model for forecasting future values of time series data.

9.4 Example: Test Scores and Class Size

The framework of internal and external validity helps us to take a critical look at what we have learned, and what we have not, from our analysis of the California test score data.

External Validity

Whether the California analysis can be generalized, that is, whether it is externally valid, depends on the population and setting to which the generalization is made. Here, we consider whether the results can be generalized to performance on other standardized tests in other elementary public school districts in the United States.

Section 9.1 noted that having more than one study on the same topic provides an opportunity to assess the external validity of both studies by comparing their results. In the case of test scores and class size, other comparable data sets are, in fact, available. In this section, we examine a different data set, based on standardized test results for fourth graders in 220 public school districts in Massachusetts in 1998. Both the Massachusetts and California tests are broad measures of student knowledge and academic skills, although the details differ. Similarly, the organization of classroom instruction is broadly similar at the elementary school level in the two states (as it is in most U.S. elementary school districts), although aspects of elementary school funding and curriculum differ. Thus finding similar results about the effect of the student-teacher ratio on test performance in the California and Massachusetts data would be evidence of external validity of the findings in California. Conversely, finding different results in the two states would raise questions about the internal or external validity of at least one of the studies.
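The parent's forecasting problem from Section 9.3 comes down to plugging a district's class size into Equation (9.5). A minimal sketch (the district's student-teacher ratio of 20 is a made-up input, not from the text):

```python
# Forecast a district's average test score from Equation (9.5):
# TestScore = 698.9 - 2.28 * STR.
def forecast_test_score(str_ratio: float) -> float:
    """Forecast the average district test score from the student-teacher ratio."""
    return 698.9 - 2.28 * str_ratio

score = forecast_test_score(20.0)   # hypothetical district with STR = 20
```

Whether this forecast is any good depends not on the causal interpretation of the slope but on the regression's explanatory power and its stability across the districts the parent is comparing.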
Comparison of the California and Massachusetts data. Like the California data, the Massachusetts data are at the school district level. The definitions of the variables in the Massachusetts data set are the same as those in the California data set, or nearly so. More information on the Massachusetts data set, including definitions of the variables, is given in Appendix 9.1.

TABLE 9.1 Summary Statistics for California and Massachusetts Test Score Data Sets

                                   California              Massachusetts
                                   Average    Standard     Average    Standard
                                              Deviation               Deviation
Test scores                        654.1      19.1         709.8      15.1
Student-teacher ratio              19.6       1.9          17.3       2.3
% English learners                 15.8%      18.3%        1.1%       2.9%
% receiving lunch subsidy          44.7%      27.1%        15.3%      15.1%
Average district income ($)        $15,317    $7,226       $18,747    $5,808
Number of observations             420                     220
Year                               1999                    1998

Table 9.1 presents summary statistics for the California and Massachusetts samples. The average test score is higher in Massachusetts, but the test is different, so a direct comparison of scores is not appropriate. The average student-teacher ratio is higher in California (19.6 versus 17.3). Average district income is 20% higher in Massachusetts, but the standard deviation of income is greater in California; that is, there is a greater spread in average district incomes in California than in Massachusetts. The average percentage of students still learning English and the average percentage of students receiving subsidized lunches are both much higher in the California than in the Massachusetts districts.

Test scores and average district income. To save space, we do not present scatterplots of all the Massachusetts data. Because it was a focus in Chapter 8, however, it is interesting to examine the relationship between test scores and average district income in Massachusetts. This scatterplot is presented in Figure 9.1. The general pattern of this scatterplot is similar to that in Figure 8.2 for the California data: The relationship between income and test scores appears to be steep for low values of income and flatter for high values. Evidently, the linear regression plotted in the figure misses this apparent nonlinearity. Cubic and logarithmic regression functions are also plotted in Figure 9.1. The cubic regression function has a slightly higher R̄² than the logarithmic specification (0.486 versus 0.455).
[Figure 9.1 Test Scores vs. Income for Massachusetts Data. The estimated linear regression function does not capture the nonlinear relation between income and test scores in the Massachusetts data. The estimated linear-log and cubic regression functions are similar for district incomes between $13,000 and $30,000, the region containing most observations. Horizontal axis: district income (thousands of dollars); vertical axis: test score.]

Comparing Figures 8.7 and 9.1 shows that the general pattern of nonlinearity found in the California income and test score data is also present in the Massachusetts data. The precise functional forms that best describe this nonlinearity differ, however, with the cubic specification fitting best in Massachusetts but the linear-log specification fitting best in California.

Multiple regression results. Regression results for the Massachusetts data are presented in Table 9.2. The first regression, reported in column (1) in the table, has only the student-teacher ratio as a regressor. The slope is negative (−1.72), and the hypothesis that the coefficient is zero can be rejected at the 1% significance level (t = −1.72/0.50 = −3.44). The remaining columns report the results of including additional variables that control for student characteristics and of introducing nonlinearities into the estimated regression function. Controlling for the percentage of English learners, the percentage of students eligible for a free lunch, and average district income reduces the estimated coefficient on the student-teacher ratio by 60%, from −1.72 in regression (1) to −0.69 in regression (2) and −0.64 in regression (3).
TABLE 9.2 Multiple Regression Estimates of the Student-Teacher Ratio and Test Scores: Data from Massachusetts

Dependent variable: average combined English, math, and science test score in the school district, fourth grade; 220 observations. The six specifications differ in their control variables and functional form: column (1) includes only the student-teacher ratio (STR); column (2) adds the percentage of English learners, the percentage eligible for a free lunch, and the logarithm of district income; column (3) replaces the logarithm of income with a cubic in income; column (4) adds STR² and STR³; column (5) replaces the percentage of English learners with a binary variable HiEL (equal to 1 if the district's percentage of English learners is above the median) and its interaction HiEL × STR; and column (6) drops the percentage of English learners. The estimated coefficients on STR are −1.72 (0.50) in column (1), −0.69 (0.27) in column (2), −0.64 (0.27) in column (3), and −0.67 (0.27) in column (6), with heteroskedasticity-robust standard errors in parentheses. These regressions were estimated using the data on Massachusetts elementary school districts described in Appendix 9.1. F-statistics and p-values testing the exclusion of groups of variables are reported in the table; individual coefficients are statistically significant at the *5% or **1% significance level.

Comparing the R̄²'s of regressions (2) and (3) indicates that the cubic specification (3) provides a better model of the relationship between test scores and income than does the logarithmic specification (2), even holding constant the student-teacher ratio. There is no statistically significant evidence of a nonlinear relationship between test scores and the student-teacher ratio: The F-statistic in regression (4) testing whether the population coefficients on STR² and STR³ are zero has a p-value of 0.641. Similarly, there is no evidence that a reduction in the student-teacher ratio has a different effect in districts with many English learners than with few (the coefficient on HiEL × STR in regression (5) is not statistically significant). Finally, regression (6) shows that the estimated coefficient on the student-teacher ratio does not change substantially when the percentage of English learners [which is insignificant in regression (3)] is excluded. In short, the results in regression (3) are not sensitive to the changes in functional form and specification considered in regressions (4)-(6) in Table 9.2. Therefore we adopt regression (3) as our base estimate of the effect on test scores of a change in the student-teacher ratio based on the Massachusetts data.

Comparison of Massachusetts and California results. For the California data, we found:

1. Adding variables that control for student background characteristics reduced the coefficient on the student-teacher ratio from −2.28 [Table 7.1, regression (1)] to −0.73 [Table 8.3, regression (2)], a reduction of 68%.
2. The hypothesis that the true coefficient on the student-teacher ratio is zero was rejected at the 1% significance level, even after adding variables that control for student background and district economic characteristics.
3. The effect of cutting the student-teacher ratio did not depend in an important way on the percentage of English learners in the district.
4. There is some evidence that the relationship between test scores and the student-teacher ratio is nonlinear.
Do we find the same things in Massachusetts? For findings (1), (2), and (3), the answer is yes. Including the additional control variables reduces the coefficient on the student-teacher ratio from −1.72 [Table 9.2, regression (1)] to −0.69 [Table 9.2, regression (2)], a reduction of 60%. The coefficients on the student-teacher ratio remain significant after adding the control variables. Those coefficients are only significant at the 5% level in the Massachusetts data, whereas they are significant at the 1% level in the California data. However, there are nearly twice as many observations in the California data, so it is not surprising that the California estimates are more precise. As in the California data, there is no statistically significant evidence in the Massachusetts data of an interaction between the student-teacher ratio and the binary variable indicating a large percentage of English learners in the district. Finding (4), however, does not hold up in the Massachusetts data: The hypothesis that the relationship between the student-teacher ratio and test scores is linear cannot be rejected at the 5% significance level when tested against a cubic specification.

Because the two standardized tests are different, the coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the same as one point on the California test. If, however, the test scores are put into the same units, then the estimated class size effects can be compared. One way to do this is to transform the test scores by standardizing them: Subtract the sample average and divide by the standard deviation so that they have a mean of 0 and a variance of 1. The slope coefficients in the regression with the standardized test score equal the slope coefficients in the original regression divided by the standard deviation of the test. Thus the coefficient on the student-teacher ratio, divided by the standard deviation of test scores, can be compared across the two data sets.

This comparison is undertaken in Table 9.3. The first column reports the OLS estimates of the coefficient on the student-teacher ratio in a regression with the percentage of English learners, the percentage of students eligible for a free lunch, and the average district income included as control variables. The second column reports the standard deviation of the test scores across districts. The final two columns report the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher (our superintendent's proposal), first in the units of the test and second in standard deviation units. For the linear specification, the OLS coefficient estimate using California data is −0.73, so cutting the student-teacher ratio by two is estimated to increase district test scores by 0.73 × 2 = 1.46 points. Because the standard deviation of test scores is 19.1 points, this corresponds to 1.46/19.1 = 0.076 standard deviations of the distribution of test scores across districts. The standard error of this estimate is 0.26 × 2/19.1 = 0.027.
TIle non linea r models fo r Californ ia data suggest a somewhat larger effect.085 standard deviation uni t.. Reducing the student. In the Cal ifornia data. The estimuted effect from the linear modd is just over oneIent h Ihis size.3.027.0. .46 (0. fo r example.037) ruN.thle 8. Cutt ing the student. Based on the Massachusctts data.73 (11.64 (0. with the specific effect depend ing on the in itial stu dent. on the Te. with a standard error of 0. CaHtom io Standard Devtarion of T.s' Stores Across Dislri<'.0.1)..: T.teacher ratio is predicted to raise tcS! scores..26) 19.076 (0.1 1. CUll ing the Slu dent tcacher.2 / 19.099 (0. l. 0.·en In r arcnlh . 1) standard deviations.27) 15.teacher ra lio by two is a large change for a district.70) 190 (0. Ratios and Test Scores: Comparing the Estimates from California and Mossochusetts htimoted Effed of T_ F.036.teac he r ratio. so:.027.027) 0 153 (0.lnd .4 Example: Test Scores and dau Size 335 TABLE 9. In Units of: OLS Estimate 8Jr.1 19.' Deviotton* unear:Table 8..036) Lme!lrTahlc 9. or 0.ralio by twO would move a district on ly one ·tent h of the way from the med ian to the 75 th percentile of the distribUlion of test scores across d istricts.3(2) .3 StudentTeach. Based on the lin ea r model using Ca lifornia data. but the predicted improvement is sma ll.1 j 1. r Standard Pain'. arc small.085 (OJI36) to.soc hu Mtt$ 19.69) 1 0. the differe nce in test scores bet\\cen the medi an district and a diSlricl a t the 75 1h pe rcentile is 12.lble 8 3(7) RtJII('t oS (R Irom 22 11120 Mo. The estimated eff~c ls for the non linea r mode ls a nd thejr Slanda rd errOrs w~re computed using the method described in Section 8. nJ ~nQn Ire &I.w~ Stvdenf$ pet' TeeKher.076 standard deviation unit . this c:slimaled dfecl is 0.2 test score points (Table 4. but the estim ated benefits shown in Ta ble 9.54) !:>t.64 ( = 12.9. per T eacher is es lim at~d to increase T wilh B standard erro r ofO. 
while nonzero.2(3) ..1 2.52) 1 0.according to this esti mate.2H 0. These estimates arc essen tially the samc.93 (0. in other words. a red uction of two students est scores by 0.3(7) Rt rillCt srR from 2f)w 18 Cui"llc·1'.
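The conversion into standard deviation units described above is simple arithmetic, and can be sketched in a few lines. The point estimates and standard deviations below are the ones quoted from Table 9.3; the function name is ours, introduced only for illustration.

```python
def effect_in_sd_units(coef_on_str, delta_str, sd_test_scores):
    """Predicted change in test scores, in standard deviation units,
    from changing the student-teacher ratio by delta_str."""
    return coef_on_str * delta_str / sd_test_scores

# California linear model: coefficient -0.73, SD of test scores 19.1
ca = effect_in_sd_units(-0.73, -2, 19.1)   # cut STR by two students
# Massachusetts linear model: coefficient -0.64, SD of test scores 15.1
ma = effect_in_sd_units(-0.64, -2, 15.1)

print(round(ca, 3))  # 0.076 standard deviation units
print(round(ma, 3))  # 0.085 standard deviation units
```

Expressing both effects in standard deviation units is what makes the two states' estimates directly comparable despite the different tests.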
Internal Validity

The similarity of the results for California and Massachusetts does not ensure their internal validity. Section 9.2 listed five possible threats to internal validity that could induce bias in the estimated effect on test scores of class size. We consider these threats in turn.

Omitted variables. The multiple regressions reported in this and previous chapters control for a student characteristic (the percentage of English learners), a family economic characteristic (the percentage of students receiving a subsidized lunch), and a broader measure of the affluence of the district (average district income).

Possible omitted variables remain, however. If the student-teacher ratio is correlated with teacher quality (perhaps because better teachers are attracted to schools with smaller student-teacher ratios), and if teacher quality affects test scores, then omission of teacher quality could bias the coefficient on the student-teacher ratio. Similarly, districts with a low student-teacher ratio might attract families that are more committed to enhancing their children's learning at home, and districts with a low student-teacher ratio might also offer many extracurricular learning opportunities. Such omitted factors could lead to omitted variable bias.

One way to eliminate omitted variable bias, at least in theory, is to conduct an experiment. For example, students could be randomly assigned to different size classes, and their subsequent performance on standardized tests could be compared. Such a study was in fact conducted in Tennessee, and we examine it in Chapter 13.

Functional form. The analysis here and in Chapter 8 explored a variety of functional forms. We found that some of the possible nonlinearities investigated were not statistically significant, while those that were did not substantially alter the estimated effect of reducing the student-teacher ratio. Although further functional form analysis could be carried out, this suggests that the main findings of these studies are unlikely to be sensitive to using different nonlinear regression specifications.

Errors in variables. The average student-teacher ratio in the district is a broad and potentially inaccurate measure of class size. For example, because students move in and out of districts, the student-teacher ratio might not accurately represent the actual class sizes experienced by the students taking the test, which in turn could lead to the estimated class size effect being biased toward zero.

Another variable with potential measurement error is average district income. Those data were taken from the 1990 census, while the other data pertain to 1998 (Massachusetts) or 1999 (California). If the economic composition of the district changed substantially over the 1990s, this would be an imprecise measure of the actual average district income.

Selection. The California and the Massachusetts data cover all the public elementary school districts in the state that satisfy minimum size restrictions, so there is no reason to believe that sample selection is a problem here.

Simultaneous causality. Simultaneous causality would arise if the performance on standardized tests affected the student-teacher ratio. This could happen, for example, if there is a bureaucratic or political mechanism for increasing the funding of poorly performing schools or districts, which in turn resulted in hiring more teachers. In Massachusetts, no such mechanism for equalization of school financing was in place during the time of these tests. In California, a series of court cases led to some equalization of funding, but this redistribution of funds was not based on student achievement. Thus in neither Massachusetts nor California does simultaneous causality appear to be a problem.

Heteroskedasticity and correlation of the error term across observations. All of the results reported here and in earlier chapters use heteroskedasticity-robust standard errors, so heteroskedasticity does not threaten internal validity. Correlation of the error term across observations, however, could threaten the consistency of the standard errors because simple random sampling was not used (the sample consists of all elementary school districts in the state). Although there are alternative standard error formulas that could be applied to this situation, the details are complicated and specialized and we leave them to more advanced texts.
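The heteroskedasticity-robust standard errors referred to above can be computed directly from the OLS residuals. The sketch below simulates heteroskedastic data and applies the HC1 "sandwich" formula; it is an illustration under invented parameter values, not the textbook's own computation or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(14, 26, n)               # e.g. hypothetical student-teacher ratios
u = rng.normal(0, 0.5 * x, n)            # error variance grows with x: heteroskedastic
y = 700 - 1.0 * x + u                    # "true" slope is -1.0

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimates (intercept, slope)
resid = y - X @ beta

# Heteroskedasticity-robust "sandwich" variance, with the HC1
# small-sample scaling n/(n - k) for k = 2 estimated coefficients
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * resid[:, None] ** 2).T @ X
cov_hc1 = n / (n - 2) * XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(cov_hc1))

print(beta)       # slope estimate should be near -1.0
print(robust_se)  # robust standard errors for intercept and slope
```

Unlike the homoskedasticity-only formula, this variance estimator remains consistent when the error variance depends on the regressors.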
Discussion and Implications

The similarity between the Massachusetts and California results suggests that these studies are externally valid, in the sense that the main findings can be generalized to performance on standardized tests at other elementary school districts in the United States.

Some of the most important potential threats to internal validity have been addressed by controlling for student background, family economic background, and district affluence, and by checking for nonlinearities in the regression function. Still, some potential threats to internal validity remain. A leading candidate is omitted variable bias, perhaps arising because the control variables do not capture other characteristics of the school districts or extracurricular learning opportunities.

Based on both the California and the Massachusetts data, we are able to answer the superintendent's question from Section 4.1: After controlling for family economic background, student characteristics, and district affluence, and after modeling nonlinearities in the regression function, cutting the student-teacher ratio by two students per teacher is predicted to increase test scores by approximately 0.08 standard deviation of the distribution of test scores across districts. This effect is statistically significant, but it is quite small. This small estimated effect is in line with the results of the many studies that have investigated the effects on test scores of class size reductions.4

The superintendent can now use this estimate to help her decide whether to reduce class sizes. In making this decision, she will need to weigh the costs of the proposed reduction against the benefits. The costs include teacher salaries and expenses for additional classrooms. The benefits include improved academic performance, which we have measured by performance on standardized tests, but there are other potential benefits that we have not studied, including lower dropout rates and enhanced future earnings. The estimated effect of the proposal on standardized test performance is one important input into her calculation of costs and benefits.

4 If you are interested in learning more about the relationship between class size and test scores, see the reviews by Ehrenberg, Brewer, Gamoran, and Willms (2001a, 2001b).

9.5 Conclusion

The concepts of internal and external validity provide a framework for assessing what has been learned from an econometric study.

A study based on multiple regression is internally valid if the estimated coefficients are unbiased and consistent, and if standard errors are consistent. Threats to the internal validity of such a study include omitted variables, misspecification of functional form (nonlinearities), imprecise measurement of the independent variables (errors in variables), sample selection, and simultaneous causality. Each of these introduces correlation between the regressor and the error term, which in turn makes OLS estimators biased and inconsistent. If the errors are correlated across observations, as they can be with time series data, or if they are heteroskedastic but the standard errors are computed using the homoskedasticity-only formula, then internal validity is compromised because the standard errors will be inconsistent. These latter problems can be addressed by computing the standard errors properly.

A study using regression analysis, like any statistical study, is externally valid if its findings can be generalized beyond the population and setting studied. Sometimes it can help to compare two or more studies on the same topic. Whether or not there are two or more such studies, however, assessing external validity requires making judgments about the similarities of the population and setting studied and the population and setting to which the results are being generalized.

The next two parts of this textbook develop ways to address threats to internal validity that cannot be mitigated by multiple regression analysis alone. Part III extends the multiple regression model in ways designed to mitigate all five sources of potential bias in the OLS estimator; Part III also discusses a different approach to obtaining internal validity, randomized controlled experiments. Part IV develops methods for analyzing time series data and for using time series data to estimate so-called dynamic causal effects, which are causal effects that vary over time.

Summary

1. Statistical studies are evaluated by asking whether the analysis is internally and externally valid. A study is internally valid if the statistical inferences about causal effects are valid for the population being studied. A study is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

2. In regression estimation of causal effects, there are two types of threats to internal validity. First, OLS estimators will be inconsistent if the regressors and error terms are correlated. Second, confidence intervals and hypothesis tests are not valid when the standard errors are incorrect.

3. Regressors and error terms may be correlated when there are omitted variables, an incorrect functional form is used, one or more of the regressors is measured with error, the sample is chosen nonrandomly from the population, or there is simultaneous causality between the regressors and dependent variables.
4. Standard errors are incorrect when the errors are heteroskedastic and the computer software uses the homoskedasticity-only standard errors, or when the error term is correlated across different observations.

5. When regression models are used solely for forecasting, it is not necessary for the regression coefficients to be unbiased estimates of causal effects. It is critical, however, that the regression model be externally valid for the forecasting application at hand.

Key Terms

internal validity (313)
external validity (313)
population studied (313)
population of interest (313)
setting (315)
functional form misspecification (317)
errors-in-variables bias (319)
sample selection bias (322)
simultaneous causality (324)
simultaneous equations bias (325)

Review the Concepts

9.1 What is the difference between internal and external validity? Between the population studied and the population of interest?

9.2 Key Concept 9.2 describes the problem of variable selection in terms of a trade-off between bias and variance. What is this trade-off? Why could including an additional regressor decrease bias? Increase variance?

9.3 Economic variables are often measured with error. Does this mean that regression analysis is unreliable? Explain.

9.4 Suppose that a state offered voluntary standardized tests to all of its third graders, and that these data were used in a study of class size on student performance. Explain how sample selection bias might invalidate the results.

9.5 A researcher estimates the effect on crime rates of spending on police by using city-level data. Explain how simultaneous causality might invalidate the results.

9.6 A researcher estimates a regression using two different software packages. The first uses the homoskedasticity-only formula for standard errors. The second uses the heteroskedasticity-robust formula. The standard errors are very different. Which should the researcher use? Why?
Exercises

9.1 Suppose that you have just read a careful statistical study of the effect of advertising on the demand for cigarettes. Using data from New York during the 1970s, it concluded that advertising on buses and subways was more effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the 1970s; Los Angeles in the 1970s; New York in 2006.

9.2 Consider the one-variable regression model Yi = β0 + β1Xi + ui, and suppose that it satisfies the assumptions in Key Concept 4.3. Suppose that Yi is measured with error, so that the data are Ỹi = Yi + wi, where wi is the measurement error, which is i.i.d. and independent of Yi and Xi. Consider the population regression Ỹi = β0 + β1Xi + vi, where vi is the regression error using the mismeasured dependent variable, Ỹi.
a. Show that vi = ui + wi.
b. Show that the regression Ỹi = β0 + β1Xi + vi satisfies the assumptions in Key Concept 4.3. (Assume that wi is independent of Yj and Xj for all values of i and j and has a finite fourth moment.)
c. Are the OLS estimators consistent?
d. Can confidence intervals be constructed in the usual way?
e. Evaluate these statements: "Measurement error in the X's is a serious problem. Measurement error in Y is not."

9.3 Labor economists studying the determinants of women's earnings discovered a puzzling empirical result. Using randomly selected employed women, they regressed earnings on the women's number of children and a set of control variables (age, education, occupation, and so forth). They found that women with more children had higher wages, controlling for these other factors. Explain how sample selection might be the cause of this result. (Hint: Notice that the sample includes only women who are working.) (This empirical puzzle motivated James Heckman's research on sample selection that led to his 2000 Nobel Prize in economics.)

9.4 Using the regressions shown in column (2) of Table 8.3 and column (2) of Table 9.2, construct a table like Table 9.3 to compare the estimated effects of a 10% increase in district income on test scores in California and Massachusetts.
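The contrast in Exercise 9.2(e) between measurement error in Y and measurement error in X can be illustrated by simulation. The sketch below uses invented parameter values: classical measurement error in X biases the OLS slope toward zero (attenuation), while the same error added to Y leaves the slope essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta1 = 2.0                                  # "true" slope, chosen arbitrarily
x = rng.normal(0, 1, n)
y = 1.0 + beta1 * x + rng.normal(0, 1, n)

def ols_slope(x, y):
    """Bivariate OLS slope: sample covariance over sample variance."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

x_noisy = x + rng.normal(0, 1, n)            # error in X, var(w) = var(x) = 1
y_noisy = y + rng.normal(0, 1, n)            # error in Y

print(ols_slope(x, y))        # close to 2.0
print(ols_slope(x_noisy, y))  # close to 2.0 * 1/(1+1) = 1.0: attenuation bias
print(ols_slope(x, y_noisy))  # still close to 2.0: slope not biased
```

The attenuation factor is var(X) / [var(X) + var(w)], here 1/2, which is why the second slope is roughly half the true value.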
9.5 The demand for a commodity is given by Q = β0 + β1P + u, where Q denotes quantity, P denotes price, and u denotes factors other than price that determine demand. Supply for the commodity is given by Q = γ0 + γ1P + v, where v denotes factors other than price that determine supply. (Suppose that u and v both have a mean of zero, have variances σu² and σv², and are mutually uncorrelated.)
a. Solve the two simultaneous equations to show how Q and P depend on u and v.
b. Derive the means of P and Q.
c. Derive the variance of P, the variance of Q, and the covariance between Q and P.
d. A random sample of observations of (Qi, Pi) is collected, and Qi is regressed on Pi. (That is, Qi is the regressand and Pi is the regressor.) Suppose that the sample is very large.
   i. Use your answers to (b) and (c) to derive values of the regression coefficients. [Hint: Use Equations (4.7) and (4.8).]
   ii. A researcher uses the slope of this regression as an estimate of the slope of the demand function (β1). Is the estimated slope too large or too small? (Hint: Use the fact that demand curves slope down and supply curves slope up.)

9.6 Suppose that n = 100 i.i.d. observations for (Yi, Xi) yield the following regression results:

Ŷ = 32.1 + 66.8X, SER = 15.1, R² = 0.81.
    (15.1)  (12.2)

Another researcher is interested in the same regression, but he makes an error when he enters the data into his regression program: He enters each observation twice, so he has 200 observations (with observation 1 entered twice, observation 2 entered twice, and so forth).
a. Using these 200 observations, what results will be produced by his regression program?

Ŷ = ___ + ___X, SER = ___, R² = ___.
    (___)  (___)

(Hint: Write the "incorrect" values of the sample means, variances, and covariances of Y and X as functions of the "correct" values. Use these to determine the regression statistics.)
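The algebra in Exercise 9.5 can be checked numerically. In the simulation below (with arbitrary parameter values), the large-sample OLS slope of Q on P equals (β1σv² + γ1σu²)/(σu² + σv²), a mixture of the demand and supply slopes rather than the demand slope β1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta0, beta1 = 10.0, -1.0     # demand: Q = beta0 + beta1*P + u
gamma0, gamma1 = 2.0, 1.0     # supply: Q = gamma0 + gamma1*P + v
u = rng.normal(0, 1, n)       # demand shocks, variance 1
v = rng.normal(0, 2, n)       # supply shocks, variance 4

# Equilibrium: set demand equal to supply and solve for P, then Q
p = (gamma0 - beta0 + v - u) / (beta1 - gamma1)
q = beta0 + beta1 * p + u

slope = np.cov(p, q)[0, 1] / np.var(p, ddof=1)
# Probability limit: (beta1*4 + gamma1*1) / (1 + 4) = -0.6, not beta1 = -1
print(slope)
```

Because demand shocks u move along the supply curve and supply shocks v move along the demand curve, the regression traces out neither curve; this is the simultaneous equations bias discussed in Section 9.2.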
9.7 Are the following statements true or false? Explain your answer.
a. "An ordinary least squares regression of Y onto X will be internally inconsistent if X is correlated with the error term."
b. "Each of the five primary threats to internal validity implies that X is correlated with the error term."

9.8 Would the regression in Equation (9.5) be useful for predicting test scores in a school district in Massachusetts? Why or why not?

9.9 Consider the linear regression of TestScore on Income shown in Figure 8.2 and the nonlinear regression in Equation (8.18).
a. Would either of these regressions provide a reliable estimate of the effect of income on test scores? Would either of these regressions provide a reliable method for forecasting test scores? Explain.
b. Which (if any) of the internal validity conditions are violated?

9.10 Read the box "The Returns to Education and the Gender Gap" in Section 8.3. Discuss the internal and external validity of the estimated effect of education on earnings.

9.11 Read the box "The Demand for Economics Journals" in Section 8.3. Discuss the internal and external validity of the estimated effect of price per citation on subscriptions.

Empirical Exercises

E9.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.1(l). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors in variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.
b. The data set CPS92_04 described in Empirical Exercise 3.1 includes data from 2004 and 1992. Use these data to investigate the (temporal) external validity of the conclusions that you reached in Empirical Exercise 8.1(l). [Note: Remember to adjust for inflation as explained in Empirical Exercise 3.1(b).]
E9.2 A committee on improving undergraduate teaching at your college needs your help before reporting to the Dean. The committee seeks your advice, as an econometric expert, about whether your college should take physical appearance into account when hiring teaching faculty. (This is legal as long as doing so is blind to race, religion, age, and gender.) You do not have time to collect your own data, so you must base your recommendations on the analysis of the data set TeachingRatings described in Empirical Exercise 4.2 that has served as the basis for several Empirical Exercises in Part II of the text. Based on your analysis of these data, what is your advice? Justify your advice based on a careful and complete assessment of the internal and external validity of the regressions that you carried out to answer the Empirical Exercises using these data in earlier chapters.

E9.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.3(i). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors in variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.
b. The data set CollegeDistance excluded students from western states; data for these students are included in the data set CollegeDistanceWest. Use these data to investigate the (geographic) external validity of the conclusions that you reached in Empirical Exercise 8.3(i).

APPENDIX 9.1  The Massachusetts Elementary School Testing Data

The Massachusetts data are districtwide averages for public elementary school districts in 1998. The test score is taken from the Massachusetts Comprehensive Assessment System (MCAS) test administered to all fourth graders in Massachusetts public schools in the spring of 1998. The test is sponsored by the Massachusetts Department of Education and is mandatory for all public schools. The data analyzed here are the overall total score, which is the sum of the scores on the English, math, and science portions of the test.

Data on the student-teacher ratio, the percentage of students receiving a subsidized lunch, and the percentage of students still learning English are averages for each elementary school district for the 1997-1998 school year and were obtained from the Massachusetts Department of Education. Data on average district income were obtained from the 1990 U.S. Census.
PART THREE  Further Topics in Regression Analysis

CHAPTER 10  Regression with Panel Data
CHAPTER 11  Regression with a Binary Dependent Variable
CHAPTER 12  Instrumental Variables Regression
CHAPTER 13  Experiments and Quasi-Experiments
CHAPTER 10  Regression with Panel Data

Multiple regression is a powerful tool for controlling for the effect of variables on which we have data. If data are not available for some of the variables, however, they cannot be included in the regression, and the OLS estimators of the regression coefficients could have omitted variable bias.

This chapter describes a method for controlling for some types of omitted variables without actually observing them. This method requires a specific type of data, called panel data, in which each observational unit, or entity, is observed at two or more time periods. By studying changes in the dependent variable over time, it is possible to eliminate the effect of omitted variables that differ across entities but are constant over time.

The empirical application in this chapter concerns drunk driving: What are the effects of alcohol taxes and drunk driving laws on traffic fatalities? We address this question using data on traffic fatalities, alcohol taxes, drunk driving laws, and related variables for the 48 contiguous U.S. states for each of the seven years from 1982 to 1988. This panel data set lets us control for unobserved variables that differ from one state to the next, such as prevailing cultural attitudes toward drinking and driving, but do not change over time. It also allows us to control for variables that vary through time, like improvements in the safety of new cars, but do not vary across states.

Section 10.1 describes the structure of panel data and introduces the traffic fatality data set. Fixed effects regression, the main tool for regression analysis of panel data, is an extension of multiple regression that exploits panel data to control for variables that differ across entities but are constant over time. Fixed effects regression is introduced in Sections 10.2 and 10.3, first for the case of only two time periods, then for multiple time periods. In Section 10.4, these methods are extended to incorporate so-called time fixed effects, which control for unobserved variables that are constant across entities but change over time. Section 10.5 discusses the panel data regression assumptions and standard errors for panel data regression. In Section 10.6, we use these methods to study the effect of alcohol taxes and drunk driving laws on traffic deaths.
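The structure just described, each entity observed at two or more time periods, is commonly stored with one row per (entity, time period) pair. The sketch below uses invented state abbreviations and numbers purely to make the layout concrete, and checks whether such a panel is balanced.

```python
from collections import defaultdict

# One record per (entity i, time period t): (state, year, fatality rate)
records = [
    ("AL", 1982, 2.13), ("AL", 1983, 2.35),
    ("AZ", 1982, 2.50), ("AZ", 1983, 2.27),
]

years_by_state = defaultdict(set)
for state, year, _ in records:
    years_by_state[state].add(year)

# A balanced panel observes every entity in every time period,
# with no duplicate (entity, period) pairs
all_years = set().union(*years_by_state.values())
is_balanced = (
    all(years == all_years for years in years_by_state.values())
    and len(records) == sum(len(y) for y in years_by_state.values())
)
print(is_balanced)  # True: n = 2 entities, T = 2 periods, n*T = 4 rows
```

With n entities and T periods, a balanced panel has exactly n × T rows, which is how the 48-state, 7-year fatality data set reaches 336 observations.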
10.1 Panel Data

Recall from Section 1.3 that panel data (also called longitudinal data) refers to data for n different entities observed at T different time periods. The state traffic fatality data studied in this chapter are panel data. Those data are for n = 48 entities (states), where each entity is observed in T = 7 time periods (each of the years 1982, ..., 1988), for a total of 7 × 48 = 336 observations.

When describing cross-sectional data it was useful to use a subscript to denote the entity; for example, Yi referred to the variable Y for the ith entity. When describing panel data, we need some additional notation to keep track of both the entity and the time period. This is done by using two subscripts rather than one: The first, i, refers to the entity, and the second, t, refers to the time period of the observation. Thus Yit denotes the variable Y observed for the ith of n entities in the tth of T periods. This notation is summarized in Key Concept 10.1.

KEY CONCEPT 10.1  Notation for Panel Data

Panel data consist of observations on the same n entities at two or more time periods T. If the data set contains observations on the variables X and Y, then the data are denoted

(Xit, Yit), i = 1, ..., n and t = 1, ..., T,    (10.1)

where the first subscript, i, refers to the entity being observed, and the second subscript, t, refers to the date at which it is observed.

Some additional terminology associated with panel data describes whether some observations are missing. A balanced panel has all its observations; that is, the variables are observed for each entity and each time period. A panel that has some missing data for at least one time period for at least one entity is called an unbalanced panel. The traffic fatality data set has data for all 48 U.S. states for all seven years, so it is balanced. If, however, some data were missing (for example, if we did not have data on fatalities for some states in 1983), then the data set would be unbalanced. The methods presented in this chapter are described for a balanced panel; however, all these methods can be used with an unbalanced panel, although precisely how to do so in practice depends on the regression software being used.
' lne measure of traffic ".x: fat a lity ra te and the rea l tax on a case of bee r. ]) ~ : In this cha pte r..000 highway traffic fatalities each year in the United Stales... (.e depends on the regress.n pract.=iT "I the 10% Ic\'... The pan el data set \Jriables related to traffic fatalities and alcOhol.1a is a scatterpl ot of the d:l1a for 1982 on two of these varia bles.. ho has not been drinki n!!:. some dala were missing (for example. OIS8eerTax l1982da ta ) po!I. I Panel Dote 3 St some mis.} given sta te....e _PJ (015) (9. if {.~ 1 _''~ ( IO.!in& data for at least onc time period (or at least one entity is called an unbalanced p.!r of traffic fatalities in eacb state in each year.. . wc s ludy how effec tive va rious government polic ies d~sigDed to discourage drunk dri ving aClUally are in reducing traffi c deaths. however. the . a nd that a dri\c r who is legally drunk is a t least 13 times as lik elv to ca use a (a lai nash as a d river \.~ .ltlVC.Appendix ... to 1 41'1 dr.
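The inflation adjustment described above is a simple CPI ratio. As a hypothetical sketch (the CPI numbers below are placeholders chosen only so that the ratio reproduces the text's $1.00-to-$1.23 example; they are not the actual published CPI values):

```python
def to_1988_dollars(nominal_tax, cpi_then, cpi_1988):
    """Deflate a nominal tax into 1988 dollars using the CPI ratio.
    The CPI arguments are placeholders, not the actual published series."""
    return nominal_tax * (cpi_1988 / cpi_then)
```

With a placeholder CPI of 100.0 in the earlier year and 123.0 in 1988, a $1.00 tax becomes $1.23 in 1988 dollars, matching the example in the text.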
FIGURE 10.1 The Traffic Fatality Rate and the Tax on Beer

Panel a is a scatterplot of traffic fatality rates and the real tax on a case of beer (in 1988 dollars) for 48 states in 1982; panel b shows the data for 1988. Both plots show a positive relationship between the fatality rate and the real beer tax. The plotted regression lines are FatalityRate-hat = 2.01 + 0.15 BeerTax (1982 data, panel a) and FatalityRate-hat = 1.86 + 0.44 BeerTax (1988 data, panel b). [Axes: Fatality Rate (fatalities per 10,000) versus Beer Tax (dollars per case, $1988).]
Because we have data for more than one year, we can reexamine this relationship for another year. This is done in Figure 10.1b, which is the same scatterplot as before, except that it uses the data for 1988. The OLS regression line through these data is

FatalityRate-hat = 1.86 + 0.44 BeerTax    (1988 data).    (10.3)
                  (0.11)  (0.13)

In contrast to the regression using the 1982 data, the coefficient on the real beer tax is statistically significant at the 1% level (the t-statistic is 3.43). Curiously, the estimated coefficients for the 1982 and the 1988 data are positive: Taken literally, higher real beer taxes are associated with more, not fewer, traffic fatalities.

Should we conclude that an increase in the tax on beer leads to more traffic deaths? Not necessarily, because these regressions could have substantial omitted variable bias. Many factors affect the fatality rate, including the quality of the automobiles driven in the state, whether the state highways are in good repair, whether most driving is rural or urban, the density of cars on the road, and whether it is socially acceptable to drink and drive. Any of these factors may be correlated with alcohol taxes, and if they are, they will lead to omitted variable bias. One approach to these potential sources of omitted variable bias would be to collect data on all these variables and add them to the annual cross-sectional regressions in Equations (10.2) and (10.3). Unfortunately, some of these variables, such as the cultural acceptance of drinking and driving, might be very hard or even impossible to measure.

If these factors remain constant over time in a given state, however, then another route is available. Because we have panel data, we can in effect hold these factors constant, even though we cannot measure them.

10.2 Panel Data with Two Time Periods: "Before and After" Comparisons

When data for each state are obtained for T = 2 time periods, it is possible to compare values of the dependent variable in the second period to values in the first period. By focusing on changes in the dependent variable, this "before and after" comparison in effect holds constant the unobserved factors that differ from one state to the next but do not change over time within the state.

Let Z_i be a variable that determines the fatality rate in the ith state but does not change over time (so the t subscript is omitted). For example, Z_i might be the local cultural attitude toward drinking and driving, which changes slowly and thus could be considered to be constant between 1982 and 1988. Accordingly, the population linear regression relating Z_i and the real beer tax to the fatality rate is

FatalityRate_it = β0 + β1 BeerTax_it + β2 Z_i + u_it,    (10.4)

where u_it is the error term, i = 1, ..., n, and t = 1, ..., T.

Because Z_i does not change over time, in the regression model in Equation (10.4) it will not produce any change in the fatality rate between 1982 and 1988. Thus, in this regression model, the influence of Z_i can be eliminated by analyzing the change in the fatality rate between the two periods. To see this mathematically, consider Equation (10.4) for each of the two years 1982 and 1988:

FatalityRate_i1982 = β0 + β1 BeerTax_i1982 + β2 Z_i + u_i1982,    (10.5)

FatalityRate_i1988 = β0 + β1 BeerTax_i1988 + β2 Z_i + u_i1988.    (10.6)

Subtracting Equation (10.5) from Equation (10.6) eliminates the effect of Z_i:

FatalityRate_i1988 − FatalityRate_i1982 = β1(BeerTax_i1988 − BeerTax_i1982) + u_i1988 − u_i1982.    (10.7)

This specification has an intuitive interpretation. Cultural attitudes toward drinking and driving affect the level of drunk driving and thus the level of traffic fatalities in a state. If, however, they did not change between 1982 and 1988, then they did not produce any change in fatalities in the state. Rather, any changes in traffic fatalities over time must have arisen from other sources. In Equation (10.7), these other sources are changes in the tax on beer and changes in the error term (which captures changes in other factors that determine traffic deaths).

Specifying the regression in changes, as in Equation (10.7), eliminates the effect of the unobserved variables Z_i that are constant over time. In other words, analyzing changes in Y and X has the effect of controlling for variables that are constant over time, thereby eliminating this source of omitted variable bias.
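The differencing argument can be verified numerically. In this sketch, the data, the slope value, and the function name are all invented for illustration: Y is generated with a time-invariant Z_i component, and the "changes" regression of Equation (10.7) recovers β1 even though Z_i is never observed.

```python
import numpy as np

def diff_ols_slope(y1, y2, x1, x2):
    """OLS slope from regressing (y2 - y1) on (x2 - x1) with an intercept,
    the 'changes' specification of Equation (10.7)."""
    dy = np.asarray(y2, float) - np.asarray(y1, float)
    dx = np.asarray(x2, float) - np.asarray(x1, float)
    dxc = dx - dx.mean()
    return float(dxc @ (dy - dy.mean()) / (dxc @ dxc))

# Hypothetical data: beta1 = -1.0, and z shifts each entity's level but is
# identical in both periods, so it drops out of the difference.
beta1 = -1.0
z = np.array([5.0, -2.0, 0.0, 3.0, 1.0])        # unobserved, time-invariant
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = x1 + np.array([1.0, 0.5, 2.0, 1.0, 3.0])   # the regressor changes by entity
y1 = 2.0 + beta1 * x1 + z
y2 = 2.0 + beta1 * x2 + z
```

With no noise in the error term, the differenced regression recovers β1 exactly; with noise, it recovers it on average.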
Figure 10.2 presents a scatterplot of the change in the fatality rate between 1982 and 1988 against the change in the real beer tax between 1982 and 1988 for the 48 states in our data set. A point in Figure 10.2 represents the change in the fatality rate and the change in the real beer tax between 1982 and 1988 for a given state. The OLS regression line, estimated using these data and plotted in the figure, is

FatalityRate-hat_1988 − FatalityRate_1982 = −0.072 − 1.04(BeerTax_1988 − BeerTax_1982),    (10.8)
                                            (0.065)  (0.36)

where including an intercept allows for the possibility that the mean change in the fatality rate, in the absence of a change in the real beer tax, is nonzero.

In contrast to the cross-sectional regression results, the estimated effect of a change in the real beer tax is negative, as predicted by economic theory. The hypothesis that the population slope coefficient is zero is rejected at the 5% significance level. According to this estimated coefficient, an increase in the real beer tax by $1 per case reduces the traffic fatality rate by 1.04 deaths per 10,000 people. This estimated effect is very large: The average fatality rate is approximately 2 in these data (that is, 2 fatalities per year per 10,000 members of the population),

FIGURE 10.2 Changes in Fatality Rates and Beer Taxes, 1982-1988

This is a scatterplot of the change in the traffic fatality rate and the change in the real beer tax between 1982 and 1988 for 48 states. There is a negative relationship between changes in the fatality rate and changes in the beer tax; the OLS regression line is plotted. [Axes: Change in Fatality Rate (fatalities per 10,000) versus Change in Beer Tax (dollars per case, $1988).]
so the estimate suggests that traffic fatalities can be cut in half merely by increasing the real tax on beer by $1 per case. By examining changes in the fatality rate over time, the regression in Equation (10.8) controls for fixed factors such as cultural attitudes toward drinking and driving. But there are many factors that influence traffic safety, and if they change over time and are correlated with the real beer tax, then their omission will produce omitted variable bias. In Section 10.5, we undertake a more careful analysis that controls for several such factors, so for now it is best to refrain from drawing any substantive conclusions about the effect of real beer taxes on traffic fatalities.

This "before and after" analysis works when the data are observed in two different years. Our data set, however, contains observations for seven different years, and it seems foolish to discard those potentially useful additional data. But the "before and after" method does not apply directly when T > 2. To analyze all the observations in our panel data set, we use the method of fixed effects regression.

10.3 Fixed Effects Regression

Fixed effects regression is a method for controlling for omitted variables in panel data when the omitted variables vary across entities (states) but do not change over time. Unlike the "before and after" comparisons of Section 10.2, fixed effects regression can be used when there are two or more time observations for each entity.

The fixed effects regression model has n different intercepts, one for each entity. These intercepts can be represented by a set of binary (or indicator) variables. These binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time.

The Fixed Effects Regression Model

Consider the regression model in Equation (10.4) with the dependent variable (FatalityRate) and observed regressor (BeerTax) denoted as Y_it and X_it, respectively:

Y_it = β0 + β1 X_it + β2 Z_i + u_it,    (10.9)

where Z_i is an unobserved variable that varies from one state to the next but does not change over time (for example, Z_i represents cultural attitudes toward drinking and driving).
We want to estimate β1, the effect on Y of X holding constant the unobserved state characteristics Z. Because Z_i varies from one state to the next but is constant over time, the population regression model in Equation (10.9) can be interpreted as having n intercepts, one for each state. Specifically, let α_i = β0 + β2 Z_i. Then Equation (10.9) becomes

Y_it = β1 X_it + α_i + u_it.    (10.10)

Equation (10.10) is the fixed effects regression model, in which α_1, ..., α_n are treated as unknown intercepts to be estimated, one for each state. The interpretation of α_i as a state-specific intercept in Equation (10.10) comes from considering the population regression line for the ith state; this population regression line is α_i + β1 X_it. The slope coefficient of the population regression line, β1, is the same for all states, but the intercept of the population regression line varies from one state to the next.

Because the intercept α_i in Equation (10.10) can be thought of as the "effect" on Y of being in entity i (in the current application, entities are states), the terms α_1, ..., α_n are known as entity fixed effects. The variation in the entity fixed effects comes from omitted variables that, like Z_i in Equation (10.9), vary across entities but not over time.

The state-specific intercepts in the fixed effects regression model also can be expressed using binary variables to denote the individual states. Section 8.3 considered the case in which the observations belong to one of two groups and the population regression line has the same slope for both groups but different intercepts (see Figure 8.8a). That population regression line was expressed mathematically using a single binary variable indicating one of the groups (case #1 in Key Concept 8.4). If we had only two states in our data set, that binary variable regression model would apply here. Because we have more than two states, however, we need additional binary variables to capture all the state-specific intercepts in Equation (10.10).

To develop the fixed effects regression model using binary variables, let D1_i be a binary variable that equals 1 when i = 1 and equals 0 otherwise; let D2_i equal 1 when i = 2 and equal 0 otherwise; and so on. We cannot include all n binary variables plus a common intercept, for if we do the regressors will be perfectly multicollinear (this is the "dummy variable trap" of Section 6.7), so we arbitrarily omit the binary variable D1_i for the first group. Accordingly, the fixed effects regression model in Equation (10.10) can be written equivalently as
Y_it = β0 + β1 X_it + γ2 D2_i + γ3 D3_i + ... + γn Dn_i + u_it,    (10.11)

where β0, β1, γ2, ..., γn are unknown coefficients to be estimated. To derive the relationship between the coefficients in Equation (10.11) and the intercepts in Equation (10.10), compare the population regression lines for each state in the two equations. In Equation (10.11), the population regression equation for the first state is β0 + β1 X_it, so α_1 = β0. For the second and remaining states, it is β0 + β1 X_it + γ_i, so α_i = β0 + γ_i for i = 2, ..., n.

Thus there are two equivalent ways to write the fixed effects regression model. In Equation (10.10), it is written in terms of n state-specific intercepts; in Equation (10.11), it has a common intercept and n − 1 binary regressors. In both formulations, the slope coefficient on X is the same from one state to the next. The state-specific intercepts in Equation (10.10) and the binary regressors in Equation (10.11) have the same source: the unobserved variable Z_i that varies across states but not over time.

Extension to multiple X's. If there are other observed determinants of Y that are correlated with X and that change over time, then these should also be included in the regression to avoid omitted variable bias. Doing so results in the fixed effects regression model with multiple regressors, summarized in Key Concept 10.2.

Estimation and Inference

In principle the binary variable specification of the fixed effects regression model [Equation (10.13)] can be estimated by OLS. This regression, however, has k + n regressors (the k X's, the n − 1 binary variables, and the intercept), so in practice this OLS regression is tedious or, in some software packages, impossible to implement if the number of entities is large. Econometric software therefore has special routines for OLS estimation of fixed effects regression models. These special routines are equivalent to using OLS on the full binary variable regression, but they are faster because they employ some mathematical simplifications that arise in the algebra of fixed effects regression.

The "entity-demeaned" OLS algorithm. Regression software typically computes the OLS fixed effects estimator in two steps. In the first step, the entity-specific average is subtracted from each variable. In the second step, the regression is estimated using the "entity-demeaned" variables. Specifically, consider the case of a single regressor.
KEY CONCEPT 10.2
THE FIXED EFFECTS REGRESSION MODEL

The fixed effects regression model is

Y_it = β1 X_1,it + β2 X_2,it + ... + βk X_k,it + α_i + u_it,    (10.12)

where i = 1, ..., n and t = 1, ..., T; X_1,it is the value of the first regressor for entity i in time period t, X_2,it is the value of the second regressor, and so forth; and α_1, ..., α_n are entity-specific intercepts. Equivalently, the fixed effects regression model can be written in terms of a common intercept, the X's, and n − 1 binary variables representing all but one entity:

Y_it = β0 + β1 X_1,it + ... + βk X_k,it + γ2 D2_i + γ3 D3_i + ... + γn Dn_i + u_it,    (10.13)

where D2_i = 1 if i = 2 and D2_i = 0 otherwise, and so forth.

To derive the entity-demeaned regression, take the average of both sides of Equation (10.10): Ybar_i = β1 Xbar_i + α_i + ubar_i, where Ybar_i = (1/T) Σ_{t=1}^{T} Y_it, and Xbar_i and ubar_i are defined similarly. Subtracting this averaged equation from Equation (10.10) yields Y_it − Ybar_i = β1(X_it − Xbar_i) + (u_it − ubar_i), or

Ytilde_it = β1 Xtilde_it + utilde_it,    (10.14)

where Ytilde_it = Y_it − Ybar_i, Xtilde_it = X_it − Xbar_i, and utilde_it = u_it − ubar_i. Thus β1 can be estimated by the OLS regression of the "entity-demeaned" variables Ytilde_it on Xtilde_it. In fact, this estimator is identical to the OLS estimator of β1 obtained by estimation of the fixed effects model using the n − 1 binary variables (Exercise 18.6).

The "before and after" regression versus the binary variables specification. Although Equation (10.11) with its binary variables looks quite different than the "before and after" regression model in Equation (10.7), in the special case that T = 2 the OLS estimator of β1 from the binary variable specification and that from the "before and after" specification are identical if the intercept is excluded from the "before and after" specification. Thus, when T = 2, there are three ways to estimate β1 by OLS: the "before and after" specification in Equation (10.7) (without an intercept), the binary variable specification in Equation (10.11), and the "entity-demeaned" specification in Equation (10.14). These three methods are equivalent; that is, they produce identical OLS estimates of β1.
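The claim that the binary-variable and entity-demeaned regressions produce identical OLS estimates can be checked numerically. Everything in this sketch (the sample sizes, the coefficient value, and the variable names) is an invented illustration, not the chapter's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 6, 5
beta1 = 1.5                                # illustrative slope, chosen arbitrarily
alpha = rng.normal(size=n)                 # entity fixed effects alpha_i
x = rng.normal(size=(n, T))
u = rng.normal(scale=0.1, size=(n, T))
y = alpha[:, None] + beta1 * x + u

# (a) OLS with n entity dummies (all n included, so no common intercept).
dummies = np.kron(np.eye(n), np.ones((T, 1)))   # (n*T) x n indicator matrix
X_full = np.column_stack([x.reshape(-1, 1), dummies])
b_dummy = np.linalg.lstsq(X_full, y.reshape(-1), rcond=None)[0][0]

# (b) Entity-demeaned OLS: subtract each entity's average, then regress
#     demeaned y on demeaned x without an intercept, as in Equation (10.14).
x_t = x - x.mean(axis=1, keepdims=True)
y_t = y - y.mean(axis=1, keepdims=True)
b_demeaned = float((x_t * y_t).sum() / (x_t ** 2).sum())
```

The two estimates agree to numerical precision, which is the Frisch-Waugh partialling-out result behind the "entity-demeaned" shortcut.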
The sampling distribution, standard errors, and statistical inference. In multiple regression with cross-sectional data, if the four least squares assumptions in Key Concept 6.4 hold, then the sampling distribution of the OLS estimator is normal in large samples. The variance of this sampling distribution can be estimated from the data, and the square root of this estimator of the variance, that is, the standard error, can be used to test hypotheses using a t-statistic and to construct confidence intervals.

Similarly, in multiple regression with panel data, if a set of assumptions, called the fixed effects regression assumptions, holds, then the sampling distribution of the fixed effects OLS estimator is normal in large samples, the variance of that distribution can be estimated from the data, the square root of that estimator is the standard error, and the standard error can be used to construct t-statistics and confidence intervals. Given the standard error, statistical inference (testing hypotheses, including joint hypotheses using F-statistics, and constructing confidence intervals) proceeds in exactly the same way as in multiple regression with cross-sectional data. The fixed effects regression assumptions and standard errors for fixed effects regression are discussed further in Section 10.5.

Application to Traffic Deaths

The OLS estimate of the fixed effects regression line relating the real beer tax to the fatality rate, based on all seven years of data (336 observations), is

FatalityRate-hat = −0.66 BeerTax + StateFixedEffects,    (10.15)
                   (0.29)

where, as is conventional, the estimated state fixed intercepts are not listed to save space and because they are not of primary interest in this application.

Like the "differences" specification in Equation (10.8), the estimated coefficient in the fixed effects regression in Equation (10.15) is negative; that is, as predicted by economic theory, higher real beer taxes are associated with fewer traffic deaths, the opposite of what we found in the initial cross-sectional regressions of Equations (10.2) and (10.3). The two regressions are not identical because the "differences" regression in Equation (10.8) uses only the data for 1982 and 1988 (specifically, the difference between those two years), whereas the fixed effects regression in Equation (10.15) uses the data for all seven years. Because of the additional observations, the standard error is smaller in Equation (10.15) than in Equation (10.8).
Including state fixed effects in the fatality rate regression lets us avoid omitted variables bias arising from omitted factors, such as cultural attitudes toward drinking and driving, that vary across states but are constant over time within a state. Still, a skeptic might suspect that there are other factors that could lead to omitted variables bias. For example, over this period cars were getting safer and occupants were increasingly wearing seat belts; if the real tax on beer rose on average during the mid-1980s, then it could be picking up the effect of overall automobile safety improvements. If, however, safety improvements evolved over time but were the same for all states, then we can eliminate their influence by including time fixed effects.

10.4 Regression with Time Fixed Effects

Just as fixed effects for each entity can control for variables that are constant over time but differ across entities, so can time fixed effects control for variables that are constant across entities but evolve over time.

Because safety improvements in new cars are introduced nationally, they serve to reduce traffic fatalities in all states. So, it is plausible to think of automobile safety as an omitted variable that changes over time but has the same value for all states. The population regression in Equation (10.9) can be modified to include the effect of automobile safety, which we will denote by S_t:

Y_it = β0 + β1 X_it + β2 Z_i + β3 S_t + u_it,    (10.16)

where S_t is unobserved and where the single "t" subscript emphasizes that safety changes over time but is constant across states. Because β3 S_t represents variables that determine Y_it, if S_t is correlated with X_it, then omitting S_t from the regression leads to omitted variable bias.

Time Effects Only

For the moment, suppose that the variables Z_i are not present, so that the term β2 Z_i can be dropped from Equation (10.16), although the term β3 S_t remains. Our objective is to estimate β1, controlling for S_t. Although S_t is unobserved, its influence can be eliminated because it varies over time but not across states, just as it is possible to eliminate the effect of Z_i, which varies across states but not over time.
Because S_t varies over time but not over states, the presence of S_t leads to a regression model in which each time period has its own intercept. The time fixed effects regression model with a single X regressor is

Y_it = β1 X_it + λ_t + u_it.    (10.17)

This model has a different intercept, λ_t, for each time period. The intercept λ_t in Equation (10.17) can be thought of as the "effect" on Y of year t (or, more generally, time period t), so the terms λ_1, ..., λ_T are known as time fixed effects. The variation in the time fixed effects comes from omitted variables that, like S_t in Equation (10.16), vary over time but not across entities.

Just as the entity fixed effects regression model can be represented using n − 1 binary indicators, so, too, can the time fixed effects regression model be represented using T − 1 binary indicators:

Y_it = β0 + β1 X_it + δ2 B2_t + ... + δT BT_t + u_it,    (10.18)

where δ2, ..., δT are unknown coefficients and where B2_t = 1 if t = 2 and B2_t = 0 otherwise, and so forth. As in the fixed effects regression model in Equation (10.11), in this version of the time effects model the intercept is included, and the first binary variable (B1_t) is omitted to prevent perfect multicollinearity.

In the traffic fatalities regression, the time fixed effects specification allows us to eliminate bias arising from omitted variables like nationally introduced safety standards that change over time but are the same across states in a given year.

When there are additional observed "X" regressors, then these regressors appear in Equations (10.17) and (10.18) as well.

Both Entity and Time Fixed Effects

If some omitted variables are constant over time but vary across states (such as cultural norms), while others are constant across states but vary over time (such as national safety standards), then it is appropriate to include both entity (state) and time effects.
The combined entity and time fixed effects regression model is

Y_it = β1 X_it + α_i + λ_t + u_it,    (10.19)

where α_i is the entity fixed effect and λ_t is the time fixed effect. This model can equivalently be represented using n − 1 entity binary indicators and T − 1 time binary indicators, along with an intercept:

Y_it = β0 + β1 X_it + γ2 D2_i + ... + γn Dn_i + δ2 B2_t + ... + δT BT_t + u_it,    (10.20)

where β0, γ2, ..., γn, δ2, ..., δT are unknown coefficients.

The combined state and time fixed effects regression model eliminates omitted variables bias arising both from unobserved variables that are constant over time and from unobserved variables that are constant across states.

Estimation. The time fixed effects model and the entity and time fixed effects model are both variants of the multiple regression model. Thus their coefficients can be estimated by OLS by including the additional time binary variables. Alternatively, in a balanced panel the coefficients on the X's can be computed by first deviating Y and the X's from their entity and time-period means and then estimating the multiple regression equation of deviated Y on the deviated X's. This algorithm, which is commonly implemented in regression software, eliminates the need to construct the full set of binary indicators that appear in Equation (10.20). An equivalent approach is to deviate Y, the X's, and the time indicators from their state (but not time) means and to estimate k + T coefficients by multiple regression of the deviated Y on the deviated X's and the deviated time indicators.

Finally, if T = 2, the entity and time fixed effects regression can be estimated using the "before and after" approach of Section 10.2, including the intercept in the regression. Thus the "before and after" regression reported in Equation (10.8), in which the change in FatalityRate from 1982 to 1988 is regressed on the change in BeerTax from 1982 to 1988 including an intercept, provides the same estimate of the slope coefficient as the OLS regression of FatalityRate on BeerTax, including entity and time fixed effects, estimated using data for the two years 1982 and 1988.

Application to traffic deaths. Adding time effects to the state fixed effects regression results in the OLS estimate of the regression line:

FatalityRate-hat = −0.64 BeerTax + StateFixedEffects + TimeFixedEffects.    (10.21)
                   (0.25)
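For a balanced panel, the deviate-from-entity-and-time-means algorithm can be sketched as below. The data are invented for illustration; the point is that the double-demeaning removes any additive α_i and λ_t component exactly, so regressing the transformed Y on the transformed X recovers β1.

```python
import numpy as np

def twoway_demean(a):
    """Within transformation for entity and time fixed effects on a balanced
    n x T panel: subtract entity means and time means, then add back the
    grand mean. Any additive alpha_i or lambda_t component is mapped to zero."""
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

# Hypothetical panel in which y = alpha_i + lambda_t + beta1 * x exactly.
rng = np.random.default_rng(1)
n, T = 5, 4
beta1 = -0.8                                  # illustrative value
alpha = rng.normal(size=(n, 1))               # entity effects
lam = rng.normal(size=(1, T))                 # time effects
x = rng.normal(size=(n, T))
y = alpha + lam + beta1 * x

x_t, y_t = twoway_demean(x), twoway_demean(y)
b = float((x_t * y_t).sum() / (x_t ** 2).sum())
```

Because the transformation is linear and annihilates pure entity and pure time components, the slope b equals β1 here up to floating-point rounding; with a noisy error term it would equal the two-way fixed effects OLS estimate.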
.56). 2.25 .4) to panel data.lI~d for crossscctional data in Key Concept 6. in which there are no time effects. a nd an in tercept. THE There erfect cion:!> I.364 CHAPTER 10 R egression with Ponel Dolo This specification includes the beer tax.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects R~gression The standard errors reported so far in this chapter were com puted using the w. ·In c fu st fo ur of these assumptions extend the fou r least squares assumptions for cTosssectiona l data ( Key Concept 6.. This esl imalcd relatio nshi p be lween the eeaJ beer tax and traffic fata l il ie~ is immune 10 omitted vari able bias from variables that are constant either over time or across states..6 )'ear binary variables ( lime fixed effects). For m Xl.. L •• 5. The Fixed Effects Reg ression Assumptions The fixed crfccts regression <:I ssu mplio ns are summarized in Key Concept I O. we fi rst di!K: u s~ the assumptions underlying pa nel data regression and the construction o f standard e rrors for fi xed e ffects estimators. and the coefficient on the real beer tax remains sig.' nOlal ioo as simple as possible. to panel data. many import ant detcn nimlnts of tra ffi c dea ths do nOi fall into Ihis c.ual heteroskedasticityrobust formul a.3. rh~ first four of these assumptions extend th e fO UT least squares assumptions.. in wh ich case a different standard error formula should be useo. . These heteroskedasticityrobust standard errurs are valid in panel data when Tis moderate or large unde r a SCI of five ass um p t h)n~ cnl led the fi xed effects regrcssion assumptions. this section (ocuses on the entity fixed effects regres sion mode l of Sect ion 10.~.64 /0. Before turning 10 that study. . so tbal lhh regression aClUally has I + 47 + 6 + I = 55 righihand variablelo! The coc (ficienh • on the time and slate binary variables and the intercept are nOI reported becau. controlling for a variety of factors. To keer Iht. 
the fifth assumption is impbu· sible.2. 10. The fi ft h assumption req uires the errors II" to be uncorrcl:l!ed over time for each enti ty. sl.15) and (10. However. ( di 3. so this specificat ion could still be subj ect to omitted variable bias.'1tcgory.21)].Il" . £ Incl udi ng the lime e ffects has li llie impact on 1he estim ated relatio nship between the rea l beer lax aDd the fatality rale [compare Equations (l0.nificam at the 5% level (/ .6 therefore undertakes a more complete empirical examination (If the effect o f the bee r tax and of laws aimed directly at eliminating drunk drivi ng. In some panel data 5enings.0. c they are not of primary interest.4. 47 sla te binary variables (siale rued effects ). Section 10.
. conditional on the regressors. .• II. 11 are i. S The Fixed Effetts Regression A$$umptioos and Standard Errors for Fixed Effects Regression 365 THE FIXED EFFECTS REGRESSION ASSUMPTIONS KEY CONCEPT . Stated for a single observed regressor. The th ird and fou rth assumptions for fixed effects regression arc analogous to the third and fourth least squares a<.3 ~ I. which do not ha . There is no perfect multicollinearity. XiT• II/I ' !ll2" .2). a winter with more snow thllO average for Minnesota.sumptions for crosssectional data in K e ~' Concept 6... c <I time dime nsion. The fifth assumption is thatl hc errors /I I' in the fi xed effects regression model arc uncorrdatcd ove..). draws from their joint distribution. (X. Large outliers. This assumption plays the same role as the first least !iquares assumption in Key Concept 6. Th is ass umption is new aD o docs not arise in crosssectional data. ..d. across entities for i := 1. condiljonal on the regressors..I' X. but independently of. the second assumption for fixed effects regression holds if e ntities are selected by simple random sampling from the population. 11. . . specifically. one such factor is the weather: J\ part icula rly snowy winter in Minnesota.tand this assum pt ion is to recall thaI II" consists of timevary ing facto~ that are determinants of YI/ hut are not includcd as regre~ors. S.tha t is.r lime. because there already is a "Minnesota" fixed effect in the rcgressioncould result ... 3. For m ultiple observed regressors. X j2• ··· • Xj"/".1. In the traffic fat alities applicalion. ~ There are fh'e assumptions for the panel data regression model with entity fixeo effects (Key Concept 10.i.. 4. given all T values of X for that entity.10. the vari ables arc i.d.• X i"(' a il = 0 for / "* S. ···· X k. . i = 1. E(u"I X Il .. The errors fo r a give n entity are uncorrelated over time.. arc unlikely: (Xlt • UI') ha\'e nonzero finit e fourth moments. 2. 
the variables for another en tity: that is.4 and implies that there is no omitted variable bias.i.4 .. ~ The second assumption is that the variablcs for one entity Ilre distribuled 'iden tically 10. Like the second least !iquares assumption in Key Concept 6.1I' The first assumption is that the error term has conditional mean lero. o) = O.l2" .. X u should be replaced by the full list X UI' X 2. One wuy to unden. COV{1I11' li/tI X i! ' X !2 •. the five assumptions are as fo U ows: 10.
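The role of the entity fixed effect α_i can be made concrete with a short simulation. The sketch below is hypothetical (simulated data, made-up parameter values, plain Python rather than an econometrics package), not an analysis of the fatality data: it generates a panel in which the regressor is correlated with the entity effect, so pooled OLS is biased, while OLS after entity-demeaning (the "within" transformation used by fixed effects regression) recovers the true slope.

```python
import random
from collections import defaultdict

# Hypothetical simulation: the entity effect alpha_i is correlated with X, so
# pooled OLS is biased; entity-demeaning removes alpha_i and recovers beta.
random.seed(0)
n, T, beta = 200, 7, -0.66   # beta chosen to echo the beer tax example

rows = []                    # (entity, x, y)
for i in range(n):
    alpha = random.gauss(0, 1)
    for t in range(T):
        x = alpha + random.gauss(0, 1)            # regressor correlated with alpha_i
        y = beta * x + alpha + random.gauss(0, 1)
        rows.append((i, x, y))

def ols_slope(pairs):
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mx) * (y - my) for x, y in pairs)
    return num / sum((x - mx) ** 2 for x, _ in pairs)

pooled = ols_slope([(x, y) for _, x, y in rows])

# Within transformation: subtract each entity's time averages of x and y
xbar, ybar = defaultdict(float), defaultdict(float)
for i, x, y in rows:
    xbar[i] += x / T
    ybar[i] += y / T
within = ols_slope([(x - xbar[i], y - ybar[i]) for i, x, y in rows])

print(round(pooled, 2), round(within, 2))  # pooled is pulled toward zero; within is close to beta
```

The within estimate is close to the true slope while the pooled estimate absorbs part of the correlation between the regressor and the entity effect, which is the bias that fixed effects regression removes.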
If the amount of snow in Minnesota in one year is uncorrelated with the amount of snow in the next year, then this omitted variable (snowfall) is uncorrelated from one year to the next. If, however, unusually snowy winters in Minnesota tend to follow in succession, then that omitted factor would be correlated over time, conditional on the regressors (the beer tax) and the state (Minnesota) fixed effect, and Assumption #5 fails.

Stated more generally, if u_it is correlated with u_is for different values of s and t (that is, if u_it is correlated over time for a given entity), then u_it is said to be autocorrelated (correlated with itself, at different dates) or serially correlated. Thus, if u_it consists of random factors (such as snowfall) that are uncorrelated from one year to the next, then u_it is uncorrelated from year to year, conditional on the regressors and the entity fixed effects, and the fifth assumption holds. Many omitted factors, however, persist over multiple years and therefore produce autocorrelated regression errors. For example, a major road improvement project might reduce traffic accidents not only in the year of completion but also in future years, thus reducing traffic fatalities for two or more years. A downturn in the local economy might produce layoffs and diminish commuting traffic, and these effects persist as the workers look for new jobs and commuting patterns slowly adjust. Omitted factors like these produce autocorrelated errors, so Assumption #5 can be stated as requiring a lack of autocorrelation of u_it, conditional on the X's and the entity fixed effects. Autocorrelation is an essential and pervasive feature of time series data and is discussed in detail in Part IV.

Standard Errors for Fixed Effects Regression

If Assumption #5 in Key Concept 10.3 holds, then the usual (heteroskedasticity-robust) standard errors are valid. If, however, the errors in panel data are autocorrelated, then the usual standard errors will not be valid. One way to see this is to draw an analogy to heteroskedasticity. In a regression with cross-sectional data, if the errors are heteroskedastic, then (as discussed in Section 5.4) the homoskedasticity-only standard errors are not valid because they were derived under the false assumption of homoskedasticity. Similarly, if the errors in panel data are autocorrelated, then the usual standard errors will not be valid because they were derived under the false assumption that the errors are not autocorrelated. Appendix 10.2 provides a mathematical explanation of why the usual standard errors are not valid if the regression errors are autocorrelated.

Standard errors that are valid if u_it is potentially heteroskedastic and potentially correlated over time within an entity are referred to as heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.
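The clustered computation can be sketched in a few lines. The example below is a simplified, hypothetical illustration (simulated data, a no-intercept regression, made-up AR(1) parameters), not the exact formula of Appendix 10.2: the clustered variance sums x times the residual within each entity before squaring, which accounts for within-entity correlation of the errors, whereas the heteroskedasticity-robust formula squares each term separately.

```python
import random

# Sketch of clustered vs. heteroskedasticity-robust standard errors for the
# slope of a no-intercept regression of y on x (simulated, hypothetical data).
# Both the regressor and the error are serially correlated within an entity,
# so clustering matters.
random.seed(3)
n, T, beta = 300, 7, 1.0

data = []                                    # (entity, x, y)
for i in range(n):
    x, u = random.gauss(0, 1), random.gauss(0, 1)
    for t in range(T):
        x = 0.8 * x + random.gauss(0, 1)     # serially correlated regressor
        u = 0.8 * u + random.gauss(0, 1)     # serially correlated error
        data.append((i, x, beta * x + u))

sxx = sum(x * x for _, x, _ in data)
beta_hat = sum(x * y for _, x, y in data) / sxx

# Robust variance: sum over i,t of (x_it * uhat_it)^2 / sxx^2
# Clustered variance: sum over i of (sum over t of x_it * uhat_it)^2 / sxx^2
hc = 0.0
cluster_sums = {}
for i, x, y in data:
    xu = x * (y - beta_hat * x)
    hc += xu ** 2
    cluster_sums[i] = cluster_sums.get(i, 0.0) + xu

se_robust = (hc / sxx ** 2) ** 0.5
se_cluster = (sum(s ** 2 for s in cluster_sums.values()) / sxx ** 2) ** 0.5
print(round(se_cluster / se_robust, 2))  # above 1: clustering widens the standard error
```

With autocorrelated errors and regressors, the clustered standard error is noticeably larger than the robust one, mirroring the difference between columns (4) and (7) of Table 10.1 discussed in the next section.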
The clustered standard errors presented in Appendix 10.2 are one type of HAC standard errors. Clustered standard errors are valid whether or not u_it is heteroskedastic or autocorrelated, because they allow the errors to be correlated within a cluster, or grouping, but assume that the errors are uncorrelated for observations not in the same cluster. In the context of autocorrelation in panel data, the cluster consists of the observations for the same entity at all times t = 1, ..., T. When u_it is correlated over time, clustered standard errors should be used.

10.6 Drunk Driving Laws and Traffic Deaths

Alcohol taxes are only one way to discourage drinking and driving. States differ in their punishments for drunk driving, and a state that cracks down on drunk driving could do so across the board by toughening laws as well as raising taxes. Omitting these laws could produce omitted variable bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even in regressions with state and time fixed effects. In addition, because vehicle use depends in part on whether drivers have jobs and because tax changes can reflect economic conditions (a state budget deficit can lead to tax hikes), omitting state economic conditions also could result in omitted variable bias. In this section, we extend the preceding analysis to study the effect on traffic fatalities of drinking laws (including beer taxes), holding economic conditions constant. This is done by estimating panel data regressions that include regressors representing other drunk driving laws and state economic conditions.

The results are summarized in Table 10.1. The format of the table is the same as that of the tables of regression results in Chapters 7, 8, and 9: Each column reports a different regression and each row reports a coefficient estimate and standard error, an F-statistic and p-value, or other information about the regression.

Column (1) in Table 10.1 presents results for the OLS regression of the fatality rate on the real beer tax without state and time fixed effects. As in the cross-sectional regressions for 1982 and 1988 [Equations (10.2) and (10.3)], the coefficient on the real beer tax is positive (0.36), and the column (1) estimate is statistically significantly different from zero at the 5% level: According to this estimate, increasing beer taxes increases traffic fatalities! However, the regression in column (2) [reported previously as Equation (10.15)], which includes state fixed effects, suggests that the positive coefficient in regression (1) is the result of omitted variable bias (the coefficient on the real beer tax is -0.66). The regression R² jumps from 0.090 to 0.889 when fixed effects are included; evidently, the state fixed effects account for a large amount of the variation in the data. Little changes when time effects are added [the regression in column (3), reported previously as Equation (10.21)].
[Table 10.1 Regression Analysis of the Effect of Drunk Driving Laws on Traffic Deaths. Dependent variable: traffic fatality rate (deaths per 10,000). Columns (1) through (7) report the regressions discussed in the text; the table entries are not legible in this scan.]
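The effect-size and confidence-interval arithmetic used for the beer tax estimate in column (4) of Table 10.1 (coefficient -0.45, standard error 0.22, for a $0.50/case tax increase, as discussed in the text) can be verified directly. This is ordinary confidence-interval arithmetic, not output from the data set:

```python
# Verifying the beer tax effect-size arithmetic for column (4) of Table 10.1:
# coefficient -0.45 (standard error 0.22), for a $0.50/case tax increase.
coef, se, dx = -0.45, 0.22, 0.50

effect = coef * dx                       # change in fatality rate per 10,000
half_width = 1.96 * se * dx              # 95% margin of error for the effect
ci = (effect - half_width, effect + half_width)
print(round(effect, 3), round(ci[0], 2), round(ci[1], 2))
# about -0.225 deaths per 10,000, with a 95% interval of roughly (-0.44, -0.01)
```

The interval stops just short of zero, which is why the text describes the estimate as statistically significant at the 5% level yet quite imprecise.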
These results are consistent with the omitted fixed factors (historical and cultural factors, general road conditions, population density, attitudes toward drinking and driving, and so forth) being important determinants of the variation in traffic fatalities across states.

The next three regressions in Table 10.1 include additional potential determinants of fatality rates, along with state and time effects. The base specification, reported in column (4), includes two sets of legal variables related to drunk driving plus variables that control for the amount of driving and overall state economic conditions. The first set of legal variables is the minimum legal drinking age, represented by three binary variables for a minimum legal drinking age of 18, 19, and 20 (so the omitted group is a minimum legal drinking age of 21 or older). The second set of legal variables is punishments associated with the first conviction for driving under the influence of alcohol, either mandatory jail time or mandatory community service (the omitted group is less severe punishment). The three measures of driving and economic conditions are average vehicle-miles per driver, the unemployment rate, and the logarithm of real (1988 dollars) personal income per capita (using the logarithm of income permits the coefficient to be interpreted in terms of percentage changes of income; see Section 8.2).

The regression in column (4) has four interesting results.

1. Including the additional variables reduces the estimated coefficient on the real beer tax, relative to the regression in column (3). The estimated coefficient (-0.45) continues to be negative and statistically significant at the 5% significance level. One way to evaluate the magnitude of this coefficient is to imagine a state with an average real beer tax doubling its tax; because the average real beer tax in these data is approximately $0.50/case, this entails increasing the tax by $0.50/case. According to the estimate in column (4), the effect of this $0.50 increase (in 1988 dollars) in the beer tax is a decrease in the expected fatality rate of 0.45 × 0.50 = 0.23 death per 10,000. This estimated effect is large: Because the average fatality rate is 2 per 10,000, a reduction of 0.23 death per 10,000 corresponds to decreasing the fatality rate to 1.77 per 10,000. This said, the estimate is quite imprecise: Because the standard error on this coefficient is 0.22, the 95% confidence interval for this effect is -0.45 × 0.50 ± 1.96 × 0.22 × 0.50 = (-0.44, -0.01). This wide 95% confidence interval includes values of the true effect that are very nearly zero.

2. The minimum legal drinking age is estimated to have very little effect on traffic fatalities. The estimates are small in magnitude; for example, a state with a minimum legal drinking age of 18 is estimated to have a fatality rate higher by only 0.028 death per 10,000 than a state with a minimum legal drinking age of 21, holding the other factors in the regression constant. Moreover, the joint hypothesis that the coefficients on the minimum legal drinking age variables are zero cannot be rejected at the 10% significance level: The F-statistic testing the joint hypothesis that the three coefficients are zero is 0.48, with a p-value of 0.696.

3. The coefficients on the first-offense punishment variables are also estimated to be small and are jointly insignificantly different from zero at the 10% significance level (the F-statistic is 0.29).

4. The economic variables have considerable explanatory power for traffic fatalities. High unemployment rates are associated with fewer fatalities: An increase in the unemployment rate by one percentage point is estimated to reduce traffic fatalities by 0.063 death per 10,000. Similarly, high values of real per capita income are associated with high fatalities: The coefficient is 1.8, so a 1% increase in real per capita income is associated with an increase in traffic fatalities of 0.018 death per 10,000 (see Case I in Key Concept 8.2 for interpretation of this coefficient). These results are consistent with good economic conditions being associated with higher fatalities, perhaps because of increased traffic density when the unemployment rate is low or greater alcohol consumption when income is high. The two economic variables are jointly significant at the 0.1% significance level (the F-statistic is 38).

Columns (5), (6), and (7) of Table 10.1 report regressions that check the sensitivity of these conclusions to changes in the base specification. The regression in column (5) drops the variables that control for economic conditions. The sensitivity of the estimated beer tax coefficient to including the economic variables, combined with the statistical significance of the coefficients on those variables in column (4), indicates that the economic variables should remain in the base specification. The regression in column (6) examines the sensitivity of the results to using a different functional form for the drinking age (replacing the three indicator variables with the drinking age itself) and to combining the two binary punishment variables; the results of regression (4) are not sensitive to these changes. The final column in Table 10.1 reports the regression of column (4), but with clustered standard errors, which allow for autocorrelation of the error term within an entity, as discussed in Section 10.5. The estimated coefficients in columns (4) and (7) are the same; the only difference is the standard errors. The clustered standard errors in column (7) are larger than the standard errors in column (4), and some of the F-statistics in column (7) are smaller than those in column (4), but there are no qualitative differences in the two sets of F-statistics and p-values: The conclusion from regression (4) that the coefficients on the drinking age and punishment variables are not statistically significant also obtains using the clustered standard errors in column (7). One substantive difference between columns (4) and (7) arises because the HAC standard error on the beer tax coefficient is larger than the standard error in column (4); consequently, the 95% confidence interval for the effect on fatalities of a change in the beer tax computed using the HAC standard errors is wider than the interval from column (4), and the interval computed using the HAC standard errors includes zero.

These results present a provocative picture of measures to control drunk driving and traffic fatalities. According to these estimates, neither stiff punishments nor an increase in the minimum legal drinking age has an important effect on fatalities. In contrast, there is some evidence that increasing alcohol taxes, as measured by the real tax on beer, does reduce traffic deaths. The magnitude of this effect, however, is imprecisely estimated. (If you are interested in learning more about drunk driving, the economics of alcohol more generally, and further analysis of these data, see Ruhm (1996).)
The strength of this analysis is that including state and time fixed effects mitigates the threat of omitted variable bias arising from unobserved variables that either do not change over time (like cultural attitudes toward drinking and driving) or do not vary across states (like safety innovations). As always, however, it is important to think about possible threats to validity. One potential source of omitted variable bias is that the measure of alcohol taxes used here, the real tax on beer, could move with other alcohol taxes, which suggests interpreting the results as pertaining more broadly than just to beer. A subtler possibility is that hikes in the real beer tax could be associated with public education campaigns, perhaps in response to political pressure; if so, an increase in the real beer tax could pick up the effect of a broader campaign to reduce drunk driving.

10.7 Conclusion

This chapter showed how multiple observations over time on the same entity can be used to control for unobserved omitted variables that differ across entities but are constant over time. The key insight is that if the unobserved variable does not change over time, then any changes in the dependent variable must be due to influences other than these fixed characteristics. If cultural attitudes toward drinking and driving do not change appreciably over seven years within a state, then explanations for changes in the traffic fatality rate over those seven years must lie elsewhere.

To exploit this insight, you need data in which the same entity is observed at two or more time periods; that is, you need panel data. With panel data, the multiple regression model of Part II can be extended to include a full set of entity binary variables; this is the fixed effects regression model, which can be estimated by OLS. A twist on the fixed effects regression model is to include time fixed effects, which control for unobserved variables that change over time but are constant across entities. Both entity and time fixed effects can be included in the regression to control for variables that vary across entities but are constant over time and for variables that vary over time but are constant across entities.

Despite these virtues, entity and time fixed effects regression cannot control for omitted variables that vary both across entities and over time. And, obviously, panel data methods require panel data, which often are not available. Thus there remains a need for a method that can eliminate the influence of unobserved omitted variables when panel data methods cannot do the job. A powerful and general method for doing so is instrumental variables regression, the topic of Chapter 12.

Summary

1. Panel data consist of observations on multiple (n) entities (states, people, firms, and so forth), where each entity is observed at two or more time periods (T).

2. Regression with entity fixed effects controls for unobserved variables that differ from one entity to the next but remain constant over time.

3. When there are two time periods, fixed effects regression can be estimated by a "before and after" regression of the change in Y from the first period to the second on the corresponding change in X.

4. Entity fixed effects regression can be estimated by including binary variables for n - 1 entities, plus the observable independent variables (the X's) and an intercept.

5. Time fixed effects control for unobserved variables that are the same across entities but vary over time.

6. A regression with both time and entity fixed effects can be estimated by including binary variables for n - 1 entities, binary variables for T - 1 time periods, plus the X's and an intercept.
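Summary point 3 can be checked numerically: with T = 2 and no time effect, the entity-demeaned (fixed effects) slope coincides exactly with the slope from regressing the change in Y on the change in X. The sketch below uses simulated data with hypothetical parameter values:

```python
import random

# Simulated two-period panel (hypothetical values): the entity-demeaned
# (fixed effects) slope equals the "before and after" slope from regressing
# the change in Y on the change in X.
random.seed(4)
n, beta = 500, -0.5

diffs, demeaned = [], []
for i in range(n):
    alpha = random.gauss(0, 1)                 # entity fixed effect
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y1 = beta * x1 + alpha + random.gauss(0, 1)
    y2 = beta * x2 + alpha + random.gauss(0, 1)
    diffs.append((x2 - x1, y2 - y1))
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    demeaned += [(x1 - mx, y1 - my), (x2 - mx, y2 - my)]

def slope(pairs):
    # OLS slope through the origin; the demeaned data are centered by
    # construction, and the differences have mean zero in this simulation
    # because there is no time effect.
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

b_diff, b_fe = slope(diffs), slope(demeaned)
print(abs(b_diff - b_fe) < 1e-9, round(b_fe, 2))
```

The equality is algebraic, not statistical: demeaning each entity's two observations leaves (minus and plus) half the within-entity change, so the two slope formulas reduce to the same ratio.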
Key Terms

panel data (350)
balanced panel (350)
unbalanced panel (351)
fixed effects regression model (357)
entity fixed effects (357)
time fixed effects regression model (362)
time fixed effects (362)
entity and time fixed effects regression model (362)
autocorrelated (366)
serially correlated (366)
heteroskedasticity- and autocorrelation-consistent (HAC) standard errors (366)
clustered standard errors (367)

Review the Concepts

10.1 Why is it necessary to use two subscripts, i and t, to describe panel data? What does i refer to? What does t refer to?

10.2 A researcher is using a panel data set on n = 1000 workers over T = 10 years (from 1996 to 2005) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Give some examples of unobserved person-specific variables that are correlated with both education and earnings. Can you think of examples of time-specific variables that might be correlated with education and earnings? How would you control for these person-specific and time-specific effects in a panel data regression?

10.3 Can the regression that you suggested in response to Question 10.2 be used to estimate the effect of gender on an individual's earnings? Can that regression be used to estimate the effect of the national unemployment rate on an individual's earnings? Explain.

Exercises

10.1 This question refers to the drunk driving panel data regression summarized in Table 10.1.

a. New Jersey has a population of 8.1 million people. Suppose that New Jersey increased the tax on a case of beer by $1 (in 1988 dollars). Use the results in column (4) to predict the number of lives that would be saved over the next year. Construct a 95% confidence interval for your answer.
b. Should time effects be included in the regression? Why or why not?

c. Suppose that real income per capita in New Jersey increases by 1% in the next year. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 90% confidence interval for your answer.

d. The drinking age in New Jersey is 21. Suppose that New Jersey lowered its drinking age to 18. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 95% confidence interval for your answer.

e. The estimate of the coefficient on beer tax in column (5) is significant at the 1% level, while the estimate in column (4) is significant at the 5% level. Does this mean that the estimate in (5) is more reliable?

f. A researcher conjectures that the unemployment rate has a different effect on traffic fatalities in the western states than in the other states. How would you test this hypothesis? (Be specific about the specification of the regression and the statistical test you would use.)

10.2 Consider the binary variable version of the fixed effects model in Equation (10.11), except with an additional regressor, D1_i; that is, let Y_it = β0 + β1 X_it + γ1 D1_i + γ2 D2_i + ... + γn Dn_i + u_it.

a. Suppose that n = 3. Show that the binary regressors and the "constant" regressor are perfectly multicollinear; that is, express one of the variables D1_i, D2_i, D3_i, and X0,it as a perfect linear function of the others, where X0,it = 1 for all i, t.

b. Show the result in (a) for general n.

c. What will happen if you try to estimate the coefficients of the regression by OLS?

10.3 Section 9.2 gave a list of five potential threats to the internal validity of a regression study. Apply this list to the empirical analysis in Section 10.6 and thereby draw conclusions about its internal validity.

10.4 Using the regression in Equation (10.11), what are the slope and intercept for

a. Entity 1 in time period 1?
b. Entity 1 in time period 3?
c. Entity 3 in time period 1?
d. Entity 3 in time period 3?
10.5 Consider the model with a single regressor Y_it = β1 X_1,it + α_i + λ_t + u_it. This model also can be written as

Y_it = β0 + β1 X_1,it + δ2 B2_t + ... + δT BT_t + γ2 D2_i + ... + γn Dn_i + u_it,

where B2_t = 1 if t = 2 and 0 otherwise, D2_i = 1 if i = 2 and 0 otherwise, and so forth. How are the coefficients (β0, δ2, ..., δT, γ2, ..., γn) related to the coefficients (α1, ..., αn, λ1, ..., λT)?

10.6 Suppose that the fixed effects regression assumptions from Section 10.5 are satisfied. Show that cov(v_it, v_is) = 0 for t ≠ s in Equation (10.28).

10.7 A researcher believes that traffic fatalities increase when roads are icy, so that states with more snow will have more fatalities than other states. Comment on the following methods designed to estimate the effect of snow on fatalities:

a. The researcher collects data on the average snowfall for each state and adds this regressor (AverageSnow_i) to the regressions given in Table 10.1.

b. The researcher collects data on the snowfall in each state for each year in the sample (Snow_it) and adds this regressor to the regressions.

10.8 Consider observations (Y_it, X_it) from the linear panel data model Y_it = X_it β1 + α_i + λ_i t + u_it, where t = 1, ..., T; i = 1, ..., n; and α_i + λ_i t is an unobserved individual-specific time trend. How would you estimate β1?

10.9 Explain why Assumption #5 given in Section 10.5 is important for fixed effects regression. What happens if Assumption #5 is not true?

10.10 a. In the fixed effects regression model, are the fixed entity effects, α_i, consistently estimated as n → ∞ with T fixed? (Hint: Analyze the model with no X's: Y_it = α_i + u_it.)

b. If n is large (say, n = 2000) but T is small (say, T = 4), do you think that the estimated values of α_i are approximately normally distributed? Why or why not? (Hint: Analyze the model Y_it = α_i + u_it.)

10.11 In a study of the effect on earnings of education using panel data on annual earnings for a large number of workers, a researcher regresses earnings in a given year on age, education, union status, and the worker's earnings in the previous year, using fixed effects regression. Will this regression give reliable estimates of the effects of the regressors (age, education, union status, and previous year's earnings) on earnings? Explain. (Hint: Check the fixed effects regression assumptions in Section 10.5.)

Empirical Exercises

E10.1 Some U.S. states have enacted laws that allow citizens to carry concealed weapons. These laws are known as "shall-issue" laws because they instruct local authorities to issue a concealed weapons permit to all applicants who are citizens, are mentally competent, and have not been convicted of a felony (some states have some additional restrictions). Proponents argue that, if more people carry concealed weapons, crime will decline because criminals are deterred from attacking other people. Opponents argue that crime will increase because of accidental or spontaneous use of the weapon. In this exercise, you will analyze the effect of concealed weapons laws on violent crimes. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Guns that contains a balanced panel of data from the 50 U.S. states, plus the District of Columbia, for the years 1977-1999. A detailed description is given in Guns_Description, available on the Web site. (These data were provided by Professor John Donohue of Stanford University and were used in his paper with Ian Ayres, "Shooting Down the 'More Guns, Less Crime' Hypothesis," Stanford Law Review, 2003, 55: 1193-1312.)

a. Estimate (1) a regression of ln(vio) against shall and (2) a regression of ln(vio) against shall, incarc_rate, density, avginc, pop, pb1064, pw1064, and pm1029.

i. Interpret the coefficient on shall in regression (2). Is this estimate large or small in a "real-world" sense?

ii. Does adding the control variables in regression (2) change the estimated effect of a shall-carry law in regression (1), as measured by statistical significance? As measured by the "real-world" significance of the estimated coefficient?

iii. Suggest a variable that varies across states but plausibly varies little, or not at all, over time, and that could cause omitted variable bias in regression (2).

b. Do the results change when you add fixed state effects? If so, which set of regression results is more credible, and why?
Repeat the analysis using ln(rob) and ln(mur) in place of ln(vio). In your view, which set of regression results is more credible, and why? Based on your analysis, what conclusions would you draw about the effects of concealed-weapons laws on these crime rates?

E10.2 Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32. Through various spending policies, the federal government has encouraged states to institute mandatory seatbelt laws to reduce the number of fatalities and serious injuries. In this exercise you will investigate how effective these laws are in increasing seat belt use and reducing fatalities. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Seatbelts that contains a panel of data from 50 U.S. states, plus the District of Columbia, for the years 1983 through 1997. A detailed description is given in Seatbelts_Description, available on the Web site. (These data were provided by Professor Liran Einav of Stanford University and were used in his paper with Alma Cohen, "The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities," The Review of Economics and Statistics, 2003.)

a. Estimate the effect of seat belt use on fatalities by regressing FatalityRate on sb_useage, speed65, speed70, ba08, drinkage21, ln(income), and age. Does the estimated regression suggest that increased seat belt use reduces fatalities?

b. Do the results change when you add state fixed effects? Provide an intuitive explanation for why the results changed.

c. Do the results change when you add time fixed effects plus state fixed effects?

d. Which regression specification, (a), (b), or (c), is most reliable? Explain why.

e. Using the results in (c), discuss the size of the coefficient on sb_useage. Is it large? Small? How many lives would be saved if seat belt use increased from 52% to 90%?

f. There are two ways that mandatory seatbelt laws are enforced: "Primary" enforcement means that a police officer can stop a car and ticket the driver if the officer observes an occupant not wearing a seat belt; "secondary" enforcement means that a police officer can write a ticket if an occupant is not wearing a seat belt, but must have another reason to stop the car. In the data set, primary is a binary variable for primary enforcement and secondary is a binary variable for secondary enforcement. Run a regression of sb_useage on primary, secondary, speed65, speed70, ba08, drinkage21, ln(income), and age, including fixed state and time effects in the regression. Does primary enforcement lead to more seat belt use? What about secondary enforcement?

g. In your view, what are the most important remaining threats to the internal validity of this regression analysis?

h. In 2000, New Jersey changed from secondary enforcement to primary enforcement. Estimate the number of lives saved per year by making this change.

APPENDIX 10.1 The State Traffic Fatality Data Set

The data are for the "lower 48" U.S. states (excluding Alaska and Hawaii), annually for 1982 through 1988. The traffic fatality rate is the number of traffic deaths in a given state in a given year, per 10,000 people living in that state in that year. Traffic fatality data were obtained from the U.S. Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is a measure of state alcohol taxes more generally. The drinking age variables in Table 10.1 are binary variables indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables in Table 10.1 describe the state's minimum sentencing requirements for an initial drunk driving conviction: "Mandatory jail?" equals 1 if the state requires jail time and equals 0 otherwise, and "Mandatory community service?" equals 1 if the state requires community service and equals 0 otherwise. Data on total vehicle miles traveled annually by state were obtained from the Department of Transportation. Personal income was obtained from the U.S. Bureau of Economic Analysis, and the unemployment rate was obtained from the U.S. Bureau of Labor Statistics. These data were graciously provided to us by Professor Christopher J. Ruhm of the Department of Economics at the University of North Carolina.
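Parts (b) and (c) ask for regressions that include state (and time) fixed effects. As a rough sketch of what state fixed effects do, the "within" estimator can be computed by demeaning each state's observations. The snippet below uses made-up numbers (not the Seatbelts data) and a single regressor; the point is that entity demeaning recovers a common slope even when states have different intercepts.

```python
# Two hypothetical "states" share the slope 2.0 but have different
# intercepts (10.0 and 5.0); all numbers are invented for the illustration.
data = {
    "StateA": [(1.0, 10.0 + 2.0 * 1.0), (2.0, 10.0 + 2.0 * 2.0), (3.0, 10.0 + 2.0 * 3.0)],
    "StateB": [(1.0, 5.0 + 2.0 * 1.0), (3.0, 5.0 + 2.0 * 3.0), (5.0, 5.0 + 2.0 * 5.0)],
}

num = den = 0.0
for obs in data.values():
    x_bar = sum(x for x, _ in obs) / len(obs)   # state mean of the regressor
    y_bar = sum(y for _, y in obs) / len(obs)   # state mean of the outcome
    for x, y in obs:
        num += (x - x_bar) * (y - y_bar)        # demeaned cross product
        den += (x - x_bar) ** 2
beta1_hat = num / den  # recovers the common slope despite different intercepts
print(beta1_hat)       # 2.0
```

Pooled OLS on the raw data would mix the between-state intercept differences into the slope; demeaning removes them, which is the intuition behind part (b).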
APPENDIX 10.2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors

This appendix provides formulas for standard errors for fixed effects regression when the errors are serially correlated, specifically, when Assumption 5 of Key Concept 10.3 does not hold. If $u_{it}$ is autocorrelated, then the usual standard error formula is inappropriate. This appendix explains why this is so and presents an alternative formula for standard errors that are valid if there is heteroskedasticity and/or autocorrelation, that is, heteroskedasticity- and autocorrelation-consistent (HAC) standard errors. This appendix considers entity-demeaned fixed effects regression with a single regressor $X$; the formulas given here are extended to multiple regressors in Exercise 18.15. Throughout, we focus on the case in which the number of entities $n$ is large but the number of time periods $T$ is small; mathematically, this corresponds to treating $n$ as increasing to infinity while $T$ remains fixed.

The Asymptotic Distribution of the Fixed Effects Estimator

The fixed effects estimator of $\beta_1$ is the OLS estimator obtained using the entity-demeaned regression of Equation (10.14), in which $\tilde Y_{it}$ is regressed on $\tilde X_{it}$, where $\tilde Y_{it} = Y_{it} - \bar Y_i$, $\tilde X_{it} = X_{it} - \bar X_i$, $\bar Y_i = T^{-1}\sum_{t=1}^{T} Y_{it}$, and $\bar X_i = T^{-1}\sum_{t=1}^{T} X_{it}$. The formula for the OLS estimator is obtained by replacing $X_i - \bar X$ by $\tilde X_{it}$ and $Y_i - \bar Y$ by $\tilde Y_{it}$ in Equation (4.7) and by replacing the single summation in Equation (4.7) by two summations, one over entities ($i = 1, \ldots, n$) and one over time periods ($t = 1, \ldots, T$):

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde X_{it}\tilde Y_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde X_{it}^2}. \qquad (10.22)$$

The derivation of the sampling distribution of $\hat\beta_1$ parallels the derivation in Appendix 4.3 of the sampling distribution of the OLS estimator with cross-sectional data. First, substitute $\tilde Y_{it} = \beta_1 \tilde X_{it} + \tilde u_{it}$ [Equation (10.14)] into the numerator of Equation (10.22), then rearrange the result to obtain

$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde X_{it}\tilde u_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde X_{it}^2}. \qquad (10.23)$$

Next, divide the numerator and denominator of the right side of Equation (10.23) by $nT$ and multiply the left side by $\sqrt{nT}$; then

$$\sqrt{nT}\,(\hat\beta_1 - \beta_1) = \frac{\frac{1}{\sqrt n}\sum_{i=1}^{n} \eta_i}{\hat Q_{\tilde X}}, \qquad (10.24)$$

where $\eta_i = \frac{1}{\sqrt T}\sum_{t=1}^{T} \tilde X_{it}\tilde u_{it}$ and $\hat Q_{\tilde X} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde X_{it}^2$.

Under the first four assumptions of Key Concept 10.3, $\hat Q_{\tilde X}$ converges in probability to $Q_{\tilde X}$ and, by the central limit theorem, $\frac{1}{\sqrt n}\sum_{i=1}^{n}\eta_i$ is distributed $N(0, \sigma_\eta^2)$ for $n$ large, where $\sigma_\eta^2 = \mathrm{var}(\eta_i)$. It follows from Equation (10.24) that, under Assumptions 1 through 4, $\sqrt{nT}\,(\hat\beta_1 - \beta_1)$ is distributed $N(0, \sigma_\eta^2/Q_{\tilde X}^2)$ for large $n$; that is, the variance of the large-sample distribution of $\hat\beta_1$ is

$$\mathrm{var}(\hat\beta_1) = \frac{1}{nT}\,\frac{\sigma_\eta^2}{Q_{\tilde X}^2}. \qquad (10.25)$$

Let $\tilde v_{it} = \tilde X_{it}\tilde u_{it}$, so that

$$\eta_i = \frac{\tilde v_{i1} + \tilde v_{i2} + \cdots + \tilde v_{iT}}{\sqrt T}. \qquad (10.26)$$

Recall that, for two random variables $U$ and $V$, $\mathrm{var}(U + V) = \mathrm{var}(U) + \mathrm{var}(V) + 2\,\mathrm{cov}(U, V)$. For example, if $T = 2$, the variance of the sum in Equation (10.26) is

$$\sigma_\eta^2 = \mathrm{var}(\eta_i) = \tfrac{1}{2}\left[\mathrm{var}(\tilde v_{i1}) + \mathrm{var}(\tilde v_{i2}) + 2\,\mathrm{cov}(\tilde v_{i1}, \tilde v_{i2})\right], \qquad (10.27)$$

and in general the variance of the sum in Equation (10.26) can be written as the sum of variances plus covariances:

$$\sigma_\eta^2 = \frac{1}{T}\left[\mathrm{var}(\tilde v_{i1}) + \cdots + \mathrm{var}(\tilde v_{iT}) + 2\,\mathrm{cov}(\tilde v_{i1}, \tilde v_{i2}) + \cdots + 2\,\mathrm{cov}(\tilde v_{i,T-1}, \tilde v_{iT})\right]. \qquad (10.28)$$

Under Assumption 5 of Key Concept 10.3, the errors are uncorrelated across time periods, conditional on the X's (that is, uncorrelated for different time periods but the same entity), so the covariances in Equation (10.28) are zero. But if $u_{it}$ is autocorrelated, given the X's, the covariances in Equation (10.28) are, in general, nonzero. The usual heteroskedasticity-robust variance estimator sets these covariances to zero, so if $u_{it}$ is autocorrelated, the usual heteroskedasticity-robust variance estimator does not consistently estimate $\sigma_\eta^2$. In contrast, the so-called clustered variance estimator is valid even if $\tilde v_{it}$ is conditionally autocorrelated. The clustered panel data standard errors are given by

$$SE(\hat\beta_1) = \sqrt{\frac{1}{nT}\,\frac{\hat\sigma^2_{\eta,\mathrm{clustered}}}{\hat Q_{\tilde X}^2}}, \qquad (10.29)$$

where

$$\hat\sigma^2_{\eta,\mathrm{clustered}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{\sqrt T}\sum_{t=1}^{T} \hat v_{it}\right)^2, \qquad (10.30)$$

where $\hat v_{it} = \tilde X_{it}\hat u_{it}$ and $\hat u_{it}$ is the residual from the OLS fixed effects regression. (Some software implements the clustered variance formula with a degrees-of-freedom adjustment.) Because $T$ is a fixed constant, $\hat\sigma^2_{\eta,\mathrm{clustered}}$ is a consistent estimator of $\sigma_\eta^2$ as $n \to \infty$, even if there is heteroskedasticity and/or autocorrelation, so the resulting variance estimator is heteroskedasticity- and autocorrelation-consistent. This variance estimator is called the clustered variance estimator because the errors are grouped into clusters of observations, where the errors can be correlated within a cluster but are assumed to be uncorrelated across clusters. For the clustered variance estimator in Equation (10.30) to be reliable, $n$ should be large. For empirical examples using HAC standard errors in economic panel data, see Bertrand, Duflo, and Mullainathan (2004).

Standard Errors When $u_{it}$ Is Correlated Across Entities

In some cases, $u_{it}$ might be correlated across entities. For example, in a study of earnings, suppose that the sampling scheme selects families by simple random sampling, then tracks all siblings within a family. Because the omitted factors that enter the error term could have a common element for siblings in the same family, the errors could be correlated within a family; the clustered variance estimator then applies with the family as the cluster, where the errors can be correlated within the cluster (here, the family) but are assumed to be uncorrelated across clusters.
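The clustered calculation in Equations (10.29) and (10.30) can be sketched numerically. The code below is an illustration on synthetic data (the entity count, the AR(1) error coefficient, and beta1 are all invented for the example): the key step is summing the demeaned regressor times the residual over time within each entity before squaring, so that within-entity serial correlation is captured; the scale factors 1/n, 1/T, and 1/nT cancel into a single division by the squared sum of squares.

```python
import math
import random

random.seed(1)

# Synthetic panel: n entities, T periods, AR(1) errors within each entity,
# so u_it is serially correlated and Assumption 5 fails. All parameter
# values here are made up for the illustration.
n, T, beta1 = 200, 5, 1.0
x, y = [], []
for i in range(n):
    xs = [random.gauss(0.0, 1.0) for _ in range(T)]
    us, e = [], 0.0
    for _ in range(T):
        e = 0.7 * e + random.gauss(0.0, 1.0)  # serially correlated error u_it
        us.append(e)
    xm, um = sum(xs) / T, sum(us) / T
    # entity demeaning (the within transformation)
    x.append([v - xm for v in xs])
    y.append([beta1 * (xs[t] - xm) + (us[t] - um) for t in range(T)])

sxx = sum(x[i][t] ** 2 for i in range(n) for t in range(T))
sxy = sum(x[i][t] * y[i][t] for i in range(n) for t in range(T))
b1 = sxy / sxx  # fixed effects estimate, as in Equation (10.22)

# Clustered variance: sum x~ * residual over t WITHIN each entity, then
# square; algebraically equal to (1/nT) * sigma2_clustered / Qhat^2.
var_cl = sum(
    sum(x[i][t] * (y[i][t] - b1 * x[i][t]) for t in range(T)) ** 2
    for i in range(n)
) / sxx ** 2
se_clustered = math.sqrt(var_cl)
```

Replacing the inner sum-then-square by a square-then-sum over all (i, t) would give the ordinary heteroskedasticity-robust variance, which ignores the within-entity covariances and is inconsistent here.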
CHAPTER 11 Regression with a Binary Dependent Variable

Two people, identical but for their race, walk into a bank and apply for a mortgage, a large loan so that each can buy an identical house. Does the bank treat them the same way? Are they both equally likely to have their mortgage application accepted? By law they must receive identical treatment. But whether they actually do is a matter of great concern among bank regulators.

Loans are made and denied for many legitimate reasons. For example, if the proposed loan payments take up most or all of the applicant's monthly income, then a loan officer might justifiably deny the loan. Also, even loan officers are human and they can make honest mistakes, so the denial of a single minority applicant does not prove anything about discrimination. Many studies of discrimination thus look for statistical evidence of discrimination, that is, evidence contained in large data sets showing that whites and minorities are treated differently.

But how, precisely, should one check for statistical evidence of discrimination in the mortgage market? A start is to compare the fraction of minority and white applicants who were denied a mortgage. In the data examined in this chapter, gathered from mortgage applications in 1990 in the Boston, Massachusetts, area, 28% of black applicants were denied mortgages, but only 9% of white applicants were denied. Yet this comparison does not really answer the question that opened this chapter, because the black and white applicants were not necessarily "identical but for their race." Instead, we need a method for comparing rates of denial, holding other applicant characteristics constant.
This sounds like a job for multiple regression analysis, and it is, but with a twist. The twist is that the dependent variable, whether the applicant is denied, is binary. In Part II, we regularly used binary variables as regressors, and they caused no particular problems. But when the dependent variable is binary, things are more difficult: What does it mean to fit a line to a dependent variable that can take on only two values, 0 and 1?

The answer to this question is to interpret the regression function as a predicted probability. This interpretation is discussed in Section 11.1, and it allows us to apply the multiple regression models from Part II to binary dependent variables. Section 11.1 goes over this "linear probability model." But the predicted probability interpretation also suggests that alternative, nonlinear regression models can do a better job modeling these probabilities. These methods, called "probit" and "logit" regression, are discussed in Section 11.2. Section 11.3, which is optional, discusses the method used to estimate the coefficients of the probit and logit regressions, the method of maximum likelihood estimation. In Section 11.4, we apply these methods to the Boston mortgage application data set to see whether there is evidence of racial bias in mortgage lending.

The binary dependent variable considered in this chapter is an example of a dependent variable with a limited range; in other words, it is a limited dependent variable. Models for other types of limited dependent variables, for example, dependent variables that take on multiple discrete values, are surveyed in Appendix 11.3.

11.1 Binary Dependent Variables and the Linear Probability Model

Whether a mortgage application is accepted or denied is one example of a binary variable. Many other important questions also concern binary outcomes. What is the effect of a tuition subsidy on an individual's decision to go to college? What determines whether a teenager takes up smoking? What determines whether a
country receives foreign aid? What determines whether a job applicant is successful? In all these examples, the outcome of interest is binary: The student does or does not go to college, the teenager does or does not take up smoking, a country does or does not receive foreign aid, the applicant does or does not get a job.

This section discusses what distinguishes regression with a binary dependent variable from regression with a continuous dependent variable, then turns to the simplest model to use with binary dependent variables, the linear probability model.

Binary Dependent Variables

The application examined in this chapter is whether race is a factor in denying a mortgage application; the binary dependent variable is whether a mortgage application is denied. The data are a subset of a larger data set compiled by researchers at the Federal Reserve Bank of Boston under the Home Mortgage Disclosure Act (HMDA) and relate to mortgage applications filed in the Boston, Massachusetts, area in 1990. The Boston HMDA data are described in Appendix 11.1.

Mortgage applications are complicated, and so is the process by which the bank loan officer makes a decision. The loan officer must forecast whether the applicant will make his or her loan payments. One important piece of information is the size of the required loan payments relative to the applicant's income. As anyone who has borrowed money knows, it is much easier to make payments that are 10% of your income than 50%! We therefore begin by looking at the relationship between two variables: the binary dependent variable deny, which equals 1 if the mortgage application was denied and equals 0 if it was accepted, and the continuous variable P/I ratio, which is the ratio of the applicant's anticipated total monthly loan payments to his or her monthly income.

Figure 11.1 presents a scatterplot of deny versus P/I ratio for 127 of the 2380 observations in the data set. (The scatterplot is easier to read using this subset of the data.) This scatterplot looks different than the scatterplots of Part II because the variable deny is binary. Still, the scatterplot shows a relationship between deny and P/I ratio: Few applicants with a payment-to-income ratio less than 0.3 have their application denied, but most applicants with a payment-to-income ratio exceeding 0.4 are denied. This positive relationship between P/I ratio and deny (the higher the P/I ratio, the greater the fraction of denials) is summarized in Figure 11.1 by the OLS regression line estimated using these 127 observations. As usual, this line plots the predicted value of deny as a function of the regressor, the payment-to-income ratio.
FIGURE 11.1 Scatterplot of Mortgage Application Denial and the Payment-to-Income Ratio

[Figure: Mortgage applicants with a high ratio of debt payments to income (P/I ratio) are more likely to have their application denied (deny = 1 if denied, deny = 0 if approved). The linear probability model uses a straight line to model the probability of denial, conditional on the P/I ratio.]

For example, when P/I ratio = 0.3, the predicted value of deny is 0.20. But what, precisely, does it mean for the predicted value of the binary variable deny to be 0.20?

The key to answering this question, and more generally to understanding regression with a binary dependent variable, is to interpret the regression as modeling the probability that the dependent variable equals 1. Thus the predicted value of 0.20 is interpreted as meaning that, when P/I ratio = 0.3, the probability of denial is estimated to be 20%. Said differently, if there were many applications with P/I ratio = 0.3, then 20% of them would be denied.

This interpretation follows from two facts. First, from Part II, the population regression function is the expected value of Y given the regressors, E(Y | X1, ..., Xk). Second, from Section 2.2, if Y is a 0-1 binary variable, its expected value (or mean) is the probability that Y = 1; that is, E(Y) = Pr(Y = 1). In the regression context, the expected value is conditional on the value of the regressors, so the probability is conditional on X. Thus for a binary variable, E(Y | X1, ..., Xk) = Pr(Y = 1 | X1, ..., Xk). In short, for a binary variable, the predicted value from the population regression is the probability that Y = 1, given X.

The linear multiple regression model applied to a binary dependent variable is called the linear probability model: "linear" because it is a straight line, and "probability model" because it models the probability that the dependent variable equals 1, in our example, the probability of loan denial.
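The fact that the mean of a 0-1 variable is the probability that it equals 1 is easy to check by simulation. In this hypothetical sketch (the numbers are ours, not from the HMDA data), applications are denied with probability 0.20, matching the predicted value at P/I ratio = 0.3 discussed above:

```python
import random

random.seed(0)

# Simulate 10,000 hypothetical applications, each denied with probability 0.20.
p_denial = 0.20
applications = [1 if random.random() < p_denial else 0 for _ in range(10_000)]

# For a 0-1 variable the sample mean is the fraction of 1's, so it estimates
# Pr(Y = 1); it should be close to 0.20 here.
denial_rate = sum(applications) / len(applications)
print(denial_rate)
```

The sample mean fluctuates around 0.20 across simulations, which is exactly the sense in which a predicted value of 0.20 means "20% of such applications would be denied."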
The Linear Probability Model

The linear probability model is the name for the multiple regression model of Part II when the dependent variable is binary rather than continuous. Because the dependent variable Y is binary, the population regression function corresponds to the probability that the dependent variable equals 1, given X. The population coefficient β1 on a regressor X is the change in the probability that Y = 1 associated with a unit change in X. Similarly, the OLS predicted value, computed using the estimated regression function, is the predicted probability that the dependent variable equals 1, and the OLS estimator of β1 estimates the change in the probability that Y = 1 associated with a unit change in X.

Almost all of the tools of Part II carry over to the linear probability model. The coefficients can be estimated by OLS. Ninety-five percent confidence intervals can be formed as ±1.96 standard errors. Hypotheses concerning several coefficients can be tested using the F-statistic discussed in Chapter 7, and interactions between variables can be modeled using the methods of Section 8.3. Because the errors of the linear probability model are always heteroskedastic (Exercise 11.8), it is essential that heteroskedasticity-robust standard errors be used for inference.

One tool that does not carry over is the R². When the dependent variable is continuous, it is possible to imagine a situation in which the R² equals 1: All the data lie exactly on the regression line. This is impossible when the dependent variable is binary unless the regressors are also binary. Accordingly, the R² is not a particularly useful statistic here. We return to measures of fit in the next section.

The linear probability model is summarized in Key Concept 11.1.

Application to the Boston HMDA data. The OLS regression of the binary dependent variable deny against the payment-to-income ratio, P/I ratio, estimated using all 2380 observations in our data set is

deny = -0.080 + 0.604 P/I ratio,   (11.1)
       (0.032)  (0.098)

with standard errors in parentheses. The estimated coefficient on P/I ratio is positive, and the population coefficient is statistically significantly different from zero at the 1% level (t = 0.604/0.098 = 6.16). Thus applicants with higher debt payments as a fraction of income are more likely to have their application denied. This coefficient can be used to compute the predicted change in the probability of denial, given a change in the regressor. For example, if the P/I ratio increases by 0.1, the probability of denial increases by 0.604 × 0.1 ≅ 0.060, that is, by 6.0 percentage points.
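The predicted probabilities and effects implied by Equation (11.1) are simple arithmetic. A small sketch (the coefficients are copied from the equation above; the function name is ours):

```python
# Predicted denial probability from the estimated linear probability model
# in Equation (11.1): deny_hat = -0.080 + 0.604 * (P/I ratio).
def lpm_denial_prob(pi_ratio: float) -> float:
    return -0.080 + 0.604 * pi_ratio

# An applicant whose debt payments are 30% of income:
p_30 = lpm_denial_prob(0.3)                         # about 0.101, i.e., 10.1%

# Raising the P/I ratio by 0.1 raises the probability by 0.604 * 0.1,
# about 6.0 percentage points, regardless of the starting point.
effect = lpm_denial_prob(0.4) - lpm_denial_prob(0.3)
```

Note the constant marginal effect: in a linear probability model the same 0.1 increase in the P/I ratio always changes the predicted probability by the same amount, which is one source of the model's shortcomings discussed below.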
KEY CONCEPT 11.1 The Linear Probability Model

The linear probability model is the linear multiple regression model,

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,   (11.2)

applied to a binary dependent variable Yi. Because Y is binary, E(Y | X1, X2, ..., Xk) = Pr(Y = 1 | X1, X2, ..., Xk), so for the linear probability model,

Pr(Y = 1 | X1, X2, ..., Xk) = β0 + β1 X1 + β2 X2 + ... + βk Xk.

The regression coefficient β1 is the change in the probability that Y = 1 associated with a unit change in X1, holding constant the other regressors, and so forth for β2, ..., βk. The regression coefficients can be estimated by OLS, and the usual (heteroskedasticity-robust) OLS standard errors can be used for confidence intervals and hypothesis tests.

The estimated linear probability model in Equation (11.1) can be used to compute predicted denial probabilities as a function of the P/I ratio. For example, if projected debt payments are 30% of an applicant's income, then the P/I ratio is 0.3 and the predicted value from Equation (11.1) is -0.080 + 0.604 × 0.3 = 0.101. That is, according to this linear probability model, an applicant whose projected debt payments are 30% of income has a probability of 10.1% that his or her application will be denied. (This is different than the probability of 20% based on the regression line in Figure 11.1 because that line was estimated using only 127 of the 2380 observations used to estimate Equation (11.1).)

What is the effect of race on the probability of denial, holding constant the P/I ratio? To keep things simple, we focus on differences between black and white applicants. To estimate the effect of race, holding constant the payment-to-income ratio, we augment Equation (11.1) with a binary regressor black that equals 1 if the applicant is black and equals 0 if the applicant is white. The estimated linear probability model is

deny = -0.091 + 0.559 P/I ratio + 0.177 black.   (11.3)
       (0.029)  (0.089)           (0.025)

The coefficient on black, 0.177, indicates that an African American applicant has a 17.7% higher probability of having a mortgage application denied than a white applicant, holding constant their payment-to-income ratio. This coefficient is significant at the 1% level (t = 0.177/0.025 = 7.1).
Taken literally, this estimate suggests that there might be racial bias in mortgage decisions, but such a conclusion would be premature. Although the payment-to-income ratio plays a role in the loan officer's decision, so do many other factors, such as the applicant's earning potential and the individual's credit history. If any of these variables are correlated with the regressor black, then their omission from Equation (11.3) will cause omitted variable bias. Thus we must defer any conclusions about discrimination in mortgage lending until we complete the more thorough analysis in Section 11.4.

Shortcomings of the linear probability model. The linearity that makes the linear probability model easy to use is also its major flaw. Look again at Figure 11.1. The estimated line representing the predicted probabilities drops below 0 for very low values of the P/I ratio and exceeds 1 for high values. But this is nonsense: A probability cannot be less than 0 or greater than 1. This nonsensical feature is an inevitable consequence of the linear regression. To address this problem, we introduce new nonlinear models specifically designed for binary dependent variables.

11.2 Probit and Logit Regression

Probit and logit regression models are nonlinear regression models specifically designed for binary dependent variables. Because a regression with a binary dependent variable Y models the probability that Y = 1, it makes sense to adopt a nonlinear formulation that forces the predicted values to be between 0 and 1. Because cumulative probability distribution functions (c.d.f.'s) produce probabilities between 0 and 1 (Section 2.1), they are used in logit and probit regressions. Probit regression uses the standard normal c.d.f. Logit regression, also called logistic regression, uses the "logistic" c.d.f.

Probit Regression

The probit regression model with a single regressor X is

Pr(Y = 1 | X) = Φ(β0 + β1 X),   (11.4)
where Φ is the cumulative standard normal distribution function (tabulated in Appendix Table 1). If β1 in Equation (11.4) is positive, then an increase in X increases the probability that Y = 1; if β1 is negative, an increase in X decreases the probability that Y = 1.

For example, suppose Y is the binary mortgage denial variable (deny), X is the payment-to-income ratio (P/I ratio), β0 = -2, and β1 = 3. What then is the probability of denial if P/I ratio = 0.4? According to Equation (11.4), this probability is Φ(β0 + β1 P/I ratio) = Φ(-2 + 3 P/I ratio) = Φ(-2 + 3 × 0.4) = Φ(-0.8). According to the cumulative normal distribution table (Appendix Table 1), Φ(-0.8) = Pr(Z ≤ -0.8) = 21.2%. That is, when the P/I ratio is 0.4, the predicted probability that the application will be denied is 21.2%, computed using the probit model with the coefficients β0 = -2 and β1 = 3.

In the probit model, the term β0 + β1 X plays the role of "z" in the cumulative standard normal distribution table in Appendix Table 1. Thus the calculation in the previous paragraph can, equivalently, be done by first computing the "z-value," z = β0 + β1 X = -2 + 3 × 0.4 = -0.8, and then looking up the probability in the tail of the normal distribution to the left of z = -0.8, which is 21.2%.

Beyond its sign, it is not easy to interpret the probit coefficients β0 and β1 directly. Instead, the coefficients are best interpreted indirectly by computing probabilities and/or changes in probabilities. When there is just one regressor, the easiest way to interpret a probit regression is to plot the probabilities.

FIGURE 11.2 Probit Model of the Probability of Denial, Given the P/I Ratio

[Figure: The probit model uses the cumulative normal distribution function to model the probability of denial given the payment-to-income ratio or, more generally, to model Pr(Y = 1 | X). Unlike the linear probability model, the probit conditional probabilities are always between 0 and 1.]

Figure 11.2 plots the estimated regression function produced by the probit regression of deny on P/I ratio for the 127 observations in the scatterplot.
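The probit probability Φ(β0 + β1 X) can be computed without a normal table using the error function. This sketch reproduces the Φ(-0.8) ≈ 21.2% calculation for the hypothetical coefficients β0 = -2 and β1 = 3:

```python
import math

def norm_cdf(z: float) -> float:
    """Phi(z): cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta0, beta1 = -2.0, 3.0     # the hypothetical coefficients from the text
z = beta0 + beta1 * 0.4      # z-value at P/I ratio = 0.4, equals -0.8
p = norm_cdf(z)              # about 0.212, i.e., 21.2%
```

The identity Φ(z) = (1 + erf(z/√2))/2 is what statistical software evaluates in place of the printed table.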
The estimated probit regression function has a stretched "S" shape: It is nearly 0 and flat for small values of the P/I ratio, it turns and increases for intermediate values, and it flattens out again and is nearly 1 for large values. For small values of the payment-to-income ratio, the probability of denial is small. For example, when P/I ratio = 0.3, the estimated probability of denial based on the estimated probit function in Figure 11.2 is 16.1%. When the P/I ratio is 0.4, the probability of denial increases sharply to 51.1%, and when the P/I ratio is 0.6, the denial probability is 98.1%; for applicants with high payment-to-income ratios, the probability of denial is nearly 1.

Probit regression with multiple regressors. In all the regression problems we have studied so far, leaving out a determinant of Y that is correlated with the included regressors results in omitted variable bias. Probit regression is no exception. In linear regression, the solution is to include the additional variable as a regressor. This is also the solution to omitted variable bias in probit regression. The probit model with multiple regressors extends the single-regressor probit model by adding regressors to compute the z-value. Accordingly, the probit population regression model with two regressors, X1 and X2, is

Pr(Y = 1 | X1, X2) = Φ(β0 + β1 X1 + β2 X2).   (11.5)

For example, suppose β0 = -1.6, β1 = 2, and β2 = 0.5. If X1 = 0.4 and X2 = 1, the z-value is z = -1.6 + 2 × 0.4 + 0.5 × 1 = -0.3. So the probability that Y = 1 given X1 = 0.4 and X2 = 1 is Pr(Y = 1 | X1 = 0.4, X2 = 1) = Φ(-0.3) = 38%.

Effect of a change in X. In linear regression, the effect on Y of a change in X is the expected change in Y arising from a change in X. When Y is binary, its conditional expectation is the conditional probability that it equals 1, so the expected change in Y arising from a change in X is the change in the probability that Y = 1.

Recall from Section 8.1 that, when the population regression function is a nonlinear function of X, this expected change is estimated in three steps: First, compute the predicted value at the original value of X using the estimated regression function; next, compute the predicted value at the changed value of X, X + ΔX; then compute the difference between the two predicted values. This procedure is summarized in Key Concept 8.1. As emphasized in Section 8.1, this method always works for computing predicted effects of a change in X, no matter how complicated the nonlinear model. When applied to the probit model, the method of Key Concept 8.1 yields the estimated effect on the probability that Y = 1 of a change in X.
The probit regression model, predicted probabilities, and estimated effects are summarized in Key Concept 11.2.

KEY CONCEPT 11.2 The Probit Model, Predicted Probabilities, and Estimated Effects

The population probit model with multiple regressors is

Pr(Y = 1 | X1, X2, ..., Xk) = Φ(β0 + β1 X1 + β2 X2 + ... + βk Xk),   (11.6)

where the dependent variable Y is binary, Φ is the cumulative standard normal distribution function, and X1, X2, ..., Xk are regressors. The probit coefficients β0, β1, ..., βk do not have simple interpretations. The model is best interpreted by computing predicted probabilities and the effect of a change in a regressor.

The predicted probability that Y = 1, given values of X1, X2, ..., Xk, is calculated by computing the z-value, z = β0 + β1 X1 + β2 X2 + ... + βk Xk, and then looking up this z-value in the normal distribution table (Appendix Table 1).

The effect of a change in a regressor is computed by (1) computing the predicted probability for the initial value of the regressors; (2) computing the predicted probability for the new or changed value of the regressors; and (3) taking their difference.

Application to the mortgage data. As an illustration, we fit a probit model to the 2380 observations in our data set on mortgage denial (deny) and the payment-to-income ratio (P/I ratio):

Pr(deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 P/I ratio),   (11.7)
                              (0.16)  (0.47)

with standard errors in parentheses. The estimated coefficients of -2.19 and 2.97 are difficult to interpret because they affect the probability of denial via the z-value. Indeed, the only thing that can be readily concluded from the estimated probit regression in Equation (11.7) is that the P/I ratio is positively related to the probability of denial (the coefficient on the P/I ratio is positive) and that this relationship is statistically significant (t = 2.97/0.47 = 6.32).

What is the change in the predicted probability that an application will be denied when the payment-to-income ratio increases from 0.3 to 0.4? To answer this question, we follow the procedure in Key Concept 8.1: Compute the probability of denial for P/I ratio = 0.3, compute it again for P/I ratio = 0.4, and take the difference.
The probability of denial when P/I ratio = 0.3 is Φ(-2.19 + 2.97 × 0.3) = Φ(-1.30) = 0.097. The probability of denial when P/I ratio = 0.4 is Φ(-2.19 + 2.97 × 0.4) = Φ(-1.00) = 0.159. The estimated change in the probability of denial is 0.159 - 0.097 = 0.062. That is, an increase in the payment-to-income ratio from 0.3 to 0.4 is associated with an increase in the probability of denial of 6.2 percentage points, from 9.7% to 15.9%.

Because the probit regression function is nonlinear, the effect of a change in X depends on the starting value of X. For example, if P/I ratio = 0.5, the estimated denial probability based on Equation (11.7) is Φ(-2.19 + 2.97 × 0.5) = Φ(-0.71) = 0.239. Thus the change in the predicted probability when the P/I ratio increases from 0.4 to 0.5 is 0.239 - 0.159 = 0.080, or 8.0 percentage points, larger than the increase of 6.2 percentage points when the P/I ratio increases from 0.3 to 0.4.

What is the effect of race on the probability of mortgage denial, holding constant the payment-to-income ratio? To estimate this effect, we estimate a probit regression with both P/I ratio and black as regressors:

Pr(deny = 1|P/I ratio, black) = Φ(-2.26 + 2.74 P/I ratio + 0.71 black).   (11.8)
                                    (0.16)  (0.44)           (0.083)

Again, the values of the coefficients are difficult to interpret, but the sign and significance are not. The coefficient on black is positive, indicating that an African American applicant has a higher probability of denial than a white applicant, holding constant their payment-to-income ratio. This coefficient is statistically significant at the 1% level. For a white applicant with P/I ratio = 0.3, the predicted denial probability is 7.5%, while for a black applicant with P/I ratio = 0.3 it is 23.3%; the difference in denial probabilities between these two hypothetical applicants is 15.8 percentage points.
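The three-step calculation, and the black/white comparison from Equation (11.8), can be reproduced numerically. A minimal sketch in Python using scipy's normal CDF, with the coefficient estimates taken from Equations (11.7) and (11.8) above:

```python
from scipy.stats import norm

# Equation (11.7): Pr(deny = 1 | P/I ratio) = Phi(-2.19 + 2.97 * P/I ratio)
def probit_prob(pi_ratio):
    return norm.cdf(-2.19 + 2.97 * pi_ratio)

# Step 1: predicted probability at the initial value of the regressor
p_03 = probit_prob(0.3)   # about 0.097
# Step 2: predicted probability at the changed value of the regressor
p_04 = probit_prob(0.4)   # about 0.159
# Step 3: their difference is the estimated effect
effect = p_04 - p_03      # about 0.062, i.e., 6.2 percentage points

# Equation (11.8): Pr(deny = 1 | P/I ratio, black)
#                = Phi(-2.26 + 2.74 * P/I ratio + 0.71 * black)
def probit_prob_race(pi_ratio, black):
    return norm.cdf(-2.26 + 2.74 * pi_ratio + 0.71 * black)

# Black/white gap in denial probabilities at P/I ratio = 0.3
gap = probit_prob_race(0.3, 1) - probit_prob_race(0.3, 0)   # about 0.158
```

The small discrepancies from the text's figures come only from rounding the z-values to two decimals before looking them up in the normal table.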
Estimation of the probit coefficients. The probit coefficients reported here were estimated using the method of maximum likelihood, which produces efficient (minimum variance) estimators in a wide variety of applications, including regression with a binary dependent variable. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way. Regression software for estimating probit models typically uses maximum likelihood estimation, so this is a simple method to apply in practice. Standard errors produced by such software can be used in the same way as the standard errors of regression coefficients; for example, a 95% confidence interval for the true probit coefficient can be constructed as the estimated coefficient ± 1.96 standard errors. Similarly, F-statistics computed using maximum likelihood estimators can be used to test joint hypotheses. Maximum likelihood estimation is discussed further in Section 11.3, with additional details given in Appendix 11.2.
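Applying the ±1.96-standard-error rule to the coefficient on black in Equation (11.8) (point estimate 0.71, standard error 0.083 as reported there) gives a confidence interval and t-statistic directly; a short sketch:

```python
# 95% confidence interval: estimated coefficient +/- 1.96 standard errors
beta_hat = 0.71   # coefficient on black, Equation (11.8)
se = 0.083        # its standard error, Equation (11.8)

ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)   # roughly (0.55, 0.87)
t_stat = beta_hat / se                               # about 8.55
# The interval excludes 0 and the t-statistic is far beyond the 1% critical
# value (2.58), consistent with the significance reported in the text.
```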
Logit Regression

The logit regression model. The logit regression model is similar to the probit regression model, except that the cumulative standard normal distribution function Φ in Equation (11.6) is replaced by the cumulative standard logistic distribution function, which we denote by F. Logit regression is summarized in Key Concept 11.3. The logistic cumulative distribution function has a specific functional form, defined in terms of the exponential function, which is given as the final expression in Equation (11.9).

KEY CONCEPT 11.3
LOGIT REGRESSION

The population logit model of the binary dependent variable Y with multiple regressors is

Pr(Y = 1|X1, X2, ..., Xk) = F(β0 + β1X1 + β2X2 + ... + βkXk)
                          = 1/(1 + e^-(β0 + β1X1 + β2X2 + ... + βkXk)).   (11.9)

Logit regression is similar to probit regression, except that the cumulative distribution function is different.

The coefficients of the logit model can be estimated by maximum likelihood. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way. As with probit, the logit coefficients are best interpreted by computing predicted probabilities and differences in predicted probabilities.

The logit and probit regression functions are similar. This is illustrated in Figure 11.3, which graphs the probit and logit regression functions for the dependent variable deny and the single regressor P/I ratio, estimated by maximum likelihood using the same 127 observations as in Figure 11.1.
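The similarity of fitted logit and probit functions reflects how close the two cumulative distribution functions are once the index is rescaled. This sketch evaluates the logistic CDF from Equation (11.9); the rescaling constant 1.702 is a standard approximation from the statistics literature, not something from the text:

```python
import math
from scipy.stats import norm

def logistic_cdf(z):
    """Cumulative standard logistic distribution function F of Equation (11.9)."""
    return 1.0 / (1.0 + math.exp(-z))

# The logistic CDF is a proper, symmetric CDF: F(0) = 0.5 and F(-z) = 1 - F(z).
# After rescaling the index by about 1.702 (a standard approximation), the
# logistic CDF tracks the standard normal CDF to within about 0.01 everywhere,
# which is why fitted logit and probit regression functions look so alike.
max_gap = max(abs(norm.cdf(z) - logistic_cdf(1.702 * z))
              for z in [x / 10 for x in range(-40, 41)])
```

Because the two index functions differ in scale, estimated logit coefficients are larger in magnitude than probit coefficients even when the fitted probabilities nearly coincide.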
FIGURE 11.3 Probit and Logit Models of the Probability of Denial, Given the P/I Ratio

These logit and probit models produce nearly identical estimates of the probability that a mortgage application will be denied, given the payment-to-income ratio, using the 2380 observations in the data set. [The figure plots deny against the P/I ratio (from 0.1 to 0.8), with denied applications ("Mortgage denied") at 1, approved applications ("Mortgage approved") at 0, and the two fitted regression functions nearly on top of one another.]

The differences between the two functions are small. Historically, the main motivation for logit regression was that the logistic cumulative distribution function could be computed faster than the normal cumulative distribution function. With the advent of more efficient computers, this distinction is no longer important.

Application to the Boston HMDA data. A logit regression of deny against P/I ratio and black, using the 2380 observations in the data set, yields the estimated regression function

Pr(deny = 1|P/I ratio, black) = F(-4.13 + 5.37 P/I ratio + 1.27 black).   (11.10)
                                    (0.35)  (0.96)           (0.15)

The coefficient on black is positive and statistically significant at the 1% level (the t-statistic is 8.47). The predicted denial probability of a white applicant with P/I ratio = 0.3 is 1/[1 + e^2.52] = 0.074, or 7.4%. The predicted denial probability of an African American applicant with P/I ratio = 0.3 is 1/[1 + e^1.25] = 0.222, or 22.2%, so the difference between the two probabilities is 14.8 percentage points.
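These logit calculations can be verified directly from Equation (11.10); a minimal sketch:

```python
import math

# Equation (11.10): Pr(deny = 1 | P/I ratio, black)
#                 = F(-4.13 + 5.37 * P/I ratio + 1.27 * black)
def logit_prob(pi_ratio, black):
    z = -4.13 + 5.37 * pi_ratio + 1.27 * black
    return 1.0 / (1.0 + math.exp(-z))   # cumulative standard logistic CDF

p_white = logit_prob(0.3, 0)   # about 0.074, or 7.4%
p_black = logit_prob(0.3, 1)   # about 0.222, or 22.2%
gap = p_black - p_white        # about 0.148, i.e., 14.8 percentage points
```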
Comparing the Linear Probability, Probit, and Logit Models

All three models (linear probability, probit, and logit) are just approximations to the unknown population regression function E(Y|X). The linear probability model is easiest to use and to interpret, but it cannot capture the nonlinear nature of the true population regression function. Probit and logit regressions model this nonlinearity in the probabilities, but their regression coefficients are more difficult to interpret. So which should you use in practice? There is no one right answer, and different researchers use different models.

Probit and logit regressions frequently produce similar results. For example, according to the estimated probit model in Equation (11.8), the difference in denial probabilities between a black applicant and a white applicant with P/I ratio = 0.3 was estimated to be 15.8 percentage points, whereas the logit estimate of this gap, based on Equation (11.10), was 14.8 percentage points. For practical purposes the two estimates are very similar. One way to choose between logit and probit is to pick the method that is easiest to use in your statistical software.

In contrast, the estimated black/white gap from the linear probability model is 17.7 percentage points, larger than the probit and logit estimates but still qualitatively similar. The linear probability model provides the least sensible approximation to the nonlinear population regression function. Even so, in some data sets there may be few extreme values of the regressors, in which case the linear probability model still can provide an adequate approximation; the only way to know this, however, is to estimate both a linear and a nonlinear model and to compare their predicted probabilities.

11.3 Estimation and Inference in the Logit and Probit Models

The nonlinear models studied in Sections 8.2 and 8.3 are nonlinear functions of the independent variables but are linear functions of the unknown coefficients ("parameters"). Consequently, the unknown coefficients of those nonlinear regression functions can be estimated by OLS. In contrast, the probit and logit regression functions are a nonlinear function of the coefficients; that is, the probit coefficients β0, β1, ..., βk in Equation (11.6) appear inside the cumulative standard normal distribution function Φ, and the logit coefficients in Equation (11.9) appear inside the cumulative standard logistic distribution function F. Because the population regression function is a nonlinear function of the coefficients β0, β1, ..., βk, those coefficients cannot be estimated by OLS.
This section provides an introduction to the standard method for estimation of probit and logit coefficients, maximum likelihood; additional mathematical details are given in Appendix 11.2. Because it is built into modern statistical software, maximum likelihood estimation of the probit coefficients is easy in practice. The theory of maximum likelihood estimation, however, is more complicated than the theory of least squares. We therefore first discuss another estimation method, nonlinear least squares, before turning to maximum likelihood.

Nonlinear Least Squares Estimation

Nonlinear least squares is a general method for estimating the unknown parameters of a regression function when those parameters enter the population regression function nonlinearly. The nonlinear least squares estimator, which was introduced in Appendix 8.1, extends the OLS estimator to regression functions that are nonlinear functions of the parameters. Like OLS, nonlinear least squares finds the values of the parameters that minimize the sum of squared prediction mistakes produced by the model.

To be concrete, consider the nonlinear least squares estimator of the parameters of the probit model. The conditional expectation of Y given the X's is

E(Y|X1, ..., Xk) = 1 × Pr(Y = 1|X1, ..., Xk) + 0 × Pr(Y = 0|X1, ..., Xk)
                 = Pr(Y = 1|X1, ..., Xk) = Φ(β0 + β1X1 + ... + βkXk).

Estimation by nonlinear least squares fits this conditional expectation function, which is a nonlinear function of the parameters, to the dependent variable.
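The identity E(Y|X) = Pr(Y = 1|X) = Φ(β0 + β1X) for a binary Y can be illustrated by simulation. In this sketch the probit coefficients (-2 and 3) are made up for illustration, not estimates from the text:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta0, beta1 = -2.0, 3.0   # illustrative coefficients, not from the text
x = 0.5                    # condition on a single value of the regressor

p_true = norm.cdf(beta0 + beta1 * x)       # Pr(Y = 1 | X = x)
y = rng.random(200_000) < p_true           # draw Bernoulli(p_true) outcomes
sample_mean = y.mean()                     # estimates E(Y | X = x)
# The sample mean of the binary outcomes is close to p_true: for a binary Y,
# the conditional mean is the conditional probability that Y = 1.
```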
That is, the nonlinear least squares estimator of the probit coefficients consists of those values of b0, ..., bk that minimize the sum of squared prediction mistakes:

Σ(i=1 to n) [Yi − Φ(b0 + b1X1i + ... + bkXki)]².   (11.11)

The nonlinear least squares estimator shares two key properties with the OLS estimator in linear regression: It is consistent (the probability that it is close to the true value approaches 1 as the sample size gets large), and it is normally distributed in large samples. There are, however, estimators that have a smaller variance than the nonlinear least squares estimator; that is, the nonlinear least squares estimator is inefficient. For this reason, the nonlinear least squares estimator of the probit coefficients is rarely used in practice, and instead the parameters are estimated by maximum likelihood.
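The minimization in Equation (11.11) can be sketched on simulated data with a general-purpose optimizer. The true coefficients below are illustrative, and scipy's Nelder-Mead routine stands in for a dedicated nonlinear least squares solver:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(0, 1, n)
beta0, beta1 = -2.0, 3.0   # illustrative "true" values, not from the text
Y = (rng.random(n) < norm.cdf(beta0 + beta1 * X)).astype(float)

def ssr(b):
    """Sum of squared prediction mistakes, as in Equation (11.11)."""
    return np.sum((Y - norm.cdf(b[0] + b[1] * X)) ** 2)

result = minimize(ssr, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
b_nls = result.x   # consistent for (beta0, beta1), though inefficient vs. MLE
```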
Maximum Likelihood Estimation

The likelihood function is the joint probability distribution of the data, treated as a function of the unknown coefficients. The maximum likelihood estimator (MLE) of the unknown coefficients consists of the values of the coefficients that maximize the likelihood function. Because the MLE chooses the unknown coefficients to maximize the likelihood function, which is in turn the joint probability distribution, in effect the MLE chooses the values of the parameters to maximize the probability of drawing the data that are actually observed. In this sense, the MLEs are the parameter values "most likely" to have produced the data.

To illustrate maximum likelihood estimation, consider two i.i.d. observations, Y1 and Y2, on a binary dependent variable with no regressors. Thus Y is a Bernoulli random variable, and the only unknown parameter to estimate is the probability p that Y = 1, which is also the mean of Y.

To obtain the maximum likelihood estimator, we need an expression for the joint probability distribution of the data. The joint probability distribution of the two observations Y1 and Y2 is Pr(Y1 = y1, Y2 = y2). Because Y1 and Y2 are independently distributed, the joint distribution is the product of the individual distributions [Equation (2.23)], so Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1)Pr(Y2 = y2). The Bernoulli distribution can be summarized in the formula Pr(Y = y) = p^y(1 − p)^(1−y): When y = 1, Pr(Y = 1) = p¹(1 − p)⁰ = p, and when y = 0, Pr(Y = 0) = p⁰(1 − p)¹ = 1 − p. Thus the joint probability distribution of Y1 and Y2 is Pr(Y1 = y1, Y2 = y2) = [p^y1(1 − p)^(1−y1)] × [p^y2(1 − p)^(1−y2)] = p^(y1+y2)(1 − p)^(2−y1−y2).
The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. For the n = 2 i.i.d. observations on Bernoulli random variables, the likelihood function is

f(p; Y1, Y2) = p^(Y1+Y2)(1 − p)^(2−Y1−Y2).   (11.12)

The maximum likelihood estimator of p is the value of p that maximizes the likelihood function in Equation (11.12). As with all maximization or minimization problems, this can be done by trial and error; that is, you can try different values of p and compute the likelihood f(p; Y1, Y2) until you are satisfied that you have maximized this function. In this example, however, maximizing the likelihood function using calculus produces a simple formula for the MLE: The MLE is p̂ = ½(Y1 + Y2). In other words, the MLE of p is just the sample average! In fact, for general n, the MLE p̂ of the Bernoulli probability p is the sample average; that is, p̂ = Ȳ (this is shown in Appendix 11.2). In this example, the MLE is the usual estimator of p, the fraction of times Yi = 1 in the sample.
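The trial-and-error approach is easy to carry out on a computer. This sketch evaluates the Bernoulli likelihood over a grid of candidate values of p for a made-up sample of n = 5 observations and confirms that the maximizer is the sample average:

```python
# Bernoulli likelihood for n observations:
# f(p; Y1, ..., Yn) = p^(sum of Y) * (1 - p)^(n - sum of Y)
Y = [1, 0, 1, 1, 0]   # a made-up sample of binary outcomes
s, n = sum(Y), len(Y)

def likelihood(p):
    return p ** s * (1 - p) ** (n - s)

# Trial and error over a fine grid of candidate values of p
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=likelihood)
# The grid maximizer is the sample average, 3/5 = 0.6, matching the
# calculus result that the MLE of a Bernoulli probability is Y-bar.
```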
This example is similar to the problem of estimating the unknown coefficients of the probit and logit regression models. In those models, the success probability p is not constant, but rather depends on X; that is, it is the success probability conditional on X, which is given in Equation (11.6) for the probit model and Equation (11.9) for the logit model. Thus the probit and logit likelihood functions are similar to the likelihood function in Equation (11.12), except that the success probability varies from one observation to the next (because it depends on Xi). Expressions for the probit and logit likelihood functions are given in Appendix 11.2.

Like the nonlinear least squares estimator, the MLE is consistent and normally distributed in large samples. Because regression software commonly computes the MLE of the probit coefficients, this estimator is easy to use in practice. All the estimated probit and logit coefficients reported in this chapter are MLEs.

Statistical inference based on the MLE. Because the MLE is normally distributed in large samples, statistical inference about the probit and logit coefficients based on the MLE proceeds in the same way as inference about the linear regression function coefficients based on the OLS estimator: Hypothesis tests are performed using the t-statistic, and 95% confidence intervals are formed as ±1.96 standard errors. Tests of joint hypotheses on multiple coefficients use the F-statistic, in a way similar to that discussed in Chapter 7 for the linear regression model. All of this is completely analogous to statistical inference in the linear regression model.

An important practical point is that some statistical software reports tests of joint hypotheses using the F-statistic, while other software uses the chi-squared statistic. The chi-squared statistic is q × F, where q is the number of restrictions being tested. Because the F-statistic is, under the null hypothesis, distributed as F(q, ∞) in large samples, q × F is distributed as χ²(q) in large samples. Because the two approaches differ only in whether they divide by q, they produce identical inferences, but you need to know which approach is implemented in your software so that you use the correct critical values.
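The equivalence of the two reporting conventions can be checked numerically: the 5% critical value of the chi-squared distribution with q degrees of freedom equals q times the 5% critical value of the F(q, ∞) distribution. A sketch:

```python
from scipy.stats import chi2, f

q = 3   # number of restrictions being tested (any q works)

# 5% critical value of F(q, infinity), approximated with a huge
# denominator degrees of freedom
f_crit = f.ppf(0.95, q, 10**7)
chi2_crit = chi2.ppf(0.95, q)   # 5% critical value of chi-squared(q)

# Multiplying the F critical value by q recovers the chi-squared critical
# value, so F tests and chi-squared tests give identical inferences.
ratio = chi2_crit / (q * f_crit)   # essentially 1
```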
Measures of Fit

In Section 11.1 it was mentioned that the R² is a poor measure of fit for the linear probability model. This is also true for probit and logit regression. Two measures of fit for models with binary dependent variables are the "fraction correctly predicted" and the "pseudo-R²." The fraction correctly predicted uses the following rule: If Yi = 1 and the predicted probability exceeds 50%, or if Yi = 0 and the predicted probability is less than 50%, then Yi is said to be correctly predicted; otherwise, Yi is said to be incorrectly predicted. The "fraction correctly predicted" is the fraction of the n observations Y1, ..., Yn that are correctly predicted. An advantage of this measure of fit is that it is easy to understand. A disadvantage is that it does not reflect the quality of the prediction: If Yi = 1, the observation is treated as correctly predicted whether the predicted probability is 51% or 90%.

The pseudo-R² measures the fit of the model using the likelihood function. Because the MLE maximizes the likelihood function, adding another regressor to a probit or logit model increases the value of the maximized likelihood, just like adding a regressor necessarily reduces the sum of squared residuals in linear regression by OLS. This suggests measuring the quality of fit of a probit model by comparing values of the maximized likelihood function with all the regressors to the value of the likelihood with none. This is, in fact, what the pseudo-R² does. A formula for the pseudo-R² is given in Appendix 11.2.
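Both measures can be computed from a model's predicted probabilities. This sketch uses made-up outcomes and fitted probabilities, and implements the pseudo-R² in its common McFadden form, one standard way to make the likelihood comparison just described (the book's exact formula is in its Appendix 11.2):

```python
import math

y     = [1, 0, 1, 0, 0, 1]                  # made-up binary outcomes
p_hat = [0.8, 0.3, 0.6, 0.4, 0.2, 0.55]     # made-up fitted probabilities

# Fraction correctly predicted: Y_i = 1 is correct if p_hat > 0.5,
# Y_i = 0 is correct if p_hat < 0.5
correct = sum((yi == 1 and pi > 0.5) or (yi == 0 and pi < 0.5)
              for yi, pi in zip(y, p_hat))
fraction_correct = correct / len(y)

def log_likelihood(y, p):
    """Bernoulli log-likelihood given outcomes y and success probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Pseudo-R^2 (McFadden): compare the log-likelihood with the regressors to the
# log-likelihood of a model with no regressors (just the sample mean of Y)
p_bar = sum(y) / len(y)
ll_full = log_likelihood(y, p_hat)
ll_null = log_likelihood(y, [p_bar] * len(y))
pseudo_r2 = 1 - ll_full / ll_null
```

Note that every observation here is "correctly predicted" even though several fitted probabilities are barely on the right side of 50%, illustrating the disadvantage mentioned above.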
11.4 Application to the Boston HMDA Data

The regressions of the previous two sections indicated that denial rates were higher for black than white applicants, holding constant their payment-to-income ratio. Loan officers, however, legitimately weigh many factors when deciding on a mortgage application, and if any of those other factors differ systematically by race, then the estimators considered so far have omitted variable bias. In this section, we take a closer look at whether there is statistical evidence of discrimination in the Boston HMDA data. Specifically, our objective is to estimate the effect of race on the probability of denial, holding constant those applicant characteristics that a loan officer might legally consider when deciding on a mortgage application.

The most important variables available to loan officers through the mortgage applications in the Boston HMDA data set are listed in Table 11.1; these are the variables we will focus on in our empirical models of loan decisions. The first two variables are direct measures of the financial burden the proposed loan would place on the applicant, measured in terms of his or her income. The first of these is the P/I ratio; the second is the ratio of housing-related expenses to income. The next variable is the size of the loan, relative to the assessed value of the home; if the loan-to-value ratio is nearly 1, then the bank might have trouble recouping the full amount of the loan if the applicant defaults on the loan and the bank forecloses.

The final three financial variables summarize the applicant's credit history. The three variables measure different types of credit histories, which the loan officer might weigh differently. The first concerns consumer credit, such as credit card debt; the second, previous mortgage payment history; and the third measures credit problems so severe that they appeared in a public legal record, such as filing for bankruptcy. If an applicant has been unreliable paying off debts in the past, then the loan officer legitimately might worry about his or her ability or desire to make mortgage payments in the future.
TABLE 11.1 Variables Included in Regression Models of Mortgage Decisions
(Definition; sample average in parentheses)

Financial Variables
P/I ratio: Ratio of total monthly debt payments to total monthly income (0.331)
housing expense-to-income ratio: Ratio of monthly housing expenses to total monthly income (0.255)
loan-to-value ratio: Ratio of size of loan to assessed value of property (0.738)
consumer credit score: 1 if no "slow" payments or delinquencies; 2 if one or two slow payments or delinquencies; 3 if more than two slow payments; 4 if insufficient credit history for determination; 5 if delinquent credit history with payments 60 days overdue; 6 if delinquent credit history with payments 90 days overdue (2.1)
mortgage credit score: 1 if no late mortgage payments; 2 if no mortgage payment history; 3 if one or two late mortgage payments; 4 if more than two late mortgage payments (1.7)
public bad credit record: 1 if any public record of credit problems (bankruptcy, charge-offs, collection actions); 0 otherwise (0.074)

Additional Applicant Characteristics
denied mortgage insurance: 1 if applicant applied for mortgage insurance and was denied; 0 otherwise (0.020)
self-employed: 1 if self-employed; 0 otherwise (0.116)
single: 1 if applicant reported being single; 0 otherwise (0.393)
high school diploma: 1 if applicant graduated from high school; 0 otherwise (0.984)
unemployment rate: 1989 Massachusetts unemployment rate in the applicant's industry (3.8)
condominium: 1 if unit is a condominium; 0 otherwise (0.288)
black: 1 if applicant is black, 0 if white (0.142)
deny: 1 if mortgage application denied; 0 otherwise (0.120)
The next three variables, which concern the applicant's employment status, marital status, and educational attainment, relate to the prospective ability of the applicant to repay. Characteristics of the property are relevant as well, and the next variable indicates whether the property is a condominium. Sometimes the applicant must apply for private mortgage insurance.³ The loan officer knows whether that application was denied, and that denial would weigh negatively with the loan officer. The final two variables in Table 11.1 are whether the applicant is black or white and whether the application was denied or accepted. In these data, 14.2% of applicants are black and 12.0% of applications are denied.

Table 11.2 presents regression results based on these variables. The base specifications, reported in columns (1)-(3), include the financial variables in Table 11.1 plus the variables indicating whether private mortgage insurance was denied and whether the applicant is self-employed.⁴ The regressions in columns (1)-(3) differ only in how the denial probability is modeled, using a linear probability model, a logit model, and a probit model, respectively.

Because the regression in column (1) is a linear probability model, its coefficients are estimated changes in predicted probabilities arising from a unit change in the independent variable. Accordingly, an increase in the P/I ratio of 0.1 is estimated to increase the probability of denial by 4.5 percentage points (the coefficient on the P/I ratio in column (1) is 0.449, and 0.449 × 0.1 ≅ 0.045). Similarly, having a high loan-to-value ratio increases the probability of denial: A loan-to-value ratio exceeding 95% is associated with an 18.9 percentage point increase in the denial probability. Loan officers commonly use thresholds, or cutoff values, for the loan-to-value ratio, so the base specification for that variable uses binary variables for whether the loan-to-value ratio is high (≥ 0.95), medium (between 0.8 and 0.95), or low (< 0.8; this case is omitted to avoid perfect multicollinearity).

³ Mortgage insurance is an insurance policy under which the insurance company makes the monthly payment to the bank if the borrower defaults. During the period of this study, if the loan-to-value ratio exceeded 80%, the applicant typically was required to buy mortgage insurance.

⁴ The regressors here differ from those in Munnell et al. (1996). Munnell et al. include additional indicators for the location of the home and the identity of the lender, data which are not publicly available; an indicator for a multifamily home, which is not relevant here because our subset focuses on single-family homes; and net wealth, a variable with a few very large positive and negative values that thus risks making the results sensitive to a few specific observations.
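Because the linear probability model's coefficients are themselves changes in predicted probabilities, the effects quoted above are simple products of a coefficient and the change in its regressor; a sketch using the column (1) coefficients quoted in the text:

```python
# Linear probability model: each coefficient is the change in the denial
# probability from a unit change in that regressor, other regressors fixed.
beta_pi  = 0.449   # coefficient on P/I ratio, Table 11.2 column (1)
beta_ltv = 0.189   # coefficient on the high loan-to-value indicator

effect_pi  = beta_pi * 0.1   # raise P/I ratio by 0.1: about 0.045 (4.5 points)
effect_ltv = beta_ltv * 1.0  # switch the high-LTV indicator on: 0.189 (18.9 points)
```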
74) 2.183·· (0.fMn<ien1 variob": deny . X PII.15' (0./StNO (.44S) 2.18 (0.08) % of 0.t.21" (Mil) 1 '1 0..0.03 (0. 24) 0..07) 0.08) mnm:agr credit IiCOTt Pllh/t( bod eTer/1I Tt'c(Jfd drnitd mOrlgog' IIISlIrallct: ~rf{(mployed 1.099) 1 '1 0.031" (0. Rf'9"u lan RegfUIOI'" M~ J Log.57) 0.\ was 0.(198) 2.57" (0.011) 0.12) 259" (0...03 (0.189" (O.79" (0.Ii.29) 0.2 coonnued) .