Introduction to Econometrics
James H. Stock
HARVARD UNIVERSITY
Mark W. Watson
PRINCETON UNIVERSITY
Boston San Francisco New York
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Brief Contents
PART ONE Introduction and Review
CHAPTER 1 Economic Questions and Data 3
CHAPTER 2 Review of Probability 17
CHAPTER 3 Review of Statistics 65
PART TWO Fundamentals of Regression Analysis 109
CHAPTER 4 Linear Regression with One Regressor 111
CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 148
CHAPTER 6 Linear Regression with Multiple Regressors 186
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 220
CHAPTER 8 Nonlinear Regression Functions 254
CHAPTER 9 Assessing Studies Based on Multiple Regression 312
PART THREE Further Topics in Regression Analysis 347
CHAPTER 10 Regression with Panel Data 349
CHAPTER 11 Regression with a Binary Dependent Variable 383
CHAPTER 12 Instrumental Variables Regression 421
CHAPTER 13 Experiments and Quasi-Experiments 468
PART FOUR Regression Analysis of Economic Time Series Data 523
CHAPTER 14 Introduction to Time Series Regression and Forecasting 525
CHAPTER 15 Estimation of Dynamic Causal Effects 591
CHAPTER 16 Additional Topics in Time Series Regression 637
PART FIVE The Econometric Theory of Regression Analysis 675
CHAPTER 17 The Theory of Linear Regression with One Regressor 677
CHAPTER 18 The Theory of Multiple Regression 704
Contents
Preface xxvii
PART ONE Introduction and Review
CHAPTER 1 Economic Questions and Data 3
1.1 Economic Questions We Examine 4
    Question #1: Does Reducing Class Size Improve Elementary School Education? 4
    Question #2: Is There Racial Discrimination in the Market for Home Loans? 5
    Question #3: How Much Do Cigarette Taxes Reduce Smoking? 5
    Question #4: What Will the Rate of Inflation Be Next Year? 6
    Quantitative Questions, Quantitative Answers 7
1.2 Causal Effects and Idealized Experiments 8
    Estimation of Causal Effects 8
    Forecasting and Causality 9
1.3 Data: Sources and Types 10
    Experimental versus Observational Data 10
    Cross-Sectional Data 11
    Time Series Data 11
    Panel Data 13
CHAPTER 2 Review of Probability 17
2.1 Random Variables and Probability Distributions 18
    Probabilities, the Sample Space, and Random Variables 18
    Probability Distribution of a Discrete Random Variable 19
    Probability Distribution of a Continuous Random Variable 21
2.2 Expected Values, Mean, and Variance 23
    The Expected Value of a Random Variable 23
    The Standard Deviation and Variance 24
    Mean and Variance of a Linear Function of a Random Variable 25
    Other Measures of the Shape of a Distribution 26
2.3 Two Random Variables 29
    Joint and Marginal Distributions 29
    Conditional Distributions 30
    Independence 34
    Covariance and Correlation 34
    The Mean and Variance of Sums of Random Variables 35
2.4 The Normal, Chi-Squared, Student t, and F Distributions 39
    The Normal Distribution 39
    The Chi-Squared Distribution 43
    The Student t Distribution 44
    The F Distribution 44
2.5 Random Sampling and the Distribution of the Sample Average 45
    Random Sampling 45
    The Sampling Distribution of the Sample Average 46
2.6 Large-Sample Approximations to Sampling Distributions 48
    The Law of Large Numbers and Consistency 49
    The Central Limit Theorem 52
APPENDIX 2.1 Derivation of Results in Key Concept 2.3 63
CHAPTER 3 Review of Statistics 65
3.1 Estimation of the Population Mean 66
    Estimators and Their Properties 67
    Properties of Ȳ 68
    The Importance of Random Sampling 70
3.2 Hypothesis Tests Concerning the Population Mean 71
    Null and Alternative Hypotheses 72
    The p-Value 72
    Calculating the p-Value When σ_Y Is Known 74
    The Sample Variance, Sample Standard Deviation, and Standard Error 75
    Calculating the p-Value When σ_Y Is Unknown 76
    The t-Statistic 77
    Hypothesis Testing with a Prespecified Significance Level 78
    One-Sided Alternatives 80
3.3 Confidence Intervals for the Population Mean 81
3.4 Comparing Means from Different Populations 83
    Hypothesis Tests for the Difference Between Two Means 83
    Confidence Intervals for the Difference Between Two Population Means 84
3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data 85
    The Causal Effect as a Difference of Conditional Expectations 85
    Estimation of the Causal Effect Using Differences of Means 87
3.6 Using the t-Statistic When the Sample Size Is Small 88
    The t-Statistic and the Student t Distribution 88
    Use of the Student t Distribution in Practice 92
3.7 Scatterplot, the Sample Covariance, and the Sample Correlation 92
    Scatterplots 93
    Sample Covariance and Correlation 94
APPENDIX 3.1 The U.S. Current Population Survey 105
APPENDIX 3.2 Two Proofs That Ȳ Is the Least Squares Estimator of μ_Y 106
APPENDIX 3.3 A Proof That the Sample Variance Is Consistent 107
PART TWO Fundamentals of Regression Analysis 109
CHAPTER 4 Linear Regression with One Regressor 111
4.1 The Linear Regression Model 112
4.2 Estimating the Coefficients of the Linear Regression Model 116
    The Ordinary Least Squares Estimator 118
    OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio 120
    Why Use the OLS Estimator? 121
4.3 Measures of Fit 123
    The R² 123
    The Standard Error of the Regression 124
    Application to the Test Score Data 125
4.4 The Least Squares Assumptions 126
    Assumption #1: The Conditional Distribution of u_i Given X_i Has a Mean of Zero 126
    Assumption #2: (X_i, Y_i), i = 1, ..., n Are Independently and Identically Distributed 128
    Assumption #3: Large Outliers Are Unlikely 129
    Use of the Least Squares Assumptions 130
4.5 The Sampling Distribution of the OLS Estimators 131
    The Sampling Distribution of the OLS Estimators 132
4.6 Conclusion 135
APPENDIX 4.1 The California Test Score Data Set 143
APPENDIX 4.2 Derivation of the OLS Estimators 143
APPENDIX 4.3 Sampling Distribution of the OLS Estimator 144
CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 148
5.1 Testing Hypotheses About One of the Regression Coefficients 149
    Two-Sided Hypotheses Concerning β₁ 149
    One-Sided Hypotheses Concerning β₁ 153
    Testing Hypotheses About the Intercept β₀ 155
5.2 Confidence Intervals for a Regression Coefficient 155
5.3 Regression When X Is a Binary Variable 158
    Interpretation of the Regression Coefficients 158
5.4 Heteroskedasticity and Homoskedasticity 160
    What Are Heteroskedasticity and Homoskedasticity? 160
    Mathematical Implications of Homoskedasticity 163
    What Does This Mean in Practice? 164
5.5 The Theoretical Foundations of Ordinary Least Squares 166
    Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem 167
    Regression Estimators Other Than OLS 168
5.6 Using the t-Statistic in Regression When the Sample Size Is Small 169
    The t-Statistic and the Student t Distribution 170
    Use of the Student t Distribution in Practice 170
5.7 Conclusion 171
APPENDIX 5.1 Formulas for OLS Standard Errors 180
APPENDIX 5.2 The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem 182
CHAPTER 6 Linear Regression with Multiple Regressors 186
6.1 Omitted Variable Bias 186
    Definition of Omitted Variable Bias 187
    A Formula for Omitted Variable Bias 189
    Addressing Omitted Variable Bias by Dividing the Data into Groups 191
6.2 The Multiple Regression Model 193
    The Population Regression Line 193
    The Population Multiple Regression Model 194
6.3 The OLS Estimator in Multiple Regression 196
    The OLS Estimator 197
    Application to Test Scores and the Student-Teacher Ratio 198
6.4 Measures of Fit in Multiple Regression 200
    The Standard Error of the Regression (SER) 200
    The R² 200
    The "Adjusted R²" 201
    Application to Test Scores 202
6.5 The Least Squares Assumptions in Multiple Regression 202
    Assumption #1: The Conditional Distribution of u_i Given X_1i, X_2i, ..., X_ki Has a Mean of Zero 203
    Assumption #2: (X_1i, X_2i, ..., X_ki, Y_i), i = 1, ..., n Are i.i.d. 203
    Assumption #3: Large Outliers Are Unlikely 203
    Assumption #4: No Perfect Multicollinearity 203
6.6 The Distribution of the OLS Estimators in Multiple Regression 205
6.7 Multicollinearity 206
    Examples of Perfect Multicollinearity 206
    Imperfect Multicollinearity 209
6.8 Conclusion 210
APPENDIX 6.1 Derivation of Equation (6.1) 218
APPENDIX 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors 218
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 220
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient 221
    Standard Errors for the OLS Estimators 221
    Hypothesis Tests for a Single Coefficient 221
    Confidence Intervals for a Single Coefficient 223
    Application to Test Scores and the Student-Teacher Ratio 223
7.2 Tests of Joint Hypotheses 225
    Testing Hypotheses on Two or More Coefficients 225
    The F-Statistic 227
    Application to Test Scores and the Student-Teacher Ratio 229
    The Homoskedasticity-Only F-Statistic 230
7.3 Testing Single Restrictions Involving Multiple Coefficients 232
7.4 Confidence Sets for Multiple Coefficients 234
7.5 Model Specification for Multiple Regression 235
    Omitted Variable Bias in Multiple Regression 236
    Model Specification in Theory and in Practice 236
    Interpreting the R² and the Adjusted R² in Practice 237
7.6 Analysis of the Test Score Data Set 239
7.7 Conclusion 244
APPENDIX 7.1 The Bonferroni Test of a Joint Hypothesis 251
CHAPTER 8 Nonlinear Regression Functions 254
8.1 A General Strategy for Modeling Nonlinear Regression Functions 256
    Test Scores and District Income 256
    The Effect on Y of a Change in X in Nonlinear Specifications 260
    A General Approach to Modeling Nonlinearities Using Multiple Regression 264
8.2 Nonlinear Functions of a Single Independent Variable 264
    Polynomials 265
    Logarithms 267
    Polynomial and Logarithmic Models of Test Scores and District Income 275
8.3 Interactions Between Independent Variables 277
    Interactions Between Two Binary Variables 277
    Interactions Between a Continuous and a Binary Variable 280
    Interactions Between Two Continuous Variables 286
8.4 Nonlinear Effects on Test Scores of the Student-Teacher Ratio 290
    Discussion of Regression Results 291
    Summary of Findings 295
8.5 Conclusion 296
APPENDIX 8.1 Regression Functions That Are Nonlinear in the Parameters 307
CHAPTER 9 Assessing Studies Based on Multiple Regression 312
9.1 Internal and External Validity 313
    Threats to Internal Validity 313
    Threats to External Validity 314
9.2 Threats to Internal Validity of Multiple Regression Analysis 316
    Omitted Variable Bias 316
    Misspecification of the Functional Form of the Regression Function 319
    Errors-in-Variables 319
    Sample Selection 322
    Simultaneous Causality 324
    Sources of Inconsistency of OLS Standard Errors 325
9.3 Internal and External Validity When the Regression Is Used for Forecasting 327
    Using Regression Models for Forecasting 327
    Assessing the Validity of Regression Models for Forecasting 328
9.4 Example: Test Scores and Class Size 329
    External Validity 329
    Internal Validity 336
    Discussion and Implications 337
9.5 Conclusion 338
APPENDIX 9.1 The Massachusetts Elementary School Testing Data 344
PART THREE Further Topics in Regression Analysis 347
CHAPTER 10 Regression with Panel Data 349
10.1 Panel Data 350
    Example: Traffic Deaths and Alcohol Taxes 351
10.2 Panel Data with Two Time Periods: "Before and After" Comparisons 353
10.3 Fixed Effects Regression 356
    The Fixed Effects Regression Model 356
    Estimation and Inference 359
    Application to Traffic Deaths 360
10.4 Regression with Time Fixed Effects 361
    Time Effects Only 361
    Both Entity and Time Fixed Effects 362
10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression 364
    The Fixed Effects Regression Assumptions 364
    Standard Errors for Fixed Effects Regression 366
10.6 Drunk Driving Laws and Traffic Deaths 367
10.7 Conclusion 371
APPENDIX 10.1 The State Traffic Fatality Data Set 378
APPENDIX 10.2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors 379
CHAPTER 11 Regression with a Binary Dependent Variable 383
11.1 Binary Dependent Variables and the Linear Probability Model 384
    Binary Dependent Variables 385
    The Linear Probability Model 387
11.2 Probit and Logit Regression 389
    Probit Regression 389
    Logit Regression 394
    Comparing the Linear Probability, Probit, and Logit Models 396
11.3 Estimation and Inference in the Logit and Probit Models 396
    Nonlinear Least Squares Estimation 397
    Maximum Likelihood Estimation 398
    Measures of Fit 399
11.4 Application to the Boston HMDA Data 400
11.5 Summary 407
APPENDIX 11.1 The Boston HMDA Data Set 415
APPENDIX 11.2 Maximum Likelihood Estimation 415
APPENDIX 11.3 Other Limited Dependent Variable Models 418
CHAPTER 12 Instrumental Variables Regression 421
12.1 The IV Estimator with a Single Regressor and a Single Instrument 422
    The IV Model and Assumptions 422
    The Two Stage Least Squares Estimator 423
    Why Does IV Regression Work? 424
    The Sampling Distribution of the TSLS Estimator 428
    Application to the Demand for Cigarettes 430
12.2 The General IV Regression Model 432
    TSLS in the General IV Model 433
    Instrument Relevance and Exogeneity in the General IV Model 434
    The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator 434
    Inference Using the TSLS Estimator 437
    Application to the Demand for Cigarettes 437
12.3 Checking Instrument Validity 439
    Assumption #1: Instrument Relevance 439
    Assumption #2: Instrument Exogeneity 443
12.4 Application to the Demand for Cigarettes 445
12.5 Where Do Valid Instruments Come From? 450
    Three Examples 451
12.6 Conclusion 455
APPENDIX 12.1 The Cigarette Consumption Panel Data Set 462
APPENDIX 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4) 462
APPENDIX 12.3 Large-Sample Distribution of the TSLS Estimator 463
APPENDIX 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid 464
APPENDIX 12.5 Instrumental Variables Analysis with Weak Instruments 466
CHAPTER 13 Experiments and Quasi-Experiments 468
13.1 Idealized Experiments and Causal Effects 470
    Ideal Randomized Controlled Experiments 470
    The Differences Estimator 471
13.2 Potential Problems with Experiments in Practice 472
    Threats to Internal Validity 472
    Threats to External Validity 475
13.3 Regression Estimators of Causal Effects Using Experimental Data 477
    The Differences Estimator with Additional Regressors 477
    The Differences-in-Differences Estimator 480
    Estimation of Causal Effects for Different Groups 484
    Estimation When There Is Partial Compliance 484
    Testing for Randomization 485
13.4 Experimental Estimates of the Effect of Class Size Reductions 486
    Experimental Design 486
    Analysis of the STAR Data 487
    Comparison of the Observational and Experimental Estimates of Class Size Effects 492
13.5 Quasi-Experiments 494
    Examples 495
    Econometric Methods for Analyzing Quasi-Experiments 497
13.6 Potential Problems with Quasi-Experiments 500
    Threats to Internal Validity 500
    Threats to External Validity 502
13.7 Experimental and Quasi-Experimental Estimates in Heterogeneous Populations 502
    Population Heterogeneity: Whose Causal Effect? 502
    OLS with Heterogeneous Causal Effects 503
    IV Regression with Heterogeneous Causal Effects 50
13.8 Conclusion 507
APPENDIX 13.1 The Project STAR Data Set 516
APPENDIX 13.2 Extension of the Differences-in-Differences Estimator to Multiple Time Periods 517
APPENDIX 13.3 Conditional Mean Independence 518
APPENDIX 13.4 IV Estimation When the Causal Effect Varies Across Individuals 520
PART FOUR Regression Analysis of Economic Time Series Data 523
CHAPTER 14 Introduction to Time Series Regression and Forecasting 525
14.1 Using Regression Models for Forecasting 527
14.2 Introduction to Time Series Data and Serial Correlation 528
    The Rates of Inflation and Unemployment in the United States 528
    Lags, First Differences, Logarithms, and Growth Rates 528
    Autocorrelation 532
    Other Examples of Economic Time Series 533
14.3 Autoregressions 535
    The First Order Autoregressive Model 535
    The pth Order Autoregressive Model 538
14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model 541
    Forecasting Changes in the Inflation Rate Using Past Unemployment Rates 541
    Stationarity 544
    Time Series Regression with Multiple Predictors 545
    Forecast Uncertainty and Forecast Intervals 548
14.5 Lag Length Selection Using Information Criteria 549
    Determining the Order of an Autoregression 551
    Lag Length Selection in Time Series Regression with Multiple Predictors 553
14.6 Nonstationarity I: Trends 554
    What Is a Trend? 555
    Problems Caused by Stochastic Trends 557
    Detecting Stochastic Trends: Testing for a Unit AR Root 560
    Avoiding the Problems Caused by Stochastic Trends 564
14.7 Nonstationarity II: Breaks 565
    What Is a Break? 565
    Testing for Breaks 566
    Pseudo Out-of-Sample Forecasting 571
    Avoiding the Problems Caused by Breaks 576
14.8 Conclusion 577
APPENDIX 14.1 Time Series Data Used in Chapter 14 586
APPENDIX 14.2 Stationarity in the AR(1) Model 586
APPENDIX 14.3 Lag Operator Notation 588
APPENDIX 14.4 ARMA Models 589
APPENDIX 14.5 Consistency of the BIC Lag Length Estimator 589
CHAPTER 15 Estimation of Dynamic Causal Effects 591
15.1 An Initial Taste of the Orange Juice Data 593
15.2 Dynamic Causal Effects 595
    Causal Effects and Time Series Data 596
    Two Types of Exogeneity 598
15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors 600
    The Distributed Lag Model Assumptions 601
    Autocorrelated u_t, Standard Errors, and Inference 601
    Dynamic Multipliers and Cumulative Dynamic Multipliers 602
15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors 604
    Distribution of the OLS Estimator with Autocorrelated Errors 604
    HAC Standard Errors 606
15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors 608
    The Distributed Lag Model with AR(1) Errors 609
    OLS Estimation of the ADL Model 612
    GLS Estimation 613
    The Distributed Lag Model with Additional Lags and AR(p) Errors 615
15.6 Orange Juice Prices and Cold Weather 618
15.7 Is Exogeneity Plausible? Some Examples 624
    U.S. Income and Australian Exports 625
    Oil Prices and Inflation 626
    Monetary Policy and Inflation 626
    The Phillips Curve 627
15.8 Conclusion 627
APPENDIX 15.1 The Orange Juice Data Set 634
APPENDIX 15.2 The ADL Model and Generalized Least Squares in Lag Operator Notation 634
CHAPTER 16 Additional Topics in Time Series Regression 637
16.1 Vector Autoregressions 638
    The VAR Model 638
    A VAR Model of the Rates of Inflation and Unemployment 641
16.2 Multiperiod Forecasts 642
    Iterated Multiperiod Forecasts 643
    Direct Multiperiod Forecasts 645
    Which Method Should You Use? 647
16.3 Orders of Integration and the DF-GLS Unit Root Test 648
    Other Models of Trends and Orders of Integration 648
    The DF-GLS Test for a Unit Root 650
    Why Do Unit Root Tests Have Nonnormal Distributions? 653
16.4 Cointegration 655
    Cointegration and Error Correction 655
    How Can You Tell Whether Two Variables Are Cointegrated? 658
    Estimation of Cointegrating Coefficients 660
    Extension to Multiple Cointegrated Variables 661
    Application to Interest Rates 662
16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity 664
    Volatility Clustering 665
    Autoregressive Conditional Heteroskedasticity 666
    Application to Stock Price Volatility 667
16.6 Conclusion 669
APPENDIX 16.1 U.S. Financial Data Used in Chapter 16 674
PART FIVE The Econometric Theory of Regression Analysis 675
CHAPTER 17 The Theory of Linear Regression with One Regressor 677
17.1 The Extended Least Squares Assumptions and the OLS Estimator 678
    The Extended Least Squares Assumptions 678
    The OLS Estimator 680
17.2 Fundamentals of Asymptotic Distribution Theory 680
    Convergence in Probability and the Law of Large Numbers 681
    The Central Limit Theorem and Convergence in Distribution 683
    Slutsky's Theorem and the Continuous Mapping Theorem 685
    Application to the t-Statistic Based on the Sample Mean 685
17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic 686
    Consistency and Asymptotic Normality of the OLS Estimators 686
    Consistency of Heteroskedasticity-Robust Standard Errors 686
    Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic 688
17.4 Exact Sampling Distributions When the Errors Are Normally Distributed 688
    Distribution of β̂₁ with Normal Errors 688
    Distribution of the Homoskedasticity-Only t-Statistic 690
17.5 Weighted Least Squares 691
    WLS with Known Heteroskedasticity 691
    WLS with Heteroskedasticity of Known Functional Form 692
    Heteroskedasticity-Robust Standard Errors or WLS? 695
APPENDIX 17.1 The Normal and Related Distributions and Moments of Continuous Random Variables 700
APPENDIX 17.2 Two Inequalities 702
CHAPTER 18 The Theory of Multiple Regression 704
18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form 706
    The Multiple Regression Model in Matrix Notation 706
    The Extended Least Squares Assumptions 707
    The OLS Estimator 708
18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic 710
    The Multivariate Central Limit Theorem 710
    Asymptotic Normality of β̂ 710
    Heteroskedasticity-Robust Standard Errors 711
    Confidence Intervals for Predicted Effects 712
    Asymptotic Distribution of the t-Statistic 713
18.3 Tests of Joint Hypotheses 713
    Joint Hypotheses in Matrix Notation 713
    Asymptotic Distribution of the F-Statistic 714
    Confidence Sets for Multiple Coefficients 714
18.4 Distribution of Regression Statistics with Normal Errors 715
    Matrix Representations of OLS Regression Statistics 715
    Distribution of β̂ with Normal Errors 716
    Distribution of s_û² 717
    Homoskedasticity-Only Standard Errors 717
    Distribution of the t-Statistic 718
    Distribution of the F-Statistic 718
18.5 Efficiency of the OLS Estimator with Homoskedastic Errors 719
    The Gauss-Markov Conditions for Multiple Regression 719
    Linear Conditionally Unbiased Estimators 719
    The Gauss-Markov Theorem for Multiple Regression 720
18.6 Generalized Least Squares 721
    The GLS Assumptions 722
    GLS When Ω Is Known 724
    GLS When Ω Contains Unknown Parameters 725
    The Zero Conditional Mean Assumption and GLS 725
18.7 Instrumental Variables and Generalized Method of Moments Estimation 727
    The IV Estimator in Matrix Form 728
    Asymptotic Distribution of the TSLS Estimator 729
    Properties of TSLS When the Errors Are Homoskedastic 730
    Generalized Method of Moments Estimation in Linear Models 733
APPENDIX 18.1 Summary of Matrix Algebra 743
APPENDIX 18.2 Multivariate Distributions 747
APPENDIX 18.3 Derivation of the Asymptotic Distribution of β̂ 748
APPENDIX 18.4 Derivations of Exact Distributions of OLS Test Statistics with Normal Errors 749
APPENDIX 18.5 Proof of the Gauss-Markov Theorem for Multiple Regression 751
APPENDIX 18.6 Proof of Selected Results for IV and GMM Estimation 752
Appendix 755
References 763
Answers to "Review the Concepts" Questions 767
Glossary 775
Index 783
Key Concepts
PART ONE Introduction and Review
1.1 Cross-Sectional, Time Series, and Panel Data 15
2.1 Expected Value and the Mean 24
2.2 Variance and Standard Deviation 25
2.3 Means, Variances, and Covariances of Sums of Random Variables 38
2.4 Computing Probabilities Involving Normal Random Variables 40
2.5 Simple Random Sampling and i.i.d. Random Variables 47
2.6 Convergence in Probability, Consistency, and the Law of Large Numbers 50
2.7 The Central Limit Theorem 55
3.1 Estimators and Estimates 67
3.2 Bias, Consistency, and Efficiency 68
3.3 Efficiency of Ȳ: Ȳ is BLUE 70
3.4 The Standard Error of Ȳ 76
3.5 The Terminology of Hypothesis Testing 79
3.6 Testing the Hypothesis E(Y) = μ_Y,0 Against the Alternative E(Y) ≠ μ_Y,0 80
3.7 Confidence Intervals for the Population Mean 82
PART TWO Fundamentals of Regression Analysis 109
4.1 Terminology for the Linear Regression Model with a Single Regressor 115
4.2 The OLS Estimator, Predicted Values, and Residuals 119
4.3 The Least Squares Assumptions 131
4.4 Large-Sample Distributions of β̂₀ and β̂₁ 133
5.1 General Form of the t-Statistic 150
5.2 Testing the Hypothesis β₁ = β₁,₀ Against the Alternative β₁ ≠ β₁,₀ 152
5.3 Confidence Interval for β₁ 157
5.4 Heteroskedasticity and Homoskedasticity 162
5.5 The Gauss-Markov Theorem for β̂₁ 168
6.1 Omitted Variable Bias in Regression with a Single Regressor 189
6.2 The Multiple Regression Model 196
6.3 The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model 198
6.4 The Least Squares Assumptions in the Multiple Regression Model 204
6.5 Large Sample Distribution of β̂₀, β̂₁, ..., β̂_k 206
7.1 Testing the Hypothesis β_j = β_j,0 Against the Alternative β_j ≠ β_j,0 222
7.2 Confidence Intervals for a Single Coefficient in Multiple Regression 223
7.3 Omitted Variable Bias in Multiple Regression 237
7.4 R² and R̄²: What They Tell You and What They Don't 238
8.1 The Expected Effect on Y of a Change in X₁ in the Nonlinear Regression Model (8.3) 261
8.2 Logarithms in Regression: Three Cases 273
8.3 A Method for Interpreting Coefficients in Regressions with Binary Variables 279
8.4 Interactions Between Binary and Continuous Variables 282
8.5 Interactions in Multiple Regression 287
9.1 Internal and External Validity 313
9.2 Omitted Variable Bias: Should I Include More Variables in My Regression? 318
9.3 Functional Form Misspecification 319
9.4 Errors-in-Variables Bias 321
9.5 Sample Selection Bias 323
9.6 Simultaneous Causality Bias 325
9.7 Threats to the Internal Validity of a Multiple Regression Study 327
PART THREE Further Topics in Regression Analysis 347
10.1 Notation for Panel Data 350
10.2 The Fixed Effects Regression Model 359
10.3 The Fixed Effects Regression Assumptions 365
11.1 The Linear Probability Model 388
11.2 The Probit Model, Predicted Probabilities, and Estimated Effects 392
11.3 Logit Regression 394
12.1 The General Instrumental Variables Regression Model and Terminology 433
12.2 Two Stage Least Squares 435
12.3 The Two Conditions for Valid Instruments 436
12.4 The IV Regression Assumptions 437
12.5 A Rule of Thumb for Checking for Weak Instruments 441
12.6 The Overidentifying Restrictions Test (the J-Statistic) 444
PART FOUR Regression Analysis of Economic Time Series Data 523
14.1 Lags, First Differences, Logarithms, and Growth Rates 530
14.2 Autocorrelation (Serial Correlation) and Autocovariance 532
14.3 Autoregressions 539
14.4 Stationarity 544
14.5 The Autoregressive Distributed Lag Model 545
14.6 Time Series Regression with Multiple Predictors 546
14.7 Granger Causality Tests (Tests of Predictive Content) 547
14.8 The Augmented Dickey-Fuller Test for a Unit Autoregressive Root 562
14.9 The QLR Test for Coefficient Stability 569
14.10 Pseudo Out-of-Sample Forecasts 572
15.1 The Distributed Lag Model and Exogeneity 600
15.2 The Distributed Lag Model Assumptions 602
15.3 HAC Standard Errors 609
15.4 Estimation of Dynamic Multipliers Under Strict Exogeneity 617
16.1 Vector Autoregressions 639
16.2 Iterated Multiperiod Forecasts 645
16.3 Direct Multiperiod Forecasts 647
16.4 Orders of Integration, Differencing, and Stationarity 650
16.5 Cointegration 658
PART FIVE The Econometric Theory of Regression Analysis 675
17.1 The Extended Least Squares Assumptions for Regression with a Single Regressor 680
18.1 The Extended Least Squares Assumptions in the Multiple Regression Model 707
18.2 The Multivariate Central Limit Theorem 710
18.3 Gauss-Markov Theorem for Multiple Regression 721
18.4 The GLS Assumptions 723
General Interest Boxes
The Distribution of Earnings in the United States in 2004 35
A Bad Day on Wall Street 42
Landon Wins! 71
The Gender Gap of Earnings of College Graduates in the United States 86
A Novel Way to Boost Retirement Savings 90
The "Beta" of a Stock 122
The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity? 165
The Mozart Effect: Omitted Variable Bias? 190
The Returns to Education and the Gender Gap 284
The Demand for Economics Journals 288
Do Stock Mutual Funds Outperform the Market? 323
James Heckman and Daniel McFadden, Nobel Laureates 407
Who Invented Instrumental Variable Regression? 425
A Scary Regression 426
The Externalities of Smoking 446
The Hawthorne Effect 474
What Is the Effect on Employment of the Minimum Wage? 498
Can You Beat the Market? Part I 540
The River of Blood 550
Can You Beat the Market? Part II 573
NEWS FLASH: Commodity Traders Send Shivers Through Disney World 625
Robert Engle and Clive Granger, Nobel Laureates 657
Preface
Econometrics can be a fun course for both teacher and student. The real world of economics, business, and government is a complicated and messy place, full of competing ideas and questions that demand answers. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can you make money in the stock market by buying when prices are historically low, relative to earnings, or should you just sit tight as the random walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes, or should we simply have our children listen to Mozart for ten minutes a day? Econometrics helps us to sort out sound ideas from crazy ones and to find quantitative answers to important quantitative questions. Econometrics opens a window on our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions.
This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend much of their time learning assumptions that they subsequently realize are unrealistic, so that they must then learn "solutions" to "problems" that arise when the applications do not match the assumptions. We believe that it is far better to motivate the need for tools with a concrete application, and then to provide a few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come alive.
The second edition benefits from the many constructive suggestions of teachers who used the first edition, while maintaining the philosophy that applications should drive the theory, not the other way around. The single greatest change in the second edition is a reorganization and expansion of the material on core regression analysis: Part II, which covers regression with cross-sectional data, has been expanded from four chapters to six. We have added new empirical examples (as boxes) drawn from economics and finance; some new optional sections on classical regression theory; and many new exercises, both paper-and-pencil and computer-based empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be found on page xxxii.
Features of This Book
This textbook differs from others in three main ways. First, we integrate real-world questions and data into the development of the theory, and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of topics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.
Real-World Questions and Data
We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression, and functional form analysis in the context of estimating the effect of school inputs on school outputs. (Do smaller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.
We treat all our empirical applications seriously and in a way that shows students how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications and thereby to assess whether their substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook's companion Web site (www.aw-bc.com/stock_watson).
Contemporary Choice of Topics
Econometrics has come a long way in the past two decades. The topics we cover reflect the best of contemporary applied econometrics. One can only do so much in an introductory course, so we focus on procedures and tests that are commonly used in practice. For example:
• Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument, exogeneity and relevance, are given equal billing. We follow that presentation with an extended discussion of where instruments come from, and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.
• Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.
• Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.
• Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.
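As a rough illustration of why instrumental variables regression is needed when a regressor is correlated with the error term, the following Python sketch (not taken from the textbook) simulates such data and compares the OLS slope with the simple IV estimator, the ratio of the sample covariance of Z and Y to the sample covariance of Z and X. The data-generating process, the variable names, the sample size, and the true slope of 2.0 are all assumptions made for the example.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=5000, beta=2.0, seed=0):
    """Hypothetical data: x is correlated with the error u; z is a valid instrument."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]       # instrument, independent of u (exogenous)
    u = [rng.gauss(0, 1) for _ in range(n)]       # regression error
    x = [zi + ui for zi, ui in zip(z, u)]         # x depends on z (relevance) and on u
    y = [beta * xi + ui for xi, ui in zip(x, u)]  # true slope is beta
    return z, x, y


z, x, y = simulate()

# OLS ignores the correlation between x and u; in this design its
# population limit is 2.0 + cov(x, u) / var(x) = 2.5, not 2.0.
ols_slope = cov(x, y) / cov(x, x)

# The IV estimator uses only the variation in x that is driven by z.
iv_slope = cov(z, y) / cov(z, x)

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

In this design the instrument is strong by construction. With a weak instrument, cov(z, x) is close to zero and the ratio becomes unstable, which is why diagnostics for weak instruments matter.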
Theory That Matches Applications
Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and limitations of those tools. We provide a modern treatment in which the fit between theory and applications is as tight as possible, while keeping the mathematics at a level that requires only algebra. Modern empirical applications share some common characteristics: the data sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are collected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic). These observations lead to important differences between the theoretical development in this textbook and other textbooks.
• Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. Our experience is that it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.
• Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data; it extends readily to panel and time series data; and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.
• Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a "problem" to be "solved"; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.
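The contrast between homoskedasticity-only and heteroskedasticity-robust standard errors can be sketched for the single-regressor case. The Python code below is an illustrative sketch, not the textbook's software: the function name, the simulated data, and the sample size are assumptions, and the robust formula is the usual large-sample variance estimator for the OLS slope with an n - 2 degrees-of-freedom adjustment. It also forms a large-sample 95% confidence interval using the normal critical value 1.96.

```python
import math
import random


def ols_slope_with_ses(x, y):
    """Return (slope, homoskedasticity-only SE, heteroskedasticity-robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]  # deviations of X from its sample mean
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: a single error variance for all observations.
    s2 = sum(r * r for r in resid) / (n - 2)
    se_homosk = math.sqrt(s2 / sxx)

    # Heteroskedasticity-robust: each squared residual is weighted by its own
    # squared deviation of X, so the error variance is allowed to vary with X.
    weighted = sum((d * r) ** 2 for d, r in zip(dx, resid)) / (n - 2)
    se_robust = math.sqrt(weighted / (sxx / n) ** 2 / n)
    return slope, se_homosk, se_robust


# Hypothetical heteroskedastic data: the error standard deviation grows with x.
rng = random.Random(0)
n = 20000
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 0.2 * xi ** 2) for xi in x]

slope, se_homosk, se_robust = ols_slope_with_ses(x, y)

# Large-sample 95% confidence interval based on the normal approximation.
ci_95 = (slope - 1.96 * se_robust, slope + 1.96 * se_robust)

print(f"slope: {slope:.4f}")
print(f"homoskedasticity-only SE: {se_homosk:.4f}")
print(f"heteroskedasticity-robust SE: {se_robust:.4f}")
print(f"95% CI (robust): ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

In this simulated design the errors are most variable where X is far above its mean, so the homoskedasticity-only formula understates the sampling uncertainty of the slope. When the errors happen to be homoskedastic, the two formulas give similar answers in large samples.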
Skilled Producers, Sophisticated Consumers
We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis, but also how to assess the validity of empirical analyses presented to them. Our approach to teaching how to assess an empirical study is threefold.
First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity, and ways to recognize these threats in practice.
Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity of the analyses presented in the book.
Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason, the textbook Web site features data sets, software, and suggestions for empirical exercises of differing scopes. These Web resources have been expanded considerably for the second edition.
Approach to Mathematics and Level of Rigor
Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a "high" or a "low" level of mathematics. Parts I-IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I-IV have fewer equations, and more applications, than many introductory econometrics books, and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more sophisticated treatment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students.
This said, different students learn differently, and for the mathematically well-prepared students, learning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that is appropriate for students with a stronger mathematical background. We believe that, when the mathematical chapters in Part V are used in conjunction with the material in Parts I-IV, this book is suitable for advanced undergraduate or master's level econometrics courses.
Changes to the Second Edition
The changes introduced in the second edition fall into three categories: more empirical examples; expanded theoretical material, especially in the treatment of the core regression topics; and additional student exercises.
More empirical examples. The second edition retains the empirical examples from the first edition and adds a significant number of new ones. These additional examples include estimation of the returns to education; inference about the gender gap in earnings; the difficulty of forecasting the stock market; and modeling the volatility clustering in stock returns. The data sets for these empirical examples are posted on the course Web site. The second edition also includes more general-interest boxes, for example how sample selection bias ("survivorship bias") can produce misleading conclusions about whether actively managed mutual funds actually beat the market.
Expanded theoretical material. The philosophy of this and the previous edition is that the modeling assumptions should be motivated by empirical applications. For this reason, our three basic least squares assumptions that underpin regression with a single regressor include neither normality nor homoskedasticity, both of which are arguably the exception in econometric applications. This leads directly to large-sample inference using heteroskedasticity-robust standard errors. Our experience is that students do not find this difficult; in fact, what they find difficult is the traditional approach of introducing the homoskedasticity and normality assumptions, learning how to use t- and F-tables, then being told that what they just learned is not reliable in applications because of the failure of these assumptions and that these "problems" must be "fixed." But not all instructors share this view, and some find it useful to introduce the homoskedastic normal regression model. Moreover, even if homoskedasticity is the exception instead of the rule, assuming homoskedasticity permits discussing the Gauss-Markov theorem, a key motivation for using ordinary least squares (OLS).
For these reasons, the treatment of the core regression material has been significantly expanded in the second edition, and now includes sections on the theoretical motivation for OLS (the Gauss-Markov theorem), small-sample inference in the homoskedastic normal model, and multicollinearity and the dummy variable trap. To accommodate these new sections, the new empirical examples, the new general-interest boxes, and the many new exercises, the core regression chapters have been expanded from two to four: the linear regression model with a single regressor and OLS (Chapter 4); inference in regression with a single regressor (Chapter 5); the multiple regression model and OLS (Chapter 6); and inference in the multiple regression model (Chapter 7). This expanded and reorganized treatment of the core regression material constitutes the single greatest change in the second edition.
The second edition also includes some additional topics requested by some instructors. One such addition is specification and estimation of models that are nonlinear in the parameters (Appendix 8.1). Another is how to compute standard errors in panel data regression when the error term is serially correlated for a given entity (clustered standard errors; Section 10.5 and Appendix 10.2). A third addition is an introduction to current best practices for detecting and handling weak instruments (Appendix 12.5), and a fourth addition is a treatment, in a new final section of the last chapter (Section 18.7), of efficient estimation in the heteroskedastic linear IV regression model using generalized method of moments.
Additional student exercises. The second edition contains many new exercises, both "paper and pencil" and empirical exercises that involve the use of databases, supplied on the course Web site, and regression software. The data section of the course Web site has been significantly enhanced by the addition of numerous databases.
Contents and Organization
There are five parts to the textbook. This textbook assumes that the student has had a course in probability and statistics, although we review that material in Part I. We cover the core material of regression analysis in Part II. Parts III, IV, and V present additional topics that build on the core treatment in Part II.
Part I
Chapter 1 introduces econometrics and stresses the importance of providing quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in econometrics. Material from probability and statistics is reviewed in Chapters 2 and 3, respectively; whether these chapters are taught in a given course, or simply provided as a reference, depends on the background of the students.
Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.
Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as "program evaluation."
Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.
Part V
Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text.
TABLE I Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V
Prerequisite parts or chapters (columns): Part I: Chapters 1-3; Part II: Chapters 4-7, 9, and Chapter 8; Part III: Sections 10.1, 10.2, and Sections 12.1, 12.2; Part IV: Sections 14.1-14.4, Sections 14.5-14.8, and Chapter 15; Part V: Chapter 17.
Chapters (rows): 10, 11, 12.1-12.2, 12.3-12.6, 13, 14, 15, 16, 17, 18.
[Table body: X marks, some carrying footnote markers a or b, indicate the prerequisites for each chapter; the placement of the individual marks is not recoverable from the source.]
This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1-14.4.
a Chapters 10-16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.
b Chapters 14-16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.
Prerequisites Within the Book
Because different instructors like to emphasize different material, we wrote this book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV, and V are "stand-alone" in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table 1. Although we have found that the sequence of topics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instructors to present topics in a different order if they so desire.
Sample Courses
This book accommodates several different course structures.
Standard Introductory Econometrics
This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional form analysis, and the evaluation of regression studies (all of Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable (Chapter 11), and/or instrumental variables regression (Chapter 12), as time permits. The course concludes with experiments and quasi-experiments in Chapter 13, topics that provide an opportunity to return to the questions of estimating causal effects raised at the beginning of the semester and to recapitulate core regression methods. Prerequisites: Algebra II and introductory statistics.
Introductory Econometrics with Time Series and Forecasting Applications
Like the standard introductory course, this course covers all of Part I (as needed) and all of Part II. Optionally, the course next provides a brief introduction to panel data (Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If time permits, the course can include some advanced topics in time series analysis such as volatility clustering and conditional heteroskedasticity (Section 16.5). Prerequisites: Algebra II and introductory statistics.
Applied Time Series Analysis and Forecasting
This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent reviewing the tools of basic regression analysis in Part II, depending on student preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced topics in time series analysis (Chapter 16), including vector autoregressions and conditional heteroskedasticity. An important component of this course is hands-on forecasting exercises, available to instructors on the book's accompanying Web site. Prerequisites: Algebra II and basic introductory econometrics or the equivalent.
Introduction to Econometric Theory
This book is also suitable for an advanced undergraduate course in which the students have a strong mathematical preparation, or for a master's level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through Section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and Generalized Method of Moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), and/or the estimation of causal effects using time series data and generalized least squares (Chapter 15 and Section 18.6). Prerequisites: calculus and introductory statistics. Chapter 18 assumes previous exposure to matrix algebra.
Pedagogical Features
The textbook has a variety of pedagogical features aimed at helping students to understand, to retain, and to apply the essential ideas. Chapter introductions provide a real-world grounding and motivation, as well as a brief road map highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A numbered Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage. The questions in the Review the Concepts section check students' understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow the students to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the References section lists sources for further reading, the Appendix provides statistical tables, and a Glossary conveniently defines all the key terms in the book.
Supplements to Accompany the Textbook
The online supplements accompanying the Second Edition of Introduction to Econometrics include the Solutions Manual, Test Bank (by Manfred W. Keil of Claremont McKenna College), and PowerPoint Lecture Notes with text figures, tables, and Key Concepts. The Solutions Manual includes solutions to all the end-of-chapter exercises, while the Test Bank, offered in Test Generator Software (TestGen with QuizMaster), provides a rich supply of easily edited test problems and questions of various types to meet specific course needs. These resources are available for download from the Instructor's Resource Center at www.aw-bc.com/irc. If instructors prefer their supplements on a CD-ROM, our Instructor's Resource Disk, available for Windows and Macintosh, contains the PowerPoint Lecture Notes, the Test Bank, and the Solutions Manual.
In addition, a Companion Web site, found at www.aw-bc.com/stock_watson, provides a wide range of additional resources for students and faculty. These include data sets for all the text examples, replication files for empirical results reported in the text, data sets for the end-of-chapter Empirical Exercises, EViews and STATA tutorials for students, and an Excel add-in for OLS regressions.
Acknowledgments
A great many people contributed to the first edition of this book. Our biggest debts of gratitude are to our colleagues at Harvard and Princeton who used early drafts of this book in their classrooms. At Harvard's Kennedy School of Government, Suzanne Cooper provided invaluable suggestions and detailed comments on multiple drafts. As a coteacher with one of the authors (Stock), she also helped to vet much of the material in this book while it was being developed for a required course for master's students at the Kennedy School. We are also indebted to two other Kennedy School colleagues, Alberto Abadie and Sue Dynarski, for their patient explanations of quasi-experiments and the field of program evaluation and for their detailed comments on early drafts of the text. At Princeton, Eli Tamer taught from an early draft and also provided helpful comments on the penultimate draft of the book.
We also owe much to many of our friends and colleagues in econometrics who spent time talking with us about the substance of this book and who collectively made so many helpful suggestions. Bruce Hansen (University of Wisconsin, Madison) and Bo Honore (Princeton) provided helpful feedback on very early outlines and preliminary versions of the core material in Part II. Joshua Angrist (MIT) and Guido Imbens (University of California, Berkeley) provided thoughtful suggestions about our treatment of materials on program evaluation. Our presentation of the material on time series has benefited from discussions with Yacine Ait-Sahalia (Princeton), Graham Elliott (University of California, San Diego), Andrew Harvey (Cambridge University), and Christopher Sims (Princeton). Finally, many people made helpful suggestions on parts of the manuscript close to their area of expertise: Don Andrews (Yale), John Bound (University of Michigan), Gregory Chow (Princeton), Thomas Downes (Tufts), David Drukker (Stata, Corp.), Jean Baldwin Grossman (Princeton), Eric Hanushek (the Hoover Institution), James Heckman (University of Chicago), Han Hong (Princeton), Caroline Hoxby (Harvard), Alan Krueger (Princeton), Steven Levitt (University of Chicago), Richard Light (Harvard), David Neumark (Michigan State University), Joseph Newhouse (Harvard), Pierre Perron (Boston University), Kenneth Warner (University of Michigan), and Richard Zeckhauser (Harvard).
Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the Standards and Assessments Division, California Department of Education. We are grateful to Charlie DePascale, Student Assessment Services, Massachusetts Department of Education, for his help with aspects of the Massachusetts test score data set. Christopher Ruhm (University of North Carolina, Greensboro) graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves thanks for putting together their data on racial discrimination in mortgage lending; we particularly thank Geoffrey Tootell for providing us with the updated version of the data set we use in Chapter 9, and Lynn Browne for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales, which we analyze in Chapter 10, and Alan Krueger (Princeton) for his help with the Tennessee STAR data that we analyze in Chapter 11.
We are also grateful for the many constructive, detailed, and thoughtful comments we received from those who reviewed various drafts for Addison-Wesley:
Michael Abbott, Queen's University, Canada
Richard J. Agnello, University of Delaware
Clopper Almon, University of Maryland
Joshua Angrist, Massachusetts Institute of Technology
Swarnjit S. Arora, University of Wisconsin, Milwaukee
Christopher F. Baum, Boston College
McKinley L. Blackburn, University of South Carolina
Alok Bohara, University of New Mexico
Chi-Young Choi, University of New Hampshire
Dennis Coates, University of Maryland, Baltimore County
Tim Conley, Graduate School of Business, University of Chicago
Douglas Dalenberg, University of Montana
Antony Davies, Duquesne University
Joanne M. Doyle, James Madison University
David Eaton, Murray State University
Adrian R. Fleissig, California State University, Fullerton
Rae Jean B. Goodman, United States Naval Academy
Bruce E. Hansen, University of Wisconsin, Madison
Peter Reinhard Hansen, Brown University
Ian T. Henry, University of Melbourne, Australia
Marc Henry, Columbia University
William Horrace, University of Arizona
Oscar Jorda, University of California, Davis
Frederick L. Joutz, The George Washington University
Elia Kacapyr, Ithaca College
Manfred W. Keil, Claremont McKenna College
Eugene Kroch, Villanova University
Gary Krueger, Macalester College
Kajal Lahiri, State University of New York, Albany
Daniel Lee, Shippensburg University
Tung Liu, Ball State University
Ken Matwiczak, LBJ School of Public Affairs, University of Texas, Austin
KimMarie McGoldrick, University of Richmond
Robert McNown, University of Colorado, Boulder
H. Naci Mocan, University of Colorado, Denver
Mototsugu Shintani, Vanderbilt University
Mico Mrkaic, Duke University
Serena Ng, Johns Hopkins University
Jan Ondrich, Syracuse University
Pierre Perron, Boston University
Robert Phillips, The George Washington University
Simran Sahi, University of Minnesota
Sunil Sapra, California State University, Los Angeles
Frank Schorfheide, University of Pennsylvania
Leslie S. Stratton, Virginia Commonwealth University
Jane Sung, Truman State University
Christopher Taber, Northwestern University
Petra Todd, University of Pennsylvania
John Veitch, University of San Francisco
Edward J. Vytlacil, Stanford University
M. Daniel Westbrook, Georgetown University
Tiemen Woutersen, University of Western Ontario
Phanindra V. Wunnava, Middlebury College
Zhenhui Xu, Georgia College and State University
Yong Yin, State University of New York, Buffalo
Jiangfeng Zhang, University of California, Berkeley
John Xu Zheng, University of Texas, Austin
We thank several people for carefully checking the page proof for errors. Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker, Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson worked through several chapters.
We have been especially pleased by the number of ins tructors who contacted us directly with thoughtful suggestions for this edition. Southern Utah University Brad Curs. Jalon Gardella . the changes made in the second edition incorporate or reflect suggestions. Depken ll. D anie l Hamer mesh. Ed McKenna. Laura Chioda. Tom D oan. Bo Honore. and Samuel Thompson. Michael Jansson . Cecilia Rouse. com ments. and Motohiro Yogo. William Greene. Ross Levine. Steve Stauss. Harvey Rosen. Finally. Bridget Page (associate media producer) . and attention to detail improved the book in many ways. H eat her McNally (supplements coordinator). and extending through the entire publishing te am. Chris Murray. Charles Spaulding (senior designer). Chris Foote. who handled the entire production process. California Sta te Universi ty. and their effo rts are evident on every page of this book . Jane and Sylvia patientl y taught us a lot about writing. Kim Craft. Peter R. Hong Li. Avinash D ixit. We also benefited from thoughtful reviews for the second ed ition prepared for Addison Wesley by: Necati Aydin. Jeffrey Kling. sta rting with o ur excelle nt editor. Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc. and presentation. Brigham Young U niversity IMing Chiu. and help provided by Michael Ash. University of O regon Jamie E me rson . and Denise Clinton (editorinchief) . Hansen. Texas Tech University Barry Palko Iowa State University Gary Ferrier. Minot State U niversity R. Graham Elliott. John Donohue. Robert Porter. Linfield College James Cardon .PREFACE xli In the first edition we benefited from the help of an exceptional development editor. University of Arkansas . Clarkson UniverSity Scott England. large and small. whose creativity. Susan Dynarski. Sylvia Mallo ry. A ddisonWesley pro vided us with first rate support. William E vans. Roberto E . John List. Manfred Keil. Alan Krueger. Florida A&M U niversity Jim Bathgate. Craig A. corrections. 
wh o worked with us on the second edition: Adrie nne D 'A mbrosio (senior acquisitions editor). George Tauchen. We exte nd o ur thanks to the superb A ddisonWesley team. Ken Simons. Elena Pesavento. F resno Bradley Ewing. We also received a great deal of belp preparing the second edi tion. Liran Einav. This edition (including the new exercises) uses data gene rously supplied by Marianne Bertrand. In particular. Jane Tufts. Jim Bathgate. Wei bin Huang. and Della Lee Sue helped with the exercises and solutions. we had the benefit of Kay Ueno's skilled editing in tbe second edition. Jeffrey Lieb man. G iovanni Oppenheim. JeanFrancois Lamarche. organization. hard wo rk. Do uglas Staiger.
University of Wisconsin. James Madison University Above all.Madison Christina H ilmer. Califo rnia State University. Heinrich. Northern Illinois nive rsity Louis Putterman. . J L .ry Lee. University of Central Florida Edward Greenberg. University of Missouri . University of G eorgia William Wood. Emory University Susan PorterHudak. Wassell. Virginia Polytechnic Institute Luoj ia Hu. Madison Norman Swanson. They more than anyone bore the burden of th is commitment. Wright State U niversity Brian Karl Finch. Riverside Elena Pesavento. we are indebted to our families for their endurance throughout this project. University of California. Washington University Carolyn 1. Columbia John Spitzer. the p roject must have seemed endless. and for their help and support we are deeply grateful. Sacramen to Ron Warren. SUNY at Brockport Kyle Steigert. Wayne State U niversity Ta eHv. Writing this book took a long timefor them.xlii PREFACE Rudy Fichten baum. San Diego State Un iversity Shelby Gerking. Iowa State University Charles S. Central Washington University Rob Wassmer. Northwestern University Tom oni Kumagai. Brown University Sharon Ryan. University of Wisconsi n. Rutgers University Justin Tobias.
CHAPTER 1 Economic Questions and Data

Ask a half dozen econometricians what econometrics is and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables, such as a firm's sales, the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data. A fourth might tell you that it is the science and art of using historical data to make numerical, or quantitative, policy recommendations in government and business.

In fact, all these answers are right. At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology.

This book introduces you to the core set of methods used by econometricians. We will use these methods to answer a variety of specific, quantitative questions taken from the world of business and government policy. This chapter poses four of those questions and discusses, in general terms, the econometric approach to answering them. The chapter concludes with a survey of the main types of data available to econometricians for answering these and other quantitative economic questions.
1.1 Economic Questions We Examine

Many decisions in economics, business, and government hinge on understanding relationships among variables in the world around us. These decisions require quantitative answers to quantitative questions.

This book examines several quantitative questions taken from current issues in economics. Four of these questions concern education policy, racial bias in mortgage lending, cigarette consumption, and macroeconomic forecasting.

Question #1: Does Reducing Class Size Improve Elementary School Education?

Proposals for reform of the U.S. public education system generate heated debate. Many of the proposals concern the youngest students, those in elementary schools. Elementary school education has various objectives, such as developing social skills, but for many parents and educators the most important objective is basic academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools. With fewer students in the classroom, the argument goes, each student gets more of the teacher's attention, there are fewer class disruptions, learning is enhanced, and grades improve.

But what, precisely, is the effect on elementary school education of reducing class size? Reducing class size costs money: It requires hiring more teachers and, if the school is already at capacity, building more classrooms. A decision maker contemplating hiring more teachers must weigh these costs against the benefits. To weigh costs and benefits, however, the decision maker must have a precise quantitative understanding of the likely benefits. Is the beneficial effect on basic learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning?

Although common sense and everyday experience may suggest that more learning occurs when there are fewer students, common sense cannot provide a quantitative answer to the question of what exactly is the effect on basic learning of reducing class size. To provide such an answer, we must examine empirical evidence, that is, evidence based on data, relating class size to basic learning in elementary schools.

In this book, we examine the relationship between class size and basic learning using data gathered from 420 California school districts in 1998. In the California data, students in districts with small class sizes tend to perform better on standardized tests than students in districts with larger classes. While this fact is
consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.

Question #2: Is There Racial Discrimination in the Market for Home Loans?

Most people buy their homes with the help of a mortgage, a large loan secured by the value of the home. By law, U.S. lending institutions cannot take race into account when deciding to grant or deny a request for a mortgage: Applicants who are identical in all ways but their race should be equally likely to have their mortgage applications approved. In theory, then, there should be no racial bias in mortgage lending.

In contrast to this theoretical conclusion, researchers at the Federal Reserve Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do these data indicate that, in practice, there is racial bias in mortgage lending? If so, how large is it?

The fact that more black than white applicants are denied in the Boston Fed data does not by itself provide evidence of discrimination by mortgage lenders, because the black and white applicants differ in many ways other than their race. Before concluding that there is bias in the mortgage market, these data must be examined more closely to see if there is a difference in the probability of being denied for otherwise identical applicants and, if so, whether this difference is large or small. To do so, in Chapter 11 we introduce econometric methods that make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics, notably their ability to repay the loan.

Question #3: How Much Do Cigarette Taxes Reduce Smoking?

Cigarette smoking is a major public health concern worldwide. Many of the costs of smoking, such as the medical expenses of caring for those made sick by
smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.

Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand. If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes?

Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices.

The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12 we study methods for handling this "simultaneous causality" and use those methods to estimate the price elasticity of cigarette demand.

Question #4: What Will the Rate of Inflation Be Next Year?

It seems that people always want a sneak preview of the future. What will sales be next year at a firm considering investing in new equipment? Will the stock market go up next month and, if so, by how much? Will city tax receipts next year cover planned expenditures on city services? Will your microeconomics exam next week focus on externalities or monopolies? Will Saturday be a nice day to go to the beach?
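Returning briefly to Question #3, the arithmetic that links a target reduction in smoking to the required price increase can be sketched in a few lines. This is only an illustration: the function name is made up for this example, the elasticity of -0.5 is a hypothetical value (not an estimate from this book), and the calculation assumes the elasticity stays constant over the whole price change.

```python
def required_price_increase(target_reduction_pct, elasticity):
    """Percentage price increase needed to cut quantity demanded by a target.

    elasticity is the percentage change in quantity demanded per 1% price
    increase, so it must be negative for demand to fall when price rises.
    Assumes the elasticity is constant over the whole price change.
    """
    if elasticity >= 0:
        raise ValueError("elasticity must be negative for this calculation")
    return target_reduction_pct / abs(elasticity)


# Hypothetical elasticity, for illustration only.
hypothetical_elasticity = -0.5
increase = required_price_increase(20.0, hypothetical_elasticity)
print(increase)  # 40.0
```

With this made-up elasticity, a 20% reduction in consumption would call for a 40% price increase. The point of the sketch is that the answer depends entirely on the numerical value of the elasticity, which has to be learned from data.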
One aspect of the future in which macroeconomists and financial economists are particularly interested is the rate of overall price inflation during the next year. A financial professional might advise a client whether to make a loan or to take one out at a given rate of interest, depending on her best guess of the rate of inflation over the coming year. Economists at central banks like the Federal Reserve Board in Washington, D.C., and the European Central Bank in Frankfurt, Germany, are responsible for keeping the rate of price inflation under control, so their decisions about how to set interest rates rely on the outlook for inflation over the next year. If they think the rate of inflation will increase by a percentage point, then they might increase interest rates by more than that to slow down an economy that, in their view, risks overheating. If they guess wrong, they risk causing either an unnecessary recession or an undesirable jump in the rate of inflation.

Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster's job is to predict the future using the past, and econometricians do this by using economic theory and statistical techniques to quantify relationships in historical data.

The data we use to forecast inflation are the rates of inflation and unemployment in the United States. An important empirical relationship in macroeconomic data is the "Phillips curve," in which a currently low value of the unemployment rate is associated with an increase in the rate of inflation over the next year. One of the inflation forecasts we develop and evaluate in Chapter 14 is based on the Phillips curve.

Quantitative Questions, Quantitative Answers

Each of these four questions requires a numerical answer. Economic theory provides clues about that answer (cigarette consumption ought to go down when the price goes up), but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.

The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant. For example, what effect does a change in class size
have on test scores, holding constant student characteristics (such as family income) that a school district administrator cannot control? What effect does your race have on your chances of having a mortgage application granted, holding constant other factors such as your ability to repay the loan? What effect does a 1% increase in the price of cigarettes have on cigarette consumption, holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions using data and for quantifying the uncertainty associated with those answers.

1.2 Causal Effects and Idealized Experiments

Like many questions encountered in econometrics, the first three questions in Section 1.1 concern causal relationships among variables. In common usage, an action is said to cause an outcome if the outcome is the direct result, or consequence, of that action. Touching a hot stove causes you to get burned; drinking water causes you to be less thirsty; putting air in your tires causes them to inflate; putting fertilizer on your tomato plants causes them to produce more tomatoes. Causality means that a specific action (applying fertilizer) leads to a specific, measurable consequence (more tomatoes).

Estimation of Causal Effects

How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per square meter?

One way to measure this causal effect is to conduct an experiment. In that experiment, a horticultural researcher plants many plots of tomatoes. Each plot is tended identically, with one exception: Some plots get 100 grams of fertilizer per square meter, while the rest get none. Moreover, whether a plot is fertilized or not is determined randomly by a computer, ensuring that any other differences between the plots are unrelated to whether they receive fertilizer. At the end of the growing season, the horticulturalist weighs the harvest from each plot. The difference between the average yield per square meter of the treated and untreated plots is the effect on tomato production of the fertilizer treatment.

This is an example of a randomized controlled experiment. It is controlled in the sense that there are both a control group that receives no treatment (no
fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It is randomized in the sense that the treatment is assigned randomly. This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment. If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).

In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.

It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size one can imagine randomly assigning "treatments" of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.

The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In practice, however, it is not possible to perform ideal experiments. In fact, experiments are rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.

Forecasting and Causality

Although the first three questions in Section 1.1 concern causal effects, the fourth, forecasting inflation, does not. You do not need to know a causal relationship to make a good forecast. A good way to "forecast" if it is raining is to observe whether pedestrians are using umbrellas, but the act of using an umbrella does not cause it to rain.

Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting. As we see in Chapter 14, multiple regression analysis allows us to quantify historical relationships suggested by economic theory, to check whether those relationships have been stable over time, to make quantitative forecasts about the future, and to assess the accuracy of those forecasts.

1.3 Data: Sources and Types

In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world. This book examines both experimental and nonexperimental data sets.

Experimental versus Observational Data

Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect. For example, the state of Tennessee financed a large randomized controlled experiment examining class size in the 1980s. In that experiment, which we examine in Chapter 13, thousands of students were randomly assigned to classes of different sizes for several years and were given annual standardized tests.

The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are rare. Instead, most economic data are obtained by observing real-world behavior.

Data obtained by observing actual behavior outside an experimental setting are called observational data. Observational data are collected using surveys, such as a telephone survey of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions.

Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these challenges. In the real world, levels of "treatment" (the amount of fertilizer in the tomato example, the student-teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the "treatment" from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for
meeting the challenges encountered when real-world data are used to estimate causal effects.

Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book you will encounter all three types.

Cross-Sectional Data

Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross sectional. Those data are for 420 entities (school districts) for a single time period (1998). In general, the number of entities on which we have observations is denoted by n, so for example, in the California data set n = 420.

The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district. For example, the average test score for the first district ("district #1") is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1998 on a standardized test (the Stanford Achievement Test). The average student-teacher ratio in that district is 17.89, that is, the number of students in district #1, divided by the number of classroom teachers in district #1, is 17.89. Average expenditure per pupil in district #1 is $6,385. The percentage of students in that district still learning English (that is, the percentage of students for whom English is a second language and who are not yet proficient in English) is 0%.

The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.

With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.

Time Series Data

Time series data are data for a single entity (person, firm, country) collected at multiple time periods. Our data set on the rates of inflation and unemployment in the United States is an example of a time series data set. The data set contains
observations on two variables (the rates of inflation and unemployment) for a single entity (the United States) for 183 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the second quarter of 1959, which is denoted 1959:II, and end in the fourth quarter of 2004 (2004:IV). The number of observations (that is, time periods) in a time series data set is denoted by T. Because there are 183 quarters from 1959:II to 2004:IV, this data set contains T = 183 observations.

TABLE 1.1 Selected Observations on Test Scores and Other Variables for California School Districts in 1998

Observation   District Average   Student-Teacher   Expenditure     Percentage of
(District)    Test Score         Ratio             per Pupil ($)   Students Learning
Number        (Fifth Grade)                                        English
1             690.8              17.89             6385            0.0
2             661.2              21.52             5099            4.6
3             643.6              18.70             5502            30.0
4             647.7              17.36             7102            0.0
5             640.8              18.67             5236            13.9
...
418           645.0              21.89             4403            24.3
419           672.2              20.20             4776            3.0
420           655.8              19.04             5993            5.0

Note: The California test score data set is described in Appendix 4.1.

Some observations in this data set are listed in Table 1.2. The data in each row correspond to a different time period (year and quarter). In the second quarter of 1959, for example, the rate of price inflation was 0.7% per year at an annual rate. In other words, if inflation had continued for 12 months at its rate during the second quarter of 1959, the overall price level (as measured by the Consumer Price Index, CPI) would have increased by 0.7%. In the second quarter of 1959, the rate of unemployment was 5.1%; that is, 5.1% of the labor force reported that they did not have a job but were looking for work. In the third quarter of 1959, the rate of CPI inflation was 2.1%, and the rate of unemployment was 5.3%.
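The quarter count and the "annual rate" convention described above can be checked with a short sketch. Both function names are made up for this illustration, and the index levels are hypothetical values chosen only to reproduce a 0.7% annual rate. The annualization shown is simple scaling (the quarterly percentage change times four), which is an assumption of this sketch; the passage above does not spell out the exact formula.

```python
def quarters_between(start_year, start_quarter, end_year, end_quarter):
    """Count quarters from start to end, inclusive of both endpoints."""
    return (end_year - start_year) * 4 + (end_quarter - start_quarter) + 1


def annualized_inflation(price_start, price_end):
    """Quarterly percentage change in a price index, scaled to an annual rate.

    Assumes simple scaling (quarterly change times four), with no compounding.
    """
    quarterly_pct_change = 100.0 * (price_end - price_start) / price_start
    return 4.0 * quarterly_pct_change


# Number of quarterly observations from 1959:II through 2004:IV.
T = quarters_between(1959, 2, 2004, 4)

# Hypothetical index levels chosen so the quarterly change is 0.175%.
rate = annualized_inflation(100.0, 100.175)

print(T)               # 183
print(round(rate, 1))  # 0.7
```

A price level that rises by 0.175% in one quarter would, under this convention, be reported as inflation of 0.7% per year at an annual rate.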
TABLE 1.2 Selected Observations on the Rates of Consumer Price Index (CPI) Inflation and Unemployment in the United States: Quarterly Data, 1959-2004

Observation   Date             CPI Inflation Rate          Unemployment
Number        (Year:quarter)   (% per year at an           Rate (%)
                               annual rate)
1             1959:II          0.7                         5.1
2             1959:III         2.1                         5.3
3             1959:IV          2.4                         5.6
4             1960:I           0.4                         5.1
5             1960:II          2.4                         5.2
...
181           2004:II          4.3                         5.6
182           2004:III         1.6                         5.4
183           2004:IV          3.5                         5.4

Note: The inflation and unemployment data set is described in Appendix 14.1.

By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.

Panel Data

Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and observations in that data set are listed in Table 1.3. The number of entities in a panel data set is denoted by n, and the number of time periods is denoted by T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n × T = 48 × 11 = 528 observations.

Some data from the cigarette consumption data set are listed in Table 1.3. The first block of 48 observations lists the data for each state in 1985, organized alphabetically from Alabama to Wyoming. The next block of 48 observations lists the data for 1986, and so forth, through 1995. For example, in 1985, cigarette sales in Arkansas were 128.5 packs per capita (the total number of packs of cigarettes sold
in Arkansas in 1985 divided by the total population of Arkansas in 1985 equals 128.5). The average price of a pack of cigarettes in Arkansas in 1985, including tax, was $1.015, of which 37¢ went to federal, state, and local taxes.

TABLE 1.3 Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S. States, 1985-1995

Observation   State           Year   Cigarette Sales      Average Price per    Total Taxes (cigarette
Number                               (packs per capita)   Pack (including      excise tax + sales tax)
                                                          taxes)
1             Alabama         1985   116.5                $1.022               $0.333
2             Arkansas        1985   128.5                1.015                0.370
3             Arizona         1985   104.5                1.086                0.362
...
47            West Virginia   1985   112.8                1.089                0.382
48            Wyoming         1985   129.4                0.935                0.240
49            Alabama         1986   117.2                1.080                0.334
...
96            Wyoming         1986   127.8                1.007                0.240
97            Alabama         1987   115.8                1.135                0.335
...
528           Wyoming         1995   112.2                1.585                0.360

Note: The cigarette consumption data set is described in Appendix 12.1.

Panel data can be used to learn about economic relationships from the experiences of the many different entities in the data set and from the evolution over time of the variables for each entity.

The definitions of cross-sectional data, time series data, and panel data are summarized in Key Concept 1.1.
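The block layout of the cigarette panel (48 states per year, years stacked from 1985 to 1995) implies a simple rule for the observation number. The sketch below is a minimal illustration of that rule; the function name and the idea of a "state rank" (a state's position, 1 to 48, in the within-year ordering) are conventions introduced here, not part of the data set itself.

```python
N_STATES = 48                      # n: entities in the panel
FIRST_YEAR, LAST_YEAR = 1985, 1995


def observation_number(state_rank, year):
    """Observation number when the panel is stacked in yearly blocks.

    state_rank is the state's position (1..48) within each yearly block.
    """
    if not 1 <= state_rank <= N_STATES:
        raise ValueError("state_rank must be between 1 and 48")
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError("year must be between 1985 and 1995")
    return (year - FIRST_YEAR) * N_STATES + state_rank


n_periods = LAST_YEAR - FIRST_YEAR + 1        # T = 11 years
total_observations = N_STATES * n_periods     # n x T

print(total_observations)             # 528
print(observation_number(1, 1986))    # 49  (first state in the 1986 block)
print(observation_number(48, 1995))   # 528 (last state in the 1995 block)
```

Stacking by year is only one way to order a panel. Sorting by state first and then by year would hold the same 528 observations with different observation numbers.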
TIME SERIES. Conceptually. Summary 1. AND PANEL DATA • Crosssectional da ta consist of multiple entities observed at a single time period. the way to estim a te a causal effect is in an ideal randomized con trolled experi ment. Key Terms randomiled contro lled e xpe rime nt (S) conlrol grou p (8) treatmen t group (9) causa l effec t (9) experime ntal dat a (0 ) observational da ta (10) crosssectional data (l L) observation number (I I) lime se ries data (11) pa nel data (13) longitudina l dala (13) . 2. impract ica l.. whe re each e ntity is observed at two or more time period!. lime series da ta a re gathe red by obse rving a single e ntity a t multiple points in time: a nd panel data a re ga the red by observi ng multiple e mities. • Pa nel data (also kno wn as longitudinal data) consist of multiple e ntities. imperfect experime nts 4. Key Termi 15 CROSSSECTIONAL. J. Ma ny decisions in busim:ss and economics require quant itati ve estim ates of how a ch ange in one va riable atfec ts Another variable. but performing such experiments in economic appl ic(l ti oos is usually unet hical. Econo me trics provides tools for estimaling causal effects using ei the r observa tional ( no nexpe rime ntal) da ta o r data from realworld .each of which is obse rved a t multiple poin ts in time. or 10 0 expensive. Crosssectional data a re ga the red by observin g multiple entities at a single point in time.1 • TIme series data consist of a single emity observcd at multiple time periods. KEV CONCEPT 1.
Review the Concepts
1.1 Design a hypothetical ideal randomized controlled experiment to study the effect of hours spent studying on performance on microeconomics exams. Suggest some impediments to implementing this experiment in practice.
1.2 Design a hypothetical ideal randomized controlled experiment to study the effect on highway traffic deaths of wearing seat belts. Suggest some impediments to implementing this experiment in practice.
1.3 You are asked to study the relationship between hours spent on employee training (measured in hours per worker per week) in a manufacturing plant and the productivity of its workers (output per worker per hour). Describe:
a. an ideal randomized controlled experiment to measure this causal effect;
b. an observational cross-sectional data set with which you could study this effect;
c. an observational time series data set for studying this effect; and
d. an observational panel data set for studying this effect.
CHAPTER 2 Review of Probability

This chapter reviews the core ideas of the theory of probability that are needed to understand regression analysis and econometrics. We assume that you have taken an introductory course in probability and statistics. If your knowledge of probability is stale, you should refresh it by reading this chapter. If you feel confident with the material, you still should skim the chapter and the terms and concepts at the end to make sure you are familiar with the ideas and notation.

Most aspects of the world around us have an element of randomness. The theory of probability provides mathematical tools for quantifying and describing this randomness. Section 2.1 reviews probability distributions for a single random variable, and Section 2.2 covers the mathematical expectation, mean, and variance of a single random variable. Most of the interesting problems in economics involve more than one variable, and Section 2.3 introduces the basic elements of probability theory for two random variables. Section 2.4 discusses three special probability distributions that play a central role in statistics and econometrics: the normal, chi-squared, and F distributions.

The final two sections of this chapter focus on a specific source of randomness of central importance in econometrics: the randomness that arises by randomly drawing a sample of data from a larger population. For example, suppose you survey ten recent college graduates selected at random, record (or "observe") their earnings, and compute the average earnings using these ten data points (or "observations"). Because you chose the sample at random, you
could have chosen ten different graduates by pure random chance; had you done so, you would have observed ten different earnings and you would have computed a different sample average. Because the average earnings vary from one randomly chosen sample to the next, the sample average is itself a random variable. Therefore, the sample average has a probability distribution, referred to as its sampling distribution because this distribution describes the different possible values of the sample average that might have occurred had a different sample been drawn.

Section 2.5 discusses random sampling and the sampling distribution of the sample average. This sampling distribution is, in general, complicated. When the sample size is sufficiently large, however, the sampling distribution of the sample average is approximately normal, a result known as the central limit theorem, which is discussed in Section 2.6.

2.1 Random Variables and Probability Distributions

Probabilities, the Sample Space, and Random Variables

Probabilities and outcomes. The gender of the next new person you meet, your grade on an exam, and the number of times your computer will crash while you are writing a term paper all have an element of chance or randomness. In each of these examples, there is something not yet known that is eventually revealed.

The mutually exclusive potential results of a random process are called the outcomes. For example, your computer might never crash, it might crash once, it might crash twice, and so on. Only one of these outcomes will actually occur (the outcomes are mutually exclusive), and the outcomes need not be equally likely.

The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are writing a term paper is 80%, then over the course of writing many term papers, you will complete 80% without a crash.
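The long-run interpretation of probability can be illustrated with a small simulation. The sketch below is not part of the text's argument: the function name `long_run_proportion`, the fixed seed, and the numbers of simulated term papers are all assumptions chosen for illustration. It draws many hypothetical term papers, each finished without a crash with probability 0.80, and reports the share finished without a crash.

```python
import random

def long_run_proportion(prob_no_crash, n_papers, seed=0):
    """Simulate n_papers term papers; return the share written without a crash."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    no_crash = sum(1 for _ in range(n_papers) if rng.random() < prob_no_crash)
    return no_crash / n_papers

# The share should settle near 0.80 as the number of papers grows.
for n in (10, 1_000, 100_000):
    print(n, long_run_proportion(0.80, n))
```

With only ten papers the share can be far from 80%; the probability describes the proportion over a long run, not the outcome of any short sequence.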
The sample space and events. The set of all possible outcomes is called the sample space. An event is a subset of the sample space, that is, an event is a set of one or more outcomes. The event "my computer will crash no more than once" is the set consisting of two outcomes: "no crashes" and "one crash."

Random variables. A random variable is a numerical summary of a random outcome. The number of times your computer crashes while you are writing a term paper is random and takes on a numerical value, so it is a random variable.

Some random variables are discrete and some are continuous. As their names suggest, a discrete random variable takes on only a discrete set of values, like 0, 1, 2, ..., whereas a continuous random variable takes on a continuum of possible values.

Probability Distribution of a Discrete Random Variable

Probability distribution. The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur. These probabilities sum to 1.

For example, let M be the number of times your computer crashes while you are writing a term paper. The probability distribution of the random variable M is the list of probabilities of each possible outcome: the probability that M = 0, denoted \(\Pr(M = 0)\), is the probability of no computer crashes; \(\Pr(M = 1)\) is the probability of a single computer crash; and so forth. An example of a probability distribution for M is given in the second row of Table 2.1; in this distribution, if your computer crashes four times, you will quit and write the paper by hand. According to this distribution, the probability of no crashes is 80%; the probability of one crash is 10%; and the probability of two, three, or four crashes is, respectively, 6%, 3%, and 1%. These probabilities sum to 100%. This probability distribution is plotted in Figure 2.1.

Probabilities of events. The probability of an event can be computed from the probability distribution. For example, the probability of the event of one or two crashes is the sum of the probabilities of the constituent outcomes. That is, \(\Pr(M = 1 \text{ or } M = 2) = \Pr(M = 1) + \Pr(M = 2) = 0.10 + 0.06 = 0.16\), or 16%.

Cumulative probability distribution. The cumulative probability distribution is the probability that the random variable is less than or equal to a particular value. The last row of Table 2.1 gives the cumulative probability distribution of the random
variable M. For example, the probability of at most one crash, \(\Pr(M \le 1)\), is 90%, which is the sum of the probabilities of no crashes (80%) and of one crash (10%).

A cumulative probability distribution is also referred to as a cumulative distribution function, a c.d.f., or a cumulative distribution.

The Bernoulli distribution. An important special case of a discrete random variable is when the random variable is binary, that is, the outcomes are 0 or 1. A binary random variable is called a Bernoulli random variable (in honor of the seventeenth-century Swiss mathematician and scientist Jacob Bernoulli), and its probability distribution is called the Bernoulli distribution.

TABLE 2.1 Probability of Your Computer Crashing M Times
                                      Outcome (number of crashes)
                                      0      1      2      3      4
Probability distribution              0.80   0.10   0.06   0.03   0.01
Cumulative probability distribution   0.80   0.90   0.96   0.99   1.00

FIGURE 2.1 Probability Distribution of the Number of Computer Crashes
[Bar chart: probability (vertical axis) against number of crashes (horizontal axis).]
The height of each bar is the probability that the computer crashes the indicated number of times. The height of the first bar is 0.8, so the probability of 0 computer crashes is 80%. The height of the second bar is 0.1, so the probability of 1 computer crash is 10%, and so forth for the other bars.
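The event probability and the cumulative distribution for the computer-crash example can be reproduced with a few lines of code. This is a minimal sketch: the dictionary layout and the function names `event_probability` and `cumulative_distribution` are assumptions made for illustration, and the probabilities are the ones stated for Table 2.1.

```python
from itertools import accumulate

# Probability distribution of M (number of crashes -> probability), Table 2.1.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

def event_probability(pmf, outcomes):
    """Probability of an event: the sum over its constituent outcomes."""
    return sum(pmf[m] for m in outcomes)

def cumulative_distribution(pmf):
    """Return {value: Pr(M <= value)} as running sums of the probabilities."""
    values = sorted(pmf)
    running = accumulate(pmf[v] for v in values)
    return dict(zip(values, running))

cdf = cumulative_distribution(pmf)

print(round(event_probability(pmf, {1, 2}), 2))    # Pr(M = 1 or M = 2)
print({m: round(p, 2) for m, p in cdf.items()})     # cumulative distribution
```

The rounding in the print statements only hides floating-point noise; the underlying sums are the same ones described in the text.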
For example, let G be the gender of the next new person you meet, where G = 0 indicates that the person is male and G = 1 indicates that she is female. The outcomes of G and their probabilities thus are

\[ G = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p, \end{cases} \tag{2.1} \]

where p is the probability of the next new person you meet being a woman. The probability distribution in Equation (2.1) is the Bernoulli distribution.

Probability Distribution of a Continuous Random Variable

Cumulative probability distribution. The cumulative probability distribution for a continuous variable is defined just as it is for a discrete random variable. That is, the cumulative probability distribution of a continuous random variable is the probability that the random variable is less than or equal to a particular value.

For example, consider a student who drives from home to school. This student's commuting time can take on a continuum of values and, because it depends on random factors such as the weather and traffic conditions, it is natural to treat it as a continuous random variable. Figure 2.2a plots a hypothetical cumulative distribution of commuting times. For example, the probability that the commute takes less than 15 minutes is 20% and the probability that it takes less than 20 minutes is 78%.

Probability density function. Because a continuous random variable can take on a continuum of possible values, the probability distribution used for discrete variables, which lists the probability of each possible value of the random variable, is not suitable for continuous variables. Instead, the probability is summarized by the probability density function. The area under the probability density function between any two points is the probability that the random variable falls between those two points. A probability density function is also called a p.d.f., a density function, or simply a density.

Figure 2.2b plots the probability density function of commuting times corresponding to the cumulative distribution in Figure 2.2a. The probability that the commute takes between 15 and 20 minutes is given by the area under the p.d.f. between 15 minutes and 20 minutes, which is 0.58, or 58%. Equivalently, this probability can be seen on the cumulative distribution in Figure 2.2a as the difference
FIGURE 2.2 Cumulative Distribution and Probability Density Functions of Commuting Time
[Panel (a): cumulative distribution function of commuting time; probability (vertical axis) against commuting time in minutes (horizontal axis). Panel (b): probability density function of commuting time; probability density (vertical axis) against commuting time in minutes (horizontal axis). Marked on the panels: Pr(Commuting time ≤ 15) = 0.20, Pr(15 < Commuting time ≤ 20) = 0.58, Pr(Commuting time > 20) = 0.22.]
Figure 2.2a shows the cumulative probability distribution (or c.d.f.) of commuting times. The probability that a commuting time is less than 15 minutes is 0.20 (or 20%), and the probability that it is less than 20 minutes is 0.78 (78%). Figure 2.2b shows the probability density function (or p.d.f.) of commuting times. Probabilities are given by areas under the p.d.f. The probability that a commuting time is between 15 and 20 minutes is 0.58 (58%), and is given by the area under the curve between 15 and 20 minutes.
between the probability that the commute is less than 20 minutes (78%) and the probability that it is less than 15 minutes (20%). Thus the probability density function and the cumulative probability distribution show the same information in different formats.

2.2 Expected Values, Mean, and Variance

The Expected Value of a Random Variable

Expected value. The expected value of a random variable Y, denoted \(E(Y)\), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of a discrete random variable is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of that outcome. The expected value of Y is also called the expectation of Y or the mean of Y and is denoted \(\mu_Y\).

For example, suppose you loan a friend $100 at 10% interest. If the loan is repaid you get $110 (the principal of $100 plus interest of $10), but there is a risk of 1% that your friend will default and you will get nothing at all. Thus, the amount you are repaid is a random variable that equals $110 with probability 0.99 and equals $0 with probability 0.01. Over many such loans, 99% of the time you would be paid back $110, but 1% of the time you would get nothing, so on average you would be repaid $110 × 0.99 + $0 × 0.01 = $108.90. Thus the expected value of your repayment (or the "mean repayment") is $108.90.

As a second example, consider the number of computer crashes M with the probability distribution given in Table 2.1. The expected value of M is the average number of crashes over many term papers, weighted by the frequency with which a crash of a given size occurs. Accordingly,

\[ E(M) = 0 \times 0.80 + 1 \times 0.10 + 2 \times 0.06 + 3 \times 0.03 + 4 \times 0.01 = 0.35. \tag{2.2} \]

That is, the expected number of computer crashes while writing a term paper is 0.35. Of course, the actual number of crashes must always be an integer; it makes no sense to say that the computer crashed 0.35 times while writing a particular term paper! Rather, the calculation in Equation (2.2) means that the average number of crashes over many such term papers is 0.35.

The formula for the expected value of a discrete random variable Y that can take on k different values is given as Key Concept 2.1.

Expected value of a Bernoulli random variable. An important special case of the general formula in Key Concept 2.1 is the mean of a Bernoulli random
variable. Let G be the Bernoulli random variable with the probability distribution in Equation (2.1). The expected value of G is

\[ E(G) = 1 \times p + 0 \times (1 - p) = p. \tag{2.4} \]

Thus the expected value of a Bernoulli random variable is p, the probability that it takes on the value "1."

Expected value of a continuous random variable. The expected value of a continuous random variable is also the probability-weighted average of the possible outcomes of the random variable. Because a continuous random variable can take on a continuum of possible values, the formal mathematical definition of its expectation involves calculus and its definition is given in Appendix 17.1.

KEY CONCEPT 2.1
EXPECTED VALUE AND THE MEAN
Suppose the random variable Y takes on k possible values, \(y_1, \ldots, y_k\), where \(y_1\) denotes the first value, \(y_2\) denotes the second value, and so forth, and that the probability that Y takes on \(y_1\) is \(p_1\), the probability that Y takes on \(y_2\) is \(p_2\), and so forth. The expected value of Y, denoted \(E(Y)\), is

\[ E(Y) = y_1 p_1 + y_2 p_2 + \cdots + y_k p_k = \sum_{i=1}^{k} y_i p_i, \tag{2.3} \]

where the notation "\(\sum_{i=1}^{k} y_i p_i\)" means "the sum of \(y_i p_i\) for i running from 1 to k." The expected value of Y is also called the mean of Y or the expectation of Y and is denoted \(\mu_Y\).

The Standard Deviation and Variance

The variance and standard deviation measure the dispersion or the "spread" of a probability distribution. The variance of a random variable Y, denoted \(\operatorname{var}(Y)\), is the expected value of the square of the deviation of Y from its mean: \(\operatorname{var}(Y) = E[(Y - \mu_Y)^2]\).

Because the variance involves the square of Y, the units of the variance are the units of the square of Y, which makes the variance awkward to interpret. It is therefore common to measure the spread by the standard deviation, which is the square root of the variance and is denoted \(\sigma_Y\). The standard deviation has the same units as Y. These definitions are summarized in Key Concept 2.2.
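The expected-value formula in Key Concept 2.1 and the definition of the variance as the expected squared deviation from the mean translate directly into code. The sketch below is illustrative only: the function names are assumptions, the loan and crash figures are the ones used in the text, and the Bernoulli probability p = 0.3 is a made-up value chosen to show that the mean equals p.

```python
import math

def expected_value(values, probs):
    """Probability-weighted average of the outcomes: the sum of y_i * p_i."""
    return sum(y * p for y, p in zip(values, probs))

def variance(values, probs):
    """Expected squared deviation from the mean: E[(Y - mu_Y)^2]."""
    mu = expected_value(values, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(values, probs))

# Loan repayment: $110 with probability 0.99, $0 with probability 0.01.
mean_repayment = expected_value([110, 0], [0.99, 0.01])

# Number of computer crashes M, using the distribution in Table 2.1.
crashes = [0, 1, 2, 3, 4]
crash_probs = [0.80, 0.10, 0.06, 0.03, 0.01]
mean_crashes = expected_value(crashes, crash_probs)
var_crashes = variance(crashes, crash_probs)
sd_crashes = math.sqrt(var_crashes)   # same units as M

# Bernoulli variable with an assumed p = 0.3 (illustrative value only).
p = 0.3
mean_bernoulli = expected_value([1, 0], [p, 1 - p])

print(round(mean_repayment, 2))
print(round(mean_crashes, 2), round(var_crashes, 4), round(sd_crashes, 2))
print(round(mean_bernoulli, 2))
```

Taking the square root at the end returns the spread to the units of the variable itself, which is the reason the standard deviation is usually reported instead of the variance.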
KEY CONCEPT 2.2
VARIANCE AND STANDARD DEVIATION
The variance of the discrete random variable Y, denoted \(\sigma_Y^2\), is

\[ \sigma_Y^2 = \operatorname{var}(Y) = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i. \tag{2.5} \]

The standard deviation of Y is \(\sigma_Y\), the square root of the variance. The units of the standard deviation are the same as the units of Y.

For example, the variance of the number of computer crashes M is the probability-weighted average of the squared difference between M and its mean, 0.35:

\[ \operatorname{var}(M) = (0 - 0.35)^2 \times 0.80 + (1 - 0.35)^2 \times 0.10 + (2 - 0.35)^2 \times 0.06 + (3 - 0.35)^2 \times 0.03 + (4 - 0.35)^2 \times 0.01 = 0.6475. \tag{2.6} \]

The standard deviation of M is the square root of the variance, so \(\sigma_M = \sqrt{0.6475} \cong 0.80\).

Variance of a Bernoulli random variable. The mean of the Bernoulli random variable G with probability distribution in Equation (2.1) is \(\mu_G = p\) [Equation (2.4)], so its variance is

\[ \operatorname{var}(G) = \sigma_G^2 = (0 - p)^2 \times (1 - p) + (1 - p)^2 \times p = p(1 - p). \tag{2.7} \]

Thus the standard deviation of a Bernoulli random variable is \(\sigma_G = \sqrt{p(1 - p)}\).

Mean and Variance of a Linear Function of a Random Variable

This section discusses random variables (say, X and Y) that are related by a linear function. For example, consider an income tax scheme under which a worker is taxed at a rate of 20% on his or her earnings and then given a (tax-free) grant of $2000. Under this tax scheme, after-tax earnings Y are related to pre-tax earnings X by the equation

\[ Y = 2000 + 0.8X. \tag{2.8} \]

That is, after-tax earnings Y is 80% of pre-tax earnings X, plus $2000.
Suppose an individual's pre-tax earnings next year are a random variable with mean \(\mu_X\) and variance \(\sigma_X^2\). Because pre-tax earnings are random, so are after-tax earnings. What are the mean and standard deviation of her after-tax earnings under this tax? After taxes, her earnings are 80% of the original pre-tax earnings, plus $2000. Thus the expected value of her after-tax earnings is

\[ E(Y) = \mu_Y = 2000 + 0.8\mu_X. \tag{2.9} \]

The variance of after-tax earnings is the expected value of \((Y - \mu_Y)^2\). Because \(Y = 2000 + 0.8X\), \(Y - \mu_Y = 2000 + 0.8X - (2000 + 0.8\mu_X) = 0.8(X - \mu_X)\). Thus \(E[(Y - \mu_Y)^2] = E\{[0.8(X - \mu_X)]^2\} = 0.64\,E[(X - \mu_X)^2]\). It follows that \(\operatorname{var}(Y) = 0.64\operatorname{var}(X)\), so, taking the square root of the variance, the standard deviation of Y is

\[ \sigma_Y = 0.8\sigma_X. \tag{2.10} \]

That is, the standard deviation of the distribution of her after-tax earnings is 80% of the standard deviation of the distribution of pre-tax earnings.

This analysis can be generalized so that Y depends on X with an intercept a (instead of $2000) and a slope b (instead of 0.8), so that

\[ Y = a + bX. \tag{2.11} \]

Then the mean and variance of Y are

\[ \mu_Y = a + b\mu_X \tag{2.12} \]

and

\[ \sigma_Y^2 = b^2\sigma_X^2, \tag{2.13} \]

and the standard deviation of Y is \(\sigma_Y = b\sigma_X\). The expressions in Equations (2.9) and (2.10) are applications of the more general formulas in Equations (2.12) and (2.13) with a = 2000 and b = 0.8.

Other Measures of the Shape of a Distribution

The mean and standard deviation measure two important features of a distribution: its center (the mean) and its spread (the standard deviation). This section discusses measures of two other features of a distribution: the skewness, which
measures the lack of symmetry of a distribution, and the kurtosis, which measures how thick, or "heavy," are its tails. The mean, variance, skewness, and kurtosis are all based on what are called the moments of a distribution.

Skewness. Figure 2.3 plots four distributions, two which are symmetric and two which are not. Visually, the distribution in Figure 2.3d appears to deviate more from symmetry than does the distribution in Figure 2.3c. The skewness of a distribution provides a mathematical way to describe how much a distribution deviates from symmetry.

The skewness of the distribution of a random variable Y is

\[ \text{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}, \tag{2.14} \]

where \(\sigma_Y\) is the standard deviation of Y. For a symmetric distribution, a value of Y a given amount above its mean is just as likely as a value of Y the same amount below its mean. If so, then positive values of \((Y - \mu_Y)^3\) will be offset on average (in expectation) by equally likely negative values. Thus, for a symmetric distribution, \(E[(Y - \mu_Y)^3] = 0\): the skewness of a symmetric distribution is zero. If a distribution is not symmetric, then a positive value of \((Y - \mu_Y)^3\) generally is not offset on average by an equally likely negative value, so the skewness is nonzero for a distribution that is not symmetric. Dividing by \(\sigma_Y^3\) in the denominator of Equation (2.14) cancels the units of \(Y^3\) in the numerator, so the skewness is unit free; in other words, changing the units of Y does not change its skewness.

Below each of the four distributions in Figure 2.3 is its skewness. If a distribution has a long right tail, positive values of \((Y - \mu_Y)^3\) are not fully offset by negative values, and the skewness is positive. If a distribution has a long left tail, its skewness is negative.

Kurtosis. The kurtosis of a distribution is a measure of how much mass is in its tails and, therefore, is a measure of how much of the variance of Y arises from extreme values. An extreme value of Y is called an outlier. The greater the kurtosis of a distribution, the more likely are outliers.

The kurtosis of the distribution of Y is

\[ \text{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}. \tag{2.15} \]

If a distribution has a large amount of mass in its tails, then some extreme departures of Y from its mean are likely, and these very large values will lead to large values, on average (in expectation), of \((Y - \mu_Y)^4\). Thus, for a distribution with a large amount of mass in its tails, the kurtosis will be large. Because \((Y - \mu_Y)^4\) cannot be negative, the kurtosis cannot be negative.
FIGURE 2.3 Four Distributions with Different Skewness and Kurtosis
[Four panels, (a) through (d), each labeled with its skewness and kurtosis.]
All of these distributions have a mean of 0 and a variance of 1. The distributions with skewness of zero (a and b) are symmetric; the distributions with nonzero skewness (c and d) are not symmetric. The distributions with kurtosis exceeding 3 (b-d) have heavy tails.

The kurtosis of a normally distributed random variable is 3, so a random variable with kurtosis exceeding 3 has more mass in its tails than a normal random variable. A distribution with kurtosis exceeding 3 is called leptokurtic or, more simply, heavy-tailed. Like skewness, the kurtosis is unit free, so changing the units of Y does not change its kurtosis.

Below each of the four distributions in Figure 2.3 is its kurtosis. The distributions in Figures 2.3b-d are heavy-tailed.
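For a discrete distribution, the skewness and kurtosis in Equations (2.14) and (2.15) reduce to probability-weighted sums. The following sketch is only an illustration: the function names are assumptions, the crash distribution is the one in Table 2.1, and the three-point symmetric distribution is a hypothetical example added to show a skewness of zero.

```python
def central_moment(values, probs, r):
    """E[(Y - mu_Y)^r] for a discrete distribution."""
    mu = sum(y * p for y, p in zip(values, probs))
    return sum((y - mu) ** r * p for y, p in zip(values, probs))

def skewness(values, probs):
    """Third central moment divided by the cube of the standard deviation."""
    sigma = central_moment(values, probs, 2) ** 0.5
    return central_moment(values, probs, 3) / sigma ** 3

def kurtosis(values, probs):
    """Fourth central moment divided by the squared variance."""
    var = central_moment(values, probs, 2)
    return central_moment(values, probs, 4) / var ** 2

# Computer crashes M (Table 2.1): a long right tail, so skewness is positive.
crashes = [0, 1, 2, 3, 4]
crash_probs = [0.80, 0.10, 0.06, 0.03, 0.01]
print(skewness(crashes, crash_probs), kurtosis(crashes, crash_probs))

# Hypothetical symmetric distribution: skewness is zero.
sym_values, sym_probs = [-1, 0, 1], [0.25, 0.50, 0.25]
print(skewness(sym_values, sym_probs), kurtosis(sym_values, sym_probs))

# Unit free: rescaling M (say, crashes per 100 papers) leaves both unchanged.
rescaled = [100 * m for m in crashes]
print(skewness(rescaled, crash_probs), kurtosis(rescaled, crash_probs))
```

Both measures divide a central moment by a matching power of the standard deviation, which is why the rescaled variable gives the same values up to floating-point error.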
Moments. The mean of Y, \(E(Y)\), is also called the first moment of Y, and the expected value of the square of Y, \(E(Y^2)\), is called the second moment of Y. In general, the expected value of \(Y^r\) is called the r-th moment of the random variable Y. That is, the r-th moment of Y is \(E(Y^r)\). The skewness is a function of the first, second, and third moments of Y, and the kurtosis is a function of the first through fourth moments of Y.

2.3 Two Random Variables

Most of the interesting questions in economics involve two or more variables. Are college graduates more likely to have a job than nongraduates? How does the distribution of income for women compare to that for men? These questions concern the distribution of two random variables, considered together (education and employment status in the first example, income and gender in the second). Answering such questions requires an understanding of the concepts of joint, marginal, and conditional probability distributions.

Joint and Marginal Distributions

Joint distribution. The joint probability distribution of two discrete random variables, say X and Y, is the probability that the random variables simultaneously take on certain values, say x and y. The probabilities of all possible (x, y) combinations sum to 1. The joint probability distribution can be written as the function \(\Pr(X = x, Y = y)\).

For example, weather conditions (whether or not it is raining) affect the commuting time of the student commuter in Section 2.1. Let Y be a binary random variable that equals 1 if the commute is short (less than 20 minutes) and equals 0 otherwise, and let X be a binary random variable that equals 0 if it is raining and 1 if not. Between these two random variables, there are four possible outcomes: it rains and the commute is long (X = 0, Y = 0); rain and short commute (X = 0, Y = 1); no rain and long commute (X = 1, Y = 0); and no rain and short commute (X = 1, Y = 1). The joint probability distribution is the frequency with which each of these four outcomes occurs over many repeated commutes.

An example of a joint distribution of these two variables is given in Table 2.2. According to this distribution, over many commutes, 15% of the days have rain and a long commute (X = 0, Y = 0); that is, the probability of a long, rainy commute is 15%, or \(\Pr(X = 0, Y = 0) = 0.15\). Also, \(\Pr(X = 0, Y = 1) = 0.15\), \(\Pr(X = 1, Y = 0) = 0.07\), and \(\Pr(X = 1, Y = 1) = 0.63\). These four possible outcomes are mutually exclusive and constitute the sample space, so the four probabilities sum to 1.
TABLE 2.2 Joint Distribution of Weather Conditions and Commuting Times
                        Rain (X = 0)   No Rain (X = 1)   Total
Long Commute (Y = 0)    0.15           0.07              0.22
Short Commute (Y = 1)   0.15           0.63              0.78
Total                   0.30           0.70              1.00

Marginal probability distribution. The marginal probability distribution of a random variable Y is just another name for its probability distribution. This term is used to distinguish the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable.

The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. If X can take on l different values \(x_1, \ldots, x_l\), then the marginal probability that Y takes on the value y is

\[ \Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y). \tag{2.16} \]

For example, in Table 2.2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. The marginal distribution of commuting times is given in the final column of Table 2.2. Similarly, the marginal probability that it will rain is 30%, as shown in the final row of Table 2.2.

Conditional Distributions

Conditional distribution. The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes on the value y when X takes on the value x is written \(\Pr(Y = y \mid X = x)\).

For example, what is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2.2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining a long commute and a short commute are equally likely. Thus, the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, or \(\Pr(Y = 0 \mid X = 0) = 0.50\). Equivalently, the marginal probability of rain is 30%; that is, over many commutes it rains 30% of the time. Of this 30% of commutes, 50% of the time the commute is long (0.15/0.30).
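The marginal and conditional probabilities for the weather and commuting example can be recomputed from the four joint probabilities in Table 2.2. The sketch below is illustrative: the dictionary keyed by (x, y) and the function names are assumptions made for this example, with x = 0 for rain and y = 0 for a long commute as in the text.

```python
# Joint distribution from Table 2.2, keyed by (x, y):
# x = 0 rain, x = 1 no rain; y = 0 long commute, y = 1 short commute.
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

def marginal_y(joint, y):
    """Equation (2.16): add the joint probabilities over every value of X."""
    return sum(p for (xi, yi), p in joint.items() if yi == y)

def marginal_x(joint, x):
    """The same sum, taken over every value of Y instead."""
    return sum(p for (xi, yi), p in joint.items() if xi == x)

def conditional_y_given_x(joint, y, x):
    """Share of the X = x outcomes for which Y = y."""
    return joint[(x, y)] / marginal_x(joint, x)

print(round(marginal_y(joint, 0), 2))                 # Pr(long commute)
print(round(marginal_x(joint, 0), 2))                 # Pr(rain)
print(round(conditional_y_given_x(joint, 0, 0), 2))   # Pr(long commute | rain)
```

Dividing by the marginal probability of rain restricts attention to rainy commutes, which is the "of this 30% of commutes" reasoning in the text.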
TABLE 2.3 Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)

A. Joint Distribution
                       M = 0   M = 1   M = 2   M = 3   M = 4   Total
Old computer (A = 0)   0.35    0.065   0.05    0.025   0.01    0.50
New computer (A = 1)   0.45    0.035   0.01    0.005   0.00    0.50
Total                  0.80    0.10    0.06    0.03    0.01    1.00

B. Conditional Distributions of M given A
                       M = 0   M = 1   M = 2   M = 3   M = 4   Total
Pr(M | A = 0)          0.70    0.13    0.10    0.05    0.02    1.00
Pr(M | A = 1)          0.90    0.07    0.02    0.01    0.00    1.00

In general, the conditional distribution of Y given X = x is

\[ \Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}. \tag{2.17} \]

For example, the conditional probability of a long commute given that it is rainy is \(\Pr(Y = 0 \mid X = 0) = \Pr(X = 0, Y = 0)/\Pr(X = 0) = 0.15/0.30 = 0.50\).

As a second example, consider a modification of the crashing computer example. Suppose you use a computer in the library to type your term paper and the librarian randomly assigns you a computer from those available, half of which are new and half of which are old. Because you are randomly assigned to a computer, the age of the computer you use, A (= 1 if the computer is new, = 0 if it is old), is a random variable. Suppose the joint distribution of the random variables M and A is given in Part A of Table 2.3. Then the conditional distribution of computer crashes, given the age of the computer, is given in Part B of the table. For example, the joint probability that M = 0 and A = 0 is 0.35; because half the computers are old, the conditional probability of no crashes, given that you are using an old computer, is \(\Pr(M = 0 \mid A = 0) = \Pr(M = 0, A = 0)/\Pr(A = 0) = 0.35/0.50 = 0.70\), or 70%. In contrast, the conditional probability of no crashes given that you are assigned a new computer is 90%. According to the conditional distributions in Part B of Table 2.3, the newer computers are less likely to crash than the old ones; for example, the probability of three crashes is 5% with an old computer but 1% with a new computer.
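Equation (2.17) can be applied row by row to turn a joint distribution into conditional distributions. The sketch below assumes the joint probabilities of crashes and computer age as read from Part A of Table 2.3; the nested-dictionary layout and the function name `conditional_distribution` are choices made for this illustration.

```python
# Joint probabilities Pr(M = m, A = a), keyed by computer age a and then by
# the number of crashes m (values as read from Part A of Table 2.3).
joint = {
    0: {0: 0.35, 1: 0.065, 2: 0.05, 3: 0.025, 4: 0.01},   # old computer (A = 0)
    1: {0: 0.45, 1: 0.035, 2: 0.01, 3: 0.005, 4: 0.00},   # new computer (A = 1)
}

def conditional_distribution(joint, a):
    """Equation (2.17): divide each joint probability by Pr(A = a)."""
    pr_a = sum(joint[a].values())          # marginal probability of age a
    return {m: p / pr_a for m, p in joint[a].items()}

for a in (0, 1):
    row = conditional_distribution(joint, a)
    print(a, {m: round(p, 2) for m, p in row.items()})
```

Each conditional distribution sums to 1 because every entry in a row is divided by that row's total.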
Conditional expectation. The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on k values \(y_1, \ldots, y_k\), then the conditional mean of Y given X = x is

\[ E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x). \tag{2.18} \]

For example, based on the conditional distributions in Table 2.3, the expected number of computer crashes, given that the computer is old, is \(E(M \mid A = 0) = 0 \times 0.70 + 1 \times 0.13 + 2 \times 0.10 + 3 \times 0.05 + 4 \times 0.02 = 0.56\). The expected number of computer crashes, given that the computer is new, is \(E(M \mid A = 1) = 0.14\), less than for the old computers.

The conditional expectation of Y given X = x is just the mean value of Y when X = x. In the example of Table 2.3, the mean number of crashes is 0.56 for old computers, so the conditional expectation of Y given that the computer is old is 0.56. Similarly, among new computers, the mean number of crashes is 0.14, that is, the conditional expectation of Y given that the computer is new is 0.14.

The law of iterated expectations. The mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X. For example, the mean height of adults is the weighted average of the mean height of men and the mean height of women, weighted by the proportions of men and women. Stated mathematically, if X takes on the l values \(x_1, \ldots, x_l\), then

\[ E(Y) = \sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i). \tag{2.19} \]

Equation (2.19) follows from Equations (2.18) and (2.17) (see Exercise 2.19).

Stated differently, the expectation of Y is the expectation of the conditional expectation of Y given X,

\[ E(Y) = E[E(Y \mid X)], \tag{2.20} \]

where the inner expectation on the right-hand side of Equation (2.20) is computed using the conditional distribution of Y given X and the outer expectation is computed using the marginal distribution of X. Equation (2.20) is known as the law of iterated expectations.

For example, the mean number of crashes M is the weighted average of the conditional expectation of M given that it is old and the conditional expectation
of $M$ given that it is new, so $E(M) = E(M \mid A = 0) \times \Pr(A = 0) + E(M \mid A = 1) \times \Pr(A = 1) = 0.56 \times 0.50 + 0.14 \times 0.50 = 0.35$. This is the mean of the marginal distribution of $M$, as calculated in Equation (2.2).

The law of iterated expectations implies that if the conditional mean of $Y$ given $X$ is zero, then the mean of $Y$ is zero. This is an immediate consequence of Equation (2.20): if $E(Y \mid X) = 0$, then $E(Y) = E[E(Y \mid X)] = E[0] = 0$. Said differently, if the mean of $Y$ given $X$ is zero, then it must be that the probability-weighted average of these conditional means is zero, that is, the mean of $Y$ must be zero.

The law of iterated expectations also applies to expectations that are conditional on multiple random variables. For example, let $X$, $Y$, and $Z$ be random variables that are jointly distributed. Then the law of iterated expectations says that $E(Y) = E[E(Y \mid X, Z)]$, where $E(Y \mid X, Z)$ is the conditional expectation of $Y$ given both $X$ and $Z$. For example, in the computer crash illustration of Table 2.3, let $P$ denote the number of programs installed on the computer; then $E(M \mid A, P)$ is the expected number of crashes for a computer with age $A$ that has $P$ programs installed. The expected number of crashes overall, $E(M)$, is the weighted average of the expected number of crashes for a computer with age $A$ and number of programs $P$, weighted by the proportion of computers with that value of both $A$ and $P$.

Exercise 2.20 provides some additional properties of conditional expectations with multiple variables.

Conditional variance. The variance of $Y$ conditional on $X$ is the variance of the conditional distribution of $Y$ given $X$. Stated mathematically, the conditional variance of $Y$ given $X$ is

$$\operatorname{var}(Y \mid X = x) = \sum_{i=1}^{k} [y_i - E(Y \mid X = x)]^2 \Pr(Y = y_i \mid X = x). \tag{2.21}$$

For example, the conditional variance of the number of crashes given that the computer is old is $\operatorname{var}(M \mid A = 0) = (0 - 0.56)^2 \times 0.70 + (1 - 0.56)^2 \times 0.13 + (2 - 0.56)^2 \times 0.10 + (3 - 0.56)^2 \times 0.05 + (4 - 0.56)^2 \times 0.02 \cong 0.99$. The standard deviation of the conditional distribution of $M$ given that $A = 0$ is thus $\sqrt{0.99} \cong 0.99$. The conditional variance of $M$ given that $A = 1$ is the variance of the distribution in the second row of Panel B of Table 2.3, which is 0.22, so the standard deviation of $M$ for new computers is $\sqrt{0.22} = 0.47$. For the conditional distributions in Table 2.3, the expected number of crashes for new computers (0.14) is less than that for old computers (0.56), and the spread of the distribution of the number of crashes, as measured by the conditional standard deviation, is smaller for new computers (0.47) than for old (0.99).
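The arithmetic in the crash example can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original text: the function names `conditional_mean` and `conditional_variance` are illustrative choices, the probabilities for old computers are the ones quoted in the example, and the two conditional means and $\Pr(A = 0) = \Pr(A = 1) = 0.50$ are also taken from the example.

```python
# Conditional distribution of the number of crashes M for an old computer (A = 0),
# using the probabilities quoted in the worked example.
crashes = [0, 1, 2, 3, 4]
pr_m_given_old = [0.70, 0.13, 0.10, 0.05, 0.02]


def conditional_mean(values, probs):
    """Equation (2.18): sum of y_i * Pr(Y = y_i | X = x)."""
    return sum(y * p for y, p in zip(values, probs))


def conditional_variance(values, probs):
    """Equation (2.21): probability-weighted squared deviations from the conditional mean."""
    mean = conditional_mean(values, probs)
    return sum((y - mean) ** 2 * p for y, p in zip(values, probs))


mean_old = conditional_mean(crashes, pr_m_given_old)
var_old = conditional_variance(crashes, pr_m_given_old)

# Law of iterated expectations, Equation (2.19): weight each conditional mean
# by the probability of the conditioning value.
cond_means = {0: 0.56, 1: 0.14}   # E(M | A = 0), E(M | A = 1) from the example
pr_age = {0: 0.50, 1: 0.50}       # Pr(A = 0), Pr(A = 1) from the example
mean_m = sum(cond_means[a] * pr_age[a] for a in pr_age)

print(round(mean_old, 2), round(var_old, 2), round(mean_m, 2))
```

Rounded to two decimals, the three values agree with the 0.56, 0.99, and 0.35 reported in the example.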
Independence

Two random variables $X$ and $Y$ are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, $X$ and $Y$ are independent if the conditional distribution of $Y$ given $X$ equals the marginal distribution of $Y$. That is, $X$ and $Y$ are independently distributed if, for all values of $x$ and $y$,

$$\Pr(Y = y \mid X = x) = \Pr(Y = y) \quad \text{(independence of $X$ and $Y$)}. \tag{2.22}$$

Substituting Equation (2.22) into Equation (2.17) gives an alternative expression for independent random variables in terms of their joint distribution. If $X$ and $Y$ are independent, then

$$\Pr(X = x, Y = y) = \Pr(X = x)\Pr(Y = y). \tag{2.23}$$

That is, the joint distribution of two independent random variables is the product of their marginal distributions.

Covariance and Correlation

Covariance. One measure of the extent to which two random variables move together is their covariance. The covariance between $X$ and $Y$ is the expected value $E[(X - \mu_X)(Y - \mu_Y)]$, where $\mu_X$ is the mean of $X$ and $\mu_Y$ is the mean of $Y$. The covariance is denoted by $\operatorname{cov}(X, Y)$ or by $\sigma_{XY}$. If $X$ can take on $l$ values and $Y$ can take on $k$ values, then the covariance is given by the formula

$$\operatorname{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{k} \sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i). \tag{2.24}$$

To interpret this formula, suppose that when $X$ is greater than its mean (so that $X - \mu_X$ is positive), then $Y$ tends to be greater than its mean (so that $Y - \mu_Y$ is positive), and when $X$ is less than its mean (so that $X - \mu_X < 0$), then $Y$ tends to be less than its mean (so that $Y - \mu_Y < 0$). In both cases, the product $(X - \mu_X) \times (Y - \mu_Y)$ tends to be positive, so the covariance is positive. In contrast, if $X$ and $Y$ tend to move in opposite directions (so that $X$ is large when $Y$ is small, and vice versa), then the covariance is negative. Finally, if $X$ and $Y$ are independent, then the covariance is zero (see Exercise 2.19).
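Equations (2.23) and (2.24) are easy to evaluate once a joint distribution is written down. The sketch below is illustrative only: the joint probabilities are hypothetical numbers chosen for the example (they are not from any table in the text), and the helper names `marginals`, `covariance`, and `is_independent` are assumptions of this sketch.

```python
from itertools import product

# Hypothetical joint distribution Pr(X = x, Y = y), chosen only for illustration.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}


def marginals(joint):
    """Sum the joint probabilities over the other variable."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def covariance(joint):
    """Equation (2.24): sum of (x - mu_X)(y - mu_Y) Pr(X = x, Y = y)."""
    px, py = marginals(joint)
    mu_x = sum(x * p for x, p in px.items())
    mu_y = sum(y * p for y, p in py.items())
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())


def is_independent(joint, tol=1e-12):
    """Equation (2.23): the joint equals the product of the marginals everywhere."""
    px, py = marginals(joint)
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x, y in product(px, py))


# A second joint distribution built as the product of the same marginals,
# so it is independent by construction.
px, py = marginals(joint)
independent_joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

cov_dependent = covariance(joint)                # positive: X and Y move together
cov_independent = covariance(independent_joint)  # zero up to rounding error
```

In this hypothetical case the first distribution has a positive covariance and fails the product test, while the product-of-marginals distribution passes it and has zero covariance, as the text states for independent variables.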
then cov(Y. its units arc. tben the preceding proof applies.\ ' + /L y.(y)(X .2: . the units cancel and the correlation is unitless.1. awkwardly. as prove n in Ap pe n dix 2.is the sum of their me'llls: £(X + Y) = £(X) + £(Y) = /L.28) . ff the condi tion al mean of Y does not depend on X. :I Two Random Voriob~ 35 Because the covariance is the product of X and Y. Y) = O.}. It is n OI necessarily true. O. however. If Y and X do not ha ve mean zero.X) = E[( Y .oo (2.27) We now show this result.23. ·lhis "units" p roblem can make numerical values of the covariance difficul! to interpret. Sa id diffe rently. then Y and X are uncorrelated.By th e bwofi tcrated expectations [Eq"". The Mean and Variance of Sums of Random Variables 'nle mean of Ihe sum o f Iwo random variables. = eov( X. divided by their stan dard deviations: corr(X. The correlation always is between .X) = O. An example is given in Exercise 2.25) are the same as those o f the de nominator. it is possi ble for the conditional mean of Yto be a function of X but fo r Y and X nonet he less to be uncorrelated.Y) Correlation. first subtract off their means. First suppose th at Y an u X have mea n zero. if £( Yl X) = JAy.X) = 0 and corr(Y. the correlation betw een X and Y is the covari ance be tween X and Y..X) = 0 into th e de(ini tion of correlation in E quation (2.27) follows by substituting cov(Y. the units of X times the uni ts of Y. X and Y. Y) :0. E(YX) = E{£(Y [X)X] = 0 be""..20) ]. tha t if X and Y are uncorrela ted.25).\'Uy · (2..I and I: tha t is. 111e correlation is an alternative m e a~ure of dependence between X and Y that solves the "un its" problem of the covariancc.2fi) Correlation and conditional mean . then the conditional mean of Y given X does oot depend on X. so t hat cov(Y.I s curr(X. (2 .25) Because the units of the numerator in Eq uation (2. The random va ri a b l c~ X and Y arc said to be uncorrelated if corr(X. Equation (2.J. . '0 oo' (J. (2. (2. Speci fi cally. 
YJ v'var( X ) var( Y) =~ u. deviated from thei r means.X) . E(Y IX) = O.Lx)] = E(YX).. 1 (correlfHion inequality). That is.
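The point that uncorrelated variables can still have a conditional mean that depends on $X$ can be seen in a small constructed case. The sketch below uses a hypothetical example that is not taken from the text or its exercises: $X$ is $-1$, $0$, or $1$ with equal probability and $Y = X^2$. Exact fractions are used so the zero covariance is not a rounding artifact; the helper name `expectation` is an assumption of this sketch.

```python
from fractions import Fraction

# Hypothetical example (not from the text): X is -1, 0, or 1 with equal
# probability, and Y = X**2, so Y is completely determined by X.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}


def expectation(joint, g):
    """E[g(X, Y)] under a discrete joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mu_x = expectation(joint, lambda x, y: x)
mu_y = expectation(joint, lambda x, y: y)
cov_xy = expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))

# Conditional mean of Y given each value of X; here it is simply x**2,
# which clearly varies with x.
cond_mean_y = {x: Fraction(y) for (x, y) in joint}
```

Here `cov_xy` is exactly zero even though `cond_mean_y` takes the value 1 at $x = \pm 1$ and 0 at $x = 0$, so zero correlation does not imply a constant conditional mean.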
The Distribution of Earnings in the United States in 2004

Some parents tell their children that they will be able to get a better, higher-paying job if they get a college degree than if they skip higher education. Are these parents right? Does the distribution of earnings differ between workers who are college graduates and workers who have only a high school diploma, and, if so, how? Among workers with a similar education, does the distribution of earnings for men and women differ? For example, are the best-paid college-educated women paid as well as the best-paid college-educated men?

One way to answer these questions is to examine the distribution of earnings, conditional on the highest educational degree achieved (high school diploma or bachelor's degree) and on gender. These four conditional distributions are shown in Figure 2.4, and the mean, standard deviation, and some percentiles of the conditional distributions are presented in Table 2.4.(1) For example, the conditional mean of earnings for women whose highest degree is a high school diploma, that is, $E(\textit{Earnings} \mid \textit{Highest degree} = \text{high school diploma}, \textit{Gender} = \text{female})$, is \$13.25 per hour.

The distribution of average hourly earnings for female college graduates (Figure 2.4b) is shifted to the right of the distribution for women with only a high school degree (Figure 2.4a); the same shift can be seen for the two groups of men (Figure 2.4d and Figure 2.4c). For both men and women, mean earnings are higher for those with a college degree (Table 2.4, first numeric column). Interestingly, the spread of the distribution of earnings, as measured by the standard deviation, is greater for those with a college degree than for those with a high school diploma. Another feature of these distributions is that the ...

(1) The distributions were estimated using data from the March 2005 Current Population Survey, which is discussed in an appendix.

TABLE 2.4 Summaries of the Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004 Given Education Level and Gender
Columns: Mean; Standard Deviation; Percentile (25%, 50% (median), 75%, 90%). Rows: (a) Women with high school diploma; (b) Women with four-year college degree; (c) Men with high school diploma; (d) Men with four-year college degree.
Average hourly earnings are the sum of annual pretax wages, salaries, and bonuses divided by the number of hours worked annually.
The variance of the sum of $X$ and $Y$ is the sum of their variances, plus two times their covariance:

$$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}. \tag{2.36}$$

If $X$ and $Y$ are independent, then the covariance is zero and the variance of their sum is the sum of their variances:

$$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) = \sigma_X^2 + \sigma_Y^2 \quad \text{(if $X$ and $Y$ are independent)}. \tag{2.37}$$

Useful expressions for means, variances, and covariances involving weighted sums of random variables are collected in Key Concept 2.3. The results in Key Concept 2.3 are derived in Appendix 2.1.

KEY CONCEPT 2.3
MEANS, VARIANCES, AND COVARIANCES OF SUMS OF RANDOM VARIABLES

Let $X$, $Y$, and $V$ be random variables, let $\mu_X$ and $\sigma_X^2$ be the mean and variance of $X$, let $\sigma_{XY}$ be the covariance between $X$ and $Y$ (and so forth for the other variables), and let $a$, $b$, and $c$ be constants. The following facts follow from the definitions of the mean, variance, and covariance:

$$E(a + bX + cY) = a + b\mu_X + c\mu_Y, \tag{2.29}$$

$$\operatorname{var}(a + bY) = b^2\sigma_Y^2, \tag{2.30}$$

$$\operatorname{var}(aX + bY) = a^2\sigma_X^2 + 2ab\sigma_{XY} + b^2\sigma_Y^2, \tag{2.31}$$

$$E(Y^2) = \sigma_Y^2 + \mu_Y^2, \tag{2.32}$$

$$\operatorname{cov}(a + bX + cV, Y) = b\sigma_{XY} + c\sigma_{VY}, \tag{2.33}$$

$$E(XY) = \sigma_{XY} + \mu_X\mu_Y, \tag{2.34}$$

$$|\operatorname{corr}(X, Y)| \le 1 \text{ and } |\sigma_{XY}| \le \sqrt{\sigma_X^2\sigma_Y^2} \quad \text{(correlation inequality)}. \tag{2.35}$$
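The variance formula for a weighted sum, Equation (2.31), can be checked numerically by computing the variance of $aX + bY$ directly and comparing it with the right-hand side. The sketch below is one way to do this under stated assumptions: the joint distribution and the constants $a = 2$, $b = -3$ are hypothetical, and the helper names `expect` and `variance` are illustrative. Exact fractions make the equality exact rather than approximate.

```python
from fractions import Fraction as F

# Hypothetical joint distribution Pr(X = x, Y = y), used only for illustration.
joint = {(0, 0): F(3, 10), (0, 1): F(2, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}


def expect(g):
    """E[g(X, Y)] under the joint distribution above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def variance(g):
    """var[g(X, Y)] computed directly from the definition of the variance."""
    mean = expect(g)
    return expect(lambda x, y: (g(x, y) - mean) ** 2)


var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
# Covariance from Equation (2.34) rearranged: sigma_XY = E(XY) - mu_X * mu_Y.
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

a, b = 2, -3  # arbitrary constants chosen for the check

direct = variance(lambda x, y: a * x + b * y)                  # left side of (2.31)
formula = a**2 * var_x + 2 * a * b * cov_xy + b**2 * var_y     # right side of (2.31)
```

For these hypothetical inputs both `direct` and `formula` equal 49/25. Setting `a = b = 1` gives the special case in Equation (2.36).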
2.4 The Normal, Chi-Squared, Student t, and F Distributions

The probability distributions most often encountered in econometrics are the normal, chi-squared, Student $t$, and $F$ distributions.

The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped probability density shown in Figure 2.5. The specific function defining the normal probability density is given in Appendix 17.1. As Figure 2.5 shows, the normal density with mean $\mu$ and variance $\sigma^2$ is symmetric around its mean and has 95% of its probability between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$.

FIGURE 2.5 The Normal Probability Density
The normal probability density function with mean $\mu$ and variance $\sigma^2$ is a bell-shaped curve, centered at $\mu$. The area under the normal p.d.f. between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$ is 0.95. The normal distribution is denoted $N(\mu, \sigma^2)$.

Some special notation and terminology have been developed for the normal distribution. The normal distribution with mean $\mu$ and variance $\sigma^2$ is expressed concisely as "$N(\mu, \sigma^2)$." The standard normal distribution is the normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$ and is denoted $N(0, 1)$. Random variables that have a $N(0, 1)$ distribution are often denoted by $Z$, and the standard normal cumulative distribution function is denoted by the Greek letter $\Phi$; accordingly, $\Pr(Z \le c) = \Phi(c)$, where $c$ is a constant. Values of the standard normal cumulative distribution function are tabulated in Appendix Table 1.

To compute probabilities for a normal variable with a general mean and variance, it must be standardized by first subtracting the mean, then dividing the result
by the standard deviation. For example, suppose $Y$ is distributed $N(1, 4)$, that is, $Y$ is normally distributed with a mean of 1 and a variance of 4. What is the probability that $Y \le 2$, that is, what is the shaded area in Figure 2.6a? The standardized version of $Y$ is $Y$ minus its mean, divided by its standard deviation, that is, $(Y - 1)/\sqrt{4} = \tfrac{1}{2}(Y - 1)$. Accordingly, the random variable $\tfrac{1}{2}(Y - 1)$ is normally distributed with mean zero and variance one (see Exercise 2.8); it has the standard normal distribution shown in Figure 2.6b. Now $Y \le 2$ is equivalent to $\tfrac{1}{2}(Y - 1) \le \tfrac{1}{2}(2 - 1)$, that is, $\tfrac{1}{2}(Y - 1) \le \tfrac{1}{2}$. Thus,

$$\Pr(Y \le 2) = \Pr[\tfrac{1}{2}(Y - 1) \le \tfrac{1}{2}] = \Pr(Z \le \tfrac{1}{2}) = \Phi(0.5) = 0.691, \tag{2.41}$$

where the value 0.691 is taken from Appendix Table 1.

The same approach can be applied to compute the probability that a normally distributed random variable exceeds some value or that it falls in a certain range. These steps are summarized in Key Concept 2.4. The box, "A Bad Day on Wall Street," presents an unusual application of the cumulative normal distribution.

The normal distribution is symmetric, so its skewness is zero. The kurtosis of the normal distribution is 3.

KEY CONCEPT 2.4
COMPUTING PROBABILITIES INVOLVING NORMAL RANDOM VARIABLES

Suppose $Y$ is normally distributed with mean $\mu$ and variance $\sigma^2$; in other words, $Y$ is distributed $N(\mu, \sigma^2)$. Then $Y$ is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing $Z = (Y - \mu)/\sigma$.

Let $c_1$ and $c_2$ denote two numbers with $c_1 < c_2$ and let $d_1 = (c_1 - \mu)/\sigma$ and $d_2 = (c_2 - \mu)/\sigma$. Then,

$$\Pr(Y \le c_2) = \Pr(Z \le d_2) = \Phi(d_2), \tag{2.38}$$

$$\Pr(Y \ge c_1) = \Pr(Z \ge d_1) = 1 - \Phi(d_1), \tag{2.39}$$

$$\Pr(c_1 \le Y \le c_2) = \Pr(d_1 \le Z \le d_2) = \Phi(d_2) - \Phi(d_1). \tag{2.40}$$

The normal cumulative distribution function $\Phi$ is tabulated in Appendix Table 1.

FIGURE 2.6 Calculating the Probability that $Y \le 2$ When $Y$ Is Distributed $N(1, 4)$
Panels: (a) $N(1, 4)$ distribution; (b) $N(0, 1)$ distribution.
To calculate $\Pr(Y \le 2)$, standardize $Y$, then use the standard normal distribution table. $Y$ is standardized by subtracting its mean ($\mu = 1$) and dividing by its standard deviation ($\sigma = 2$). The probability that $Y \le 2$ is shown in Figure 2.6a, and the corresponding probability after standardizing $Y$ is shown in Figure 2.6b. Because the standardized random variable, $(Y - 1)/2$, is a standard normal ($Z$) random variable, $\Pr(Y \le 2) = \Pr\left(\frac{Y - 1}{2} \le \frac{2 - 1}{2}\right) = \Pr(Z \le 0.5)$. From Appendix Table 1, $\Pr(Z \le 0.5) = 0.691$.

The multivariate normal distribution. The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case, the distribution is called the multivariate normal distribution, or, if only two variables are being considered, the bivariate normal distribution. The formula for the bivariate normal p.d.f. is given in Appendix 17.1, and the formula for the general multivariate normal p.d.f. is given in Appendix 18.1.

The multivariate normal distribution has three important properties. If $X$ and $Y$ have a bivariate normal distribution with covariance $\sigma_{XY}$, and if $a$ and $b$ are two constants, then $aX + bY$ has the normal distribution,

$$aX + bY \text{ is distributed } N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY}) \quad \text{($X$, $Y$ bivariate normal)}. \tag{2.42}$$

More generally, if $n$ random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.
FIGURE 2.7 Daily Percentage Changes in the Dow Jones Industrial Average in the 1980s
Axes: percent change (vertical) against year (horizontal); the October 19, 1987, observation is marked.
During the 1980s, the average percentage daily change of "the Dow" index was 0.05% and its standard deviation was 1.16%. On October 19, 1987, "Black Monday," the index fell 25.6%, or more than 22 standard deviations.

A Bad Day on Wall Street

On a typical day the overall value of stocks traded on the U.S. stock market can rise or fall by 1% or even more. This is a lot, but nothing compared to what happened on Monday, October 19, 1987. On "Black Monday," the Dow Jones Industrial Average (an average of 30 large industrial stocks) fell by 25.6%! From January 1, 1980, to October 16, 1987, the standard deviation of daily percentage price changes on the Dow was 1.16%, so the drop of 25.6% was a negative return of 22 (= 25.6/1.16) standard deviations. The enormity of this drop can be seen in Figure 2.7, a plot of the daily returns on the Dow during the 1980s.

If daily percentage price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is $\Pr(Z \le -22) = \Phi(-22)$. You will not find this value in Appendix Table 1, but you can calculate it using a computer (try it!). This probability is $1.4 \times 10^{-107}$, that is, 0.000...00014, where there are a total of 106 zeros!
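For readers who want to take up the invitation to compute $\Phi(-22)$, here is a minimal Python sketch using only the standard library. It is one possible approach, not the book's: the function names are illustrative, and the identity $\Phi(z) = \tfrac{1}{2}\operatorname{erfc}(-z/\sqrt{2})$ is used because the more familiar form $\tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$ rounds to exactly zero this far into the tail in double precision. As a sanity check, the same function is applied to the $N(1, 4)$ example from Equation (2.41).

```python
import math


def standard_normal_cdf(z):
    """Phi(z), written with erfc so that far-left tail values do not round to zero."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_cdf(c, mean, variance):
    """Pr(Y <= c) for Y ~ N(mean, variance), computed by standardizing first."""
    return standard_normal_cdf((c - mean) / math.sqrt(variance))


p_check = normal_cdf(2.0, mean=1.0, variance=4.0)   # Pr(Y <= 2) for Y ~ N(1, 4)
p_black_monday = standard_normal_cdf(-22.0)          # Pr(Z <= -22)

print(f"{p_check:.3f}")
print(f"{p_black_monday:.1e}")
```

The first value should agree with the tabulated 0.691, and the second is on the order of $10^{-107}$, consistent with the figure quoted in the box.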
How small is $1.4 \times 10^{-107}$? Consider the following:

• The world population is about 6 billion, so the probability of winning a random lottery among all living people is about one in 6 billion, or $2 \times 10^{-10}$.
• The universe is believed to have existed for 15 billion years, or about $5 \times 10^{17}$ seconds, so the probability of choosing a particular second at random from all the seconds since the beginning of time is $2 \times 10^{-18}$.
• There are approximately $10^{43}$ molecules of gas in the first kilometer above the earth's surface. The probability of choosing one at random is $10^{-43}$.

Although Wall Street did have a bad day, the fact that it happened at all suggests that its probability was more than $1.4 \times 10^{-107}$. In fact, stock price percentage changes have a distribution with heavier tails than the normal distribution; in other words, there are more days with large positive or large negative changes than the normal distribution would suggest. For this reason, finance professionals use econometric models in which the variance of the percentage change in stock prices can evolve over time, so some periods have higher volatility than others. These models with changing variances are more consistent with the very bad and very good days we actually see on Wall Street.

Second, if a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal [this follows from Equation (2.42) by setting $a = 1$ and $b = 0$].

Third, if variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Thus, if $X$ and $Y$ have a bivariate normal distribution and $\sigma_{XY} = 0$, then $X$ and $Y$ are independent. In Section 2.3 it was stated that if $X$ and $Y$ are independent, then, regardless of their joint distribution, $\sigma_{XY} = 0$. If $X$ and $Y$ are jointly normally distributed, then the converse is also true. This result, that zero covariance implies independence, is a special property of the multivariate normal distribution that is not true in general.

The Chi-Squared Distribution

The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics.

The chi-squared distribution is the distribution of the sum of $m$ squared independent standard normal random variables. This distribution depends on $m$, which is called the degrees of freedom of the chi-squared distribution. For example, let $Z_1$, $Z_2$, and $Z_3$ be independent standard normal random variables. Then $Z_1^2 + Z_2^2 + Z_3^2$ has a chi-squared distribution with 3 degrees of freedom.
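The definition of the chi-squared distribution as a sum of squared standard normals lends itself to a quick simulation. The following sketch is illustrative only: the seed, the number of draws, and the helper name `chi_squared_draw` are arbitrary choices, and the cutoff 7.81 is the tabulated 95th percentile of the chi-squared distribution with 3 degrees of freedom.

```python
import random

rng = random.Random(12345)  # fixed seed so the run is repeatable


def chi_squared_draw(m):
    """One draw of the sum of m squared independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


n_draws = 200_000
draws = [chi_squared_draw(3) for _ in range(n_draws)]

# Share of simulated values at or below the tabulated 95th percentile.
share_below = sum(d <= 7.81 for d in draws) / n_draws
# The mean of a chi-squared variable equals its degrees of freedom (here 3).
sample_mean = sum(draws) / n_draws
```

With this many draws, `share_below` should land close to 0.95 and `sample_mean` close to 3, up to simulation error.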
The name for this distribution derives from the Greek letter used to denote it: a chi-squared distribution with $m$ degrees of freedom is denoted $\chi_m^2$.

Selected percentiles of the $\chi_m^2$ distribution are given in Appendix Table 3. For example, Appendix Table 3 shows that the 95th percentile of the $\chi_3^2$ distribution is 7.81, so $\Pr(Z_1^2 + Z_2^2 + Z_3^2 \le 7.81) = 0.95$.

The Student t Distribution

The Student $t$ distribution with $m$ degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with $m$ degrees of freedom divided by $m$. That is, let $Z$ be a standard normal random variable, let $W$ be a random variable with a chi-squared distribution with $m$ degrees of freedom, and let $Z$ and $W$ be independently distributed. Then the random variable $Z/\sqrt{W/m}$ has a Student $t$ distribution (also called the $t$ distribution) with $m$ degrees of freedom. This distribution is denoted $t_m$. Selected percentiles of the Student $t$ distribution are given in Appendix Table 2.

The Student $t$ distribution depends on the degrees of freedom $m$. Thus the 95th percentile of the $t_m$ distribution depends on the degrees of freedom $m$. The Student $t$ distribution has a bell shape similar to that of the normal distribution, but when $m$ is small (20 or less), it has more mass in the tails, that is, it is a "fatter" bell shape than the normal. When $m$ is 30 or more, the Student $t$ distribution is well approximated by the standard normal distribution, and the $t_\infty$ distribution equals the standard normal distribution.

The F Distribution

The $F$ distribution with $m$ and $n$ degrees of freedom, denoted $F_{m,n}$, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom $m$, divided by $m$, to an independently distributed chi-squared random variable with degrees of freedom $n$, divided by $n$. To state this mathematically, let $W$ be a chi-squared random variable with $m$ degrees of freedom and let $V$ be a chi-squared random variable with $n$ degrees of freedom, where $W$ and $V$ are independently distributed. Then $\frac{W/m}{V/n}$ has an $F_{m,n}$ distribution, that is, an $F$ distribution with numerator degrees of freedom $m$ and denominator degrees of freedom $n$.

In statistics and econometrics, an important special case of the $F$ distribution arises when the denominator degrees of freedom is large enough that the $F_{m,n}$ distribution can be approximated by the $F_{m,\infty}$ distribution. In this limiting case, the denominator $V/n$ is the mean of infinitely many squared standard normal random variables, and that mean is 1 because the mean of a squared standard normal random variable is 1 (see Exercise 2.24). Thus the $F_{m,\infty}$ distribution is the distribution of a chi-squared random variable with $m$ degrees of freedom, divided by $m$: $W/m$ is distributed $F_{m,\infty}$. For example, from Appendix Table 4, the 95th percentile of the $F_{3,\infty}$ distribution is 2.60, which is the same as the 95th percentile of the $\chi_3^2$ distribution, 7.81 (from Appendix Table 3), divided by the degrees of freedom, which is 3 ($7.81/3 = 2.60$).

The 90th, 95th, and 99th percentiles of the $F_{m,n}$ distribution are given in Appendix Table 5 for selected values of $m$ and $n$. For example, the 95th percentile of the $F_{3,30}$ distribution is 2.92, and the 95th percentile of the $F_{3,90}$ distribution is 2.71. As the denominator degrees of freedom $n$ increases, the 95th percentile of the $F_{3,n}$ distribution tends to the $F_{3,\infty}$ limit of 2.60.

2.5 Random Sampling and the Distribution of the Sample Average

Almost all the statistical and econometric procedures used in this book involve averages or weighted averages of a sample of data. Characterizing the distributions of sample averages therefore is an essential step toward understanding the performance of econometric procedures.

This section introduces some basic concepts about random sampling and the distributions of averages that are used throughout the book. We begin by discussing random sampling. The act of random sampling, that is, randomly drawing a sample from a larger population, has the effect of making the sample average itself a random variable. Because the sample average is a random variable, it has a probability distribution, which is called its sampling distribution. This section concludes with some properties of the sampling distribution of the sample average.

Random Sampling

Simple random sampling. Suppose our commuting student from Section 2.1 aspires to be a statistician and decides to record her commuting times on various days. She selects these days at random from the school year, and her daily commuting time has the cumulative distribution function in Figure 2.2a. Because these days were selected at random, knowing the value of the commuting time on one of these randomly selected days provides no information about the commuting time on another of the days; that is, because the days were selected at random, the values of the commuting time on each of the different days are independently distributed random variables.
The situation described in the previous paragraph is an example of the simplest sampling scheme used in statistics, called simple random sampling, in which $n$ objects are selected at random from a population (the population of commuting days) and each member of the population (each day) is equally likely to be included in the sample.

The $n$ observations in the sample are denoted $Y_1, \ldots, Y_n$, where $Y_1$ is the first observation, $Y_2$ is the second observation, and so forth. In the commuting example, $Y_1$ is the commuting time on the first of her $n$ randomly selected days and $Y_i$ is the commuting time on the $i$th of her randomly selected days.

Because the members of the population included in the sample are selected at random, the values of the observations $Y_1, \ldots, Y_n$ are themselves random. If different members of the population are chosen, their values of $Y$ will differ. Thus the act of random sampling means that $Y_1, \ldots, Y_n$ can be treated as random variables. Before they are sampled, $Y_1, \ldots, Y_n$ can take on many possible values; after they are sampled, a specific value is recorded for each observation.

i.i.d. draws. Because $Y_1, \ldots, Y_n$ are randomly drawn from the same population, the marginal distribution of $Y_i$ is the same for each $i = 1, \ldots, n$; this marginal distribution is the distribution of $Y$ in the population being sampled. When $Y_i$ has the same marginal distribution for $i = 1, \ldots, n$, then $Y_1, \ldots, Y_n$ are said to be identically distributed.

Under simple random sampling, knowing the value of $Y_1$ provides no information about $Y_2$, so the conditional distribution of $Y_2$ given $Y_1$ is the same as the marginal distribution of $Y_2$. In other words, under simple random sampling, $Y_1$ is distributed independently of $Y_2, \ldots, Y_n$.

When $Y_1, \ldots, Y_n$ are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d.

Simple random sampling and i.i.d. draws are summarized in Key Concept 2.5.

The Sampling Distribution of the Sample Average

The sample average, $\bar{Y}$, of the $n$ observations $Y_1, \ldots, Y_n$ is

$$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i. \tag{2.43}$$

An essential concept is that the act of drawing a random sample has the effect of making the sample average $\bar{Y}$ a random variable. Because the sample was drawn at random, the value of each $Y_i$ is random. Because $Y_1, \ldots, Y_n$ are random, their average is random. Had a different sample been drawn, then the observations and
their sample average would have been different: the value of $\bar{Y}$ differs from one randomly drawn sample to the next.

For example, suppose our student commuter selected five days at random to record her commute times, then computed the average of those five times. Had she chosen five different days, she would have recorded five different times and thus would have computed a different value of the sample average.

Because $\bar{Y}$ is random, it has a probability distribution. The distribution of $\bar{Y}$ is called the sampling distribution of $\bar{Y}$, because it is the probability distribution associated with possible values of $\bar{Y}$ that could be computed for different possible samples $Y_1, \ldots, Y_n$.

The sampling distribution of averages and weighted averages plays a central role in statistics and econometrics. We start our discussion of the sampling distribution of $\bar{Y}$ by computing its mean and variance under general conditions on the population distribution of $Y$.

KEY CONCEPT 2.5
SIMPLE RANDOM SAMPLING AND I.I.D. RANDOM VARIABLES

In a simple random sample, $n$ objects are drawn at random from a population and each object is equally likely to be drawn. The value of the random variable $Y$ for the $i$th randomly drawn object is denoted $Y_i$. Because each object is equally likely to be drawn and the distribution of $Y_i$ is the same for all $i$, the random variables $Y_1, \ldots, Y_n$ are independently and identically distributed (i.i.d.); that is, the distribution of $Y_i$ is the same for all $i = 1, \ldots, n$ and $Y_1$ is distributed independently of $Y_2, \ldots, Y_n$ and so forth.

Mean and variance of $\bar{Y}$. Suppose that the observations $Y_1, \ldots, Y_n$ are i.i.d., and let $\mu_Y$ and $\sigma_Y^2$ denote the mean and variance of $Y_i$ (because the observations are i.i.d., the mean and variance are the same for all $i = 1, \ldots, n$). When $n = 2$, the mean of the sum $Y_1 + Y_2$ is given by applying Equation (2.28): $E(Y_1 + Y_2) = \mu_Y + \mu_Y = 2\mu_Y$. Thus the mean of the sample average is $E[\tfrac{1}{2}(Y_1 + Y_2)] = \tfrac{1}{2} \times 2\mu_Y = \mu_Y$. In general,

$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y. \tag{2.44}$$

The variance of $\bar{Y}$ is found by applying Equation (2.37). For example, for $n = 2$, $\operatorname{var}(Y_1 + Y_2) = 2\sigma_Y^2$, so [by applying Equation (2.31) with $a = b = \tfrac{1}{2}$ and $\operatorname{cov}(Y_1, Y_2) = 0$], $\operatorname{var}(\bar{Y}) = \tfrac{1}{2}\sigma_Y^2$. For general $n$, because $Y_1, \ldots, Y_n$ are i.i.d., $Y_i$ and $Y_j$ are independently distributed for $i \ne j$, so $\operatorname{cov}(Y_i, Y_j) = 0$. Thus,
$$\operatorname{var}(\bar{Y}) = \operatorname{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{var}(Y_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1, j \ne i}^{n} \operatorname{cov}(Y_i, Y_j) = \frac{\sigma_Y^2}{n}. \tag{2.45}$$

The standard deviation of $\bar{Y}$ is the square root of the variance, $\sigma_Y/\sqrt{n}$.

In summary, the mean, the variance, and the standard deviation of $\bar{Y}$ are

$$E(\bar{Y}) = \mu_Y, \tag{2.46}$$

$$\operatorname{var}(\bar{Y}) = \sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n}, \text{ and} \tag{2.47}$$

$$\operatorname{std.dev}(\bar{Y}) = \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}. \tag{2.48}$$

These results hold whatever the distribution of $Y_i$ is; that is, the distribution of $Y_i$ does not need to take on a specific form, such as the normal distribution, for Equations (2.46), (2.47), and (2.48) to hold.

The notation $\sigma_{\bar{Y}}^2$ denotes the variance of the sampling distribution of the sample average $\bar{Y}$. In contrast, $\sigma_Y^2$ is the variance of each individual $Y_i$, that is, the variance of the population distribution from which the observation is drawn. Similarly, $\sigma_{\bar{Y}}$ denotes the standard deviation of the sampling distribution of $\bar{Y}$.

Sampling distribution of $\bar{Y}$ when $Y$ is normally distributed. Suppose that $Y_1, \ldots, Y_n$ are i.i.d. draws from the $N(\mu_Y, \sigma_Y^2)$ distribution. As stated following Equation (2.42), the sum of $n$ normally distributed random variables is itself normally distributed. Because the mean of $\bar{Y}$ is $\mu_Y$ and the variance of $\bar{Y}$ is $\sigma_Y^2/n$, this means that, if $Y_1, \ldots, Y_n$ are i.i.d. draws from the $N(\mu_Y, \sigma_Y^2)$ distribution, then $\bar{Y}$ is distributed $N(\mu_Y, \sigma_Y^2/n)$.

2.6 Large-Sample Approximations to Sampling Distributions

Sampling distributions play a central role in the development of statistical and econometric procedures, so it is important to know, in a mathematical sense, what the sampling distribution of $\bar{Y}$ is. There are two approaches to characterizing sampling distributions: an "exact" approach and an "approximate" approach.

The "exact" approach entails deriving a formula for the sampling distribution that holds exactly for any value of $n$.
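For a small discrete population, the exact sampling distribution of the sample average can be obtained by brute-force enumeration, which also gives a direct check of Equations (2.46) and (2.47). The sketch below is a hedged illustration rather than a general method: it reuses the old-computer crash probabilities from the earlier example as a convenient population distribution, picks $n = 3$ arbitrarily, and uses exact fractions; the function name `sampling_distribution` is an assumption of this sketch.

```python
from fractions import Fraction as F
from itertools import product

# Population distribution of Y. The numbers reuse the old-computer crash
# probabilities from the earlier example purely as a convenient discrete case.
population = {0: F(70, 100), 1: F(13, 100), 2: F(10, 100), 3: F(5, 100), 4: F(2, 100)}

mu_y = sum(y * p for y, p in population.items())
var_y = sum((y - mu_y) ** 2 * p for y, p in population.items())


def sampling_distribution(n):
    """Exact distribution of the sample average of n i.i.d. draws, by enumeration."""
    dist = {}
    for sample in product(population, repeat=n):
        prob = F(1)
        for y in sample:
            prob *= population[y]      # independence: multiply the marginals
        ybar = F(sum(sample), n)
        dist[ybar] = dist.get(ybar, F(0)) + prob
    return dist


n = 3
dist = sampling_distribution(n)
mean_ybar = sum(v * p for v, p in dist.items())
var_ybar = sum((v - mean_ybar) ** 2 * p for v, p in dist.items())
```

In this setup `mean_ybar` equals `mu_y` and `var_ybar` equals `var_y / n` exactly. Enumeration grows as $5^n$ here, which is one practical reason the exact approach becomes unwieldy and large-sample approximations are attractive.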
y)/Uy.d .·. i = I. . then (as d iscussed in Section 2.•• • Y" are i. II are i. This section presents the two key tools used to approximate sampling distri· butio ns whe n the sa mple size is large.2. . A lthough exacL sa mpling distributions are complicll ted and depend on the dis· tribuLion of Y." When a large number of random variables with the same mean are averaged toge ther.jJ. Because sa mple sizes used in practice in economet· rics typically number lDthe hundreds or thousands. The law of large numbers says that. if Y is normally distributed.d.y) / a r does 1101 de pend on the dist ribution of Y. Unfortunately.· .5) the eXllct distribution of Y is normal wi th mean p. the sampling distribution of the standardi zed sa m· pi e average. This normal approximate d istribution provides enormo us sim plifications and unde rlies the theory of regression used throughout this book.. Y will be ncar /. and Y I • . Y. these approximations ca n be 'cry accurate even if thc sampk.. Because she used simple random sampling. druws of a Be rnoulli random . in which she simply records whe1he r her commute was sho rt (less than 20 minut es) or long. Thus.y and variance u~ 1 n. Y will b~ close to p..0 lorgeSomple Approximations !o Sompling Distributions 49 descri bes the distribution of Y for any" is c'llIed the exad distribution or finite sample distribution of Y.i ..e is o nly " . Y" arc i.d. Le t Yi equ al 1 if her commute was ~ h o r( on the j Ib ra ndomly selected day and equal 0 if it was long. these asym pto tic distributions ca n be counted o n to provide very good approximations to the exact sampling distribution. under general conditions. Y I . This is someti mes called the '"law of averages.si. (¥ ."asym ptotjc" because the approximations become exact in the limi t that /I ~ ce. The large !<ample approximaiion to the sampling distribution is oflen called the asymptotic distributioo. 
the large values balance the small values and their sample aver age is close to their common mean. The "app roxim ate " approach uses approximat ions to the sampling distribu tion that rely on the sa mple size being large.remarkably the asymptotic normal distribution of (¥ .y with very high probability. As we see in this section. if the distribution of Y is not nor mal .30 observations. . The Law of Large Numbers and Consistency The law of large numbers states (hat. For example.i. is approximately normal.iJ. consider a simplified ve rsion of o ur student com muter's expe r iment. when the sample size is large. For example. Moreover. (he law of large numbers and the central lim it theore m. the asympto tic disLri bu tions are si mple.l )' with \cry high probability when II is large. when the sample size is large. The central limit theorem says that.z. then in general the exaet sampling distri bution of Y is very complic<Hed and depe nds on the di:mibution of Y..i .
r. . is rinite. Y can take on on ly threc val ues: O.. d. (r~.) or. assumption holds.~. When II = 2 (figure 2..l't II increases is called cOII\'c rgc nce in pro bubi lit) or.: ~ampk a\oerage \.8 sho\\·s the sam pling distribution oCY for various sample sizes n . un more values and the sampling distrihution bl!comes tightly centered o n }Ly.c 10 the Irue proportion in Ihe popuJalion. '!lIe sampk ave rage Y con\'crgcs io probability to jJ. then the Ll. equivalently. where (from Table 2. .y .C 10 Pol + c bccQlm:'s ~ ~ 2. Y takc::.1f the data arc collected by simple random sampli ng. \ariable. The a~sumption that t he varianc. <lnd I (neit he r co m mute . CONCEPT CONVERGENCE IN PROBABILITY.y wilh increasing prohnhilit~ .. as short..that is. ere "hurt ). out lieni.78. = 1 is O..2) the proba bility Ihal Y.) "" /L y ::: 0. consiste nc) (sec K c~ Con cept 2. i = 1. cally if \'ar( V. The C"Onditions for the law of large numbers that we will USe in this book are that Y. The law of large numbers Slall!s Ihal.. . 'Ille propen)' tbal Y is near J1. <lnd that the variance of Y.6). Fo r example.afe unlikely and observed infre quen ll~ : otherwise... The sample avcnlge Y is the fract ion of d A in her sample in which her YS commu te was shurt.l.. . Thb l~ wri llc n as y 4 Ilr. none of which b parlicular!) c10. the variance or the dl.I.)" . bccauM! the re is an upper limit \0 our sludent's commuting time (~he could park and walk if the trumc is dreadful).::pt:nden tly and idc niitillly dist ri huted with £ ( Vi) = jJ.d. equi\alcnlly. the n Y ~ J. 11 a rc ind.) = if y < '". nlC math(' rnatieal role uf these condi tions is made dear in Section 17. Beca use thr: expectat ion of a Bernoulli ra nd om variab l~ is its success probability.. more concl!lch.2. . y (or. CONSISTENCY. This assumption is plausible for the applications in this book .6 ! a rbitrarily close to one as " increases for any constant c > O. 111C law of large n umbers says t hat if Y I' i .7K As II increases. 
where the law of IUfi!e numbers is pruve n.8a).d).I).. under cerlain condilions.) if the p robability thai Y is in the fange lJ.trihution of commuting times is finile. . Y con veri!l!s in probability to J1.' ).. th:'II Y is consistent for J1."ould be unreliable. Figure 2. 11 arc Li. one was short. <lnd hoth ..: is finit!! sa ys that e"tremc1y large \·alues of Y.. . y and if large o utlie rs Il T unli kely (ICChlli e .50 CHAPTER 2 Review of Probobility L ". AND THE LAw Of LARGE NlIMRFR<... Y is con sish:nl (o r J.h . E( Y. these large value~ cuuld dominatc Y and th. howc ver (fig ures V'. O.7R.
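The behavior described by the law of large numbers, together with the variance formula $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$, can be explored by simulation. The following is only a minimal sketch under stated assumptions: the tolerance `c = 0.10`, the number of replications, the random seed, and all function names are illustrative choices that do not come from the text; only the success probability 0.78 and the sample sizes are taken from the commuting example.

```python
import random
import statistics

P_SHORT = 0.78  # Pr(Y_i = 1), the short-commute probability used in the text

def sample_average(n, rng):
    """Average of n i.i.d. Bernoulli(P_SHORT) draws (1 = short, 0 = long)."""
    return sum(rng.random() < P_SHORT for _ in range(n)) / n

def simulate(n, reps, rng):
    """Return reps independent realizations of the sample average."""
    return [sample_average(n, rng) for _ in range(reps)]

rng = random.Random(0)   # fixed seed (an arbitrary choice) for reproducibility
c, reps = 0.10, 2000     # tolerance and replication count are assumptions
shares, variances = {}, {}
for n in (2, 5, 25, 100):
    averages = simulate(n, reps, rng)
    # Fraction of simulated samples with the average within c of mu_Y
    shares[n] = sum(abs(a - P_SHORT) <= c for a in averages) / reps
    # Simulated variance of the sample average across replications
    variances[n] = statistics.pvariance(averages)
    theory = P_SHORT * (1 - P_SHORT) / n   # sigma_Y^2 / n for a Bernoulli
    print(f"n={n:3d}  share within {c}: {shares[n]:.3f}  "
          f"var(Ybar): {variances[n]:.5f}  sigma^2/n: {theory:.5f}")
```

With $n = 2$ the sample average can only be 0, 0.5, or 1, so no simulated sample falls within 0.10 of 0.78; as $n$ grows, the share of samples inside the band should rise toward one, which is the sense in which $\bar{Y}$ is consistent for $\mu_Y$.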
FIGURE 2.8 Sampling Distribution of the Sample Average of n Bernoulli Random Variables
[Figure: four panels, (a) n = 2, (b) n = 5, (c) n = 25, (d) n = 100; horizontal axis: value of sample average; vertical axis: probability.]
The distributions are the sampling distributions of $\bar{Y}$, the sample average of $n$ independent Bernoulli random variables with $p = \Pr(Y_i = 1) = 0.78$ (the probability of a short commute is 78%). The variance of the sampling distribution of $\bar{Y}$ decreases as $n$ gets larger, so the sampling distribution becomes more tightly concentrated around its mean $\mu = 0.78$ as the sample size $n$ increases.
The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of $\bar{Y}$ is well approximated by a normal distribution when $n$ is large. Recall that the mean of $\bar{Y}$ is $\mu_Y$ and its variance is $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$. According to the central limit theorem, when $n$ is large the distribution of $\bar{Y}$ is approximately $N(\mu_Y, \sigma_{\bar{Y}}^2)$. As discussed at the end of Section 2.5, the distribution of $\bar{Y}$ is exactly $N(\mu_Y, \sigma_{\bar{Y}}^2)$ when the sample is drawn from a population with the normal distribution $N(\mu_Y, \sigma_Y^2)$. The central limit theorem says that this same result is approximately true when $n$ is large even if $Y_1, \ldots, Y_n$ are not themselves normally distributed.

The convergence of the distribution of $\bar{Y}$ to the bell-shaped, normal approximation can be seen (a bit) in Figure 2.8. However, because the distribution gets quite tight for large $n$, this requires some squinting. It would be easier to see the shape of the distribution of $\bar{Y}$ if you used a magnifying glass or had some other way to zoom in or to expand the horizontal axis of the figure.

One way to do this is to standardize $\bar{Y}$ by subtracting its mean and dividing by its standard deviation, so that it has a mean of 0 and a variance of 1. This leads to examining the distribution of the standardized version of $\bar{Y}$, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$. According to the central limit theorem, this distribution should be well approximated by a $N(0, 1)$ distribution when $n$ is large.

The distribution of the standardized average $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$ is plotted in Figure 2.9 for the distributions in Figure 2.8; the distributions in Figure 2.9 are exactly the same as in Figure 2.8, except that the scale of the horizontal axis is changed so that the standardized variable has a mean of 0 and a variance of 1. After this change of scale, it is easy to see that, if $n$ is large enough, the distribution of $\bar{Y}$ is well approximated by a normal distribution.

One might ask, how large is "large enough"? That is, how large must $n$ be for the distribution of $\bar{Y}$ to be approximately normal? The answer is "it depends." The quality of the normal approximation depends on the distribution of the underlying $Y_i$ that make up the average. At one extreme, if the $Y_i$ are themselves normally distributed, then $\bar{Y}$ is exactly normally distributed for all $n$. In contrast, when the underlying $Y_i$ themselves have a distribution that is far from normal, then this approximation can require $n = 30$ or even more.

This point is illustrated in Figure 2.10 for a population distribution, shown in Figure 2.10a, that is quite different from the Bernoulli distribution. This distribution has a long right tail (it is "skewed" to the right). The sampling distribution of $\bar{Y}$, after centering and scaling, is shown in Figures 2.10b, c, and d for $n = 5$, 25, and 100, respectively. Although the sampling distribution is approaching the bell shape for $n = 25$, the normal approximation still has noticeable imperfections.
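The standardization step can also be examined numerically rather than by eye. The sketch below is one possible check, not a method taken from the text: for the Bernoulli case with $p = 0.78$ it computes the exact cumulative distribution of the standardized sample average from binomial probabilities and compares it with the standard normal c.d.f. on a grid. The grid, the function names, and the use of the largest gap as a summary of approximation quality are all assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

P = 0.78  # Bernoulli success probability from the commuting example

def standardized_cdf(n, z):
    """Exact Pr((Ybar - mu_Y) / sigma_Ybar <= z) for n Bernoulli(P) draws."""
    mu, sd = P, sqrt(P * (1 - P) / n)      # mean and std. dev. of Ybar
    total = 0.0
    for k in range(n + 1):                 # k = number of short commutes
        if (k / n - mu) / sd <= z:
            total += comb(n, k) * P**k * (1 - P)**(n - k)
    return total

def max_gap(n):
    """Largest gap between the exact c.d.f. and the N(0, 1) c.d.f. on a grid."""
    grid = [i / 10 for i in range(-30, 31)]   # z from -3.0 to 3.0 (assumed grid)
    phi = NormalDist().cdf
    return max(abs(standardized_cdf(n, z) - phi(z)) for z in grid)

gaps = {n: max_gap(n) for n in (5, 25, 100)}
for n, gap in gaps.items():
    print(f"n = {n:3d}: max |exact c.d.f. - normal c.d.f.| on grid = {gap:.3f}")
```

Because $\bar{Y}$ is discrete here, the gap never vanishes at a finite $n$, but it should shrink as $n$ increases, which is the pattern the standardized figures are meant to convey.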
FIGURE 2.9 Distribution of the Standardized Sample Average of n Bernoulli Random Variables with p = 0.78
[Figure: four panels, (a) n = 2, (b) n = 5, (c) n = 25, (d) n = 100; horizontal axis: standardized value of sample average; vertical axis: probability.]
The sampling distribution of $\bar{Y}$ in Figure 2.8 is plotted here after standardizing $\bar{Y}$. This centers the distributions in Figure 2.8 and magnifies the scale on the horizontal axis by a factor of $\sqrt{n}$. When the sample size is large, the sampling distributions are increasingly well approximated by the normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.
FIGURE 2.10 Distribution of the Standardized Sample Average of n Draws from a Skewed Distribution
[Figure: four panels, (a) n = 1, (b) n = 5, (c) n = 25, (d) n = 100; horizontal axis: standardized value of sample average; vertical axis: probability.]
The figures show the sampling distribution of the standardized sample average of $n$ draws from the skewed (asymmetric) population distribution shown in Figure 2.10a. When $n$ is small ($n = 5$), the sampling distribution, like the population distribution, is skewed. But when $n$ is large ($n = 100$), the sampling distribution is well approximated by a standard normal distribution (solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.
By $n = 100$, however, the normal approximation is quite good. In fact, for $n \geq 100$ the normal approximation to the distribution of $\bar{Y}$ typically is very good for a wide variety of population distributions.

The central limit theorem is a remarkable result. While the "small n" distributions of $\bar{Y}$ in parts b and c of Figures 2.9 and 2.10 are complicated and quite different from each other, the "large n" distributions in Figures 2.9d and 2.10d are simple and, amazingly, have a similar shape.

Because the distribution of $\bar{Y}$ approaches the normal as $n$ grows large, $\bar{Y}$ is said to be asymptotically normally distributed.

The convenience of the normal approximation, combined with its wide applicability because of the central limit theorem, makes it a key underpinning of modern applied econometrics. The central limit theorem is summarized in Key Concept 2.7.

KEY CONCEPT 2.7
THE CENTRAL LIMIT THEOREM

Suppose that $Y_1, \ldots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$, where $0 < \sigma_Y^2 < \infty$. As $n \to \infty$, the distribution of $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$ (where $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$) becomes arbitrarily well approximated by the standard normal distribution.

Summary

1. The probabilities with which a random variable takes on different values are summarized by the cumulative distribution function, the probability distribution function (for discrete random variables), and the probability density function (for continuous random variables).
2. The expected value of a random variable $Y$ (also called its mean, $\mu_Y$), denoted $E(Y)$, is its probability-weighted average value. The variance of $Y$ is $\sigma_Y^2 = E[(Y - \mu_Y)^2]$, and the standard deviation of $Y$ is the square root of its variance.
3. The joint probabilities for two random variables $X$ and $Y$ are summarized by their joint probability distribution. The conditional probability distribution of $Y$ given $X = x$ is the probability distribution of $Y$, conditional on $X$ taking on the value $x$.
4. A normally distributed random variable has the bell-shaped probability density in Figure 2.5. To calculate a probability associated with a normal random variable, first standardize the variable, then use the standard normal cumulative distribution tabulated in Appendix Table 1.
5. Simple random sampling produces $n$ random observations $Y_1, \ldots, Y_n$ that are independently and identically distributed (i.i.d.).
6. The sample average, $\bar{Y}$, varies from one randomly chosen sample to the next and thus is a random variable with a sampling distribution. If $Y_1, \ldots, Y_n$ are i.i.d., then:
   a. the sampling distribution of $\bar{Y}$ has mean $\mu_Y$ and variance $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$;
   b. the law of large numbers says that $\bar{Y}$ converges in probability to $\mu_Y$; and
   c. the central limit theorem says that the standardized version of $\bar{Y}$, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$, has a standard normal distribution [$N(0, 1)$ distribution] when $n$ is large.

Key Terms

outcomes
probability
sample space
event
discrete random variable
continuous random variable
probability distribution
cumulative probability distribution
cumulative distribution function (c.d.f.)
Bernoulli random variable
Bernoulli distribution
probability density function (p.d.f.)
density function
density
expected value
expectation
mean
variance
standard deviation
moments of a distribution
skewness
kurtosis
outlier
leptokurtic
joint probability distribution
marginal probability distribution
conditional distribution
conditional expectation
conditional mean
law of iterated expectations
conditional variance
independence
covariance
correlation
uncorrelated
normal distribution
standard normal distribution
standardize a variable
multivariate normal distribution
bivariate normal distribution
chi-squared distribution
Student t distribution
F distribution
simple random sampling
population
identically distributed
independently and identically distributed (i.i.d.)
sampling distribution
exact (finite-sample) distribution
asymptotic distribution
law of large numbers
convergence in probability
consistency
central limit theorem
asymptotic normal distribution

Review the Concepts

2.1 Examples of random variables used in this chapter included: (a) the gender of the next person you meet, (b) the number of times a computer crashes, (c) the time it takes to commute to school, (d) whether the computer you are assigned in the library is new or old, and (e) whether it is raining or not. Explain why each can be thought of as random.

2.2 Suppose that the random variables $X$ and $Y$ are independent and you know their distributions. Explain why knowing the value of $X$ tells you nothing about the value of $Y$.

2.3 Suppose that $X$ denotes the amount of rainfall in your hometown during a given month and $Y$ denotes the number of children born in Los Angeles during the same month. Are $X$ and $Y$ independent? Explain.

2.4 An econometrics class has 80 students, and the mean student weight is 145 lbs. A random sample of 4 students is selected from the class and their average weight is calculated. Will the average weight of the students in the sample equal 145 lbs.? Why or why not? Use this example to explain why the sample average, $\bar{Y}$, is a random variable.

2.5 Suppose that $Y_1, \ldots, Y_n$ are i.i.d. random variables with a $N(1, 4)$ distribution. Sketch the probability density of $\bar{Y}$ when $n = 2$. Repeat this for $n = 10$ and $n = 100$. In words, describe how the densities differ. What is the relationship between your answer and the law of large numbers?

2.6 Suppose that $Y_1, \ldots, Y_n$ are i.i.d. random variables with the probability distribution given in Figure 2.10a. You want to calculate $\Pr(\bar{Y} \leq 0.1)$. Would it be reasonable to use the normal approximation if $n = 5$? What about $n = 25$ or $n = 100$? Explain.

2.7 $Y$ is a random variable with $\mu_Y = 0$, $\sigma_Y = 1$, skewness = 0, and kurtosis = 100. Sketch a hypothetical probability distribution of $Y$. Explain why $n$ random variables drawn from this distribution might have some large outliers.
Exercises

2.1 Let $Y$ denote the number of "heads" that occur when two coins are tossed.
a. Derive the probability distribution of $Y$.
b. Derive the cumulative probability distribution of $Y$.
c. Derive the mean and variance of $Y$.

2.2 Use the probability distribution given in Table 2.2 to compute (a) $E(Y)$ and $E(X)$; (b) $\sigma_X^2$ and $\sigma_Y^2$; and (c) $\sigma_{XY}$ and $\mathrm{corr}(X, Y)$.

2.3 Using the random variables $X$ and $Y$ from Table 2.2, consider two new random variables $W = 3 + 6X$ and $V = 20 - 7Y$. Compute (a) $E(W)$ and $E(V)$; (b) $\sigma_W^2$ and $\sigma_V^2$; and (c) $\sigma_{WV}$ and $\mathrm{corr}(W, V)$.

2.4 Suppose $X$ is a Bernoulli random variable with $\Pr(X = 1) = p$.
a. Show $E(X^3) = p$.
b. Show $E(X^k) = p$ for $k > 0$.
c. Suppose that $p = 0.3$. Compute the mean, variance, skewness, and kurtosis of $X$. (Hint: You might find it helpful to use the formulas given in Exercise 2.21.)

2.5 In September, Seattle's daily high temperature has a mean of 70°F and a standard deviation of 7°F. What is the mean, standard deviation, and variance in °C?

2.6 The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working age U.S. population, based on the 1990 U.S. Census.

Joint Distribution of Employment Status and College Graduation in the U.S. Population Aged 25-64, 1990

                              Unemployed (Y = 0)   Employed (Y = 1)   Total
Non-college grads (X = 0)     0.045                0.709              0.754
College grads (X = 1)         0.005                0.241              0.246
Total                         0.050                0.950              1.000

a. Compute $E(Y)$.
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by $1 - E(Y)$.
c. Calculate $E(Y \mid X = 1)$ and $E(Y \mid X = 0)$.
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Explain.

2.7 In a given population of two-earner male/female couples, male earnings have a mean of $40,000 per year and a standard deviation of $12,000. Female earnings have a mean of $45,000 per year and a standard deviation of $18,000. The correlation between male and female earnings for a couple is 0.80. Let $C$ denote the combined earnings for a randomly selected couple.
a. What is the mean of $C$?
b. What is the covariance between male and female earnings?
c. What is the standard deviation of $C$?
d. Convert the answers to (a)-(c) from $ (dollars) to € (euros).

2.8 The random variable $Y$ has a mean of 1 and a variance of 4. Let $Z = \tfrac{1}{2}(Y - 1)$. Show that $\mu_Z = 0$ and $\sigma_Z^2 = 1$.

2.9 $X$ and $Y$ are discrete random variables with the following joint distribution:

                         Value of Y
                 14      22      30      40      65
Value of X   1   0.02    0.05    0.10    0.03    0.01
             5   0.17    0.15    0.05    0.02    0.01
             8   0.02    0.03    0.15    0.10    0.09

That is, $\Pr(X = 1, Y = 14) = 0.02$, and so forth.
a. Calculate the probability distribution, mean, and variance of $Y$.
b. Calculate the probability distribution, mean, and variance of $Y$ given $X = 8$.
c. Calculate the covariance and correlation between $X$ and $Y$.

2.10 Compute the following probabilities:
a. If $Y$ is distributed $N(1, 4)$, find $\Pr(Y \leq 3)$.
b. If $Y$ is distributed $N(3, 9)$, find $\Pr(Y > 0)$.
c. If $Y$ is distributed $N(50, 25)$, find $\Pr(40 \leq Y \leq 52)$.
d. If $Y$ is distributed $N(5, 2)$, find $\Pr(6 \leq Y \leq 8)$.

2.11 Compute the following probabilities:
a. If $Y$ is distributed $\chi_4^2$, find $\Pr(Y \leq 7.78)$.
b. If $Y$ is distributed $\chi_{10}^2$, find $\Pr(Y > 18.31)$.
c. If $Y$ is distributed $F_{10,\infty}$, find $\Pr(Y > 1.83)$.
d. Why are the answers to (b) and (c) the same?
e. If $Y$ is distributed $\chi_1^2$, find $\Pr(Y \leq 1.0)$. (Hint: Use the definition of the $\chi_1^2$ distribution.)

2.12 Compute the following probabilities:
a. If $Y$ is distributed $t_{15}$, find $\Pr(Y > 1.75)$.
b. If $Y$ is distributed $t_{90}$, find $\Pr(-1.99 \leq Y \leq 1.99)$.
c. If $Y$ is distributed $N(0, 1)$, find $\Pr(-1.99 \leq Y \leq 1.99)$.
d. Why are the answers to (b) and (c) approximately the same?
e. If $Y$ is distributed $F_{7,4}$, find $\Pr(Y > 4.12)$.
f. If $Y$ is distributed $F_{7,120}$, find $\Pr(Y > 2.79)$.

2.13 $X$ is a Bernoulli random variable with $\Pr(X = 1) = 0.99$, $Y$ is distributed $N(0, 1)$, $W$ is distributed $N(0, 100)$, and $X$, $Y$, and $W$ are independent. Let $S = XY + (1 - X)W$. (That is, $S = Y$ when $X = 1$, and $S = W$ when $X = 0$.)
a. Show that $E(Y^2) = 1$ and $E(W^2) = 100$.
b. Show that $E(Y^3) = 0$ and $E(W^3) = 0$. (Hint: What is the skewness for a symmetric distribution?)
c. Show that $E(Y^4) = 3$ and $E(W^4) = 3 \times 100^2$. (Hint: Use the fact that the kurtosis is 3 for a normal distribution.)
d. Derive $E(S)$, $E(S^2)$, $E(S^3)$, and $E(S^4)$. (Hint: Use the law of iterated expectations conditioning on $X = 0$ and $X = 1$.)
e. Derive the skewness and kurtosis for $S$.

2.14 In a population $\mu_Y = 100$ and $\sigma_Y^2 = 43$. Use the central limit theorem to answer the following questions:
a. In a random sample of size $n = 100$, find $\Pr(\bar{Y} \leq 101)$.
b. In a random sample of size $n = 165$, find $\Pr(\bar{Y} > 98)$.
c. In a random sample of size $n = 64$, find $\Pr(101 \leq \bar{Y} \leq 103)$.

2.15 Suppose $Y_i$, $i = 1, 2, \ldots, n$, are i.i.d. random variables, each distributed $N(10, 4)$.
a. Compute $\Pr(9.6 \leq \bar{Y} \leq 10.4)$ when (i) $n = 20$, (ii) $n = 100$, and (iii) $n = 1{,}000$.
b. Suppose $c$ is a positive number. Show that $\Pr(10 - c \leq \bar{Y} \leq 10 + c)$ becomes close to 1.0 as $n$ grows large.
c. Use your answer in (b) to argue that $\bar{Y}$ converges in probability to 10.

2.16 $Y$ is distributed $N(5, 100)$ and you want to calculate $\Pr(Y < 3.6)$. Unfortunately, you do not have your textbook and do not have access to a normal probability table like Appendix Table 1. However, you do have your computer and a computer program that can generate i.i.d. draws from the $N(5, 100)$ distribution. Explain how you can use your computer to compute an accurate approximation for $\Pr(Y < 3.6)$.

2.17 $Y_i$, $i = 1, \ldots, n$, are i.i.d. Bernoulli random variables with $p = 0.4$. Let $\bar{Y}$ denote the sample mean.
a. Use the central limit theorem to compute approximations for
   i. $\Pr(\bar{Y} \geq 0.43)$ when $n = 100$.
   ii. $\Pr(\bar{Y} \leq 0.37)$ when $n = 400$.
b. How large would $n$ need to be to ensure that $\Pr(0.39 \leq \bar{Y} \leq 0.41) \geq 0.95$? (Use the central limit theorem to compute an approximate answer.)

2.18 In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let $Y$ denote the dollar value of damage in any given year. Suppose that in 95% of the years $Y = \$0$, but in 5% of the years $Y = \$20{,}000$.
a. What is the mean and standard deviation of the damage in any year?
b. Consider an "insurance pool" of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let $\bar{Y}$ denote the average damage to these 100 homes in a year. (i) What is the expected value of the average damage $\bar{Y}$? (ii) What is the probability that $\bar{Y}$ exceeds $2000?

2.19 Consider two random variables $X$ and $Y$. Suppose that $Y$ takes on $k$ values $y_1, \ldots, y_k$, and that $X$ takes on $l$ values $x_1, \ldots, x_l$.
a. Show that $\Pr(Y = y_j) = \sum_{i=1}^{l} \Pr(Y = y_j \mid X = x_i)\Pr(X = x_i)$. [Hint: Use the definition of $\Pr(Y = y_j \mid X = x_i)$.]
b. Use your answer to (a) to verify Equation (2.19).
c. Suppose that $X$ and $Y$ are independent. Show that $\sigma_{XY} = 0$ and $\mathrm{corr}(X, Y) = 0$.

2.20 Consider three random variables $X$, $Y$, and $Z$. Suppose that $Y$ takes on $k$ values $y_1, \ldots, y_k$, that $X$ takes on $l$ values $x_1, \ldots, x_l$, and that $Z$ takes on $m$ values $z_1, \ldots, z_m$. The joint probability distribution of $X$, $Y$, $Z$ is $\Pr(X = x, Y = y, Z = z)$, and the conditional probability distribution of $Y$ given $X$ and $Z$ is
$$\Pr(Y = y \mid X = x, Z = z) = \frac{\Pr(Y = y, X = x, Z = z)}{\Pr(X = x, Z = z)}.$$
a. Explain how the marginal probability that $Y = y$ can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]
b. Show that $E(Y) = E[E(Y \mid X, Z)]$. [Hint: This is a generalization of Equations (2.19) and (2.20).]

2.21 $X$ is a random variable with moments $E(X)$, $E(X^2)$, $E(X^3)$, and so forth.
a. Show $E(X - \mu)^3 = E(X^3) - 3[E(X^2)][E(X)] + 2[E(X)]^3$.
b. Show $E(X - \mu)^4 = E(X^4) - 4[E(X)][E(X^3)] + 6[E(X)]^2[E(X^2)] - 3[E(X)]^4$.

2.22 Suppose you have some money to invest (for simplicity, $1) and you are planning to put a fraction $w$ into a stock market mutual fund and the rest, $1 - w$, into a bond mutual fund. Suppose that $1 invested in a stock fund yields $R_s$ after one year and that $1 invested in a bond fund yields $R_b$, that $R_s$ is random with mean 0.08 (8%) and standard deviation 0.07, and that $R_b$ is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between $R_s$ and $R_b$ is 0.25. If you place a fraction $w$ of your money in the stock fund and the rest, $1 - w$, in the bond fund, then the return on your investment is $R = wR_s + (1 - w)R_b$.
a. Suppose that $w = 0.5$. Compute the mean and standard deviation of $R$.
b. Suppose that $w = 0.75$. Compute the mean and standard deviation of $R$.
c. What value of $w$ makes the mean of $R$ as large as possible? What is the standard deviation of $R$ for this value of $w$?
d. (Harder) What is the value of $w$ that minimizes the standard deviation of $R$? (You can show this using a graph, algebra, or calculus.)

2.23 This exercise provides an example of a pair of random variables $X$ and $Y$ for which the conditional mean of $Y$ given $X$ depends on $X$ but $\mathrm{corr}(X, Y) = 0$. Let $X$ and $Z$ be two independently distributed standard normal random variables, and let $Y = X^2 + Z$.
a. Show that $E(Y \mid X) = X^2$.
b. Show that $\mu_Y = 1$.
c. Show that $E(XY) = 0$. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)
d. Show that $\mathrm{cov}(X, Y) = 0$ and thus $\mathrm{corr}(X, Y) = 0$.

2.24 Suppose $Y_i$ is distributed i.i.d. $N(0, \sigma^2)$ for $i = 1, 2, \ldots, n$.
a. Show that $E(Y_i^2/\sigma^2) = 1$.
b. Show that $W = (1/\sigma^2)\sum_{i=1}^{n} Y_i^2$ is distributed $\chi_n^2$.
c. Show that $E(W) = n$. [Hint: Use your answer to (a).]
d. Show that $V = Y_1 \Big/ \sqrt{\dfrac{\sum_{i=2}^{n} Y_i^2}{n - 1}}$ is distributed $t_{n-1}$.

APPENDIX 2.1 Derivation of Results in Key Concept 2.3

This appendix derives the equations in Key Concept 2.3.

Equation (2.29) follows from the definition of the expectation.

To derive Equation (2.30), use the definition of the variance to write
$$\mathrm{var}(a + bY) = E\{[a + bY - E(a + bY)]^2\} = E\{[b(Y - \mu_Y)]^2\} = b^2 E[(Y - \mu_Y)^2] = b^2\sigma_Y^2.$$

To derive Equation (2.31), use the definition of the variance to write
$$\begin{aligned}
\mathrm{var}(aX + bY) &= E\{[(aX + bY) - (a\mu_X + b\mu_Y)]^2\} \\
&= E\{[a(X - \mu_X) + b(Y - \mu_Y)]^2\} \\
&= E[a^2(X - \mu_X)^2] + 2E[ab(X - \mu_X)(Y - \mu_Y)] + E[b^2(Y - \mu_Y)^2] \\
&= a^2\,\mathrm{var}(X) + 2ab\,\mathrm{cov}(X, Y) + b^2\,\mathrm{var}(Y) \\
&= a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2, \qquad (2.49)
\end{aligned}$$
where the second equality follows by collecting terms, the third equality follows by expanding the quadratic, and the fourth equality follows by the definition of the variance and covariance.

To derive Equation (2.32), write
$$E(Y^2) = E\{[(Y - \mu_Y) + \mu_Y]^2\} = E[(Y - \mu_Y)^2] + 2\mu_Y E(Y - \mu_Y) + \mu_Y^2 = \sigma_Y^2 + \mu_Y^2,$$
because $E(Y - \mu_Y) = 0$.

To derive Equation (2.33), use the definition of the covariance to write
$$\begin{aligned}
\mathrm{cov}(a + bX + cV, Y) &= E\{[a + bX + cV - E(a + bX + cV)][Y - \mu_Y]\} \\
&= E\{[b(X - \mu_X) + c(V - \mu_V)][Y - \mu_Y]\} \\
&= E\{[b(X - \mu_X)][Y - \mu_Y]\} + E\{[c(V - \mu_V)][Y - \mu_Y]\} \\
&= b\,\sigma_{XY} + c\,\sigma_{VY}, \qquad (2.50)
\end{aligned}$$
which is Equation (2.33).

To derive Equation (2.34), write
$$E(XY) = E\{[(X - \mu_X) + \mu_X][(Y - \mu_Y) + \mu_Y]\} = E[(X - \mu_X)(Y - \mu_Y)] + \mu_X E(Y - \mu_Y) + \mu_Y E(X - \mu_X) + \mu_X\mu_Y = \sigma_{XY} + \mu_X\mu_Y.$$

We now prove the correlation inequality in Equation (2.35); that is, $|\mathrm{corr}(X, Y)| \leq 1$. Let $a = -\sigma_{XY}/\sigma_X^2$ and $b = 1$. Applying Equation (2.31), we have that
$$\begin{aligned}
\mathrm{var}(aX + Y) &= a^2\sigma_X^2 + \sigma_Y^2 + 2a\,\sigma_{XY} \\
&= (-\sigma_{XY}/\sigma_X^2)^2\sigma_X^2 + \sigma_Y^2 + 2(-\sigma_{XY}/\sigma_X^2)\sigma_{XY} \\
&= \sigma_Y^2 - \sigma_{XY}^2/\sigma_X^2. \qquad (2.51)
\end{aligned}$$
Because $\mathrm{var}(aX + Y)$ is a variance, it cannot be negative, so from the final line of Equation (2.51) it must be that $\sigma_Y^2 - \sigma_{XY}^2/\sigma_X^2 \geq 0$. Rearranging this inequality yields
$$\sigma_{XY}^2 \leq \sigma_X^2\sigma_Y^2 \quad \text{(covariance inequality)}. \qquad (2.52)$$
The covariance inequality implies that $\sigma_{XY}^2/(\sigma_X^2\sigma_Y^2) \leq 1$ or, equivalently, $|\sigma_{XY}/(\sigma_X\sigma_Y)| \leq 1$, which (using the definition of the correlation) proves the correlation inequality, $|\mathrm{corr}(X, Y)| \leq 1$.
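The identities derived in this appendix can be spot-checked numerically on a small discrete distribution. The sketch below is illustrative only: the joint probabilities, the constants `a` and `b`, and the helper name `expect` are hypothetical and are not taken from the text. It computes $\mathrm{var}(aX + bY)$ directly from the definition of the variance and compares it with the right-hand side of Equation (2.49), then evaluates the decomposition of $E(XY)$ and the correlation.

```python
# Hypothetical joint distribution Pr(X = x, Y = y); the numbers are illustrative only.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] under the joint distribution above."""
    return sum(prob * g(x, y) for (x, y), prob in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

a, b = 2.0, -3.0  # arbitrary constants chosen for the check

# Left side: variance of aX + bY computed from its definition
mean_lin = expect(lambda x, y: a * x + b * y)
direct = expect(lambda x, y: (a * x + b * y - mean_lin) ** 2)

# Right side: a^2 var(X) + 2ab cov(X, Y) + b^2 var(Y), as in Equation (2.49)
formula = a**2 * var_x + 2 * a * b * cov_xy + b**2 * var_y

# E(XY) = cov(X, Y) + mu_X * mu_Y, and the correlation
e_xy = expect(lambda x, y: x * y)
corr = cov_xy / (var_x * var_y) ** 0.5

print(f"var(aX + bY): direct = {direct:.6f}, formula = {formula:.6f}")
print(f"E(XY) = {e_xy:.6f}, cov + mu_X*mu_Y = {cov_xy + mu_x * mu_y:.6f}")
print(f"corr(X, Y) = {corr:.6f}")
```

A check of this kind does not replace the derivation; it only confirms that the algebra holds for one particular distribution, and the correlation it reports should lie between $-1$ and $1$, as the correlation inequality requires.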
survey the e ntire U. . Thus a di(fere nt.lordinary commitme nt." of distributions in populations of interesl. The o nly comprehensive censu~ l llC survey of the U. differ for men and \\omen and. whill is the mean of the distribution of earnings of reCent college graduale~'! Do mean earning.In practice.5 .S. SuujSlicallools help 10 answer ques lions about unknow n c ha raclcrislic.S. Des pi" . Usi ng statistical methods. if so.. we can u~ this sample to reach tentat ive conclusio ns. 1000 members o f the 'Pula tio n. such a co mprehe nsive !>urVl"~ would he extremely eXfXn~iw. Rather tha n . by how mucb? 1l1t:~e qu eslion ~ rclatl. O ne way to answer these questions wou ld be 1 perfo rm an exhausti ve 0 slIrw y of the population of work ers.sur vey.about characteristics of the full popululion. For examp1t:. hi!> cx trl.CHAP TER 3 Review of Statistics S IlIl is lic~ is the scie nce o f using d <lta 10 lea rn a bou l Ih e world around us. population. managing and cond uc ting the surve ys. we might . population is the dece nnial 20lX) U.S.:.ing thc da ta ta kcs te n yea rs. selecte d at random hy simp le random sam pling..Io draw »tati!>tical inferences. llIa ny me mbef100 of the po pulation slip through the crac ks a nd arc no t surveyed. measuring the earni ngs of each worker and th us fi nding. lhe population d istribution of e arninw>. and compiling a nd ana ly:t. more practica l approllc h is needed. and the process of design ing the ccn»us (orms.! to the distribulion of earn ings in the population o f worker!>. howc\"cr. TIle key insight of st atistics is that o ne elm learn about a population d i~lribulio n by st:1ecting a random sa mple frO\lllhat population.. say. Census cost $ 10 billion.
Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals. Estimation entails computing a "best guess" numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data. Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true. Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic. Sections 3.1, 3.2, and 3.3 review estimation, hypothesis testing, and confidence intervals in the context of statistical inference about an unknown population mean.

Most of the interesting questions in economics involve relationships between two or more variables or comparisons between different populations. For example, is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1 through 3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2 through 3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large. In some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution; these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.

3.1 Estimation of the Population Mean

Suppose you want to know the mean value of \(Y\) (\(\mu_Y\)) in a population, such as the mean earnings of women recently graduated from college. A natural way to estimate this mean is to compute the sample average \(\bar{Y}\) from a sample of \(n\) independently and identically distributed (i.i.d.) observations, \(Y_1, \ldots, Y_n\) (recall that \(Y_1, \ldots, Y_n\) are i.i.d. if they are collected by simple random sampling). This section discusses estimation of \(\mu_Y\) and the properties of \(\bar{Y}\) as an estimator of \(\mu_Y\).
Estimators and Their Properties

Estimators. The sample average \(\bar{Y}\) is a natural way to estimate \(\mu_Y\), but it is not the only way. For example, another way to estimate \(\mu_Y\) is simply to use the first observation, \(Y_1\). Both \(\bar{Y}\) and \(Y_1\) are functions of the data that are designed to estimate \(\mu_Y\); using the terminology in Key Concept 3.1, both are estimators of \(\mu_Y\). When evaluated in repeated samples, \(\bar{Y}\) and \(Y_1\) take on different values (they produce different estimates) from one sample to the next. Thus the estimators \(\bar{Y}\) and \(Y_1\) both have sampling distributions. There are, in fact, many estimators of \(\mu_Y\), of which \(\bar{Y}\) and \(Y_1\) are two examples.

KEY CONCEPT 3.1
ESTIMATORS AND ESTIMATES
An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.

There are many possible estimators, so what makes one estimator "better" than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (a lack of bias), consistency, and efficiency.

Unbiasedness. Suppose you evaluate an estimator many times over repeated randomly drawn samples. It is reasonable to hope that, on average, you would get the right answer. Thus a desirable property of an estimator is that the mean of its sampling distribution equals \(\mu_Y\); if so, the estimator is said to be unbiased. To state this mathematically, let \(\hat{\mu}_Y\) denote some estimator of \(\mu_Y\), such as \(\bar{Y}\) or \(Y_1\). The estimator \(\hat{\mu}_Y\) is unbiased if \(E(\hat{\mu}_Y) = \mu_Y\), where \(E(\hat{\mu}_Y)\) is the mean of the sampling distribution of \(\hat{\mu}_Y\); otherwise, \(\hat{\mu}_Y\) is biased.
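The idea of evaluating an estimator "many times over repeated randomly drawn samples" can be mimicked by simulation. The sketch below is only an illustration under assumed values: the population is taken to be normal with mean 20 and standard deviation 5, and the sample size and number of repetitions are arbitrary choices, none of which come from the text.

```python
import random
from statistics import fmean, variance

random.seed(0)            # fixed seed so the run is reproducible
MU, SIGMA = 20.0, 5.0     # hypothetical population mean and standard deviation
N, REPS = 20, 5000        # sample size and number of repeated samples

ybar_estimates = []       # estimates produced by the sample average
y1_estimates = []         # estimates produced by the first observation alone
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    ybar_estimates.append(sum(sample) / N)
    y1_estimates.append(sample[0])

# Centers of the two simulated sampling distributions (both should be near MU).
mean_ybar = fmean(ybar_estimates)
mean_y1 = fmean(y1_estimates)

# Spreads of the two simulated sampling distributions.
var_ybar = variance(ybar_estimates)
var_y1 = variance(y1_estimates)

print(f"mean of Ybar estimates: {mean_ybar:.3f}, variance: {var_ybar:.3f}")
print(f"mean of Y1 estimates:   {mean_y1:.3f}, variance: {var_y1:.3f}")
```

Each pass through the loop yields one estimate from each estimator; the collection of estimates across passes approximates the estimator's sampling distribution. Both simulated distributions are centered near the population mean, while the one for the first observation is far more spread out.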
Consistency. Another desirable property of an estimator \(\hat{\mu}_Y\) is that, when the sample size is large, the uncertainty about the value of \(\mu_Y\) arising from random variations in the sample is very small. Stated more precisely, a desirable property of \(\hat{\mu}_Y\) is that the probability that it is within a small interval of the true value \(\mu_Y\) approaches 1 as the sample size increases, that is, \(\hat{\mu}_Y\) is consistent for \(\mu_Y\) (Key Concept 2.6).

Variance and efficiency. Suppose you have two candidate estimators, \(\hat{\mu}_Y\) and \(\tilde{\mu}_Y\), both of which are unbiased. How might you choose between them? One way to do so is to choose the estimator with the tightest sampling distribution. This suggests choosing between \(\hat{\mu}_Y\) and \(\tilde{\mu}_Y\) by picking the estimator with the smallest variance. If \(\hat{\mu}_Y\) has a smaller variance than \(\tilde{\mu}_Y\), then \(\hat{\mu}_Y\) is said to be more efficient than \(\tilde{\mu}_Y\). The terminology "efficiency" stems from the notion that, if \(\hat{\mu}_Y\) has a smaller variance than \(\tilde{\mu}_Y\), then it uses the information in the data more efficiently than does \(\tilde{\mu}_Y\).

Bias, consistency, and efficiency are summarized in Key Concept 3.2.

KEY CONCEPT 3.2
BIAS, CONSISTENCY, AND EFFICIENCY
Let \(\hat{\mu}_Y\) be an estimator of \(\mu_Y\). Then:
• The bias of \(\hat{\mu}_Y\) is \(E(\hat{\mu}_Y) - \mu_Y\).
• \(\hat{\mu}_Y\) is an unbiased estimator of \(\mu_Y\) if \(E(\hat{\mu}_Y) = \mu_Y\).
• \(\hat{\mu}_Y\) is a consistent estimator of \(\mu_Y\) if \(\hat{\mu}_Y \xrightarrow{p} \mu_Y\).
• Let \(\tilde{\mu}_Y\) be another estimator of \(\mu_Y\), and suppose that both \(\hat{\mu}_Y\) and \(\tilde{\mu}_Y\) are unbiased. Then \(\hat{\mu}_Y\) is said to be more efficient than \(\tilde{\mu}_Y\) if \(\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)\).

Properties of \(\bar{Y}\)

How does \(\bar{Y}\) fare as an estimator of \(\mu_Y\) when judged by the three criteria of bias, consistency, and efficiency?

Bias and consistency. The sampling distribution of \(\bar{Y}\) has already been examined in Sections 2.5 and 2.6. As shown in Section 2.5, \(E(\bar{Y}) = \mu_Y\), so \(\bar{Y}\) is an unbiased estimator of \(\mu_Y\). Similarly, the law of large numbers (Key Concept 2.6) states that \(\bar{Y} \xrightarrow{p} \mu_Y\); that is, \(\bar{Y}\) is consistent.

Efficiency. What can be said about the efficiency of \(\bar{Y}\)? Because efficiency entails a comparison of estimators, we need to specify the estimator or estimators to which \(\bar{Y}\) is to be compared.

We start by comparing the efficiency of \(\bar{Y}\) to the estimator \(Y_1\). Because \(Y_1, \ldots, Y_n\) are i.i.d., the mean of the sampling distribution of \(Y_1\) is \(E(Y_1) = \mu_Y\); thus \(Y_1\) is an unbiased estimator of \(\mu_Y\). Its variance is \(\mathrm{var}(Y_1) = \sigma_Y^2\). From Section 2.5, the variance of \(\bar{Y}\) is \(\sigma_Y^2/n\). Thus, for \(n \ge 2\), the variance of \(\bar{Y}\) is less than the variance of \(Y_1\); that is, \(\bar{Y}\) is a more efficient estimator than \(Y_1\) so, according to the criterion of efficiency, \(\bar{Y}\) should be used instead of \(Y_1\). The estimator \(Y_1\) might strike you as an obviously poor estimator (why would you go to the trouble of collecting a sample of \(n\) observations only to throw away all but the first?), and the concept of efficiency provides a formal way to show that \(\bar{Y}\) is a more desirable estimator than \(Y_1\).

What about a less obviously poor estimator? Consider the weighted average in which the observations are alternately weighted by \(\tfrac{1}{2}\) and \(\tfrac{3}{2}\):
\[
\tilde{Y} = \frac{1}{n}\left(\tfrac{1}{2}Y_1 + \tfrac{3}{2}Y_2 + \tfrac{1}{2}Y_3 + \tfrac{3}{2}Y_4 + \cdots + \tfrac{1}{2}Y_{n-1} + \tfrac{3}{2}Y_n\right), \tag{3.1}
\]
where the number of observations \(n\) is assumed to be even for convenience. The mean of \(\tilde{Y}\) is \(\mu_Y\) and its variance is \(\mathrm{var}(\tilde{Y}) = 1.25\sigma_Y^2/n\) (Exercise 3.11). Thus \(\tilde{Y}\) is unbiased and, because \(\mathrm{var}(\tilde{Y}) \to 0\) as \(n \to \infty\), \(\tilde{Y}\) is consistent. However, \(\tilde{Y}\) has a larger variance than \(\bar{Y}\). Thus \(\bar{Y}\) is more efficient than \(\tilde{Y}\).

The estimators \(\bar{Y}\), \(Y_1\), and \(\tilde{Y}\) have a common mathematical structure: They are weighted averages of \(Y_1, \ldots, Y_n\). The comparisons in the previous two paragraphs show that the weighted averages \(Y_1\) and \(\tilde{Y}\) have larger variances than \(\bar{Y}\). In fact, these conclusions reflect a more general result: \(\bar{Y}\) is the most efficient estimator of all unbiased estimators that are weighted averages of \(Y_1, \ldots, Y_n\). Said differently, \(\bar{Y}\) is the Best Linear Unbiased Estimator (BLUE); that is, it is the most efficient (best) estimator among all estimators that are unbiased and are linear functions of \(Y_1, \ldots, Y_n\). This result is stated in Key Concept 3.3 and is proven in Chapter 5.

\(\bar{Y}\) is the least squares estimator of \(\mu_Y\). The sample average \(\bar{Y}\) provides the best fit to the data in the sense that the average squared differences between the observations and \(\bar{Y}\) are the smallest of all possible estimators.
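This best-fit claim can be checked numerically by brute force. The sketch below uses a handful of made-up observations (the data values and the grid of candidate values are assumptions chosen for illustration) and searches for the value `m` that makes the sum of squared differences between the observations and `m` as small as possible.

```python
# Hypothetical observations Y_1, ..., Y_n (illustrative values only).
data = [12.0, 18.5, 20.0, 22.5, 31.0]
ybar = sum(data) / len(data)

def sum_sq_gaps(m):
    """Sum of squared differences between each observation and the value m."""
    return sum((y - m) ** 2 for y in data)

# Trial and error over a fine grid of candidate values from 0.00 to 40.00.
candidates = [i * 0.01 for i in range(4001)]
best_m = min(candidates, key=sum_sq_gaps)

print(f"sample average:       {ybar:.2f}")
print(f"best grid value of m: {best_m:.2f}")
print(f"sum of squared gaps at the sample average: {sum_sq_gaps(ybar):.3f}")
```

The grid search is deliberately naive; it stands in for the algebraic argument and only shows that, for these data, no candidate on the grid beats the sample average.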
Consider the problem of finding the estimator \(m\) that minimizes
\[
\sum_{i=1}^{n} (Y_i - m)^2, \tag{3.2}
\]
which is a measure of the total squared gap or distance between the estimator \(m\) and the sample points. Because \(m\) is an estimator of \(E(Y)\), you can think of it as a prediction of the value of \(Y_i\), so that the gap \(Y_i - m\) can be thought of as a prediction mistake. The sum of squared gaps in expression (3.2) can be thought of as the sum of squared prediction mistakes.

The estimator \(m\) that minimizes the sum of squared gaps \(Y_i - m\) in expression (3.2) is called the least squares estimator. One can imagine using trial and error to solve the least squares problem: Try many values of \(m\) until you are satisfied that you have the value that makes expression (3.2) as small as possible. Alternatively, as is done in Appendix 3.2, you can use algebra or calculus to show that choosing \(m = \bar{Y}\) minimizes the sum of squared gaps in expression (3.2), so that \(\bar{Y}\) is the least squares estimator of \(\mu_Y\).

KEY CONCEPT 3.3
EFFICIENCY OF \(\bar{Y}\): \(\bar{Y}\) IS BLUE
Let \(\hat{\mu}_Y\) be an estimator of \(\mu_Y\) that is a weighted average of \(Y_1, \ldots, Y_n\), that is, \(\hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n} a_i Y_i\), where \(a_1, \ldots, a_n\) are nonrandom constants. If \(\hat{\mu}_Y\) is unbiased, then \(\mathrm{var}(\bar{Y}) < \mathrm{var}(\hat{\mu}_Y)\) unless \(\hat{\mu}_Y = \bar{Y}\). Thus \(\bar{Y}\) is the Best Linear Unbiased Estimator (BLUE); that is, \(\bar{Y}\) is the most efficient estimator of \(\mu_Y\) among all unbiased estimators that are weighted averages of \(Y_1, \ldots, Y_n\).

The Importance of Random Sampling

We have assumed that \(Y_1, \ldots, Y_n\) are i.i.d. draws, such as those that would be obtained from simple random sampling. This assumption is important because nonrandom sampling can result in \(\bar{Y}\) being biased. Suppose that, to estimate the monthly national unemployment rate, a statistical agency adopts a sampling scheme in which interviewers survey working-age adults sitting in city parks at 10:00 A.M. on the second Wednesday of the month. Because most employed people are at work at that hour (not sitting in the park!), the unemployed are overly represented in the sample, and an estimate of the unemployment rate based on this sampling plan would be biased. This bias arises because this sampling scheme overrepresents, or oversamples, the unemployed members of the population. This example is fictitious, but the "Landon Wins!" box gives a real-world example of biases introduced by sampling that is not entirely random.

It is important to design sample selection schemes in a way that minimizes bias. Appendix 3.1 includes a discussion of what the Bureau of Labor Statistics actually does when it conducts the U.S. Current Population Survey (CPS), the survey it uses to estimate the monthly U.S. unemployment rate.

Landon Wins!
Shortly before the 1936 Presidential election, the Literary Gazette published a poll indicating that Alf M. Landon would defeat the incumbent, Franklin D. Roosevelt, by a landslide: 57% to 43%. The Gazette was right that the election was a landslide, but it was wrong about the winner: Roosevelt won by 59% to 41%!
How could the Gazette have made such a big mistake? The Gazette's sample was chosen from telephone records and automobile registration files. But in 1936 many households did not have cars or telephones, and those that did tended to be richer and were also more likely to be Republican. Because the telephone survey did not sample randomly from the population but instead undersampled Democrats, the estimator was biased and the Gazette made an embarrassing mistake.
Do you think surveys conducted over the Internet might have a similar problem with bias?

3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. Do the mean hourly earnings of recent college graduates equal $20/hour? Are mean earnings the same for male and female college graduates? Both these questions embody specific hypotheses about the population distribution of earnings. The statistical challenge is to answer these questions based on a sample of evidence. This section describes testing hypotheses concerning the population mean (Does the population mean of hourly earnings equal $20?). Hypothesis tests involving two populations (Are mean earnings the same for men and women?) are taken up in Section 3.4.
Null and Alternative Hypotheses

The starting point of statistical hypothesis testing is specifying the hypothesis to be tested, called the null hypothesis. Hypothesis testing entails using data to compare the null hypothesis to a second hypothesis, called the alternative hypothesis, that holds if the null does not.

The null hypothesis is that the population mean, \(E(Y)\), takes on a specific value, denoted by \(\mu_{Y,0}\). The null hypothesis is denoted \(H_0\) and thus is
\[
H_0: E(Y) = \mu_{Y,0}. \tag{3.3}
\]
For example, the conjecture that, on average in the population, college graduates earn $20/hour constitutes a null hypothesis about the population distribution of hourly earnings. Stated mathematically, if \(Y\) is the hourly earning of a randomly selected recent college graduate, then the null hypothesis is that \(E(Y) = 20\); that is, \(\mu_{Y,0} = 20\) in Equation (3.3).

The alternative hypothesis specifies what is true if the null hypothesis is not. The most general alternative hypothesis is that \(E(Y) \ne \mu_{Y,0}\); this is called a two-sided alternative hypothesis, because it allows \(E(Y)\) to be either less than or greater than \(\mu_{Y,0}\). The two-sided alternative is written as
\[
H_1: E(Y) \ne \mu_{Y,0} \quad \text{(two-sided alternative)}. \tag{3.4}
\]
One-sided alternatives are also possible, and these are discussed later in this section.

The problem facing the statistician is to use the evidence in a randomly selected sample of data to decide whether to accept the null hypothesis \(H_0\) or to reject it in favor of the alternative hypothesis \(H_1\). If the null hypothesis is "accepted," this does not mean that the statistician declares it to be true; rather, it is accepted tentatively with the recognition that it might be rejected later based on additional evidence. For this reason, statistical hypothesis testing can be posed as either rejecting the null hypothesis or failing to do so.

The p-Value

In any given sample, the sample average \(\bar{Y}\) will rarely be exactly equal to the hypothesized value \(\mu_{Y,0}\). Differences between \(\bar{Y}\) and \(\mu_{Y,0}\) can arise because the true mean in fact does not equal \(\mu_{Y,0}\) (the null hypothesis is false) or because the true mean equals \(\mu_{Y,0}\) (the null hypothesis is true) but \(\bar{Y}\) differs from \(\mu_{Y,0}\) because of random sampling. It is impossible to distinguish between these two possibilities with certainty. Although a sample of data cannot provide conclusive evidence about the null hypothesis, it is possible to do a probabilistic calculation that permits testing the null hypothesis in a way that accounts for sampling uncertainty. This calculation involves using the data to compute the p-value of the null hypothesis.

The p-value, also called the significance probability, is the probability of drawing a statistic at least as adverse to the null hypothesis as the one you actually computed in your sample, assuming the null hypothesis is correct. In the case at hand, the p-value is the probability of drawing \(\bar{Y}\) at least as far in the tails of its distribution under the null hypothesis as the sample average you actually computed.

For example, suppose that, in your sample of recent college graduates, the average wage is $22.24. The p-value is the probability of observing a value of \(\bar{Y}\) at least as different from $20 (the population mean under the null) as the observed value of $22.24 by pure random sampling variation, assuming the null hypothesis is true. If this p-value is small, say 0.5%, then it is very unlikely that this sample would have been drawn if the null hypothesis is true; thus it is reasonable to conclude that the null hypothesis is not true.

To state the definition of the p-value mathematically, let \(\bar{Y}^{act}\) denote the value of the sample average actually computed in the data set at hand and let \(\Pr_{H_0}\) denote the probability computed under the null hypothesis (that is, computed assuming that \(E(Y_i) = \mu_{Y,0}\)). The p-value is
\[
p\text{-value} = \Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]. \tag{3.5}
\]
That is, the p-value is the area in the tails of the distribution of \(\bar{Y}\) under the null hypothesis beyond \(\mu_{Y,0} \pm |\bar{Y}^{act} - \mu_{Y,0}|\). If the p-value is large, then the observed value \(\bar{Y}^{act}\) is consistent with the null hypothesis, but if the p-value is small, it is not.

To compute the p-value, it is necessary to know the sampling distribution of \(\bar{Y}\) under the null hypothesis. As discussed in Section 2.6, when the sample size is small this distribution is complicated. However, according to the central limit theorem, when the sample size is large the sampling distribution of \(\bar{Y}\) is well approximated by a normal distribution. Under the null hypothesis the mean of this normal distribution is \(\mu_{Y,0}\), so under the null hypothesis \(\bar{Y}\) is distributed \(N(\mu_{Y,0}, \sigma_{\bar{Y}}^2)\), where \(\sigma_{\bar{Y}}^2 = \sigma_Y^2/n\). This large-sample normal approximation makes it possible to compute the p-value without needing to know the population distribution of \(Y\), as long as the sample size is large. The details of the calculation, however, depend on whether \(\sigma_Y^2\) is known.

Calculating the p-Value When \(\sigma_Y\) Is Known

FIGURE 3.1 Calculating a p-value
The p-value is the probability of drawing a value of \(\bar{Y}\) that differs from \(\mu_{Y,0}\) by at least as much as \(\bar{Y}^{act}\). In large samples, \(\bar{Y}\) is distributed \(N(\mu_{Y,0}, \sigma_{\bar{Y}}^2)\) under the null hypothesis, so \((\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}\) is distributed \(N(0, 1)\). Thus the p-value is the shaded standard normal tail probability outside \(\pm|(\bar{Y}^{act} - \mu_{Y,0})/\sigma_{\bar{Y}}|\).

The calculation of the p-value when \(\sigma_Y\) is known is summarized in Figure 3.1. If the sample size is large, then under the null hypothesis the sampling distribution of \(\bar{Y}\) is \(N(\mu_{Y,0}, \sigma_{\bar{Y}}^2)\), where \(\sigma_{\bar{Y}}^2 = \sigma_Y^2/n\). Thus, under the null hypothesis, the standardized version of \(\bar{Y}\), \((\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}\), has a standard normal distribution. The p-value is the probability of obtaining a value of \(\bar{Y}\) farther from \(\mu_{Y,0}\) than \(\bar{Y}^{act}\) under the null hypothesis or, equivalently, the probability of obtaining \((\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}\) greater than \((\bar{Y}^{act} - \mu_{Y,0})/\sigma_{\bar{Y}}\) in absolute value. This probability is the shaded area shown in Figure 3.1. Written mathematically, the shaded tail probability in Figure 3.1 (that is, the p-value) is
\[
p\text{-value} = \Pr_{H_0}\left(\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right) = 2\Phi\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right), \tag{3.6}
\]
where \(\Phi\) is the standard normal cumulative distribution function. That is, the p-value is the area in the tails of a standard normal distribution outside \(\pm|\bar{Y}^{act} - \mu_{Y,0}|/\sigma_{\bar{Y}}\).

The formula for the p-value in Equation (3.6) depends on the variance of the population distribution, \(\sigma_Y^2\). In practice, this variance is typically unknown. [An exception is when \(Y_i\) is binary so its distribution is Bernoulli, in which case the variance is determined by the null hypothesis; see Equation (2.7).] Because in general \(\sigma_Y^2\) must be estimated before the p-value can be computed, we now turn to the problem of estimating \(\sigma_Y^2\).

The Sample Variance, Sample Standard Deviation, and Standard Error

The sample variance \(s_Y^2\) is an estimator of the population variance \(\sigma_Y^2\); the sample standard deviation \(s_Y\) is an estimator of the population standard deviation \(\sigma_Y\); and the standard error of the sample average \(\bar{Y}\) is an estimator of the standard deviation of the sampling distribution of \(\bar{Y}\).

The sample variance and standard deviation. The sample variance, \(s_Y^2\), is
\[
s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2. \tag{3.7}
\]
The sample standard deviation, \(s_Y\), is the square root of the sample variance.

The formula for the sample variance is much like the formula for the population variance. The population variance, \(E(Y - \mu_Y)^2\), is the average value of \((Y - \mu_Y)^2\) in the population distribution. Similarly, the sample variance is the sample average of \((Y_i - \mu_Y)^2\), \(i = 1, \ldots, n\), with two modifications: First, \(\mu_Y\) is replaced by \(\bar{Y}\), and second, the average uses the divisor \(n - 1\) instead of \(n\).

The reason for the first modification (replacing \(\mu_Y\) by \(\bar{Y}\)) is that \(\mu_Y\) is unknown and thus must be estimated; the natural estimator of \(\mu_Y\) is \(\bar{Y}\). The reason for the second modification (dividing by \(n - 1\) instead of by \(n\)) is that estimating \(\mu_Y\) by \(\bar{Y}\) introduces a small downward bias in \((Y_i - \bar{Y})^2\). Specifically, as is shown in Exercise 3.18, \(E[(Y_i - \bar{Y})^2] = [(n-1)/n]\sigma_Y^2\). Thus, \(E\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = nE[(Y_i - \bar{Y})^2] = (n-1)\sigma_Y^2\). Dividing by \(n - 1\) in Equation (3.7) instead of \(n\) corrects for this small downward bias, and as a result \(s_Y^2\) is unbiased.

Dividing by \(n - 1\) in Equation (3.7) instead of \(n\) is called a degrees of freedom correction: Estimating the mean uses up some of the information (that is, uses up 1 "degree of freedom") in the data, so that only \(n - 1\) degrees of freedom remain.

KEY CONCEPT 3.4
THE STANDARD ERROR OF \(\bar{Y}\)
The standard error of \(\bar{Y}\) is an estimator of the standard deviation of \(\bar{Y}\). The standard error of \(\bar{Y}\) is denoted by \(SE(\bar{Y})\) or by \(\hat{\sigma}_{\bar{Y}}\). When \(Y_1, \ldots, Y_n\) are i.i.d.,
\[
SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = s_Y/\sqrt{n}. \tag{3.8}
\]

Consistency of the sample variance. The sample variance is a consistent estimator of the population variance:
\[
s_Y^2 \xrightarrow{p} \sigma_Y^2. \tag{3.9}
\]
In other words, the sample variance is close to the population variance with high probability when \(n\) is large.

The result in Equation (3.9) is proven in Appendix 3.3 under the assumptions that \(Y_1, \ldots, Y_n\) are i.i.d. and \(Y_i\) has a finite fourth moment; that is, \(E(Y_i^4) < \infty\). Intuitively, the reason that \(s_Y^2\) is consistent is that it is a sample average, so \(s_Y^2\) obeys the law of large numbers. But for \(s_Y^2\) to obey the law of large numbers in Key Concept 2.6, \((Y_i - \mu_Y)^2\) must have finite variance, which in turn means that \(E(Y_i^4)\) must be finite; in other words, \(Y_i\) must have a finite fourth moment.

The standard error of \(\bar{Y}\). Because the standard deviation of the sampling distribution of \(\bar{Y}\) is \(\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}\), Equation (3.9) justifies using \(s_Y/\sqrt{n}\) as an estimator of \(\sigma_{\bar{Y}}\). The estimator of \(\sigma_{\bar{Y}}\), \(s_Y/\sqrt{n}\), is called the standard error of \(\bar{Y}\) and is denoted by \(SE(\bar{Y})\) or by \(\hat{\sigma}_{\bar{Y}}\) (the "^" over the symbol means that this is an estimator of \(\sigma_{\bar{Y}}\)). The standard error of \(\bar{Y}\) is summarized as Key Concept 3.4.

When \(Y_1, \ldots, Y_n\) are i.i.d. draws from a Bernoulli distribution with success probability \(p\), the formula for the variance of \(\bar{Y}\) simplifies to \(p(1 - p)/n\) [see Equation (2.7)]. The formula for the standard error also takes on a simple form that depends only on \(\bar{Y}\) and \(n\): \(SE(\bar{Y}) = \sqrt{\bar{Y}(1 - \bar{Y})/n}\).

Calculating the p-Value When \(\sigma_Y\) Is Unknown

Because \(s_Y^2\) is a consistent estimator of \(\sigma_Y^2\), the p-value can be computed by replacing \(\sigma_{\bar{Y}}\) in Equation (3.6) by the standard error, \(SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}}\).
That is, when \(\sigma_Y\) is unknown and \(Y_1, \ldots, Y_n\) are i.i.d., the p-value is calculated using the formula
\[
p\text{-value} = 2\Phi\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}\right|\right). \tag{3.10}
\]

The t-Statistic

The standardized sample average \((\bar{Y} - \mu_{Y,0})/SE(\bar{Y})\) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:
\[
t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}. \tag{3.11}
\]
In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an important example of a test statistic.

Large-sample distribution of the t-statistic. When \(n\) is large, \(s_Y^2\) is close to \(\sigma_Y^2\) with high probability. Thus the distribution of the t-statistic is approximately the same as the distribution of \((\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}\), which in turn is well approximated by the standard normal distribution when \(n\) is large because of the central limit theorem (Key Concept 2.7). Accordingly, under the null hypothesis,
\[
t \text{ is approximately distributed } N(0, 1) \text{ for large } n. \tag{3.12}
\]
The formula for the p-value in Equation (3.10) can be rewritten in terms of the t-statistic. Let \(t^{act}\) denote the value of the t-statistic actually computed:
\[
t^{act} = \frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}. \tag{3.13}
\]
Accordingly, when \(n\) is large, the p-value can be calculated using
\[
p\text{-value} = 2\Phi(-|t^{act}|). \tag{3.14}
\]

As a hypothetical example, suppose that a sample of \(n = 200\) recent college graduates is used to test the null hypothesis that the mean wage, \(E(Y)\), is $20/hour. The sample average wage is \(\bar{Y}^{act} = \$22.64\) and the sample standard deviation is \(s_Y = \$18.14\). Then the standard error of \(\bar{Y}\) is \(s_Y/\sqrt{n} = 18.14/\sqrt{200} = 1.28\). The value of the t-statistic is \(t^{act} = (22.64 - 20)/1.28 = 2.06\). From Appendix Table 1, the p-value is \(2\Phi(-2.06) = 0.039\), or 3.9%. That is, assuming the null hypothesis to be true, the probability of obtaining a sample average at least as different from the null as the one actually computed is 3.9%.
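The arithmetic in this example can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function names are our own labels, and the normal CDF comes from `statistics.NormalDist` instead of a printed table, so unrounded results differ slightly from the rounded figures in the text.

```python
from math import sqrt
from statistics import NormalDist, variance

def standard_error(data):
    """SE of the sample average from raw data: sqrt(s_Y^2 / n).

    statistics.variance uses the n - 1 divisor of the sample variance.
    """
    return sqrt(variance(data) / len(data))

def t_test_from_summary(ybar, s_y, n, mu_0):
    """Return (SE, t-statistic, two-sided large-sample p-value)."""
    se = s_y / sqrt(n)
    t_act = (ybar - mu_0) / se
    p_value = 2 * NormalDist().cdf(-abs(t_act))
    return se, t_act, p_value

# Summary statistics from the hypothetical wage example in the text.
se, t_act, p_value = t_test_from_summary(ybar=22.64, s_y=18.14, n=200, mu_0=20.0)
print(f"SE = {se:.2f}, t = {t_act:.2f}, p-value = {p_value:.3f}")

# The same standard error formula applied to a tiny made-up data set.
print(f"SE from raw data: {standard_error([1.0, 2.0, 3.0, 4.0, 5.0]):.4f}")
```

Working from summary statistics keeps the sketch tied to the numbers in the text; with raw data, `standard_error` supplies the same ingredient.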
Hypothesis Testing with a Prespecified Significance Level

When you undertake a statistical hypothesis test, you can make two types of mistakes: You can incorrectly reject the null hypothesis when it is true, or you can fail to reject the null hypothesis when it is false. Hypothesis tests can be performed without computing the p-value if you are willing to specify in advance the probability you are willing to tolerate of making the first kind of mistake, that is, of incorrectly rejecting the null hypothesis when it is true. If you choose a prespecified probability of rejecting the null hypothesis when it is true (for example, 5%), then you will reject the null hypothesis if and only if the p-value is less than 0.05. This approach gives preferential treatment to the null hypothesis, but in many practical situations this preferential treatment is appropriate.

Because the area under the tails of the standard normal distribution outside \(\pm 1.96\) is 5%, rejecting when the p-value is less than 5% gives a simple rule:
\[
\text{Reject } H_0 \text{ if } |t^{act}| > 1.96. \tag{3.15}
\]
That is, reject if the absolute value of the t-statistic computed from the sample is greater than 1.96. If \(n\) is large enough, then under the null hypothesis the t-statistic has a \(N(0, 1)\) distribution. Thus, the probability of erroneously rejecting the null hypothesis (rejecting the null hypothesis when it is in fact true) is 5%.

This framework for testing statistical hypotheses has some specialized terminology, summarized in Key Concept 3.5. The significance level of the test in Equation (3.15) is 5%, the critical value of this two-sided test is 1.96, and the rejection region is the values of the t-statistic outside \(\pm 1.96\). If the test rejects at the 5% significance level, the population mean \(\mu_Y\) is said to be statistically significantly different from \(\mu_{Y,0}\) at the 5% significance level.

Testing hypotheses using a prespecified significance level does not require computing p-values. In the previous example of testing the hypothesis that the mean earning of recent college graduates is $20, the t-statistic was 2.06. This exceeds 1.96, so the hypothesis is rejected at the 5% level. Although performing the test with a 5% significance level is easy, reporting only whether the null hypothesis is rejected at a prespecified significance level conveys less information than reporting the p-value.

KEY CONCEPT 3.5
THE TERMINOLOGY OF HYPOTHESIS TESTING
A statistical hypothesis test can make two types of mistakes: a type I error, in which the null hypothesis is rejected when in fact it is true, and a type II error, in which the null hypothesis is not rejected when in fact it is false. The prespecified rejection probability of a statistical hypothesis test when the null hypothesis is true (that is, the prespecified probability of a type I error) is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the set of values of the test statistic for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.
The p-value is the probability of obtaining a test statistic, by random sampling variation, at least as adverse to the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at which you can reject the null hypothesis.

What significance level should you use in practice? In many cases, statisticians and econometricians use a 5% significance level. If you were to test many statistical hypotheses at the 5% level, you would incorrectly reject the null on average once in 20 cases. Sometimes a more conservative significance level might be in order. For example, legal cases sometimes involve statistical evidence, and the null hypothesis could be that the defendant is not guilty; then one would want to be quite sure that a rejection of the null (conclusion of guilt) is not just a result of random sample variation. In some legal settings the significance level used is 1% or even 0.1%, to avoid this sort of mistake. Similarly, if a government agency is considering permitting the sale of a new drug, a very conservative standard might be in order so that consumers can be sure that the drugs available in the market actually work.
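The "once in 20 cases" remark can be illustrated by simulation. In the sketch below the null hypothesis is true by construction, and the rule "reject if the absolute t-statistic exceeds the 5% critical value" is applied to many independent samples. The normal population, its parameters, the sample size, and the number of repetitions are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                        # fixed seed so the run is reproducible
MU_0, SIGMA = 20.0, 18.0              # hypothetical population; null is true
N, REPS = 100, 2000                   # sample size and number of samples
CRITICAL = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

def t_statistic(sample, mu_0):
    """t = (Ybar - mu_0) / SE(Ybar), with the n - 1 divisor in s_Y^2."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    return (ybar - mu_0) / sqrt(s2 / n)

rejections = 0
for _ in range(REPS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, MU_0)) > CRITICAL:
        rejections += 1               # a type I error: the null is actually true

rejection_rate = rejections / REPS
print(f"critical value: {CRITICAL:.3f}")
print(f"share of samples in which the true null is rejected: {rejection_rate:.3f}")
```

The simulated rejection share estimates the size of the test; with a large sample it should be close to the 5% significance level, though it varies from run to run when the seed is changed.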
Being conservative, in the sense of using a very low significance level, has a cost: The smaller the significance level, the larger the critical value, and the more difficult it becomes to reject the null when the null is false. In fact, the most conservative thing to do is never to reject the null hypothesis; but if that is your view, then you never need to look at any statistical evidence, for you will never change your mind! The lower the significance level, the lower the power of the test. Many economic and policy applications can call for less conservatism than a legal case, so a 5% significance level is often considered to be a reasonable compromise.

Key Concept 3.6 summarizes hypothesis tests for the population mean against the two-sided alternative.

KEY CONCEPT 3.6
TESTING THE HYPOTHESIS $E(Y) = \mu_{Y,0}$ AGAINST THE ALTERNATIVE $E(Y) \neq \mu_{Y,0}$
1. Compute the standard error of $\bar{Y}$, $SE(\bar{Y})$ [Equation (3.8)].
2. Compute the t-statistic [Equation (3.13)].
3. Compute the p-value [Equation (3.14)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 (equivalently, if $|t^{act}| > 1.96$).

One-Sided Alternatives
In some circumstances, the alternative hypothesis might be that the mean exceeds $\mu_{Y,0}$. For example, one hopes that education helps in the labor market, so the relevant alternative to the null hypothesis that earnings are the same for college graduates and nongraduates is not just that their earnings differ, but rather that graduates earn more than nongraduates. This is called a one-sided alternative hypothesis and can be written

$H_1: E(Y) > \mu_{Y,0}$ (one-sided alternative). (3.16)

The general approach to computing p-values and to hypothesis testing is the same for one-sided alternatives as it is for two-sided alternatives, with the modification that only large positive values of the t-statistic reject the null hypothesis, rather than values that are large in absolute value. Specifically, to test the one-sided hypothesis in Equation (3.16), construct the t-statistic in Equation (3.13). The p-value is the area under the standard normal distribution to the right of the calculated t-statistic. That is, the p-value, based on the N(0, 1) approximation to the distribution of the t-statistic, is

p-value $= \Pr_{H_0}(Z > t^{act}) = 1 - \Phi(t^{act})$. (3.17)

The N(0, 1) critical value for a one-sided test with a 5% significance level is 1.645. The rejection region for this test is all values of the t-statistic exceeding 1.645.

The one-sided hypothesis in Equation (3.16) concerns values of $\mu_Y$ exceeding $\mu_{Y,0}$. If instead the alternative hypothesis is that $E(Y) < \mu_{Y,0}$, then the discussion of the previous paragraph applies except that the signs are switched; for example, the 5% rejection region consists of values of the t-statistic less than -1.645.

3.3 Confidence Intervals for the Population Mean

Because of random sampling error, it is impossible to learn the exact value of the population mean of Y using only the information in a sample. However, it is possible to use data from a random sample to construct a set of values that contains the true population mean $\mu_Y$ with a certain prespecified probability. Such a set is called a confidence set, and the prespecified probability that $\mu_Y$ is contained in this set is called the confidence level. The confidence set for $\mu_Y$ turns out to be all the possible values of the mean between a lower and an upper limit, so that the confidence set is an interval, called a confidence interval.

Here is one way to construct a 95% confidence set for the population mean. Begin by picking some arbitrary value for the mean; call this $\mu_{Y,0}$. Test the null hypothesis that $\mu_Y = \mu_{Y,0}$ against the alternative that $\mu_Y \neq \mu_{Y,0}$ by computing the t-statistic; if it is less than 1.96 in absolute value, this hypothesized value $\mu_{Y,0}$ is not rejected at the 5% level, and write down this nonrejected value $\mu_{Y,0}$. Now pick another arbitrary value of $\mu_{Y,0}$ and test it; if you cannot reject it, write this value down on your list. Do this again and again; indeed, keep doing this for all possible values of the population mean. Continuing this process yields the set of all values of the population mean that cannot be rejected at the 5% level by a two-sided hypothesis test.

This list is useful because it summarizes the set of hypotheses you can and cannot reject (at the 5% level) based on your data: If someone walks up to you with a specific number in mind, you can tell him whether his hypothesis is rejected or not simply by looking up his number on your handy list. A bit of clever reasoning shows that this set of values has a remarkable property: The probability that it contains the true value of the population mean is 95%.
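The list-building procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample mean (22.64), the standard error (1.28), the grid of candidate means, and the function name `confidence_set_by_inversion` are all hypothetical choices, and a finite grid can only approximate "all possible values" of the mean.

```python
def confidence_set_by_inversion(y_bar, se, candidates, crit=1.96):
    """Return the candidate means mu0 that a two-sided 5% test does not reject."""
    kept = []
    for mu0 in candidates:
        t = (y_bar - mu0) / se      # t-statistic for H0: mu_Y = mu0
        if abs(t) <= crit:          # not rejected at the 5% level, so keep it
            kept.append(mu0)
    return kept

# Hypothetical sample mean and standard error (assumed for illustration).
y_bar, se = 22.64, 1.28

# Candidate means 15.00, 15.01, ..., 30.00 (an arbitrary grid).
grid = [round(15 + 0.01 * k, 2) for k in range(1501)]

kept = confidence_set_by_inversion(y_bar, se, grid)
lower, upper = min(kept), max(kept)
print(f"Nonrejected values run from {lower:.2f} to {upper:.2f}")
```

The nonrejected values form one unbroken run of grid points, which is why the confidence set turns out to be an interval.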
The clever reasoning goes like this. Suppose the true value of $\mu_Y$ is 21.5 (although we do not know this). Then $\bar{Y}$ has a normal distribution centered on 21.5, and the t-statistic testing the null hypothesis $\mu_Y = 21.5$ has a N(0, 1) distribution. Thus, if n is large, the probability of rejecting the null hypothesis $\mu_Y = 21.5$ at the 5% level is 5%. But because you tested all possible values of the population mean in constructing your set, in particular you tested the true value, $\mu_Y = 21.5$. In 95% of all samples, you will correctly accept 21.5; this means that in 95% of all samples your list will contain the true value of $\mu_Y$. Thus, the values on your list constitute a 95% confidence set for $\mu_Y$.

This method of constructing a confidence set is impractical, for it requires you to test all possible values of $\mu_Y$ as null hypotheses. Fortunately there is a much easier approach. According to the formula for the t-statistic in Equation (3.13), a trial value of $\mu_{Y,0}$ is rejected at the 5% level if it is more than 1.96 standard errors away from $\bar{Y}$. Thus the set of values of $\mu_Y$ that are not rejected at the 5% level consists of those values within $\pm 1.96\,SE(\bar{Y})$ of $\bar{Y}$. That is, a 95% confidence interval for $\mu_Y$ is $\bar{Y} - 1.96\,SE(\bar{Y}) \le \mu_Y \le \bar{Y} + 1.96\,SE(\bar{Y})$. Key Concept 3.7 summarizes this approach.

KEY CONCEPT 3.7
CONFIDENCE INTERVALS FOR THE POPULATION MEAN
A 95% two-sided confidence interval for $\mu_Y$ is an interval constructed so that it contains the true value of $\mu_Y$ in 95% of all possible random samples. When the sample size n is large, 95%, 90%, and 99% confidence intervals for $\mu_Y$ are
95% confidence interval for $\mu_Y = \{\bar{Y} \pm 1.96\,SE(\bar{Y})\}$.
90% confidence interval for $\mu_Y = \{\bar{Y} \pm 1.64\,SE(\bar{Y})\}$.
99% confidence interval for $\mu_Y = \{\bar{Y} \pm 2.58\,SE(\bar{Y})\}$.

As an example, consider the problem of constructing a 95% confidence interval for the mean hourly earnings of recent college graduates using a hypothetical random sample of 200 recent college graduates where $\bar{Y} = \$22.64$ and $SE(\bar{Y}) = 1.28$. The 95% confidence interval for mean hourly earnings is $22.64 \pm 1.96 \times 1.28 = 22.64 \pm 2.51 = [\$20.13, \$25.15]$.

This discussion so far has focused on two-sided confidence intervals. One could instead construct a one-sided confidence interval as the set of values of $\mu_Y$ that cannot be rejected by a one-sided hypothesis test. Although one-sided confidence intervals have applications in some branches of statistics, they are uncommon in applied econometric analysis.
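The interval formulas in Key Concept 3.7 are simple enough to compute directly. The sketch below uses Python's standard-library `statistics.NormalDist` to look up the normal critical value for any confidence level; the helper name `normal_confidence_interval` is an illustrative choice. The exact critical values (about 1.645, 1.960, and 2.576) differ slightly from the rounded values 1.64, 1.96, and 2.58 quoted above.

```python
from statistics import NormalDist

def normal_confidence_interval(y_bar, se, level=0.95):
    """Two-sided large-sample confidence interval: y_bar +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for level = 0.95
    return y_bar - z * se, y_bar + z * se

# Hypothetical earnings example from the text: Y-bar = 22.64, SE = 1.28.
for level in (0.90, 0.95, 0.99):
    lo, hi = normal_confidence_interval(22.64, 1.28, level)
    print(f"{level:.0%} confidence interval: [{lo:.2f}, {hi:.2f}]")
```

At the 95% level this reproduces the interval [$20.13, $25.15] from the earnings example.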
Coverage probabilities. The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.

3.4 Comparing Means from Different Populations

Do recent male and female college graduates earn the same amount on average? This question involves comparing the means of two different population distributions. This section summarizes how to test hypotheses and how to construct confidence intervals for the difference in the means from two different populations.

Hypothesis Tests for the Difference Between Two Means
Let $\mu_w$ be the mean hourly earning in the population of women recently graduated from college and let $\mu_m$ be the population mean for recently graduated men. Consider the null hypothesis that earnings for these two populations differ by a certain amount, say $d_0$. Then the null hypothesis and the two-sided alternative hypothesis are

$H_0: \mu_m - \mu_w = d_0$ vs. $H_1: \mu_m - \mu_w \neq d_0$. (3.18)

The null hypothesis that men and women in these populations have the same earnings corresponds to $H_0$ in Equation (3.18) with $d_0 = 0$.

Because these population means are unknown, they must be estimated from samples of men and women. Suppose we have samples of $n_m$ men and $n_w$ women drawn at random from their populations. Let the sample average annual earnings be $\bar{Y}_m$ for men and $\bar{Y}_w$ for women. Then an estimator of $\mu_m - \mu_w$ is $\bar{Y}_m - \bar{Y}_w$.

To test the null hypothesis that $\mu_m - \mu_w = d_0$ using $\bar{Y}_m - \bar{Y}_w$, we need to know the distribution of $\bar{Y}_m - \bar{Y}_w$. Recall that $\bar{Y}_m$ is, according to the central limit theorem, approximately distributed $N(\mu_m, \sigma_m^2/n_m)$, where $\sigma_m^2$ is the population variance of earnings for men. Similarly, $\bar{Y}_w$ is approximately distributed $N(\mu_w, \sigma_w^2/n_w)$, where $\sigma_w^2$ is the population variance of earnings for women.
Also, recall from Section 2.4 that a weighted average of two normal random variables is itself normally distributed. Because $\bar{Y}_m$ and $\bar{Y}_w$ are constructed from different randomly selected samples, they are independent random variables. Thus, $\bar{Y}_m - \bar{Y}_w$ is distributed $N[\mu_m - \mu_w, (\sigma_m^2/n_m) + (\sigma_w^2/n_w)]$.

If $\sigma_m^2$ and $\sigma_w^2$ are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis that $\mu_m - \mu_w = d_0$. In practice, however, these population variances are typically unknown so they must be estimated. As before, they can be estimated using the sample variances, $s_m^2$ and $s_w^2$, where $s_m^2$ is defined as in Equation (3.7), except that the statistic is computed only for the men in the sample, and $s_w^2$ is defined similarly for the women. Thus the standard error of $\bar{Y}_m - \bar{Y}_w$ is

$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_w^2}{n_w}}$. (3.19)

The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean, by subtracting the null hypothesized value of $\mu_m - \mu_w$ from the estimator $\bar{Y}_m - \bar{Y}_w$ and dividing the result by the standard error of $\bar{Y}_m - \bar{Y}_w$:

$t = \dfrac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$ (t-statistic for comparing two means). (3.20)

If both $n_m$ and $n_w$ are large, then this t-statistic has a standard normal distribution under the null hypothesis.

Because the t-statistic in Equation (3.20) has a standard normal distribution under the null hypothesis when $n_m$ and $n_w$ are large, the p-value of the two-sided test is computed exactly as it was in the case of a single population; that is, the p-value is computed using Equation (3.14).

To conduct a test with a prespecified significance level, simply calculate the t-statistic in Equation (3.20) and compare it to the appropriate critical value. For example, the null hypothesis is rejected at the 5% significance level if the absolute value of the t-statistic exceeds 1.96.

If the alternative is one-sided rather than two-sided (that is, if the alternative is that $\mu_m - \mu_w > d_0$), then the test is modified as outlined in Section 3.2. The p-value is computed using Equation (3.17), and a test with a 5% significance level rejects when $t > 1.65$.
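A minimal sketch of the test in Equations (3.19) and (3.20), together with the large-sample p-values from Equations (3.14) and (3.17), is given below. The function name `two_sample_t` is a hypothetical choice, and the inputs are the 2004 summary statistics for men and women quoted in this chapter's box on the gender gap in earnings; any other group means, standard deviations, and sample sizes could be substituted.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_t(mean_m, s_m, n_m, mean_w, s_w, n_w, d0=0.0):
    """Large-sample test of H0: mu_m - mu_w = d0 from summary statistics."""
    se = sqrt(s_m ** 2 / n_m + s_w ** 2 / n_w)           # Equation (3.19)
    t = (mean_m - mean_w - d0) / se                      # Equation (3.20)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))     # Equation (3.14)
    p_one_sided = 1 - NormalDist().cdf(t)                # Equation (3.17), H1: difference > d0
    return se, t, p_two_sided, p_one_sided

# 2004 summary statistics (hourly earnings, men then women) as quoted in the chapter.
se, t, p2, p1 = two_sample_t(21.99, 10.39, 1901, 18.47, 8.16, 1739)
print(f"SE = {se:.2f}, t = {t:.2f}, two-sided p = {p2:.4f}, one-sided p = {p1:.4f}")
print("Reject H0 at the 5% level (two-sided):", abs(t) > 1.96)
```

Because the function works from summary statistics alone, it relies entirely on the large-sample normal approximation.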
Confidence Intervals for the Difference Between Two Population Means
The method for constructing confidence intervals summarized in Section 3.3 extends to constructing a confidence interval for the difference between the means, $d = \mu_m - \mu_w$. Because the hypothesized value $d_0$ is rejected at the 5% level if $|t| > 1.96$, $d_0$ will be in the confidence set if $|t| \le 1.96$. But $|t| \le 1.96$ means that the estimated difference, $\bar{Y}_m - \bar{Y}_w$, is less than 1.96 standard errors away from $d_0$. Thus, the 95% two-sided confidence interval for d consists of those values of d within $\pm 1.96$ standard errors of $\bar{Y}_m - \bar{Y}_w$:

95% confidence interval for $d = \mu_m - \mu_w$ is $(\bar{Y}_m - \bar{Y}_w) \pm 1.96\,SE(\bar{Y}_m - \bar{Y}_w)$. (3.21)

With these formulas in hand, the box "The Gender Gap of Earnings of College Graduates in the United States" contains an empirical investigation of gender differences in earnings of U.S. college graduates.

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to a treatment group, which receives the experimental treatment, or to a control group, which does not receive the treatment. The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.

The Causal Effect as a Difference of Conditional Expectations
The causal effect of a treatment is the expected effect on the outcome of interest of the treatment as measured in an ideal randomized controlled experiment. This effect can be expressed as the difference of two conditional expectations. Specifically, the causal effect on Y of treatment level x is the difference in the conditional expectations, $E(Y \mid X = x) - E(Y \mid X = 0)$, where $E(Y \mid X = x)$ is the expected value of Y for the treatment group (which receives treatment level X = x) in an ideal randomized controlled experiment and $E(Y \mid X = 0)$ is the expected value of Y for the control group (which receives treatment level X = 0). In the context of experiments, the causal effect is also called the treatment effect. If there are only two treatment levels (that is, if the treatment is binary), then we can let X = 0 denote the control group and X = 1 denote the treatment group. If the treatment is binary, then the causal effect (that is, the treatment effect) is $E(Y \mid X = 1) - E(Y \mid X = 0)$ in an ideal randomized controlled experiment.
BOX: The Gender Gap of Earnings of College Graduates in the United States

The box in Chapter 2, "The Distribution of Earnings in the United States in 2004," shows that, on average, male college graduates earn more than female college graduates. What are the recent trends in this "gender gap" in earnings? Social norms and laws governing gender discrimination in the workplace have changed substantially in the United States. Is the gender gap in earnings of college graduates stable or has it diminished over time?

Table 3.1 gives estimates of hourly earnings for college-educated full-time workers aged 25-34 in the United States in 1992, 1996, 2000, and 2004, using data collected by the Current Population Survey. Earnings for 1992, 1996, and 2000 were adjusted for inflation by putting them in 2004 dollars using the Consumer Price Index.¹ In 2004, the average hourly earnings of the 1,901 men surveyed was $21.99, and the standard deviation of earnings for men was $10.39. The average hourly earnings in 2004 of the 1,739 women surveyed was $18.47, and the standard deviation of earnings was $8.16. Thus the estimate of the gender gap in earnings for 2004 is $3.52 (= $21.99 - $18.47), with a standard error of $0.31 ($= \sqrt{10.39^2/1901 + 8.16^2/1739}$). The 95% confidence interval for the gender gap in earnings in 2004 is $3.52 \pm 1.96 \times 0.31 = (\$2.91, \$4.13)$.

TABLE 3.1 Trends in Hourly Earnings in the United States of Working College Graduates, Ages 25-34, 1992 to 2004, in 2004 Dollars

              Men                     Women                 Difference, Men vs. Women
Year    Ym-bar   s_m     n_m     Yw-bar   s_w     n_w     Ym-bar - Yw-bar   SE     95% Confidence Interval for d
1992    20.33    8.70    1592    17.60    6.90    1370    2.73**            0.29   2.16-3.30
1996    19.52    8.48    1377    16.72    7.03    1235    2.80**            0.30   2.22-3.40
2000    21.77   10.00    1300    18.21    8.20    1182    3.56**            0.37   2.83-4.29
2004    21.99   10.39    1901    18.47    8.16    1739    3.52**            0.31   2.91-4.13

These estimates are computed using data on all full-time workers aged 25-34 surveyed in the Current Population Survey conducted in March of the next year (for example, the data for 2004 were collected in March 2005). The difference is significantly different from zero at the **1% significance level.

The results in Table 3.1 suggest four conclusions. First, the gender gap is large. An hourly gap of $3.52 might not sound like much, but over a year it adds up to $7,040, assuming a 40-hour work week and 50 paid weeks per year. Second, the estimated gender gap has increased by $0.79/hour in real terms over this sample, from $2.73/hour to $3.52/hour; however, this increase is not statistically significant at the 5% significance level (Exercise 3.17). Third, this gap is large if it is measured instead in percentage terms: According to the estimates in Table 3.1, in 2004 women earned 16% less per hour than men did ($3.52/$21.99), more than the gap of 13% seen in 1992 ($2.73/$20.33). Fourth, the gender gap is smaller for young college graduates (the group analyzed in Table 3.1) than it is for all college graduates (analyzed in Table 2.4): As reported in Table 2.4, the mean earnings for all college-educated women working full-time in 2004 was $21.12, while for men this mean was $27.83, which corresponds to a gender gap of 24% [= (27.83 - 21.12)/27.83] among all full-time college-educated workers.

This empirical analysis documents that the "gender gap" in hourly earnings is large and has been fairly stable (or perhaps increased slightly) over the recent past. The analysis does not, however, tell us why this gap exists. Does it arise from gender discrimination in the labor market? Does it reflect differences in skills, experience, or education between men and women? Does it reflect differences in choice of jobs? Or is there some other cause? We return to these questions once we have in hand the tools of multiple regression analysis, the topic of Part II.

¹Because of inflation, a dollar in 1992 was worth more than a dollar in 2004, in the sense that a dollar in 1992 could buy more goods and services than a dollar in 2004 could. Thus earnings in 1992 cannot be directly compared to earnings in 2004 without adjusting for inflation. One way to make this adjustment is to use the Consumer Price Index (CPI), a measure of the price of a "market basket" of consumer goods and services constructed by the Bureau of Labor Statistics. Over the twelve years from 1992 to 2004, the price of the CPI market basket rose by 34.6%; in other words, the CPI basket of goods and services that cost $100 in 1992 cost $134.60 in 2004. To make earnings in 1992 and 2004 comparable in Table 3.1, 1992 earnings are inflated by the amount of overall CPI price inflation, that is, by multiplying 1992 earnings by 1.346 to put them into "2004 dollars."

Estimation of the Causal Effect Using Differences of Means
If the treatment in a randomized controlled experiment is binary, then the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups. The hypothesis that the treatment is ineffective is equivalent to the hypothesis that the two means are the same, which can be tested using the t-statistic for comparing two means, given in Equation (3.20). A 95% confidence interval for the difference in the means of the two groups is a 95% confidence interval for the causal effect, so a 95% confidence interval for the causal effect can be constructed using Equation (3.21).

A well-designed, well-run experiment can provide a compelling estimate of a causal effect. For this reason, randomized controlled experiments are commonly conducted in some fields, such as medicine. In economics, however, experiments tend to be expensive, difficult to administer, and, in some cases, ethically questionable, so they remain rare. For this reason, econometricians sometimes study "natural experiments," also called quasi-experiments, in which some event unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment.
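The differences-of-means estimator for a binary treatment, with the interval from Equation (3.21), can be sketched from raw outcome data as follows. The function name `difference_in_means` and the two short outcome lists are hypothetical. Eight observations per group is far too few for the large-sample normal approximation to be trustworthy, so the numbers only illustrate the mechanics.

```python
from math import sqrt
from statistics import mean, variance

def difference_in_means(treated, control, crit=1.96):
    """Estimate the treatment effect and its large-sample 95% confidence interval."""
    diff = mean(treated) - mean(control)
    # statistics.variance divides by n - 1, matching the sample variance s^2.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se, (diff - crit * se, diff + crit * se)

# Hypothetical outcomes for the treatment group (X = 1) and control group (X = 0).
treated = [12.0, 15.0, 14.0, 16.0, 13.0, 17.0, 15.0, 14.0]
control = [11.0, 12.0, 10.0, 13.0, 12.0, 11.0, 14.0, 13.0]

diff, se, ci = difference_in_means(treated, control)
print(f"Estimated effect = {diff:.2f}, SE = {se:.3f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The hypothesis that the treatment is ineffective corresponds to checking whether zero lies inside the returned interval.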
..2 through 3. it can be unrcli able if /I i..." provides an exam ple of such a quasiexperime nt thai yielded some surpri si ng conclusions.10). which Hpplies wh en th e ~ am plc size is large... using dala Yl ..tie is reliable for 11 wide range of distributions of Y if II is large. .22) where ~~ is gi\'en in Equation (3. The formula for thi s stlltiSlic is given by Equation (3. however. tfUt' [see Equation (3..LI. Y" .: in \\hich .. The exact distribution of the fstausticdepcnd. under ge neral wndilions the !Stalistie h~s a (tan dilrd normal distribution if the sampk size is large and the null h ypo t hesi~ i::.1 degrees of freedom.O . and it can he \'cry complicated. the finitesample dil3tribution: see Section 2.2.:!" the sample siLe b small . sr·/n (3. the population d ist ribution is ilself normally distributed.88 CHAPTER 3 Review 01 Skltistic~ unrclnted 10 the treat ment or subject characteristit:S has the effeci of a&. If.. 3.6 Using the tStatistic When the Sample Size Is Small 1n Sl:ctions 3.Substitution of the laller expression into the formur yields the fur mula for the {statistic:  y V J. . "A No\cI Way to Boosl Retirement Savings.Although the standard normal approximation III thl! Ista~ tl. the (stalistic is used in conjunct ion with critical va l ues from the standard nonnal distribution for hypothesis It!lIli ng and for the con struction of confidence intervals. 'fbcre is.ign ing dif ferent lrenlmcnLS to different subjects as if they had been part of a randomi/. As discussed in Section 3. The box . 7). the standard normal distribution can provide (j poor approximation to t he distribu tion of the {slatistic.5. however.6) of thl. hypoth es i ~ Co nsider the {·statistic used 10 test the thai the mean of Y is . onc special en. small. where the standa rd error of Y is given hy Equat ion (3.~.. and critical values can be taken from the St udent (distribu tion.B).1 2)j . The usc of t he standard norma l distributio n is justified by the ccntra l limi! 
theorem . Wh!.!' (~st at i stic testing the mean of a si ngle population i!> the Student f distribution with /I . then the exact distribution (that is. The tStatistic and the Student t Distribution The tstatistic testing the mean .'i on the di~tribution or Y.um.cd controlled experime nt.
To verify this result, recall from Section 2.4 that the Student t distribution with n - 1 degrees of freedom is defined to be the distribution of $Z/\sqrt{W/(n-1)}$, where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with n - 1 degrees of freedom, and Z and W are independently distributed. When $Y_1, \ldots, Y_n$ are i.i.d. and the population distribution of Y is $N(\mu_Y, \sigma_Y^2)$, the t-statistic can be written as such a ratio. Specifically, let $Z = (\bar{Y} - \mu_{Y,0})/\sqrt{\sigma_Y^2/n}$ and let $W = (n-1)s_Y^2/\sigma_Y^2$; then some algebra¹ shows that the t-statistic in Equation (3.22) can be written as $t = Z/\sqrt{W/(n-1)}$. Recall from Section 2.4 that if $Y_1, \ldots, Y_n$ are i.i.d. and the population distribution of Y is $N(\mu_Y, \sigma_Y^2)$, then the sampling distribution of $\bar{Y}$ is exactly $N(\mu_Y, \sigma_Y^2/n)$ for all n; thus, if the null hypothesis $\mu_Y = \mu_{Y,0}$ is correct, then $Z = (\bar{Y} - \mu_{Y,0})/\sqrt{\sigma_Y^2/n}$ has a standard normal distribution for all n. In addition, $W = (n-1)s_Y^2/\sigma_Y^2$ has a $\chi^2_{n-1}$ distribution for all n, and $\bar{Y}$ and $s_Y^2$ are independently distributed. It follows that, if the population distribution of Y is normal, then under the null hypothesis the t-statistic given in Equation (3.22) has an exact Student t distribution with n - 1 degrees of freedom.

¹The desired expression is obtained by multiplying and dividing by $\sqrt{\sigma_Y^2}$ and collecting terms:
$t = \dfrac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2/n}} = \dfrac{\bar{Y} - \mu_{Y,0}}{\sqrt{\sigma_Y^2/n}} \div \sqrt{\dfrac{s_Y^2}{\sigma_Y^2}} = Z \div \sqrt{\dfrac{W}{n-1}}$.

If the population distribution is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals. As an example, consider a hypothetical problem in which $t^{act} = 2.15$ and n = 20 so that the degrees of freedom is n - 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the $t_{19}$ distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for $\mu_Y$, constructed using the $t_{19}$ distribution, would be $\bar{Y} \pm 2.09\,SE(\bar{Y})$. This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96.

The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.
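Returning to the single-mean example with $t^{act} = 2.15$ and 19 degrees of freedom, the sketch below compares the two-sided p-value under the Student t distribution with the one under the standard normal distribution. Python's standard library has no Student t distribution function, so this illustration integrates the Student t density numerically with Simpson's rule; the function names and the number of integration steps are hypothetical choices, not a recommended implementation.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def student_t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def student_t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value, Pr(|T| > |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps                               # steps must be even
    total = student_t_pdf(0.0, df) + student_t_pdf(b, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2              # Simpson weights 4, 2, 4, ...
        total += weight * student_t_pdf(k * h, df)
    central_half = total * h / 3                # Pr(0 < T < |t|)
    return 1 - 2 * central_half

p_t = student_t_two_sided_p(2.15, 19)
p_normal = 2 * (1 - NormalDist().cdf(2.15))
print(f"Student t(19) p-value: {p_t:.4f}; standard normal p-value: {p_normal:.4f}")
```

Both p-values fall below 0.05 here, but the Student t p-value is the larger of the two, which mirrors the wider confidence interval obtained with the critical value 2.09 rather than 1.96.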
BOX: A Novel Way to Boost Retirement Savings

Many economists think that workers tend not to save enough for their retirement. Conventional methods for encouraging retirement savings focus on financial incentives. Recently, however, economists have increasingly observed that behavior is not always in accord with conventional economic models. As a consequence, there has been an upsurge in interest in unconventional ways to influence economic decisions.

In an important study published in 2001, Brigitte Madrian and Dennis Shea considered one such unconventional method for stimulating retirement savings. Many firms offer retirement savings plans in which the firm matches, in full or in part, savings taken out of the paycheck of participating employees. Enrollment in such plans, called 401(k) plans after the applicable section of the U.S. tax code, is always optional. However, at some firms employees are automatically enrolled in such a plan unless they choose to opt out; at other firms employees are enrolled only if they choose to opt in. According to conventional economic models of behavior, the method of enrollment (opt out or opt in) should scarcely matter: An employee who wants to change his or her enrollment status simply fills out a form, and the dollar value of the time required to fill out the form is very small compared with the financial implications of this decision. But, Madrian and Shea wondered, could this conventional reasoning be wrong? Does the method of enrollment in a savings plan directly affect its enrollment rate?

To measure the effect of the method of enrollment, Madrian and Shea studied a large firm that changed the default option for its 401(k) plan from nonparticipation to participation. They compared two groups of workers: those hired the year before the change and not automatically enrolled (but could opt in), and those hired in the year after the change and automatically enrolled (but could opt out). The financial aspects of the plan were the same. Madrian and Shea argued that there were no systematic differences between the workers hired before and after the change in the enrollment default. Thus, from an econometrician's perspective, the change was like a randomly assigned treatment and the causal effect of the change could be estimated by the difference in means between the two groups.

Madrian and Shea found that the default enrollment rule made a huge difference: The enrollment rate for the "opt-in" (control) group was 37.4% (n = 4249), whereas the enrollment rate for the "opt-out" (treatment) group was 85.9% (n = 5801). The estimate of the treatment effect is 48.5% (= 85.9% - 37.4%). Because their sample is large, the 95% confidence interval for the treatment effect is tight (46.8% to 50.2%).

To economists sympathetic to the conventional view that the default enrollment scheme should not matter, Madrian and Shea's finding was astonishing. One potential explanation for their finding is that many workers find these plans so confusing that they simply treat the default option as if it were reliable advice; another explanation is that young workers would simply rather not think about aging and retirement. Although neither explanation is economically rational in a conventional sense, both are consistent with the predictions of "behavioral economics," and both could lead to accepting the default enrollment option. Increasingly many economists are starting to think that such details might be as important as financial aspects for boosting enrollment in retirement savings plans.

To learn more about behavioral economics and the design of retirement savings plans, see Thaler and Benartzi (2004).
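The interval reported in the box can be checked with a short calculation. The sketch below assumes that enrollment is a 0/1 outcome, so that the sample variance in each group is approximately p(1 - p), where p is the group's enrollment rate; this variance shortcut and the function name `proportion_difference_ci` are assumptions made for illustration, not details given by the study.

```python
from math import sqrt

def proportion_difference_ci(p_treat, n_treat, p_ctrl, n_ctrl, crit=1.96):
    """Difference in enrollment rates with a large-sample 95% confidence interval."""
    diff = p_treat - p_ctrl
    # For a binary outcome the sample variance is approximately p * (1 - p).
    se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff, se, (diff - crit * se, diff + crit * se)

# Enrollment rates and sample sizes quoted in the box (opt-out group, then opt-in group).
diff, se, (lo, hi) = proportion_difference_ci(0.859, 5801, 0.374, 4249)
print(f"Treatment effect = {diff:.1%}, 95% CI = ({lo:.1%}, {hi:.1%})")
```

Under these assumptions the calculation agrees with the reported estimate of 48.5% and the interval of 46.8% to 50.2%.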
A modified version of the differences-of-means t-statistic, based on a different standard error formula (the "pooled" standard error formula), has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19), so that the two groups are denoted as m and w. The pooled variance estimator is

$s_{pooled}^2 = \dfrac{1}{n_m + n_w - 2}\left[\sum_{i \in \text{group } m} (Y_i - \bar{Y}_m)^2 + \sum_{i \in \text{group } w} (Y_i - \bar{Y}_w)^2\right]$, (3.23)

where the first summation is for the observations in group m and the second summation is for the observations in group w. The pooled standard error of the difference in means is $SE_{pooled}(\bar{Y}_m - \bar{Y}_w) = s_{pooled} \times \sqrt{1/n_m + 1/n_w}$, and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error, $SE_{pooled}(\bar{Y}_m - \bar{Y}_w)$.

If the population distribution of Y in group m is $N(\mu_m, \sigma_m^2)$, if the population distribution of Y in group w is $N(\mu_w, \sigma_w^2)$, and if the two group variances are the same (that is, $\sigma_m^2 = \sigma_w^2$), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with $n_m + n_w - 2$ degrees of freedom.

The drawback of using the pooled variance estimator $s_{pooled}^2$ is that it applies only if the two population variances are the same (assuming $n_m \neq n_w$). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.
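To see how much the two standard error formulas can differ, the sketch below computes both the standard error of Equation (3.19) and the pooled standard error based on Equation (3.23). The function names and all four data lists are hypothetical: the first pair has unequal sample sizes and very different spreads, while the second pair has equal sample sizes.

```python
from math import sqrt
from statistics import mean, variance

def unpooled_se(a, b):
    """Standard error allowing different group variances, Equation (3.19)."""
    return sqrt(variance(a) / len(a) + variance(b) / len(b))

def pooled_se(a, b):
    """Pooled standard error built from the pooled variance, Equation (3.23)."""
    ss_a = sum((y - mean(a)) ** 2 for y in a)
    ss_b = sum((y - mean(b)) ** 2 for y in b)
    s2_pooled = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return sqrt(s2_pooled) * sqrt(1 / len(a) + 1 / len(b))

# Hypothetical groups: a small, widely spread group and a larger, tightly clustered one.
group_m = [10.0, 30.0, 20.0, 40.0]
group_w = [19.0, 20.0, 21.0, 20.0, 19.0, 21.0, 20.0, 20.0]
print(f"Unequal n: unpooled = {unpooled_se(group_m, group_w):.4f}, "
      f"pooled = {pooled_se(group_m, group_w):.4f}")

# Hypothetical groups with the same number of observations.
equal_a = [1.0, 2.0, 3.0, 6.0]
equal_b = [2.0, 2.0, 3.0, 3.0]
print(f"Equal n:   unpooled = {unpooled_se(equal_a, equal_b):.4f}, "
      f"pooled = {pooled_se(equal_a, equal_b):.4f}")
```

With unequal sample sizes and unequal spreads the pooled formula gives a noticeably different (here smaller) standard error, whereas with equal sample sizes the two formulas coincide.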
Use of the Student t Distribution in Practice
For the problem of testing the mean of Y, the Student t distribution is applicable if the underlying population distribution of Y is normal. For economic variables, however, normal distributions are the exception (for example, see the boxes in Chapter 2, "The Distribution of Earnings in the United States in 2004" and "A Bad Day on Wall Street"). Even if the underlying data are not normally distributed, the normal approximation to the distribution of the t-statistic is valid if the sample size is large. Therefore, inferences (hypothesis tests and confidence intervals) about the mean of a distribution should be based on the large-sample normal approximation.

When comparing two means, any economic reason for two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate and the correct standard error formula, which allows for different group variances, is as given in Equation (3.19). Even if the population distributions are normal, the t-statistic computed using the standard error formula in Equation (3.19) does not have a Student t distribution. In practice, therefore, inferences about differences in means should be based on Equation (3.19), used in conjunction with the large-sample standard normal approximation.

Even though the Student t distribution is rarely applicable in economics, some software uses the Student t distribution to compute p-values and confidence intervals. In practice, this does not pose a problem because the difference between the Student t distribution and the standard normal distribution is negligible if the sample size is large. For n > 15, the difference in the p-values computed using the Student t and standard normal distributions never exceeds 0.01; for n > 80, it never exceeds 0.002. In most modern applications, and in all applications in this textbook, the sample sizes are in the hundreds or thousands, large enough for the difference between the Student t distribution and the standard normal distribution to be negligible.

3.7 Scatterplots, the Sample Covariance, and the Sample Correlation

What is the relationship between age and earnings? This question, like many others, relates one variable, X (age), to another, Y (earnings). This section reviews three ways to summarize the relationship between variables: the scatterplot, the sample covariance, and the sample correlation coefficient.
Scatterplots

A scatterplot is a plot of n observations on \(X_i\) and \(Y_i\), in which each observation is represented by the point \((X_i, Y_i)\). For example, Figure 3.2 is a scatterplot of age (X) and hourly earnings (Y) for a sample of 200 workers in the information industry from the March 2005 CPS. Each dot in Figure 3.2 corresponds to an (X, Y) pair for one of the observations. For example, one of the workers in this sample is 40 years old and earns $31.25 per hour; this worker's age and earnings are indicated by the colored dot in Figure 3.2. The scatterplot shows a positive relationship between age and earnings in this sample: Older workers tend to earn more than younger workers. This relationship is not exact, however, and earnings could not be predicted perfectly using only a person's age.

FIGURE 3.2 Scatterplot of Average Hourly Earnings vs. Age
[Figure: vertical axis, average hourly earnings; horizontal axis, age.] Each point in the plot represents the age and average earnings of one of the 200 workers in the sample. The colored dot corresponds to a 40-year-old worker who earns $31.25 per hour. The data are for technicians in the information industry from the March 2005 CPS.
Sample Covariance and Correlation

The covariance and correlation were introduced in Section 2.3 as two properties of the joint probability distribution of the random variables X and Y. Because the population distribution is unknown, in practice we do not know the population covariance or correlation. The population covariance and correlation can, however, be estimated by taking a random sample of n members of the population and collecting the data \((X_i, Y_i)\), \(i = 1, \ldots, n\).

The sample covariance and correlation are estimators of the population covariance and correlation. Like the estimators discussed previously in this chapter, they are computed by replacing a population average (the expectation) with a sample average. The sample covariance, denoted \(s_{XY}\), is

\[ s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}). \tag{3.24} \]

Like the sample variance, the average in Equation (3.24) is computed by dividing by n − 1 instead of n; here, too, this difference stems from using \(\bar{X}\) and \(\bar{Y}\) to estimate the respective population means. When n is large, it makes little difference whether division is by n or n − 1.

The sample correlation coefficient, or sample correlation, is denoted \(r_{XY}\) and is the ratio of the sample covariance to the product of the sample standard deviations:

\[ r_{XY} = \frac{s_{XY}}{s_X s_Y}. \tag{3.25} \]

The sample correlation measures the strength of the linear association between X and Y in a sample of n observations. Like the population correlation, the sample correlation is unitless and lies between −1 and 1: \(|r_{XY}| \le 1\).

The sample correlation equals 1 if \(X_i = Y_i\) for all i and equals −1 if \(X_i = -Y_i\) for all i. More generally, the correlation is ±1 if the scatterplot is a straight line. If the line slopes upward, then there is a positive relationship between X and Y and the correlation is 1. If the line slopes down, then there is a negative relationship and the correlation is −1. The closer the scatterplot is to a straight line, the closer the correlation is to ±1. A high correlation coefficient does not necessarily mean that the line has a steep slope; rather, it means that the points in the scatterplot fall very close to a straight line.

Consistency of the sample covariance and correlation. Like the sample variance, the sample covariance is consistent. That is,

\[ s_{XY} \xrightarrow{p} \sigma_{XY}. \tag{3.26} \]

In other words, in large samples the sample covariance is close to the population covariance with high probability.

The proof of the result in Equation (3.26) under the assumption that \((X_i, Y_i)\) are i.i.d. and that \(X_i\) and \(Y_i\) have finite fourth moments is similar to the proof in Appendix 3.3 that the sample variance is consistent, and is left as an exercise (Exercise 3.20).

Because the sample variance and sample covariance are consistent, the sample correlation coefficient is consistent, that is, \(r_{XY} \xrightarrow{p} \mathrm{corr}(X_i, Y_i)\).

Example. As an example, consider the data on age and earnings in Figure 3.2. For these 200 workers, the sample standard deviation of age is \(s_A = 10.75\) years and the sample standard deviation of earnings is \(s_E = \$13.79\)/hour. The covariance between age and earnings is \(s_{AE} = 37.01\) (the units are years × dollars per hour, not readily interpretable). Thus, the correlation coefficient is \(r_{AE} = 37.01/(10.75 \times 13.79) = 0.25\), or 25%. The correlation of 0.25 means that there is a positive relationship between age and earnings, but as is evident in the scatterplot, this relationship is far from perfect.

To verify that the correlation does not depend on the units of measurement, suppose that earnings had been reported in cents, in which case the sample standard deviation of earnings is 1379¢/hour and the covariance between age and earnings is 3701 (units are years × cents/hour); then the correlation is \(3701/(10.75 \times 1379) = 0.25\), or 25%.

Figure 3.3 gives additional examples of scatterplots and correlation. Figure 3.3a shows a strong positive linear relationship between these variables, and the sample correlation is 0.9. Figure 3.3b shows a strong negative relationship with a sample correlation of −0.8. Figure 3.3c shows a scatterplot with no evident relationship, and the sample correlation is zero. Figure 3.3d shows a clear relationship: As X increases, Y initially increases but then decreases. Despite this discernible relationship between X and Y, the sample correlation is zero; the reason is that, for these data, small values of Y are associated with both large and small values of X.

This final example emphasizes an important point: The correlation coefficient is a measure of linear association. There is a relationship in Figure 3.3d, but it is not linear.
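The definitions in Equations (3.24) and (3.25) can be sketched in a few lines of Python. This is only an illustration: the function names and the small data sets are hypothetical (they are not the CPS sample), and the sketch simply checks three points made in the text, namely that the correlation is unchanged when earnings are converted from dollars to cents, that a purely quadratic relationship can have zero sample correlation, and that the reported summary statistics imply a correlation of about 0.25.

```python
import math

def sample_cov(x, y):
    """Sample covariance, Equation (3.24): divide by n - 1."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation, Equation (3.25): s_XY / (s_X * s_Y)."""
    s_x = math.sqrt(sample_cov(x, x))  # sample standard deviation of x
    s_y = math.sqrt(sample_cov(y, y))  # sample standard deviation of y
    return sample_cov(x, y) / (s_x * s_y)

# Hypothetical ages and hourly earnings (illustrative values only).
age = [25, 32, 40, 47, 55, 61]
earn_dollars = [14.0, 22.5, 31.25, 19.0, 28.0, 24.5]
earn_cents = [100 * e for e in earn_dollars]

r_dollars = sample_corr(age, earn_dollars)
r_cents = sample_corr(age, earn_cents)    # same value: r is unitless

# A nonlinear (quadratic) relationship: Y rises and then falls as X increases.
x_quad = [-2, -1, 0, 1, 2]
y_quad = [-(v ** 2) for v in x_quad]
r_quadratic = sample_corr(x_quad, y_quad)  # zero despite a clear relationship

# The arithmetic of the worked example, from the reported summary statistics.
r_text_dollars = 37.01 / (10.75 * 13.79)
r_text_cents = 3701 / (10.75 * 1379)

print(r_dollars, r_cents, r_quadratic, r_text_dollars, r_text_cents)
```

Dividing the covariance by both standard deviations is what removes the units: multiplying earnings by 100 scales the numerator and the denominator by the same factor.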
FIGURE 3.3 Scatterplots for Four Hypothetical Data Sets
[Figure: four panels. (a) Correlation = +0.9; (b) Correlation = −0.8; (c) Correlation = 0.0; (d) Correlation = 0.0 (quadratic).] The scatterplots in Figures 3.3a and 3.3b show strong linear relationships between X and Y. In Figure 3.3c, X is independent of Y and the two variables are uncorrelated. In Figure 3.3d, the two variables also are uncorrelated even though they are related nonlinearly.

Summary

1. The sample average, \(\bar{Y}\), is an estimator of the population mean, \(\mu_Y\). When \(Y_1, \ldots, Y_n\) are i.i.d.,
   a. the sampling distribution of \(\bar{Y}\) has mean \(\mu_Y\) and variance \(\sigma^2_{\bar{Y}} = \sigma^2_Y/n\);
   b. \(\bar{Y}\) is unbiased;
   c. by the law of large numbers, \(\bar{Y}\) is consistent; and
   d. by the central limit theorem, \(\bar{Y}\) has an approximately normal sampling distribution when the sample size is large.
2. The t-statistic is used to test the null hypothesis that the population mean takes on a particular value. If n is large, the t-statistic has a standard normal sampling distribution when the null hypothesis is true.
3. The t-statistic can be used to calculate the p-value associated with the null hypothesis. A small p-value is evidence that the null hypothesis is false.
4. A 95% confidence interval for \(\mu_Y\) is an interval constructed so that it contains the true value of \(\mu_Y\) in 95% of repeated samples.
5. Hypothesis tests and confidence intervals for the difference in the means of two populations are conceptually similar to tests and intervals for the mean of a single population.
6. The sample correlation coefficient is an estimator of the population correlation coefficient and measures the linear relationship between two variables, that is, how well their scatterplot is approximated by a straight line.

Key Terms

estimator
estimate
bias, consistency, and efficiency
BLUE
least squares estimator
hypothesis test
null and alternative hypotheses
two-sided alternative hypothesis
p-value (significance probability)
sample variance
sample standard deviation
degrees of freedom
standard error of an estimator
t-statistic (t-ratio)
test statistic
type I error
type II error
significance level
critical value
rejection region
acceptance region
size of a test
power
one-sided alternative hypothesis
confidence set
confidence level
confidence interval
coverage probability
test for the difference between two means
causal effect
treatment effect
scatterplot
sample covariance
sample correlation coefficient (sample correlation)
Review the Concepts

3.1 Explain the difference between the sample average \(\bar{Y}\) and the population mean.

3.2 Explain the difference between an estimator and an estimate. Provide an example of each.

3.3 A population distribution has a mean of 10 and a variance of 16. Determine the mean and variance of \(\bar{Y}\) from an i.i.d. sample from this population for (a) n = 10; (b) n = 100; and (c) n = 1000. Relate your answers to the law of large numbers.

3.4 What role does the central limit theorem play in statistical hypothesis testing? In the construction of confidence intervals?

3.5 What is the difference between a null and alternative hypothesis? Among size, significance level, and power? Between a one-sided and two-sided alternative hypothesis?

3.6 Why does a confidence interval contain more information than the result of a single hypothesis test?

3.7 Explain why the differences-of-means estimator, applied to data from a randomized controlled experiment, is an estimator of the treatment effect.

3.8 Sketch a hypothetical scatterplot for a sample of size 10 for two random variables with a population correlation of (a) 1.0; (b) −1.0; (c) 0.9; (d) −0.5; (e) 0.0.

Exercises

3.1 In a population, \(\mu_Y = 100\) and \(\sigma^2_Y = 43\). Use the central limit theorem to answer the following questions:
   a. In a random sample of size n = 100, find \(\Pr(\bar{Y} < 101)\).
   b. In a random sample of size n = 64, find \(\Pr(101 < \bar{Y} < 103)\).
   c. In a random sample of size n = 165, find \(\Pr(\bar{Y} > 98)\).

3.2 Let Y be a Bernoulli random variable with success probability \(\Pr(Y = 1) = p\), and let \(Y_1, \ldots, Y_n\) be i.i.d. draws from this distribution. Let \(\hat{p}\) be the fraction of successes (1s) in this sample.
   a. Show that \(\hat{p} = \bar{Y}\).
   b. Show that \(\hat{p}\) is an unbiased estimator of p.
   c. Show that \(\mathrm{var}(\hat{p}) = p(1 - p)/n\).

3.3 In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters who preferred the incumbent at the time of the survey, and let \(\hat{p}\) be the fraction of survey respondents who preferred the incumbent.
   a. Use the survey results to estimate p.
   b. Use the estimator of the variance of \(\hat{p}\), \(\hat{p}(1 - \hat{p})/n\), to calculate the standard error of your estimator.
   c. What is the p-value for the test \(H_0\): p = 0.5 vs. \(H_1\): p ≠ 0.5?
   d. What is the p-value for the test \(H_0\): p = 0.5 vs. \(H_1\): p > 0.5?
   e. Why do the results from (c) and (d) differ?
   f. Did the survey contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey? Explain.

3.4 Using the data in Exercise 3.3:
   a. Construct a 95% confidence interval for p.
   b. Construct a 99% confidence interval for p.
   c. Why is the interval in (b) wider than the interval in (a)?
   d. Without doing any additional calculations, test the hypothesis \(H_0\): p = 0.50 vs. \(H_1\): p ≠ 0.50 at the 5% significance level.

3.5 A survey of 1055 registered voters is conducted, and the voters are asked to choose between candidate A and candidate B. Let p denote the fraction of voters in the population who prefer candidate A, and let \(\hat{p}\) denote the fraction of voters in the sample who prefer candidate A.
   a. You are interested in the competing hypotheses \(H_0\): p = 0.5 vs. \(H_1\): p ≠ 0.5. Suppose that you decide to reject \(H_0\) if \(|\hat{p} - 0.5| > 0.02\).
      i. What is the size of this test?
      ii. Compute the power of this test if p = 0.53.
   b. In the survey, \(\hat{p} = 0.54\).
      i. Test \(H_0\): p = 0.5 vs. \(H_1\): p ≠ 0.5 using a 5% significance level.
      ii. Test \(H_0\): p = 0.5 vs. \(H_1\): p > 0.5 using a 5% significance level.
      iii. Construct a 95% confidence interval for p.
      iv. Construct a 99% confidence interval for p.
      v. Construct a 50% confidence interval for p.
   c. Suppose that the survey is carried out 20 times, using independently selected voters in each survey. For each of these 20 surveys, a 95% confidence interval for p is constructed.
      i. What is the probability that the true value of p is contained in all 20 of these confidence intervals?
      ii. How many of these confidence intervals do you expect to contain the true value of p?
   d. In survey jargon, the "margin of error" is \(1.96 \times SE(\hat{p})\); that is, it is one-half the length of the 95% confidence interval. Suppose you wanted to design a survey that had a margin of error of at most 1%. That is, you wanted \(\Pr(|\hat{p} - p| > 0.01) \le 0.05\). How large should n be if the survey uses simple random sampling?

3.6 Let \(Y_1, \ldots, Y_n\) be i.i.d. draws from a distribution with mean \(\mu\). A test of \(H_0\): \(\mu = 5\) versus \(H_1\): \(\mu \ne 5\) using the usual t-statistic yields a p-value of 0.03.
   a. Does the 95% confidence interval contain \(\mu = 5\)? Explain.
   b. Can you determine if \(\mu = 6\) is contained in the 95% confidence interval? Explain.

3.7 In a given population, 11% of the likely voters are African American. A survey using a simple random sample of 600 land-line telephone numbers finds 8% African Americans. Is there evidence that the survey is biased? Explain.

3.8 A new version of the SAT test is given to 1000 randomly selected high school seniors. The sample mean test score is 1110 and the sample standard deviation is 123. Construct a 95% confidence interval for the population mean test score for high school seniors.

3.9 Suppose that a lightbulb manufacturing plant produces bulbs with a mean life of 2000 hours and a standard deviation of 200 hours. An inventor claims to have developed an improved process that produces bulbs with a longer mean life and the same standard deviation. The plant manager randomly selects 100 bulbs produced by the process. She says that she will believe the inventor's claim if the sample mean life of the bulbs is greater than 2100 hours; otherwise, she will conclude that the new process is no better than the old process. Let \(\mu\) denote the mean of the new process. Consider the null and alternative hypotheses \(H_0\): \(\mu = 2000\) vs. \(H_1\): \(\mu > 2000\).
   a. What is the size of the plant manager's testing procedure?
   b. Suppose that the new process is in fact better and has a mean bulb life of 2150 hours. What is the power of the plant manager's testing procedure?
   c. What testing procedure should the plant manager use if she wants the size of her test to be 5%?

3.10 Suppose a new standardized test is given to 100 randomly selected third-grade students in New Jersey. The sample average score \(\bar{Y}\) on the test is 58 points and the sample standard deviation, \(s_Y\), is 8 points.
   a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95% confidence interval for the mean score of all New Jersey third graders.
   b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a sample average of 62 points and sample standard deviation of 11 points. Construct a 90% confidence interval for the difference in mean scores between Iowa and New Jersey.
   c. Can you conclude with a high degree of confidence that the population means for Iowa and New Jersey students are different? (What is the standard error of the difference in the two sample means? What is the p-value of the test of no difference in means versus some difference?)

3.11 Consider the estimator \(\tilde{Y}\), defined in Equation (3.1). Show that (a) \(E(\tilde{Y}) = \mu_Y\) and (b) \(\mathrm{var}(\tilde{Y}) = 1.25\sigma^2_Y/n\).

3.12 To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries follows:

           Average Salary (\(\bar{Y}\))   Standard Deviation (\(s_Y\))   n
   Men     $3100                          $200                           100
   Women   $2900                          $320                           64

   a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and finally use the p-value to answer the question.)
   b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.

3.13 Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield \(\bar{Y} = 646.2\) and standard deviation \(s_Y = 19.5\).
   a. Construct a 95% confidence interval for the mean test score in the population.
   b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

   Class Size   Average Score (\(\bar{Y}\))   Standard Deviation (\(s_Y\))   n
   Small        657.4                         19.4                           238
   Large        650.0                         17.9                           182

   Is there statistically significant evidence that the districts with smaller classes have higher average test scores? Explain.

3.14 Values of height in inches (X) and weight in pounds (Y) are recorded from a sample of 300 male college students. The resulting summary statistics are \(\bar{X} = 70.5\) inches; \(\bar{Y} = 158\) lbs; \(s_X = 1.8\) inches; \(s_Y = 14.2\) lbs; \(s_{XY} = 21.73\) inches × lbs; and \(r_{XY} = 0.85\). Convert these statistics to the metric system (meters and kilograms).

3.15 The CNN/USA Today/Gallup poll conducted on September 3-5, 2004, surveyed 755 likely voters; 405 reported a preference for President George W. Bush, and 350 reported a preference for Senator John Kerry. The CNN/USA Today/Gallup poll conducted on October 1-3, 2004, surveyed 756 likely voters; 378 reported a preference for Bush, and 378 reported a preference for Kerry.
   a. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early September 2004.
   b. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early October 2004.
   c. Was there a statistically significant change in voters' opinions across the two dates?

3.16 Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida; in this sample, the mean is 1013 and the standard deviation (s) is 108.
   a. Construct a 95% confidence interval for the average test score for Florida students.
   b. Is there statistically significant evidence that Florida students perform differently than other students in the United States?
   c. Another 503 students are selected at random from Florida. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.
      i. Construct a 95% confidence interval for the change in average test score associated with the prep course.
      ii. Is there statistically significant evidence that the prep course helped?
   d. The original 453 students are given the prep course and then asked to take the test a second time. The average change in their test scores is 9 points, and the standard deviation of the change is 60 points.
      i. Construct a 95% confidence interval for the change in average test scores.
      ii. Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
      iii. Students may have performed better in their second attempt because of the prep course or because they gained test-taking experience in their first attempt. Describe an experiment that would quantify these two effects.

3.17 Read the box "The Gender Gap in Earnings of College Graduates in the United States."
   a. Construct a 95% confidence interval for the change in men's average hourly earnings between 1992 and 2004.
   b. Construct a 95% confidence interval for the change in women's average hourly earnings between 1992 and 2004.
   c. Construct a 95% confidence interval for the change in the gender gap in average hourly earnings between 1992 and 2004. (Hint: \(\bar{Y}_{m,1992} - \bar{Y}_{w,1992}\) is independent of \(\bar{Y}_{m,2004} - \bar{Y}_{w,2004}\).)

3.18 This exercise shows that the sample variance is an unbiased estimator of the population variance when \(Y_1, \ldots, Y_n\) are i.i.d. with mean \(\mu_Y\) and variance \(\sigma^2_Y\).
   a. Use Equation (2.31) to show that \(E[(Y_i - \bar{Y})^2] = \mathrm{var}(Y_i) - 2\,\mathrm{cov}(Y_i, \bar{Y}) + \mathrm{var}(\bar{Y})\).
   b. Use Equation (2.33) to show that \(\mathrm{cov}(\bar{Y}, Y_i) = \sigma^2_Y/n\).
   c. Use the results in parts (a) and (b) to show that \(E(s^2_Y) = \sigma^2_Y\).

3.19 a. \(\bar{Y}\) is an unbiased estimator of \(\mu_Y\). Is \(\bar{Y}^2\) an unbiased estimator of \(\mu^2_Y\)?
   b. \(\bar{Y}\) is a consistent estimator of \(\mu_Y\). Is \(\bar{Y}^2\) a consistent estimator of \(\mu^2_Y\)?

3.20 Suppose that \((X_i, Y_i)\) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance, that is, \(s_{XY} \xrightarrow{p} \sigma_{XY}\), where \(s_{XY}\) is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy-Schwarz inequality.)

3.21 Show that the pooled standard error \([SE_{pooled}(\bar{Y}_m - \bar{Y}_w)]\) given following Equation (3.23) equals the usual standard error for the difference in means in Equation (3.19) when the two group sizes are the same (\(n_m = n_w\)).

Empirical Exercise

E3.1 On the text Web site www.aw-bc.com/stock_watson you will find a data file CPS92_04 that contains an extended version of the dataset used in Table 3.1 of the text for the years 1992 and 2004. It contains data on full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS92_04_Description, available on the Web site. Use these data to answer the following questions.
   a. Compute the sample mean for average hourly earnings (AHE) in 1992 and in 2004. Construct a 95% confidence interval for the population means of AHE in 1992 and 2004 and the change between 1992 and 2004.
   b. In 2004, the value of the Consumer Price Index (CPI) was 188.9. In 1992, the value of the CPI was 140.3. Repeat (a) but use AHE measured in real 2004 dollars ($2004); that is, adjust the 1992 data for the price inflation that occurred between 1992 and 2004.
   c. If you were interested in the change in workers' purchasing power from 1992 to 2004, would you use the results from (a) or from (b)? Explain.
   d. Use the 2004 data to construct a 95% confidence interval for the mean of AHE for high school graduates. Construct a 95% confidence interval for the mean of AHE for workers with a college degree. Construct a 95% confidence interval for the difference between the two means.
   e. Repeat (d) using the 1992 data expressed in $2004.
   f. Did real (inflation-adjusted) wages of high school graduates increase from 1992 to 2004? Explain. Did real wages of college graduates increase? Did the gap between earnings of college and high school graduates increase? Explain, using appropriate estimates, confidence intervals, and test statistics.
   g. Table 3.1 presents information on the gender gap for college graduates. Prepare a similar table for high school graduates using the 1992 and 2004 data. Are there any notable differences between the results for high school and college graduates?

APPENDIX 3.1 The U.S. Current Population Survey

Each month the Bureau of Labor Statistics in the U.S. Department of Labor conducts the "Current Population Survey" (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. More than 50,000 U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database of addresses from the most recent decennial census augmented with data on new housing units constructed after the last census. The exact random sampling scheme is rather complicated (first, small geographical areas are randomly selected, then housing units within these areas are randomly selected); details can be found in the Handbook of Labor Statistics and on the Bureau of Labor Statistics Web site (www.bls.gov).

The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The statistics in Table 3.1 were computed using the March surveys. The CPS earnings data are for full-time workers, defined to be somebody employed more than 35 hours per week for at least 48 weeks in the previous year.
APPENDIX 3.2 Two Proofs That \(\bar{Y}\) Is the Least Squares Estimator of \(\mu_Y\)

This appendix provides two proofs, one using calculus and one not, that \(\bar{Y}\) minimizes the sum of squared prediction mistakes in Equation (3.2), that is, that \(\bar{Y}\) is the least squares estimator of E(Y).

Calculus Proof

To minimize the sum of squared prediction mistakes, take its derivative and set it to zero:

\[ \frac{d}{dm} \sum_{i=1}^{n} (Y_i - m)^2 = -2 \sum_{i=1}^{n} (Y_i - m) = -2 \sum_{i=1}^{n} Y_i + 2nm = 0. \tag{3.27} \]

Solving the final equation for m shows that \(\sum_{i=1}^{n} (Y_i - m)^2\) is minimized when \(m = \bar{Y}\).

Noncalculus Proof

The strategy is to show that the difference between the least squares estimator and \(\bar{Y}\) must be zero, from which it follows that \(\bar{Y}\) is the least squares estimator. Let \(d = \bar{Y} - m\), so that \(m = \bar{Y} - d\). Then \((Y_i - m)^2 = (Y_i - [\bar{Y} - d])^2 = ([Y_i - \bar{Y}] + d)^2 = (Y_i - \bar{Y})^2 + 2d(Y_i - \bar{Y}) + d^2\). Thus, the sum of squared prediction mistakes [Equation (3.2)] is

\[ \sum_{i=1}^{n} (Y_i - m)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + 2d \sum_{i=1}^{n} (Y_i - \bar{Y}) + nd^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + nd^2, \tag{3.28} \]

where the second equality uses the fact that \(\sum_{i=1}^{n} (Y_i - \bar{Y}) = 0\). Because both terms in the final line of Equation (3.28) are nonnegative and because the first term does not depend on d, \(\sum_{i=1}^{n} (Y_i - m)^2\) is minimized by choosing d to make the second term, \(nd^2\), as small as possible. This is done by setting d = 0, that is, by setting \(m = \bar{Y}\), so that \(\bar{Y}\) is the least squares estimator of E(Y).
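Both arguments can be checked numerically. The following is a minimal sketch in Python, assuming a small hypothetical data set (the values of `y`, the trial values of `m`, and the grid are illustrative choices, not from the text): it verifies the decomposition in Equation (3.28) for several values of m and confirms by a grid search that the sum of squared prediction mistakes is smallest at the sample average.

```python
def sum_sq(y, m):
    """Sum of squared prediction mistakes for the candidate value m."""
    return sum((yi - m) ** 2 for yi in y)

# Hypothetical observations (illustrative values only).
y = [3.0, 7.0, 8.0, 12.0]
n = len(y)
ybar = sum(y) / n  # sample average, 7.5 for these values

# Equation (3.28): sum (Y_i - m)^2 = sum (Y_i - Ybar)^2 + n * d^2, with d = Ybar - m.
max_gap = 0.0
for m in [0.0, 5.0, 7.5, 9.25]:
    d = ybar - m
    lhs = sum_sq(y, m)
    rhs = sum_sq(y, ybar) + n * d ** 2
    max_gap = max(max_gap, abs(lhs - rhs))

# Grid search over m = 0.00, 0.01, ..., 15.00: the minimizer is the sample average.
grid = [i / 100 for i in range(0, 1501)]
best_m = min(grid, key=lambda m: sum_sq(y, m))

print(ybar, best_m, max_gap)
```

Because the extra term \(nd^2\) is zero only at d = 0, any candidate m other than the sample average strictly increases the sum of squares, which is what the grid search reflects.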
APPENDIX 3.3 A Proof That the Sample Variance Is Consistent

This appendix uses the law of large numbers to prove that the sample variance \(s^2_Y\) is a consistent estimator of the population variance \(\sigma^2_Y\), as stated in Equation (3.9), when \(Y_1, \ldots, Y_n\) are i.i.d. and \(E(Y_i^4) < \infty\).

First, add and subtract \(\mu_Y\) to write \((Y_i - \bar{Y})^2 = [(Y_i - \mu_Y) - (\bar{Y} - \mu_Y)]^2 = (Y_i - \mu_Y)^2 - 2(Y_i - \mu_Y)(\bar{Y} - \mu_Y) + (\bar{Y} - \mu_Y)^2\). Substituting this expression for \((Y_i - \bar{Y})^2\) into the definition of \(s^2_Y\) [Equation (3.7)], we have that

\[
\begin{aligned}
s^2_Y &= \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \mu_Y)^2 - \frac{2}{n-1} \sum_{i=1}^{n} (Y_i - \mu_Y)(\bar{Y} - \mu_Y) + \frac{1}{n-1} \sum_{i=1}^{n} (\bar{Y} - \mu_Y)^2 \\
&= \left(\frac{n}{n-1}\right) \left[\frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)^2\right] - \left(\frac{n}{n-1}\right) (\bar{Y} - \mu_Y)^2,
\end{aligned}
\tag{3.29}
\]

where the final equality follows from the definition of \(\bar{Y}\) [which implies that \(\sum_{i=1}^{n} (Y_i - \mu_Y) = n(\bar{Y} - \mu_Y)\)] and by collecting terms.

The law of large numbers can now be applied to the two terms in the final line of Equation (3.29). Define \(W_i = (Y_i - \mu_Y)^2\). Now \(E(W_i) = \sigma^2_Y\) (by the definition of the variance). Because the random variables \(Y_1, \ldots, Y_n\) are i.i.d., the random variables \(W_1, \ldots, W_n\) are i.i.d. In addition, \(E(W_i^2) = E[(Y_i - \mu_Y)^4] < \infty\) because, by assumption, \(E(Y_i^4) < \infty\). Thus \(W_1, \ldots, W_n\) are i.i.d. and \(\mathrm{var}(W_i) < \infty\), so \(\bar{W}\) satisfies the conditions for the law of large numbers in Key Concept 2.6 and \(\bar{W} \xrightarrow{p} E(W_i)\). But \(\bar{W} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)^2\) and \(E(W_i) = \sigma^2_Y\), so \(\frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)^2 \xrightarrow{p} \sigma^2_Y\). Also, \(n/(n-1) \to 1\), so the first term in Equation (3.29) converges in probability to \(\sigma^2_Y\). Because \(\bar{Y} \xrightarrow{p} \mu_Y\), \((\bar{Y} - \mu_Y)^2 \xrightarrow{p} 0\), so the second term converges in probability to zero. Combining these results yields \(s^2_Y \xrightarrow{p} \sigma^2_Y\).
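A small simulation can make the argument concrete. The sketch below is illustrative only: it assumes normally distributed draws with a hypothetical mean of 5 and standard deviation of 2 (so \(\sigma^2_Y = 4\)), a fixed random seed, and three arbitrary sample sizes. It checks that the final line of Equation (3.29) reproduces the sample variance exactly (up to rounding) and that the sample variance settles near 4 as n grows.

```python
import random

def sample_variance(y):
    """Sample variance s_Y^2, dividing by n - 1 as in Equation (3.7)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

def decomposition(y, mu):
    """Final line of Equation (3.29), written in terms of the population mean mu."""
    n = len(y)
    ybar = sum(y) / n
    w_bar = sum((yi - mu) ** 2 for yi in y) / n  # average of W_i = (Y_i - mu)^2
    return n / (n - 1) * w_bar - n / (n - 1) * (ybar - mu) ** 2

# Hypothetical population: normal with mean 5 and standard deviation 2.
mu, sigma = 5.0, 2.0
rng = random.Random(0)  # fixed seed so the run is reproducible

results = {}
for n in (10, 1000, 100000):
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    results[n] = (sample_variance(y), decomposition(y, mu))

for n, (s2, rhs) in results.items():
    print(n, s2, rhs)
```

The first term of the decomposition is the one the law of large numbers acts on; the second term shrinks because the sample average itself converges to the population mean.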
PART TWO
Fundamentals of Regression Analysis

CHAPTER 4 Linear Regression with One Regressor
CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
CHAPTER 6 Linear Regression with Multiple Regressors
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression
CHAPTER 8 Nonlinear Regression Functions
CHAPTER 9 Assessing Studies Based on Multiple Regression
CHAPTER 4 Linear Regression with One Regressor

A state implements tough new penalties on drunk drivers: What is the effect on highway fatalities? A school district cuts the size of its elementary school classes: What is the effect on its students' standardized test scores? You successfully complete one more year of college classes: What is the effect on your future earnings?

All three of these questions are about the unknown effect of changing one variable, X (X being penalties for drunk driving, class size, or years of schooling), on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X, to another, Y. This model postulates a linear relationship between X and Y; the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random sample of data on X and Y. For instance, using data on class sizes and test scores from different school districts, we show how to estimate the expected effect on test scores of reducing class sizes by, say, one student per class. The slope and the intercept of the line relating X and Y can be estimated by a method called ordinary least squares (OLS).
4.1 The Linear Regression Model

The superintendent of an elementary school district must decide whether to hire additional teachers and she wants your advice. If she hires the teachers, she will reduce the number of students per teacher (the student-teacher ratio) by two. She faces a tradeoff. Parents want smaller classes so that their children can receive more individualized attention. But hiring more teachers means spending more money, which is not to the liking of those paying the bill! So she asks you: If she cuts class sizes, what will the effect be on student performance?

In many school districts, student performance is measured by standardized tests, and the job status or pay of some administrators can depend in part on how well their students do on these tests. We therefore sharpen the superintendent's question: If she reduces the average class size by two students, what will the effect be on standardized test scores in her district?

A precise answer to this question requires a quantitative statement about changes. If the superintendent changes the class size by a certain amount, what would she expect the change in standardized test scores to be? We can write this as a mathematical relationship using the Greek letter beta, $\beta_{ClassSize}$, where the subscript "ClassSize" distinguishes the effect of changing the class size from other effects. Thus,

$$\beta_{ClassSize} = \frac{\text{change in } TestScore}{\text{change in } ClassSize} = \frac{\Delta TestScore}{\Delta ClassSize}, \tag{4.1}$$

where the Greek letter $\Delta$ (delta) stands for "change in." That is, $\beta_{ClassSize}$ is the change in the test score that results from changing the class size, divided by the change in the class size.

If you were lucky enough to know $\beta_{ClassSize}$, you would be able to tell the superintendent that decreasing class size by one student would change districtwide test scores by $\beta_{ClassSize}$. You could also answer the superintendent's actual question, which concerned changing class size by two students per class. To do so, rearrange Equation (4.1) so that

$$\Delta TestScore = \beta_{ClassSize} \times \Delta ClassSize. \tag{4.2}$$

Suppose that $\beta_{ClassSize} = -0.6$. Then a reduction in class size of two students per class would yield a predicted change in test scores of $(-0.6) \times (-2) = 1.2$; that is, you would predict that test scores would rise by 1.2 points as a result of the reduction in class sizes by two students per class.
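The arithmetic of Equation (4.2) can be sketched in a few lines of Python. This is only an illustration: the function name `predicted_change` is hypothetical, and the slope of -0.6 is the supposed value used in the text, not an estimate from data.

```python
def predicted_change(beta_class_size, delta_class_size):
    """Equation (4.2): change in test score = slope x change in class size."""
    return beta_class_size * delta_class_size


# Supposed slope from the text (-0.6) and a reduction of two students per class.
change = predicted_change(-0.6, -2)
print(f"Predicted change in test score: {change:.1f} points")
```

A negative slope combined with a negative change in class size gives a positive predicted change, which is why the text predicts that test scores rise.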
Equation (4.1) is the definition of the slope of a straight line relating test scores and class size. This straight line can be written

$$TestScore = \beta_0 + \beta_{ClassSize} \times ClassSize, \tag{4.3}$$

where $\beta_0$ is the intercept of this straight line, and, as before, $\beta_{ClassSize}$ is the slope. According to Equation (4.3), if you knew $\beta_0$ and $\beta_{ClassSize}$, not only would you be able to determine the change in test scores at a district associated with a change in class size, but you also would be able to predict the average test score itself for a given class size.

When you propose Equation (4.3) to the superintendent, she tells you that something is wrong with this formulation. She points out that class size is just one of many facets of elementary education, and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers, and two districts might have different test scores for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district's unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important factors and to introduce them explicitly into Equation (4.3) (an idea we return to in Chapter 6). For now, however, we simply lump all these "other factors" together and write the relationship for a given district as

$$TestScore = \beta_0 + \beta_{ClassSize} \times ClassSize + \text{other factors}. \tag{4.4}$$

Thus, the test score for the district is written in terms of one component, $\beta_0 + \beta_{ClassSize} \times ClassSize$, that represents the average effect of class size on scores in the population of school districts and a second component that represents all other factors.

Although this discussion has focused on test scores and class size, the idea expressed in Equation (4.4) is much more general, so it is useful to introduce more general notation. Suppose you have a sample of $n$ districts. Let $Y_i$ be the average test score in the $i$th district, let $X_i$ be the average class size in the $i$th district, and let $u_i$ denote the other factors influencing the test score in the $i$th district. Then Equation (4.4) can be written more generally as

$$Y_i = \beta_0 + \beta_1 X_i + u_i. \tag{4.5}$$
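To make the roles of the three pieces of Equation (4.5) concrete, the following minimal sketch simulates hypothetical districts from the model. Everything numeric here is an assumption chosen for illustration (an intercept of 700, a slope of -2, class sizes between 14 and 26, and normally distributed "other factors"); none of it comes from the California data, and `simulate_districts` is a made-up helper name.

```python
import random


def simulate_districts(n, beta0, beta1, seed=0):
    """Draw n hypothetical districts from Y_i = beta0 + beta1 * X_i + u_i."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(14.0, 26.0)   # hypothetical class size X_i
        u = rng.gauss(0.0, 10.0)      # "other factors" u_i, drawn independently of X_i
        y = beta0 + beta1 * x + u     # test score Y_i from Equation (4.5)
        data.append((x, y, u))
    return data


districts = simulate_districts(n=7, beta0=700.0, beta1=-2.0)
for x, y, u in districts:
    side = "above" if u > 0 else "below"
    print(f"X = {x:5.1f}  Y = {y:6.1f}  u = {u:6.1f}  ({side} the population line)")
```

In the simulation the values of `u` are visible because they were generated; with real data only X and Y are observed, which is why the coefficients must be estimated.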
KEY CONCEPT 4.1
TERMINOLOGY FOR THE LINEAR REGRESSION MODEL WITH A SINGLE REGRESSOR

The linear regression model is

$$Y_i = \beta_0 + \beta_1 X_i + u_i,$$

where
the subscript $i$ runs over observations, $i = 1, \ldots, n$;
$Y_i$ is the dependent variable, the regressand, or simply the left-hand variable;
$X_i$ is the independent variable, the regressor, or simply the right-hand variable;
$\beta_0 + \beta_1 X$ is the population regression line or population regression function;
$\beta_0$ is the intercept of the population regression line;
$\beta_1$ is the slope of the population regression line; and
$u_i$ is the error term.

Figure 4.1 summarizes the linear regression model with a single regressor for seven hypothetical observations on test scores (Y) and class size (X). The population regression line is the straight line $\beta_0 + \beta_1 X$. The population regression line slopes down ($\beta_1 < 0$), which means that districts with lower student-teacher ratios (smaller classes) tend to have higher test scores. The intercept $\beta_0$ has a mathematical meaning as the value of the Y axis intersected by the population regression line, but, as mentioned earlier, it has no real-world meaning in this example.

Because of the other factors that determine test performance, the hypothetical observations in Figure 4.1 do not fall exactly on the population regression line. For example, the value of Y for district #1, $Y_1$, is above the population regression line. This means that test scores in district #1 were better than predicted by the population regression line, so the error term for that district, $u_1$, is positive. In contrast, $Y_2$ is below the population regression line, so test scores for that district were worse than predicted, and $u_2 < 0$.

FIGURE 4.1 Scatterplot of Test Score vs. Student-Teacher Ratio (Hypothetical Data). The scatterplot shows hypothetical observations for seven school districts. The population regression line is $\beta_0 + \beta_1 X$. The vertical distance from the $i$th point to the population regression line is $Y_i - (\beta_0 + \beta_1 X_i)$, which is the population error term $u_i$ for the $i$th observation. [Axes: test score (Y) against student-teacher ratio (X).]

Now return to your problem as advisor to the superintendent: What is the expected effect on test scores of reducing the student-teacher ratio by two students per teacher? The answer is easy: The expected change is $(-2) \times \beta_{ClassSize}$. But what is the value of $\beta_{ClassSize}$?

4.2 Estimating the Coefficients of the Linear Regression Model

In a practical situation, such as the application to class size and test scores, the intercept $\beta_0$ and slope $\beta_1$ of the population regression line are unknown. Therefore, we must use data to estimate the unknown slope and intercept of the population regression line.

This estimation problem is similar to others you have faced in statistics. For example, suppose you want to compare the mean earnings of men and women who recently graduated from college. Although the population mean earnings are unknown, we can estimate the population means using a random sample of male and female college graduates. Then the natural estimator of the unknown population mean earnings for women, for example, is the average earnings of the female college graduates in the sample.

The same idea extends to the linear regression model. We do not know the population value of $\beta_{ClassSize}$, the slope of the unknown population regression line relating X (class size) and Y (test scores). But just as it was possible to learn about the population mean using a sample of data drawn from that population, so is it possible to learn about the population slope $\beta_{ClassSize}$ using a sample of data.

The data we analyze here consist of test scores and class sizes in 1999 in 420 California school districts that serve kindergarten through eighth grade. The test score is the districtwide average of reading and math scores for fifth graders. Class size can be measured in various ways. The measure used here is one of the broadest, which is the number of students in the district divided by the number of teachers, that is, the districtwide student-teacher ratio. These data are described in Appendix 4.1.

TABLE 4.1 Summary of the Distribution of Student-Teacher Ratios and Fifth-Grade Test Scores for 420 K-8 Districts in California in 1998. [Rows: student-teacher ratio, test score. Columns: average, standard deviation, and the 10%, 25%, 40%, 50% (median), 60%, 75%, and 90% percentiles.]

The average student-teacher ratio is 19.6 students per teacher and the standard deviation is 1.9 students per teacher. The 10th percentile of the distribution of the student-teacher ratio is 17.3 (that is, only 10% of districts have student-teacher ratios below 17.3), while the district at the 90th percentile has a student-teacher ratio of 21.9.

A scatterplot of these 420 observations on test scores and the student-teacher ratio is shown in Figure 4.2. The sample correlation is -0.23, indicating a weak negative relationship between the two variables. Although larger classes in this sample tend to have lower test scores, there are other determinants of test scores that keep the observations from falling perfectly along a straight line.

Despite this low correlation, if one could somehow draw a straight line through these data, then the slope of this line would be an estimate of $\beta_{ClassSize}$ based on these data. One way to draw the line would be to take out a pencil and a ruler and to "eyeball" the best line you could. While this method is easy, it is very unscientific and different people will create different estimated lines.

How, then, should you choose among the many possible lines? By far the most common way is to choose the line that produces the "least squares" fit to these data, that is, to use the ordinary least squares (OLS) estimator.
FIGURE 4.2 Scatterplot of Test Score vs. Student-Teacher Ratio (California School District Data). Data from 420 California school districts. There is a weak negative relationship between the student-teacher ratio and test scores: The sample correlation is -0.23. [Axes: test score against student-teacher ratio.]

The Ordinary Least Squares Estimator

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average, $\bar{Y}$, is the least squares estimator of the population mean, $E(Y)$; that is, $\bar{Y}$ minimizes the total squared estimation mistakes $\sum_{i=1}^{n} (Y_i - m)^2$ among all possible estimators $m$ [see expression (3.2)].

The OLS estimator extends this idea to the linear regression model. Let $b_0$ and $b_1$ be some estimators of $\beta_0$ and $\beta_1$. The regression line based on these estimators is $b_0 + b_1 X$, so the value of $Y_i$ predicted using this line is $b_0 + b_1 X_i$. Thus, the mistake made in predicting the $i$th observation is $Y_i - (b_0 + b_1 X_i) = Y_i - b_0 - b_1 X_i$. The sum of these squared prediction mistakes over all $n$ observations is

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2. \tag{4.6}$$

The sum of the squared mistakes for the linear regression model in expression (4.6) is the extension of the sum of the squared mistakes for the problem of estimating the mean in expression (3.2). In fact, if there is no regressor, then $b_1$ does not enter expression (4.6) and the two problems are identical except for the different notation [$m$ in expression (3.2), $b_0$ in expression (4.6)]. Just as there is a unique estimator, $\bar{Y}$, that minimizes the expression (3.2), so is there a unique pair of estimators of $\beta_0$ and $\beta_1$ that minimize expression (4.6).

The estimators of the intercept and slope that minimize the sum of squared mistakes in expression (4.6) are called the ordinary least squares (OLS) estimators of $\beta_0$ and $\beta_1$.

OLS has its own special notation and terminology. The OLS estimator of $\beta_0$ is denoted $\hat{\beta}_0$, and the OLS estimator of $\beta_1$ is denoted $\hat{\beta}_1$. The OLS regression line is the straight line constructed using the OLS estimators: $\hat{\beta}_0 + \hat{\beta}_1 X$. The predicted value of $Y_i$ given $X_i$, based on the OLS regression line, is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. The residual for the $i$th observation is the difference between $Y_i$ and its predicted value: $\hat{u}_i = Y_i - \hat{Y}_i$.

You could compute the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ by trying different values of $b_0$ and $b_1$ repeatedly until you find those that minimize the total squared mistakes in expression (4.6); they are the least squares estimates. This method would be quite tedious, however. Fortunately there are formulas, derived by minimizing expression (4.6) using calculus, that streamline the calculation of the OLS estimators.

The OLS formulas and terminology are collected in Key Concept 4.2. These formulas are implemented in virtually all statistical and spreadsheet programs. These formulas are derived in Appendix 4.2.

KEY CONCEPT 4.2
THE OLS ESTIMATOR, PREDICTED VALUES, AND RESIDUALS

The OLS estimators of the slope $\beta_1$ and the intercept $\beta_0$ are

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{s_{XY}}{s_X^2} \tag{4.7}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}. \tag{4.8}$$

The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, \ldots, n \tag{4.9}$$

$$\hat{u}_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n. \tag{4.10}$$

The estimated intercept ($\hat{\beta}_0$), slope ($\hat{\beta}_1$), and residual ($\hat{u}_i$) are computed from a sample of $n$ observations of $X_i$ and $Y_i$, $i = 1, \ldots, n$. These are estimates of the unknown true population intercept ($\beta_0$), slope ($\beta_1$), and error term ($u_i$).
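The formulas in Key Concept 4.2 translate directly into code. The sketch below is one possible plain-Python rendering, not the routine used by any particular statistical package; the function names `ols_fit` and `sum_squared_mistakes` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) computed from Equations (4.7) and (4.8)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx                 # Equation (4.7); the 1/(n-1) factors cancel
    beta0_hat = y_bar - beta1_hat * x_bar   # Equation (4.8)
    return beta0_hat, beta1_hat


def sum_squared_mistakes(x, y, b0, b1):
    """Expression (4.6) evaluated at candidate coefficients b0 and b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: student-teacher ratios and test scores for five districts.
str_values = [15.0, 17.0, 19.0, 21.0, 23.0]
scores = [665.0, 660.0, 655.0, 652.0, 643.0]

b0_hat, b1_hat = ols_fit(str_values, scores)
predicted = [b0_hat + b1_hat * xi for xi in str_values]      # Equation (4.9)
residuals = [yi - yh for yi, yh in zip(scores, predicted)]   # Equation (4.10)

print(f"intercept = {b0_hat:.1f}, slope = {b1_hat:.2f}")
print("sum of squared mistakes at the OLS estimates:",
      round(sum_squared_mistakes(str_values, scores, b0_hat, b1_hat), 3))
print("sum of squared mistakes at a nearby line:",
      round(sum_squared_mistakes(str_values, scores, b0_hat, b1_hat + 0.1), 3))
```

Comparing the last two printed values illustrates the defining property of OLS: moving the slope away from the OLS estimate while holding the intercept fixed increases the sum of squared mistakes in expression (4.6).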
OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio

When OLS is used to estimate a line relating the student-teacher ratio to test scores using the 420 observations in Figure 4.2, the estimated slope is -2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

$$\widehat{TestScore} = 698.9 - 2.28 \times STR, \tag{4.11}$$

where $TestScore$ is the average test score in the district and $STR$ is the student-teacher ratio. The symbol "^" over $TestScore$ in Equation (4.11) indicates that this is the predicted value based on the OLS regression line. Figure 4.3 plots this OLS regression line superimposed over the scatterplot of the data previously shown in Figure 4.2.

The slope of -2.28 means that an increase in the student-teacher ratio by one student per class is, on average, associated with a decline in districtwide test scores by 2.28 points on the test. A decrease in the student-teacher ratio by 2 students per class is, on average, associated with an increase in test scores of 4.56 points $[= -2 \times (-2.28)]$. The negative slope indicates that more students per teacher (larger classes) is associated with poorer performance on the test.

It is now possible to predict the districtwide test score given a value of the student-teacher ratio. For example, for a district with 20 students per teacher, the predicted test score is $698.9 - 2.28 \times 20 = 653.3$. Of course, this prediction will not be exactly right because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of what test scores would be for that district, based on their student-teacher ratio, absent those other factors.

FIGURE 4.3 The Estimated Regression Line for the California Data. The estimated regression line shows a negative relationship between test scores and the student-teacher ratio. If class sizes fall by 1 student, the estimated regression predicts that test scores will increase by 2.28 points. [Axes: test score against student-teacher ratio.]

Is this estimate of the slope large or small? To answer this, we return to the superintendent's problem. Recall that she is contemplating hiring enough teachers to reduce the student-teacher ratio by 2. Suppose her district is at the median of the California districts. From Table 4.1, the median student-teacher ratio is 19.7 and the median test score is 654.5. A reduction of 2 students per class, from 19.7 to 17.7, would move her student-teacher ratio from the 50th percentile to very near the 10th percentile. This is a big change, and she would need to hire many new teachers. How would it affect test scores?

According to Equation (4.11), cutting the student-teacher ratio by 2 is predicted to increase test scores by approximately 4.6 points; if her district's test scores are at the median, 654.5, they are predicted to increase to 659.1. Is this improvement large or small? According to Table 4.1, this improvement would move her district from the median to just short of the 60th percentile. Thus, a decrease in class size that would place her district close to the 10% with the smallest classes would move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student-teacher ratio by a large amount (2 students per teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change, such as reducing the student-teacher ratio from 20 students per teacher to 5? Unfortunately, the estimates in Equation (4.11) would not be very useful to her. This regression was estimated using the data in Figure 4.2, and as the figure shows, the smallest student-teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone are not a reliable basis for predicting the effect of a radical move to such an extremely low student-teacher ratio.

Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see the box), and the social sciences more generally. Presenting results using OLS (or its variants discussed later in this book) means that you are "speaking the same language" as other economists and statisticians. The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.

The OLS estimators also have desirable theoretical properties. These are analogous to the desirable properties, studied in Section 3.1, of $\bar{Y}$ as an estimator of the population mean. Under the assumptions introduced in Section 4.4, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Section 5.5.

The "Beta" of a Stock

A fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Said differently, the expected return¹ on a risky investment, $R$, must exceed the return on a safe, or risk-free, investment, $R_f$. Thus the expected excess return, $R - R_f$, on a risky investment, like owning stock in a company, should be positive.

At first it might seem like the risk of a stock should be measured by its variance. Much of that risk, however, can be reduced by holding other stocks in a "portfolio," in other words, by diversifying your financial holdings. This means that the right way to measure the risk of a stock is not by its variance but rather by its covariance with the market.

The capital asset pricing model (CAPM) formalizes this idea. According to the CAPM, the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"). That is, the CAPM says that

$$R - R_f = \beta (R_m - R_f), \tag{4.12}$$

where $R_m$ is the expected return on the market portfolio and $\beta$ is the coefficient in the population regression of $R - R_f$ on $R_m - R_f$. In practice, the risk-free return is often taken to be the rate of interest on short-term U.S. government debt. According to the CAPM, a stock with a $\beta < 1$ has less risk than the market portfolio and therefore has a lower expected excess return than the market portfolio. In contrast, a stock with a $\beta > 1$ is riskier than the market portfolio and thus commands a higher expected excess return.

The "beta" of a stock has become a workhorse of the investment industry, and you can obtain estimated $\beta$'s for hundreds of stocks on investment firm Web sites. Those $\beta$'s typically are estimated by OLS regression of the actual excess return on the stock against the actual excess return on a broad market index.

The table below gives estimated $\beta$'s for U.S. stocks. Low-risk consumer products firms like Kellogg have stocks with low $\beta$'s; riskier technology stocks have high $\beta$'s.

[Table columns: Company; Estimated $\beta$. Companies listed: Kellogg (breakfast cereal), Wal-Mart (discount retailer), Waste Management (waste disposal), Sprint Nextel (telecommunications), Barnes and Noble (book retailer), Microsoft (software), Best Buy (electronic equipment retailer), Amazon (online retailer).]

¹The return on an investment is the change in its price plus any payout (dividend) from the investment as a percentage of its initial price. For example, a stock bought on January 1 for $100, which then paid a $2.50 dividend during the year and sold on December 31 for $105, would have a return of $R = [(\$105 - \$100) + \$2.50]/\$100 = 7.5\%$.

4.3 Measures of Fit

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The $R^2$ and the standard error of the regression measure how well the OLS regression line fits the data. The $R^2$ ranges between 0 and 1 and measures the fraction of the variance of $Y_i$ that is explained by $X_i$. The standard error of the regression measures how far $Y_i$ typically is from its predicted value.

The $R^2$

The regression $R^2$ is the fraction of the sample variance of $Y_i$ explained by (or predicted by) $X_i$. The definitions of the predicted value and the residual (see Key Concept 4.2) allow us to write the dependent variable $Y_i$ as the sum of the predicted value, $\hat{Y}_i$, plus the residual $\hat{u}_i$:

$$Y_i = \hat{Y}_i + \hat{u}_i. \tag{4.13}$$

In this notation, the $R^2$ is the ratio of the sample variance of $\hat{Y}_i$ to the sample variance of $Y_i$.

Mathematically, the $R^2$ can be written as the ratio of the explained sum of squares to the total sum of squares. The explained sum of squares (ESS) is the sum of squared deviations of the predicted values of $Y_i$, $\hat{Y}_i$, from their average, and the total sum of squares (TSS) is the sum of squared deviations of $Y_i$ from its average:

$$ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \tag{4.14}$$

$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \tag{4.15}$$

Equation (4.14) uses the fact that the sample average OLS predicted value equals $\bar{Y}$ (proven in Appendix 4.3).
The $R^2$ is the ratio of the explained sum of squares to the total sum of squares:

$$R^2 = \frac{ESS}{TSS}. \tag{4.16}$$

Alternatively, the $R^2$ can be written in terms of the fraction of the variance of $Y_i$ not explained by $X_i$. The sum of squared residuals, or SSR, is the sum of the squared OLS residuals:

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2. \tag{4.17}$$

It is shown in Appendix 4.3 that $TSS = ESS + SSR$. Thus the $R^2$ also can be expressed as 1 minus the ratio of the sum of squared residuals to the total sum of squares:

$$R^2 = 1 - \frac{SSR}{TSS}. \tag{4.18}$$

Finally, the $R^2$ of the regression of Y on the single regressor X is the square of the correlation coefficient between Y and X.

The $R^2$ ranges between 0 and 1. If $\hat{\beta}_1 = 0$, then $X_i$ explains none of the variation of $Y_i$ and the predicted value of $Y_i$ based on the regression is just the sample average of $Y_i$. In this case, the explained sum of squares is zero and the sum of squared residuals equals the total sum of squares; thus the $R^2$ is zero. In contrast, if $X_i$ explains all of the variation of $Y_i$, then $Y_i = \hat{Y}_i$ for all $i$ and every residual is zero (that is, $\hat{u}_i = 0$), so that $ESS = TSS$ and $R^2 = 1$. In general, the $R^2$ does not take on the extreme values of 0 or 1 but falls somewhere in between. An $R^2$ near 1 indicates that the regressor is good at predicting $Y_i$, while an $R^2$ near 0 indicates that the regressor is not very good at predicting $Y_i$.

The Standard Error of the Regression

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error $u_i$. The units of $u_i$ and $Y_i$ are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. For example, if the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error, in dollars.

Because the regression errors $u_1, \ldots, u_n$ are unobserved, the SER is computed using their sample counterparts, the OLS residuals $\hat{u}_1, \ldots, \hat{u}_n$. The formula for the SER is

$$SER = s_{\hat{u}}, \quad \text{where } s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-2}, \tag{4.19}$$

where the formula for $s_{\hat{u}}^2$ uses the fact (proven in Appendix 4.3) that the sample average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the sample standard deviation of Y given in Equation (3.7) in Section 3.2, except that $Y_i - \bar{Y}$ in Equation (3.7) is replaced by $\hat{u}_i$, and the divisor in Equation (3.7) is $n - 1$, whereas here it is $n - 2$. The reason for using the divisor $n - 2$ here (instead of $n$) is the same as the reason for using the divisor $n - 1$ in Equation (3.7): It corrects for a slight downward bias introduced because two regression coefficients were estimated. This is called a "degrees of freedom" correction; because two coefficients were estimated ($\beta_0$ and $\beta_1$), two "degrees of freedom" of the data were lost, so the divisor in this factor is $n - 2$. (The mathematics behind this is discussed in Section 5.6.) When $n$ is large, the difference between dividing by $n$, by $n - 1$, or by $n - 2$ is negligible.
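The measures of fit defined in Equations (4.14) through (4.19) can be computed in a few lines once the OLS coefficients are in hand. The sketch below is a plain-Python illustration under stated assumptions: `fit_measures` is a hypothetical helper, the five data points are invented, and the function assumes more than two observations with some variation in both X and Y.

```python
import math


def fit_measures(x, y):
    """Return ESS, TSS, SSR, R^2, and SER for the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))       # Equation (4.7)
    beta0_hat = y_bar - beta1_hat * x_bar                    # Equation (4.8)
    y_hat = [beta0_hat + beta1_hat * xi for xi in x]
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # Equation (4.14)
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # Equation (4.15)
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # Equation (4.17)
    r2 = 1.0 - ssr / tss                                     # Equation (4.18)
    ser = math.sqrt(ssr / (n - 2))                           # Equation (4.19)
    return {"ESS": ess, "TSS": tss, "SSR": ssr, "R2": r2, "SER": ser}


# Hypothetical data: student-teacher ratios and test scores for five districts.
measures = fit_measures([15.0, 17.0, 19.0, 21.0, 23.0],
                        [665.0, 660.0, 655.0, 652.0, 643.0])
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because TSS = ESS + SSR, computing the R² as ESS/TSS or as 1 - SSR/TSS gives the same number up to rounding; the sketch uses the second form.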
Application to the Test Score Data

Equation (4.11) reports the regression line, estimated using the California test score data, relating the standardized test score ($TestScore$) to the student-teacher ratio ($STR$). The $R^2$ of this regression is 0.051, or 5.1%, and the SER is 18.6.

The $R^2$ of 0.051 means that the regressor $STR$ explains 5.1% of the variance of the dependent variable $TestScore$. Figure 4.3 superimposes this regression line on the scatterplot of the $TestScore$ and $STR$ data. As the scatterplot shows, the student-teacher ratio explains some of the variation in test scores, but much variation remains unaccounted for.

The SER of 18.6 means that the standard deviation of the regression residuals is 18.6, where the units are points on the standardized test. Because the standard deviation is a measure of spread, the SER of 18.6 means that there is a large spread of the scatterplot in Figure 4.3 around the regression line as measured in points on the test. This large spread means that predictions of test scores made using only the student-teacher ratio for that district will often be wrong by a large amount.

What should we make of this low $R^2$ and large SER? The fact that the $R^2$ of this regression is low (and the SER is large) does not, by itself, imply that this regression is either "good" or "bad." What the low $R^2$ does tell us is that other important factors influence test scores. These factors could include differences in the student body across districts, differences in school quality unrelated to the student-teacher ratio, or luck on the test. The low $R^2$ and high SER do not tell us what these factors are, but they do indicate that the student-teacher ratio alone explains only a small part of the variation in test scores in these data.

4.4 The Least Squares Assumptions

This section presents a set of three assumptions on the linear regression model and the sampling scheme under which OLS provides an appropriate estimator of the unknown regression coefficients, $\beta_0$ and $\beta_1$. Initially these assumptions might appear abstract. They do, however, have natural interpretations, and understanding these assumptions is essential for understanding when OLS will, and will not, give useful estimates of the regression coefficients.

Assumption #1: The Conditional Distribution of $u_i$ Given $X_i$ Has a Mean of Zero

The first least squares assumption is that the conditional distribution of $u_i$ given $X_i$ has a mean of zero. This assumption is a formal mathematical statement about the "other factors" contained in $u_i$ and asserts that these other factors are unrelated to $X_i$ in the sense that, given a value of $X_i$, the mean of the distribution of these other factors is zero.

This is illustrated in Figure 4.4. The population regression is the relationship that holds on average between class size and test scores in the population, and the error term $u_i$ represents the other factors that lead test scores at a given district to differ from the prediction based on the population regression line. As shown in Figure 4.4, at a given value of class size, say 20 students per class, sometimes these other factors lead to better performance than predicted ($u_i > 0$) and sometimes to worse performance ($u_i < 0$), but on average over the population the prediction is right. In other words, given $X_i = 20$, the mean of the distribution of $u_i$ is zero. In Figure 4.4, this is shown as the distribution of $u_i$ being centered on the population regression line at $X_i = 20$ and, more generally, at other values $x$ of $X_i$ as well. Said differently, the distribution of $u_i$, conditional on $X_i = x$, has a mean of zero; stated mathematically, $E(u_i \mid X_i = x) = 0$ or, in somewhat simpler notation, $E(u_i \mid X_i) = 0$.

As shown in Figure 4.4, the assumption that $E(u_i \mid X_i) = 0$ is equivalent to assuming that the population regression line is the conditional mean of $Y_i$ given $X_i$ (a mathematical proof of this is left as Exercise 4.6).

FIGURE 4.4 The Conditional Probability Distributions and the Population Regression Line. The figure shows the conditional probability of test scores for districts with class sizes of 15, 20, and 25 students. The mean of the conditional distribution of test scores, given the student-teacher ratio, $E(Y \mid X)$, is the population regression line $\beta_0 + \beta_1 X$. At a given value of X, Y is distributed around the regression line and the error, $u = Y - (\beta_0 + \beta_1 X)$, has a conditional mean of zero for all values of X. [Axes: test score against student-teacher ratio.]

The conditional mean of u in a randomized controlled experiment. In a randomized controlled experiment, subjects are randomly assigned to the treatment group ($X = 1$) or to the control group ($X = 0$). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that $E(u_i \mid X_i) = 0$. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgment, and we return to this issue repeatedly.
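The claim that random assignment makes the conditional mean of u zero can be checked informally by simulation. The sketch below rests on assumptions made purely for illustration: the "other factors" u are drawn from a standard normal distribution, assignment is a fair coin flip that ignores u, and `simulate_random_assignment` is a hypothetical helper name. It illustrates the idea; it is not a proof.

```python
import random


def simulate_random_assignment(n, seed=0):
    """Assign X in {0, 1} by coin flip and return the sample mean of u in each group."""
    rng = random.Random(seed)
    u_by_group = {0: [], 1: []}
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # subject's "other factors" (hypothetical distribution)
        x = rng.randint(0, 1)     # random assignment that uses no information about u
        u_by_group[x].append(u)
    return {x: sum(us) / len(us) for x, us in u_by_group.items()}


means = simulate_random_assignment(n=10_000)
print(f"sample mean of u, control group (X = 0):   {means[0]: .3f}")
print(f"sample mean of u, treatment group (X = 1): {means[1]: .3f}")
```

With a large number of subjects, both group means should be close to zero, the sample analogue of E(u | X = 0) = E(u | X = 1) = 0. If assignment instead depended on u, the two group means would generally differ.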
Correlation and conditional mean. Recall from Section 2.3 that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)]. Thus, the conditional mean assumption $E(u_i \mid X_i) = 0$ implies that $X_i$ and $u_i$ are uncorrelated, or $\mathrm{corr}(X_i, u_i) = 0$. Because correlation is a measure of linear association, this implication does not go the other way; even if $X_i$ and $u_i$ are uncorrelated, the conditional mean of $u_i$ given $X_i$ might be nonzero. However, if $X_i$ and $u_i$ are correlated, then it must be the case that $E(u_i \mid X_i)$ is nonzero. It is therefore often convenient to discuss the conditional mean assumption in terms of possible correlation between $X_i$ and $u_i$. If $X_i$ and $u_i$ are correlated, then the conditional mean assumption is violated.

Assumption #2: $(X_i, Y_i)$, $i = 1, \ldots, n$ Are Independently and Identically Distributed

The second least squares assumption is that $(X_i, Y_i)$, $i = 1, \ldots, n$ are independently and identically distributed (i.i.d.) across observations. As discussed in Section 2.5 (Key Concept 2.5), this is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then $(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d. For example, let $X$ be the age of a worker and $Y$ be his or her earnings, and imagine drawing a person at random from the population of workers. That randomly drawn person will have a certain age and earnings (that is, $X$ and $Y$ will take on some values). If a sample of $n$ workers is drawn from this population, then $(X_i, Y_i)$, $i = 1, \ldots, n$, necessarily have the same distribution. If they are drawn at random they are also distributed independently from one observation to the next; that is, they are i.i.d.

The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d.

Not all sampling schemes produce i.i.d. observations on $(X_i, Y_i)$, however. One example is when the values of $X$ are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. For example, suppose a horticulturalist wants to study the effects of different organic weeding methods ($X$) on tomato production ($Y$) and accordingly grows different plots of tomatoes using different organic weeding techniques. If she picks the techniques (the level of $X$) to be used on the $i$th plot and applies the same technique to the $i$th plot in all repetitions of the experiment, then the value of $X_i$ does not change from one sample to the next. Thus $X_i$ is nonrandom (although the outcome $Y_i$ is random), so the sampling scheme is not i.i.d.
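The one-way implication described under "Correlation and conditional mean" can be checked by simulation. The following is a minimal sketch, assuming Python with NumPy; the design (a standard normal $X$ and the two constructions of $u$) is purely illustrative and does not come from the text. In the first case $u$ is independent of $X$; in the second, $u = X^2 - 1$ is uncorrelated with $X$ but its conditional mean clearly depends on $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical regressor: standard normal draws.
x = rng.standard_normal(n)

# Case 1: u is drawn independently of x, so E(u | X) = 0 by construction.
u_indep = rng.standard_normal(n)

# Case 2: u = x^2 - 1 has mean zero and zero covariance with x
# (for a standard normal x, cov(x, x^2) = E(x^3) = 0),
# but its conditional mean is E(u | X = x) = x^2 - 1, which is not zero.
u_dep = x**2 - 1.0


def binned_means(x, u, edges):
    """Average u within bins of x: a crude estimate of E(u | X)."""
    idx = np.digitize(x, edges)
    return np.array([u[idx == k].mean() for k in range(len(edges) + 1)])


edges = np.array([-1.0, 1.0])  # bins: x < -1, -1 <= x < 1, x >= 1

corr_indep = np.corrcoef(x, u_indep)[0, 1]
corr_dep = np.corrcoef(x, u_dep)[0, 1]
means_indep = binned_means(x, u_indep, edges)
means_dep = binned_means(x, u_dep, edges)

print("corr(X, u), independent case:  ", round(corr_indep, 3))
print("corr(X, u), u = X^2 - 1 case:  ", round(corr_dep, 3))
print("binned means of u, independent:", means_indep.round(2))
print("binned means of u, X^2 - 1:    ", means_dep.round(2))
```

Both correlations should be close to zero, but only in the first case are the binned averages of $u$ also close to zero in every bin. This is why a zero correlation between $X_i$ and $u_i$ does not by itself establish the first least squares assumption.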
The results presented in this chapter developed for i.i.d. regressors are also true if the regressors are nonrandom. The case of a nonrandom regressor is, however, quite special. For example, modern experimental protocols would have the horticulturalist assign the level of $X$ to the different plots using a computerized random number generator, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental protocol is used, the level of $X$ is random and $(X_i, Y_i)$ are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same unit of observation over time. For example, we might have data on inventory levels ($Y$) at a firm and the interest rate at which the firm can borrow ($X$), where these data are collected over time from a specific firm; for example, they might be recorded four times a year (quarterly) for 30 years. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other; if interest rates are low now, they are likely to be low next quarter. This pattern of correlation violates the "independence" part of the i.i.d. assumption. Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values of $X_i$ and/or $Y_i$ far outside the usual range of the data, are unlikely. Large outliers can make OLS regression results misleading. This potential sensitivity of OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.

In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that $X$ and $Y$ have nonzero finite fourth moments: $0 < E(X_i^4) < \infty$ and $0 < E(Y_i^4) < \infty$. Another way to state this assumption is that $X$ and $Y$ have finite kurtosis.

The assumption of finite kurtosis is used in the mathematics that justify the large-sample approximations to the distributions of the OLS test statistics. We encountered this assumption in Chapter 3 when discussing the consistency of the sample variance. Specifically, Equation (3.9) states that the sample variance $s_Y^2$ is a consistent estimator of the population variance $\sigma_Y^2$ ($s_Y^2 \xrightarrow{p} \sigma_Y^2$). If $Y_1, \ldots, Y_n$ are i.i.d. and the fourth moment of $Y_i$ is finite, then the law of large numbers in Key Concept 2.6 applies to the average $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)^2$, a key step in the proof in Appendix 3.3 showing that $s_Y^2$ is consistent.

One source of large outliers is data entry errors, such as a typographical error or incorrectly using different units for different observations: Imagine collecting data on the height of students in meters, but inadvertently recording one student's height in centimeters instead.
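The effect of the unit mix-up just described can be sketched numerically. This is an illustration only, assuming Python with NumPy; the sample size, the made-up relation between weight and height, and the helper `ols` are hypothetical and are not taken from the text. One student's height is recorded in centimeters, and the OLS slope is computed with the error, without the error, and after dropping the bad observation.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope) from the OLS regression of y on x."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    return y_bar - slope * x_bar, slope


rng = np.random.default_rng(1)
n = 200

# Hypothetical data: heights in meters and a made-up linear relation for weight.
height_m = rng.normal(1.70, 0.08, size=n)
weight_kg = 70.0 + 60.0 * (height_m - 1.70) + rng.normal(0.0, 5.0, size=n)

# Data entry error: the first student's height is recorded in centimeters.
height_bad = height_m.copy()
height_bad[0] = height_m[0] * 100.0

_, slope_clean = ols(height_m, weight_kg)
_, slope_outlier = ols(height_bad, weight_kg)
_, slope_dropped = ols(height_bad[1:], weight_kg[1:])

print("slope, correct data:        ", round(slope_clean, 2))
print("slope, one height in cm:    ", round(slope_outlier, 2))
print("slope, bad observation out: ", round(slope_dropped, 2))
```

In this design a single mis-recorded observation is enough to pull the estimated slope close to zero, while correcting or dropping it restores an estimate near the slope used to generate the data.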
FIGURE 4.5 The Sensitivity of OLS to Large Outliers
[Figure: scatterplot of $Y$ against $X$ with one outlier, showing the OLS regression line including the outlier and the OLS regression line excluding the outlier.]
This hypothetical data set has one outlier. The OLS regression line estimated with the outlier shows a strong positive relationship between $X$ and $Y$, but the OLS regression line estimated without the outlier shows no relationship.

One way to find outliers is to plot your data. If you decide that an outlier is due to a data entry error, then you can either correct the error or, if that is impossible, drop the observation from your data set.

Data entry errors aside, the assumption of finite kurtosis is a plausible one in many applications with economic data. Class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right and the worst you can do is to get all the questions wrong. Because class size and test scores have a finite range, they necessarily have finite kurtosis. More generally, commonly used distributions such as the normal distribution have four moments. Still, as a mathematical matter, some distributions have infinite fourth moments, and this assumption rules out those distributions. If this assumption holds then it is unlikely that statistical inferences using OLS will be dominated by a few observations.

Use of the Least Squares Assumptions

The three least squares assumptions for the linear regression model are summarized in Key Concept 4.3. The least squares assumptions play twin roles, and we return to them repeatedly throughout this textbook.

Their first role is mathematical: If these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. One reason why the first least squares assumption might not hold in practice is discussed in Chapter 6, and additional reasons are discussed in Section 9.2.

It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data.

The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

KEY CONCEPT 4.3: THE LEAST SQUARES ASSUMPTIONS

$Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$, where

1. The error term $u_i$ has conditional mean zero given $X_i$: $E(u_i \mid X_i) = 0$;
2. $(X_i, Y_i)$, $i = 1, \ldots, n$ are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution, the sampling distribution, that describes the values they could take over different possible random samples.
This section presents these sampling distributions. In small samples, these distributions are complicated, but in large samples, they are approximately normal because of the central limit theorem.

The Sampling Distribution of the OLS Estimators

Review of the sampling distribution of $\bar{Y}$. Recall the discussion in Sections 2.5 and 2.6 about the sampling distribution of the sample average, $\bar{Y}$, an estimator of the unknown population mean of $Y$, $\mu_Y$. Because $\bar{Y}$ is calculated using a randomly drawn sample, $\bar{Y}$ is a random variable that takes on different values from one sample to the next; the probability of these different values is summarized in its sampling distribution. Although the sampling distribution of $\bar{Y}$ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all $n$. In particular, the mean of the sampling distribution is $\mu_Y$, that is, $E(\bar{Y}) = \mu_Y$, so $\bar{Y}$ is an unbiased estimator of $\mu_Y$. If $n$ is large, then more can be said about the sampling distribution. In particular, the central limit theorem (Section 2.6) states that this distribution is approximately normal.

The sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$. These ideas carry over to the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of the unknown intercept $\beta_0$ and slope $\beta_1$ of the population regression line. Because the OLS estimators are calculated using a random sample, $\hat{\beta}_0$ and $\hat{\beta}_1$ are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions.

Although the sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all $n$. In particular, the means of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ are $\beta_0$ and $\beta_1$. In other words, under the least squares assumptions in Key Concept 4.3,

$$E(\hat{\beta}_0) = \beta_0 \quad \text{and} \quad E(\hat{\beta}_1) = \beta_1; \qquad (4.20)$$

that is, $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$. The proof that $\hat{\beta}_1$ is unbiased is given in Appendix 4.3 and the proof that $\hat{\beta}_0$ is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$ is well approximated by the bivariate normal distribution (Section 2.4). This implies that the marginal distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ are normal in large samples.
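Unbiasedness and approximate normality of the OLS estimators can be illustrated with a small Monte Carlo experiment. The sketch below assumes Python with NumPy; the population coefficients, the uniform regressor, and the skewed (centered exponential) errors are hypothetical choices made only for illustration. Many samples of size $n = 100$ are drawn, and the OLS estimates are computed for each.

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 2.0, 0.5     # hypothetical population coefficients
n, reps = 100, 5000         # sample size and number of simulated samples

# Each row is one random sample of size n.
X = rng.uniform(0.0, 10.0, size=(reps, n))
u = rng.exponential(1.0, size=(reps, n)) - 1.0   # skewed errors, E(u | X) = 0
Y = beta0 + beta1 * X + u

# OLS slope and intercept for every sample, computed row by row.
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
b1 = (Xc * Yc).sum(axis=1) / (Xc ** 2).sum(axis=1)
b0 = Y.mean(axis=1) - b1 * X.mean(axis=1)

# If b1 is roughly normal, about 95% of draws lie within 1.96 standard deviations.
z = (b1 - b1.mean()) / b1.std()
share_within_196 = np.mean(np.abs(z) < 1.96)

print("average of b0 over samples:", round(b0.mean(), 3))
print("average of b1 over samples:", round(b1.mean(), 3))
print("share of b1 within 1.96 sd:", round(share_within_196, 3))
```

The averages of the estimates across samples should be close to the coefficients used to generate the data, and the share within 1.96 standard deviations should be close to 0.95 even though the errors themselves are far from normal.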
This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like $\bar{Y}$). If you examine the numerator in Equation (4.7) for $\hat{\beta}_1$, you will see that it, too, is a type of average: not a simple average, like $\bar{Y}$, but an average of the product, $(Y_i - \bar{Y})(X_i - \bar{X})$. As discussed further in Appendix 4.3, the central limit theorem applies to this average so that, like the simpler average $\bar{Y}$, it is normally distributed in large samples.

The normal approximation to the distribution of the OLS estimators in large samples is summarized in Key Concept 4.4. (Appendix 4.3 summarizes the derivation of these formulas.) A relevant question in practice is how large $n$ must be for these approximations to be reliable. In Section 2.6 we suggested that $n = 100$ is sufficiently large for the sampling distribution of $\bar{Y}$ to be well approximated by a normal distribution, and sometimes smaller $n$ suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications $n > 100$, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.

KEY CONCEPT 4.4: LARGE-SAMPLE DISTRIBUTIONS OF $\hat{\beta}_0$ AND $\hat{\beta}_1$

If the least squares assumptions in Key Concept 4.3 hold, then in large samples $\hat{\beta}_0$ and $\hat{\beta}_1$ have a jointly normal sampling distribution. The large-sample normal distribution of $\hat{\beta}_1$ is $N(\beta_1, \sigma^2_{\hat{\beta}_1})$, where the variance of this distribution, $\sigma^2_{\hat{\beta}_1}$, is

$$\sigma^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\mathrm{var}[(X_i - \mu_X)u_i]}{[\mathrm{var}(X_i)]^2}. \qquad (4.21)$$

The large-sample normal distribution of $\hat{\beta}_0$ is $N(\beta_0, \sigma^2_{\hat{\beta}_0})$, where

$$\sigma^2_{\hat{\beta}_0} = \frac{1}{n}\,\frac{\mathrm{var}(H_i u_i)}{[E(H_i^2)]^2}, \quad \text{where } H_i = 1 - \left[\frac{\mu_X}{E(X_i^2)}\right] X_i. \qquad (4.22)$$

The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, $\hat{\beta}_0$ and $\hat{\beta}_1$ will be close to the true population coefficients $\beta_0$ and $\beta_1$ with high probability. This is because the variances $\sigma^2_{\hat{\beta}_0}$ and $\sigma^2_{\hat{\beta}_1}$ of the estimators decrease to zero as $n$ increases ($n$ appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, $\beta_0$ and $\beta_1$, when $n$ is large.
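The large-sample variance formula in Equation (4.21), and the way it shrinks with $n$, can be compared with simulated sampling variances. The following is a rough sketch under assumed conditions, written in Python with NumPy: the design (a uniform regressor with heteroskedastic errors $u = 0.2\,X e$, where $e$ is standard normal) and the helper `draw` are hypothetical. The population moments in (4.21) are approximated from one very large draw rather than derived analytically.

```python
import numpy as np

rng = np.random.default_rng(123)
beta0, beta1 = 1.0, 2.0   # hypothetical population coefficients


def draw(size):
    """Draw (X, u) with E(u | X) = 0 and heteroskedastic errors (assumed design)."""
    x = rng.uniform(0.0, 10.0, size=size)
    u = 0.2 * x * rng.standard_normal(size)
    return x, u


# Approximate the population moments in Equation (4.21) with one very large draw.
x_pop, u_pop = draw(2_000_000)
mu_x = x_pop.mean()
# n times the variance in (4.21): var[(X - mu_X) u] / [var(X)]^2
avar = np.var((x_pop - mu_x) * u_pop) / np.var(x_pop) ** 2

results = {}
for n in (100, 400):
    x, u = draw((4000, n))          # 4000 simulated samples of size n
    y = beta0 + beta1 * x + u
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    b1 = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)
    results[n] = (avar / n, b1.var())   # (formula value, simulated variance)
    print(f"n = {n}: formula {results[n][0]:.6f}, simulated {results[n][1]:.6f}")
```

In this design the simulated variance of $\hat{\beta}_1$ should track the formula reasonably closely at both sample sizes, and quadrupling $n$ should cut the variance to roughly one quarter, which is the sense in which the estimator concentrates around $\beta_1$ as $n$ grows.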
FIGURE 4.6 The Variance of $\hat{\beta}_1$ and the Variance of $X$
[Figure: scatterplot of $Y$ against $X$ showing colored dots clustered near $\bar{X}$ and black dots spread more widely.]
The colored dots represent a set of $X_i$'s with a small variance. The black dots represent a set of $X_i$'s with a large variance. The regression line can be estimated more accurately with the black dots than with the colored dots.

Another implication of the distributions in Key Concept 4.4 is that, in general, the larger the variance of $X_i$, the smaller the variance $\sigma^2_{\hat{\beta}_1}$ of $\hat{\beta}_1$. Mathematically, this arises because the variance of $\hat{\beta}_1$ in Equation (4.21) is inversely proportional to the square of the variance of $X_i$: the larger is $\mathrm{var}(X_i)$, the larger is the denominator in Equation (4.21) so the smaller is $\sigma^2_{\hat{\beta}_1}$. To get a better sense of why this is so, look at Figure 4.6, which presents a scatterplot of 150 artificial data points on $X$ and $Y$. The data points indicated by the colored dots are the 75 observations closest to $\bar{X}$. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots; which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of $X$, the more precise is $\hat{\beta}_1$.

The normal approximation to the sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$ is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.
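The thought experiment behind Figure 4.6 can be mimicked in a simulation. This is only a sketch, assuming Python with NumPy; the data-generating values (a regressor centered at 100, a slope of 1, and standard normal errors) are hypothetical and are not the artificial data used in the figure. In each replication, 150 points are generated, the slope is estimated once from the 75 observations closest to $\bar{X}$ and once from the remaining 75, and the spread of the two sets of estimates is compared.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope from the regression of y on x."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)


rng = np.random.default_rng(7)
beta0, beta1 = 100.0, 1.0    # hypothetical population coefficients
n, reps = 150, 2000

slopes_near = np.empty(reps)
slopes_far = np.empty(reps)
for r in range(reps):
    x = rng.normal(100.0, 1.5, size=n)
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=n)
    # Sort observations by distance from the sample mean of x.
    order = np.argsort(np.abs(x - x.mean()))
    near, far = order[:75], order[75:]   # small-variance and large-variance halves
    slopes_near[r] = ols_slope(x[near], y[near])
    slopes_far[r] = ols_slope(x[far], y[far])

sd_near = slopes_near.std()
sd_far = slopes_far.std()

print("sd of slope, 75 points closest to the mean of x:", round(sd_near, 3))
print("sd of slope, 75 points farthest from the mean:  ", round(sd_far, 3))
```

Both sets of estimates are centered near the slope used to generate the data, but the estimates based on the widely spread observations vary much less from sample to sample, in line with the role of $\mathrm{var}(X_i)$ in the denominator of Equation (4.21).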
4.6 Conclusion

This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of $n$ observations on a dependent variable, $Y$, and a single regressor, $X$. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size $n$. Moreover, if $n$ is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions.

The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor $X$. This assumption implies that the OLS estimator is unbiased.

The second assumption is that $(X_i, Y_i)$ are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator.

The third assumption is that large outliers are unlikely. Stated more formally, $X$ and $Y$ have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of $\beta_1$ or to construct a confidence interval for $\beta_1$. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of $\hat{\beta}_1$ to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

Summary

1. The population regression line, $\beta_0 + \beta_1 X$, is the mean of $Y$ as a function of the value of $X$. The slope, $\beta_1$, is the expected change in $Y$ associated with a 1-unit change in $X$. The intercept, $\beta_0$, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.
2. The population regression line can be estimated using sample observations $(Y_i, X_i)$, $i = 1, \ldots, n$ by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$.
3. The $R^2$ and standard error of the regression (SER) are measures of how close the values of $Y_i$ are to the estimated regression line. The $R^2$ is between 0 and 1, with a larger value indicating that the $Y_i$'s are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error.
4. There are three key assumptions for the linear regression model: (1) The regression errors, $u_i$, have a mean of zero conditional on the regressors $X_i$; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are (1) unbiased; (2) consistent; and (3) normally distributed when the sample is large.

Key Terms

linear regression model with a single regressor
dependent variable
independent variable
regressor
population regression line
population regression function
population intercept and slope
population coefficients
parameters
error term
ordinary least squares (OLS) estimator
OLS regression line
predicted value
residual
regression $R^2$
explained sum of squares (ESS)
total sum of squares (TSS)
sum of squared residuals (SSR)
standard error of the regression (SER)
least squares assumptions

Review the Concepts

4.1 Explain the difference between $\hat{\beta}_1$ and $\beta_1$; between the residual $\hat{u}_i$ and the regression error $u_i$; and between the OLS predicted value $\hat{Y}_i$ and $E(Y_i \mid X_i)$.

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.

4.3 Sketch a hypothetical scatterplot of data for an estimated regression with $R^2 = 0.9$. Sketch a hypothetical scatterplot of data for a regression with $R^2 = 0.5$.
Exercises

4.1 Suppose that a researcher, using data on class size ($CS$) and average test scores from 100 third-grade classes, estimates the OLS regression

$\widehat{TestScore} = 520.4 - 5.82 \times CS$, $R^2 = 0.08$, $SER = 11.5$.

a. A classroom has 22 students. What is the regression's prediction for that classroom's average test score?
b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression's prediction for the change in the classroom average test score?
c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)
d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the $R^2$ and $SER$.)

4.2 Suppose that a random sample of 200 twenty-year-old men is selected from a population and that these men's height and weight are recorded. A regression of weight on height yields

$\widehat{Weight} = -99.41 + 3.94 \times Height$, $R^2 = 0.81$, $SER = 10.2$,

where $Weight$ is measured in pounds and $Height$ is measured in inches.

a. What is the regression's weight prediction for someone who is 70 inches tall? 65 inches tall? 74 inches tall?
b. A man has a late growth spurt and grows 1.5 inches over the course of a year. What is the regression's prediction for the increase in this man's weight?
c. Suppose that instead of measuring weight and height in pounds and inches, these variables are measured in centimeters and kilograms. What are the regression estimates from this new centimeter-kilogram regression? (Give all results, estimated coefficients, $R^2$, and $SER$.)

4.3 A regression of average weekly earnings ($AWE$, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

$\widehat{AWE} = 696.7 + 9.6 \times Age$, $R^2 = 0.023$, $SER = 624.1$.

a. Explain what the coefficient values 696.7 and 9.6 mean.
b. The standard error of the regression ($SER$) is 624.1. What are the units of measurement for the $SER$ (dollars? years? or is $SER$ unit-free)?
c. The regression $R^2$ is 0.023. What are the units of measurement for the $R^2$ (dollars? years? or is $R^2$ unit-free)?
d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?
e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?
f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)
g. The average age in this sample is 41.6 years. What is the average value of $AWE$ in the sample? (Hint: Review Key Concept 4.2.)

4.4 Read the box "The 'Beta' of a Stock" in Section 4.2.

a. Suppose that the value of $\beta$ is greater than 1 for a particular stock. Show that the variance of $(R - R_f)$ for this stock is greater than the variance of $(R_m - R_f)$.
b. Suppose that the value of $\beta$ is less than 1 for a particular stock. Is it possible that the variance of $(R - R_f)$ for this stock is greater than the variance of $(R_m - R_f)$? (Hint: Don't forget the regression error.)
c. In a given year, the rate of return on 3-month Treasury bills is 3.5% and the rate of return on a large diversified portfolio of stocks (the S&P 500) is 7.3%. For each company listed in the table at the end of the box, use the estimated value of $\beta$ to estimate the stock's expected rate of return.

4.5 A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let $Y_i$ denote the number of points scored on the exam by the $i$th student ($0 \le Y_i \le 100$), let $X_i$ denote the amount of time that the student has to complete the exam ($X_i = 90$ or $120$), and consider the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$.

a. Explain what the term $u_i$ represents. Why will different students have different values of $u_i$?
b. Explain why $E(u_i \mid X_i) = 0$ for this regression model.
c. Are the other assumptions in Key Concept 4.3 satisfied? Explain.
d. The estimated regression is $\hat{Y}_i = 49 + 0.24 X_i$.
   i. Compute the estimated regression's prediction for the average score of students given 90 minutes to complete the exam; 120 minutes; and 150 minutes.
   ii. Compute the estimated gain in score for a student who is given an additional 10 minutes on the exam.

4.6 Show that the first least squares assumption, $E(u_i \mid X_i) = 0$, implies that $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$.

4.7 Show that $\hat{\beta}_0$ is an unbiased estimator of $\beta_0$. (Hint: Use the fact that $\hat{\beta}_1$ is unbiased, which is shown in Appendix 4.3.)

4.8 Suppose that all of the regression assumptions in Key Concept 4.3 are satisfied except that the first assumption is replaced with $E(u_i \mid X_i) = 2$. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is $\hat{\beta}_1$ normally distributed in large samples with mean and variance given in Key Concept 4.4? What about $\hat{\beta}_0$?)

4.9 a. A linear regression yields $\hat{\beta}_1 = 0$. Show that $R^2 = 0$.
b. A linear regression yields $R^2 = 0$. Does this imply that $\hat{\beta}_1 = 0$?

4.10 Suppose that $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $(X_i, u_i)$ are i.i.d., and $X_i$ is a Bernoulli random variable with $\Pr(X = 1) = 0.20$. When $X = 1$, $u_i$ is $N(0, 4)$; when $X = 0$, $u_i$ is $N(0, 1)$.

a. Show that the regression assumptions in Key Concept 4.3 are satisfied.
b. Derive an expression for the large-sample variance of $\hat{\beta}_1$. [Hint: Evaluate the terms in Equation (4.21).]

4.11 Consider the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$.

a. Suppose you know that $\beta_0 = 0$. Derive a formula for the least squares estimator of $\beta_1$.
b. Suppose you know that $\beta_0 = 4$. Derive a formula for the least squares estimator of $\beta_1$.
4.12 a. Show that the regression $R^2$ in the regression of $Y$ on $X$ is the squared value of the sample correlation between $X$ and $Y$. That is, show that $R^2 = r_{XY}^2$.
b. Show that the $R^2$ from the regression of $Y$ on $X$ is the same as the $R^2$ from the regression of $X$ on $Y$.

Empirical Exercises

E4.1 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file CPS04 that contains an extended version of the data set used in Table 3.1 for 2004. It contains data for full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS04_Description, also available on the Web site. (These are the same data as in CPS92_04 but are limited to the year 2004.) In this exercise you will investigate the relationship between a worker's age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)

a. Run a regression of average hourly earnings ($AHE$) on age ($Age$). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by one year?
b. Bob is a 26-year-old worker. Predict Bob's earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis's earnings using the estimated regression.
c. Does age account for a large fraction of the variance in earnings across individuals? Explain.

E4.2 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file TeachingRatings that contains data on course evaluations, course characteristics, and professor characteristics for 463 courses at the University of Texas at Austin.¹ A detailed description is given in TeachingRatings_Description, also available on the Web site. One of the characteristics is an index of the professor's "beauty" as rated by a panel of six judges. In this exercise you will investigate how course evaluations are related to the professor's beauty.

a. Construct a scatterplot of average course evaluations ($Course\_Eval$) on the professor's beauty ($Beauty$). Does there appear to be a relationship between the variables?
b. Run a regression of average course evaluations ($Course\_Eval$) on the professor's beauty ($Beauty$). What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of $Course\_Eval$. (Hint: What is the sample mean of $Beauty$?)
c. Professor Watson has an average value of $Beauty$, while Professor Stock's value of $Beauty$ is one standard deviation above the average. Predict Professor Stock's and Professor Watson's course evaluations.
d. Comment on the size of the regression's slope. Is the estimated effect of $Beauty$ on $Course\_Eval$ large or small? Explain what you mean by "large" and "small."
e. Does $Beauty$ explain a large fraction of the variance in evaluations across courses? Explain.

E4.3 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file CollegeDistance that contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.) A detailed description is given in CollegeDistance_Description, also available on the Web site.²

a. Run a regression of years of completed education ($ED$) on distance to the nearest college ($Dist$), where $Dist$ is measured in tens of miles. (For example, $Dist = 2$ means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?
b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?
c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.
d. What is the value of the standard error of the regression? What are the units for the standard error (meters, grams, years, dollars, cents, or something else)?

E4.4 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file Growth that contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description, also available on the Web site. In this exercise you will investigate the relationship between growth and trade.³

a. Construct a scatterplot of average annual growth rate ($Growth$) on the average trade share ($TradeShare$). Does there appear to be a relationship between the variables?
b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?
c. Using all observations, run a regression of $Growth$ on $TradeShare$. What is the estimated slope? What is the estimated intercept? Use the regression to predict the growth rate for a country with trade share of 0.5 and with a trade share equal to 1.0.
d. Estimate the same regression excluding the data from Malta. Answer the same questions in (c).
e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?

¹ These data were provided by Professor Daniel Hamermesh of the University of Texas at Austin and were used in his paper with Amy Parker, "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of Education Review, August 2005, 24(4): pp. 369-376.
² These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2): pp. 217-224.
³ These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, "Finance and the Sources of Growth," Journal of Financial Economics, 2000, 58: 261-300.
years.~ l:. Ma lia.! be included or excluded rrom the unaly~ i~? ~I"h~'s.
APPENDIX 4.1 The California Test Score Data Set

The California Standardized Testing and Reporting data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district, divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

APPENDIX 4.2 Derivation of the OLS Estimators

This appendix uses calculus to derive the formulas for the OLS estimators given in Key Concept 4.2. To minimize the sum of squared prediction mistakes $\sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2$ [Equation (4.6)], first take the partial derivatives with respect to $b_0$ and $b_1$:

$$\frac{\partial}{\partial b_0} \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2 = -2 \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)$$ and (4.23)

$$\frac{\partial}{\partial b_1} \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2 = -2 \sum_{i=1}^n (Y_i - b_0 - b_1 X_i) X_i.$$ (4.24)

The OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$, are the values of $b_0$ and $b_1$ that minimize $\sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2$ or, equivalently, the values of $b_0$ and $b_1$ for which the derivatives in Equations (4.23) and (4.24) equal zero.
Accordingly, setting these derivatives equal to zero, collecting terms, and dividing by n shows that the OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy the two equations,

$$\bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X} = 0$$ and (4.25)

$$\frac{1}{n} \sum_{i=1}^n X_i Y_i - \hat{\beta}_0 \bar{X} - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^n X_i^2 = 0.$$ (4.26)

Solving this pair of equations for $\hat{\beta}_0$ and $\hat{\beta}_1$ yields

$$\hat{\beta}_1 = \frac{\frac{1}{n} \sum_{i=1}^n X_i Y_i - \bar{X}\bar{Y}}{\frac{1}{n} \sum_{i=1}^n X_i^2 - (\bar{X})^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$ (4.27)

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$ (4.28)

Equations (4.27) and (4.28) are the formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ given in Key Concept 4.2; the formula $\hat{\beta}_1 = s_{XY}/s_X^2$ is obtained by dividing the numerator and denominator in Equation (4.27) by n - 1.

APPENDIX 4.3 Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator $\hat{\beta}_1$ is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

Representation of $\hat{\beta}_1$ in Terms of the Regressors and Errors

We start by providing an expression for $\hat{\beta}_1$ in terms of the regressors and errors. Because $Y_i = \beta_0 + \beta_1 X_i + u_i$, $Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + u_i - \bar{u}$, so the numerator of the formula for $\hat{\beta}_1$ in Equation (4.27) is

$$\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^n (X_i - \bar{X})[\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})] = \beta_1 \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (X_i - \bar{X})(u_i - \bar{u}).$$ (4.29)

Now $\sum_{i=1}^n (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^n (X_i - \bar{X}) u_i - \sum_{i=1}^n (X_i - \bar{X}) \bar{u} = \sum_{i=1}^n (X_i - \bar{X}) u_i$, where the final equality follows from the definition of $\bar{X}$, which implies that $\sum_{i=1}^n (X_i - \bar{X}) \bar{u} = [\sum_{i=1}^n X_i - n\bar{X}] \bar{u} = 0$. Substituting $\sum_{i=1}^n (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^n (X_i - \bar{X}) u_i$ into the final expression in Equation (4.29) yields $\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = \beta_1 \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (X_i - \bar{X}) u_i$. Substituting this expression in turn into the formula for $\hat{\beta}_1$ in Equation (4.27) yields

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}.$$ (4.30)

Proof That $\hat{\beta}_1$ Is Unbiased

The expectation of $\hat{\beta}_1$ is obtained by taking the expectation of both sides of Equation (4.30). Thus,

$$E(\hat{\beta}_1) = \beta_1 + E\left[\frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}\right] = \beta_1 + E\left[\frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) E(u_i \mid X_1, \ldots, X_n)}{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}\right] = \beta_1,$$ (4.31)

where the second equality in Equation (4.31) follows by using the law of iterated expectations (Section 2.3). By the second least squares assumption, $u_i$ is distributed independently of X for all observations other than i, so $E(u_i \mid X_1, \ldots, X_n) = E(u_i \mid X_i)$. By the first least squares assumption, however, $E(u_i \mid X_i) = 0$. It follows that the conditional expectation in large brackets in the second line of Equation (4.31) is zero, so that $E(\hat{\beta}_1 - \beta_1 \mid X_1, \ldots, X_n) = 0$. Equivalently, $E(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \beta_1$; that is, $\hat{\beta}_1$ is conditionally unbiased, given $X_1, \ldots, X_n$. By the law of iterated expectations, $E(\hat{\beta}_1 - \beta_1) = E[E(\hat{\beta}_1 - \beta_1 \mid X_1, \ldots, X_n)] = 0$, so that $E(\hat{\beta}_1) = \beta_1$; that is, $\hat{\beta}_1$ is unbiased.

Large-Sample Normal Distribution of the OLS Estimator

The large-sample normal approximation to the limiting distribution of $\hat{\beta}_1$ (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).
First consider the numerator of this term. Because $\bar{X}$ is consistent, if the sample size is large, $\bar{X}$ is nearly equal to $\mu_X$. Thus, to a close approximation, the term in the numerator of Equation (4.30) is the sample average $\bar{v}$, where $v_i = (X_i - \mu_X) u_i$. By the first least squares assumption, $v_i$ has a mean of zero. By the second least squares assumption, $v_i$ is i.i.d. The variance of $v_i$ is $\sigma_v^2 = var[(X_i - \mu_X) u_i]$, which, by the third least squares assumption, is nonzero and finite. Therefore, $\bar{v}$ satisfies all the requirements of the central limit theorem (Key Concept 2.7). Thus $\bar{v}/\sigma_{\bar{v}}$ is, in large samples, distributed N(0, 1), where $\sigma_{\bar{v}}^2 = \sigma_v^2/n$. Thus the distribution of $\bar{v}$ is well approximated by the $N(0, \sigma_v^2/n)$ distribution.

Next consider the expression in the denominator in Equation (4.30); this is the sample variance of X (except dividing by n rather than n - 1, which is inconsequential if n is large). As discussed in Section 3.2 [Equation (3.8)], the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of X.

Combining these two results, we have that, in large samples, $\hat{\beta}_1 - \beta_1 \cong \bar{v}/var(X_i)$, so that the sampling distribution of $\hat{\beta}_1$ is, in large samples, $N(\beta_1, \sigma_{\hat{\beta}_1}^2)$, where $\sigma_{\hat{\beta}_1}^2 = var(\bar{v})/[var(X_i)]^2 = var[(X_i - \mu_X) u_i]/\{n[var(X_i)]^2\}$, which is the expression in Equation (4.21).

Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy:

$$\frac{1}{n} \sum_{i=1}^n \hat{u}_i = 0,$$ (4.32)

$$\frac{1}{n} \sum_{i=1}^n \hat{Y}_i = \bar{Y},$$ (4.33)

$$\sum_{i=1}^n \hat{u}_i X_i = 0 \text{ and } s_{\hat{u}X} = 0, \text{ and}$$ (4.34)

$$TSS = SSR + ESS.$$ (4.35)

Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals $\bar{Y}$; the sample covariance $s_{\hat{u}X}$ between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares [the ESS, TSS, and SSR are defined in Equations (4.14), (4.15), and (4.17)].

To verify Equation (4.32), note that the definition of $\hat{\beta}_0$ lets us write the OLS residuals as $\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i = (Y_i - \bar{Y}) - \hat{\beta}_1 (X_i - \bar{X})$; thus

$$\sum_{i=1}^n \hat{u}_i = \sum_{i=1}^n (Y_i - \bar{Y}) - \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X}).$$

But the definitions of $\bar{Y}$ and $\bar{X}$ imply that $\sum_{i=1}^n (Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^n (X_i - \bar{X}) = 0$, so $\sum_{i=1}^n \hat{u}_i = 0$.

To verify Equation (4.33), note that $Y_i = \hat{Y}_i + \hat{u}_i$, so $\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i + \sum_{i=1}^n \hat{u}_i = \sum_{i=1}^n \hat{Y}_i$, where the second equality is a consequence of Equation (4.32).

To verify Equation (4.34), note that $\sum_{i=1}^n \hat{u}_i = 0$ implies $\sum_{i=1}^n \hat{u}_i X_i = \sum_{i=1}^n \hat{u}_i (X_i - \bar{X})$, so

$$\sum_{i=1}^n \hat{u}_i X_i = \sum_{i=1}^n [(Y_i - \bar{Y}) - \hat{\beta}_1 (X_i - \bar{X})](X_i - \bar{X}) = \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}) - \hat{\beta}_1 \sum_{i=1}^n (X_i - \bar{X})^2 = 0,$$ (4.36)

where the final equality in Equation (4.36) is obtained using the formula for $\hat{\beta}_1$ in Equation (4.27). This result, combined with the preceding results, implies that $s_{\hat{u}X} = 0$.

Equation (4.35) follows from the previous results and some algebra:

$$TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + 2 \sum_{i=1}^n (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = SSR + ESS + 2 \sum_{i=1}^n \hat{u}_i \hat{Y}_i = SSR + ESS,$$ (4.37)

where the final equality follows from $\sum_{i=1}^n \hat{u}_i \hat{Y}_i = \sum_{i=1}^n \hat{u}_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) = \hat{\beta}_0 \sum_{i=1}^n \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^n \hat{u}_i X_i = 0$ by the previous results.
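The formulas in Equations (4.27) and (4.28) and the algebraic facts in Equations (4.32) through (4.35) can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `ols_fit` and the simulated data (true intercept 3, true slope 2) are assumptions made purely for illustration.

```python
import random

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) using Equations (4.27) and (4.28)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx            # Equation (4.27)
    beta0 = y_bar - beta1 * x_bar  # Equation (4.28)
    return beta0, beta1

# Hypothetical simulated sample: Y = 3 + 2 X + u with standard normal errors.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

b0, b1 = ols_fit(x, y)
y_hat = [b0 + b1 * xi for xi in x]              # predicted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]   # residuals
y_bar = sum(y) / len(y)

mean_resid = sum(u_hat) / len(u_hat)                   # should be ~0, (4.32)
mean_pred = sum(y_hat) / len(y_hat)                    # should equal y_bar, (4.33)
resid_dot_x = sum(u * xi for u, xi in zip(u_hat, x))   # should be ~0, (4.34)
tss = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum(u ** 2 for u in u_hat)
ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # tss ~ ssr + ess, (4.35)
```

Up to floating-point rounding, the four quantities behave as the appendix states regardless of the data used, because they are consequences of the first-order conditions rather than of any assumption about the errors.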
CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

This chapter continues the treatment of linear regression with a single regressor. Chapter 4 explained how the OLS estimator $\hat{\beta}_1$ of the slope coefficient $\beta_1$ differs from one sample to the next, that is, how $\hat{\beta}_1$ has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to make statements about $\beta_1$ that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of $\hat{\beta}_1$. Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept), then shows how to use $\hat{\beta}_1$ and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for $\beta_1$. Section 5.3 takes up the special case of a binary regressor.

Sections 5.1-5.3 assume that the three least squares assumptions of Chapter 4 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss-Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators. Section 5.6 discusses the distribution of the OLS estimator when the population distribution of the regression errors is normal.
5.1 Testing Hypotheses About One of the Regression Coefficients

Your client, the superintendent, calls you with a problem. She has an angry taxpayer in her office who asserts that cutting class size will not help boost test scores, so that reducing them further is a waste of money. Class size, the taxpayer claims, has no effect on test scores.

The taxpayer's claim can be rephrased in the language of regression analysis. Because the effect on test scores of a unit change in class size is $\beta_{ClassSize}$, the taxpayer is asserting that the population regression line is flat, that is, the slope $\beta_{ClassSize}$ of the population regression line is zero. Is there, the superintendent asks, evidence in your sample of 420 observations on California school districts that this slope is nonzero? Can you reject the taxpayer's hypothesis that $\beta_{ClassSize} = 0$, or should you accept it, at least tentatively pending further new evidence?

This section discusses tests of hypotheses about the slope $\beta_1$ or intercept $\beta_0$ of the population regression line. We start by discussing two-sided tests of the slope $\beta_1$ in detail, then turn to one-sided tests and to tests of hypotheses regarding the intercept $\beta_0$.

Two-Sided Hypotheses Concerning $\beta_1$

The general approach to testing hypotheses about these coefficients is the same as to testing hypotheses about the population mean, so we begin with a brief review.

Testing hypotheses about the population mean. Recall from Section 3.2 that the null hypothesis that the mean of Y is a specific value $\mu_{Y,0}$ can be written as $H_0: E(Y) = \mu_{Y,0}$, and the two-sided alternative is $H_1: E(Y) \neq \mu_{Y,0}$.

The test of the null hypothesis $H_0$ against the two-sided alternative proceeds as in the three steps summarized in Key Concept 3.6. The first is to compute the standard error of $\bar{Y}$, $SE(\bar{Y})$, which is an estimator of the standard deviation of the sampling distribution of $\bar{Y}$. The second step is to compute the t-statistic, which has the general form given in Key Concept 5.1; applied here, the t-statistic is $t = (\bar{Y} - \mu_{Y,0})/SE(\bar{Y})$.

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed; equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct (Key Concept 3.5). Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is $2\Phi(-|t^{act}|)$, where $t^{act}$ is the value of the t-statistic actually computed and $\Phi$ is the cumulative standard normal distribution tabulated in Appendix Table 1. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level. For example, a two-sided test with a 5% significance level would reject the null hypothesis if $|t^{act}| > 1.96$. In this case, the population mean is said to be statistically significantly different from the hypothesized value at the 5% significance level.

KEY CONCEPT 5.1 GENERAL FORM OF THE t-STATISTIC

In general, the t-statistic has the form

$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}.$$ (5.1)

Testing hypotheses about the slope $\beta_1$. At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of $\bar{Y}$ is approximately normal. Because $\hat{\beta}_1$ also has a normal sampling distribution in large samples, hypotheses about the true value of the slope $\beta_1$ can be tested using the same general approach.

The null and alternative hypotheses need to be stated precisely before they can be tested. The angry taxpayer's hypothesis is that $\beta_{ClassSize} = 0$. More generally, under the null hypothesis the true population slope $\beta_1$ takes on some specific value, $\beta_{1,0}$. Under the two-sided alternative, $\beta_1$ does not equal $\beta_{1,0}$. That is, the null hypothesis and the two-sided alternative hypothesis are

$$H_0: \beta_1 = \beta_{1,0} \text{ vs. } H_1: \beta_1 \neq \beta_{1,0} \text{ (two-sided alternative).}$$ (5.2)

To test the null hypothesis $H_0$, we follow the same three steps as for the population mean.

The first step is to compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$. The standard error of $\hat{\beta}_1$ is an estimator of $\sigma_{\hat{\beta}_1}$, the standard deviation of the sampling distribution of $\hat{\beta}_1$. Specifically,

$$SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}},$$ (5.3)
where

$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^n (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2\right]^2}.$$ (5.4)

The estimator of the variance in Equation (5.4) is discussed in Appendix 5.1. Although the formula for $\hat{\sigma}^2_{\hat{\beta}_1}$ is complicated, in applications the standard error is computed by regression software so that it is easy to use in practice.

The second step is to compute the t-statistic,

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}.$$ (5.5)

The third step is to compute the p-value, the probability of observing a value of $\hat{\beta}_1$ at least as different from $\beta_{1,0}$ as the estimate actually computed ($\hat{\beta}_1^{act}$), assuming that the null hypothesis is correct. Stated mathematically,

$$p\text{-value} = Pr_{H_0}\left[|\hat{\beta}_1 - \beta_{1,0}| > |\hat{\beta}_1^{act} - \beta_{1,0}|\right] = Pr_{H_0}\left[\left|\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right| > \left|\frac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right|\right] = Pr_{H_0}(|t| > |t^{act}|),$$ (5.6)

where $Pr_{H_0}$ denotes the probability computed under the null hypothesis, the second equality follows by dividing by $SE(\hat{\beta}_1)$, and $t^{act}$ is the value of the t-statistic actually computed. Because $\hat{\beta}_1$ is approximately normally distributed in large samples, under the null hypothesis the t-statistic is approximately distributed as a standard normal random variable, so in large samples,

$$p\text{-value} = Pr(|Z| > |t^{act}|) = 2\Phi(-|t^{act}|).$$ (5.7)

A small value of the p-value, say less than 5%, provides evidence against the null hypothesis in the sense that the chance of obtaining a value of $\hat{\beta}_1$ by pure random variation from one sample to the next is less than 5% if, in fact, the null hypothesis is correct. If so, the null hypothesis is rejected at the 5% significance level.

Alternatively, the hypothesis can be tested at the 5% significance level simply by comparing the value of the t-statistic to ±1.96, the critical value for a two-sided test, and rejecting the null hypothesis at the 5% level if $|t^{act}| > 1.96$.

These steps are summarized in Key Concept 5.2.
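To make the three steps concrete, the sketch below computes the standard error from Equations (5.3) and (5.4), the t-statistic from Equation (5.5), and the large-sample two-sided p-value from Equation (5.7). It is an illustration under stated assumptions rather than the procedure any particular regression package uses: the function name `slope_test` and the simulated heteroskedastic data are hypothetical.

```python
import math
import random
from statistics import NormalDist

def slope_test(x, y, beta1_null=0.0):
    """Return (beta1_hat, SE, t, two-sided p-value) for H0: beta1 = beta1_null."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta0 = y_bar - beta1 * x_bar
    u_hat = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]

    # Equation (5.4): estimator of the variance of beta1_hat.
    numerator = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, u_hat)) / (n - 2)
    denominator = (s_xx / n) ** 2
    var_beta1 = (1.0 / n) * numerator / denominator

    se = math.sqrt(var_beta1)                       # Equation (5.3)
    t = (beta1 - beta1_null) / se                   # Equation (5.5)
    p_value = 2.0 * NormalDist().cdf(-abs(t))       # Equation (5.7)
    return beta1, se, t, p_value

# Hypothetical data whose error variance grows with x (heteroskedastic errors).
rng = random.Random(1)
x = [rng.uniform(10, 30) for _ in range(400)]
y = [700.0 - 2.0 * xi + rng.gauss(0, 1 + 0.5 * xi) for xi in x]

beta1_hat, se_beta1, t_act, p_value = slope_test(x, y, beta1_null=0.0)
reject_at_5_percent = abs(t_act) > 1.96
```

Testing the null hypothesis at the estimate itself, `slope_test(x, y, beta1_hat)`, gives a t-statistic of zero and a p-value of one, which is a quick sanity check on the implementation.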
KEY CONCEPT 5.2 TESTING THE HYPOTHESIS $\beta_1 = \beta_{1,0}$ AGAINST THE ALTERNATIVE $\beta_1 \neq \beta_{1,0}$

1. Compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$ [Equation (5.3)].
2. Compute the t-statistic [Equation (5.5)].
3. Compute the p-value [Equation (5.7)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if $|t^{act}| > 1.96$.

The standard error and (typically) the t-statistic and p-value testing $\beta_1 = 0$ are computed automatically by regression software.

Reporting regression equations and application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (4.11), yielded $\hat{\beta}_0 = 698.9$ and $\hat{\beta}_1 = -2.28$. The standard errors of these estimates are $SE(\hat{\beta}_0) = 10.4$ and $SE(\hat{\beta}_1) = 0.52$.

Because of the importance of the standard errors, by convention they are included when reporting the estimated OLS coefficients. One compact way to report the standard errors is to place them in parentheses below the respective coefficients of the OLS regression line:

$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad R^2 = 0.051, \; SER = 18.6.$$ (5.8)

Equation (5.8) also reports the regression $R^2$ and the standard error of the regression (SER) following the estimated regression line. Thus Equation (5.8) provides the estimated regression line, estimates of the sampling uncertainty of the slope and the intercept (the standard errors), and two measures of the fit of this regression line (the $R^2$ and the SER). This is a common format for reporting a single regression equation, and it will be used throughout the rest of this book.

Suppose you wish to test the null hypothesis that the slope $\beta_1$ is zero in the population counterpart of Equation (5.8) at the 5% significance level. To do so, construct the t-statistic and compare it to 1.96, the 5% (two-sided) critical value taken from the standard normal distribution. The t-statistic is constructed by substituting the hypothesized value of $\beta_1$ under the null hypothesis (zero), the estimated slope, and its standard error from Equation (5.8) into the general formula in Equation (5.5); the result is $t^{act} = (-2.28 - 0)/0.52 = -4.38$. This t-statistic exceeds (in absolute value) the 5% two-sided critical value of 1.96, so the null hypothesis is rejected in favor of the two-sided alternative at the 5% significance level.

Alternatively, we can compute the p-value associated with $t^{act} = -4.38$. This probability is the area in the tails of the standard normal distribution, as shown in Figure 5.1. This probability is extremely small, approximately 0.00001, or 0.001%. That is, if the null hypothesis $\beta_{ClassSize} = 0$ is true, the probability of obtaining a value of $\hat{\beta}_1$ as far from the null as the value we actually obtained is extremely small, less than 0.001%. Because this event is so unlikely, it is reasonable to conclude that the null hypothesis is false.

FIGURE 5.1 Calculating the p-Value of a Two-Sided Test When $t^{act} = -4.38$
The p-value of a two-sided test is the probability that $|Z| > |t^{act}|$, where Z is a standard normal random variable and $t^{act}$ is the value of the t-statistic calculated from the sample. When $t^{act} = -4.38$, the p-value is only 0.00001. [Figure: N(0, 1) density; the p-value is the area to the left of -4.38 plus the area to the right of +4.38.]

One-Sided Hypotheses Concerning $\beta_1$

The discussion so far has focused on testing the hypothesis that $\beta_1 = \beta_{1,0}$ against the hypothesis that $\beta_1 \neq \beta_{1,0}$. This is a two-sided hypothesis test, because under the alternative $\beta_1$ could be either larger or smaller than $\beta_{1,0}$. Sometimes, however, it is appropriate to use a one-sided hypothesis test. For example, in the student-teacher ratio/test score problem, many people think that smaller classes provide a better learning environment. Under that hypothesis, $\beta_1$ is negative: Smaller classes lead to higher scores. It might make sense, therefore, to test the null hypothesis that $\beta_1 = 0$ (no effect) against the one-sided alternative that $\beta_1 < 0$.

For a one-sided test, the null hypothesis and the one-sided alternative hypothesis are

$$H_0: \beta_1 = \beta_{1,0} \text{ vs. } H_1: \beta_1 < \beta_{1,0} \text{ (one-sided alternative),}$$ (5.9)

where $\beta_{1,0}$ is the value of $\beta_1$ under the null (0 in the student-teacher ratio example) and the alternative is that $\beta_1$ is less than $\beta_{1,0}$. If the alternative is that $\beta_1$ is greater than $\beta_{1,0}$, the inequality in Equation (5.9) is reversed.

Because the null hypothesis is the same for a one- and a two-sided hypothesis test, the construction of the t-statistic is the same. The only difference between a one- and a two-sided hypothesis test is how you interpret the t-statistic. For the one-sided alternative in Equation (5.9), the null hypothesis is rejected against the one-sided alternative for large negative, but not large positive, values of the t-statistic: Instead of rejecting if $|t^{act}| > 1.96$, the hypothesis is rejected at the 5% significance level if $t^{act} < -1.645$.

The p-value for a one-sided test is obtained from the cumulative standard normal distribution as

$$p\text{-value} = Pr(Z < t^{act}) = \Phi(t^{act}) \text{ (p-value, one-sided left-tail test).}$$ (5.10)

If the alternative hypothesis is that $\beta_1$ is greater than $\beta_{1,0}$, the inequalities in Equations (5.9) and (5.10) are reversed, so the p-value is the right-tail probability, $Pr(Z > t^{act})$.

When should a one-sided test be used? In practice, one-sided alternative hypotheses should be used only when there is a clear reason for doing so. This reason could come from economic theory, prior empirical evidence, or both. However, even if it initially seems that the relevant alternative is one-sided, upon reflection this might not necessarily be so. A newly formulated drug undergoing clinical trials actually could prove harmful because of previously unrecognized side effects. In the class size example, we are reminded of the graduation joke that a university's secret of success is to admit talented students and then make sure that the faculty stays out of their way and does as little damage as possible. In practice, such ambiguity often leads econometricians to use two-sided tests.
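The p-values quoted for the test score regression can be reproduced directly from the standard normal distribution. The short sketch below is illustrative only; it uses the estimates reported in Equation (5.8) and the Python standard library's `statistics.NormalDist` for the cumulative distribution function $\Phi$, and the variable names are chosen here for readability.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# t-statistic for H0: beta1 = 0, using the slope and SE from Equation (5.8).
t_act = (-2.28 - 0.0) / 0.52          # about -4.38

# Two-sided p-value, Equation (5.7): area in both tails beyond |t_act|.
p_two_sided = 2.0 * Z.cdf(-abs(t_act))

# One-sided (left-tail) p-value, Equation (5.10), for H1: beta1 < 0.
p_left_tail = Z.cdf(t_act)

# Equivalent critical-value comparisons at the 5% significance level.
reject_two_sided_5pct = abs(t_act) > 1.96
reject_one_sided_5pct = t_act < -1.645
```

Both p-values come out on the order of 0.00001 or smaller, consistent with the "approximately 0.001%" figure in the text, and both critical-value comparisons reject the null hypothesis.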
Occasion ally. appli e d to. we cannot de termine The true value of f31 e xactl y from a sample of da ta.2 Coo ~del'l(e Intervals for a Regression Coeffici ent 155 :ad h at ) Ih 5.()(XI6%. This calls for constructing a COnfidence inte rva l. th e hypothes is concerns the intercept.33 ( the crit ica l value fo r <l onesided test with a 1% signifi cance Ic\'e l) . This is less tha n 2.5.0 = 0 in Eq ua tion (5. HoW ~ergoi ng The genera l approach to lesting Ihi s null hypothesis consists of the L hree steps in Key Concept 5.9)) is / iKI = 4. Being able to uccept o r to reject this null hypothesis based on the statistica l evide nce provides a powerful tool for coping with the uncer tainly inherent in using a sample to learn about lhe population. rd. In prac 5. Based o n these data.li os t the onc sided alternat ive at the I % level. 131 is Testing Hypotheses About the Intercept 130 ot be ~ec n lr the .st the the t 5% 111i5 discussion has focu sed on testing hypotheses abo ut the slo pe.l3o. ~ rn ati vC l11is rca' b. and instead one would like to koo w a range of values of the coefficient tha i are consiste nt with the data. there a re many times that no single hypothesis a bout 3 regression coefficient is domi na nt. t3o. this approach is modified as was d iscussed in the previo us subsection for hypotheses abou t the slope.o (twosided alte rnative).cessarily has sampling unce r tainty. H ypOl hesis tests are use(ul if you have a specific nuU hypothesis in mind (as did o ur a ngry taxpayer).11) lC fd nor P o (S. upon f.I3o "* f3o.2.IO) ~it iCS in ~ability. (3\ .1l1e null hypothesis con cerning the inte rcept fi nd the twosided a lte rnative are Ho: (31) = .0gnized ~ o k e that )ukc sure ~. It • .38.1J vs. (5.9) ~am Application to test scorl!s. however. If th e alte rnative is onesided.1).l3o (the formu la fo r the standa rd error of is given in Appendix 5. H I :. In fac!_th e pvalue is less tban O. Ye t. 
you ca n reject the a ngry ta xpaye r's assert ion tha t the negat ive estim ale o f the slope arose purely beca use of ra ndom sampllng variation at the I % signi ficance level . lnc Istatistic tcsting the bypOl hes is tha t there is no effect of class size on test scores [so /3 1.2 Confidence Intervals for a Regression Coefficient Because any stati stical estimate 0 1 the slope {31 ne. so the null hypothesis is rejected 3g.
· Reca ll .! It 95% co nfide nce in ter val ctln be computed hy tcsting all possi ble \'al· ues of /3 1 (t hat is. tes ting the null hypothesis /31 "" (Jill fo r all va lues of /31. Because the 95 % co nfide nce inte rva l (as de fi ned in thc t1r\t de finitio n) is the set o f all valu es o f 13 1 I1Mt a re nor rejec ted 3 1 the 5% significan ce level. First .sta Tislic.1<11.96S£(P ')'{.. by definitio n. Th ~ OLS reg ressio n of Ihe tcst score again ~t til" stude nt.).28 and SF:(~. = T he conSl ructi un of a co nfidence in terval for Co ncept 5.hat a 95 % eo nfi d ~ n ee interval for 131 ha.amplcs the true vulue 01 will 1/01 be rejected . 156 CHAPTER S linear R egression with One Regressor is.. 1. the 95% confide nce interva l for (JI h the int cc\:ll 1. l '= 0.•.26. As in th e case of a co nfidence inte rva l fo r the pop ulatio n me a n (Section 3. two e q uiva lent de Clllilio ns.3. B UI construl·ti ng the tsta lis tic for a ll va l u e~ o f /3 1 would ta ke fo re ver.. possible to usc the OLS estimator and its sta ndard erro r to construct a confide nce inle rva l fo r the slope 131or fo r the inte n. it is the sct of va lues thai ca nnot be rejected u ~ i ng a twosided h)'pothesis les! wit h a 5% significa nce level. in principiI.2. A hypothesis t ~ t with il 5 % s ig nificance level will.28 .81and SF.. yielded == .' 1 . PI .0) ill thr. the confidence inl erval will contain the trUI: va lue of Pl' Beca use this inte rva l contains the true val ue in 95% of all sa mp les. TIlat is. in 95 % of a ll possible . The 95% twosi ded confidence interval for /3 is [.%S£ (P.: 0 is not contai ned in Ihis confide nce lntc n <ll. il is said to have a confidence level of 95%. 131 is summari/c d ....30 :5 13 1 S . re po rted in Eq uation (5.96SE<p.3. Application to test scores.2.(p]) .H). 
Confidence interval for $\beta_1$. A 95% confidence interval for $\beta_1$ has two equivalent definitions. First, it is the set of values that cannot be rejected using a two-sided hypothesis test with a 5% significance level. Second, it is an interval that has a 95% probability of containing the true value of $\beta_1$; that is, in 95% of possible samples that might be drawn, the confidence interval will contain the true value of $\beta_1$.

The reason these two definitions are equivalent is as follows. A hypothesis test with a 5% significance level will, by definition, reject the true value of $\beta_1$ in only 5% of all possible samples; that is, in 95% of all possible samples the true value of $\beta_1$ will not be rejected. Because the 95% confidence interval (as defined in the first definition) is the set of all values of $\beta_1$ that are not rejected at the 5% significance level, it follows that the true value of $\beta_1$ will be contained in the confidence interval in 95% of all possible samples.

As in the case of a confidence interval for the population mean (Section 3.3), in principle a 95% confidence interval can be computed by testing all possible values of $\beta_1$ (that is, testing the null hypothesis $\beta_1 = \beta_{1,0}$ for all values of $\beta_{1,0}$) at the 5% significance level using the $t$-statistic. The 95% confidence interval is then the collection of all the values of $\beta_1$ that are not rejected.

An easier way to construct the confidence interval is to note that the $t$-statistic will reject the hypothesized value $\beta_{1,0}$ whenever $\beta_{1,0}$ is outside the range $\hat{\beta}_1 \pm 1.96\,SE(\hat{\beta}_1)$. That is, the 95% confidence interval for $\beta_1$ is the interval $[\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1)]$. This argument parallels the argument used to develop a confidence interval for the population mean. The construction of the confidence interval for $\beta_1$ is summarized in Key Concept 5.3.

KEY CONCEPT 5.3: CONFIDENCE INTERVAL FOR $\beta_1$
A 95% two-sided confidence interval for $\beta_1$ is an interval that contains the true value of $\beta_1$ with a 95% probability; that is, it contains the true value of $\beta_1$ in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of $\beta_1$ that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, it is constructed as

$$\text{95\% confidence interval for } \beta_1 = [\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1)]. \qquad (5.12)$$

Confidence interval for $\beta_0$. A 95% confidence interval for $\beta_0$ is constructed as in Key Concept 5.3, with $\hat{\beta}_0$ and $SE(\hat{\beta}_0)$ replacing $\hat{\beta}_1$ and $SE(\hat{\beta}_1)$.

Application to test scores. The OLS regression of the test score against the student-teacher ratio, reported in Equation (5.8), yielded $\hat{\beta}_1 = -2.28$ and $SE(\hat{\beta}_1) = 0.52$. The 95% two-sided confidence interval for $\beta_1$ is $\{-2.28 \pm 1.96 \times 0.52\}$, or $-3.30 \leq \beta_1 \leq -1.26$. The value $\beta_1 = 0$ is not contained in this confidence interval, so (as we knew already from Section 5.1) the hypothesis $\beta_1 = 0$ can be rejected at the 5% significance level.

Confidence intervals for predicted effects of changing X. The 95% confidence interval for $\beta_1$ can be used to construct a 95% confidence interval for the predicted effect of a general change in $X$.

Consider changing $X$ by a given amount, $\Delta x$. The predicted change in $Y$ associated with this change in $X$ is $\beta_1 \Delta x$. The population slope $\beta_1$ is unknown, but because we can construct a confidence interval for $\beta_1$, we can construct a confidence interval for the predicted effect $\beta_1 \Delta x$. Because one end of a 95% confidence interval for $\beta_1$ is $\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1)$, the predicted effect of the change $\Delta x$ using this estimate of $\beta_1$ is $[\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1)] \times \Delta x$. The other end of the confidence interval is $\hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1)$, and the predicted effect of the change using that estimate is $[\hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1)] \times \Delta x$. Thus a 95% confidence interval for the effect of changing $x$ by the amount $\Delta x$ can be expressed as

$$\text{95\% confidence interval for } \beta_1 \Delta x = [\hat{\beta}_1 \Delta x - 1.96\,SE(\hat{\beta}_1) \times \Delta x,\ \hat{\beta}_1 \Delta x + 1.96\,SE(\hat{\beta}_1) \times \Delta x]. \qquad (5.13)$$

For example, our hypothetical superintendent is contemplating reducing the student-teacher ratio by 2. Because the 95% confidence interval for $\beta_1$ is $[-3.30, -1.26]$, the effect of reducing the student-teacher ratio by 2 could be as great as $-3.30 \times (-2) = 6.60$, or as little as $-1.26 \times (-2) = 2.52$. Thus decreasing the student-teacher ratio by 2 is predicted to increase test scores by between 2.52 and 6.60 points, with a 95% confidence level.
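The arithmetic in this application can be checked with a few lines of Python. This is only a sketch: the function names `confidence_interval` and `effect_interval` are illustrative and do not come from the text, and 1.96 is the large-sample 5% two-sided critical value used throughout this section.

```python
def confidence_interval(beta_hat, se, z=1.96):
    """Large-sample interval [beta_hat - z*se, beta_hat + z*se], as in Equation (5.12)."""
    return beta_hat - z * se, beta_hat + z * se


def effect_interval(beta_hat, se, delta_x, z=1.96):
    """Interval for the predicted effect beta_1 * delta_x, as in Equation (5.13)."""
    lo, hi = confidence_interval(beta_hat, se, z)
    # A negative delta_x reverses the order of the endpoints, so sort them.
    a, b = lo * delta_x, hi * delta_x
    return min(a, b), max(a, b)


# Values from the test score application: slope -2.28, standard error 0.52.
ci = confidence_interval(-2.28, 0.52)
eff = effect_interval(-2.28, 0.52, delta_x=-2)

print([round(v, 2) for v in ci])   # [-3.3, -1.26]
print([round(v, 2) for v in eff])  # [2.52, 6.6]
```

The sorting step matters only when $\Delta x$ is negative, as it is for a reduction in the student-teacher ratio.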
5.3 Regression When X Is a Binary Variable

The discussion so far has focused on the case that the regressor is a continuous variable. Regression analysis can also be used when the regressor is binary, that is, when it takes on only two values, 0 or 1. For example, $X$ might be a worker's gender (= 1 if female, = 0 if male), whether a school district is urban or rural (= 1 if urban, = 0 if rural), or whether the district's class size is small or large (= 1 if small, = 0 if large). A binary variable is also called an indicator variable or sometimes a dummy variable.

Interpretation of the Regression Coefficients

The mechanics of regression with a binary regressor are the same as if it is continuous. The interpretation of $\beta_1$, however, is different, and it turns out that regression with a binary variable is equivalent to performing a difference of means analysis, as described in Section 3.4.

To see this, suppose you have a variable $D_i$ that equals either 0 or 1, depending on whether the student-teacher ratio is less than 20:

$$D_i = \begin{cases} 1 & \text{if the student-teacher ratio in the } i\text{th district} < 20 \\ 0 & \text{if the student-teacher ratio in the } i\text{th district} \geq 20. \end{cases} \qquad (5.14)$$

The population regression model with $D_i$ as the regressor is

$$Y_i = \beta_0 + \beta_1 D_i + u_i, \quad i = 1, \ldots, n. \qquad (5.15)$$

This is the same as the regression model with the continuous regressor $X_i$, except that now the regressor is the binary variable $D_i$. Because $D_i$ is not continuous, it is not useful to think of $\beta_1$ as a slope; indeed, because $D_i$ can take on only two values, there is no "line," so it makes no sense to talk about a slope. Thus we will not refer to $\beta_1$ as the slope in Equation (5.15); instead we will simply refer to $\beta_1$ as the coefficient multiplying $D_i$ in this regression or, more compactly, the coefficient on $D_i$.

If $\beta_1$ in Equation (5.15) is not a slope, then what is it? The best way to interpret $\beta_0$ and $\beta_1$ in a regression with a binary regressor is to consider, one at a time, the two possible cases, $D_i = 0$ and $D_i = 1$. If the student-teacher ratio is high, then $D_i = 0$ and Equation (5.15) becomes

$$Y_i = \beta_0 + u_i \quad (D_i = 0). \qquad (5.16)$$

Because $E(u_i \mid D_i) = 0$, the conditional expectation of $Y_i$ when $D_i = 0$ is $E(Y_i \mid D_i = 0) = \beta_0$; that is, $\beta_0$ is the population mean value of test scores when the student-teacher ratio is high. Similarly, when $D_i = 1$,

$$Y_i = \beta_0 + \beta_1 + u_i \quad (D_i = 1). \qquad (5.17)$$

Thus, when $D_i = 1$, $E(Y_i \mid D_i = 1) = \beta_0 + \beta_1$; that is, $\beta_0 + \beta_1$ is the population mean value of test scores when the student-teacher ratio is low.

Because $\beta_0 + \beta_1$ is the population mean of $Y_i$ when $D_i = 1$ and $\beta_0$ is the population mean of $Y_i$ when $D_i = 0$, the difference $(\beta_0 + \beta_1) - \beta_0 = \beta_1$ is the difference between these two means. In other words, $\beta_1$ is the difference between the conditional expectation of $Y_i$ when $D_i = 1$ and when $D_i = 0$, or $\beta_1 = E(Y_i \mid D_i = 1) - E(Y_i \mid D_i = 0)$. In the test score example, $\beta_1$ is the difference between the mean test score in districts with low student-teacher ratios and the mean test score in districts with high student-teacher ratios.

Because $\beta_1$ is the difference in the population means, it makes sense that the OLS estimator $\hat{\beta}_1$ is the difference between the sample averages of $Y_i$ in the two groups, and in fact this is the case.

Hypothesis tests and confidence intervals. If the two population means are the same, then $\beta_1$ in Equation (5.15) is zero. Thus the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis $\beta_1 = 0$ against the alternative $\beta_1 \neq 0$. This hypothesis can be tested using the procedure outlined in Section 5.1. Specifically, the null hypothesis can be rejected at the 5% level against the two-sided alternative when the OLS $t$-statistic $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$ exceeds 1.96 in absolute value. Similarly, a 95% confidence interval for $\beta_1$, constructed as $\hat{\beta}_1 \pm 1.96\,SE(\hat{\beta}_1)$ as described in Section 5.2, provides a 95% confidence interval for the difference between the two population means.
Application to test scores. As an example, a regression of the test score against the student-teacher ratio binary variable $D$ defined in Equation (5.14), estimated by OLS using the 420 observations in Figure 4.2, yields

$$\widehat{TestScore} = \underset{(1.3)}{650.0} + \underset{(1.8)}{7.4}\,D, \quad R^2 = 0.035, \quad SER = 18.7. \qquad (5.18)$$

The standard errors of the OLS estimates of the coefficients $\beta_0$ and $\beta_1$ are given in parentheses below the OLS estimates. Thus the average test score for the subsample with student-teacher ratios greater than or equal to 20 (that is, for which $D = 0$) is 650.0, and the average test score for the subsample with student-teacher ratios less than 20 (so $D = 1$) is $650.0 + 7.4 = 657.4$. The difference between the sample average test scores for the two groups is 7.4. This is the OLS estimate of $\beta_1$, the coefficient on the student-teacher ratio binary variable $D$.

Is the difference in the population mean test scores in the two groups statistically significantly different from zero at the 5% level? To find out, construct the $t$-statistic on $\beta_1$: $t = 7.4/1.8 = 4.04$. This exceeds 1.96 in absolute value, so the hypothesis that the population mean test scores in districts with high and low student-teacher ratios is the same can be rejected at the 5% significance level.

The OLS estimator and its standard error can be used to construct a 95% confidence interval for the true difference in means. This is $7.4 \pm 1.96 \times 1.8 = (3.9, 10.9)$. This confidence interval excludes $\beta_1 = 0$, so that (as we know from the previous paragraph) the hypothesis $\beta_1 = 0$ can be rejected at the 5% significance level.
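The test and the interval above can be reproduced from the rounded values in Equation (5.18). This is a sketch only; the variable names are illustrative. Using the rounded inputs gives a $t$-statistic of about 4.11 rather than the reported 4.04, which presumably was computed from unrounded estimates; the conclusion is the same either way.

```python
beta1_hat, se = 7.4, 1.8   # rounded values reported in Equation (5.18)

t_stat = beta1_hat / se
ci = (beta1_hat - 1.96 * se, beta1_hat + 1.96 * se)

print(abs(t_stat) > 1.96)          # True: reject beta_1 = 0 at the 5% level
print([round(v, 1) for v in ci])   # [3.9, 10.9]
```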
5.4 Heteroskedasticity and Homoskedasticity

Our only assumption about the distribution of $u_i$ conditional on $X_i$ is that it has a mean of zero (the first least squares assumption). If, furthermore, the variance of this conditional distribution does not depend on $X_i$, then the errors are said to be homoskedastic. This section discusses homoskedasticity, its theoretical implications, the simplified formulas for the standard errors of the OLS estimators that arise if the errors are homoskedastic, and the risks you run if you use these simplified formulas in practice.

What Are Heteroskedasticity and Homoskedasticity?

Definitions of heteroskedasticity and homoskedasticity. The error term $u_i$ is homoskedastic if the variance of the conditional distribution of $u_i$ given $X_i$ is constant for $i = 1, \ldots, n$ and in particular does not depend on $X_i$. Otherwise, the error term is heteroskedastic.
As an illustration, return to Figure 4.4. The distribution of the errors $u_i$ is shown for various values of $x$. Because this distribution applies specifically for the indicated value of $x$, this is the conditional distribution of $u_i$ given $X_i = x$. As drawn in that figure, all these conditional distributions have the same spread; more precisely, the variance of these distributions is the same for the various values of $x$. That is, in Figure 4.4, the conditional variance of $u_i$ given $X_i = x$ does not depend on $x$, so the errors illustrated in Figure 4.4 are homoskedastic.

In contrast, Figure 5.2 illustrates a case in which the conditional distribution of $u_i$ spreads out as $x$ increases. For small values of $x$, this distribution is tight, but for larger values of $x$, it has a greater spread. Thus, in Figure 5.2 the variance of $u_i$ given $X_i = x$ increases with $x$, so that the errors in Figure 5.2 are heteroskedastic.

The definitions of heteroskedasticity and homoskedasticity are summarized in Key Concept 5.4.

FIGURE 5.2 An Example of Heteroskedasticity. Like Figure 4.4, this shows the conditional distribution of test scores for three different class sizes. Unlike Figure 4.4, these distributions become more spread out (have a larger variance) for larger class sizes. Because the variance of the distribution of $u$ given $X$, $\mathrm{var}(u \mid X)$, depends on $X$, $u$ is heteroskedastic. (Horizontal axis: student-teacher ratio; vertical axis: test score.)
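The contrast between the two figures can be imitated with simulated errors. The sketch below is illustrative only: the range of $x$, the constant standard deviation of 3.0, and the rule that makes the standard deviation grow with $x$ are all hypothetical choices, not values from the text.

```python
import random
import statistics

random.seed(0)


def simulate(n, heteroskedastic):
    """Draw (x, u) pairs; under heteroskedasticity the sd of u grows with x."""
    pairs = []
    for _ in range(n):
        x = random.uniform(15.0, 25.0)
        # Hypothetical rule: sd rises linearly in x, or stays fixed at 3.0.
        sd = 0.5 * (x - 14.0) if heteroskedastic else 3.0
        pairs.append((x, random.gauss(0.0, sd)))
    return pairs


def sd_by_group(pairs, cutoff=20.0):
    """Sample standard deviation of u for x below the cutoff and for x at or above it."""
    low = [u for x, u in pairs if x < cutoff]
    high = [u for x, u in pairs if x >= cutoff]
    return statistics.stdev(low), statistics.stdev(high)


homo_low, homo_high = sd_by_group(simulate(5000, heteroskedastic=False))
het_low, het_high = sd_by_group(simulate(5000, heteroskedastic=True))

print(round(homo_low, 2), round(homo_high, 2))  # similar spread in both groups
print(round(het_low, 2), round(het_high, 2))    # larger spread for large x
```

With homoskedastic errors the two group standard deviations are close; with the heteroskedastic rule the spread for large $x$ is clearly greater, which is the pattern drawn in Figure 5.2.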
KEY CONCEPT 5.4: HETEROSKEDASTICITY AND HOMOSKEDASTICITY
The error term $u_i$ is homoskedastic if the variance of the conditional distribution of $u_i$ given $X_i$, $\mathrm{var}(u_i \mid X_i = x)$, is constant for $i = 1, \ldots, n$, and in particular does not depend on $x$. Otherwise, the error term is heteroskedastic.

Example. These terms are a mouthful and the definitions might seem abstract. To help clarify them with an example, we digress from the student-teacher ratio/test score problem and instead return to the example of earnings of male versus female college graduates considered in the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States." Let $MALE_i$ be a binary variable that equals 1 for male college graduates and equals 0 for female graduates. The binary variable regression model relating someone's earnings to his or her gender is

$$Earnings_i = \beta_0 + \beta_1 MALE_i + u_i \qquad (5.19)$$

for $i = 1, \ldots, n$. Because the regressor is binary, $\beta_1$ is the difference in the population means of the two groups, in this case, the difference in mean earnings between men and women who graduated from college.

The definition of homoskedasticity states that the variance of $u_i$ does not depend on the regressor. Here the regressor is $MALE_i$, so at issue is whether the variance of the error term depends on $MALE_i$. In other words, is the variance of the error term the same for men and for women? If so, the error is homoskedastic; if not, it is heteroskedastic.

Deciding whether the variance of $u_i$ depends on $MALE_i$ requires thinking hard about what the error term actually is. In this regard, it is useful to write Equation (5.19) as two separate equations, one for men and one for women:

$$Earnings_i = \beta_0 + u_i \quad \text{(women)} \qquad (5.20)$$

$$Earnings_i = \beta_0 + \beta_1 + u_i \quad \text{(men)}. \qquad (5.21)$$

Thus, for women, $u_i$ is the deviation of the $i$th woman's earnings from the population mean earnings for women ($\beta_0$), and for men, $u_i$ is the deviation of the $i$th man's earnings from the population mean earnings for men ($\beta_0 + \beta_1$). It follows that the statement, "the variance of $u_i$ does not depend on $MALE$," is equivalent to the statement, "the variance of earnings is the same for men as it is for women." In other words, in this example, the error term is homoskedastic if the variance of the population distribution of earnings is the same for men and women; if these variances differ, the error term is heteroskedastic.

Mathematical Implications of Homoskedasticity

The OLS estimators remain unbiased and asymptotically normal. Because the least squares assumptions in Key Concept 4.3 place no restrictions on the conditional variance, they apply to both the general case of heteroskedasticity and the special case of homoskedasticity. Therefore, the OLS estimators remain unbiased and consistent when the errors are homoskedastic. In addition, the OLS estimators have sampling distributions that are normal in large samples when the errors are homoskedastic. Whether the errors are homoskedastic or heteroskedastic, the OLS estimator is unbiased, consistent, and asymptotically normal.

Efficiency of the OLS estimator when the errors are homoskedastic. If the least squares assumptions in Key Concept 4.3 hold and the errors are homoskedastic, then the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are efficient among all estimators that are linear in $Y_1, \ldots, Y_n$ and are unbiased, conditional on $X_1, \ldots, X_n$. This result, which is called the Gauss-Markov theorem, is discussed in Section 5.5.

Homoskedasticity-only variance formula. If the error term is homoskedastic, then the formulas for the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ in Key Concept 4.4 simplify. Consequently, if the errors are homoskedastic, then there is a specialized formula that can be used for the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$. The homoskedasticity-only standard error of $\hat{\beta}_1$, derived in Appendix 5.1, is $SE(\hat{\beta}_1) = \sqrt{\tilde{\sigma}^2_{\hat{\beta}_1}}$, where $\tilde{\sigma}^2_{\hat{\beta}_1}$ is the homoskedasticity-only estimator of the variance of $\hat{\beta}_1$:

$$\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s^2_{\hat{u}}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{(homoskedasticity-only)}, \qquad (5.22)$$

where $s^2_{\hat{u}}$ is given in Equation (4.19). The homoskedasticity-only formula for the standard error of $\hat{\beta}_0$ is given in Appendix 5.1. In the special case that $X$ is a binary variable, the estimator of the variance of $\hat{\beta}_1$ under homoskedasticity (that is, the square of the standard error of $\hat{\beta}_1$ under homoskedasticity) is the so-called pooled variance formula for the difference in means, given in Equation (3.23).
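To see what Equation (5.22) does in practice, the sketch below computes it alongside the heteroskedasticity-robust standard error used so far in this chapter. Several things here are assumptions: the robust formula is the usual large-sample one and is assumed to match Equation (5.4); the function name and the simulated data (errors whose standard deviation rises with $x$) are hypothetical.

```python
import math
import random


def ols_with_standard_errors(x, y):
    """Return the OLS slope with its homoskedasticity-only and robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only variance, Equation (5.22): s_u^2 / sum (X_i - X_bar)^2.
    s2_u = sum(e * e for e in resid) / (n - 2)
    var_homo = s2_u / sxx

    # Heteroskedasticity-robust variance (assumed form of Equation (5.4)):
    # (1/n) * [sum (X_i - X_bar)^2 u_i^2 / (n - 2)] / [sum (X_i - X_bar)^2 / n]^2.
    num = sum(d * d * e * e for d, e in zip(dx, resid)) / (n - 2)
    var_robust = num / (sxx / n) ** 2 / n

    return b1, math.sqrt(var_homo), math.sqrt(var_robust)


random.seed(1)
x = [random.uniform(0.0, 10.0) for _ in range(2000)]
# Hypothetical data: true slope 1.5, error sd equal to 0.3 * x^2.
y = [2.0 + 1.5 * xi + random.gauss(0.0, 0.3 * xi ** 2) for xi in x]

b1, se_homo, se_robust = ols_with_standard_errors(x, y)
print(round(b1, 3), round(se_homo, 4), round(se_robust, 4))
```

In this simulated design the homoskedasticity-only standard error comes out noticeably smaller than the robust one, so a confidence interval built from it would be too narrow.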
Because these alternative formulas are derived for the special case that the errors are homoskedastic and do not apply if the errors are heteroskedastic, they will be referred to as the "homoskedasticity-only" formulas for the variance and standard error of the OLS estimators. As the name suggests, if the errors are heteroskedastic, then the homoskedasticity-only standard errors are inappropriate. Specifically, if the errors are heteroskedastic, then the $t$-statistic computed using the homoskedasticity-only standard error does not have a standard normal distribution, even in large samples. In fact, the correct critical values to use for this homoskedasticity-only $t$-statistic depend on the precise nature of the heteroskedasticity, so those critical values cannot be tabulated. Similarly, if the errors are heteroskedastic but a confidence interval is constructed as $\pm 1.96$ homoskedasticity-only standard errors, in general the probability that this interval contains the true value of the coefficient is not 95%, even in large samples.

In contrast, because homoskedasticity is a special case of heteroskedasticity, the estimators $\hat{\sigma}^2_{\hat{\beta}_1}$ and $\hat{\sigma}^2_{\hat{\beta}_0}$ of the variances of $\hat{\beta}_1$ and $\hat{\beta}_0$ given in Equations (5.4) and (5.26) produce valid statistical inferences whether the errors are heteroskedastic or homoskedastic. Thus hypothesis tests and confidence intervals based on those standard errors are valid whether or not the errors are heteroskedastic. Because the standard errors we have used so far [i.e., those based on Equations (5.4) and (5.26)] lead to statistical inferences that are valid whether or not the errors are heteroskedastic, they are called heteroskedasticity-robust standard errors. Because such formulas were proposed by Eicker (1967), Huber (1967), and White (1980), they are also referred to as Eicker-Huber-White standard errors.

What Does This Mean in Practice?

Which is more realistic, heteroskedasticity or homoskedasticity? The answer to this question depends on the application. However, the issues can be clarified by returning to the example of the gender gap in earnings among college graduates. Familiarity with how people are paid in the world around us gives some clues as to which assumption is more sensible. For many years (and, to a lesser extent, today) women were not found in the top-paying jobs: There have always been poorly paid men, but there have rarely been highly paid women. This suggests that the distribution of earnings among women is tighter than among men (see the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States").
On average, workers with more education have higher earnings than workers with less education. But if the best-paying jobs mainly go to the college educated, it might also be that the spread of the distribution of earnings is greater for workers with more education. Does the distribution of earnings spread out as education increases?

This is an empirical question, so answering it requires analyzing data. Figure 5.3 is a scatterplot of the hourly earnings and the number of years of education for a sample of 2950 full-time workers in the United States in 2004, ages 29 and 30, with between 6 and 18 years of education. The data come from the March 2005 Current Population Survey, which is described in Appendix 3.1.

Figure 5.3 has two striking features. The first is that the mean of the distribution of earnings increases with the number of years of education. This increase is summarized by the OLS regression line,

$$\widehat{Earnings} = \underset{(0.93)}{-3.13} + \underset{(0.07)}{1.47}\,Years\ Education, \quad R^2 = 0.130, \quad SER = 8.77. \qquad (5.23)$$

This line is plotted in Figure 5.3. The coefficient of 1.47 in the OLS regression line means that, on average, hourly earnings increase by $1.47 for each additional year of education. The 95% confidence interval for this coefficient is $1.47 \pm 1.96 \times 0.07$, or 1.33 to 1.61.

The second striking feature of Figure 5.3 is that the spread of the distribution of earnings increases with the years of education. While some workers with many years of education have low-paying jobs, very few workers with low levels of education have high-paying jobs. This can be stated more precisely by looking at the spread of the residuals around the OLS regression line. For workers with ten years of education, the standard deviation of the residuals is $5.46; for workers with a high school diploma, this standard deviation is $7.43; and for workers with a college degree, this standard deviation increases to $10.78. Because these standard deviations differ for different levels of education, the variance of the residuals in the regression of Equation (5.23) depends on the value of the regressor (the years of education); in other words, the regression errors are heteroskedastic. In real-world terms, not all college graduates will be earning $50/hour by the time they are 29, but some will, and workers with only ten years of education have no shot at those jobs.

FIGURE 5.3 Scatterplot of Hourly Earnings and Years of Education for 29- to 30-Year-Olds in the United States in 2004. Hourly earnings are plotted against years of education for 2950 full-time, 29- to 30-year-old workers. The spread around the regression line increases with the years of education, indicating that the regression errors are heteroskedastic. (Horizontal axis: years of education; vertical axis: hourly earnings.)
Returning to the gender gap example, the variance of the error term in Equation (5.20) for women is plausibly less than the variance of the error term in Equation (5.21) for men. Thus, the presence of a "glass ceiling" for women's jobs and pay suggests that the error term in the binary variable regression model in Equation (5.19) is heteroskedastic. Unless there are compelling reasons to the contrary (and we can think of none) it makes sense to treat the error term in this example as heteroskedastic.

As this example of modeling earnings illustrates, heteroskedasticity arises in many econometric applications. At a general level, economic theory rarely gives any reason to believe that the errors are homoskedastic. It therefore is prudent to assume that the errors might be heteroskedastic unless you have compelling reasons to believe otherwise.

Practical implications. The main issue of practical relevance in this discussion is whether one should use heteroskedasticity-robust or homoskedasticity-only standard errors. In this regard, it is useful to imagine computing both, then choosing between them. If the homoskedasticity-only and heteroskedasticity-robust standard errors are the same, nothing is lost by using the heteroskedasticity-robust standard errors; if they differ, however, then you should use the more reliable ones that allow for heteroskedasticity. The simplest thing, then, is always to use the heteroskedasticity-robust standard errors.

For historical reasons, many software programs use the homoskedasticity-only standard errors as their default setting, so it is up to the user to specify the option of heteroskedasticity-robust standard errors. The details of how to implement heteroskedasticity-robust standard errors depend on the software package you use.

All of the empirical examples in this book employ heteroskedasticity-robust standard errors unless explicitly stated otherwise. (In case this book is used in conjunction with other texts, it might be helpful to note that some textbooks add homoskedasticity to the list of least squares assumptions. As just discussed, however, this additional assumption is not needed for the validity of OLS regression analysis as long as heteroskedasticity-robust standard errors are used.)

5.5 The Theoretical Foundations of Ordinary Least Squares

(This section is optional and is not used in later chapters.)

As discussed in Section 4.5, the OLS estimator is unbiased, is consistent, has a variance that is inversely proportional to $n$, and has a normal sampling distribution when the sample size is large.
In addition, under certain conditions the OLS estimator is more efficient than some other candidate estimators. Specifically, if the least squares assumptions hold and if the errors are homoskedastic, then the OLS estimator has the smallest variance of all conditionally unbiased estimators that are linear functions of $Y_1, \ldots, Y_n$. This section explains and discusses this result, which is a consequence of the Gauss-Markov theorem. The section concludes with a discussion of alternative estimators that are more efficient than OLS when the conditions of the Gauss-Markov theorem do not hold.

Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem

If the three least squares assumptions (Key Concept 4.3) hold and if the error is homoskedastic, then the OLS estimator has the smallest variance, conditional on $X_1, \ldots, X_n$, among all estimators in the class of linear conditionally unbiased estimators. In other words, the OLS estimator is the Best Linear conditionally Unbiased Estimator; that is, it is BLUE. This result extends to regression the result, summarized in Key Concept 3.3, that the sample average $\bar{Y}$ is the most efficient estimator of the population mean among the class of all estimators that are unbiased and are linear functions (weighted averages) of $Y_1, \ldots, Y_n$.

Linear conditionally unbiased estimators. The class of linear conditionally unbiased estimators consists of all estimators of $\beta_1$ that are linear functions of $Y_1, \ldots, Y_n$ and that are unbiased, conditional on $X_1, \ldots, X_n$. That is, if $\tilde{\beta}_1$ is a linear estimator, then it can be written as

$$\tilde{\beta}_1 = \sum_{i=1}^{n} a_i Y_i \quad (\tilde{\beta}_1 \text{ is linear}), \qquad (5.24)$$

where the weights $a_1, \ldots, a_n$ can depend on $X_1, \ldots, X_n$ but not on $Y_1, \ldots, Y_n$. The estimator $\tilde{\beta}_1$ is conditionally unbiased if the mean of its conditional sampling distribution, given $X_1, \ldots, X_n$, is $\beta_1$. That is, the estimator $\tilde{\beta}_1$ is conditionally unbiased if

$$E(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \beta_1 \quad (\tilde{\beta}_1 \text{ is conditionally unbiased}). \qquad (5.25)$$

The estimator $\tilde{\beta}_1$ is a linear conditionally unbiased estimator if it can be written in the form of Equation (5.24) (it is linear) and if Equation (5.25) holds (it is conditionally unbiased).
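The two requirements in Equations (5.24) and (5.25) can be made concrete with explicit weights. Taking expectations of $\sum a_i Y_i$ under the regression model gives $\beta_0 \sum a_i + \beta_1 \sum a_i X_i$, so a linear estimator is conditionally unbiased for every $\beta_0$ and $\beta_1$ exactly when $\sum a_i = 0$ and $\sum a_i X_i = 1$; with homoskedastic, uncorrelated errors its conditional variance is proportional to $\sum a_i^2$. The sketch below checks these quantities for the OLS weights and for a hypothetical rival, the "endpoint" estimator $(Y_n - Y_1)/(X_n - X_1)$, which is introduced here purely for illustration; the $x$ values are made up.

```python
def ols_weights(x):
    """OLS weights: a_i = (X_i - X_bar) / sum_j (X_j - X_bar)^2."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]


def endpoint_weights(x):
    """Weights of the hypothetical rival estimator (Y_n - Y_1) / (X_n - X_1)."""
    span = x[-1] - x[0]
    return [-1.0 / span] + [0.0] * (len(x) - 2) + [1.0 / span]


def summarize(a, x):
    """Sums that determine conditional unbiasedness and (relative) variance."""
    return {
        "sum_a": sum(a),                                  # must equal 0
        "sum_ax": sum(ai * xi for ai, xi in zip(a, x)),   # must equal 1
        "sum_a2": sum(ai * ai for ai in a),               # variance / sigma_u^2
    }


x = [14.0, 17.0, 19.0, 20.0, 22.0, 25.0]   # made-up regressor values
ols_summary = summarize(ols_weights(x), x)
rival_summary = summarize(endpoint_weights(x), x)

print(ols_summary)
print(rival_summary)
```

Both sets of weights satisfy the two unbiasedness conditions, but the OLS weights have the smaller sum of squares, which is the pattern the Gauss-Markov theorem guarantees under homoskedasticity.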
It is shown in Appendix 5.2 that the OLS estimator is linear and conditionally unbiased.

The Gauss-Markov theorem. The Gauss-Markov theorem states that, under a set of conditions known as the Gauss-Markov conditions, the OLS estimator $\hat{\beta}_1$ has the smallest conditional variance, given $X_1, \ldots, X_n$, of all linear conditionally unbiased estimators of $\beta_1$; that is, the OLS estimator is BLUE. The Gauss-Markov conditions, which are stated in Appendix 5.2, are implied by the three least squares assumptions plus the assumption that the errors are homoskedastic. Consequently, if the three least squares assumptions hold and the errors are homoskedastic, then OLS is BLUE. The Gauss-Markov theorem is stated in Key Concept 5.5 and proven in Appendix 5.2.

KEY CONCEPT 5.5: THE GAUSS-MARKOV THEOREM FOR $\hat{\beta}_1$
If the three least squares assumptions in Key Concept 4.3 hold and if errors are homoskedastic, then the OLS estimator $\hat{\beta}_1$ is the Best (most efficient) Linear conditionally Unbiased Estimator (is BLUE).

Limitations of the Gauss-Markov theorem. The Gauss-Markov theorem provides a theoretical justification for using OLS. However, the theorem has two important limitations. First, its conditions might not hold in practice. In particular, if the error term is heteroskedastic, as it often is in economic applications, then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below.

The second limitation of the Gauss-Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.

Regression Estimators Other Than OLS

Under certain conditions, some regression estimators are more efficient than OLS.
The weighted least squares estimator. If the errors are heteroskedastic, then OLS is no longer BLUE. If the nature of the heteroskedasticity is known (specifically, if the conditional variance of $u_i$ given $X_i$ is known up to a constant factor of proportionality) then it is possible to construct an estimator that has a smaller variance than the OLS estimator. This method, called weighted least squares (WLS), weights the $i$th observation by the inverse of the square root of the conditional variance of $u_i$ given $X_i$. Because of this weighting, the errors in this weighted regression are homoskedastic, so OLS, when applied to the weighted data, is BLUE. Although theoretically elegant, the practical problem with weighted least squares is that you must know how the conditional variance of $u_i$ depends on $X_i$, something that is rarely known in applications.

The least absolute deviations estimator. As discussed in Chapter 4, the OLS estimator can be sensitive to outliers. If extreme outliers are not rare, then other estimators can be more efficient than OLS and can produce inferences that are more reliable. One such estimator is the least absolute deviations (LAD) estimator, in which the regression coefficients $\beta_0$ and $\beta_1$ are obtained by solving a minimization like that in Equation (4.6), except that the absolute value of the prediction "mistake" is used instead of its square. That is, the least absolute deviations estimators of $\beta_0$ and $\beta_1$ are the values of $b_0$ and $b_1$ that minimize $\sum_{i=1}^{n} |Y_i - b_0 - b_1 X_i|$. In practice, this estimator is less sensitive to large outliers in $u$ than is OLS.

In many economic data sets, severe outliers in $u$ are rare, so use of the LAD estimator, or other estimators with reduced sensitivity to outliers, is uncommon in applications. Thus the treatment of linear regression throughout the remainder of this text focuses exclusively on least squares methods.
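The weighting described for WLS can be sketched in a few lines. Scaling each observation by the inverse of the square root of its conditional variance and then applying OLS is equivalent to minimizing $\sum_i (Y_i - b_0 - b_1 X_i)^2 / \mathrm{var}(u_i \mid X_i)$, which has the closed form used below. Everything specific in the sketch is an assumption for illustration: the function name `wls`, the made-up data, and the premise that the conditional variance is proportional to $X_i^2$.

```python
def wls(x, y, cond_var):
    """Weighted least squares when var(u_i | X_i) is proportional to cond_var[i].

    Minimizes sum_i (Y_i - b0 - b1 * X_i)^2 / cond_var[i], which is what OLS
    does after each observation is scaled by 1 / sqrt(cond_var[i]).
    """
    w = [1.0 / v for v in cond_var]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.9, 8.3, 10.6, 14.9]            # made-up observations
cond_var = [xi ** 2 for xi in x]           # hypothetical: variance proportional to x^2

b0_wls, b1_wls = wls(x, y, cond_var)
b0_ols, b1_ols = wls(x, y, [1.0] * len(x))  # equal weights reproduce OLS

print(round(b0_wls, 4), round(b1_wls, 4))
print(round(b0_ols, 4), round(b1_ols, 4))
```

The second call shows that OLS is the special case of equal weights; the practical difficulty noted above is that the entries of `cond_var` are rarely known.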
is uncommon in applications.6 Using the tStatistic in Regression When the Sample Size Is Small When the sample size is small. • . the e rrors in this weighted regression arc homoskelias tjc. That is.then il is possible 10 construct an estimator that has a small er variance than Ute OLS estimator. then rep! 5_5 and The least absolute deviations estimator. the practical problem with weigh ted least squares is that you must know bow the cond itional variance of II. then ot her estimators can be more eHicient than OLS and can produce inferences that are more reliabl e.bo Pu aod PI are the values of ho and bl tha t minimize blXil. This method. . One such estimator is the least absolute deviations (LAD) esti mator. or other estimators with reduced sensitivity 10 outli ers. lerrors are . If the errors arc heteroskedastic.
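The contrast among OLS, WLS, and LAD can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function names (`ols`, `wls`, `lad`) are hypothetical, the variance function passed to `wls` is assumed known, and `lad` uses a brute-force search that relies on the fact that some LAD line passes through two data points (practical only for small samples).

```python
import random

def ols(x, y):
    """OLS intercept and slope with a single regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def wls(x, y, cond_var):
    """WLS: minimize sum_i (Y_i - b0 - b1*X_i)^2 / var(u_i | X_i).

    Equivalent to OLS after dividing each observation by the square
    root of its (assumed known) conditional variance.
    """
    w = [1.0 / v for v in cond_var]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of X
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of Y
    num = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = num / den
    return yw - b1 * xw, b1

def lad(x, y):
    """LAD by brute force over all lines through two data points."""
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue
            b1 = (y[j] - y[i]) / (x[j] - x[i])
            b0 = y[i] - b1 * x[i]
            loss = sum(abs(yk - b0 - b1 * xk) for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, b0, b1)
    return best[1], best[2]

# Hypothetical data: an exact line Y = 1 + 2X with one large outlier.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] += 100.0
ols_b0, ols_b1 = ols(x, y)
lad_b0, lad_b1 = lad(x, y)
print("OLS slope with outlier:", ols_b1)
print("LAD slope with outlier:", lad_b1)

# Hypothetical heteroskedastic data with var(u | X) = (1 + X)^2, assumed known.
random.seed(0)
y_het = [1.0 + 2.0 * xi + (1.0 + xi) * random.gauss(0.0, 1.0) for xi in x]
known_var = [(1.0 + xi) ** 2 for xi in x]
print("OLS slope, heteroskedastic data:", ols(x, y_het)[1])
print("WLS slope, heteroskedastic data:", wls(x, y_het, known_var)[1])
```

In this constructed example the single outlier pulls the OLS slope well away from 2, while the LAD line stays on the nineteen clean points. The WLS call is only feasible because the variance function was specified by hand, which is exactly the information that is rarely available in applications.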
5.6 Using the t-Statistic in Regression When the Sample Size Is Small

When the sample size is small, the exact distribution of the t-statistic is complicated and depends on the unknown population distribution of the data. If, however, the three least squares assumptions hold, the regression errors are homoskedastic, and the regression errors are normally distributed, then the OLS estimator is normally distributed and the homoskedasticity-only t-statistic has a Student t distribution. These five assumptions (the three least squares assumptions, that the errors are homoskedastic, and that the errors are normally distributed) are collectively called the homoskedastic normal regression assumptions.

The t-Statistic and the Student t Distribution

Recall from Section 2.4 that the Student t distribution with $m$ degrees of freedom is defined to be the distribution of $Z/\sqrt{W/m}$, where $Z$ is a random variable with a standard normal distribution, $W$ is a random variable with a chi-squared distribution with $m$ degrees of freedom, and $Z$ and $W$ are independent. Under the null hypothesis, the t-statistic computed using the homoskedasticity-only standard error can be written in this form.

The homoskedasticity-only t-statistic testing $\beta_1 = \beta_{1,0}$ is $t = (\hat{\beta}_1 - \beta_{1,0})/\tilde{\sigma}_{\hat{\beta}_1}$, where $\tilde{\sigma}^2_{\hat{\beta}_1}$ is defined in Equation (5.22). Under the homoskedastic normal regression assumptions, $Y$ has a normal distribution, conditional on $X_1, \ldots, X_n$. As discussed in Section 5.5, the OLS estimator is a weighted average of $Y_1, \ldots, Y_n$, where the weights depend on $X_1, \ldots, X_n$ [see Equation (5.32) in Appendix 5.2]. Because a weighted average of independent normal random variables is normally distributed, $\hat{\beta}_1$ has a normal distribution, conditional on $X_1, \ldots, X_n$. Thus $(\hat{\beta}_1 - \beta_{1,0})$ has a normal distribution under the null hypothesis, conditional on $X_1, \ldots, X_n$. In addition, the (normalized) homoskedasticity-only variance estimator has a chi-squared distribution with $n - 2$ degrees of freedom, divided by $n - 2$, and $\tilde{\sigma}^2_{\hat{\beta}_1}$ and $\hat{\beta}_1$ are independently distributed. Consequently, the homoskedasticity-only t-statistic has a Student t distribution with $n - 2$ degrees of freedom.

This result is closely related to a result discussed in Section 3.5 in the context of testing for the equality of the means in two samples. In that problem, if the two population distributions are normal with the same variance and if the t-statistic is constructed using the pooled standard error formula [Equation (3.23)], then the (pooled) t-statistic has a Student t distribution. When $X$ is binary, the homoskedasticity-only standard error for $\hat{\beta}_1$ simplifies to the pooled standard error formula for the difference of means. It follows that the result of Section 3.5 is a special case of the result that, if the homoskedastic normal regression assumptions hold, then the homoskedasticity-only regression t-statistic has a Student t distribution (see Exercise 5.10).

Use of the Student t Distribution in Practice

If the regression errors are homoskedastic and normally distributed and if the homoskedasticity-only t-statistic is used, then critical values should be taken from the Student t distribution (Appendix Table 2) instead of the standard normal distribution. Because the difference between the Student t distribution and the normal distribution is negligible if $n$ is moderate or large, this distinction is relevant only if the sample size is small.

In econometric applications, there is rarely a reason to believe that the errors are homoskedastic and normally distributed. Because sample sizes typically are large, however, inference can proceed as described in Sections 5.1 and 5.2, that is, by first computing heteroskedasticity-robust standard errors, and then using the standard normal distribution to compute p-values, hypothesis tests, and confidence intervals.

5.7 Conclusion

Return for a moment to the problem that started Chapter 4: the superintendent who is considering hiring additional teachers to cut the student-teacher ratio. What have we learned that she might find useful?

Our regression analysis, based on the 420 observations for 1998 in the California test score data set, showed that there was a negative relationship between the student-teacher ratio and test scores: Districts with small classes have higher test scores. The coefficient is moderately large, in a practical sense: Districts with 2 fewer students per teacher have, on average, test scores that are 4.6 points higher. This corresponds to moving a district at the 50th percentile of the distribution of test scores to approximately the 60th percentile.

The coefficient on the student-teacher ratio is statistically significantly different from 0 at the 5% significance level. The population coefficient might be 0, and we might simply have estimated our negative coefficient by random sampling variation. However, the probability of doing so (and of obtaining a t-statistic on $\beta_1$ as large as we did) purely by random variation over potential samples is exceedingly small, approximately 0.001%. A 95% confidence interval for $\beta_1$ is $-3.30 \le \beta_1 \le -1.26$.
This represents considerable progress toward answering the superintendent's question. Yet, a nagging concern remains. There is a negative relationship between the student-teacher ratio and test scores, but is this relationship necessarily the causal one that the superintendent needs to make her decision? Districts with lower student-teacher ratios have, on average, higher test scores. But does this mean that reducing the student-teacher ratio will, in fact, increase scores?

There is, in fact, reason to worry that it might not. Hiring more teachers, after all, costs money, so wealthier school districts can better afford smaller classes. But students at wealthier schools also have other advantages over their poorer neighbors, including better facilities, newer books, and better-paid teachers. Moreover, students at wealthier schools tend themselves to come from more affluent families, and thus have other advantages not directly associated with their school. For example, California has a large immigrant community; these immigrants tend to be poorer than the overall population and, in many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test scores and the student-teacher ratio is a consequence of large classes being found in conjunction with many other factors that are, in fact, the real cause of the lower test scores.

These other factors, or "omitted variables," could mean that the OLS analysis done so far has little value to the superintendent. Indeed, it could be misleading: Changing the student-teacher ratio alone would not change these other factors that determine a child's performance at school. To address this problem, we need a method that will allow us to isolate the effect on test scores of changing the student-teacher ratio, holding these other factors constant. That method is multiple regression analysis, the topic of Chapters 6 and 7.

Summary

1. Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: Use the t-statistic to calculate the p-values and either accept or reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator ± 1.96 standard errors.
2. When $X$ is binary, the regression model can be used to estimate and test hypotheses about the difference between the population means of the "$X = 0$" group and the "$X = 1$" group.
3. In general the error $u_i$ is heteroskedastic, that is, the variance of $u_i$ at a given value of $X_i$, $\mathrm{var}(u_i \mid X_i = x)$, depends on $x$. A special case is when the error is homoskedastic, that is, $\mathrm{var}(u_i \mid X_i = x)$ is constant. Homoskedasticity-only standard errors do not produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.
4. If the three least squares assumptions hold and if the regression errors are homoskedastic, then, as a result of the Gauss-Markov theorem, the OLS estimator is BLUE.
5. If the three least squares assumptions hold, if the regression errors are homoskedastic, and if the regression errors are normally distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null hypothesis is true. The difference between the Student t distribution and the normal distribution is negligible if the sample size is moderate or large.

Key Terms

null hypothesis
two-sided alternative hypothesis
standard error of $\hat{\beta}_1$
t-statistic
p-value
confidence interval for $\beta_1$
confidence level
indicator variable
dummy variable
coefficient multiplying variable $D_i$
coefficient on $D_i$
heteroskedasticity and homoskedasticity
homoskedasticity-only standard errors
heteroskedasticity-robust standard error
best linear unbiased estimator (BLUE)
Gauss-Markov theorem
weighted least squares
homoskedastic normal regression assumptions
Gauss-Markov conditions
Review the Concepts

5.1 Outline the procedures for computing the p-value of a two-sided test of $H_0$: $\mu_Y = 0$ using an i.i.d. set of observations $Y_i$, $i = 1, \ldots, n$. Outline the procedures for computing the p-value of a two-sided test of $H_0$: $\beta_1 = 0$ in a regression model using an i.i.d. set of observations $(Y_i, X_i)$, $i = 1, \ldots, n$.

5.2 Explain how you could use a regression model to estimate the wage gender gap using the data on earnings of men and women. What are the dependent and independent variables?

5.3 Define homoskedasticity and heteroskedasticity. Provide a hypothetical empirical example in which you think the errors would be heteroskedastic, and explain your reasoning.

Exercises

5.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression,

$\widehat{TestScore} = \underset{(20.4)}{520.4} - \underset{(2.21)}{5.82} \times CS$, $R^2 = 0.08$, $SER = 11.5$.

a. Construct a 95% confidence interval for $\beta_1$, the regression slope coefficient.
b. Calculate the p-value for the two-sided test of the null hypothesis $H_0$: $\beta_1 = 0$. Do you reject the null hypothesis at the 5% level? At the 1% level?
c. Calculate the p-value for the two-sided test of the null hypothesis $H_0$: $\beta_1 = -5.6$. Without doing any additional calculations, determine whether $-5.6$ is contained in the 95% confidence interval for $\beta_1$.
d. Construct a 99% confidence interval for $\beta_0$.

5.2 Suppose that a researcher, using wage data on 250 randomly selected male workers and 280 female workers, estimates the OLS regression,

$\widehat{Wage} = \underset{(0.23)}{12.52} + \underset{(0.36)}{2.12} \times Male$, $R^2 = 0.06$, $SER = 4.2$,

where Wage is measured in $/hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings between men and women.
a. What is the estimated gender gap?
b. Is the estimated gender gap significantly different from zero? (Compute the p-value for testing the null hypothesis that there is no gender gap.)
c. Construct a 95% confidence interval for the gender gap.
d. In the sample, what is the mean wage of women? Of men?
e. Another researcher uses these same data, but regresses Wages on Female, a variable that is equal to 1 if the person is female and 0 if the person is a male. What are the regression estimates calculated from this regression?

$\widehat{Wage}$ = ____ + ____ × Female, $R^2$ = ____, $SER$ = ____.

5.3 Suppose that a random sample of 200 twenty-year-old men is selected from a population and their heights and weights are recorded. A regression of weight on height yields

$\widehat{Weight} = \underset{(2.15)}{-99.41} + \underset{(0.31)}{3.94} \times Height$, $R^2 = 0.81$, $SER = 10.2$,

where Weight is measured in pounds and Height is measured in inches. A man has a late growth spurt and grows 1.5 inches over the course of a year. Construct a 99% confidence interval for the person's weight gain.

5.4 Read the box "The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity?" in Section 5.4. Use the regression reported in Equation (5.23) to answer the following.
a. A randomly selected 30-year-old worker reports an education level of 16 years. What is the worker's expected average hourly earnings?
b. A high school graduate (12 years of education) is contemplating going to a community college for a two-year degree. How much is this worker's average hourly earnings expected to increase?
c. A high school counselor tells a student that, on average, college graduates earn $10 per hour more than high school graduates. Is this statement consistent with the regression evidence? What range of values is consistent with the regression evidence?

5.5 In the 1980s, Tennessee conducted an experiment in which kindergarten students were randomly assigned to "regular" and "small" classes, and given standardized tests at the end of the year. (Regular classes contained approximately 24 students and small classes contained approximately 15 students.) Suppose that, in the population, the standardized tests have a mean score of 925 points and a standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields

$\widehat{TestScore} = \underset{(1.6)}{918.0} + \underset{(2.5)}{13.9} \times SmallClass$, $R^2 = 0.01$, $SER = 74.6$.

a. Do small classes improve test scores? By how much? Is the effect large? Explain.
b. Is the estimated effect of class size on test scores statistically significant? Carry out a test at the 5% level.
c. Construct a 99% confidence interval for the effect of SmallClass on test score.

5.6 Refer to the regression described in Exercise 5.5.
a. Do you think that the regression errors plausibly are homoskedastic? Explain.
b. $SE(\hat{\beta}_1)$ was computed using Equation (5.3). Suppose that the regression errors were homoskedastic: Would this affect the validity of the confidence interval constructed in Exercise 5.5(c)? Explain.

5.7 Suppose that $(Y_i, X_i)$ satisfy the assumptions in Key Concept 4.3. A random sample of size $n = 250$ is drawn and yields

$\hat{Y} = \underset{(3.1)}{5.4} + \underset{(1.5)}{3.2} X$, $R^2 = 0.26$, $SER = 6.2$.

a. Test $H_0$: $\beta_1 = 0$ vs. $H_1$: $\beta_1 \ne 0$ at the 5% level.
b. Construct a 95% confidence interval for $\beta_1$.
c. Suppose you learned that $Y_i$ and $X_i$ were independent. Would you be surprised? Explain.
d. Suppose that $Y_i$ and $X_i$ are independent and many samples of size $n = 250$ are drawn, regressions estimated, and (a) and (b) answered. In what fraction of the samples would $H_0$ from (a) be rejected? In what fraction of samples would the value $\beta_1 = 0$ be included in the confidence interval from (b)?

5.8 Suppose that $(Y_i, X_i)$ satisfy the assumptions in Key Concept 4.3 and, in addition, $u_i$ is $N(0, \sigma_u^2)$ and is independent of $X_i$. A sample of size $n = 30$ yields

$\hat{Y} = \underset{(10.2)}{43.2} + \underset{(7.4)}{61.5} X$, $R^2 = 0.54$, $SER = 1.52$,

where the numbers in parentheses are the homoskedasticity-only standard errors for the regression coefficients.
a. Construct a 95% confidence interval for $\beta_0$.
b. Test $H_0$: $\beta_1 = 55$ vs. $H_1$: $\beta_1 \ne 55$ at the 5% level.
c. Test $H_0$: $\beta_1 = 55$ vs. $H_1$: $\beta_1 > 55$ at the 5% level.

5.9 Consider the regression model $Y_i = \beta X_i + u_i$, where $u_i$ and $X_i$ satisfy the assumptions in Key Concept 4.3. Let $\bar{\beta}$ denote an estimator of $\beta$ that is constructed as $\bar{\beta} = \bar{Y}/\bar{X}$, where $\bar{Y}$ and $\bar{X}$ are the sample means of $Y_i$ and $X_i$, respectively.
a. Show that $\bar{\beta}$ is a linear function of $Y_1, Y_2, \ldots, Y_n$.
b. Show that $\bar{\beta}$ is conditionally unbiased.

5.10 Let $X_i$ denote a binary variable and consider the regression $Y_i = \beta_0 + \beta_1 X_i + u_i$. Let $\bar{Y}_0$ denote the sample mean for observations with $X = 0$ and $\bar{Y}_1$ denote the sample mean for observations with $X = 1$. Show that $\hat{\beta}_0 = \bar{Y}_0$, $\hat{\beta}_0 + \hat{\beta}_1 = \bar{Y}_1$, and $\hat{\beta}_1 = \bar{Y}_1 - \bar{Y}_0$.

5.11 A random sample of workers contains $n_m = 120$ men and $n_w = 131$ women. The sample average of men's weekly earnings ($\bar{Y}_m$) is $523.10, and the sample standard deviation ($s_m$) is $68.10. The corresponding values for women are $\bar{Y}_w$ = $485.10 and $s_w$ = $51.10. Let Women denote an indicator variable that is equal to 1 for women and 0 for men, and suppose that the regression $Y_i = \beta_0 + \beta_1 Women_i + u_i$ is run using all 251 observations. Find the OLS estimates of $\beta_0$ and $\beta_1$ and their corresponding standard errors.

5.12 Starting from Equation (4.22), derive the variance of $\hat{\beta}_0$ under homoskedasticity given in Equation (5.28) in Appendix 5.1.

5.13 Suppose that $(Y_i, X_i)$ satisfy the assumptions in Key Concept 4.3 and, in addition, $u_i$ is $N(0, \sigma_u^2)$ and is independent of $X_i$.
a. Is $\hat{\beta}_1$ conditionally unbiased?
b. Is $\hat{\beta}_1$ the best linear conditionally unbiased estimator of $\beta_1$?
c. How would your answers to (a) and (b) change if you assumed only that $(Y_i, X_i)$ satisfied the assumptions in Key Concept 4.3 and $\mathrm{var}(u_i \mid X_i = x)$ is constant?
d. How would your answers to (a) and (b) change if you assumed only that $(Y_i, X_i)$ satisfied the assumptions in Key Concept 4.3?

5.14 Suppose that $Y_i = \beta X_i + u_i$, where $(u_i, X_i)$ satisfy the Gauss-Markov conditions given in Equation (5.31).
a. Derive the least squares estimator of $\beta$ and show that it is a linear function of $Y_1, \ldots, Y_n$.
b. Show that the estimator is conditionally unbiased.
c. Derive the conditional variance of the estimator.
d. Prove that the estimator is BLUE.

5.15 A researcher has two independent samples of observations on $(Y_i, X_i)$. To be specific, suppose that $Y_i$ denotes earnings, $X_i$ denotes years of schooling, and the independent samples are for men and women. Write the regression for men as $Y_{m,i} = \beta_{m,0} + \beta_{m,1} X_{m,i} + u_{m,i}$ and the regression for women as $Y_{w,i} = \beta_{w,0} + \beta_{w,1} X_{w,i} + u_{w,i}$. Let $\hat{\beta}_{m,1}$ denote the OLS estimator constructed using the sample of men, $\hat{\beta}_{w,1}$ denote the OLS estimator constructed from the sample of women, and $SE(\hat{\beta}_{m,1})$ and $SE(\hat{\beta}_{w,1})$ denote the corresponding standard errors. Show that the standard error of $\hat{\beta}_{m,1} - \hat{\beta}_{w,1}$ is given by $SE(\hat{\beta}_{m,1} - \hat{\beta}_{w,1}) = \sqrt{[SE(\hat{\beta}_{m,1})]^2 + [SE(\hat{\beta}_{w,1})]^2}$.

Empirical Exercises

E5.1 Using the data set CPS04 described in Empirical Exercise 4.1, run a regression of average hourly earnings (AHE) on Age and carry out the following exercises.
a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0$: $\beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Repeat (a) using only the data for high school graduates.
d. Repeat (a) using only the data for college graduates.
e. Is the effect of age on earnings different for high school graduates than for college graduates? Explain. (Hint: See Exercise 5.15.)

E5.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, run a regression of Course_Eval on Beauty. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0$: $\beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

E5.3 Using the data set CollegeDistance described in Empirical Exercise 4.3, run a regression of years of completed education (ED) on distance to the nearest college (Dist) and carry out the following exercises.
a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0$: $\beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?
b. Construct a 95% confidence interval for the slope coefficient.
c. Run the regression using data only on females and repeat (b).
d. Run the regression using data only on males and repeat (b).
e. Is the effect of distance on completed years of education different for men than for women? (Hint: See Exercise 5.15.)

APPENDIX 5.1 Formulas for OLS Standard Errors

This appendix discusses the formulas for OLS standard errors. These are first presented under the least squares assumptions in Key Concept 4.3, which allow for heteroskedasticity; these are the "heteroskedasticity-robust" standard errors. Formulas for the variance of the OLS estimators and the associated standard errors are then given for the special case of homoskedasticity.

Heteroskedasticity-Robust Standard Errors

The estimator $\hat{\sigma}^2_{\hat{\beta}_1}$ defined in Equation (5.4) is obtained by replacing the population variances in Equation (4.21) by the corresponding sample variances, with a modification. The variance in the numerator of Equation (4.21) is estimated by $\frac{1}{n-2}\sum_{i=1}^{n}(X_i - \bar{X})^2 \hat{u}_i^2$, where the divisor $n - 2$ (instead of $n$) incorporates a degrees-of-freedom adjustment to correct for downward bias, analogously to the degrees-of-freedom adjustment used in the definition of the SER in Section 4.3. The variance in the denominator is estimated by $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Replacing $\mathrm{var}[(X_i - \mu_X)u_i]$ and $\mathrm{var}(X_i)$ in Equation (4.21) by these two estimators yields $\hat{\sigma}^2_{\hat{\beta}_1}$ in Equation (5.4). The consistency of heteroskedasticity-robust standard errors is discussed in Section 17.3.

The estimator of the variance of $\hat{\beta}_0$ is

$$\hat{\sigma}^2_{\hat{\beta}_0} = \frac{1}{n} \times \frac{\frac{1}{n-2}\sum_{i=1}^{n} \hat{H}_i^2 \hat{u}_i^2}{\left(\frac{1}{n}\sum_{i=1}^{n} \hat{H}_i^2\right)^2}, \qquad (5.26)$$

where $\hat{H}_i = 1 - \left[\bar{X} \big/ \left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right)\right] X_i$. The standard error of $\hat{\beta}_0$ is $SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_0}}$. The reasoning behind the estimator $\hat{\sigma}^2_{\hat{\beta}_0}$ is the same as behind $\hat{\sigma}^2_{\hat{\beta}_1}$ and stems from replacing population expectations with sample averages.

Homoskedasticity-Only Variances

Under homoskedasticity, the conditional variance of $u_i$ given $X_i$ is a constant: $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$. If the errors are homoskedastic, the formulas in Key Concept 4.4 simplify to

$$\sigma^2_{\hat{\beta}_1} = \frac{\sigma_u^2}{n \sigma_X^2} \qquad (5.27)$$

and

$$\sigma^2_{\hat{\beta}_0} = \frac{E(X_i^2)}{n \sigma_X^2}\,\sigma_u^2. \qquad (5.28)$$

To derive Equation (5.27), write the numerator in Equation (4.21) as

$$\mathrm{var}[(X_i - \mu_X)u_i] = E\left(\{(X_i - \mu_X)u_i - E[(X_i - \mu_X)u_i]\}^2\right) = E\{[(X_i - \mu_X)u_i]^2\} = E[(X_i - \mu_X)^2 u_i^2] = E[(X_i - \mu_X)^2\, \mathrm{var}(u_i \mid X_i)],$$

where the second equality follows because $E[(X_i - \mu_X)u_i] = 0$ (by the first least squares assumption) and where the final equality follows from the law of iterated expectations (Section 2.3). If $u_i$ is homoskedastic, then $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$, so $E[(X_i - \mu_X)^2\, \mathrm{var}(u_i \mid X_i)] = \sigma_u^2 E[(X_i - \mu_X)^2] = \sigma_u^2 \sigma_X^2$. The result in Equation (5.27) follows by substituting this expression into the numerator of Equation (4.21) and simplifying. A similar calculation yields Equation (5.28).

Homoskedasticity-Only Standard Errors

The homoskedasticity-only standard errors are obtained by substituting sample means and variances for the population means and variances in Equations (5.27) and (5.28), and by estimating the variance of $u_i$ by the square of the SER. The homoskedasticity-only estimators of these variances are

$$\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s_{\hat{u}}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{(homoskedasticity-only)} \qquad (5.29)$$

and

$$\tilde{\sigma}^2_{\hat{\beta}_0} = \frac{\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right) s_{\hat{u}}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{(homoskedasticity-only)}, \qquad (5.30)$$

where $s_{\hat{u}}^2$ is given in Equation (4.19). The homoskedasticity-only standard errors are the square roots of $\tilde{\sigma}^2_{\hat{\beta}_0}$ and $\tilde{\sigma}^2_{\hat{\beta}_1}$.
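The two slope standard errors described in this appendix can be computed side by side. The sketch below is illustrative only: the function name `slope_standard_errors` and both data sets are made up, and the robust formula is coded from the verbal description above (numerator with divisor $n - 2$, squared denominator with divisor $n$, and an overall factor $1/n$).

```python
from math import sqrt

def slope_standard_errors(x, y):
    """Return (b1, robust SE, homoskedasticity-only SE) for the OLS slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Heteroskedasticity-robust variance of the slope estimator.
    numerator = sum(d * d * u * u for d, u in zip(dx, resid)) / (n - 2)
    denominator = (sxx / n) ** 2
    var_robust = numerator / denominator / n

    # Homoskedasticity-only variance: s_u^2 / sum (X_i - Xbar)^2.
    s2_u = sum(u * u for u in resid) / (n - 2)
    var_homo = s2_u / sxx

    return b1, sqrt(var_robust), sqrt(var_homo)

# Hypothetical data 1: residuals all have the same magnitude (+1 or -1),
# so the squared residuals do not vary with X.
x_a = [0.0, 1.0, 2.0, 3.0]
y_a = [2.0, 2.0, 4.0, 8.0]          # 1 + 2x plus errors (1, -1, -1, 1)
b1_a, robust_a, homo_a = slope_standard_errors(x_a, y_a)

# Hypothetical data 2: the size of the error grows with X.
x_b = [float(i) for i in range(10)]
y_b = [1.0 + 2.0 * xi + xi * (-1.0) ** i for i, xi in enumerate(x_b)]
b1_b, robust_b, homo_b = slope_standard_errors(x_b, y_b)

print("equal-magnitude residuals:", robust_a, homo_a)
print("errors growing with X:    ", robust_b, homo_b)
```

In the first data set the two formulas agree, because the squared residuals are unrelated to $X$. In the second, where the error magnitude rises with $X$, they differ, which is the situation in which only the robust formula supports valid inference.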
APPENDIX 5.2 The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem

As discussed in Section 5.5, the Gauss-Markov theorem states that if the Gauss-Markov conditions hold, then the OLS estimator is the best (most efficient) conditionally linear unbiased estimator (is BLUE). This appendix begins by stating the Gauss-Markov conditions and showing that they are implied by the three least squares assumptions plus homoskedasticity. We next show that the OLS estimator is a linear conditionally unbiased estimator. Finally, we turn to the proof of the theorem.

The Gauss-Markov Conditions

The three Gauss-Markov conditions are

(i) $E(u_i \mid X_1, \ldots, X_n) = 0$
(ii) $\mathrm{var}(u_i \mid X_1, \ldots, X_n) = \sigma_u^2$, $0 < \sigma_u^2 < \infty$ (5.31)
(iii) $E(u_i u_j \mid X_1, \ldots, X_n) = 0$, $i \ne j$,

where the conditions hold for $i, j = 1, \ldots, n$. The three conditions, respectively, state that $u_i$ has mean zero, that $u_i$ has a constant variance, and that the errors are uncorrelated for different observations, where all these statements hold conditionally on all observed $X$'s $(X_1, \ldots, X_n)$.

The Gauss-Markov conditions are implied by the three least squares assumptions (Key Concept 4.3), plus the additional assumption that the errors are homoskedastic. Because the observations are i.i.d. (Assumption 2), $E(u_i \mid X_1, \ldots, X_n) = E(u_i \mid X_i)$, and by Assumption 1, $E(u_i \mid X_i) = 0$; thus condition (i) holds. Similarly, by Assumption 2, $\mathrm{var}(u_i \mid X_1, \ldots, X_n) = \mathrm{var}(u_i \mid X_i)$, and because the errors are assumed to be homoskedastic, $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$, which is constant. Assumption 3 (nonzero finite fourth moments) ensures that $0 < \sigma_u^2 < \infty$, so condition (ii) holds. To show that condition (iii) is implied by the least squares assumptions, note that $E(u_i u_j \mid X_1, \ldots, X_n) = E(u_i u_j \mid X_i, X_j)$ because $(X_i, Y_i)$ are i.i.d. by Assumption 2. Assumption 2 also implies that $E(u_i u_j \mid X_i, X_j) = E(u_i \mid X_i)\,E(u_j \mid X_j)$ for $i \ne j$; because $E(u_i \mid X_i) = 0$ for all $i$, it follows that $E(u_i u_j \mid X_1, \ldots, X_n) = 0$ for all $i \ne j$, so condition (iii) holds. Thus, the least squares assumptions in Key Concept 4.3, plus homoskedasticity of the errors, imply the Gauss-Markov conditions in Equation (5.31).

The OLS Estimator $\hat{\beta}_1$ Is a Linear Conditionally Unbiased Estimator

To show that $\hat{\beta}_1$ is linear, first note that, because $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$ (by the definition of $\bar{X}$), $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n}(X_i - \bar{X})Y_i - \bar{Y}\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n}(X_i - \bar{X})Y_i$. Substituting this result into the formula for $\hat{\beta}_1$ in Equation (4.7) yields

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = \sum_{i=1}^{n}\hat{a}_i Y_i, \quad \text{where } \hat{a}_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n}(X_j - \bar{X})^2}. \qquad (5.32)$$

Because the weights $\hat{a}_i$, $i = 1, \ldots, n$, in Equation (5.32) depend on $X_1, \ldots, X_n$ but not on $Y_1, \ldots, Y_n$, the OLS estimator $\hat{\beta}_1$ is a linear estimator.

Under the Gauss-Markov conditions, $\hat{\beta}_1$ is conditionally unbiased, and the variance of the conditional distribution of $\hat{\beta}_1$, given $X_1, \ldots, X_n$, is

$$\mathrm{var}(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}. \qquad (5.33)$$

The result that $\hat{\beta}_1$ is conditionally unbiased was previously shown in Appendix 4.3.

Proof of the Gauss-Markov Theorem

We start by deriving some facts that hold for all linear conditionally unbiased estimators, that is, for all estimators $\tilde{\beta}_1$ satisfying Equations (5.24) and (5.25). Substituting $Y_i = \beta_0 + \beta_1 X_i + u_i$ into $\tilde{\beta}_1 = \sum_{i=1}^{n} a_i Y_i$ and collecting terms, we have that

$$\tilde{\beta}_1 = \beta_0\left(\sum_{i=1}^{n} a_i\right) + \beta_1\left(\sum_{i=1}^{n} a_i X_i\right) + \sum_{i=1}^{n} a_i u_i. \qquad (5.34)$$

By the first Gauss-Markov condition, $E(\sum_{i=1}^{n} a_i u_i \mid X_1, \ldots, X_n) = \sum_{i=1}^{n} a_i E(u_i \mid X_1, \ldots, X_n) = 0$; thus, taking conditional expectations of both sides of Equation (5.34) yields $E(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \beta_0(\sum_{i=1}^{n} a_i) + \beta_1(\sum_{i=1}^{n} a_i X_i)$. Because $\tilde{\beta}_1$ is conditionally unbiased by assumption, it must be that $\beta_0(\sum_{i=1}^{n} a_i) + \beta_1(\sum_{i=1}^{n} a_i X_i) = \beta_1$, but for this equality to hold for all values of $\beta_0$ and $\beta_1$ it must be the case that, for $\tilde{\beta}_1$ to be conditionally unbiased,

$$\sum_{i=1}^{n} a_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} a_i X_i = 1. \qquad (5.35)$$
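Under the Gauss-Markov conditions, the variance of $\tilde{\beta}_1$, conditional on $X_1, \ldots, X_n$, has a simple form. Substituting Equation (5.35) into Equation (5.34) yields $\tilde{\beta}_1 - \beta_1 = \sum_{i=1}^{n} a_i u_i$. Thus $\mathrm{var}(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \mathrm{var}(\sum_{i=1}^{n} a_i u_i \mid X_1, \ldots, X_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, \mathrm{cov}(u_i, u_j \mid X_1, \ldots, X_n)$; applying the second and third Gauss-Markov conditions, the cross terms in the double summation vanish and the expression for the conditional variance simplifies to

$$\mathrm{var}(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \sigma_u^2 \sum_{i=1}^{n} a_i^2. \qquad (5.36)$$

Note that Equations (5.35) and (5.36) apply to $\hat{\beta}_1$ with weights $a_i = \hat{a}_i$, given in Equation (5.32).

We now show that the two restrictions in Equation (5.35) and the expression for the conditional variance in Equation (5.36) imply that the conditional variance of $\tilde{\beta}_1$ exceeds the conditional variance of $\hat{\beta}_1$ unless $\tilde{\beta}_1 = \hat{\beta}_1$. Let $a_i = \hat{a}_i + d_i$, so $\sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n}(\hat{a}_i + d_i)^2 = \sum_{i=1}^{n}\hat{a}_i^2 + 2\sum_{i=1}^{n}\hat{a}_i d_i + \sum_{i=1}^{n} d_i^2$. Using the definition of $\hat{a}_i$, we have that

$$\sum_{i=1}^{n}\hat{a}_i d_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X})d_i}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = \frac{\sum_{i=1}^{n} d_i X_i - \bar{X}\sum_{i=1}^{n} d_i}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = \frac{\left(\sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n}\hat{a}_i X_i\right) - \bar{X}\left(\sum_{i=1}^{n} a_i - \sum_{i=1}^{n}\hat{a}_i\right)}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = 0,$$

where the final equality follows from Equation (5.35) (which holds for both $a_i$ and $\hat{a}_i$). Thus $\sigma_u^2\sum_{i=1}^{n} a_i^2 = \sigma_u^2\sum_{i=1}^{n}\hat{a}_i^2 + \sigma_u^2\sum_{i=1}^{n} d_i^2 = \mathrm{var}(\hat{\beta}_1 \mid X_1, \ldots, X_n) + \sigma_u^2\sum_{i=1}^{n} d_i^2$. Substituting this result into Equation (5.36) yields

$$\mathrm{var}(\tilde{\beta}_1 \mid X_1, \ldots, X_n) - \mathrm{var}(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \sigma_u^2\sum_{i=1}^{n} d_i^2. \qquad (5.37)$$

Thus $\tilde{\beta}_1$ has a greater conditional variance than $\hat{\beta}_1$ if $d_i$ is nonzero for any $i = 1, \ldots, n$. But if $d_i = 0$ for all $i$, then $a_i = \hat{a}_i$ and $\tilde{\beta}_1 = \hat{\beta}_1$, which proves that OLS is BLUE.

The Gauss-Markov Theorem When X Is Nonrandom

With a minor change in interpretation, the Gauss-Markov theorem also applies to nonrandom regressors; that is, it applies to regressors that do not change their values over repeated samples. Specifically, if the second least squares assumption is replaced by the assumption that $X_1, \ldots, X_n$ are nonrandom (fixed over repeated samples) and $u_1, \ldots, u_n$ are i.i.d., then the foregoing statement and proof of the Gauss-Markov theorem apply directly, except that all of the "conditional on $X_1, \ldots, X_n$" statements are unnecessary because $X_1, \ldots, X_n$ take on the same values from one sample to the next.

The Sample Average Is the Efficient Linear Estimator of E(Y)

An implication of the Gauss-Markov theorem is that the sample average, $\bar{Y}$, is the most efficient linear estimator of $E(Y_i)$ when $Y_1, \ldots, Y_n$ are i.i.d. To see this, consider the case of regression without an "$X$," so that the only regressor is the constant regressor $X_{0i} = 1$. Then the OLS estimator $\hat{\beta}_0 = \bar{Y}$. It follows that, under the Gauss-Markov assumptions, $\bar{Y}$ is BLUE. Note that the Gauss-Markov requirement that the error be homoskedastic is irrelevant in this case because there is no regressor, so it follows that $\bar{Y}$ is BLUE if $Y_1, \ldots, Y_n$ are i.i.d. This result was stated previously in Key Concept 3.3.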
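The algebra in the proof of the Gauss-Markov theorem can be checked numerically on a toy design. In this sketch the regressor values and the perturbation `d` are hypothetical; `d` is chosen by hand so that $\sum d_i = 0$ and $\sum d_i X_i = 0$, which keeps the alternative estimator linear and conditionally unbiased.

```python
# Hypothetical fixed regressor values.
x = [0.0, 1.0, 2.0, 3.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS weights a_hat_i = (X_i - Xbar) / sum_j (X_j - Xbar)^2.
a_hat = [(xi - xbar) / sxx for xi in x]

# A perturbation with sum(d) = 0 and sum(d * x) = 0, so that the weights
# a_tilde = a_hat + d still satisfy both unbiasedness restrictions.
d = [0.1, -0.1, -0.1, 0.1]
a_tilde = [ah + di for ah, di in zip(a_hat, d)]

def restrictions(a):
    """Return (sum a_i, sum a_i * X_i); unbiasedness requires (0, 1)."""
    return sum(a), sum(ai * xi for ai, xi in zip(a, x))

sum_sq_hat = sum(ah * ah for ah in a_hat)       # var(OLS) / sigma_u^2
sum_sq_tilde = sum(at * at for at in a_tilde)   # var(alternative) / sigma_u^2
sum_sq_d = sum(di * di for di in d)

print("OLS weights restrictions:        ", restrictions(a_hat))
print("alternative weights restrictions:", restrictions(a_tilde))
print("variance ratio (alternative / OLS):", sum_sq_tilde / sum_sq_hat)

# Both estimators recover the slope exactly on noise-free data Y = 1 + 2X.
y = [1.0 + 2.0 * xi for xi in x]
slope_ols = sum(ah * yi for ah, yi in zip(a_hat, y))
slope_alt = sum(at * yi for at, yi in zip(a_tilde, y))
print("slopes on noise-free data:", slope_ols, slope_alt)
```

The excess of $\sum a_i^2$ over $\sum \hat{a}_i^2$ equals $\sum d_i^2$ up to rounding, so under homoskedastic, uncorrelated errors the alternative estimator's conditional variance exceeds that of OLS by $\sigma_u^2 \sum d_i^2$, as in Equation (5.37).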
CHAPTER 6 Linear Regression with Multiple Regressors

Chapter 5 ended on a worried note. Although school districts with lower student-teacher ratios tend to have higher test scores in the California data set, perhaps students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have produced misleading results and, if so, what can be done?

Omitted factors, such as student characteristics, can in fact make the ordinary least squares (OLS) estimator of the effect of class size on test scores misleading or, more precisely, biased. This chapter explains this "omitted variable bias" and introduces multiple regression, a method that can eliminate omitted variable bias. The key idea of multiple regression is that, if we have data on these omitted variables, then we can include them as additional regressors and thereby estimate the effect of one regressor (the student-teacher ratio) while holding constant the other variables (such as student characteristics).

This chapter explains how to estimate the coefficients of the multiple linear regression model. Many aspects of multiple regression parallel those of regression with a single regressor, studied in Chapters 4 and 5. The coefficients of the multiple regression model can be estimated from data using OLS; the OLS estimators in multiple regression are random variables because they depend on data from a random sample; and in large samples the sampling distributions of the OLS estimators are approximately normal.

6.1 Omitted Variable Bias

By focusing only on the student-teacher ratio, the empirical analysis in Chapters 4 and 5 ignored some potentially important determinants of test scores by collecting their influences in the regression error term. These omitted factors include school characteristics, such as teacher quality and computer usage, and student characteristics, such as family background. We begin by considering an omitted student characteristic that is particularly relevant in California because of its large immigrant population: the prevalence in the school district of students who are still learning English.

By ignoring the percentage of English learners in the district, the OLS estimator of the slope in the regression of test scores on the student-teacher ratio could be biased; that is, the mean of the sampling distribution of the OLS estimator might not equal the true effect on test scores of a unit change in the student-teacher ratio. Here is the reasoning. Students who are still learning English might perform worse on standardized tests than native English speakers. If districts with large classes also have many students still learning English, then the OLS regression of test scores on the student-teacher ratio could erroneously find a correlation and produce a large estimated coefficient, when in fact the true causal effect of cutting class sizes on test scores is small, even zero. Accordingly, based on the analysis of Chapters 4 and 5, the superintendent might hire enough new teachers to reduce the student-teacher ratio by two, but her hoped-for improvement in test scores will fail to materialize if the true coefficient is small or zero.

A look at the California data lends credence to this concern. The correlation between the student-teacher ratio and the percentage of English learners (students who are not native English speakers and who have not yet mastered English) in the district is 0.19. This small but positive correlation suggests that districts with more English learners tend to have a higher student-teacher ratio (larger classes). If the student-teacher ratio were unrelated to the percentage of English learners, then it would be safe to ignore English proficiency in the regression of test scores against the student-teacher ratio. But because the student-teacher ratio and the percentage of English learners are correlated, it is possible that the OLS coefficient in the regression of test scores on the student-teacher ratio reflects that influence.

Definition of Omitted Variable Bias

If the regressor (the student-teacher ratio) is correlated with a variable that has been omitted from the analysis (the percentage of English learners) and that determines, in part, the dependent variable (test scores), then the OLS estimator will have omitted variable bias.

Omitted variable bias occurs when two conditions are true: (1) the omitted variable is correlated with the included regressor; and (2) the omitted variable is a determinant of the dependent variable. To illustrate these conditions, consider three examples of variables that are omitted from the regression of test scores on the student-teacher ratio.
Example #1: Percentage of English learners. Because the percentage of English learners is correlated with the student-teacher ratio, the first condition for omitted variable bias holds. It is plausible that students who are still learning English will do worse on standardized tests than native English speakers, in which case the percentage of English learners is a determinant of test scores and the second condition for omitted variable bias holds. Thus, the OLS estimator in the regression of test scores on the student-teacher ratio could incorrectly reflect the influence of the omitted variable, the percentage of English learners. That is, omitting the percentage of English learners may introduce omitted variable bias.

Example #2: Time of day of the test. Another variable omitted from the analysis is the time of day that the test was administered. For this omitted variable, it is plausible that the first condition for omitted variable bias does not hold but the second condition does. For example, if the time of day of the test varies from one district to the next in a way that is unrelated to class size, then the time of day and class size would be uncorrelated, so the first condition does not hold. Conversely, the time of day of the test could affect scores (alertness varies through the school day), so the second condition holds. However, because in this example the time that the test is administered is uncorrelated with the student-teacher ratio, the student-teacher ratio could not be incorrectly picking up the "time of day" effect. Thus omitting the time of day of the test does not result in omitted variable bias.

Example #3: Parking lot space per pupil. Another omitted variable is parking lot space per pupil (the area of the teacher parking lot divided by the number of students). This variable satisfies the first but not the second condition for omitted variable bias. Specifically, schools with more teachers per pupil probably have more teacher parking space, so the first condition would be satisfied. However, under the assumption that learning takes place in the classroom, not the parking lot, parking lot space has no direct effect on learning; thus the second condition does not hold. Because parking lot space per pupil is not a determinant of test scores, omitting it from the analysis does not lead to omitted variable bias.

Omitted variable bias is summarized in Key Concept 6.1.

Omitted variable bias and the first least squares assumption. Omitted variable bias means that the first least squares assumption, that $E(u_i \mid X_i) = 0$, as listed in Key Concept 4.3, is incorrect. To see why, recall that the error term $u_i$ in the linear regression model with a single regressor represents all factors, other than $X_i$, that are determinants of $Y_i$. If one of these other factors is correlated with $X_i$, this means that the error term (which contains this factor) is correlated with $X_i$. In other words, if an omitted variable is a determinant of $Y_i$, then it is in the error term, and if it is correlated with $X_i$, then the error term is correlated with $X_i$. Because $u_i$ and $X_i$ are correlated, the conditional mean of $u_i$ given $X_i$ is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent.

KEY CONCEPT 6.1: OMITTED VARIABLE BIAS IN REGRESSION WITH A SINGLE REGRESSOR

Omitted variable bias is the bias in the OLS estimator that arises when the regressor, X, is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be true:

1. X is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent variable, Y.

A Formula for Omitted Variable Bias

The discussion of the previous section about omitted variable bias can be summarized mathematically by a formula for this bias. Let the correlation between $X_i$ and $u_i$ be $\mathrm{corr}(X_i, u_i) = \rho_{Xu}$. Suppose that the second and third least squares assumptions hold, but the first does not because $\rho_{Xu}$ is nonzero. Then the OLS estimator has the limit (derived in Appendix 6.1)

$$\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}. \tag{6.1}$$

That is, as the sample size increases, $\hat{\beta}_1$ is close to $\beta_1 + \rho_{Xu}(\sigma_u/\sigma_X)$ with increasingly high probability.

The formula in Equation (6.1) summarizes several of the ideas discussed above about omitted variable bias:

1. Omitted variable bias is a problem whether the sample size is large or small. Because $\hat{\beta}_1$ does not converge in probability to the true value $\beta_1$, $\hat{\beta}_1$ is inconsistent; that is, $\hat{\beta}_1$ is not a consistent estimator of $\beta_1$ when there is omitted variable bias. The term $\rho_{Xu}(\sigma_u/\sigma_X)$ in Equation (6.1) is the bias in $\hat{\beta}_1$ that persists even in large samples.
2. Whether this bias is large or small in practice depends on the correlation $\rho_{Xu}$ between the regressor and the error term. The larger is $|\rho_{Xu}|$, the larger is the bias.

3. The direction of the bias in $\hat{\beta}_1$ depends on whether X and u are positively or negatively correlated. For example, we speculated that the percentage of students learning English has a negative effect on district test scores (students still learning English have lower scores), so that the percentage of English learners enters the error term with a negative sign. In our data, the fraction of English learners is positively correlated with the student-teacher ratio (districts with more English learners have larger classes). Thus the student-teacher ratio (X) would be negatively correlated with the error term (u), so $\rho_{Xu} < 0$ and the coefficient on the student-teacher ratio $\hat{\beta}_1$ would be biased toward a negative number. In other words, having a small percentage of English learners is associated both with high test scores and low student-teacher ratios, so one reason that the OLS estimator suggests that small classes improve test scores may be that the districts with small classes have fewer English learners.

The Mozart Effect: Omitted Variable Bias?

A study published in Nature in 1993 (Rauscher, Shaw and Ky, 1993) suggested that listening to Mozart for 10 to 15 minutes could temporarily raise your IQ by 8 or 9 points. That study made big news, and politicians and parents saw an easy way to make their children smarter. For a while, the state of Georgia even distributed classical music CDs to all infants in the state.

What is the evidence for the "Mozart effect"? A review of dozens of studies found that students who take optional music or arts courses in high school do in fact have higher English and math test scores than those who don't.¹ A closer look at these studies, however, suggests that the real reason for the better test performance has little to do with those courses. Instead, the authors of the review suggested that the correlation between testing well and taking art or music could arise from any number of things. For example, the academically better students might have more time to take optional music courses or more interest in doing so, or those schools with a deeper music curriculum might just be better schools across the board.

In the terminology of regression, the estimated relationship between test scores and taking optional music courses appears to have omitted variable bias. By omitting factors such as the student's innate ability or the overall quality of the school, studying music appears to have an effect on test scores when in fact it has none.

So is there a Mozart effect? One way to find out is to do a randomized controlled experiment. (As discussed in Chapter 4, randomized controlled experiments eliminate omitted variable bias by randomly assigning participants to "treatment" and "control" groups.) Taken together, the many controlled experiments on the Mozart effect fail to show that listening to Mozart improves IQ or general test performance. For reasons not fully understood, however, it seems that listening to classical music does help temporarily in one narrow area: folding paper and visualizing shapes. So the next time you cram for an origami exam, try to fit in a little Mozart, too.

¹See the Journal of Aesthetic Education 34: 3-4 (Fall/Winter 2000), especially the article by Ellen Winner and Monica Cooper (pp. 11-76) and the one by Lois Hetland (pp. 105-148).

Addressing Omitted Variable Bias by Dividing the Data into Groups

What can you do about omitted variable bias? Our superintendent is considering increasing the number of teachers in her district, but she has no control over the fraction of immigrants in her community. As a result, she is interested in the effect of the student-teacher ratio on test scores, holding constant other factors, including the percentage of English learners. This new way of posing her question suggests that, instead of using data for all districts, perhaps we should focus on districts with percentages of English learners comparable to hers. Among this subset of districts, do those with smaller classes do better on standardized tests?

Table 6.1 reports evidence on the relationship between class size and test scores within districts with comparable percentages of English learners.

TABLE 6.1  Differences in Test Scores for California School Districts with Low and High Student-Teacher Ratios, by the Percentage of English Learners in the District

                                  Student-Teacher       Student-Teacher       Difference in Test Scores,
                                  Ratio < 20            Ratio >= 20           Low vs. High STR
                                  Average                Average
                                  Test Score     n       Test Score     n      Difference    t-statistic
All districts                     657.4          238     650.0          182     7.4           4.04
Percentage of English learners
  < 1.9%                          664.5           76     665.4           27    -0.9          -0.30
  1.9-8.8%                        665.2           64     661.8           44     3.3           1.13
  8.8-23.0%                       654.9           54     649.7           50     5.2           1.72
  > 23.0%                         636.7           44     634.8           61     1.9           0.68
Districts are divided into eight groups. First, the districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts. Second, within each of these four categories, districts are further broken down into two groups, depending on whether the student-teacher ratio is small (STR < 20) or large (STR ≥ 20).

The first row in Table 6.1 reports the overall difference in average test scores between districts with low and high student-teacher ratios, that is, the difference in test scores between these two groups without breaking them down further into the quartiles of English learners. (Recall that this difference was previously reported in regression form in Equation (5.18) as the OLS estimate of the coefficient on $D_i$ in the regression of TestScore on $D_i$, where $D_i$ is a binary regressor that equals 1 if $STR_i < 20$ and equals 0 otherwise.) Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student-teacher ratio than a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

The final four rows in Table 6.1 report the difference in test scores between districts with low and high student-teacher ratios, broken down by the quartile of the percentage of English learners. This evidence presents a different picture. Of the districts with the fewest English learners (< 1.9%), the average test score for those 76 with low student-teacher ratios is 664.5 and the average for the 27 with high student-teacher ratios is 665.4. Thus, for the districts with the fewest English learners, test scores were on average 0.9 points lower in the districts with low student-teacher ratios! In the second quartile, districts with low student-teacher ratios had test scores that averaged 3.3 points higher than those with high student-teacher ratios; this gap was 5.2 points for the third quartile and only 1.9 points for the quartile of districts with the most English learners. Once we hold the percentage of English learners constant, the difference in performance between districts with high and low student-teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

At first this finding might seem puzzling. How can the overall effect of test scores be twice the effect of test scores within any quartile? The answer is that the districts with the most English learners tend to have both the highest student-teacher ratios and the lowest test scores. The difference in the average test score between districts in the lowest and highest quartile of the percentage of English learners is large, approximately 30 points. The districts with few English learners tend to have lower student-teacher ratios: 74% (76 of 103) of the districts in the first quartile of English learners have small classes (STR < 20), while only 42% (44 of 105) of the districts in the quartile with the most English learners have small classes. So, the districts with the most English learners have both lower test scores and higher student-teacher ratios than the other districts.
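The arithmetic behind this puzzle can be checked directly. The sketch below is only an illustration: the group averages and district counts are typed in by hand from Table 6.1 (an assumption to verify against the table), and the helper name `pooled_mean` is made up for this example. It pools the quartile averages with their counts as weights and compares the pooled gap with the within-quartile gaps.

```python
# Group averages and district counts typed in by hand from Table 6.1
# (quartiles of the percentage of English learners, lowest to highest).
# Each entry is (average test score, number of districts).
low_str = [(664.5, 76), (665.2, 64), (654.9, 54), (636.7, 44)]   # STR < 20
high_str = [(665.4, 27), (661.8, 44), (649.7, 50), (634.8, 61)]  # STR >= 20


def pooled_mean(groups):
    """Count-weighted average of the group means (hypothetical helper)."""
    total = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total


# Overall gap: pool all quartiles, then compare low-STR with high-STR districts.
overall_gap = pooled_mean(low_str) - pooled_mean(high_str)

# Within-quartile gaps: compare low-STR with high-STR districts quartile by quartile.
# These use the rounded table entries, so they can differ in the last digit
# from the differences printed in the table.
within_gaps = [lo - hi for (lo, _), (hi, _) in zip(low_str, high_str)]

# Share of districts with small classes in each quartile.
share_low = [nl / (nl + nh) for (_, nl), (_, nh) in zip(low_str, high_str)]

print("overall gap:", round(overall_gap, 1))
print("within-quartile gaps:", [round(g, 1) for g in within_gaps])
print("share with STR < 20:", [round(s, 2) for s in share_low])
```

The pooled gap exceeds every within-quartile gap because the low-STR group is weighted toward the high-scoring quartiles with few English learners, while the high-STR group is weighted toward the low-scoring quartile with many English learners.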
This analysis reinforces the superintendent's worry that omitted variable bias is present in the regression of test scores against the student-teacher ratio. By looking within quartiles of the percentage of English learners, the test score differences in the second part of Table 6.1 improve upon the simple difference-of-means analysis in the first line of Table 6.1. Still, this analysis does not yet provide the superintendent with a useful estimate of the effect on test scores of changing class size, holding constant the fraction of English learners. Such an estimate can be provided, however, using the method of multiple regression.

6.2 The Multiple Regression Model

The multiple regression model extends the single variable regression model of Chapters 4 and 5 to include additional variables as regressors. This model permits estimating the effect on $Y_i$ of changing one variable ($X_{1i}$) while holding the other regressors ($X_{2i}$, $X_{3i}$, and so forth) constant. In the class size problem, the multiple regression model provides a way to isolate the effect on test scores ($Y_i$) of the student-teacher ratio ($X_{1i}$) while holding constant the percentage of students in the district who are English learners ($X_{2i}$).

The Population Regression Line

Suppose for the moment that there are only two independent variables, $X_{1i}$ and $X_{2i}$. In the linear multiple regression model, the average relationship between these two independent variables and the dependent variable, Y, is given by the linear function

$$E(Y_i \mid X_{1i} = x_1, X_{2i} = x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \tag{6.2}$$

where $E(Y_i \mid X_{1i} = x_1, X_{2i} = x_2)$ is the conditional expectation of $Y_i$ given that $X_{1i} = x_1$ and $X_{2i} = x_2$. That is, if the student-teacher ratio in the $i$th district ($X_{1i}$) equals some value $x_1$ and the percentage of English learners in the $i$th district ($X_{2i}$) equals $x_2$, then the expected value of $Y_i$ given the student-teacher ratio and the percentage of English learners is given by Equation (6.2).

Equation (6.2) is the population regression line or population regression function in the multiple regression model. The coefficient $\beta_0$ is the intercept, the coefficient $\beta_1$ is the slope coefficient of $X_{1i}$ or, more simply, the coefficient on $X_{1i}$, and the coefficient $\beta_2$ is the slope coefficient of $X_{2i}$ or, more simply, the coefficient on $X_{2i}$. One or more of the independent variables in the multiple regression model are sometimes referred to as control variables.
The interpretation of the coefficient $\beta_1$ in Equation (6.2) is different than it was when $X_{1i}$ was the only regressor: In Equation (6.2), $\beta_1$ is the effect on Y of a unit change in $X_1$, holding $X_2$ constant or controlling for $X_2$.

This interpretation of $\beta_1$ follows from the definition that the expected effect on Y of a change in $X_1$, $\Delta X_1$, holding $X_2$ constant, is the difference between the expected value of Y when the independent variables take on the values $X_1 + \Delta X_1$ and $X_2$ and the expected value of Y when the independent variables take on the values $X_1$ and $X_2$. Accordingly, write the population regression function in Equation (6.2) as $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, and imagine changing $X_1$ by the amount $\Delta X_1$ while not changing $X_2$, that is, while holding $X_2$ constant. Because $X_1$ has changed, Y will change by some amount, say $\Delta Y$. After this change, the new value of Y, $Y + \Delta Y$, is

$$Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2. \tag{6.3}$$

An equation for $\Delta Y$ in terms of $\Delta X_1$ is obtained by subtracting the equation $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ from Equation (6.3), yielding $\Delta Y = \beta_1 \Delta X_1$. That is,

$$\beta_1 = \frac{\Delta Y}{\Delta X_1} \text{ holding } X_2 \text{ constant}. \tag{6.4}$$

The coefficient $\beta_1$ is the effect on Y (the expected change in Y) of a unit change in $X_1$, holding $X_2$ fixed. Another phrase used to describe $\beta_1$ is the partial effect on Y of $X_1$, holding $X_2$ fixed.

The interpretation of the intercept in the multiple regression model, $\beta_0$, is similar to the interpretation of the intercept in the single-regressor model: It is the expected value of $Y_i$ when $X_{1i}$ and $X_{2i}$ are zero. Simply put, the intercept $\beta_0$ determines how far up the Y axis the population regression line starts.

The Population Multiple Regression Model

The population regression line in Equation (6.2) is the relationship between Y and $X_1$ and $X_2$ that holds on average in the population. Just as in the case of regression with a single regressor, however, this relationship does not hold exactly because many other factors influence the dependent variable. In addition to the student-teacher ratio and the fraction of students still learning English, for example, test scores are influenced by school characteristics, other student characteristics, and luck. Thus the population regression function in Equation (6.2) needs to be augmented to incorporate these additional factors.

Just as in the case of regression with a single regressor, the factors that determine $Y_i$ in addition to $X_{1i}$ and $X_{2i}$ are incorporated into Equation (6.2) as an "error" term $u_i$. This error term is the deviation of a particular observation (test scores in the $i$th district in our example) from the average population relationship. Accordingly, we have

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n, \tag{6.5}$$

where the subscript i indicates the $i$th of the n observations (districts) in the sample.

Equation (6.5) is the population multiple regression model when there are two regressors, $X_{1i}$ and $X_{2i}$.

In regression with binary regressors it can be useful to treat $\beta_0$ as the coefficient on a regressor that always equals 1; think of $\beta_0$ as the coefficient on $X_{0i}$, where $X_{0i} = 1$ for $i = 1, \ldots, n$. Accordingly, the population multiple regression model in Equation (6.5) can alternatively be written as

$$Y_i = \beta_0 X_{0i} + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \text{ where } X_{0i} = 1, \quad i = 1, \ldots, n. \tag{6.6}$$

The variable $X_{0i}$ is sometimes called the constant regressor because it takes on the same value, the value 1, for all observations. Similarly, the intercept, $\beta_0$, is sometimes called the constant term in the regression.

The two ways of writing the population regression model, Equations (6.5) and (6.6), are equivalent.

The discussion so far has focused on the case of a single additional variable, $X_2$. In practice, however, there might be multiple factors omitted from the single-regressor model. For example, ignoring the students' economic background might result in omitted variable bias, just as ignoring the fraction of English learners did. This reasoning leads us to consider a model with three regressors or, more generally, a model that includes k regressors. The multiple regression model with k regressors, $X_{1i}, X_{2i}, \ldots, X_{ki}$, is summarized as Key Concept 6.2.

The definitions of homoskedasticity and heteroskedasticity in the multiple regression model are extensions of their definitions in the single-regressor model. The error term $u_i$ in the multiple regression model is homoskedastic if the variance of the conditional distribution of $u_i$ given $X_{1i}, \ldots, X_{ki}$, $\mathrm{var}(u_i \mid X_{1i}, \ldots, X_{ki})$, is constant for $i = 1, \ldots, n$ and thus does not depend on the values of $X_{1i}, \ldots, X_{ki}$. Otherwise, the error term is heteroskedastic.

The multiple regression model holds out the promise of providing just what the superintendent wants to know: the effect of changing the student-teacher ratio, holding constant other factors that are beyond her control. These factors include not just the percentage of English learners, but other measurable factors that might affect test performance, including the economic background of the students. To be of practical help to the superintendent, however, we need to provide her with estimates of the unknown population coefficients $\beta_0, \ldots, \beta_k$ of the population regression model calculated using a sample of data. Fortunately, these coefficients can be estimated using ordinary least squares.

KEY CONCEPT 6.2: THE MULTIPLE REGRESSION MODEL

The multiple regression model is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n, \tag{6.7}$$

where

• $Y_i$ is the $i$th observation on the dependent variable; $X_{1i}, X_{2i}, \ldots, X_{ki}$ are the $i$th observations on each of the k regressors; and $u_i$ is the error term.
• The population regression line is the relationship that holds between Y and the X's on average in the population: $E(Y \mid X_{1i} = x_1, X_{2i} = x_2, \ldots, X_{ki} = x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$.
• $\beta_1$ is the slope coefficient on $X_1$, $\beta_2$ is the coefficient on $X_2$, and so on. The coefficient $\beta_1$ is the expected change in $Y_i$ resulting from changing $X_{1i}$ by one unit, holding constant $X_{2i}, \ldots, X_{ki}$. The coefficients on the other X's are interpreted similarly.
• The intercept $\beta_0$ is the expected value of Y when all the X's equal 0. The intercept can be thought of as the coefficient on a regressor, $X_{0i}$, that equals 1 for all i.

6.3 The OLS Estimator in Multiple Regression

This section describes how the coefficients of the multiple regression model can be estimated using OLS.
The OLS Estimator

Section 4.2 shows how to estimate the intercept and slope coefficients in the single-regressor model by applying OLS to a sample of observations of Y and X. The key idea is that these coefficients can be estimated by minimizing the sum of squared prediction mistakes, that is, by choosing the estimators $b_0$ and $b_1$ so as to minimize $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$. The estimators that do so are the OLS estimators.

The method of OLS also can be used to estimate the coefficients $\beta_0, \beta_1, \ldots, \beta_k$ in the multiple regression model. Let $b_0, b_1, \ldots, b_k$ be estimators of $\beta_0, \beta_1, \ldots, \beta_k$. The predicted value of $Y_i$, calculated using these estimators, is $b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}$, and the mistake in predicting $Y_i$ is $Y_i - (b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}) = Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki}$. The sum of these squared prediction mistakes over all n observations thus is

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2. \tag{6.8}$$

The sum of the squared mistakes for the linear regression model in expression (6.8) is the extension of the sum of the squared mistakes given in Equation (4.6) for the linear regression model with a single regressor. The estimators of the coefficients $\beta_0, \beta_1, \ldots, \beta_k$ that minimize the sum of squared mistakes in expression (6.8) are called the ordinary least squares (OLS) estimators of $\beta_0, \beta_1, \ldots, \beta_k$. The OLS estimators are denoted $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$.

The terminology of OLS in the linear multiple regression model is the same as in the linear regression model with a single regressor. The OLS regression line is the straight line constructed using the OLS estimators: $\hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_k X_k$. The predicted value of $Y_i$ given $X_{1i}, \ldots, X_{ki}$, based on the OLS regression line, is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}$. The OLS residual for the $i$th observation is the difference between $Y_i$ and its OLS predicted value, that is, the OLS residual is $\hat{u}_i = Y_i - \hat{Y}_i$.

The OLS estimators could be computed by trial and error, repeatedly trying different values of $b_0, \ldots, b_k$ until you are satisfied that you have minimized the total sum of squares in expression (6.8). It is far easier, however, to use explicit formulas for the OLS estimators that are derived using calculus. The formulas for the OLS estimators in the multiple regression model are similar to those in Key Concept 4.2 for the single-regressor model. These formulas are incorporated into modern statistical software. In the multiple regression model, the formulas are best expressed and discussed using matrix notation, so their presentation is deferred to Section 18.1.
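As a rough illustration of what such software does, the following sketch minimizes the sum of squared prediction mistakes in expression (6.8) numerically and then forms predicted values and residuals. It is a minimal example under stated assumptions, not the textbook's own procedure: the data are simulated, the "population" coefficients (1.0, 2.0, -0.5) are invented for the illustration, and NumPy's `numpy.linalg.lstsq` routine stands in for the explicit formulas.

```python
import numpy as np

# Simulated data (an assumption for illustration; not the California data set).
rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))                    # columns play the role of X1, X2
u = rng.normal(size=n)                         # error term
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + u    # hypothetical population model

# Add the constant regressor X0 = 1 so that b0 is estimated as the intercept.
Z = np.column_stack([np.ones(n), X])


def sum_squared_mistakes(b):
    """Sum of squared prediction mistakes, expression (6.8), at trial values b."""
    return float(np.sum((y - Z @ b) ** 2))


# Least squares solution: the b0, ..., bk that minimize expression (6.8).
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

y_hat = Z @ beta_hat   # OLS predicted values
u_hat = y - y_hat      # OLS residuals

print("OLS estimates:", np.round(beta_hat, 3))
print("minimized sum of squares:", round(sum_squared_mistakes(beta_hat), 3))

# Any other trial value gives a larger sum of squared mistakes.
trial = beta_hat + np.array([0.1, 0.0, -0.1])
print("sum of squares at a nearby trial value:", round(sum_squared_mistakes(trial), 3))
```

Comparing the two printed sums of squares mirrors the trial-and-error idea in the text: moving away from the OLS estimates can only increase the sum of squared mistakes.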
198 CHAPTER 6 Linear Regression with Multiple Regressors

The definitions and terminology of OLS in multiple regression are summarized in Key Concept 6.3.

KEY CONCEPT 6.3
THE OLS ESTIMATORS, PREDICTED VALUES, AND RESIDUALS IN THE MULTIPLE REGRESSION MODEL

The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are the values of $b_0, b_1, \ldots, b_k$ that minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2$. The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}, \quad i = 1, \ldots, n$, and (6.9)

$\hat{u}_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$ (6.10)

The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ and residual $\hat{u}_i$ are computed from a sample of $n$ observations of $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$. These are estimators of the unknown true population coefficients $\beta_0, \beta_1, \ldots, \beta_k$ and error term, $u_i$.

Application to Test Scores and the Student-Teacher Ratio

In Section 4.2, we used OLS to estimate the intercept and slope coefficient of the regression relating test scores (TestScore) to the student-teacher ratio (STR), using our 420 observations for California school districts; the estimated OLS regression line, reported in Equation (4.11), is

$\widehat{TestScore} = 698.9 - 2.28 \times STR.$ (6.11)

Our concern has been that this relationship is misleading because the student-teacher ratio might be picking up the effect of having many English learners in districts with large classes. That is, it is possible that the OLS estimator is subject to omitted variable bias.

We are now in a position to address this concern by using OLS to estimate a multiple regression in which the dependent variable is the test score ($Y_i$) and there are two regressors: the student-teacher ratio ($X_{1i}$) and the percentage of English learners in the school district ($X_{2i}$) for our 420 districts ($i = 1, \ldots, 420$). The estimated OLS regression line for this multiple regression is

$\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL,$ (6.12)

where PctEL is the percentage of students in the district who are English learners. The OLS estimate of the intercept ($\hat{\beta}_0$) is 686.0, the OLS estimate of the coefficient on the student-teacher ratio ($\hat{\beta}_1$) is $-1.10$, and the OLS estimate of the coefficient on the percentage English learners ($\hat{\beta}_2$) is $-0.65$.

The estimated effect on test scores of a change in the student-teacher ratio in the multiple regression is approximately half as large as when the student-teacher ratio is the only regressor: in the single-regressor equation [Equation (6.11)], a unit decrease in the STR is estimated to increase test scores by 2.28 points, but in the multiple regression equation [Equation (6.12)], it is estimated to increase test scores by only 1.10 points. This difference occurs because the coefficient on STR in the multiple regression is the effect of a change in STR, holding constant (or controlling for) PctEL, whereas in the single-regressor regression, PctEL is not held constant.

These two estimates can be reconciled by concluding that there is omitted variable bias in the estimate in the single-regressor model in Equation (6.11). In Section 6.1, we saw that districts with a high percentage of English learners tend to have not only low test scores but also a high student-teacher ratio. If the fraction of English learners is omitted from the regression, reducing the student-teacher ratio is estimated to have a larger effect on test scores, but this estimate reflects both the effect of a change in the student-teacher ratio and the omitted effect of having fewer English learners in the district.

We have reached the same conclusion that there is omitted variable bias in the relationship between test scores and the student-teacher ratio by two different paths: the tabular approach of dividing the data into groups (Section 6.1) and the multiple regression approach [Equation (6.12)]. Of these two methods, multiple regression has two important advantages. First, it provides a quantitative estimate of the effect of a unit decrease in the student-teacher ratio, which is what the superintendent needs to make her decision. Second, it readily extends to more than two regressors, so that multiple regression can be used to control for measurable factors other than just the percentage of English learners.

The rest of this chapter is devoted to understanding and to using OLS in the multiple regression model. Much of what you learned about the OLS estimator with a single regressor carries over to multiple regression with few or no modifications, so we will focus on that which is new with multiple regression. We begin by discussing measures of fit for the multiple regression model.
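The contrast between the single-regressor and two-regressor estimates can be reproduced in miniature. The sketch below is only an illustration: it uses synthetic data (not the California data set), the variable names and data-generating numbers are assumptions chosen to mimic Equation (6.12), and the helper `ols` is a hypothetical wrapper around NumPy's least squares routine.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X (X must contain the constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 420  # same sample size as the text, but the data are simulated

# Assumed data-generating process: districts with more English learners
# also tend to have larger classes, so the two regressors are correlated.
pct_el = rng.uniform(0, 80, n)
str_ratio = 18 + 0.05 * pct_el + rng.normal(0, 1.5, n)
test_score = 686.0 - 1.10 * str_ratio - 0.65 * pct_el + rng.normal(0, 5, n)

const = np.ones(n)
# Short regression: PctEL omitted, so its effect loads onto STR.
short_reg = ols(np.column_stack([const, str_ratio]), test_score)
# Long regression: PctEL held constant.
long_reg = ols(np.column_stack([const, str_ratio, pct_el]), test_score)

print("coefficient on STR, PctEL omitted: ", round(short_reg[1], 2))
print("coefficient on STR, PctEL included:", round(long_reg[1], 2))
```

In this simulation the coefficient on STR in the short regression is much more negative than in the long regression, because it also picks up the omitted effect of PctEL; that is the same pattern as the move from Equation (6.11) to Equation (6.12).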
6.4 Measures of Fit in Multiple Regression

Three commonly used summary statistics in multiple regression are the standard error of the regression, the regression $R^2$, and the adjusted $R^2$ (also known as $\bar{R}^2$). All three statistics measure how well the OLS estimate of the multiple regression line describes, or "fits," the data.

The Standard Error of the Regression (SER)

The standard error of the regression (SER) estimates the standard deviation of the error term $u_i$. Thus, the SER is a measure of the spread of the distribution of $Y$ around the regression line. In multiple regression, the SER is

$SER = s_{\hat{u}}$, where $s_{\hat{u}}^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-k-1}$, (6.13)

where the SSR is the sum of squared residuals, $SSR = \sum_{i=1}^{n} \hat{u}_i^2$.

The only difference between the definition in Equation (6.13) and the definition of the SER in Section 4.3 for the single-regressor model is that here the divisor is $n - k - 1$ rather than $n - 2$. In Section 4.3, the divisor $n - 2$ (rather than $n$) adjusts for the downward bias introduced by estimating two coefficients (the slope and intercept of the regression line). Here, the divisor $n - k - 1$ adjusts for the downward bias introduced by estimating $k + 1$ coefficients (the $k$ slope coefficients plus the intercept). As in Section 4.3, using $n - k - 1$ rather than $n$ is called a degrees-of-freedom adjustment. If there is a single regressor, then $k = 1$, so the formula in Section 4.3 is the same as in Equation (6.13). When $n$ is large, the effect of the degrees-of-freedom adjustment is negligible.

The $R^2$

The regression $R^2$ is the fraction of the sample variance of $Y_i$ explained by (or predicted by) the regressors. Equivalently, the $R^2$ is 1 minus the fraction of the variance of $Y_i$ not explained by the regressors.

The mathematical definition of the $R^2$ is the same as for regression with a single regressor:

$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$, (6.14)

where the explained sum of squares is $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ and the total sum of squares is $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
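Equations (6.13) and (6.14) translate directly into a few lines of code. The following is a minimal sketch, assuming NumPy is available; the function name `fit_measures` and the four-observation data set are made up for illustration and do not come from the text.

```python
import numpy as np

def fit_measures(X, y):
    """Return SSR, TSS, SER, and R^2 for an OLS regression of y on X plus an intercept.

    X is an (n, k) array of regressors WITHOUT the constant column.
    """
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])          # add the constant regressor
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS coefficients
    resid = y - Z @ beta                          # OLS residuals
    ssr = float(resid @ resid)                    # sum of squared residuals
    tss = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    ser = float(np.sqrt(ssr / (n - k - 1)))       # Equation (6.13)
    r2 = 1.0 - ssr / tss                          # Equation (6.14)
    return {"SSR": ssr, "TSS": tss, "SER": ser, "R2": r2}

# Tiny hypothetical data set with one regressor (k = 1, n = 4).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
print(fit_measures(X, y))
```

For this small example the fitted line is $\hat{Y} = 1.3 + 0.8X$, so SSR = 1.8, TSS = 5, $R^2 = 0.64$, and SER $= \sqrt{1.8/2} \approx 0.95$; the divisor is $n - k - 1 = 2$.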
In multiple regression, the $R^2$ increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly zero. To see this, think about starting with one regressor and then adding a second. When you use OLS to estimate the model with both regressors, OLS finds the values of the coefficients that minimize the sum of squared residuals. If OLS happens to choose the coefficient on the new regressor to be exactly zero, then the SSR will be the same whether or not the second variable is included in the regression. But if OLS chooses any value other than zero, then it must be that this value reduced the SSR relative to the regression that excludes this regressor. In practice it is extremely unusual for an estimated coefficient to be exactly zero, so in general the SSR will decrease when a new regressor is added. But this means that the $R^2$ generally increases (and never decreases) when a new regressor is added.

The "Adjusted $R^2$"

Because the $R^2$ increases when a new variable is added, an increase in the $R^2$ does not mean that adding a variable actually improves the fit of the model. In this sense, the $R^2$ gives an inflated estimate of how well the regression fits the data. One way to correct for this is to deflate or reduce the $R^2$ by some factor, and this is what the adjusted $R^2$, or $\bar{R}^2$, does.

The adjusted $R^2$, or $\bar{R}^2$, is a modified version of the $R^2$ that does not necessarily increase when a new regressor is added. The $\bar{R}^2$ is

$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS} = 1 - \frac{s_{\hat{u}}^2}{s_Y^2}$. (6.15)

The difference between this formula and the second definition of the $R^2$ in Equation (6.14) is that the ratio of the sum of squared residuals to the total sum of squares is multiplied by the factor $(n-1)/(n-k-1)$. As the second expression in Equation (6.15) shows, this means that the adjusted $R^2$ is 1 minus the ratio of the sample variance of the OLS residuals [with the degrees-of-freedom correction in Equation (6.13)] to the sample variance of $Y$.

There are three useful things to know about the $\bar{R}^2$. First, $(n-1)/(n-k-1)$ is always greater than 1, so $\bar{R}^2$ is always less than $R^2$.

Second, adding a regressor has two opposite effects on the $\bar{R}^2$. On the one hand, the SSR falls, which increases the $\bar{R}^2$. On the other hand, the factor $(n-1)/(n-k-1)$ increases. Whether the $\bar{R}^2$ increases or decreases depends on which of these two effects is stronger.

Third, the $\bar{R}^2$ can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor $(n-1)/(n-k-1)$.
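The behavior of the $R^2$ and $\bar{R}^2$ when a regressor is added can be checked numerically. The sketch below is an illustration under stated assumptions: the data are simulated, the added regressor `irrelevant` is pure noise by construction, and the helper name `r2_and_adjusted` is hypothetical.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for OLS of y on X plus an intercept.

    X is an (n, k) array of regressors without the constant column.
    """
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    ssr = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ssr / tss                               # Equation (6.14)
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss   # Equation (6.15)
    return r2, adj_r2

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
irrelevant = rng.normal(size=n)             # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

r2_one, adj_one = r2_and_adjusted(x1[:, None], y)
r2_two, adj_two = r2_and_adjusted(np.column_stack([x1, irrelevant]), y)

print("one regressor :", round(r2_one, 4), round(adj_one, 4))
print("two regressors:", round(r2_two, 4), round(adj_two, 4))
```

The $R^2$ with two regressors is never below the $R^2$ with one, and each $\bar{R}^2$ lies below its $R^2$. Whether the $\bar{R}^2$ rises or falls when the noise regressor is added depends on the particular sample, which is the second point in the list above.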
Application to Test Scores

Equation (6.12) reports the estimated regression line for the multiple regression relating test scores (TestScore) to the student-teacher ratio (STR) and the percentage of English learners (PctEL). The $R^2$ for this regression line is $R^2 = 0.426$, the adjusted $R^2$ is $\bar{R}^2 = 0.424$, and the standard error of the regression is SER = 14.5.

Comparing these measures of fit with those for the regression in which PctEL is excluded [Equation (6.11)] shows that including PctEL in the regression increased the $R^2$ from 0.051 to 0.426. When the only regressor is STR, only a small fraction of the variation in TestScore is explained; however, when PctEL is added to the regression, more than two-fifths (42.6%) of the variation in test scores is explained. In this sense, including the percentage of English learners substantially improves the fit of the regression. Because $n$ is large and only two regressors appear in Equation (6.12), the difference between $R^2$ and adjusted $R^2$ is very small ($R^2 = 0.426$ versus $\bar{R}^2 = 0.424$).

The SER for the regression excluding PctEL is 18.6; this value falls to 14.5 when PctEL is included as a second regressor. The units of the SER are points on the standardized test. The reduction in the SER tells us that predictions about standardized test scores are substantially more precise if they are made using the regression with both STR and PctEL than if they are made using the regression with only STR as a regressor.

Using the $R^2$ and adjusted $R^2$. The $\bar{R}^2$ is useful because it quantifies the extent to which the regressors account for, or explain, the variation in the dependent variable. Nevertheless, heavy reliance on the $\bar{R}^2$ (or $R^2$) can be a trap. In applications, "maximize the $\bar{R}^2$" is rarely the answer to any economically or statistically meaningful question. Instead, the decision about whether to include a variable in a multiple regression should be based on whether including that variable allows you better to estimate the causal effect of interest. We return to the issue of how to decide which variables to include, and which to exclude, in Chapter 7. First, however, we need to develop methods for quantifying the sampling uncertainty of the OLS estimator. The starting point for doing so is extending the least squares assumptions of Chapter 4 to the case of multiple regressors.

6.5 The Least Squares Assumptions in Multiple Regression

There are four least squares assumptions in the multiple regression model. The first three are those of Section 4.3 for the single-regressor model (Key Concept 4.3), extended to allow for multiple regressors, and these are discussed only briefly. The fourth assumption is new and is discussed in more detail.

Assumption #1: The Conditional Distribution of $u_i$ Given $X_{1i}, X_{2i}, \ldots, X_{ki}$ Has a Mean of Zero

The first assumption is that the conditional distribution of $u_i$ given $X_{1i}, \ldots, X_{ki}$ has a mean of zero. This assumption extends the first least squares assumption with a single regressor to multiple regressors. This assumption means that sometimes $Y_i$ is above the population regression line and sometimes $Y_i$ is below the population regression line, but on average over the population $Y_i$ falls on the population regression line. Therefore, for any value of the regressors, the expected value of $u_i$ is zero. As is the case for regression with a single regressor, this is the key assumption that makes the OLS estimators unbiased. We return to omitted variable bias in multiple regression in Section 7.5.

Assumption #2: $(X_{1i}, X_{2i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$ Are i.i.d.

The second assumption is that $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$ are independently and identically distributed (i.i.d.) random variables. This assumption holds automatically if the data are collected by simple random sampling. The comments on this assumption appearing in Section 4.3 for a single regressor also apply to multiple regressors.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers (that is, observations with values far outside the usual range of the data) are unlikely. This assumption serves as a reminder that, as in the single-regressor case, the OLS estimator of the coefficients in the multiple regression model can be sensitive to large outliers.

The assumption that large outliers are unlikely is made mathematically precise by assuming that $X_{1i}, \ldots, X_{ki}$, and $Y_i$ have nonzero finite fourth moments: $0 < E(X_{1i}^4) < \infty, \ldots, 0 < E(X_{ki}^4) < \infty$ and $0 < E(Y_i^4) < \infty$. Another way to state this assumption is that the dependent variable and regressors have finite kurtosis. This assumption is used to derive the properties of OLS regression statistics in large samples.

Assumption #4: No Perfect Multicollinearity

The fourth assumption is new to the multiple regression model. It rules out an inconvenient situation, called perfect multicollinearity, in which it is impossible to
compute the OLS estimator. The regressors are said to be perfectly multicollinear (or to exhibit perfect multicollinearity) if one of the regressors is a perfect linear function of the other regressors. The fourth least squares assumption is that the regressors are not perfectly multicollinear.

KEY CONCEPT 6.4
THE LEAST SQUARES ASSUMPTIONS IN THE MULTIPLE REGRESSION MODEL

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n$, where

1. $u_i$ has conditional mean zero given $X_{1i}, X_{2i}, \ldots, X_{ki}$; that is, $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$.
2. $(X_{1i}, X_{2i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$ are independently and identically distributed (i.i.d.) draws from their joint distribution.
3. Large outliers are unlikely: $X_{1i}, \ldots, X_{ki}$ and $Y_i$ have nonzero finite fourth moments.
4. There is no perfect multicollinearity.

Why does perfect multicollinearity make it impossible to compute the OLS estimator? Suppose you want to estimate the coefficient on STR in a regression of $TestScore_i$ on $STR_i$ and $PctEL_i$, except that you make a typographical error and accidentally type in $STR_i$ a second time instead of $PctEL_i$; that is, you regress $TestScore_i$ on $STR_i$ and $STR_i$. This is a case of perfect multicollinearity because one of the regressors (the first occurrence of STR) is a perfect linear function of another regressor (the second occurrence of STR). Depending on how your software package handles perfect multicollinearity, if you try to estimate this regression the software will do one of three things: (1) it will drop one of the occurrences of STR; (2) it will refuse to calculate the OLS estimates and give an error message; or (3) it will crash the computer. The mathematical reason for this failure is that perfect multicollinearity produces division by zero in the OLS formulas.

At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense, and OLS cannot estimate this nonsensical partial effect.
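The "STR entered twice" mistake can be detected before any estimation is attempted. The sketch below is an illustration, not a description of how any particular regression package behaves: it frames the problem in matrix terms (a framing deferred to Chapter 18 in the text), the data are simulated, and the helper name `has_perfect_multicollinearity` is hypothetical. The idea is that when one column of the regressor matrix is an exact linear function of the others, the matrix has fewer linearly independent columns than it has columns.

```python
import numpy as np

def has_perfect_multicollinearity(X):
    """True if some column of X is an exact linear combination of the others.

    X is the full regressor matrix, including the constant column.
    """
    return np.linalg.matrix_rank(X) < X.shape[1]

rng = np.random.default_rng(2)
n = 100
const = np.ones(n)
str_ratio = rng.uniform(14, 26, n)   # simulated student-teacher ratios
pct_el = rng.uniform(0, 80, n)       # simulated percentage of English learners

# The typo from the text: STR typed twice instead of STR and PctEL.
X_typo = np.column_stack([const, str_ratio, str_ratio])
# The regression that was intended.
X_intended = np.column_stack([const, str_ratio, pct_el])

print(has_perfect_multicollinearity(X_typo))      # duplicated column
print(has_perfect_multicollinearity(X_intended))  # no exact linear dependence
```

With the duplicated column the cross-product matrix $X'X$ has no inverse, which is the matrix counterpart of the "division by zero" described above.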
The solution to perfect multicollinearity in this hypothetical regression is simply to correct the typo and to replace one of the occurrences of STR with the variable you originally wanted to include. This example is typical: When perfect multicollinearity occurs, it often reflects a logical mistake in choosing the regressors or some previously unrecognized feature of the data set. In general, the solution to perfect multicollinearity is to modify the regressors to eliminate the problem.

Additional examples of perfect multicollinearity are given in Section 6.7, which also defines and discusses imperfect multicollinearity.

The least squares assumptions for the multiple regression model are summarized in Key Concept 6.4.

6.6 The Distribution of the OLS Estimators in Multiple Regression

Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation across possible samples gives rise to the uncertainty associated with the OLS estimators of the population regression coefficients, $\beta_0, \beta_1, \ldots, \beta_k$. Just as in the case of regression with a single regressor, this variation is summarized in the sampling distribution of the OLS estimators.

Recall from Section 4.4 that, under the least squares assumptions, the OLS estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$) are unbiased and consistent estimators of the unknown coefficients ($\beta_0$ and $\beta_1$) in the linear regression model with a single regressor. In addition, in large samples, the sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$ is well approximated by a bivariate normal distribution.

These results carry over to multiple regression analysis. That is, under the least squares assumptions of Key Concept 6.4, the OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are unbiased and consistent estimators of $\beta_0, \beta_1, \ldots, \beta_k$ in the linear multiple regression model. In large samples, the joint sampling distribution of $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ is well approximated by a multivariate normal distribution, which is the extension of the bivariate normal distribution to the general case of two or more jointly normal random variables (Section 2.4).

Although the algebra is more complicated when there are multiple regressors, the central limit theorem applies to the OLS estimators in the multiple regression model for the same reason that it applies to $\bar{Y}$ and to the OLS estimators when there is a single regressor: The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are averages of the randomly sampled data, and if the sample size is sufficiently large the sampling distribution of those averages becomes normal. Because the multivariate normal
distribution is best handled mathematically using matrix algebra, the expressions for the joint distribution of the OLS estimators are deferred to Chapter 18.

Key Concept 6.5 summarizes the result that, in large samples, the distribution of the OLS estimators in multiple regression is approximately jointly normal. In general, the OLS estimators are correlated; this correlation arises from the correlation between the regressors. The joint sampling distribution of the OLS estimators is discussed in more detail for the case that there are two regressors and homoskedastic errors in Appendix 6.2, and the general case is discussed in Section 18.2.

KEY CONCEPT 6.5
LARGE SAMPLE DISTRIBUTION OF $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$

If the least squares assumptions (Key Concept 6.4) hold, then in large samples the OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are jointly normally distributed and each $\hat{\beta}_j$ is distributed $N(\beta_j, \sigma_{\hat{\beta}_j}^2)$, $j = 0, \ldots, k$.

6.7 Multicollinearity

As discussed in Section 6.5, perfect multicollinearity arises when one of the regressors is a perfect linear combination of the other regressors. This section provides some examples of perfect multicollinearity and discusses how perfect multicollinearity can arise, and can be avoided, in regressions with multiple binary regressors. Imperfect multicollinearity arises when one of the regressors is very highly correlated, but not perfectly correlated, with the other regressors. Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. However, it does mean that one or more regression coefficients could be estimated imprecisely.

Examples of Perfect Multicollinearity

We continue the discussion of perfect multicollinearity from Section 6.5 by examining three additional hypothetical regressions. In each, a third regressor is added to the regression of $TestScore_i$ on $STR_i$ and $PctEL_i$ in Equation (6.12).
Example #1: Fraction of English learners. Let $FracEL_i$ be the fraction of English learners in the $i$th district, which varies between 0 and 1. If the variable $FracEL_i$ were included as a third regressor in addition to $STR_i$ and $PctEL_i$, the regressors would be perfectly multicollinear. The reason is that PctEL is the percentage of English learners, so that $PctEL_i = 100 \times FracEL_i$ for every district. Thus one of the regressors ($PctEL_i$) can be written as a perfect linear function of another regressor ($FracEL_i$).

Because of this perfect multicollinearity, it is impossible to compute the OLS estimates of the regression of $TestScore_i$ on $STR_i$, $PctEL_i$, and $FracEL_i$. At an intuitive level, OLS fails because you are asking, What is the effect of a unit change in the percentage of English learners, holding constant the fraction of English learners? Because the percentage of English learners and the fraction of English learners move together in a perfect linear relationship, this question makes no sense and OLS cannot answer it.

Example #2: "Not very small" classes. Let $NVS_i$ be a binary variable that equals 1 if the student-teacher ratio in the $i$th district is "not very small"; specifically, $NVS_i$ equals 1 if $STR_i \geq 12$ and equals 0 otherwise. This regression also exhibits perfect multicollinearity, but for a more subtle reason than the regression in the previous example. There are in fact no districts in our data set with $STR_i < 12$; as you can see in the scatterplot in Figure 4.2, the smallest value of STR is 14. Thus, $NVS_i = 1$ for all observations. Now recall that the linear regression model with an intercept can equivalently be thought of as including a regressor, $X_{0i}$, that equals 1 for all $i$, as is shown in Equation (6.6). Thus we can write $NVS_i = 1 \times X_{0i}$ for all the observations in our data set; that is, $NVS_i$ can be written as a perfect linear combination of the regressors; specifically, it equals $X_{0i}$.

This illustrates two important points about perfect multicollinearity. First, when the regression includes an intercept, then one of the regressors that can be implicated in perfect multicollinearity is the constant regressor $X_{0i}$. Second, perfect multicollinearity is a statement about the data set you have on hand. While it is possible to imagine a school district with fewer than 12 students per teacher, there are no such districts in our data set so we cannot analyze them in our regression.

Example #3: Percentage of English speakers. Let $PctES_i$ be the percentage of "English speakers" in the $i$th district, defined to be the percentage of students who are not English learners. Again the regressors will be perfectly multicollinear. Like the previous example, the perfect linear relationship among the regressors involves the constant regressor $X_{0i}$: For every district, $PctES_i = 100 - PctEL_i = 100 \times X_{0i} - PctEL_i$, because $X_{0i} = 1$ for all $i$.
This example illustrates another point: perfect multicollinearity is a feature of the entire set of regressors. If either the intercept (i.e., the regressor $X_{0i}$) or $PctEL_i$ were excluded from this regression, the regressors would not be perfectly multicollinear.

The dummy variable trap. Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors. For example, suppose you have partitioned the school districts into three categories: rural, suburban, and urban. Each district falls into one (and only one) category. Let these binary variables be $Rural_i$, which equals 1 for a rural district and equals 0 otherwise; $Suburban_i$; and $Urban_i$. If you include all three binary variables in the regression along with a constant, the regressors will be perfectly multicollinear: Because each district belongs to one and only one category, $Rural_i + Suburban_i + Urban_i = 1 = X_{0i}$, where $X_{0i}$ denotes the constant regressor introduced in Equation (6.6). Thus, to estimate the regression, you must exclude one of these four variables, either one of the binary indicators or the constant term. By convention, the constant term is retained, in which case one of the binary indicators is excluded. For example, if $Rural_i$ were excluded, then the coefficient on $Suburban_i$ would be the average difference between test scores in suburban and rural districts, holding constant the other variables in the regression.

In general, if there are $G$ binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all $G$ binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap. The usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only $G - 1$ of the $G$ binary variables are included as regressors. In this case, the coefficients on the included binary variables represent the incremental effect of being in that category, relative to the base case of the omitted category, holding constant the other regressors. Alternatively, all $G$ binary regressors can be included if the intercept is omitted from the regression.

Solutions to perfect multicollinearity. Perfect multicollinearity typically arises when a mistake has been made in specifying the regression. Sometimes the mistake is easy to spot (as in the first example) but sometimes it is not (as in the second example). In one way or another your software will let you know if you make such a mistake because it cannot compute the OLS estimator if you have.

When your software lets you know that you have perfect multicollinearity, it is important that you modify your regression to eliminate it. Some software is unreliable when there is perfect multicollinearity, and at a minimum you will be ceding control over your choice of regressors to your computer if your regressors are perfectly multicollinear.
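The dummy variable trap described above, and the two ways out of it, can be illustrated with a toy data set. Everything in the sketch below is made up for illustration: ten hypothetical districts, their category codes, and their test scores; the helper name `full_column_rank` is also an assumption.

```python
import numpy as np

def full_column_rank(X):
    """True if no column of X is an exact linear combination of the others."""
    return np.linalg.matrix_rank(X) == X.shape[1]

# Hypothetical districts: 0 = rural, 1 = suburban, 2 = urban.
category = np.array([0, 1, 2, 0, 1, 2, 1, 0, 2, 1])
test_score = np.array([650., 660., 640., 655., 670., 645., 665., 648., 650., 661.])

rural = (category == 0).astype(float)
suburban = (category == 1).astype(float)
urban = (category == 2).astype(float)
const = np.ones(len(category))

# The trap: intercept plus all G = 3 binary variables.
trap = np.column_stack([const, rural, suburban, urban])
# Usual fix: keep the intercept and drop one indicator (rural is the base case).
drop_rural = np.column_stack([const, suburban, urban])
# Alternative fix: keep all three indicators and omit the intercept.
no_intercept = np.column_stack([rural, suburban, urban])

print(full_column_rank(trap), full_column_rank(drop_rural), full_column_rank(no_intercept))

# With rural omitted, the intercept is the rural mean and each slope is the
# difference between that category's mean and the rural mean.
coef, *_ = np.linalg.lstsq(drop_rural, test_score, rcond=None)
print(coef)
```

In this toy data set the rural, suburban, and urban means are 651, 664, and 645, so the regression that omits $Rural_i$ returns an intercept of 651 and coefficients of 13 on $Suburban_i$ and $-6$ on $Urban_i$, matching the "relative to the base case" interpretation in the text.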
Imperfect Multicollinearity

Despite its similar name, imperfect multicollinearity is conceptually quite different than perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated. For example, consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage of the district's residents who are first-generation immigrants. First-generation immigrants often speak English as a second language, so the variables PctEL and percentage immigrants will be highly correlated: Districts with many recent immigrants will tend to have many students who are still learning English. Because these two variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase in PctEL, holding constant the percentage immigrants. In other words, the data set provides little information about what happens to test scores when the percentage of English learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.

The effect of imperfect multicollinearity on the variance of the OLS estimators can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which is the variance of $\hat{\beta}_1$ in a multiple regression with two regressors ($X_1$ and $X_2$) for the special case of a homoskedastic error. In this case, the variance of $\hat{\beta}_1$ is inversely proportional to $1 - \rho_{X_1,X_2}^2$, where $\rho_{X_1,X_2}$ is the correlation between $X_1$ and $X_2$. The larger is the correlation between the two regressors, the closer is this term to zero and the larger is the variance of $\hat{\beta}_1$. More generally, when multiple regressors are imperfectly multicollinear, then the coefficients on one or more of these regressors will be imprecisely estimated; that is, they will have a large sampling variance.

Perfect multicollinearity is a problem that often signals the presence of a logical error. In contrast, imperfect multicollinearity is not necessarily an error, but rather just a feature of OLS, your data, and the question you are trying to answer. If the variables in your regression are the ones you meant to include (the ones you chose to address the potential for omitted variable bias), then imperfect multicollinearity implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand.
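To get a feel for how quickly precision deteriorates, the factor $1/(1 - \rho_{X_1,X_2}^2)$ can be tabulated for a few correlations. This is only a sketch of the proportionality stated above for the two-regressor, homoskedastic case; the function name `variance_inflation` is an assumption, and the other terms in Equation (6.17) are deliberately left out.

```python
def variance_inflation(rho):
    """Factor by which var(beta1_hat) is scaled up relative to rho = 0.

    Uses only the proportionality var(beta1_hat) ∝ 1 / (1 - rho^2) from the
    two-regressor, homoskedastic case; rho is the correlation of X1 and X2.
    """
    if not -1.0 < rho < 1.0:
        # |rho| = 1 is perfect multicollinearity: the factor is undefined.
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  ->  variance scaled by {variance_inflation(rho):6.2f}")
```

A correlation of 0.5 raises the variance by a third, while a correlation of 0.9 multiplies it by a little more than five; as the correlation approaches 1 the factor grows without bound, which is the limiting case of perfect multicollinearity.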
6.8 Conclusion

Regression with a single regressor is vulnerable to omitted variable bias: If an omitted variable is a determinant of the dependent variable and is correlated with the regressor, then the OLS estimator of the slope coefficient will be biased and will reflect both the effect of the regressor and the effect of the omitted variable. Multiple regression makes it possible to mitigate omitted variable bias by including the omitted variable in the regression. The coefficient on a regressor, $X_1$, in multiple regression is the partial effect of a change in $X_1$, holding constant the other included regressors. In the test score example, including the percentage of English learners as a regressor made it possible to estimate the effect on test scores of a change in the student-teacher ratio, holding constant the percentage of English learners. Doing so reduced by half the estimated effect on test scores of a change in the student-teacher ratio.

The statistical theory of multiple regression builds on the statistical theory of regression with a single regressor. The least squares assumptions for multiple regression are extensions of the three least squares assumptions for regression with a single regressor, plus a fourth assumption ruling out perfect multicollinearity. Because the regression coefficients are estimated using a single sample, the OLS estimators have a joint sampling distribution and, therefore, have sampling uncertainty. This sampling uncertainty must be quantified as part of an empirical study, and the ways to do so in the multiple regression model are the topic of the next chapter.

Summary

1. Omitted variable bias occurs when an omitted variable (1) is correlated with an included regressor and (2) is a determinant of $Y$.
2. The multiple regression model is a linear regression model that includes multiple regressors, $X_1, X_2, \ldots, X_k$. Associated with each regressor is a regression coefficient, $\beta_1, \beta_2, \ldots, \beta_k$. The coefficient $\beta_1$ is the expected change in $Y$ associated with a one-unit change in $X_1$, holding the other regressors constant. The other regression coefficients have an analogous interpretation.
3. The coefficients in multiple regression can be estimated by OLS. When the four least squares assumptions in Key Concept 6.4 are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples.
4. Perfect multicollinearity, which occurs when one regressor is an exact linear function of the other regressors, usually arises from a mistake in choosing which
regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of regressors.
5. The standard error of the regression, the $R^2$, and the $\bar{R}^2$ are measures of fit for the multiple regression model.

Key Terms

omitted variable bias (187)
multiple regression model (193)
population regression line (193)
population regression function (193)
intercept (193)
slope coefficient of $X_{1i}$ (193)
coefficient on $X_{1i}$ (193)
slope coefficient of $X_{2i}$ (193)
coefficient on $X_{2i}$ (193)
control variable (193)
holding $X_2$ constant (194)
controlling for $X_2$ (194)
partial effect (194)
population multiple regression model (195)
constant regressor (195)
constant term (195)
homoskedastic (195)
heteroskedastic (195)
OLS estimators of $\beta_0, \beta_1, \ldots, \beta_k$ (197)
OLS regression line (197)
predicted value (197)
OLS residual (197)
$R^2$ and adjusted $R^2$ ($\bar{R}^2$) (200, 201)
perfect multicollinearity or to exhibit perfect multicollinearity (204)
dummy variable trap (208)

Review the Concepts

6.1 A researcher is interested in the effect on test scores of computer usage. Using school district data like that used in this chapter, she regresses district average test scores on the number of computers per student. Will $\hat{\beta}_1$ be an unbiased estimator of the effect on test scores of increasing the number of computers per student? Why or why not? If you think $\hat{\beta}_1$ is biased, is it biased up or down? Why?

6.2 A multiple regression includes two regressors: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$. What is the expected change in $Y$ if $X_1$ increases by 3 units and $X_2$ is unchanged? What is the expected change in $Y$ if $X_2$ decreases by 5 units and $X_1$ is unchanged? What is the expected change in $Y$ if $X_1$ increases by 3 units and $X_2$ decreases by 5 units?

6.3 Explain why two perfectly multicollinear regressors cannot be included in a linear multiple regression. Give two examples of a pair of perfectly multicollinear regressors.
6.4 Explain why it is difficult to estimate precisely the partial effect of $X_1$, holding $X_2$ constant, if $X_1$ and $X_2$ are highly correlated.

Exercises

The first four exercises refer to the table of estimated regressions below, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor            (1)       (2)       (3)
College (X1)         5.46      5.48      5.44
Female (X2)         -2.64     -2.62     -2.62
Age (X3)                       0.29      0.29
Northeast (X4)                           0.69
Midwest (X5)                             0.60
South (X6)                              -0.27
Intercept           12.69      4.40      3.75

Summary Statistics
SER                  6.27      6.22      6.21
R^2                  0.176     0.190     0.194
adjusted R^2
n                    4000      4000      4000

6.1 Compute $\bar{R}^2$ for each of the regressions.

6.2 Using the regression results in column (1):
a. Do workers with college degrees earn more, on average, than workers with only high school degrees? How much more?
b. Do men earn more than women on average? How much more?

6.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Explain.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Predict Sally's and Betsy's earnings.

6.4 Using the regression results in column (3):
a. Do there appear to be important regional differences?
b. Why is the regressor West omitted from the regression? What would happen if it was included?
c. Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-old female college graduate from the Midwest. Calculate the expected difference in earnings between Juanita and Jennifer.

6.5 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as "poor." An estimated regression yields

$$\widehat{Price} = 119.2 + 0.485\,BDR + 23.4\,Bath + 0.156\,Hsize + 0.002\,Lsize + 0.090\,Age - 48.8\,Poor, \quad \bar{R}^2 = 0.72, \; SER = 41.5.$$
a. Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house?
b. Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?
c. What is the loss in value if a homeowner lets his house run down so that its condition becomes "poor"?
d. Compute the $R^2$ for the regression.

6.6 A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force.
a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?
b. Use your answer to (a) and the expression for omitted variable bias given in Equation (6.1) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, do you think that $\hat{\beta}_1 > \beta_1$ or $\hat{\beta}_1 < \beta_1$?)

6.7 Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved. Include a discussion of any additional data that need to be collected and the appropriate statistical techniques for analyzing the data.
a. A researcher is interested in determining whether a large aerospace firm is guilty of gender bias in setting wages. To determine potential bias, the researcher collects salary and gender information for all of the firm's engineers. The researcher then plans to conduct a "difference in means" test to determine whether the average salary for women is significantly less than the average salary for men.
b. A researcher is interested in determining whether time spent in prison has a permanent effect on a person's wage rate. He collects data on a random sample of people who have been out of prison for at least fifteen years. He collects similar data on a random sample of people who have never served time in prison. The data set includes information on each person's current wage, education, age, ethnicity, gender, tenure (time in current job), occupation, and union status, as well as whether the person was ever incarcerated. The researcher plans to estimate the effect of incarceration on wages by regressing wages on an indicator variable for incarceration, including in the regression the other potential determinants of wages (education, tenure, union status, and so on).

6.8 A recent study found that the death rate for people who sleep six to seven hours per night is lower than the death rate for people who sleep eight or more hours, and higher than the death rate for people who sleep five or fewer hours. The 1.1 million observations used for this study came from a random survey of Americans aged 30 to 102. Each survey respondent was tracked for four years. The death rate for people sleeping seven hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping seven hours to the total number of survey respondents who slept seven hours. This calculation was then repeated for people sleeping six hours, and so on. Based on this summary, would you recommend that Americans who sleep nine hours per night consider reducing their sleep to six or seven hours if they want to prolong their lives? Why or why not? Explain.

6.9 $(Y_i, X_{1i}, X_{2i})$ satisfy the assumptions in Key Concept 6.4. You are interested in $\beta_1$, the causal effect of $X_1$ on $Y$. Suppose that $X_1$ and $X_2$ are uncorrelated. You estimate $\beta_1$ by regressing $Y$ onto $X_1$ (so that $X_2$ is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.

6.10 $(Y_i, X_{1i}, X_{2i})$ satisfy the assumptions in Key Concept 6.4; in addition, $\mathrm{var}(u_i \mid X_{1i}, X_{2i}) = 4$ and $\mathrm{var}(X_{1i}) = 6$. A random sample of size $n = 400$ is drawn from the population.
a. Assume that $X_1$ and $X_2$ are uncorrelated. Compute the variance of $\hat{\beta}_1$. [Hint: Look at Equation (6.17) in Appendix 6.2.]
b. Assume that $\mathrm{corr}(X_1, X_2) = 0.5$. Compute the variance of $\hat{\beta}_1$.
c. Comment on the following statements: "When $X_1$ and $X_2$ are correlated, the variance of $\hat{\beta}_1$ is larger than it would be if $X_1$ and $X_2$ were uncorrelated. Thus, if you are interested in $\beta_1$, it is best to leave $X_2$ out of the regression if it is correlated with $X_1$."

6.11 (Requires calculus) Consider the regression model $Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ for $i = 1, \ldots, n$. (Notice that there is no constant term in the regression.) Following analysis like that used in Appendix 4.2:
a. Specify the least squares function that is minimized by OLS.
b. Compute the partial derivatives of the objective function with respect to $b_1$ and $b_2$.
c. Suppose $\sum_{i=1}^{n} X_{1i} X_{2i} = 0$. Show that $\hat{\beta}_1 = \sum_{i=1}^{n} X_{1i} Y_i \big/ \sum_{i=1}^{n} X_{1i}^2$.
d. Suppose $\sum_{i=1}^{n} X_{1i} X_{2i} \neq 0$. Derive an expression for $\hat{\beta}_1$ as a function of the data $(Y_i, X_{1i}, X_{2i})$, $i = 1, \ldots, n$.
e. Suppose that the model includes an intercept: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$. Show that the least squares estimators satisfy $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$.

Empirical Exercises

E6.1 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. What is the estimated slope?
b. Run a regression of Course_Eval on Beauty, including some additional variables to control for the type of course and professor characteristics. In particular, include as additional regressors Intro, OneCredit, Female, Minority, and NNEnglish. What is the estimated effect of Beauty on Course_Eval? Does the regression in (a) suffer from important omitted variable bias?
c. Professor Smith is a black male with average beauty and is a native English speaker. He teaches a three-credit upper-division course. Predict Professor Smith's course evaluation.

E6.2 Using the data set CollegeDistance described in Empirical Exercise 4.3, carry out the following exercises.
a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is the estimated slope?
b. Run a regression of ED on Dist, but include some additional regressors to control for characteristics of the student, the student's family, and the local labor market. In particular, include as additional regressors Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, Cue80, and Stwmfg80. What is the estimated effect of Dist on ED?
c. Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?
d. Compare the fit of the regressions in (a) and (b) using the regression standard errors, $R^2$, and $\bar{R}^2$. Why are the $R^2$ and $\bar{R}^2$ so similar in regression (b)?
e. The value of the coefficient on DadColl is positive. What does this coefficient measure?
f. Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients (+ or -) what you would have believed? Interpret the magnitudes of these coefficients.
g. Bob is a black male. His high school was 20 miles from the nearest college. His base-year composite test score (Bytest) was 58. His family income in 1980 was $26,000, and his family owned a home. His mother attended college, but his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing hourly wage was $9.75. Predict Bob's years of completed schooling using the regression in (b).
h. Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college. Predict Jim's years of completed schooling using the regression in (b).

E6.3 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.
a. Construct a table that shows the sample mean, standard deviation, and minimum and maximum values for the series Growth, TradeShare, YearsSchool, Oil, Rev_Coups, Assassinations, and RGDP60. Include the appropriate units for all entries.
b. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. What is the value of the coefficient on Rev_Coups? Interpret the value of this coefficient. Is it large or small in a real-world sense?
c. Use the regression to predict the average annual growth rate for a country that has average values for all regressors.
d. Repeat (c) but now assume that the country's value for TradeShare is one standard deviation above the mean.
e. Why is Oil omitted from the regression? What would happen if it were included?

APPENDIX 6.1 Derivation of Equation (6.1)

This appendix presents a derivation of the formula for omitted variable bias in Equation (6.1). Equation (4.30) in Appendix 4.3 states that

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}. \tag{6.16}$$

Under the last two assumptions in Key Concept 4.3, $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 \xrightarrow{p} \sigma_X^2$ and $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i \xrightarrow{p} \mathrm{cov}(u_i, X_i) = \rho_{Xu}\sigma_u\sigma_X$. Substitution of these limits into Equation (6.16) yields Equation (6.1).

APPENDIX 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors

Although the general formula for the variance of the OLS estimators in multiple regression is complicated, if there are two regressors ($k = 2$) and the errors are homoskedastic, then the formula simplifies enough to provide some insights into the distribution of the OLS estimators.

Because the errors are homoskedastic, the conditional variance of $u_i$ can be written as $\mathrm{var}(u_i \mid X_{1i}, X_{2i}) = \sigma_u^2$. When there are two regressors, $X_{1i}$ and $X_{2i}$, and the error term is homoskedastic, in large samples the sampling distribution of $\hat{\beta}_1$ is $N(\beta_1, \sigma_{\hat{\beta}_1}^2)$, where the variance of this distribution, $\sigma_{\hat{\beta}_1}^2$, is

$$\sigma_{\hat{\beta}_1}^2 = \frac{1}{n}\left(\frac{1}{1 - \rho_{X_1,X_2}^2}\right)\frac{\sigma_u^2}{\sigma_{X_1}^2}, \tag{6.17}$$

where $\rho_{X_1,X_2}$ is the population correlation between the two regressors $X_1$ and $X_2$ and $\sigma_{X_1}^2$ is the population variance of $X_1$.

The variance $\sigma_{\hat{\beta}_1}^2$ of the sampling distribution of $\hat{\beta}_1$ depends on the squared correlation between the regressors. If $X_1$ and $X_2$ are highly correlated, either positively or negatively, then $\rho_{X_1,X_2}^2$ is close to 1, and thus the term $1 - \rho_{X_1,X_2}^2$ in the denominator of Equation (6.17) is small and the variance of $\hat{\beta}_1$ is larger than it would be if $\rho_{X_1,X_2}$ were close to 0.

Another feature of the joint normal large-sample distribution of the OLS estimators is that $\hat{\beta}_1$ and $\hat{\beta}_2$ are in general correlated. When the errors are homoskedastic, the correlation between the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ is the negative of the correlation between the two regressors:

$$\mathrm{corr}(\hat{\beta}_1, \hat{\beta}_2) = -\rho_{X_1,X_2}. \tag{6.18}$$
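As a rough numerical illustration of Equations (6.17) and (6.18), the sketch below evaluates the large-sample variance of $\hat{\beta}_1$ for an uncorrelated and a highly correlated pair of regressors. The function names and the input values ($n = 100$, $\sigma_u^2 = 9$, $\sigma_{X_1}^2 = 4$) are hypothetical and chosen only for illustration; they do not come from the text.

```python
def var_beta1_hat(n, var_u, var_x1, rho_x1x2):
    """Large-sample variance of beta1-hat under homoskedasticity, Equation (6.17).

    n        : sample size
    var_u    : variance of the error term, sigma_u^2
    var_x1   : population variance of X1
    rho_x1x2 : population correlation between X1 and X2 (strictly between -1 and 1)
    """
    if not -1.0 < rho_x1x2 < 1.0:
        # |rho| = 1 is perfect multicollinearity; the formula is undefined there.
        raise ValueError("rho_x1x2 must lie strictly between -1 and 1")
    return (1.0 / n) * (1.0 / (1.0 - rho_x1x2 ** 2)) * (var_u / var_x1)


def corr_beta_hats(rho_x1x2):
    """Correlation between beta1-hat and beta2-hat, Equation (6.18)."""
    return -rho_x1x2


# Hypothetical inputs, for illustration only.
for rho in (0.0, 0.8):
    v = var_beta1_hat(n=100, var_u=9.0, var_x1=4.0, rho_x1x2=rho)
    print(f"rho = {rho:.1f}: var(beta1_hat) = {v:.4f}, "
          f"corr(beta1_hat, beta2_hat) = {corr_beta_hats(rho):.1f}")
```

With these assumed inputs, moving from uncorrelated regressors to a correlation of 0.8 divides the denominator term $1 - \rho_{X_1,X_2}^2$ by roughly three, so the variance rises by the same factor.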
CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression

As discussed in Chapter 6, multiple regression analysis provides a way to mitigate the problem of omitted variable bias by including additional regressors, thereby controlling for the effects of those additional regressors. The coefficients of the multiple regression model can be estimated by OLS. Like all estimators, the OLS estimator has sampling uncertainty because its value differs from one sample to the next.

This chapter presents methods for quantifying the sampling uncertainty of the OLS estimator through the use of standard errors, statistical hypothesis tests, and confidence intervals. One new possibility that arises in multiple regression is a hypothesis that simultaneously involves two or more regression coefficients. The general approach to testing such "joint" hypotheses involves a new test statistic, the F-statistic.

Section 7.1 extends the methods for statistical inference in regression with a single regressor to multiple regression. Sections 7.2 and 7.3 show how to test hypotheses that involve two or more regression coefficients. Section 7.4 extends the notion of confidence intervals for a single coefficient to confidence sets for multiple coefficients. Deciding which variables to include in a regression is an important practical issue, so Section 7.5 discusses ways to approach this problem. In Section 7.6, we apply multiple regression analysis to obtain improved estimates of the effect on test scores of a reduction in the student-teacher ratio using the California test score data set.
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient

This section describes how to compute the standard error, how to test hypotheses, and how to construct confidence intervals for a single coefficient in a multiple regression equation.

Standard Errors for the OLS Estimators

Recall that, in the case of a single regressor, it was possible to estimate the variance of the OLS estimator by substituting sample averages for expectations, which led to the estimator $\hat{\sigma}_{\hat{\beta}_1}^2$ given in Equation (5.4). Under the least squares assumptions, the law of large numbers implies that these sample averages converge to their population counterparts, so for example $\hat{\sigma}_{\hat{\beta}_1}^2 / \sigma_{\hat{\beta}_1}^2 \xrightarrow{p} 1$. The square root of $\hat{\sigma}_{\hat{\beta}_1}^2$ is the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$, an estimator of the standard deviation of the sampling distribution of $\hat{\beta}_1$.

All this extends directly to multiple regression. The OLS estimator $\hat{\beta}_j$ of the $j$th regression coefficient has a standard deviation, and this standard deviation is estimated by its standard error, $SE(\hat{\beta}_j)$. The formula for the standard error is most easily stated using matrices (see Section 18.2). The important point is that, as far as standard errors are concerned, there is nothing conceptually different between the single- and multiple-regressor cases. The key ideas (the large-sample normality of the estimators and the ability to estimate consistently the standard deviation of their sampling distribution) are the same whether one has one, two, or 12 regressors.

Hypothesis Tests for a Single Coefficient

Suppose that you want to test the hypothesis that a change in the student-teacher ratio has no effect on test scores, holding constant the percentage of English learners in the district. This corresponds to hypothesizing that the true coefficient $\beta_1$ on the student-teacher ratio is zero in the population regression of test scores on STR and PctEL. More generally, we might want to test the hypothesis that the true coefficient $\beta_j$ on the $j$th regressor takes on some specific value, $\beta_{j,0}$. The null value $\beta_{j,0}$ comes either from economic theory or, as in the student-teacher ratio example, from the decision-making context of the application. If the alternative hypothesis is two-sided, then the two hypotheses can be written mathematically as

$$H_0: \beta_j = \beta_{j,0} \quad \text{vs.} \quad H_1: \beta_j \neq \beta_{j,0} \quad \text{(two-sided alternative)}. \tag{7.1}$$

For example, if the first regressor is STR, then the null hypothesis that changing the student-teacher ratio has no effect on test scores corresponds to the null hypothesis that $\beta_1 = 0$ (so $\beta_{1,0} = 0$). Our task is to test the null hypothesis $H_0$ against the alternative $H_1$ using a sample of data.

Key Concept 5.2 gives a procedure for testing this null hypothesis when there is a single regressor. The first step in this procedure is to calculate the standard error of the coefficient. The second step is to calculate the t-statistic using the general formula in Key Concept 5.1. The third step is to compute the p-value of the test using the cumulative normal distribution in Appendix Table 1 or, alternatively, to compare the t-statistic to the critical value corresponding to the desired significance level of the test. The theoretical underpinning of this procedure is that the OLS estimator has a large-sample normal distribution which, under the null hypothesis, has as its mean the hypothesized true value, and that the variance of this distribution can be estimated consistently.

This underpinning is present in multiple regression as well. As stated in Key Concept 6.5, the sampling distribution of $\hat{\beta}_j$ is approximately normal. Under the null hypothesis the mean of this distribution is $\beta_{j,0}$. The variance of this distribution can be estimated consistently. Therefore we can simply follow the same procedure as in the single-regressor case to test the null hypothesis in Equation (7.1).

The procedure for testing a hypothesis on a single coefficient in multiple regression is summarized as Key Concept 7.1. The t-statistic actually computed is denoted $t^{act}$ in this Key Concept. However, it is customary to denote this simply as $t$, and we adopt this simplified notation for the rest of the book.

KEY CONCEPT 7.1
TESTING THE HYPOTHESIS $\beta_j = \beta_{j,0}$ AGAINST THE ALTERNATIVE $\beta_j \neq \beta_{j,0}$

1. Compute the standard error of $\hat{\beta}_j$, $SE(\hat{\beta}_j)$.
2. Compute the t-statistic,
$$t = \frac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}. \tag{7.2}$$
3. Compute the p-value,
$$p\text{-value} = 2\Phi(-|t^{act}|), \tag{7.3}$$
where $t^{act}$ is the value of the t-statistic actually computed. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if $|t^{act}| > 1.96$.

The standard error and (typically) the t-statistic and p-value testing $\beta_j = 0$ are computed automatically by regression software.
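The three steps of Key Concept 7.1 can be sketched in a few lines of Python. This is a minimal illustration, not the output of any regression package: the function names are hypothetical, the standard error is taken as given (step 1), and the estimate 0.50 with standard error 0.20 is made up for the example. The standard normal CDF $\Phi$ is computed from the complementary error function in the standard library.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def t_statistic(beta_hat, se, beta_null=0.0):
    """Equation (7.2): (estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / se


def two_sided_p_value(t_act):
    """Equation (7.3): p-value = 2 * Phi(-|t_act|), using the large-sample normal approximation."""
    return 2.0 * normal_cdf(-abs(t_act))


# Hypothetical estimate and standard error, for illustration only.
beta_hat, se = 0.50, 0.20
t_act = t_statistic(beta_hat, se, beta_null=0.0)
p = two_sided_p_value(t_act)

print(f"t = {t_act:.2f}, p-value = {p:.4f}")
print("reject at 5% level:", p < 0.05)          # same decision as the rule below
print("|t| > 1.96:        ", abs(t_act) > 1.96)
```

Both rejection rules in the Key Concept (p-value below 0.05, or $|t^{act}| > 1.96$) lead to the same decision, which the last two lines print side by side.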
Confidence Intervals for a Single Coefficient

The method for constructing a confidence interval in the multiple regression model is also the same as in the single-regressor model. This method is summarized as Key Concept 7.2.

The method for conducting a hypothesis test in Key Concept 7.1 and the method for constructing a confidence interval in Key Concept 7.2 rely on the large-sample normal approximation to the distribution of the OLS estimator $\hat{\beta}_j$. Accordingly, it should be kept in mind that these methods for quantifying the sampling uncertainty are only guaranteed to work in large samples.

KEY CONCEPT 7.2
CONFIDENCE INTERVALS FOR A SINGLE COEFFICIENT IN MULTIPLE REGRESSION

A 95% two-sided confidence interval for the coefficient $\beta_j$ is an interval that contains the true value of $\beta_j$ with a 95% probability; that is, it contains the true value of $\beta_j$ in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of $\beta_j$ that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, the 95% confidence interval is

$$95\%\ \text{confidence interval for}\ \beta_j = \left[\hat{\beta}_j - 1.96\,SE(\hat{\beta}_j),\ \hat{\beta}_j + 1.96\,SE(\hat{\beta}_j)\right]. \tag{7.4}$$

A 90% confidence interval is obtained by replacing 1.96 in Equation (7.4) with 1.645.
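Equation (7.4) is simple enough to compute by hand, but a short sketch makes the two ingredients explicit. The function names below are hypothetical, and the estimate 2.0 with standard error 0.5 is invented for illustration. The second helper rescales an interval for a coefficient into an interval for the predicted effect of changing the regressor by a given amount, which is the arithmetic used in the application that follows in the chapter.

```python
def confidence_interval(beta_hat, se, critical_value=1.96):
    """Large-sample two-sided confidence interval, Equation (7.4).

    critical_value = 1.96 gives a 95% interval; 1.645 gives a 90% interval.
    """
    half_width = critical_value * se
    return (beta_hat - half_width, beta_hat + half_width)


def interval_for_change(ci, delta_x):
    """Interval for the effect on Y of changing X by delta_x, other regressors held constant.

    Multiplies both endpoints by delta_x and re-sorts them, because a
    negative delta_x swaps the lower and upper endpoints.
    """
    lo, hi = ci[0] * delta_x, ci[1] * delta_x
    return (min(lo, hi), max(lo, hi))


# Hypothetical estimate and standard error, for illustration only.
ci95 = confidence_interval(2.0, 0.5)          # 1.96 critical value
ci90 = confidence_interval(2.0, 0.5, 1.645)   # narrower 90% interval
print("95% CI:", tuple(round(x, 4) for x in ci95))
print("90% CI:", tuple(round(x, 4) for x in ci90))
print("95% CI for a 3-unit change in X:",
      tuple(round(x, 4) for x in interval_for_change(ci95, 3.0)))
```

The 90% interval is narrower than the 95% interval because it uses the smaller critical value, at the cost of lower coverage.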
Application to Test Scores and the Student-Teacher Ratio

Can we reject the null hypothesis that a change in the student-teacher ratio has no effect on test scores, once we control for the percentage of English learners in the district? What is a 95% confidence interval for the effect on test scores of a change in the student-teacher ratio, controlling for the percentage of English learners? We are now able to find out. The regression of test scores against STR and PctEL, estimated by OLS, was given in Equation (6.12) and is restated here with standard errors in parentheses below the coefficients:

$$\widehat{TestScore} = \underset{(8.7)}{686.0} - \underset{(0.43)}{1.10} \times STR - \underset{(0.031)}{0.650} \times PctEL. \tag{7.5}$$

To test the hypothesis that the true coefficient on STR is 0, we first need to compute the t-statistic in Equation (7.2). Because the null hypothesis says that the true value of this coefficient is zero, the t-statistic is $t = (-1.10 - 0)/0.43 = -2.54$. The associated p-value is $2\Phi(-2.54) = 1.1\%$; that is, the smallest significance level at which we can reject the null hypothesis is 1.1%. Because the p-value is less than 5%, the null hypothesis can be rejected at the 5% significance level (but not quite at the 1% significance level).

A 95% confidence interval for the population coefficient on STR is $-1.10 \pm 1.96 \times 0.43 = (-1.95, -0.26)$; that is, we can be 95% confident that the true value of the coefficient is between $-1.95$ and $-0.26$. Interpreted in the context of the superintendent's interest in decreasing the student-teacher ratio by 2, the 95% confidence interval for the effect on test scores of this reduction is $(-1.95 \times 2, -0.26 \times 2) = (-3.90, -0.52)$.

Adding expenditures per pupil to the equation. Your analysis of the multiple regression in Equation (7.5) has persuaded the superintendent that, based on the evidence so far, reducing class size will help test scores in her district. Now, however, she moves on to a more nuanced question. If she is to hire more teachers, she can pay for those teachers either through cuts elsewhere in the budget (no new computers, reduced maintenance, and so on), or by asking for an increase in her budget, which taxpayers do not favor. What, she asks, is the effect on test scores of reducing the student-teacher ratio, holding expenditures per pupil (and the percentage of English learners) constant?

This question can be addressed by estimating a regression of test scores on the student-teacher ratio, total spending per pupil, and the percentage of English learners. The OLS regression line is

$$\widehat{TestScore} = \underset{(15.5)}{649.6} - \underset{(0.48)}{0.29} \times STR + \underset{(1.59)}{3.87} \times Expn - \underset{(0.032)}{0.656} \times PctEL, \tag{7.6}$$

where Expn is total annual expenditures per pupil in the district in thousands of dollars.

The result is striking. Holding expenditures per pupil and the percentage of English learners constant, changing the student-teacher ratio is estimated to have a very small effect on test scores: The estimated coefficient on STR is $-1.10$ in Equation (7.5) but, after adding Expn as a regressor in Equation (7.6), it is only $-0.29$. Moreover, the t-statistic for testing that the true value of the coefficient is zero is now $t = (-0.29 - 0)/0.48 = -0.60$, so the hypothesis that the population value of this coefficient is indeed zero cannot be rejected even at the 10% significance level ($|-0.60| < 1.645$). Thus Equation (7.6) provides no evidence that hiring more teachers improves test scores if overall expenditures per pupil are held constant.

One interpretation of the regression in Equation (7.6) is that, in these California data, school administrators allocate their budgets efficiently. Suppose, counterfactually, that the coefficient on STR in Equation (7.6) were negative and large. If so, school districts could raise their test scores simply by decreasing funding for other purposes (textbooks, technology, sports, and so on) and transferring those funds to hire more teachers, thereby reducing class sizes while holding expenditures constant. However, the small and statistically insignificant coefficient on STR in Equation (7.6) indicates that this transfer would have little effect on test scores. Put differently, districts are already allocating their funds efficiently.

Note that the standard error on STR increased when Expn was added, from 0.43 in Equation (7.5) to 0.48 in Equation (7.6). This illustrates the general point, introduced in Section 6.7 in the context of imperfect multicollinearity, that correlation between regressors (the correlation between STR and Expn is $-0.62$) can make the OLS estimators less precise.

What about our angry taxpayer? He asserts that the population values of both the coefficient on the student-teacher ratio ($\beta_1$) and the coefficient on spending per pupil ($\beta_2$) are zero; that is, he hypothesizes that both $\beta_1 = 0$ and $\beta_2 = 0$. Although it might seem that we can reject this hypothesis because the t-statistic testing $\beta_2 = 0$ in Equation (7.6) is $t = 3.87/1.59 = 2.43$, this reasoning is flawed. The taxpayer's hypothesis is a joint hypothesis, and to test it we need a new tool, the F-statistic.

7.2 Tests of Joint Hypotheses

This section describes how to formulate joint hypotheses on multiple regression coefficients and how to test them using an F-statistic.

Testing Hypotheses on Two or More Coefficients

Joint null hypotheses. Consider the regression in Equation (7.6) of the test score against the student-teacher ratio, expenditures per pupil, and the percentage of English learners. Our angry taxpayer hypothesizes that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores, once we control for the percentage of English learners. Because STR is the first regressor in Equation (7.6) and Expn is the second, we can write this hypothesis mathematically as

$$H_0: \beta_1 = 0 \ \text{and}\ \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \ \text{and/or}\ \beta_2 \neq 0. \tag{7.7}$$

The hypothesis that both the coefficient on the student-teacher ratio ($\beta_1$) and the coefficient on expenditures per pupil ($\beta_2$) are zero is an example of a joint hypothesis on the coefficients in the multiple regression model. In this case, the null hypothesis restricts the value of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in Equation (7.7) imposes two restrictions on the multiple regression model: $\beta_1 = 0$ and $\beta_2 = 0$.

In general, a joint hypothesis is a hypothesis that imposes two or more restrictions on the regression coefficients. We consider joint null and alternative hypotheses of the form

$$H_0: \beta_j = \beta_{j,0},\ \beta_m = \beta_{m,0},\ \ldots,\ \text{for a total of } q \text{ restrictions, vs.} \quad H_1: \text{one or more of the } q \text{ restrictions under } H_0 \text{ does not hold}, \tag{7.8}$$

where $\beta_j, \beta_m, \ldots$ refer to different regression coefficients, and $\beta_{j,0}, \beta_{m,0}, \ldots$ refer to the values of these coefficients under the null hypothesis. The null hypothesis in Equation (7.7) is an example of Equation (7.8). Another example is that, in a regression with $k = 6$ regressors, the null hypothesis is that the coefficients on the 2nd, 4th, and 5th regressors are zero; that is, $\beta_2 = 0$, $\beta_4 = 0$, and $\beta_5 = 0$, so that there are $q = 3$ restrictions. In general, under the null hypothesis $H_0$ there are $q$ such restrictions.

If any one (or more than one) of the equalities under the null hypothesis $H_0$ in Equation (7.8) is false, then the joint null hypothesis itself is false. Thus, the alternative hypothesis is that at least one of the equalities in the null hypothesis $H_0$ does not hold.

Why can't I just test the individual coefficients one at a time? Although it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this approach is unreliable. Specifically, suppose that you are interested in testing the joint null hypothesis in Equation (7.7) that $\beta_1 = 0$ and $\beta_2 = 0$. Let $t_1$ be the t-statistic for testing the null hypothesis that $\beta_1 = 0$, and let $t_2$ be the t-statistic for testing the null hypothesis that $\beta_2 = 0$. What happens when you use the "one at a time" testing procedure: Reject the joint null hypothesis if either $t_1$ or $t_2$ exceeds 1.96 in absolute value?
so under Ihe joint null hypot hesis the .omplicated .he two tSla ._stali~· . tha t thl! :..96. So the proba bility of reject ing the null hypot hes is when it is true is 1 .96) X PrO'2! !S \.. The null is nOI rejected only if both I'tls 1.As mentioned in Section 6.statistics ' l and have a b ivari.a new approach is needed .lIe normal distribu tion . arc.2 :cgre~_ Test1 01 Joint Hypotheses 227 math_ (7.952 :c.11 T he F Statistic Th e F statisti c is used to test joint hypothesis about regressJon coefficients.lhat i ~ ils rejection ra te unde r the null hypothesis docs not equa l the desired significance leveJ. Pr( lld s 1.: 0.t ing tnt the I_~ta c for I~~t lit a till). d~ 1.s di sadvantage is that it can have low powe r. called t he Boo (erron.".s H~ Because Ihis question involves the IWO ra ndom variabkl) II and 12. re fer pothesi<i thilt. When the joint nuU hypothesis has the two restrictions thnt PI = 0 and tisl ics I) and 12 using the formula pz'" O..hhou~h 11 .. TIlis "one at a time " method rejects the null too often because it gi ves you too many chances: If you fail to rejeci using the first {statis tic . ih '2 . . method. If th e regressors arc corrda ted. The Fstatistic with q = 2 restrictions. Fortunate ly. What is In e size of the "one at a lime"testing proced ure: that is.7) I) (mel a joint sc. We fi rst d iscuss the case o f twO restrictio ns. Because the Istat istics arc indcpendent.96) = 0. ·lb c liize of the "one at a time " procedure depends o n the value o f the correla tion betwee n the regressors.0. the n turn to the general case o f q restrictions. you get to try again using the second.96. lllC ad va nl:lgc of the Bonferroni method is that it appHes very ge nerally.95 2 = 9.7.6.:!. answeri ng il requires characterizing the joint sampling distribution of'l and .e.9025 = 90. . Firs t consider t he special case in which th e (statist ics are uncorrela ted and thus arc independent. is d ~s cr ibed in Append ix 7. 
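The 9.75% figure for independent t-statistics can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, the exact formula assumes independent t-statistics, and the simulation draws them as independent standard normals, which is the large-sample approximation used above.

```python
import random

def one_at_a_time_size(q=2, level=0.05):
    """Exact size of the one-at-a-time procedure for q independent t-statistics."""
    return 1.0 - (1.0 - level) ** q

def simulated_size(n_draws=100_000, crit=1.96, seed=0):
    """Monte Carlo rejection rate under the null with two independent N(0,1) t-statistics."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_draws):
        t1 = rng.gauss(0.0, 1.0)
        t2 = rng.gauss(0.0, 1.0)
        # Reject the joint null if either t-statistic exceeds the critical value.
        if abs(t1) > crit or abs(t2) > crit:
            rejections += 1
    return rejections / n_draws

if __name__ == "__main__":
    print(round(one_at_a_time_size(), 4))  # 0.0975
    print(simulated_size())                # close to 0.0975, up to simulation noise
```

With more restrictions the problem gets worse: under the same independence assumption, `one_at_a_time_size(q=3)` is about 14%.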
One approach is to modify the "one at a time" method so that it uses different critical values that ensure that its size equals its significance level. This method, called the Bonferroni method, is described in Appendix 7.1. The advantage of the Bonferroni method is that it applies very generally. Its disadvantage is that it can have low power; it frequently fails to reject the null hypothesis when in fact the alternative hypothesis is true.

Fortunately, there is another approach to testing joint hypotheses that is more powerful, especially when the regressors are highly correlated. That approach is based on the F-statistic.

The F-Statistic

The F-statistic is used to test joint hypotheses about regression coefficients. The formulas for the F-statistic are integrated into modern regression software. We first discuss the case of two restrictions, then turn to the general case of q restrictions.

The F-statistic with q = 2 restrictions. When the joint null hypothesis has the two restrictions that $\beta_1 = 0$ and $\beta_2 = 0$, the F-statistic combines the two t-statistics $t_1$ and $t_2$ using the formula
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right), \tag{7.9}$$

where $\hat{\rho}_{t_1,t_2}$ is an estimator of the correlation between the two t-statistics.

To understand the F-statistic in Equation (7.9), first suppose that we know that the t-statistics are uncorrelated so we can drop the terms involving $\hat{\rho}_{t_1,t_2}$. If so, Equation (7.9) simplifies and $F = \frac{1}{2}(t_1^2 + t_2^2)$; that is, the F-statistic is the average of the squared t-statistics. Under the null hypothesis, $t_1$ and $t_2$ are independent standard normal random variables (because the t-statistics are uncorrelated by assumption), so under the null hypothesis F has an $F_{2,\infty}$ distribution (Section 2.4). Under the alternative hypothesis that either $\beta_1$ is nonzero or $\beta_2$ is nonzero (or both), then either $t_1^2$ or $t_2^2$ (or both) will be large, leading the test to reject the null hypothesis.

In general the t-statistics are correlated, and the formula for the F-statistic in Equation (7.9) adjusts for this correlation. This adjustment is made so that, under the null hypothesis, the F-statistic has an $F_{2,\infty}$ distribution in large samples whether or not the t-statistics are correlated.

The F-statistic with q restrictions. The formula for the heteroskedasticity-robust F-statistic testing the q restrictions of the joint null hypothesis in Equation (7.8) is given in Section 18.3. This formula is incorporated into regression software, making the F-statistic easy to compute in practice.

Under the null hypothesis, the F-statistic has a sampling distribution that, in large samples, is given by the $F_{q,\infty}$ distribution. That is, in large samples, under the null hypothesis

$$\text{the F-statistic is distributed } F_{q,\infty}. \tag{7.10}$$

Thus the critical values for the F-statistic can be obtained from the tables of the $F_{q,\infty}$ distribution in Appendix Table 4 for the appropriate value of q and the desired significance level.

Computing the heteroskedasticity-robust F-statistic in statistical software. If the F-statistic is computed using the general heteroskedasticity-robust formula, its large-n distribution under the null hypothesis is $F_{q,\infty}$ regardless of whether the errors are homoskedastic or heteroskedastic. As discussed in Section 5.4, for historical reasons most statistical software computes homoskedasticity-only standard errors by default. Consequently, in some software packages you must select a "robust" option so that the F-statistic is computed using heteroskedasticity-robust standard errors (and, more generally, a heteroskedasticity-robust estimate of the "covariance matrix"). The homoskedasticity-only version of the F-statistic is discussed at the end of this section.
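Equation (7.9) is simple to evaluate once the two t-statistics and their estimated correlation are in hand. The sketch below is illustrative only: the function name and the input values are hypothetical, and the critical value 3.00 is the 5% critical value of the $F_{2,\infty}$ distribution.

```python
def f_stat_two_restrictions(t1, t2, rho):
    """F-statistic of Equation (7.9) for q = 2 restrictions.

    t1, t2 : t-statistics for the two individual restrictions
    rho    : estimated correlation between the two t-statistics
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 0.5 * (t1**2 + t2**2 - 2.0 * rho * t1 * t2) / (1.0 - rho**2)

CRIT_5PCT_Q2 = 3.00  # 5% critical value of the F(2, infinity) distribution

if __name__ == "__main__":
    # Hypothetical t-statistics; neither exceeds 1.96 on its own.
    t1, t2 = 1.9, 1.9
    for rho in (0.0, 0.5, -0.5):
        f = f_stat_two_restrictions(t1, t2, rho)
        print(rho, round(f, 3), "reject" if f > CRIT_5PCT_Q2 else "do not reject")
```

With uncorrelated t-statistics the function reduces to the average of the squared t-statistics, as described above. The example also shows why the correlation matters: the same pair of t-statistics can lead to different conclusions depending on the value of `rho`.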
Computing the p-value using the F-statistic. The p-value of the F-statistic can be computed using the large-sample $F_{q,\infty}$ approximation to its distribution. Let $F^{act}$ denote the value of the F-statistic actually computed. Because the F-statistic has a large-sample $F_{q,\infty}$ distribution under the null hypothesis, the p-value is

$$p\text{-value} = \Pr[F_{q,\infty} > F^{act}]. \tag{7.11}$$

The p-value in Equation (7.11) can be evaluated using a table of the $F_{q,\infty}$ distribution (or, alternatively, a table of the $\chi^2_q$ distribution, because a $\chi^2_q$-distributed random variable is q times an $F_{q,\infty}$-distributed random variable). Alternatively, the p-value can be evaluated using a computer, because formulas for the cumulative chi-squared and F distributions have been incorporated into most modern statistical software.

The "overall" regression F-statistic. The "overall" regression F-statistic tests the joint hypothesis that all the slope coefficients are zero. That is, the null and alternative hypotheses are

$$H_0: \beta_1 = 0,\ \beta_2 = 0,\ \ldots,\ \beta_k = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0, \text{ at least one } j,\ j = 1, \ldots, k. \tag{7.12}$$

Under this null hypothesis, none of the regressors explains any of the variation in $Y_i$, although the intercept (which under the null hypothesis is the mean of $Y_i$) can be nonzero. The null hypothesis in Equation (7.12) is a special case of the general null hypothesis in Equation (7.8), and the overall regression F-statistic is the F-statistic computed for the null hypothesis in Equation (7.12). In large samples, the overall regression F-statistic has an $F_{k,\infty}$ distribution when the null hypothesis is true.

The F-statistic when q = 1. When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic.
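The p-value in Equation (7.11) can be computed without tables by using the chi-squared relationship noted above, $\Pr[F_{q,\infty} > F^{act}] = \Pr[\chi^2_q > q F^{act}]$. The sketch below uses only the Python standard library; the function names are illustrative, and the chi-squared tail probability is computed with the standard recurrence for the regularized upper incomplete gamma function rather than with any particular statistics package.

```python
import math

def chi2_sf(x, df):
    """Pr[chi-squared with df degrees of freedom > x], for a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y**a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def f_pvalue_large_sample(f_act, q):
    """Large-sample p-value of Equation (7.11) for an F-statistic with q restrictions."""
    return chi2_sf(q * f_act, q)

if __name__ == "__main__":
    # At the 5% critical value of F(2, infinity), the p-value should be about 0.05.
    print(round(f_pvalue_large_sample(3.00, 2), 3))
    # With q = 1 the F-statistic is the squared t-statistic, so F = 1.96**2 gives about 0.05.
    print(round(f_pvalue_large_sample(1.96**2, 1), 3))
```

In practice a statistics library would normally be used for the tail probability; the hand-rolled version here is only meant to make the relationship between the two distributions explicit.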
Application to Test Scores and the Student-Teacher Ratio

We are now able to test the null hypothesis that the coefficients on both the student-teacher ratio and expenditures per pupil are zero, against the alternative that at least one coefficient is nonzero, controlling for the percentage of English learners in the district.

To test this hypothesis, we need to compute the heteroskedasticity-robust F-statistic of the test that $\beta_1 = 0$ and $\beta_2 = 0$ using the regression of TestScore on STR, Expn, and PctEL reported in Equation (7.6). This F-statistic is 5.43. Under the null hypothesis, in large samples this statistic has an $F_{2,\infty}$ distribution. The 5% critical value of the $F_{2,\infty}$ distribution is 3.00 (Appendix Table 4), and the 1% critical value is 4.61. The value of the F-statistic computed from the data, 5.43, exceeds 4.61, so the null hypothesis is rejected at the 1% level. It is very unlikely that we would have drawn a sample that produced an F-statistic as large as 5.43 if the null hypothesis really were true (the p-value is 0.005). Based on the evidence in Equation (7.6) as summarized in this F-statistic, we can reject the taxpayer's hypothesis that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores (holding constant the percentage of English learners).

The Homoskedasticity-Only F-Statistic

One way to restate the question addressed by the F-statistic is to ask whether relaxing the q restrictions that constitute the null hypothesis improves the fit of the regression by enough that this improvement is unlikely to be the result merely of random sampling variation if the null hypothesis is true. This restatement suggests that there is a link between the F-statistic and the regression $R^2$: A large F-statistic should, it seems, be associated with a substantial increase in the $R^2$. In fact, if the error $u_i$ is homoskedastic, this intuition has an exact mathematical expression. That is, if the error term is homoskedastic, the F-statistic can be written in terms of the improvement in the fit of the regression as measured either by the sum of squared residuals or by the regression $R^2$. The resulting F-statistic is referred to as the homoskedasticity-only F-statistic, because it is valid only if the error term is homoskedastic. In contrast, the heteroskedasticity-robust F-statistic computed using the formula in Section 18.3 is valid whether the error term is homoskedastic or heteroskedastic. Despite this significant limitation of the homoskedasticity-only F-statistic, its simple formula sheds light on what the F-statistic is doing. In addition, the simple formula can be computed using standard regression output, such as might be reported in a table that includes regression $R^2$'s but not F-statistics.

The homoskedasticity-only F-statistic is computed using a simple formula based on the sum of squared residuals from two regressions. In the first regression, called the restricted regression, the null hypothesis is forced to be true. When the null hypothesis is of the type in Equation (7.8), where all the hypothesized values are zero, the restricted regression is the regression in which those coefficients are set to zero, that is, the relevant regressors are excluded from the regression. In the second regression, called the unrestricted regression, the alternative hypothesis is allowed to be true. If the sum of squared residuals is sufficiently smaller in the unrestricted than the restricted regression, then the test rejects the null hypothesis.
The homoskedasticity-only F-statistic is given by the formula

$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}, \tag{7.13}$$

where $SSR_{restricted}$ is the sum of squared residuals from the restricted regression, $SSR_{unrestricted}$ is the sum of squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis, and $k_{unrestricted}$ is the number of regressors in the unrestricted regression. An alternative equivalent formula for the homoskedasticity-only F-statistic is based on the $R^2$ of the two regressions:

$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}. \tag{7.14}$$

If the errors are homoskedastic, then the difference between the homoskedasticity-only F-statistic computed using Equation (7.13) or (7.14) and the heteroskedasticity-robust F-statistic vanishes as the sample size n increases. Thus, if the errors are homoskedastic, the sampling distribution of the rule-of-thumb F-statistic under the null hypothesis is, in large samples, $F_{q,\infty}$.

These rule-of-thumb formulas are easy to compute and have an intuitive interpretation in terms of how well the unrestricted and restricted regressions fit the data. Unfortunately, they are valid only if the errors are homoskedastic. Because homoskedasticity is a special case that cannot be counted on in applications with economic data, or more generally with data sets typically found in the social sciences, in practice the homoskedasticity-only F-statistic is not a satisfactory substitute for the heteroskedasticity-robust F-statistic.

Using the homoskedasticity-only F-statistic when n is small. If the errors are homoskedastic and are i.i.d. normally distributed, then the homoskedasticity-only F-statistic defined in Equations (7.13) and (7.14) has an $F_{q,\,n-k_{unrestricted}-1}$ distribution under the null hypothesis. Critical values for this distribution, which depend on both q and $n - k_{unrestricted} - 1$, are given in Appendix Table 5. As discussed in Section 2.4, the $F_{q,\,n-k_{unrestricted}-1}$ distribution converges to the $F_{q,\infty}$ distribution as n increases. For large sample sizes, the differences between the two distributions are negligible. For small samples, however, the two sets of critical values differ.

Application to Test Scores and the Student-Teacher Ratio

To test the null hypothesis that the population coefficients on STR and Expn are 0, controlling for PctEL, we need to compute the SSR (or $R^2$) for the restricted and unrestricted regression. The unrestricted regression has the regressors STR, Expn, and PctEL, and is given in Equation (7.6); its $R^2$ is 0.4366; that is, $R^2_{unrestricted} = 0.4366$.
The restricted regression imposes the joint null hypothesis that the true coefficients on STR and Expn are zero; that is, under the null hypothesis STR and Expn do not enter the population regression, although PctEL does (the null hypothesis does not restrict the coefficient on PctEL). The restricted regression, estimated by OLS, is

$$\widehat{TestScore} = \underset{(1.0)}{664.7} - \underset{(0.032)}{0.671} \times PctEL, \quad R^2 = 0.4149, \tag{7.15}$$

so $R^2_{restricted} = 0.4149$. The number of restrictions is q = 2, the number of observations is n = 420, and the number of regressors in the unrestricted regression is k = 3. The homoskedasticity-only F-statistic, computed using Equation (7.14), is

$$F = \frac{(0.4366 - 0.4149)/2}{(1 - 0.4366)/(420 - 3 - 1)} = 8.01.$$

Because 8.01 exceeds the 1% critical value of 4.61, the hypothesis is rejected at the 1% level using this rule-of-thumb approach.

This example illustrates the advantages and disadvantages of the homoskedasticity-only F-statistic. Its advantage is that it can be computed using a calculator. Its disadvantage is that the values of the homoskedasticity-only and heteroskedasticity-robust F-statistics can be very different: The heteroskedasticity-robust F-statistic testing this joint hypothesis is 5.43, quite different from the less reliable homoskedasticity-only rule-of-thumb value of 8.01.
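The calculator arithmetic behind the value 8.01 can be reproduced in a few lines. This is a minimal sketch of Equation (7.14); the function and argument names are illustrative, and the inputs are the values reported for the test score application (unrestricted $R^2$ = 0.4366, restricted $R^2$ = 0.4149, q = 2, n = 420, k = 3).

```python
def f_homoskedasticity_only(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from the R-squared form, Equation (7.14)."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

if __name__ == "__main__":
    f = f_homoskedasticity_only(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
    print(round(f, 2))  # 8.01
```

The result is valid as a test statistic only under homoskedasticity; the heteroskedasticity-robust value for the same hypothesis is 5.43.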
7.3 Testing Single Restrictions Involving Multiple Coefficients

Sometimes economic theory suggests a single restriction that involves two or more regression coefficients. For example, theory might suggest a null hypothesis of the form $\beta_1 = \beta_2$; that is, the effects of the first and second regressor are the same. In this case, the task is to test this null hypothesis against the alternative that the two coefficients differ:

$$H_0: \beta_1 = \beta_2 \quad \text{vs.} \quad H_1: \beta_1 \neq \beta_2. \tag{7.16}$$

This null hypothesis has a single restriction, so q = 1, but that restriction involves multiple coefficients ($\beta_1$ and $\beta_2$). We need to modify the methods presented so far to test this hypothesis. There are two approaches; which one will be easiest depends on your software.

Approach #1: Test the restriction directly. Some statistical packages have a specialized command designed to test restrictions like Equation (7.16), and the result is an F-statistic that, because q = 1, has an $F_{1,\infty}$ distribution under the null hypothesis. (Recall from Section 2.4 that the square of a standard normal random variable has an $F_{1,\infty}$ distribution, so the 95% percentile of the $F_{1,\infty}$ distribution is $1.96^2 = 3.84$.)

Approach #2: Transform the regression. If your statistical package cannot test the restriction directly, the hypothesis in Equation (7.16) can be tested using a trick in which the original regression equation is rewritten to turn the restriction in Equation (7.16) into a restriction on a single regression coefficient. To be concrete, suppose there are only two regressors, $X_{1i}$ and $X_{2i}$, in the regression, so the population regression has the form

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i. \tag{7.17}$$

Here is the trick: By subtracting and adding $\beta_2 X_{1i}$, we have that $\beta_1 X_{1i} + \beta_2 X_{2i} = \beta_1 X_{1i} - \beta_2 X_{1i} + \beta_2 X_{1i} + \beta_2 X_{2i} = (\beta_1 - \beta_2) X_{1i} + \beta_2 (X_{1i} + X_{2i}) = \gamma_1 X_{1i} + \beta_2 W_i$, where $\gamma_1 = \beta_1 - \beta_2$ and $W_i = X_{1i} + X_{2i}$. Thus, the population regression in Equation (7.17) can be rewritten as

$$Y_i = \beta_0 + \gamma_1 X_{1i} + \beta_2 W_i + u_i. \tag{7.18}$$

Because the coefficient $\gamma_1$ in this equation is $\gamma_1 = \beta_1 - \beta_2$, under the null hypothesis in Equation (7.16), $\gamma_1 = 0$, while under the alternative, $\gamma_1 \neq 0$. Thus, by turning Equation (7.17) into Equation (7.18), we have turned a restriction on two regression coefficients into a restriction on a single regression coefficient.

Because the restriction now involves the single coefficient $\gamma_1$, the null hypothesis in Equation (7.16) can be tested using the t-statistic method of Section 7.1. In practice, this is done by first constructing the new regressor $W_i$ as the sum of the two original regressors, then estimating the regression of $Y_i$ on $X_{1i}$ and $W_i$. A 95% confidence interval for the difference in the coefficients $\beta_1 - \beta_2$ can be calculated as $\hat{\gamma}_1 \pm 1.96\,SE(\hat{\gamma}_1)$.

This method can be extended to other restrictions on regression equations using the same trick (see Exercise 7.9).

The two methods (Approaches #1 and #2) are equivalent, in the sense that the F-statistic from the first method equals the square of the t-statistic from the second method.
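The algebra behind Approach #2 can be checked numerically. The sketch below is an illustration under stated assumptions: all coefficient values, data points, and function names are hypothetical. It constructs $W_i = X_{1i} + X_{2i}$, confirms that Equations (7.17) and (7.18) give the same regression function when $\gamma_1 = \beta_1 - \beta_2$, and forms the 95% confidence interval for $\beta_1 - \beta_2$ from an estimate and its standard error. It does not estimate the regression itself.

```python
def make_w(x1, x2):
    """New regressor W_i = X_1i + X_2i used in Equation (7.18)."""
    return [a + b for a, b in zip(x1, x2)]

def confidence_interval_95(gamma1_hat, se_gamma1):
    """95% confidence interval for beta_1 - beta_2: gamma1_hat +/- 1.96 * SE."""
    return gamma1_hat - 1.96 * se_gamma1, gamma1_hat + 1.96 * se_gamma1

if __name__ == "__main__":
    # Hypothetical population coefficients and regressor values.
    beta0, beta1, beta2 = 1.0, 0.8, 0.5
    x1 = [1.0, 2.0, 3.0, 4.0]
    x2 = [0.5, -1.0, 2.0, 0.0]

    gamma1 = beta1 - beta2
    w = make_w(x1, x2)

    original = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]   # (7.17)
    rewritten = [beta0 + gamma1 * a + beta2 * c for a, c in zip(x1, w)]  # (7.18)
    print(all(abs(u - v) < 1e-12 for u, v in zip(original, rewritten)))

    # Hypothetical estimate and standard error from the transformed regression.
    print(confidence_interval_95(0.3, 0.1))
```

In applied work the regression of $Y_i$ on $X_{1i}$ and $W_i$ would be estimated with regression software, and the t-statistic on $X_{1i}$ would then test the restriction.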
Extension to q > 1. In general it is possible to have q restrictions under the null hypothesis in which some or all of these restrictions involve multiple coefficients. The F-statistic of Section 7.2 extends to this type of joint hypothesis. The F-statistic can be computed by either of the two methods just discussed for q = 1. Precisely how best to do this in practice depends on the specific regression software being used.

7.4 Confidence Sets for Multiple Coefficients

This section explains how to construct a confidence set for two or more regression coefficients. The method is conceptually similar to the method in Section 7.1 for constructing a confidence set for a single coefficient using the t-statistic, except that the confidence set for multiple coefficients is based on the F-statistic.

A 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples. Thus, a confidence set is the generalization to two or more coefficients of a confidence interval for a single coefficient.

Recall that a 95% confidence interval is computed by finding the set of values of the coefficients that are not rejected using a t-statistic at the 5% significance level. This approach can be extended to the case of multiple coefficients. To make this concrete, suppose you are interested in constructing a confidence set for two coefficients, $\beta_1$ and $\beta_2$. Section 7.2 showed how to use the F-statistic to test a joint null hypothesis that $\beta_1 = \beta_{1,0}$ and $\beta_2 = \beta_{2,0}$. Suppose you were to test every possible value of $\beta_{1,0}$ and $\beta_{2,0}$ at the 5% level. For each pair of candidates $(\beta_{1,0}, \beta_{2,0})$, you construct the F-statistic and reject the pair if the F-statistic exceeds the 5% critical value of 3.00. Because the test has a 5% significance level, the true population values of $\beta_1$ and $\beta_2$ will not be rejected in 95% of all samples. Thus, the set of values not rejected at the 5% level by this F-statistic constitutes a 95% confidence set for $\beta_1$ and $\beta_2$.

Although this method of trying all possible values of $\beta_{1,0}$ and $\beta_{2,0}$ works in theory, in practice it is much simpler to use an explicit formula for the confidence set. This formula for the confidence set for an arbitrary number of coefficients is based on the formula for the F-statistic. When there are two coefficients, the resulting confidence sets are ellipses.
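The "test every candidate pair" construction can be sketched directly, even though an explicit formula is preferable in practice. The code below is illustrative only: the estimates, standard errors, and correlation are hypothetical numbers rather than values from Equation (7.6), the function names are assumptions, and the F-statistic is the two-restriction formula of Equation (7.9) with each t-statistic centered at the candidate value.

```python
def f_stat_at(b1_hat, se1, b2_hat, se2, rho, b1_0, b2_0):
    """F-statistic (Equation 7.9) for H0: beta_1 = b1_0 and beta_2 = b2_0."""
    t1 = (b1_hat - b1_0) / se1
    t2 = (b2_hat - b2_0) / se2
    return 0.5 * (t1**2 + t2**2 - 2.0 * rho * t1 * t2) / (1.0 - rho**2)

def confidence_set_grid(b1_hat, se1, b2_hat, se2, rho, b1_grid, b2_grid, crit=3.00):
    """Candidate pairs on the grid that are NOT rejected at the 5% level."""
    return [
        (b1_0, b2_0)
        for b1_0 in b1_grid
        for b2_0 in b2_grid
        if f_stat_at(b1_hat, se1, b2_hat, se2, rho, b1_0, b2_0) <= crit
    ]

if __name__ == "__main__":
    # Hypothetical estimates, standard errors, and correlation of the t-statistics.
    b1_hat, se1, b2_hat, se2, rho = 1.0, 0.5, 2.0, 0.5, 0.6
    grid = [i * 0.25 for i in range(13)]  # 0.0, 0.25, ..., 3.0
    accepted = confidence_set_grid(b1_hat, se1, b2_hat, se2, rho, grid, grid)
    print((1.0, 2.0) in accepted)  # the point estimate is always in the set
    print((0.0, 0.0) in accepted)  # rejected here, so (0, 0) lies outside the set
```

The accepted grid points trace out an ellipse around the point estimate; a finer grid gives a smoother outline, which is why an explicit formula is used in practice.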
As an illustration, Figure 7.1 shows a 95% confidence set (confidence ellipse) for the coefficients on the student-teacher ratio and expenditure per pupil, holding constant the percentage of English learners, based on the estimated regression in Equation (7.6). This ellipse does not include the point (0,0). This means that the null hypothesis that these two coefficients are both zero is rejected using the F-statistic at the 5% significance level, which we already knew from Section 7.2.

The confidence ellipse is a fat sausage with the long part of the sausage oriented in the lower-left/upper-right direction. The reason for this orientation is that the estimated correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ is positive, which in turn arises because the correlation between the regressors STR and Expn is negative (schools that spend more per pupil tend to have fewer students per teacher).

FIGURE 7.1 95% Confidence Set for Coefficients on STR and Expn from Equation (7.6)
[Figure: axes are the coefficient on STR ($\beta_1$) and the coefficient on Expn ($\beta_2$).] The 95% confidence set for the coefficients on STR ($\beta_1$) and Expn ($\beta_2$) is an ellipse. The ellipse contains the pairs of values of $\beta_1$ and $\beta_2$ that cannot be rejected using the F-statistic at the 5% significance level.

7.5 Model Specification for Multiple Regression

The job of determining which variables to include in multiple regression, that is, the problem of choosing a regression specification, can be quite challenging, and no single rule applies in all situations. But do not despair, because some useful guidelines are available. The starting point for choosing a regression specification is thinking through the possible sources of omitted variable bias. It is important to rely on your expert knowledge of the empirical problem and to focus on obtaining an unbiased estimate of the causal effect of interest; do not rely solely on purely statistical measures of fit such as the $R^2$ or $\bar{R}^2$.
Omitted Variable Bias in Multiple Regression

The OLS estimators of the coefficients in multiple regression will have omitted variable bias if an omitted determinant of $Y_i$ is correlated with at least one of the regressors. For example, students from affluent families often have more learning opportunities than do their less affluent peers, which could lead to better test scores. Moreover, if the district is a wealthy one, then the schools will tend to have larger budgets and lower student-teacher ratios. If so, the affluence of the students and the student-teacher ratio would be negatively correlated, and the OLS estimate of the coefficient on the student-teacher ratio would pick up the effect of average district income, even after controlling for the percentage of English learners. In short, omitting the students' economic background could lead to omitted variable bias in the regression of test scores on the student-teacher ratio and the percentage of English learners.

The general conditions for omitted variable bias in multiple regression are similar to those for a single regressor: If an omitted variable is a determinant of $Y_i$ and if it is correlated with at least one of the regressors, then the OLS estimators will have omitted variable bias. As was discussed in Section 6.6, the OLS estimators are correlated, so in general the OLS estimators of all the coefficients will be biased. The two conditions for omitted variable bias in multiple regression are summarized in Key Concept 7.3.

At a mathematical level, if the two conditions for omitted variable bias are satisfied, then at least one of the regressors is correlated with the error term. This means that the conditional expectation of $u_i$ given $X_{1i}, \ldots, X_{ki}$ is nonzero, so that the first least squares assumption is violated. As a result, the omitted variable bias persists even if the sample size is large, that is, omitted variable bias implies that the OLS estimators are inconsistent.

KEY CONCEPT 7.3
OMITTED VARIABLE BIAS IN MULTIPLE REGRESSION
Omitted variable bias is the bias in the OLS estimator that arises when one or more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true:
1. At least one of the included regressors must be correlated with the omitted variable.
2. The omitted variable must be a determinant of the dependent variable, Y.

Model Specification in Theory and in Practice

In theory, when data are available on the omitted variable, the solution to omitted variable bias is to include the omitted variable in the regression. In practice, however, deciding whether to include a particular variable can be difficult and requires judgment.

Our approach to the challenge of potential omitted variable bias is twofold. First, a core or base set of regressors should be chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected; the regression using this base set of regressors is sometimes referred to as a base specification. This base specification should contain the variables of primary interest and the control variables suggested by expert judgment and economic theory. Expert judgment and economic theory are rarely decisive, however, and often the variables suggested by economic theory are not the ones on which you have data. Therefore the next step is to develop a list of candidate alternative specifications, that is, alternative sets of regressors. If the estimates of the coefficients of interest are numerically similar across the alternative specifications, then this provides evidence that the estimates from your base specification are reliable. If, on the other hand, the estimates of the coefficients of interest change substantially across specifications, this often provides evidence that the original specification had omitted variable bias. We elaborate on this approach to model specification in Section 9.2 after studying some tools for specifying regressions.

Interpreting the $R^2$ and the Adjusted $R^2$ in Practice

An $R^2$ or an $\bar{R}^2$ near 1 means that the regressors are good at predicting the values of the dependent variable in the sample, and an $R^2$ or an $\bar{R}^2$ near 0 means they are not. This makes these statistics useful summaries of the predictive ability of the regression. However, it is easy to read more into them than they deserve.

KEY CONCEPT 7.4
$R^2$ AND $\bar{R}^2$: WHAT THEY TELL YOU AND WHAT THEY DON'T
The $R^2$ and $\bar{R}^2$ tell you whether the regressors are good at predicting, or "explaining," the values of the dependent variable in the sample of data on hand. If the $R^2$ (or $\bar{R}^2$) is nearly 1, then the regressors produce good predictions of the dependent variable in that sample, in the sense that the variance of the OLS residual is small compared to the variance of the dependent variable. If the $R^2$ (or $\bar{R}^2$) is nearly 0, the opposite is true.
The $R^2$ and $\bar{R}^2$ do NOT tell you whether:
1. An included variable is statistically significant;
2. The regressors are a true cause of the movements in the dependent variable;
3. There is omitted variable bias; or
4. You have chosen the most appropriate set of regressors.

There are four potential pitfalls to guard against when using the $R^2$ or $\bar{R}^2$:

1. An increase in the $R^2$ or $\bar{R}^2$ does not necessarily mean that an added variable is statistically significant. The $R^2$ increases whenever you add a regressor, whether or not it is statistically significant. The $\bar{R}^2$ does not always increase, but if it does this does not necessarily mean that the coefficient on that added regressor is statistically significant. To ascertain whether an added variable is statistically significant, you need to perform a hypothesis test using the t-statistic.

2. A high $R^2$ or $\bar{R}^2$ does not mean that the regressors are a true cause of the dependent variable. Imagine regressing test scores against parking lot area per pupil. Parking lot area is correlated with the student-teacher ratio, with whether the school is in a suburb or a city, and possibly with district income, all things that are correlated with test scores. Thus the regression of test scores on parking lot area per pupil could have a high $R^2$ and $\bar{R}^2$, but the relationship is not causal (try telling the superintendent that the way to increase test scores is to increase parking space!).

3. A high $R^2$ or $\bar{R}^2$ does not mean there is no omitted variable bias. Recall the discussion of Section 6.1, which concerned omitted variable bias in the regression of test scores on the student-teacher ratio. The $R^2$ of the regression never came up because it played no logical role in this discussion. Omitted variable bias can occur in regressions with a low $R^2$, a moderate $R^2$, or a high $R^2$. Conversely, a low $R^2$ does not imply that there necessarily is omitted variable bias.

4. A high $R^2$ or $\bar{R}^2$ does not necessarily mean you have the most appropriate set of regressors, nor does a low $R^2$ or $\bar{R}^2$ necessarily mean you have an inappropriate set of regressors. The question of what constitutes the right set of regressors in multiple regression is difficult and we return to it throughout this textbook. Decisions about the regressors must weigh issues of omitted variable bias, data availability, data quality, and, most importantly, economic theory and the nature of the substantive questions being addressed.
economic . va riable: Discussion ofthe base and alternative specifications.1fI. None: ~ )mittcd vari Iy.~ not perfectly correla ted (their corrclillion coe fficient is 0.74). These two variabks thus measure the fraction of economically disadvantaged chil dren in th e district. These poinls are summarized in Key Concepl 7. Here we I. Whe n we do this. a lth ough L hey are related.6 Aoolysis of the Test Score Doto Set 239 these questions ca n be answered simply by having a high (or low) regression R2 or n 2. Fxplain ffthe R2 pendent 7. Many factors potentially affect the average k'~\ score in a district. holding constant student cha racteristics that the superintendent cannot control. The firs t new variable is the percentage of students wh o are eligible for receiving a subsi di zed or free lunch at schoo L St uden ts are eligi ble fo r this program if their family income is less than a certaio threshold (approximately 150% of the poverty line). the solution 1 this problem is to indude them 0 as additio nal regressors in the multiple regression. so in stead we use two impe rfect indicators of low income in the dist rict.teacher ratio using the Californi<l data set. Although theory suggests that economic : se of the u 1 lot area & ratio.. Con a riablc bias.. holding constant these other factors. the coefficient on the studentteacher ralio i ~ the effect of a cha nge in the studentteacher ratio. they . If data are available on these omined varia hles. One of these conlrol variabl es is the ol:"le we ha ve u~ed previously. Some of th e fac tors th at could affect lest scores are correlated with the studentteacher Hltio. so omilling them [rom the regre s~io n will rc~uh in omitted variable bias.6 Analysis of the Test Score Data Set This section presents an analysis or lhe effect on lest scores of Ihe slUde nl. Our secondary purpose is to demonstrate how to use a tabl e to summarize regres sion resulls. is small benrly O. 
Ih~ fraclion of "lUden ts who arc still learn ing Eng lish. with income test scores e!ationship : te :.:onsider th ree variables that contro l for background characteris tics o f the stude nts that could affect tesl scores.7. This analysis focuses on estimat ing the effeci on test scores of a change in the student.· 'ssion nC\"t.1 lcd variable gh R2. apl)ropriQlt Ql 't (UI jl/UP ~ right set l. The two other variables are new a nd control (or the economic background o f tht: stude nts.t scores Rcca ll lh~ I the rcgre. Our primary purpose is 10 provide an exam ple in whi ch multiple regression a nalysis is used to mitigat e omitted variable bias.1 'oughout thi~ Ised .4. Families arc eligible for thi s income assistance progra m de pending in part On their family income.teache r ratio. Tne second n ~w va riable is lhe pe rcentage of students in the district whose fami lies qualify for a California income assistance program . . but th e t hreshold is lower (stricter) than Lhe threshold for the subsidized lunch program . There is no perfect measure of economic background in the data scI.
Although theory suggests that economic background could be an important omitted factor, theory and expert judgment do not really help us decide which of these two variables (percentage eligible for a subsidized lunch or percentage eligible for income assistance) is a better measure of background. For our base specification, we choose the percentage eligible for a subsidized lunch as the economic background variable, but we consider an alternative specification that includes the other variable as well.

Scatterplots of test scores and these variables are presented in Figure 7.2. Each of these variables exhibits a negative correlation with test scores. The correlation between test scores and the percentage of English learners is -0.64; between test scores and the percentage eligible for a subsidized lunch is -0.87; and between test scores and the percentage qualifying for income assistance is -0.63.

What scale should we use for the regressors? A practical question that arises in regression analysis is what scale you should use for the regressors. In Figure 7.2, the units of the variables are percent, so the maximum possible range of the data is 0 to 100. Alternatively, we could have defined these variables to be a decimal fraction rather than a percent; for example, PctEL could be replaced by the fraction of English learners, FracEL (= PctEL/100), which would range between 0 and 1 instead of between 0 and 100. More generally, in regression analysis some decision usually needs to be made about the scale of both the dependent and independent variables. How, then, should you choose the scale, or units, of the variables?

The general answer to the question of choosing the scale of the variables is to make the regression results easy to read and to interpret. In the test score application, the natural unit for the dependent variable is the score of the test itself. In the regression of TestScore on STR and PctEL reported in Equation (7.5), the coefficient on PctEL is -0.650. If instead the regressor had been FracEL, the regression would have had an identical R² and SER; however, the coefficient on FracEL would have been -65.0. In the specification with PctEL, the coefficient is the predicted change in test scores for a one-percentage-point increase in English learners, holding STR constant; in the specification with FracEL, the coefficient is the predicted change in test scores for an increase by 1 in the fraction of English learners (that is, for a 100-percentage-point increase), holding STR constant. Although these two specifications are mathematically equivalent, for the purposes of interpretation the one with PctEL seems, to us, more natural.

Another consideration when deciding on a scale is to choose the units of the regressors so that the resulting regression coefficients are easy to read. For example, if a regressor is measured in dollars and has a coefficient of 0.00000356, it is easier to read if the regressor is converted to millions of dollars and the coefficient 3.56 is reported.
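The rescaling argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: it uses synthetic data (the coefficients 664.7 and -0.67 and the noise level are made up for illustration) and, to keep it short, a single regressor rather than the two-regressor specification of Equation (7.5). The helper `simple_ols` is a hypothetical name introduced here. Dividing the regressor by 100 should multiply the slope by 100 and leave the intercept and R² unchanged.

```python
import math
import random

def simple_ols(x, y):
    """OLS of y on a constant and one regressor x; returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ssr / tss

# Synthetic data (assumption): 420 districts, regressor measured in percent (0 to 100).
random.seed(0)
pct_el = [random.uniform(0.0, 100.0) for _ in range(420)]
test_score = [664.7 - 0.67 * p + random.gauss(0.0, 14.0) for p in pct_el]

# Same regressor expressed as a fraction (0 to 1).
frac_el = [p / 100.0 for p in pct_el]

icpt_pct, slope_pct, r2_pct = simple_ols(pct_el, test_score)
icpt_frac, slope_frac, r2_frac = simple_ols(frac_el, test_score)

print("slope with PctEL :", slope_pct)
print("slope with FracEL:", slope_frac)   # 100 times the PctEL slope
print("R^2 (both scales):", r2_pct, r2_frac)
```

The fit is identical under either scale; only the unit in which the coefficient is read changes, which is why the choice of scale is a matter of readability rather than of statistics.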
FIGURE 7.2 Scatterplots of Test Scores vs. Three Student Characteristics
[Figure: three scatterplots with test score on the vertical axis and percent on the horizontal axis. Panels: (a) percentage of English language learners; (b) percentage qualifying for reduced price lunch; (c) percentage qualifying for income assistance.]
The scatterplots show a negative relationship between test scores and (a) the percentage of English learners (correlation = -0.64), (b) the percentage of students qualifying for a subsidized lunch (correlation = -0.87), and (c) the percentage qualifying for income assistance (correlation = -0.63).

Tabular presentation of results. We are now faced with a communication problem. What is the best way to show the results from several multiple regressions that contain different subsets of the possible regressors? So far, we have presented regression results by writing out the estimated regression equations, as in Equation (7.6). This works well when there are only a few regressors and only a few equations, but with more regressors and equations this method of presentation can be confusing. A better way to communicate the results of several regressions is in a table.

Table 7.1 summarizes the results of regressions of the test score on various sets of regressors. Each column summarizes a separate regression. Each regression has the same dependent variable, test score. The entries in the first five rows are the estimated regression coefficients, with their standard errors below them in parentheses. The asterisks indicate whether the t-statistics, testing the hypothesis that the relevant coefficient is zero, are significant at the 5% level (one asterisk) or the 1% level (two asterisks). The final three rows contain summary statistics for the regression (the standard error of the regression, SER, and the adjusted R², R̄²) and the sample size (which is the same for all of the regressions, 420 observations).

TABLE 7.1 Results of Regressions of Test Scores on the Student-Teacher Ratio and Student Characteristic Control Variables Using California Elementary School Districts

Dependent variable: average test score in the district.

Regressor                                        (1)        (2)        (3)        (4)        (5)
Student-teacher ratio (X1)                     -2.28**    -1.10*     -1.00**    -1.31**    -1.01**
                                               (0.52)     (0.43)     (0.27)     (0.34)     (0.27)
Percent English learners (X2)                             -0.650**   -0.122**   -0.488**   -0.130**
                                                          (0.031)    (0.033)    (0.030)    (0.036)
Percent eligible for subsidized lunch (X3)                           -0.547**              -0.529**
                                                                     (0.024)               (0.038)
Percent on public income assistance (X4)                                        -0.790**    0.048
                                                                                (0.068)    (0.059)
Intercept                                      698.9**    686.0**    700.2**    698.0**    700.4**
                                               (10.4)     (8.7)      (5.6)      (6.9)      (5.5)

Summary Statistics
SER                                            18.58      14.46      9.08       11.65      9.08
R̄²                                             0.049      0.424      0.773      0.626      0.773
n                                              420        420        420        420        420

These regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.

All the information that we have presented so far in equation format appears as a column of this table. For example, consider the regression of the test score against the student-teacher ratio, with no control variables. In equation form, this regression is

\[
\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad \bar{R}^2 = 0.049,\ SER = 18.58,\ n = 420. \tag{7.19}
\]

All this information appears in column (1) of Table 7.1. The estimated coefficient on the student-teacher ratio (-2.28) appears in the first row of numerical entries, and its standard error (0.52) appears in parentheses just below the estimated coefficient. The intercept (698.9) and its standard error (10.4) are given in the row labeled "Intercept." (Sometimes you will see this row labeled "constant" because, as discussed in Section 6.2, the intercept can be viewed as the coefficient on a regressor that is always equal to 1.) Similarly, the R̄² (0.049), the SER (18.58), and the sample size n (420) appear in the final rows. The blank entries in the rows of the other regressors indicate that those regressors are not included in this regression.

Although the table does not report t-statistics, these can be computed from the information provided; for example, the t-statistic testing the hypothesis that the coefficient on the student-teacher ratio in column (1) is zero is -2.28/0.52 = -4.38. This hypothesis is rejected at the 1% level, which is indicated by the double asterisk next to the estimated coefficient in the table.

Regressions that include the control variables measuring student characteristics are reported in columns (2)-(5). Column (2), which reports the regression of test scores on the student-teacher ratio and on the percentage of English learners, was previously stated as Equation (7.5). Column (3) presents the base specification, in which the regressors are the student-teacher ratio and two control variables, the percentage of English learners and the percentage of students eligible for a free lunch. Columns (4) and (5) present alternative specifications that examine the effect of changes in the way the economic background of the students is measured. In column (4), the percentage of students on income assistance is included as a regressor, and in column (5) both of the economic background variables are included.

Discussion of empirical results. These results suggest three conclusions:

1. Controlling for these student characteristics cuts the effect of the student-teacher ratio on test scores approximately in half. This estimated effect is not very sensitive to which specific control variables are included in the regression. In all cases the coefficient on the student-teacher ratio remains statistically significant at the 5% level. In the four specifications with control variables, regressions (2)-(5), reducing the student-teacher ratio by one student per teacher is estimated to increase average test scores by approximately one point, holding constant student characteristics.

2. The student characteristic variables are very useful predictors of test scores. The student-teacher ratio alone explains only a small fraction of the variation in test scores: The R̄² in column (1) is 0.049. The R̄² jumps, however, when the student characteristic variables are added. For example, the R̄² in the base specification, regression (3), is 0.773. The signs of the coefficients on the student demographic variables are consistent with the patterns seen in Figure 7.2: Districts with many English learners and districts with many poor children have lower test scores.

3. The control variables are not always individually statistically significant: In specification (5), the hypothesis that the coefficient on the percentage qualifying for income assistance is zero is not rejected at the 5% level (the t-statistic is 0.82). Because adding this control variable to the base specification (3) has a negligible effect on the estimated coefficient for the student-teacher ratio and its standard error, and because the coefficient on this control variable is not significant in specification (5), this additional control variable is redundant, at least for the purposes of this analysis.

7.7 Conclusion

Chapter 6 began with a concern: In the regression of test scores against the student-teacher ratio, omitted student characteristics that influence test scores might be correlated with the student-teacher ratio in the district, and if so the student-teacher ratio in the district would pick up the effect on test scores of these omitted student characteristics. Thus, the OLS estimator would have omitted variable bias. To mitigate this potential omitted variable bias, we augmented the regression by including variables that control for various student characteristics (the percentage of English learners and two measures of student economic background). Doing so cuts the estimated effect of a unit change in the student-teacher ratio in half, although it remains possible to reject the null hypothesis that the population effect on test scores, holding these control variables constant, is zero at the 5% significance level. Because they eliminate omitted variable bias arising from these student characteristics, these multiple regression estimates, hypothesis tests, and confidence intervals are much more useful for advising the superintendent than the single-regressor estimates of Chapters 4 and 5.

The analysis in this and the preceding chapter has presumed that the population regression function is linear in the regressors, that is, that the conditional expectation of Yi given the regressors is a straight line. There is, however, no particular reason to think this is so. In fact, the effect of reducing the student-teacher ratio might be quite different in districts with large classes than in districts that already have small classes. If so, the population regression line is not linear in the X's but rather is a nonlinear function of the X's. To extend our analysis to regression functions that are nonlinear in the X's, however, we need the tools developed in the next chapter.
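The t-statistics and significance stars discussed for Table 7.1 can be recovered from the reported coefficients and standard errors alone. The sketch below is illustrative only: the function names are our own, the p-value uses the large-sample standard normal approximation, and the cutoffs 1.96 and 2.58 are the usual two-sided 5% and 1% normal critical values. The inputs are the student-teacher ratio entries from the five columns of Table 7.1.

```python
import math

def t_stat(coef, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se

def two_sided_p(t):
    """Large-sample two-sided p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def stars(t):
    """'**' for significance at the 1% level, '*' at the 5% level (two-sided, normal)."""
    a = abs(t)
    if a > 2.58:
        return "**"
    if a > 1.96:
        return "*"
    return ""

def ci95(coef, se):
    """95% confidence interval: coefficient plus or minus 1.96 standard errors."""
    return coef - 1.96 * se, coef + 1.96 * se

# Coefficient and standard error on the student-teacher ratio, columns (1)-(5) of Table 7.1.
str_by_column = {
    1: (-2.28, 0.52),
    2: (-1.10, 0.43),
    3: (-1.00, 0.27),
    4: (-1.31, 0.34),
    5: (-1.01, 0.27),
}

for col, (coef, se) in str_by_column.items():
    t = t_stat(coef, se)
    lo, hi = ci95(coef, se)
    print(f"({col}) t = {t:.2f}{stars(t)}  p = {two_sided_p(t):.4f}  95% CI = [{lo:.2f}, {hi:.2f}]")
```

For column (1) this reproduces the t-statistic of -4.38 computed in the text, and the stars produced for each column agree with those printed in the first row of the table.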
Summary

1. Hypothesis tests and confidence intervals for a single regression coefficient are carried out using essentially the same procedures that were used in the one-variable linear regression model of Chapter 5. For example, a 95% confidence interval for β1 is given by β̂1 ± 1.96 SE(β̂1).
2. Hypotheses involving more than one restriction on the coefficients are called joint hypotheses. Joint hypotheses can be tested using an F-statistic.
3. Regression specification proceeds by first determining a base specification chosen to address concern about omitted variable bias. The base specification can be modified by including additional regressors that address other potential sources of omitted variable bias. Simply choosing the specification with the highest R² can lead to regression models that do not estimate the causal effect of interest.

Key Terms

restrictions
joint hypothesis
F-statistic
restricted regression
unrestricted regression
homoskedasticity-only F-statistic
95% confidence set
base specification
alternative specifications
Bonferroni test

Review the Concepts

7.1 Explain how you would test the null hypothesis that β1 = 0 in the multiple regression model, Yi = β0 + β1X1i + β2X2i + ui. Explain how you would test the null hypothesis that β2 = 0. Explain how you would test the joint hypothesis that β1 = 0 and β2 = 0. Why isn't the result of the joint test implied by the results of the first two tests?

7.2 Provide an example of a regression that arguably would have a high value of R² but would produce biased and inconsistent estimators of the regression coefficient(s). Explain why the R² is likely to be high. Explain why the OLS estimators would be biased and inconsistent.
Exercises

The first six exercises refer to the table of estimated regressions shown below, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Ntheast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor                                   (1)        (2)        (3)
College (X1)                                5.46       5.48       5.44
                                           (0.21)     (0.21)     (0.21)
Female (X2)                                -2.64      -2.62      -2.62
                                           (0.20)     (0.20)     (0.20)
Age (X3)                                               0.29       0.29
                                                      (0.04)     (0.04)
Northeast (X4)                                                    0.69
                                                                 (0.30)
Midwest (X5)                                                      0.60
                                                                 (0.28)
South (X6)                                                       -0.27
                                                                 (0.26)
Intercept                                  12.69       4.40       3.75
                                           (0.14)     (1.05)     (1.06)

Summary Statistics and Joint Tests
F-statistic for regional effects = 0                              6.10
SER                                         6.27       6.22       6.21
R²                                          0.176      0.190      0.194
n                                           4000       4000       4000

7.1 Add "*" (5%) and "**" (1%) to the table to indicate the statistical significance of the coefficients.

7.2 Using the regression results in column (1):
a. Is the college-high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval of the difference.
b. Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

7.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

7.4 Using the regression results in column (3):
a. Do there appear to be important regional differences? Use an appropriate hypothesis test to explain your answer.
b. Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old female college graduate from the West. Jennifer is a 28-year-old female college graduate from the Midwest.
   i. Construct a 95% confidence interval for the difference in expected earnings between Juanita and Molly.
   ii. Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juanita and Jennifer. (Hint: What would happen if you included West and excluded Midwest from the regression?)

7.5 The regression shown in column (2) was estimated again, this time using data from 1992 (4000 observations selected at random from the March 1993 CPS, converted into 1998 dollars using the consumer price index). The results are

\[
\widehat{AHE} = \underset{(0.98)}{0.77} + \underset{(0.20)}{5.29}\,College - \underset{(0.18)}{2.59}\,Female + \underset{(0.03)}{0.40}\,Age, \quad SER = 5.85,\ \bar{R}^2 = 0.21.
\]

Comparing this regression to the regression for 1998 shown in column (2), was there a statistically significant change in the coefficient on College?

7.6 Evaluate the following statement: "In all of the regressions, the coefficient on Female is negative, large, and statistically significant. This provides strong statistical evidence of gender discrimination in the U.S. labor market."

7.7 Question 6.5 reported the following regression (where standard errors have been added):

\[
\widehat{Price} = \underset{(23.9)}{119.2} + \underset{(2.61)}{0.485}\,BDR + \underset{(8.94)}{23.4}\,Bath + \underset{(0.011)}{0.156}\,Hsize + \underset{(0.00048)}{0.002}\,Lsize + \underset{(0.311)}{0.090}\,Age - \underset{(10.5)}{48.8}\,Poor, \quad \bar{R}^2 = 0.72,\ SER = 41.5.
\]

a. Is the coefficient on BDR statistically significantly different from zero?
b. Typically five-bedroom houses sell for much more than two-bedroom houses. Is this consistent with your answer to (a) and with the regression more generally?
c. A homeowner purchases 2000 square feet from an adjacent lot. Construct a 99% confidence interval for the change in the value of her house.
d. Lot size is measured in square feet. Do you think that another scale might be more appropriate? Why or why not?
e. The F-statistic for omitting BDR and Age from the regression is F = 0.08. Are the coefficients on BDR and Age statistically different from zero at the 10% level?

7.8 Referring to Table 7.1 in the text:
a. Construct the R² for each of the regressions.
b. Construct the homoskedasticity-only F-statistic for testing β3 = β4 = 0 in the regression shown in column (5). Is the statistic significant at the 5% level?
c. Test β3 = β4 = 0 in the regression shown in column (5) using the Bonferroni test discussed in Appendix 7.1.
d. Construct a 99% confidence interval for β1 for the regression in column (5).

7.9 Consider the regression model Yi = β0 + β1X1i + β2X2i + ui. Use "Approach #2" from Section 7.3 to transform the regression so that you can use a t-statistic to test
a. β1 = β2;
b. β1 + aβ2 = 0, where a is a constant;
c. β1 + β2 = 1. (Hint: You must redefine the dependent variable in the regression.)

7.10 Equations (7.13) and (7.14) show two formulas for the homoskedasticity-only F-statistic. Show that the two formulas are equivalent.

Empirical Exercises

E7.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope?
b. Run a regression of AHE on Age, gender (Female), and education (Bachelor). What is the estimated effect of Age on earnings? Construct a 95% confidence interval for the coefficient on Age in the regression.
c. Are the results from the regression in (b) substantively different from the results in (a) regarding the effects of Age and AHE? Does the regression in (a) seem to suffer from omitted variable bias?
d. Bob is a 26-year-old male worker with a high school diploma. Predict Bob's earnings using the estimated regression in (b). Alexis is a 30-year-old female worker with a college degree. Predict Alexis's earnings using the regression.
e. Compare the fit of the regression in (a) and (b) using the regression standard errors, R² and R̄². Why are the R² and R̄² so similar in regression (b)?
f. Are gender and education determinants of earnings? Test the null hypothesis that Female can be deleted from the regression. Test the null hypothesis that Bachelor can be deleted from the regression. Test the null hypothesis that both Female and Bachelor can be deleted from the regression.
g. A regression will suffer from omitted variable bias when two conditions hold. What are these two conditions? Do these conditions seem to hold here?

E7.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. Construct a 95% confidence interval for the effect of Beauty on Course_Eval.
b. Consider the various control variables in the data set. Which do you think should be included in the regression? Using a table like Table 7.1, examine the robustness of the confidence interval that you constructed in (a). What is a reasonable 95% confidence interval for the effect of Beauty on Course_Eval?

E7.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 year if distance to the nearest college is decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.
b. Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? To answer this question, construct a table like Table 7.1. Include a simple specification [constructed in (a)], a base specification (that includes a set of important control variables), and several modifications of the base specification. Discuss how the estimated effect of Dist on ED changes across the specifications.
c. It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this result consistent with the regressions that you constructed in part (b)?

E7.4 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.
a. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. Construct a 95% confidence interval for the coefficient on TradeShare. Is the coefficient statistically significant at the 5% level?
b. Test whether, taken as a group, YearsSchool, Rev_Coups, Assassinations, and RGDP60 can be omitted from the regression. What is the p-value of the F-statistic?
APPENDIX 7.1 The Bonferroni Test of a Joint Hypothesis

The method of Section 7.2 is the preferred way to test joint hypotheses in multiple regression. However, if the author of a study presents regression results but did not test a joint restriction in which you are interested, and you do not have the original data, then you will not be able to compute the F-statistic of Section 7.2. This appendix describes a way to test joint hypotheses that can be used when you only have a table of regression results. This method is an application of a very general testing approach based on Bonferroni's inequality.

The Bonferroni test is a test of a joint hypothesis based on the t-statistics for the individual hypotheses; that is, the Bonferroni test is the one-at-a-time t-statistic test of Section 7.2 done properly. The Bonferroni test of the joint null hypothesis β1 = β1,0 and β2 = β2,0 based on the critical value c > 0 uses the following rule:

\[
\text{Accept if } |t_1| \le c \text{ and if } |t_2| \le c; \text{ otherwise, reject} \tag{7.20}
\]

(Bonferroni one-at-a-time t-statistic test), where t1 and t2 are the t-statistics that test the restrictions on β1 and β2, respectively.

The trick is to choose the critical value c in such a way that the probability that the one-at-a-time test rejects when the null hypothesis is true is no more than the desired significance level, say 5%. This is done by using Bonferroni's inequality to choose the critical value c to allow both for the fact that two restrictions are being tested and for any possible correlation between t1 and t2.

Bonferroni's Inequality

Bonferroni's inequality is a basic result of probability theory. Let A and B be events. Let A ∩ B be the event "both A and B" (the intersection of A and B), and let A ∪ B be the event "A or B or both" (the union of A and B). Then Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B). Because Pr(A ∩ B) ≥ 0, it follows that Pr(A ∪ B) ≤ Pr(A) + Pr(B). This inequality in turn implies that 1 - Pr(A ∪ B) ≥ 1 - [Pr(A) + Pr(B)]. Let Aᶜ and Bᶜ be the complements of A and B, that is, the events "not A" and "not B." Because the complement of A ∪ B is Aᶜ ∩ Bᶜ, 1 - Pr(A ∪ B) = Pr(Aᶜ ∩ Bᶜ), which yields Bonferroni's inequality, Pr(Aᶜ ∩ Bᶜ) ≥ 1 - [Pr(A) + Pr(B)].

Now let A be the event that |t1| > c and B be the event that |t2| > c. Then the inequality Pr(A ∪ B) ≤ Pr(A) + Pr(B) yields

\[
\Pr(|t_1| > c \text{ or } |t_2| > c \text{ or both}) \le \Pr(|t_1| > c) + \Pr(|t_2| > c). \tag{7.21}
\]

Bonferroni Tests

Because the event "|t1| > c or |t2| > c or both" is the rejection region of the one-at-a-time test, Equation (7.21) provides a way to choose the critical value c so that the one-at-a-time t-statistic test has the desired significance level in large samples. Under the null hypothesis in large samples, Pr(|t1| > c) = Pr(|t2| > c) = Pr(|Z| > c), where Z is a standard normal random variable. Thus Equation (7.21) implies that, in large samples, the probability that the one-at-a-time test rejects under the null is

\[
\Pr_{H_0}(\text{one-at-a-time test rejects}) \le 2\Pr(|Z| > c). \tag{7.22}
\]

The inequality in Equation (7.22) provides a way to choose the critical value c so that the probability of rejection under the null hypothesis is no more than the desired significance level. The Bonferroni approach can be extended to more than two coefficients; if there are q restrictions under the null, the factor of 2 on the right-hand side in Equation (7.22) is replaced by q.

Table 7.3 presents critical values c for the one-at-a-time Bonferroni test for various significance levels and q = 2, 3, and 4. For example, suppose the desired significance level is 5% and q = 2. According to Table 7.3, the critical value c is 2.241. This critical value is the upper 1.25% point of the standard normal distribution, so Pr(|Z| > 2.241) = 2.5%. Thus Equation (7.22) tells us that, in large samples, the one-at-a-time test in Equation (7.20) will reject at most 5% of the time under the null hypothesis.

The critical values in Table 7.3 are larger than the critical values for testing a single restriction. For example, with q = 2, the one-at-a-time test rejects if at least one t-statistic exceeds 2.241 in absolute value. This critical value is greater than 1.96 because it properly corrects for the fact that, by looking at two t-statistics, you get a second chance to reject the joint null hypothesis, as discussed in Section 7.2.

If the individual t-statistics are based on heteroskedasticity-robust standard errors, then the Bonferroni test is valid whether or not there is heteroskedasticity, but if the t-statistics are based on homoskedasticity-only standard errors, the Bonferroni test is valid only under homoskedasticity.
1~ Zl > 2.22) The inequality in Equ:uion (7.'l the Jomt nu ll hypothesis. the onca tatime te5t in Equation (7 . the BonfeTToni test is \illid onl\ unt. tbe cvents "not A " and " not B:' Because the complement or A UII i.241) = 2.~r\l l'Orree\.: c w that the probability of [hc rc)cc[ion unde r [he null hypothesis equals Ihe deSired sip. For cxampk.25% percentile of the standllrJ norm. that is.?!. by looking at two I·~tatistic:s.21) provides a way tochoos<: tho.' 21 > r. with q .! the c\'cnt that . A' nlJ<.:e k Id I~ S'!Iv and q = 2.1 > c or I' ll > c or both) S > c ) + Pr(1r11> c ).. critical value c w that t he "CIn(' at II larg~ Istatistic has the desiroo significance leve l in large samples. .
241.The Bonferroni T of a Jail'll Hypotheses est 253 il AU 8 i~ TABLE 1. ~ vtllid ani) tln lic' .960 2. so we canno l reject the joint null rejeel this hypolhesis at the 1% l:.: is Ihe )"" I 2.60 and 12 A ltho ugh lId = 2.394 2.5%·Thus :.2 41 2.01'1 (7. both (I and 12are less t han 2.20) ~ill lance \0 reject ttl.1\.W 2..· ificunco: level i~ ieal \ alu.241 .: ~ tesling a .807 (7. 'nle /. us ing the Fslati. at the 1 % significance level w~rt using Ihe Bonferro ni les!.\' if the re are q (Ilion (7.1 onc 1_~13 Ii~lic cau<.2 .22) e c sn lhal the ificance lev".128 2.:!l I if Ihe 1_.023 . Tn cont rast.ull5. null is < 2.Ih.t:l li~lic. because 1 > 2. However.21 ) 1.241 2.498 1% 2 (7.43.. Application to Test Scores 3 e_al:l_lim~ cal a time" ypolhc5is in implies thaI. we able to for var.slalistics tes ting th e joint null hypolhesis that the true coefficients on test scores and e xpenditures per pupil in Equ a tion (7.sig.:rl~ t Tldard errol".tic in Section 7. we can rejecl the join t n ull 121 hy ~thesi$ hypothesis at the 5% significance level uSing the Bonienoni test.6) are.c it pror.0.22) is in ahsolute value. respectivel y. 11 = .ioglC: 1.ignifica oce leve l.3 1\'n8') 2: Bonferroni Critical Values c for the Oneatotime tStan5tic Te51 of a Joinl Hypothesis Significon<e Level he inequlll Number of Restri ctions (q) 10% 5% 2..935 3.
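As a rough check on Table 7.3, the critical values can be recomputed from the standard normal distribution: with q restrictions, Equation (7.22) suggests choosing c so that q × Pr(|Z| > c) equals the desired significance level. The sketch below assumes large samples, so that each t-statistic is approximately standard normal under the null. The function names are illustrative and do not come from the text.

```python
from statistics import NormalDist


def bonferroni_critical_value(alpha, q):
    """Return c such that q * Pr(|Z| > c) = alpha for standard normal Z."""
    # Each of the q two-sided tests gets alpha / q, i.e. alpha / (2q) per tail.
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * q))


def bonferroni_reject(t_stats, alpha):
    """One-at-a-time rule of Equation (7.20): reject if any |t| exceeds c."""
    c = bonferroni_critical_value(alpha, len(t_stats))
    return any(abs(t) > c for t in t_stats)


# Reproduce the entries of Table 7.3 (rows q = 2, 3, 4).
for q in (2, 3, 4):
    row = [round(bonferroni_critical_value(a, q), 3) for a in (0.10, 0.05, 0.01)]
    print(q, row)

# The test score application: t1 = -0.60 and t2 = 2.43.
print(bonferroni_reject([-0.60, 2.43], alpha=0.05))  # rejects at the 5% level
print(bonferroni_reject([-0.60, 2.43], alpha=0.01))  # does not reject at 1%
```

Because the rule only needs the individual t-statistics, it can be applied to a published regression table without access to the original data, which is the situation the appendix has in mind.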
CHAPTER 8 Nonlinear Regression Functions

In Chapters 4 through 7, the population regression function was assumed to be linear. In other words, the slope of the population regression function was constant, so that the effect on Y of a unit change in X does not itself depend on the value of X. But what if the effect on Y of a change in X does depend on the value of one or more of the independent variables? If so, the population regression function is nonlinear.

This chapter develops two groups of methods for detecting and modeling nonlinear population regression functions. The methods in the first group are useful when the effect on Y of a change in one independent variable, \(X_1\), depends on the value of \(X_1\) itself. For example, reducing class sizes by one student per teacher might have a greater effect if class sizes are already manageably small than if they are so large that the teacher can do little more than keep the class under control. If so, the test score (Y) is a nonlinear function of the student-teacher ratio (\(X_1\)), where this function is steeper when \(X_1\) is small. An example of a nonlinear regression function with this feature is shown in Figure 8.1. Whereas the linear population regression function in Figure 8.1a has a constant slope, the nonlinear population regression function in Figure 8.1b has a steeper slope when \(X_1\) is small than when it is large. This first group of methods is presented in Section 8.2.

The methods in the second group are useful when the effect on Y of a change in \(X_1\) depends on the value of another independent variable, say \(X_2\). For example, students still learning English might especially benefit from having more one-on-one attention; if so, the effect on test scores of reducing the student-teacher ratio will be greater in districts with many students still learning English than in districts with few English learners. In this example, the effect on test scores (Y) of a reduction in the student-teacher ratio (\(X_1\)) depends on the percentage of English learners in the district (\(X_2\)). As shown in Figure 8.1c, the slope of this type of population regression function depends on the value of \(X_2\). This second group of methods is presented in Section 8.3.

FIGURE 8.1 Population Regression Functions with Different Slopes. Panels: (a) Constant slope; (b) Slope depends on the value of \(X_1\); (c) Slope depends on the value of \(X_2\). In Figure 8.1a, the population regression function has a constant slope. In Figure 8.1b, the slope of the population regression function depends on the value of \(X_1\). In Figure 8.1c, the slope of the population regression function depends on the value of \(X_2\).

In the models of Sections 8.2 and 8.3, the population regression function is a nonlinear function of the independent variables; that is, the conditional expectation \(E(Y_i \mid X_{1i}, \ldots, X_{ki})\) is a nonlinear function of one or more of the X's. Although they are nonlinear in the X's, these models are linear functions of the unknown coefficients (or parameters) of the population regression model and thus are versions of the multiple regression model of Chapters 6 and 7. Therefore, the unknown parameters of these nonlinear regression functions can be estimated and tested using OLS and the methods of Chapters 6 and 7.

Sections 8.1 and 8.2 introduce nonlinear regression functions in the context of regression with a single independent variable, and Section 8.3 extends this to two independent variables. To keep things simple, additional control variables are omitted in the empirical examples of Sections 8.1 through 8.3. In practice, however, it is important to analyze nonlinear regression functions in models that control for omitted variable bias by including control variables as well. In Section 8.4, we combine nonlinear regression functions and additional control variables when we take a close look at possible nonlinearities in the relationship between test scores and the student-teacher ratio, holding student characteristics constant. In some applications, the regression function is a nonlinear function of the X's and of the parameters. If so, the parameters cannot be estimated by OLS, but they can be estimated using nonlinear least squares. Appendix 8.1 provides examples of such functions and describes the nonlinear least squares estimator.

8.1 A General Strategy for Modeling Nonlinear Regression Functions

This section lays out a general strategy for modeling nonlinear population regression functions. In this strategy, the nonlinear models are extensions of the multiple regression model and therefore can be estimated and tested using the tools of Chapters 6 and 7. First, however, we return to the California test score data and consider the relationship between test scores and district income.

Test Scores and District Income

In Chapter 7, we found that the economic background of the students is an important factor in explaining performance on standardized tests. That analysis used two economic background variables (the percentage of students qualifying for a subsidized lunch and the percentage of district families qualifying for income assistance) to measure the fraction of students in the district coming from poor families. A different, broader measure of economic background is the average annual per capita income in the school district ("district income"). The California data set includes district income measured in thousands of 1998 dollars. The sample contains a wide range of income levels: For the 420 districts in our sample, the median district income is 13.7 (that is, $13,700 per person), and it ranges from 5.3 ($5300 per person) to 55.3 ($55,300 per person).

Figure 8.2 shows a scatterplot of fifth-grade test scores against district income for the California data set, along with the OLS regression line relating these two variables. Test scores and average income are strongly positively correlated, with a correlation coefficient of 0.71; students from affluent districts do better on the tests than students from poor districts. But this scatterplot has a peculiarity: Most of the points are below the OLS line when income is very low (under $10,000) or very high (over $40,000), but are above the line when income is between $15,000 and $30,000. There seems to be some curvature in the relationship between test scores and income that is not captured by the linear regression.

FIGURE 8.2 Scatterplot of Test Score vs. District Income with a Linear OLS Regression Function. Vertical axis: test score; horizontal axis: district income (thousands of dollars). There is a positive correlation between test scores and district income (correlation = 0.71), but the linear OLS regression line does not adequately describe the relationship between these variables.

In short, it seems that the relationship between district income and test scores is not a straight line. Rather, it is nonlinear. A nonlinear function is a function with a slope that is not constant: The function \(f(X)\) is linear if the slope of \(f(X)\) is the same for all values of X, but if the slope depends on the value of X, then \(f(X)\) is nonlinear.
If a straight line is not an adequate description of the relationship between district income and test scores, what is? Imagine drawing a curve that fits the points in Figure 8.2. This curve would be steep for low values of district income, then would flatten out as district income gets higher. One way to approximate such a curve mathematically is to model the relationship as a quadratic function. That is, we could model test scores as a function of income and the square of income.

A quadratic population regression model relating test scores and income is written mathematically as

\[ \mathit{TestScore}_i = \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{Income}_i^2 + u_i, \tag{8.1} \]

where \(\beta_0\), \(\beta_1\), and \(\beta_2\) are coefficients, \(\mathit{Income}_i\) is the income in the ith district, \(\mathit{Income}_i^2\) is the square of income in the ith district, and \(u_i\) is an error term that, as usual, represents all the other factors that determine test scores. Equation (8.1) is called the quadratic regression model because the population regression function, \(E(\mathit{TestScore}_i \mid \mathit{Income}_i) = \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{Income}_i^2\), is a quadratic function of the independent variable, Income.

If you knew the population coefficients \(\beta_0\), \(\beta_1\), and \(\beta_2\) in Equation (8.1), you could predict the test score of a district based on its average income. But these population coefficients are unknown and therefore must be estimated using a sample of data.

At first, it might seem difficult to find the coefficients of the quadratic function that best fits the data in Figure 8.2. If you compare Equation (8.1) with the multiple regression model in Key Concept 6.2, however, you will see that Equation (8.1) is in fact a version of the multiple regression model with two regressors: The first regressor is Income, and the second regressor is \(\mathit{Income}^2\). Thus, after defining the regressors as Income and \(\mathit{Income}^2\), the nonlinear model in Equation (8.1) is simply a multiple regression model with two regressors!

Because the quadratic regression model is a variant of multiple regression, its unknown population coefficients can be estimated and tested using the OLS methods described in Chapters 6 and 7. Estimating the coefficients of Equation (8.1) using OLS for the 420 observations in Figure 8.2 yields

\[ \widehat{\mathit{TestScore}} = \underset{(2.9)}{607.3} + \underset{(0.27)}{3.85}\,\mathit{Income} - \underset{(0.0048)}{0.0423}\,\mathit{Income}^2, \quad \bar{R}^2 = 0.554, \tag{8.2} \]

where (as usual) standard errors of the estimated coefficients are given in parentheses. The estimated regression function (8.2) is plotted in Figure 8.3, superimposed over the scatterplot of the data. The quadratic function captures the curvature in the scatterplot: It is steep for low values of district income but flattens out when district income is high. In short, the quadratic regression function seems to fit the data better than the linear one.

FIGURE 8.3 Scatterplot of Test Score vs. District Income with Linear and Quadratic Regression Functions. Vertical axis: test score; horizontal axis: district income (thousands of dollars). The quadratic OLS regression function fits the data better than the linear OLS regression function.

We can go one step beyond this visual comparison and formally test the hypothesis that the relationship between income and test scores is linear, against the alternative that it is nonlinear. If the relationship is linear, then the regression function is correctly specified as Equation (8.1), except that the regressor \(\mathit{Income}^2\) is absent; that is, if the relationship is linear, then Equation (8.1) holds with \(\beta_2 = 0\). Thus, we can test the null hypothesis that the population regression function is linear against the alternative that it is quadratic by testing the null hypothesis that \(\beta_2 = 0\) against the alternative that \(\beta_2 \ne 0\).

Because Equation (8.1) is just a variant of the multiple regression model, the null hypothesis that \(\beta_2 = 0\) can be tested by constructing the t-statistic for this hypothesis. This t-statistic is \(t = (\hat{\beta}_2 - 0)/SE(\hat{\beta}_2)\), which from Equation (8.2) is \(t = -0.0423/0.0048 = -8.81\). In absolute value, this exceeds the 5% critical value of this test (which is 1.96). Indeed the p-value for the t-statistic is less than 0.01%, so we can reject the hypothesis that \(\beta_2 = 0\) at all conventional significance levels. Thus this formal hypothesis test supports our informal inspection of Figures 8.2 and 8.3: The quadratic model fits the data better than the linear model.
The Effect on Y of a Change in X in Nonlinear Specifications

Put aside the test score example for a moment and consider a general problem. You want to know how the dependent variable Y is expected to change when the independent variable \(X_1\) changes by the amount \(\Delta X_1\), holding constant other independent variables \(X_2, \ldots, X_k\). When the population regression function is linear, this effect is easy to calculate: As shown in Equation (6.4), the expected change in Y is \(\Delta Y = \beta_1 \Delta X_1\), where \(\beta_1\) is the population regression coefficient multiplying \(X_1\). When the regression function is nonlinear, however, the expected change in Y is more complicated to calculate because it can depend on the values of the independent variables.

A general formula for a nonlinear population regression function.¹ The nonlinear population regression models considered in this chapter are of the form

\[ Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i, \quad i = 1, \ldots, n, \tag{8.3} \]

where \(f(X_{1i}, X_{2i}, \ldots, X_{ki})\) is the population nonlinear regression function, a possibly nonlinear function of the independent variables \(X_{1i}, X_{2i}, \ldots, X_{ki}\), and \(u_i\) is the error term. For example, in the quadratic regression model in Equation (8.1), only one independent variable is present, so \(X_1\) is Income and the population regression function is \(f(\mathit{Income}_i) = \beta_0 + \beta_1 \mathit{Income}_i + \beta_2 \mathit{Income}_i^2\).

Because the population regression function is the conditional expectation of \(Y_i\) given \(X_{1i}, X_{2i}, \ldots, X_{ki}\), in Equation (8.3) we allow for the possibility that this conditional expectation is a nonlinear function of \(X_{1i}, X_{2i}, \ldots, X_{ki}\); that is, \(E(Y_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = f(X_{1i}, X_{2i}, \ldots, X_{ki})\), where f can be a nonlinear function. If the population regression function is linear, then \(f(X_{1i}, X_{2i}, \ldots, X_{ki}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}\), and Equation (8.3) becomes the linear regression model in Key Concept 6.2. However, Equation (8.3) allows for nonlinear regression functions as well.

¹ The term "nonlinear regression" applies to two conceptually different families of models. In the first family, the population regression function is a nonlinear function of the X's but is a linear function of the unknown parameters (the \(\beta\)'s). In the second family, the population regression function is a nonlinear function of the unknown parameters and may or may not be a nonlinear function of the X's. The models in the body of this chapter are all in the first family. Appendix 8.1 takes up models from the second family.

The effect on Y of a change in \(X_1\). As discussed in Section 6.2, the effect on Y of a change in \(X_1\), \(\Delta X_1\), holding \(X_2, \ldots, X_k\) constant, is the difference in the expected value of Y when the independent variables take on the values \(X_1 + \Delta X_1, X_2, \ldots, X_k\) and the expected value of Y when the independent variables take on the values \(X_1, X_2, \ldots, X_k\). The difference between these two expected values, say \(\Delta Y\), is what happens to Y on average in the population when \(X_1\) changes by an amount \(\Delta X_1\), holding constant the other variables \(X_2, \ldots, X_k\). In the nonlinear regression model of Equation (8.3), this effect on Y is \(\Delta Y = f(X_1 + \Delta X_1, X_2, \ldots, X_k) - f(X_1, X_2, \ldots, X_k)\).

Because the regression function f is unknown, the population effect on Y of a change in \(X_1\) is also unknown. To estimate the population effect, first estimate the population regression function. At a general level, denote this estimated function by \(\hat{f}\); an example of such an estimated function is the estimated quadratic regression function in Equation (8.2). The estimated effect on Y (denoted \(\Delta \hat{Y}\)) of the change in \(X_1\) is the difference between the predicted value of Y when the independent variables take on the values \(X_1 + \Delta X_1, X_2, \ldots, X_k\) and the predicted value of Y when they take on the values \(X_1, X_2, \ldots, X_k\).

The method for calculating the expected effect on Y of a change in \(X_1\) is summarized in Key Concept 8.1.

KEY CONCEPT 8.1 THE EXPECTED EFFECT ON Y OF A CHANGE IN \(X_1\) IN THE NONLINEAR REGRESSION MODEL (8.3)

The expected change in Y, \(\Delta Y\), associated with the change in \(X_1\), \(\Delta X_1\), holding \(X_2, \ldots, X_k\) constant, is the difference between the value of the population regression function before and after changing \(X_1\), holding \(X_2, \ldots, X_k\) constant. That is, the expected change in Y is the difference:

\[ \Delta Y = f(X_1 + \Delta X_1, X_2, \ldots, X_k) - f(X_1, X_2, \ldots, X_k). \tag{8.4} \]

The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let \(\hat{f}(X_1, X_2, \ldots, X_k)\) be the predicted value of Y based on the estimator \(\hat{f}\) of the population regression function. Then the predicted change in Y is

\[ \Delta \hat{Y} = \hat{f}(X_1 + \Delta X_1, X_2, \ldots, X_k) - \hat{f}(X_1, X_2, \ldots, X_k). \tag{8.5} \]

Application to test scores and income. What is the predicted change in test scores associated with a change in district income of $1000, based on the estimated quadratic regression function in Equation (8.2)? Because that regression function is quadratic, this effect depends on the initial district income. We therefore consider two cases: an increase in district income from 10 to 11 (that is, from $10,000 per capita to $11,000) and an increase in district income from 40 to 41 (that is, from $40,000 to $41,000).

To compute \(\Delta \hat{Y}\) associated with the change in income from 10 to 11, we can apply the general formula in Equation (8.5) to the quadratic regression model. Doing so yields

\[ \Delta \hat{Y} = (\hat{\beta}_0 + \hat{\beta}_1 \times 11 + \hat{\beta}_2 \times 11^2) - (\hat{\beta}_0 + \hat{\beta}_1 \times 10 + \hat{\beta}_2 \times 10^2), \tag{8.6} \]

where \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\beta}_2\) are the OLS estimators.

The term in the first set of parentheses in Equation (8.6) is the predicted value of Y when Income = 11, and the term in the second set of parentheses is the predicted value of Y when Income = 10. These predicted values are calculated using the OLS estimates of the coefficients in Equation (8.2). Accordingly, when Income = 10, the predicted value of test scores is \(607.3 + 3.85 \times 10 - 0.0423 \times 10^2 = 641.57\). When Income = 11, the predicted value is \(607.3 + 3.85 \times 11 - 0.0423 \times 11^2 = 644.53\). The difference in these two predicted values is \(\Delta \hat{Y} = 644.53 - 641.57 = 2.96\) points; that is, the predicted difference in test scores between a district with average income of $11,000 and one with average income of $10,000 is 2.96 points.

In the second case, when income changes from $40,000 to $41,000, the difference in the predicted values in Equation (8.6) is \(\Delta \hat{Y} = (607.3 + 3.85 \times 41 - 0.0423 \times 41^2) - (607.3 + 3.85 \times 40 - 0.0423 \times 40^2) = 694.04 - 693.62 = 0.42\) points. Thus, a change of income of $1000 is associated with a larger change in predicted test scores if the initial income is $10,000 than if it is $40,000 (the predicted changes are 2.96 points versus 0.42 points). Said differently, the slope of the estimated quadratic regression function in Figure 8.3 is steeper at low values of income (like $10,000) than at the higher values of income (like $40,000).

Standard errors of estimated effects. The estimator of the effect on Y of changing \(X_1\) depends on the estimator of the population regression function, \(\hat{f}\), which varies from one sample to the next. Therefore the estimated effect contains sampling error. One way to quantify the sampling uncertainty associated with the estimated effect is to compute a confidence interval for the true population effect. To do so, we need to compute the standard error of \(\Delta \hat{Y}\) in Equation (8.5).

It is easy to compute a standard error for \(\Delta \hat{Y}\) when the regression function is linear. The estimated effect of a change in \(X_1\) is \(\hat{\beta}_1 \Delta X_1\), so a 95% confidence interval for the estimated change is \(\hat{\beta}_1 \Delta X_1 \pm 1.96\,SE(\hat{\beta}_1)\Delta X_1\).

In the nonlinear regression models of this chapter, the standard error of \(\Delta \hat{Y}\) can be computed using the tools introduced in Section 7.3 for testing a single restriction involving multiple coefficients. To illustrate this method, consider the estimated change in test scores associated with a change in income from 10 to 11 in Equation (8.6), which is \(\Delta \hat{Y} = \hat{\beta}_1 \times (11 - 10) + \hat{\beta}_2 \times (11^2 - 10^2) = \hat{\beta}_1 + 21\hat{\beta}_2\). The standard error of the predicted change therefore is

\[ SE(\Delta \hat{Y}) = SE(\hat{\beta}_1 + 21\hat{\beta}_2). \tag{8.7} \]

Thus, if we can compute the standard error of \(\hat{\beta}_1 + 21\hat{\beta}_2\), then we have computed the standard error of \(\Delta \hat{Y}\). There are two methods for doing this using standard regression software, which correspond to the two approaches in Section 7.3 for testing a single restriction on multiple coefficients.

The first method is to use "approach #1" of Section 7.3, which is to compute the F-statistic testing the hypothesis that \(\beta_1 + 21\beta_2 = 0\). The standard error of \(\Delta \hat{Y}\) is then given by²

\[ SE(\Delta \hat{Y}) = \frac{|\Delta \hat{Y}|}{\sqrt{F}}. \tag{8.8} \]

When applied to the quadratic regression in Equation (8.2), the F-statistic testing the hypothesis that \(\beta_1 + 21\beta_2 = 0\) is F = 299.94. Because \(\Delta \hat{Y} = 2.96\), applying Equation (8.8) gives \(SE(\Delta \hat{Y}) = 2.96/\sqrt{299.94} = 0.17\). Thus a 95% confidence interval for the change in the expected value of Y is \(2.96 \pm 1.96 \times 0.17\), or (2.63, 3.29).

The second method is to use "approach #2" of Section 7.3, which entails transforming the regressors so that, in the transformed regression, one of the coefficients is \(\beta_1 + 21\beta_2\). Doing this transformation is left as an exercise (Exercise 8.9).

² Equation (8.8) is derived by noting that the F-statistic is the square of the t-statistic testing this hypothesis, that is, \(F = t^2 = [(\hat{\beta}_1 + 21\hat{\beta}_2)/SE(\hat{\beta}_1 + 21\hat{\beta}_2)]^2 = [\Delta \hat{Y}/SE(\Delta \hat{Y})]^2\), and solving for \(SE(\Delta \hat{Y})\).

A comment on interpreting coefficients in nonlinear specifications. In the multiple regression model of Chapters 6 and 7, the regression coefficients had a natural interpretation. For example, \(\beta_1\) is the expected change in Y associated with a change in \(X_1\), holding the other regressors constant. But, as we have seen, this is not generally the case in a nonlinear model. That is, it is not very helpful to think of \(\beta_1\) in Equation (8.1) as being the effect of changing the district's income, holding the square of the district's income constant. This means that in nonlinear models, the regression function is best interpreted by graphing it and by calculating the predicted effect on Y of changing one or more of the independent variables.
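The calculations in Equations (8.5) through (8.8) can be checked with a few lines of arithmetic. The sketch below plugs the OLS estimates from Equation (8.2) and the reported F-statistic of 299.94 into the formulas; the function and variable names are illustrative. Because the code carries unrounded intermediate values, the last digit of the confidence interval can differ slightly from the rounded figures in the text.

```python
import math

# OLS estimates reported in Equation (8.2).
B0, B1, B2 = 607.3, 3.85, -0.0423


def predicted_score(income):
    """Estimated quadratic regression function (income in $1000s)."""
    return B0 + B1 * income + B2 * income ** 2


def predicted_change(income, delta=1.0):
    """Equation (8.5): difference in predicted values after and before the change."""
    return predicted_score(income + delta) - predicted_score(income)


change_low = predicted_change(10.0)   # income rises from 10 to 11
change_high = predicted_change(40.0)  # income rises from 40 to 41

# Equation (8.8): SE of the predicted change from the F-statistic that
# tests beta_1 + 21 * beta_2 = 0 (value reported in the text).
F_STAT = 299.94
se_low = abs(change_low) / math.sqrt(F_STAT)

# Large-sample 95% confidence interval for the change from 10 to 11.
ci_low = (change_low - 1.96 * se_low, change_low + 1.96 * se_low)

print(round(change_low, 2), round(change_high, 2))
print(round(se_low, 2), tuple(round(b, 2) for b in ci_low))
```

The same `predicted_change` helper works for any starting income, which makes it easy to see how the estimated effect of a $1000 increase shrinks as income rises.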
implc. Before you even look althe data. tha t can be cqi ma ted by OL~.1. After work ing through these sectio ns you will unde rstand the c haract eristics of e ach of these functions. 3. ask yourself whether the slope of the regression function relating Yand X migh t reasona bly dcp. Why might such non linear dependence ex ist? What non linear shapes does this sugges t? For exam.tcd that the quadratic model [i t th e data better than the line ar model.2 Nonlinear Functions of a Single Independent Variable This section provides two methods for modeling a non linea r regression fun ctiOn To keep thi ngs :. JU~ 1 bet:ause you think a regression funt:lL is no nline ar does no t mea n it really orr is! You must dClc mline e mpirica lly whe the r your nonlinear mode l is aprro priate. • 264 CH APTER 8 Nonlineor Regression Functions A General Approach t o Modeling N o nlinearities Using Multiple Regression The genera l approach to modeling nonlinear regression f unctions taken io thl~ chapler has five elements: I. pic. 5. Jdemijy a possibl~ nonlinear relationship.3 cont ain var io us nonlinear regression functio n.:. 4. Determin e whether fh e nonlinear model improves upon a lin ear model.. r lol/he estimated nonlinear regressionJunclioll_ D oes the e stimated regres· sion fu nction desc ribe the dat a we ll? Looking a t Figun:s 8. E~'l imate 8. Most o f the time you can use (sta tistics a nd F·stat istics 10 t e~1 the 11ull hypothesis Ihal the popula tion regression funct ion is li nea r against the tllter na ti ve thaI il is nonlinear..3 suggc:.:nd on [he value of X or on another independent v ari able. we develop these methods fo r a nonlinea r reg r e"~L()1l .2 and 8. S ec l jon ~ 8. thin king about cl assroom dynamics with l l yearolds suggests that clIlting cl ass size from 18 students to 17 cou ld have a gre ater eHec! than cutt ing it fro m 30 1029. th e e lJe" Oil Y oj a change in X. 
Th e fi nal ste p is to use tht' esU m ate d reg ress io n to ca lcul ate t he e ffe ct on Y of a change in one o r more regrt!ssors X us ing the method in Ke y Concept 8.2 and 8. 2. SpeciJy a l1ol1linear Junction and esrimate its param erers by OLS. The best th ing to do is to use l'l:O_ nomic theory and what you know aboullhe application to suggest a pO~:'lblc nonlinear relationship.
8.2 Nonlinear Functions of a Single Independent Variable

The methods in this section are developed for a nonlinear regression function that involves only one independent variable, X. As we see in Section 8.3, however, these models can be modified to include multiple independent variables.

The first method discussed in this section is polynomial regression, an extension of the quadratic regression used in the last section to model the relationship between test scores and income. The second method uses logarithms of X and/or Y. Although these methods are presented separately, they can be used in combination.

Polynomials

One way to specify a nonlinear regression function is to use a polynomial in X. In general, let r denote the highest power of X that is included in the regression. The polynomial regression model of degree r is

    $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$. (8.9)

When r = 2, Equation (8.9) is the quadratic regression model discussed in Section 8.1. When r = 3, so that the highest power of X included is $X^3$, Equation (8.9) is called the cubic regression model.

The polynomial regression model is similar to the multiple regression model of Chapter 6, except that in Chapter 6 the regressors were distinct independent variables, whereas here the regressors are powers of the same independent variable, X; that is, the regressors are $X$, $X^2$, $X^3$, and so on. Thus the techniques for estimation and inference developed for multiple regression can be applied here. In particular, the unknown coefficients $\beta_0, \beta_1, \ldots, \beta_r$ in Equation (8.9) can be estimated by OLS regression of $Y_i$ against $X_i, X_i^2, \ldots, X_i^r$.

Testing the null hypothesis that the population regression function is linear. If the population regression function is linear, then the quadratic and higher-order terms do not enter the population regression function. Accordingly, the null hypothesis ($H_0$) that the regression is linear and the alternative ($H_1$) that it is a polynomial of degree r correspond to

    $H_0$: $\beta_2 = 0, \beta_3 = 0, \ldots, \beta_r = 0$ vs. $H_1$: at least one $\beta_j \neq 0$, $j = 2, \ldots, r$. (8.10)

The null hypothesis that the population regression function is linear can be tested against the alternative that it is a polynomial of degree r by testing $H_0$ against $H_1$ in Equation (8.10). Because $H_0$ is a joint null hypothesis with $q = r - 1$ restrictions on the coefficients of the population polynomial regression model, it can be tested using the F-statistic as described in Section 7.2.
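As a rough illustration of estimating Equation (8.9) by OLS and testing the linearity hypothesis in Equation (8.10), the following Python sketch fits polynomials to made-up data. The data, the function names, and the use of the homoskedasticity-only F-statistic are assumptions made for this example only; they do not come from the text, and applied work would normally rely on a heteroskedasticity-robust F-statistic computed by statistical software.

```python
def solve_linear_system(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (m[row][n] - tail) / m[row][row]
    return z


def fit_polynomial(x, y, r):
    """OLS of y on 1, x, x^2, ..., x^r. Returns (coefficients, SSR)."""
    rows = [[xi ** p for p in range(r + 1)] for xi in x]
    k = r + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    ssr = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(rows, y))
    return beta, ssr


def f_statistic_linearity(x, y, r):
    """Homoskedasticity-only F-statistic for H0: beta_2 = ... = beta_r = 0."""
    _, ssr_restricted = fit_polynomial(x, y, 1)    # linear model under H0
    _, ssr_unrestricted = fit_polynomial(x, y, r)  # polynomial of degree r
    q = r - 1                                      # number of restrictions
    dof = len(x) - (r + 1)                         # n - k_unrestricted - 1
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / dof)


# Made-up data: a quadratic relationship plus a small deterministic disturbance.
x = [float(i) for i in range(1, 21)]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + (0.3 if i % 2 else -0.3)
     for i, xi in enumerate(x)]

beta, _ = fit_polynomial(x, y, 2)
f_stat = f_statistic_linearity(x, y, 2)
print("quadratic coefficients:", [round(b, 3) for b in beta])
print("F-statistic for linearity:", round(f_stat, 1))
```

Because the made-up data are strongly curved, the F-statistic here is very large and the linear null would be rejected; with real data the statistic should be compared with the appropriate critical value.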
Which degree polynomial should I use? That is, how many powers of X should be included in a polynomial regression? The answer balances a tradeoff between flexibility and statistical precision. Increasing the degree r introduces more flexibility into the regression function and allows it to match more shapes; a polynomial of degree r can have up to r - 1 bends (that is, inflection points) in its graph. But increasing r means adding more regressors, which can reduce the precision of the estimated coefficients.

Thus the answer to the question of how many terms to include is that you should include enough to model the nonlinear regression function adequately, but no more. Unfortunately, this answer is not very useful in practice!

A practical way to determine the degree of the polynomial is to ask whether the coefficients in Equation (8.9) associated with the largest values of r are zero. If so, then these terms can be dropped from the regression. This procedure, which is called sequential hypothesis testing because individual hypotheses are tested sequentially, is summarized in the following steps:

1. Pick a maximum value of r and estimate the polynomial regression for that r.
2. Use the t-statistic to test the hypothesis that the coefficient on $X^r$ [$\beta_r$ in Equation (8.9)] is zero. If you reject this hypothesis, then $X^r$ belongs in the regression, so use the polynomial of degree r.
3. If you do not reject $\beta_r = 0$ in step 2, eliminate $X^r$ from the regression and estimate a polynomial regression of degree r - 1. Test whether the coefficient on $X^{r-1}$ is zero. If you reject, use the polynomial of degree r - 1.
4. If you do not reject $\beta_{r-1} = 0$ in step 3, continue this procedure until the coefficient on the highest power in your polynomial is statistically significant.

This recipe has one missing ingredient: the initial degree r of the polynomial. In many applications involving economic data, the nonlinear functions are smooth, that is, they do not have sharp jumps or "spikes." If so, then it is appropriate to choose a small maximum order for the polynomial, such as 2, 3, or 4; that is, begin with r = 2 or 3 or 4 in step 1.

Application to district income and test scores. The estimated cubic regression function relating district income to test scores is

    $\widehat{TestScore} = 600.1 + 5.02\,Income - 0.096\,Income^2 + 0.00069\,Income^3$, $\bar{R}^2 = 0.555$. (8.11)
    Standard errors, in coefficient order: (5.1), (0.71), (0.029), (0.00035).

The t-statistic on $Income^3$ is 1.97, so the null hypothesis that the regression function is a quadratic is rejected against the alternative that it is a cubic at the 5% level. Moreover, the F-statistic testing the joint null hypothesis that the coefficients on $Income^2$ and $Income^3$ are both zero is 37.7, with a p-value less than 0.01%, so the null hypothesis that the regression function is linear is rejected against the alternative that it is either a quadratic or a cubic.

Interpretation of coefficients in polynomial regression models. The coefficients in polynomial regressions do not have a simple interpretation. The best way to interpret polynomial regressions is to plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.

Logarithms

Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes, and many relationships are naturally expressed in terms of percentages. Here are some examples:

• The box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States," examined the wage gap between male and female college graduates. In that discussion, the wage gap was measured in terms of dollars. However, it is easier to compare wage gaps across professions and over time when they are expressed in percentage terms.
• In Section 8.1, we found that district income and test scores were nonlinearly related. Would this relationship be linear using percentage changes? That is, might it be that a change in district income of 1%, rather than \$1000, is associated with a change in test scores that is approximately constant for different values of income?
• In the economic analysis of consumer demand, it is often assumed that a 1% increase in price leads to a certain percentage decrease in the quantity demanded. The percentage decrease in demand resulting from a 1% increase in price is called the price elasticity.

Regression specifications that use natural logarithms allow regression models to estimate percentage relationships such as these. Before introducing those specifications, we review the exponential and natural logarithm functions.

The exponential function and the natural logarithm. The exponential function and its inverse, the natural logarithm, play an important role in modeling nonlinear regression functions. The exponential function of x is $e^x$ (that is, e raised to the power x), where e is the constant 2.71828...; the exponential function is also written as exp(x).
The natural logarithm is the inverse of the exponential function; that is, the natural logarithm is the function for which $x = \ln(e^x)$ or, equivalently, $x = \ln[\exp(x)]$. The base of the natural logarithm is e. Although there are logarithms in other bases, such as base 10, in this book we consider only logarithms in base e, that is, the natural logarithm, so when we use the term "logarithm" we always mean "natural logarithm."

The logarithm function, $y = \ln(x)$, is graphed in Figure 8.4. Note that the logarithm function is defined only for positive values of x. The logarithm function has a slope that is steep at first, then flattens out (although the function continues to increase). The slope of the logarithm function $\ln(x)$ is $1/x$.

[FIGURE 8.4 The Logarithm Function, Y = ln(X). The logarithmic function Y = ln(X) is steeper for small than for large values of X, is only defined for X > 0, and has slope 1/X.]

The logarithm function has the following useful properties:

    $\ln(1/x) = -\ln(x)$; (8.12)
    $\ln(ax) = \ln(a) + \ln(x)$; (8.13)
    $\ln(x/a) = \ln(x) - \ln(a)$; and (8.14)
    $\ln(x^a) = a\ln(x)$. (8.15)

Logarithms and percentages. The link between the logarithm and percentages relies on a key fact: When $\Delta x$ is small, the difference between the logarithm of $x + \Delta x$ and the logarithm of x is approximately $\Delta x / x$, the percentage change in x divided by 100. That is,

    $\ln(x + \Delta x) - \ln(x) \cong \Delta x / x$ (when $\Delta x / x$ is small), (8.16)

where "$\cong$" means "approximately equal to." The derivation of this approximation relies on calculus, but it is readily demonstrated by trying out some values of x and $\Delta x$. For example, when x = 100 and $\Delta x = 1$, then $\Delta x / x = 1/100 = 0.01$ (or 1%), while $\ln(x + \Delta x) - \ln(x) = \ln(101) - \ln(100) = 0.00995$ (or 0.995%). Thus $\Delta x / x$ (which is 0.01) is very close to $\ln(x + \Delta x) - \ln(x)$ (which is 0.00995). When $\Delta x = 5$, $\Delta x / x = 5/100 = 0.05$, while $\ln(x + \Delta x) - \ln(x) = \ln(105) - \ln(100) = 0.04879$.

The three logarithmic regression models. There are three different cases in which logarithms might be used: when X is transformed by taking its logarithm but Y is not; when Y is transformed to its logarithm but X is not; and when both Y and X are transformed to their logarithms. The interpretation of the regression coefficients is different in each case. We discuss these three cases in turn.

Case I: X is in logarithms, Y is not. In this case, the regression model is

    $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$, $i = 1, \ldots, n$. (8.17)

Because Y is not in logarithms but X is, this is sometimes referred to as a linear-log model.

In the linear-log model, a 1% change in X is associated with a change in Y of $0.01\beta_1$. To see this, consider the difference between the population regression function at values of X that differ by $\Delta X$: This is $[\beta_0 + \beta_1 \ln(X + \Delta X)] - [\beta_0 + \beta_1 \ln(X)] = \beta_1[\ln(X + \Delta X) - \ln(X)] \cong \beta_1(\Delta X / X)$, where the final step uses the approximation in Equation (8.16). If X changes by 1%, then $\Delta X / X = 0.01$; thus, in this model a 1% change in X is associated with a change in Y of $0.01\beta_1$.

The only difference between the regression model in Equation (8.17) and the regression model of Chapter 4 with a single regressor is that the right-hand variable is now the logarithm of X rather than X itself. To estimate the coefficients $\beta_0$ and $\beta_1$ in Equation (8.17), first compute a new variable, $\ln(X)$; this is readily done using a spreadsheet or statistical software. Then $\beta_0$ and $\beta_1$ can be estimated by the OLS regression of $Y_i$ on $\ln(X_i)$, hypotheses about $\beta_1$ can be tested using the t-statistic, and a 95% confidence interval for $\beta_1$ can be constructed as $\hat{\beta}_1 \pm 1.96\,SE(\hat{\beta}_1)$.

As an example, return to the relationship between district income and test scores. Instead of the quadratic specification, we could use the linear-log specification in Equation (8.17).
Estimating this regression by OLS yields

    $\widehat{TestScore} = 557.8 + 36.42\ln(Income)$, $\bar{R}^2 = 0.561$. (8.18)
    Standard errors, in coefficient order: (3.8), (1.40).

According to Equation (8.18), a 1% increase in income is associated with an increase in test scores of 0.01 × 36.42 = 0.36 points.

To estimate the effect on Y of a change in X in its original units of thousands of dollars (not in logarithms), we can use the method in Key Concept 8.1. For example, what is the predicted difference in test scores for districts with average incomes of \$10,000 versus \$11,000? The estimated value of $\Delta Y$ is the difference between the predicted values: $\Delta\hat{Y} = [557.8 + 36.42\ln(11)] - [557.8 + 36.42\ln(10)] = 36.42 \times [\ln(11) - \ln(10)] = 3.47$. Similarly, the predicted difference between a district with average income of \$40,000 and a district with average income of \$41,000 is $36.42 \times [\ln(41) - \ln(40)] = 0.90$. Thus, like the quadratic specification, this regression predicts that a \$1000 increase in income has a larger effect on test scores in poor districts than it does in affluent districts.

The estimated linear-log regression function in Equation (8.18) is plotted in Figure 8.5. Because the regressor in Equation (8.18) is the natural logarithm of income rather than income, the estimated regression function is not a straight line. Like the quadratic regression function in Figure 8.3, it is initially steep but then flattens out for higher levels of income.

[FIGURE 8.5 The Linear-Log Regression Function. The estimated linear-log regression function $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1\ln(X)$ captures much of the nonlinear relation between test scores and district income. Horizontal axis: district income (thousands of dollars); vertical axis: test score.]
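The arithmetic behind the approximation in Equation (8.16) and the linear-log calculations based on Equation (8.18) can be checked with a few lines of Python. This is only a sketch: the function and variable names are invented for the example, and the slope 36.42 is the rounded estimate reported in Equation (8.18).

```python
import math


def log_difference(x, dx):
    """Exact value of ln(x + dx) - ln(x), the left side of Equation (8.16)."""
    return math.log(x + dx) - math.log(x)


def linear_log_difference(beta1, x_from, x_to):
    """Predicted change in Y in a linear-log model when X moves from x_from to x_to."""
    return beta1 * (math.log(x_to) - math.log(x_from))


BETA1_HAT = 36.42  # rounded slope on ln(Income) from Equation (8.18)

# Compare dx/x with the exact log difference at x = 100.
approx_checks = [(dx / 100, log_difference(100, dx)) for dx in (1, 5)]

# Predicted test score differences for a $1000 increase in district income
# (income is measured in thousands of dollars).
effect_10_to_11 = linear_log_difference(BETA1_HAT, 10, 11)
effect_40_to_41 = linear_log_difference(BETA1_HAT, 40, 41)

# Approximate effect of a 1% increase in income: 0.01 * beta1.
one_percent_effect = 0.01 * BETA1_HAT

for ratio, exact in approx_checks:
    print(f"dx/x = {ratio:.2f}, ln(x + dx) - ln(x) = {exact:.5f}")
print(f"10 -> 11 (thousand dollars): {effect_10_to_11:.2f} points")
print(f"40 -> 41 (thousand dollars): {effect_40_to_41:.2f} points")
print(f"1% increase in income: about {one_percent_effect:.2f} points")
```

The same \$1000 increase produces a much smaller predicted change at an income of \$40,000 than at \$10,000, which is the flattening visible in Figure 8.5.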
Case II: Y is in logarithms, X is not. In this case, the regression model is

    $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$. (8.19)

Because Y is in logarithms but X is not, this is referred to as a log-linear model.

In the log-linear model, a one-unit change in X ($\Delta X = 1$) is associated with a $100 \times \beta_1$% change in Y. To see this, compare the expected values of $\ln(Y)$ for values of X that differ by $\Delta X$. The expected value of $\ln(Y)$ given X is $\ln(Y) = \beta_0 + \beta_1 X$. When X is $X + \Delta X$, the expected value is given by $\ln(Y + \Delta Y) = \beta_0 + \beta_1(X + \Delta X)$. Thus the difference between these expected values is $\ln(Y + \Delta Y) - \ln(Y) = [\beta_0 + \beta_1(X + \Delta X)] - [\beta_0 + \beta_1 X] = \beta_1 \Delta X$. From the approximation in Equation (8.16), however, if $\beta_1 \Delta X$ is small, then $\ln(Y + \Delta Y) - \ln(Y) \cong \Delta Y / Y$. Thus, $\Delta Y / Y \cong \beta_1 \Delta X$. If $\Delta X = 1$, so that X changes by one unit, then $\Delta Y / Y$ changes by $\beta_1$. Translated into percentages, a unit change in X is associated with a $100 \times \beta_1$% change in Y.

As an illustration, we return to the empirical example of Section 3.7, the relationship between age and earnings of college graduates. Many employment contracts specify that, for each additional year of service, a worker gets a certain percentage increase in his or her wage. This percentage relationship suggests estimating the log-linear specification in Equation (8.19) so that each additional year of age (X) is, on average in the population, associated with some constant percentage increase in earnings (Y). By first computing the new dependent variable, $\ln(Earnings_i)$, the unknown coefficients $\beta_0$ and $\beta_1$ can be estimated by the OLS regression of $\ln(Earnings_i)$ against $Age_i$. When estimated using the 12,777 observations on college graduates in the 2005 Current Population Survey (the data are described in Appendix 3.1), this relationship is

    $\widehat{\ln(Earnings)} = 2.655 + 0.0086\,Age$, $\bar{R}^2 = 0.030$. (8.20)
    Standard errors, in coefficient order: (0.010), (0.0005).

According to this regression, earnings are predicted to increase by 0.86% [(100 × 0.0086)%] for each additional year of age.

Case III: Both X and Y are in logarithms. In this case, the regression model is

    $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$. (8.21)

Because both Y and X are specified in logarithms, this is referred to as a log-log model.

In the log-log model, a 1% change in X is associated with a $\beta_1$% change in Y. Thus, in this specification $\beta_1$ is the elasticity of Y with respect to X. To see this, again apply Key Concept 8.1; thus $\ln(Y + \Delta Y) - \ln(Y) = [\beta_0 + \beta_1 \ln(X + \Delta X)] - [\beta_0 + \beta_1 \ln(X)] = \beta_1[\ln(X + \Delta X) - \ln(X)]$. Application of the approximation in Equation (8.16) to both sides of this equation yields

    $\Delta Y / Y \cong \beta_1 \, \Delta X / X$, or
    $\beta_1 = \dfrac{\Delta Y / Y}{\Delta X / X} = \dfrac{100 \times (\Delta Y / Y)}{100 \times (\Delta X / X)} = \dfrac{\text{percentage change in } Y}{\text{percentage change in } X}$. (8.22)

Thus, in the log-log specification $\beta_1$ is the ratio of the percentage change in Y associated with the percentage change in X. If the percentage change in X is 1% (that is, if $\Delta X = 0.01X$), then $\beta_1$ is the percentage change in Y associated with a 1% change in X. That is, $\beta_1$ is the elasticity of Y with respect to X.

As an illustration, return to the relationship between income and test scores. When this relationship is specified in this form, the unknown coefficients are estimated by a regression of the logarithm of test scores against the logarithm of income. The resulting estimated equation is

    $\widehat{\ln(TestScore)} = 6.336 + 0.0554\ln(Income)$, $\bar{R}^2 = 0.557$. (8.23)
    Standard errors, in coefficient order: (0.006), (0.0021).

According to this estimated regression function, a 1% increase in income is estimated to correspond to a 0.0554% increase in test scores.

The estimated log-log regression function in Equation (8.23) is plotted in Figure 8.6. Because Y is in logarithms, the vertical axis in Figure 8.6 is the logarithm of the test score and the scatterplot is the logarithm of test scores versus district income. For comparison purposes, Figure 8.6 also shows the estimated regression function for a log-linear specification, which is

    $\widehat{\ln(TestScore)} = 6.439 + 0.00284\,Income$, $\bar{R}^2 = 0.497$. (8.24)
    Standard errors, in coefficient order: (0.003), (0.00018).

Because the vertical axis is in logarithms, the regression function in Equation (8.24) is the straight line in Figure 8.6.

[FIGURE 8.6 The Log-Linear and Log-Log Regression Functions. In the log-linear regression function, ln(Y) is a linear function of X. In the log-log regression function, ln(Y) is a linear function of ln(X). Horizontal axis: district income (thousands of dollars); vertical axis: ln(test score).]

As you can see in Figure 8.6, the log-log specification fits slightly better than the log-linear specification. This is consistent with the higher $\bar{R}^2$ for the log-log regression (0.557) than for the log-linear regression (0.497). Even so, the log-log specification does not fit the data especially well: At the lower values of income, most of the observations fall below the log-log curve, while in the middle income range most of the observations fall above the estimated regression function.

The three logarithmic regression models are summarized in Key Concept 8.2.

KEY CONCEPT 8.2
LOGARITHMS IN REGRESSION: THREE CASES

Logarithms can be used to transform the dependent variable Y, an independent variable X, or both (but they must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient $\beta_1$. In each case, $\beta_1$ can be estimated by applying OLS after taking the logarithm of the dependent and/or independent variable.

    Case | Regression Specification                    | Interpretation of $\beta_1$
    I    | $Y_i = \beta_0 + \beta_1\ln(X_i) + u_i$      | A 1% change in X is associated with a change in Y of $0.01\beta_1$.
    II   | $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$     | A change in X by 1 unit ($\Delta X = 1$) is associated with a $100\beta_1$% change in Y.
    III  | $\ln(Y_i) = \beta_0 + \beta_1\ln(X_i) + u_i$ | A 1% change in X is associated with a $\beta_1$% change in Y, so $\beta_1$ is the elasticity of Y with respect to X.
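The three interpretations in Key Concept 8.2 rest on the approximation in Equation (8.16). The Python sketch below compares each approximate interpretation with the change implied exactly by the corresponding regression function, using the rounded slope estimates reported in Equations (8.18), (8.20), and (8.23). The function names are invented for this example, and the "exact" figures refer only to the fitted regression function (they ignore the regression error), so they should be read as an illustration rather than as results from the text.

```python
import math


def linear_log_effect(beta1, pct_change_x=1.0):
    """Case I: change in Y for a given percentage change in X."""
    approx = beta1 * pct_change_x / 100               # 0.01 * beta1 per 1%
    exact = beta1 * math.log(1 + pct_change_x / 100)  # beta1 * [ln(X + dX) - ln(X)]
    return approx, exact


def log_linear_effect(beta1, delta_x=1.0):
    """Case II: percentage change in Y for a change of delta_x units in X."""
    approx = 100 * beta1 * delta_x                    # 100 * beta1 percent per unit
    exact = 100 * (math.exp(beta1 * delta_x) - 1)
    return approx, exact


def log_log_effect(beta1, pct_change_x=1.0):
    """Case III: percentage change in Y for a given percentage change in X."""
    approx = beta1 * pct_change_x                     # the elasticity interpretation
    exact = 100 * ((1 + pct_change_x / 100) ** beta1 - 1)
    return approx, exact


case_1 = linear_log_effect(36.42)   # test score points per 1% of income, Eq. (8.18)
case_2 = log_linear_effect(0.0086)  # percent earnings per year of age, Eq. (8.20)
case_3 = log_log_effect(0.0554)     # percent test score per 1% of income, Eq. (8.23)

for label, (approx, exact) in [("I", case_1), ("II", case_2), ("III", case_3)]:
    print(f"Case {label}: approximate {approx:.4f}, exact {exact:.4f}")
```

For coefficients and changes of this size, the approximate and exact figures agree closely, which is why the simple interpretations in the table are usually adequate.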
A difficulty with comparing logarithmic specifications. Which of the log regression models best fits the data? As we saw in the discussion of Equations (8.23) and (8.24), the $\bar{R}^2$ can be used to compare the log-linear and log-log models; as it happened, the log-log model had the higher $\bar{R}^2$. Similarly, the $\bar{R}^2$ can be used to compare the linear-log regression in Equation (8.18) and the linear regression of Y against X. In the test score and income regression, the linear-log regression has an $\bar{R}^2$ of 0.561 while the linear regression has an $\bar{R}^2$ of 0.508, so the linear-log model fits the data better.

How can we compare the linear-log model and the log-log model? Unfortunately, the $\bar{R}^2$ cannot be used to compare these two regressions because their dependent variables are different [one is $Y_i$, the other is $\ln(Y_i)$]. Recall that the $\bar{R}^2$ measures the fraction of the variance of the dependent variable explained by the regressors. Because the dependent variables in the log-log and linear-log models are different, it does not make sense to compare their $\bar{R}^2$'s.

Because of this problem, the best thing to do in a particular application is to decide, using economic theory and either your or other experts' knowledge of the problem, whether it makes sense to specify Y in logarithms. For example, labor economists typically model earnings using logarithms because wage comparisons, contract wage increases, and so forth are often most naturally discussed in percentage terms. In modeling test scores, it seems (to us, anyway) natural to discuss test results in terms of points on the test rather than percentage increases in the test scores, so we focus on models in which the dependent variable is the test score rather than its logarithm.

Computing predicted values of Y when Y is in logarithms. (This material is more advanced and can be skipped without loss of continuity.) If the dependent variable Y has been transformed by taking logarithms, the estimated regression can be used to compute directly the predicted value of $\ln(Y)$. However, it is a bit trickier to compute the predicted value of Y itself.

To see this, consider the log-linear regression model in Equation (8.19) and rewrite it so that it is specified in terms of Y rather than $\ln(Y)$. To do so, take the exponential function of both sides of Equation (8.19); the result is

    $Y_i = \exp(\beta_0 + \beta_1 X_i + u_i) = e^{\beta_0 + \beta_1 X_i} e^{u_i}$. (8.25)

If $u_i$ is distributed independently of $X_i$, then the expected value of $Y_i$ given $X_i$ is $E(Y_i \mid X_i) = E(e^{\beta_0 + \beta_1 X_i} e^{u_i} \mid X_i) = e^{\beta_0 + \beta_1 X_i} E(e^{u_i})$. The problem is that even if $E(u_i) = 0$, $E(e^{u_i}) \neq 1$. Thus, the appropriate predicted value of $Y_i$ is not simply obtained by taking the exponential function of $\hat{\beta}_0 + \hat{\beta}_1 X_i$, that is, by setting $\hat{Y}_i = e^{\hat{\beta}_0 + \hat{\beta}_1 X_i}$: This predicted value is biased because of the missing factor $E(e^{u_i})$.

One solution to this problem is to estimate the factor $E(e^{u_i})$ and to use this estimate when computing the predicted value of Y, but this gets complicated and we do not pursue it further.

Another solution, which is the approach used in this book, is to compute predicted values of the logarithm of Y but not to transform them to their original units. In practice, this is often acceptable because when the dependent variable is specified as a logarithm, it is often most natural just to use the logarithmic specification (and the associated percentage interpretations) throughout the analysis.

Polynomial and Logarithmic Models of Test Scores and District Income

In practice, economic theory or expert judgment might suggest a functional form to use, but in the end the true form of the population regression function is unknown. In practice, fitting a nonlinear function therefore entails deciding which method or combination of methods works best. As an illustration, we compare logarithmic and polynomial models of the relationship between district income and test scores.

Polynomial specifications. We considered two polynomial specifications specified using powers of Income, quadratic [Equation (8.2)] and cubic [Equation (8.11)]. Because the coefficient on $Income^3$ in Equation (8.11) was significant at the 5% level, the cubic specification provided an improvement over the quadratic, so we select the cubic model as the preferred polynomial specification.

Logarithmic specifications. The logarithmic specification in Equation (8.18) seemed to provide a good fit to these data, but we did not test this formally. One way to do so is to augment it with higher powers of the logarithm of income. If these additional terms are not statistically different from zero, then we can conclude that the specification in Equation (8.18) is adequate in the sense that it cannot be rejected against a polynomial function of the logarithm. Accordingly, the estimated cubic regression (specified in powers of the logarithm of income) is

    $\widehat{TestScore} = 486.1 + 113.4\ln(Income) - 26.9[\ln(Income)]^2 + 3.06[\ln(Income)]^3$, $\bar{R}^2 = 0.560$. (8.26)
    Standard errors, in coefficient order: (79.4), (87.9), (31.7), (3.74).
The t-statistic on the coefficient on the cubic term is 0.818, so the null hypothesis that the true coefficient is zero is not rejected at the 10% level. The F-statistic testing the joint hypothesis that the true coefficients on the quadratic and cubic term are both zero is 0.44, with a p-value of 0.64, so this joint null hypothesis is not rejected at the 10% level. Thus the cubic logarithmic model in Equation (8.26) does not provide a statistically significant improvement over the model in Equation (8.18), which is linear in the logarithm of income.

Comparing the cubic and linear-log specifications. Figure 8.7 plots the estimated regression functions from the cubic specification in Equation (8.11) and the linear-log specification in Equation (8.18). The two estimated regression functions are quite similar. One statistical tool for comparing these specifications is the $\bar{R}^2$. The $\bar{R}^2$ of the logarithmic regression is 0.561 and for the cubic regression it is 0.555. Because the logarithmic specification has a slight edge in terms of the $\bar{R}^2$, and because this specification does not need higher-order polynomials in the logarithm of income to fit these data, we adopt the logarithmic specification in Equation (8.18).

[FIGURE 8.7 The Linear-Log and Cubic Regression Functions. The estimated cubic regression function [Equation (8.11)] and the estimated linear-log regression function [Equation (8.18)] are nearly identical in this sample. Horizontal axis: district income (thousands of dollars); vertical axis: test score.]
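The similarity of the two estimated regression functions can also be seen numerically. The following Python sketch evaluates the cubic specification in Equation (8.11) and the linear-log specification in Equation (8.18) at a few income levels. The income values and function names are chosen for illustration only, and because the coefficients are the rounded values printed in the equations, the computed gaps are approximate; gaps at the extremes of the income range may be larger than those shown here.

```python
import math


def cubic_prediction(income):
    """Predicted test score from the cubic specification, Equation (8.11)."""
    return 600.1 + 5.02 * income - 0.096 * income ** 2 + 0.00069 * income ** 3


def linear_log_prediction(income):
    """Predicted test score from the linear-log specification, Equation (8.18)."""
    return 557.8 + 36.42 * math.log(income)


# Illustrative district incomes, in thousands of dollars.
incomes = [10, 20, 30, 40]

gaps = [cubic_prediction(x) - linear_log_prediction(x) for x in incomes]
largest_gap = max(abs(g) for g in gaps)

# Predicted effect of moving from $10,000 to $11,000 under each specification.
cubic_effect = cubic_prediction(11) - cubic_prediction(10)
linear_log_effect = linear_log_prediction(11) - linear_log_prediction(10)

for x, g in zip(incomes, gaps):
    print(f"income {x}: cubic {cubic_prediction(x):.1f}, "
          f"linear-log {linear_log_prediction(x):.1f}, gap {g:+.2f}")
print(f"effect of 10 -> 11: cubic {cubic_effect:.2f}, linear-log {linear_log_effect:.2f}")
```

At these income levels the two predictions differ by less than two test score points, small relative to the range of predicted scores over the same incomes.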
8.3 Interactions Between Independent Variables

In the introduction to this chapter we wondered whether reducing the student-teacher ratio might have a bigger effect on test scores in districts where many students are still learning English than in those with few still learning English. This could arise, for example, if students who are still learning English benefit differentially from one-on-one or small-group instruction. If so, the presence of many English learners in a district would interact with the student-teacher ratio in such a way that the effect on test scores of a change in the student-teacher ratio would depend on the fraction of English learners.

This section explains how to incorporate such interactions between two independent variables into the multiple regression model. The possible interaction between the student-teacher ratio and the fraction of English learners is an example of the more general situation in which the effect on Y of a change in one independent variable depends on the value of another independent variable. We consider three cases: when both independent variables are binary, when one is binary and the other is continuous, and when both are continuous.

Interactions Between Two Binary Variables

Consider the population regression of log earnings [$Y_i$, where $Y_i = \ln(Earnings_i)$] against two binary variables, the individual's gender ($D_{1i}$, which = 1 if the i-th person is female) and whether he or she has a college degree ($D_{2i}$, where $D_{2i} = 1$ if the i-th person graduated from college). The population linear regression of $Y_i$ on these two binary variables is

    $Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$. (8.27)

In this regression model, $\beta_1$ is the effect on log earnings of being female, holding schooling constant, and $\beta_2$ is the effect of having a college degree, holding gender constant.

The specification in Equation (8.27) has an important limitation: The effect of having a college degree in this specification, holding constant gender, is the same for men and women. There is, however, no reason that this must be so. Phrased mathematically, the effect of $D_{2i}$ on $Y_i$, holding $D_{1i}$ constant, could depend on the value of $D_{1i}$. In other words, there could be an interaction between gender and having a college degree so that the value in the job market of a degree is different for men and women.

Although the specification in Equation (8.27) does not allow for this interaction between gender and acquiring a college degree, it is easy to modify the specification so that it does by introducing another regressor, the product of the two binary variables, $D_{1i} \times D_{2i}$. The resulting regression is

    $Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$. (8.28)

The new regressor, the product $D_{1i} \times D_{2i}$, is called an interaction term or an interacted regressor, and the population regression model in Equation (8.28) is called a binary variable interaction regression model.

The interaction term in Equation (8.28) allows the population effect on log earnings ($Y_i$) of having a college degree (changing $D_{2i}$ from $D_{2i} = 0$ to $D_{2i} = 1$) to depend on gender ($D_{1i}$). To show this mathematically, calculate the population effect of a change in $D_{2i}$ using the general method laid out in Key Concept 8.1. The first step is to compute the conditional expectation of $Y_i$ for $D_{2i} = 0$, given a value of $D_{1i}$; this is $E(Y_i \mid D_{1i} = d_1, D_{2i} = 0) = \beta_0 + \beta_1 \times d_1 + \beta_2 \times 0 + \beta_3 \times (d_1 \times 0) = \beta_0 + \beta_1 d_1$. The next step is to compute the conditional expectation of $Y_i$ after the change, that is, for $D_{2i} = 1$, given the same value of $D_{1i}$; this is $E(Y_i \mid D_{1i} = d_1, D_{2i} = 1) = \beta_0 + \beta_1 \times d_1 + \beta_2 \times 1 + \beta_3 \times (d_1 \times 1) = \beta_0 + \beta_1 d_1 + \beta_2 + \beta_3 d_1$. The effect of this change is the difference of expected values [that is, the difference in Equation (8.4)], which is

    $E(Y_i \mid D_{1i} = d_1, D_{2i} = 1) - E(Y_i \mid D_{1i} = d_1, D_{2i} = 0) = \beta_2 + \beta_3 d_1$. (8.29)

Thus, in the binary variable interaction specification in Equation (8.28), the effect of acquiring a college degree (a unit change in $D_{2i}$) depends on the person's gender [the value of $D_{1i}$, which is $d_1$ in Equation (8.29)]. If the person is male ($d_1 = 0$), the effect of acquiring a college degree is $\beta_2$, but if the person is female ($d_1 = 1$), the effect is $\beta_2 + \beta_3$. The coefficient $\beta_3$ on the interaction term is the difference in the effect of acquiring a college degree for women versus men.

Although this example was phrased using log earnings, gender, and acquiring a college degree, the point is a general one. The binary variable interaction regression allows the effect of changing one of the binary independent variables to depend on the value of the other binary variable.

The method we used here to interpret the coefficients was, in effect, to work through each possible combination of the binary variables. This method, which applies to all regressions with binary variables, is summarized in Key Concept 8.3.
2 . The estimated regressio n in Equatio n (8.tell<. For di stricts with HiS TR j = 1 ( high stude ntteache r ratios) a nd Hit: Lj "" 0 (low frac t ions of English learners). the sample average is 662. Each coefficient can then be expressed c ither as an expectcd value or as the difference between two or more expected \'alues.1) (8.18.5Hi EL.9 points.8. to \\. and when H iS TR j = I a nti Hi£L j = 1."ti ll s method. This is done using the procedure in Key Concept R. Whe n HiSTR .9HiSTR .290. The difference io (1. = 0) and low fr actions of Engli sh learn ers (HiEL.18.2HiEL .and acquiring e ractjon regre~' ent variables 10 n effect.The rive n a value f(dl x O)~ r'j after the TesrScore ~ = 664.28).3. 8. and HiEL. t D u : d l • D. the sample aver age lest score for districts with low studen t.ng from a district with a low srude nHeacher rati.1.3.1.9).3 an inter S) is caned Application to the studentteacher ratio and the percentage of English learners. then the effect Oll lest scores of moving from HiSTR = 0 to HiSTR = I is fo r lest scores to decline by 1. .9 ("" 664.1. this effect thus is .1 .1 .1.9 . interac th~ ~pec If the two fOR INTERPRETING COEffiCIENTS IN REGRESSIONS WITH BINARY VARIABLES ~ A METHOD KEY CONa/'T II (8. test scores a re estimated 10 decli ne by 1.28) )f First compute the expecled values of Y for each possible e<lse described by the set "fbinary variables. Next compare these expected values.5 ::< 5.3) (1.9 . Accordingly. The pre dicted effect of movi.3 Inleroction~ Between Independent Voriobles 279 r. I.1 . Let HiSTR . be a binary variable that equals 1 ifthe studentteacher ratio is 20 or more and equals 0 o therwise.18.:ht:r rat.2 ( = 664 .5 ( = 664.1. ~ 0 a nd Hi EL j '" l.3. That is.4) (2.1 . holding constant whel her the pe rcentage of Englisb learners is high or low. the sample lIveragc is 640. 
If the fraction of Eng lish lea rners is high.30) also can be used 10 estimate the me8n test scores for each of the four possible combinations of the binary variables. if the fra ction of English learners is low (HiEL = 0) .5 ) .3.5(HiSTR x HiEL) .9 + 3. the sample average is 645.2). The inte racled regression of test sco re~ aga inst HiSTR.29). the the pers oO's pe rson is Olal~ erson is female term is the dif· .us mell .' is [feet all log D ll : 1)10 l ~ population ept S. i33d\. and let HiEL Ibe a binary variable that equals I if lhe percentage of E nglish learners is 10% o r more and e q uals 0 other wise. "'hie Key Concept 8} to one with a high slUde nt.o ~Jdl' (8.29) )Il ion (8.ios (HiS TR.30) 7<2 = 0.teache r ratio.4 points. with estimated coef ficie nts replacing th e population coefficients.9) (3. = 0) is 664. [. is givc n by Equa tion (8. According to the estima les in Equa li o n (8.30).
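The arithmetic behind Equation (8.30) can be checked with a short sketch. The coefficients are the ones reported in Equation (8.30); the function and variable names are illustrative choices, not anything defined in the text.

```python
# Estimated coefficients from Equation (8.30); the names are illustrative.
B0, B_HIEL, B_HISTR, B_INTERACT = 664.1, -18.2, -1.9, -3.5


def predicted_score(histr, hiel):
    """Predicted test score for one combination of the two binary regressors."""
    return B0 + B_HIEL * hiel + B_HISTR * histr + B_INTERACT * histr * hiel


def histr_effect(hiel):
    """Change in predicted score from HiSTR = 0 to HiSTR = 1, holding HiEL fixed."""
    return predicted_score(1, hiel) - predicted_score(0, hiel)


# Work through every combination of the binary variables (Key Concept 8.3).
group_means = {
    (histr, hiel): round(predicted_score(histr, hiel), 1)
    for histr in (0, 1)
    for hiel in (0, 1)
}

for (histr, hiel), mean in group_means.items():
    print(f"HiSTR={histr}, HiEL={hiel}: {mean}")
print("Effect of HiSTR when HiEL=0:", round(histr_effect(0), 1))
print("Effect of HiSTR when HiEL=1:", round(histr_effect(1), 1))
```

Because the regression contains only binary variables and their product, the four predicted values coincide with the four group averages, and the interaction coefficient is the difference between the two HiSTR effects.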
FIGURE 8.8 Regression Functions Using Binary and Continuous Variables
Panels: (a) Different intercepts, same slope; (b) Different intercepts, different slopes; (c) Same intercept, different slopes.
Interactions of binary variables and continuous variables can produce three different population regression functions: (a) $\beta_0 + \beta_1 X + \beta_2 D$ allows for different intercepts but has the same slope; (b) $\beta_0 + \beta_1 X + \beta_2 D + \beta_3 (X \times D)$ allows for different intercepts and different slopes; and (c) $\beta_0 + \beta_1 X + \beta_2 (X \times D)$ has the same intercept but allows for different slopes.

Interactions Between a Continuous and a Binary Variable

Next consider the population regression of log earnings [$Y_i$, where $Y_i = \ln(Earnings_i)$] against one continuous variable, the individual's years of work experience ($X_i$), and one binary variable, whether the worker has a college degree ($D_i$, where $D_i = 1$ if the $i$th person is a college graduate). As shown in Figure 8.8, the population regression line relating Y and the continuous variable X can depend on the binary variable D in three different ways.

In Figure 8.8a, the two regression lines differ only in their intercept. The corresponding population regression model is

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i. \tag{8.31}$$
This is the familiar multiple regression model with a population regression function that is linear in $X_i$ and $D_i$. When $D_i = 0$, the population regression function is $\beta_0 + \beta_1 X_i$, so the intercept is $\beta_0$ and the slope is $\beta_1$. When $D_i = 1$, the population regression function is $\beta_0 + \beta_1 X_i + \beta_2$, so the slope remains $\beta_1$ but the intercept is $\beta_0 + \beta_2$. Thus $\beta_2$ is the difference between the intercepts of the two regression lines, as shown in Figure 8.8a. Stated in terms of the earnings example, $\beta_1$ is the effect on log earnings of an additional year of work experience, holding college degree status constant, and $\beta_2$ is the effect of a college degree on log earnings, holding years of experience constant. In this specification, the effect of an additional year of work experience is the same for college graduates and nongraduates; that is, the two lines in Figure 8.8a have the same slope.

In Figure 8.8b, the two lines have different slopes and intercepts. The different slopes permit the effect of an additional year of work to differ for college graduates and nongraduates. To allow for different slopes, add an interaction term to Equation (8.31):

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i, \tag{8.32}$$

where $X_i \times D_i$ is a new variable, the product of $X_i$ and $D_i$. To interpret the coefficients of this regression, apply the procedure in Key Concept 8.3. Doing so shows that, if $D_i = 0$, the population regression function is $\beta_0 + \beta_1 X_i$, whereas if $D_i = 1$, the population regression function is $(\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_i$. Thus, this specification allows for two different population regression functions relating $Y_i$ and $X_i$, depending on the value of $D_i$, as is shown in Figure 8.8b. The difference between the two intercepts is $\beta_2$, and the difference between the two slopes is $\beta_3$. In the earnings example, $\beta_1$ is the effect of an additional year of work experience for nongraduates ($D_i = 0$) and $\beta_1 + \beta_3$ is this effect for graduates, so $\beta_3$ is the difference in the effect of an additional year of work experience for college graduates versus nongraduates.

A third possibility, shown in Figure 8.8c, is that the two lines have different slopes but the same intercept. The interacted regression model for this case is

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i. \tag{8.33}$$

The coefficients of this specification also can be interpreted using Key Concept 8.3. In terms of the earnings example, this specification allows for different effects of experience on log earnings between college graduates and nongraduates, but requires that expected log earnings be the same for both groups when they have no prior experience. Said differently, this specification corresponds to the population mean entry-level wage being the same for college graduates and nongraduates. This does not make much sense in this application, and in practice this specification is used less frequently than Equation (8.32), which allows for different intercepts and slopes.

All three specifications, Equations (8.31), (8.32), and (8.33), are versions of the multiple regression model of Chapter 6 and, once the new variable $X_i \times D_i$ is created, the coefficients of all three can be estimated by OLS.

The three regression models with a binary and a continuous independent variable are summarized in Key Concept 8.4.

KEY CONCEPT 8.4
INTERACTIONS BETWEEN BINARY AND CONTINUOUS VARIABLES
Through the use of the interaction term $X_i \times D_i$, the population regression line relating $Y_i$ and the continuous variable $X_i$ can have a slope that depends on the binary variable $D_i$. There are three possibilities:
1. Different intercept, same slope (Figure 8.8a): $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$;
2. Different intercept and slope (Figure 8.8b): $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$;
3. Same intercept, different slope (Figure 8.8c): $Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$.

Application to the student-teacher ratio and the percentage of English learners. Does the effect on test scores of cutting the student-teacher ratio depend on whether the percentage of students still learning English is high or low? One way to answer this question is to use a specification that allows for two different regression lines, depending on whether there is a high or a low percentage of English learners. This is achieved using the different intercept/different slope specification:

$$\widehat{TestScore} = \underset{(11.9)}{682.2} - \underset{(0.59)}{0.97}\,STR + \underset{(19.5)}{5.6}\,HiEL - \underset{(0.97)}{1.28}\,(STR \times HiEL), \quad \bar{R}^2 = 0.305, \tag{8.34}$$

where the binary variable $HiEL_i$ equals 1 if the percentage of students still learning English in the district is greater than 10% and equals 0 otherwise.

For districts with a low fraction of English learners ($HiEL_i = 0$), the estimated regression line is $682.2 - 0.97\,STR_i$. For districts with a high fraction of English learners ($HiEL_i = 1$), the estimated regression line is $682.2 + 5.6 - 0.97\,STR_i - 1.28\,STR_i = 687.8 - 2.25\,STR_i$. According to these estimates, reducing the student-teacher ratio by 1 is predicted to increase test scores by 0.97 point in districts with low fractions of English learners but by 2.25 points in districts with high fractions of English learners. The difference between these two effects, 1.28 points, is the coefficient on the interaction term in Equation (8.34).

The OLS regression in Equation (8.34) can be used to test several hypotheses about the population regression line. First, the hypothesis that the two lines are in fact the same can be tested by computing the F-statistic testing the joint hypothesis that the coefficient on $HiEL_i$ and the coefficient on the interaction term $STR_i \times HiEL_i$ are both zero. This F-statistic is 89.9, which is significant at the 1% level.

Second, the hypothesis that the two lines have the same slope can be tested by testing whether the coefficient on the interaction term is zero. The t-statistic, $-1.28/0.97 = -1.32$, is less than 1.645 in absolute value, so the null hypothesis that the two lines have the same slope cannot be rejected using a two-sided test at the 10% significance level.

Third, the hypothesis that the lines have the same intercept can be tested by testing whether the population coefficient on HiEL is zero. The t-statistic is $t = 5.6/19.5 = 0.29$, so the hypothesis that the lines have the same intercept cannot be rejected at the 5% level.

These three tests produce seemingly contradictory results: The joint test using the F-statistic rejects the joint hypothesis that the slope and the intercept are the same, but the tests of the individual hypotheses using the t-statistic fail to reject it. The reason for this is that the regressors, HiEL and STR × HiEL, are highly correlated. This results in large standard errors on the individual coefficients. Even though it is impossible to tell which of the coefficients is nonzero, there is strong evidence against the hypothesis that both are zero.

Finally, the hypothesis that the student-teacher ratio does not enter this specification can be tested by computing the F-statistic for the joint hypothesis that the coefficients on STR and on the interaction term are both zero. This F-statistic is 5.64, which has a p-value of 0.004. Thus, the coefficients on the student-teacher ratio are statistically significant at the 1% significance level.
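The two estimated lines and the individual t-statistics discussed for Equation (8.34) follow mechanically from its coefficients and standard errors. The sketch below reproduces them; the dictionary keys and function names are illustrative, and the joint F-statistics are not recomputed because they require the full covariance matrix of the estimates, which the text does not report.

```python
# Estimates and standard errors reported in Equation (8.34).
COEF = {"const": 682.2, "STR": -0.97, "HiEL": 5.6, "STRxHiEL": -1.28}
SE = {"const": 11.9, "STR": 0.59, "HiEL": 19.5, "STRxHiEL": 0.97}


def regression_line(hiel):
    """Return (intercept, slope) of the test score line in STR for HiEL = 0 or 1."""
    intercept = COEF["const"] + COEF["HiEL"] * hiel
    slope = COEF["STR"] + COEF["STRxHiEL"] * hiel
    return round(intercept, 2), round(slope, 2)


def t_stat(name):
    """t-statistic for the hypothesis that a single coefficient is zero."""
    return round(COEF[name] / SE[name], 2)


print("Low-EL line  (intercept, slope):", regression_line(0))
print("High-EL line (intercept, slope):", regression_line(1))
print("t on STR x HiEL:", t_stat("STRxHiEL"))
print("t on HiEL:", t_stat("HiEL"))
```

Both individual t-statistics are below 1.645 in absolute value, which is the pattern the text attributes to the high correlation between HiEL and STR × HiEL.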
The Return to Education and the Gender Gap

In addition to its intellectual pleasures, education has economic rewards. As the boxes in Chapters 3 and 5 show, workers with more education tend to earn more than their counterparts with less education. The analysis in those boxes was incomplete, however, for at least three reasons. First, it failed to control for other determinants of earnings that might be correlated with educational achievement, so the OLS estimator of the coefficient on education could have omitted variable bias. Second, the functional form used in Chapter 5, a simple linear relation, implies that earnings change by a constant dollar amount for each additional year of education, whereas one might suspect that the dollar change in earnings is actually larger at higher levels of education. Third, the box in Chapter 5 ignores the gender differences in earnings highlighted in the box in Chapter 3.

All of these limitations can be addressed by a multiple regression analysis that includes those determinants of earnings which, if omitted, could cause omitted variable bias, and that uses a nonlinear functional form relating education and earnings. Table 8.1 summarizes regressions estimated using data on full-time workers, ages 30 through 64, from the Current Population Survey (the CPS data are described in Appendix 3.1). The dependent variable is the logarithm of hourly earnings, so another year of education is associated with a constant percentage increase (not dollar increase) in earnings.

Table 8.1 has four salient results. First, the omission of gender in regression (1) does not result in omitted variable bias: Even though gender enters regression (2) significantly and with a large coefficient, gender and years of education are uncorrelated; that is, on average men and women have nearly the same levels of education. Second, the returns to education are economically and statistically significantly different for men and women: In regression (3), the t-statistic testing the hypothesis that they are the same is 11.25 (= 0.0180/0.0016). Third, regression (4) controls for the region of the country in which the individual lives, thereby addressing potential omitted variable bias that might arise if years of education differ systematically by region. Controlling for region makes a small difference to the estimated coefficients on the education terms, relative to those reported in regression (3). Fourth, regression (4) controls for the potential experience of the worker, as measured by years since completion of schooling. The estimated coefficients imply a declining marginal value for each year of potential experience.

The estimated economic return to education in regression (4) is 8.99% for each year of education for men, and 11.06% (= 0.0899 + 0.0207, in percent) for women. Because the regression functions for men and women have different slopes, the gender gap depends on the years of education. For 12 years of education, the gender gap is estimated to be 27.3% (= 0.0207 × 12 − 0.521, in percent); for 16 years of education, the gender gap is less in percentage terms, 19.0%.

These estimates of the return to education and the gender gap still have limitations, including the possibility of other omitted variables, notably the native ability of the worker, and potential problems associated with the way variables are measured in the CPS. Nevertheless, the estimates in Table 8.1 are consistent with those obtained by economists who carefully address these limitations. A recent survey by the econometrician David Card (1999) of dozens of empirical studies concludes that labor economists' best estimates of the return to education generally fall between 8% and 11%, and that the return depends on the quality of the education. If you are interested in learning more about the economic return to education, see Card (1999).

TABLE 8.1 The Return to Education and the Gender Gap: Regression Results for the United States in 2004
Dependent variable: logarithm of hourly earnings.
Columns: regressions (1) through (4).
Regressors: Years of education; Female; Female × Years of education; Potential experience; Potential experience²; Midwest; South; West; Intercept.
Summary statistic: $\bar{R}^2$.
Notes: The data are from the March 2005 Current Population Survey (see Appendix 3.1). Female is an indicator variable that equals 1 for women and 0 for men. Midwest, South, and West are indicator variables denoting the region of the United States in which the worker lives: For example, Midwest equals 1 if the worker lives in the Midwest and equals 0 otherwise (the omitted region is Northeast). Standard errors are reported in parentheses below the estimated coefficients. Individual coefficients are statistically significant at the *5% or **1% significance level.
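The return-to-education and gender-gap figures quoted in the box can be reproduced from the three regression (4) coefficients that the text cites. The sketch below assumes, as the quoted arithmetic implies, that −0.521 is the coefficient on Female and 0.0207 the coefficient on Female × Years of education; the function names are illustrative. Coefficients in a log-earnings regression are read as approximate percentage effects, as in the box.

```python
# Regression (4) coefficients as quoted in the box's arithmetic.
B_EDUC = 0.0899            # years of education
B_FEMALE = -0.521          # Female indicator (assumed from "0.0207 x 12 - 0.521")
B_FEMALE_X_EDUC = 0.0207   # Female x years of education


def return_to_education(female):
    """Approximate percent change in earnings per extra year of education."""
    return 100 * (B_EDUC + B_FEMALE_X_EDUC * female)


def gender_gap(years_of_education):
    """Approximate percent difference in earnings, women relative to men."""
    return 100 * (B_FEMALE + B_FEMALE_X_EDUC * years_of_education)


print("Return for men (%):  ", round(return_to_education(0), 2))
print("Return for women (%):", round(return_to_education(1), 2))
print("Gap at 12 years (%): ", round(gender_gap(12), 1))
print("Gap at 16 years (%): ", round(gender_gap(16), 1))
```

The gap shrinks with education because the interaction coefficient is positive while the Female coefficient is negative.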
Interactions Between Two Continuous Variables

Now suppose that both independent variables ($X_{1i}$ and $X_{2i}$) are continuous. An example is when $Y_i$ is log earnings of the $i$th worker, $X_{1i}$ is his or her years of work experience, and $X_{2i}$ is the number of years he or she went to school. If the population regression function is linear, the effect on wages of an additional year of experience does not depend on the number of years of education or, equivalently, the effect of an additional year of education does not depend on the number of years of work experience. In reality, however, there might be an interaction between these two variables so that the effect on wages of an additional year of experience depends on the number of years of education. This interaction can be modeled by augmenting the linear regression model with an interaction term that is the product of $X_{1i}$ and $X_{2i}$:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i. \tag{8.35}$$

The interaction term allows the effect of a unit change in $X_1$ to depend on $X_2$. To see this, apply the general method for computing effects in nonlinear regression models in Key Concept 8.1. The difference in Equation (8.4), computed for the interacted regression function in Equation (8.35), is $\Delta Y = (\beta_1 + \beta_3 X_2)\Delta X_1$ [Exercise 8.10(a)]. Thus, the effect on Y of a change in $X_1$, holding $X_2$ constant, is

$$\frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2, \tag{8.36}$$

which depends on $X_2$. For example, in the earnings example, if $\beta_3$ is positive, then the effect on log earnings of an additional year of experience is greater, by the amount $\beta_3$, for each additional year of education the worker has.

A similar calculation shows that the effect on Y of a change $\Delta X_2$ in $X_2$, holding $X_1$ constant, is $\Delta Y / \Delta X_2 = \beta_2 + \beta_3 X_1$.

Putting these two effects together shows that the coefficient $\beta_3$ on the interaction term is the effect of a unit increase in $X_1$ and $X_2$, above and beyond the sum of the effects of a unit increase in $X_1$ alone and a unit increase in $X_2$ alone. That is, if $X_1$ changes by $\Delta X_1$ and $X_2$ changes by $\Delta X_2$, then the expected change in Y is $\Delta Y = (\beta_1 + \beta_3 X_2)\Delta X_1 + (\beta_2 + \beta_3 X_1)\Delta X_2 + \beta_3 \Delta X_1 \Delta X_2$ [Exercise 8.10(c)]. The first term is the effect from changing $X_1$ holding $X_2$ constant; the second term is the effect from changing $X_2$ holding $X_1$ constant; and the final term, $\beta_3 \Delta X_1 \Delta X_2$, is the extra effect from changing both $X_1$ and $X_2$.

Interactions between two variables are summarized as Key Concept 8.5.

KEY CONCEPT 8.5
INTERACTIONS IN MULTIPLE REGRESSION
The interaction term between the two independent variables $X_1$ and $X_2$ is their product $X_1 \times X_2$. Including this interaction term allows the effect on Y of a change in $X_1$ to depend on the value of $X_2$ and, conversely, allows the effect of a change in $X_2$ to depend on the value of $X_1$. The coefficient on $X_1 \times X_2$ is the effect of a unit increase in $X_1$ and $X_2$, above and beyond the sum of the individual effects of a unit increase in $X_1$ alone and a unit increase in $X_2$ alone. This is true whether $X_1$ and/or $X_2$ are continuous or binary.

When interactions are combined with logarithmic transformations, they can be used to estimate price elasticities when the price elasticity depends on the characteristics of the good (see the box "The Demand for Economic Journals" for an example).

Application to the student-teacher ratio and the percentage of English learners. The previous examples considered interactions between the student-teacher ratio and a binary variable indicating whether the percentage of English learners is large or small. A different way to study this interaction is to examine the interaction between the student-teacher ratio and the continuous variable, the percentage of English learners (PctEL). The estimated interaction regression is

$$\widehat{TestScore} = \underset{(11.8)}{686.3} - \underset{(0.59)}{1.12}\,STR - \underset{(0.37)}{0.67}\,PctEL + \underset{(0.019)}{0.0012}\,(STR \times PctEL), \quad \bar{R}^2 = 0.422. \tag{8.37}$$

When the percentage of English learners is at the median (PctEL = 8.85), the slope of the line relating test scores and the student-teacher ratio is estimated to be −1.11 (= −1.12 + 0.0012 × 8.85). When the percentage of English learners is at the 75th percentile (PctEL = 23.0), this line is estimated to be flatter, with a slope of −1.09 (= −1.12 + 0.0012 × 23.0). That is, for a district with 8.85% English learners, the estimated effect of a unit reduction in the student-teacher ratio is to increase test scores by 1.11 points, but for a district with 23.0% English learners, reducing the student-teacher ratio by one unit is predicted to increase test scores by only 1.09 points. The difference between these estimated effects is not statistically significant, however: The t-statistic testing whether the coefficient on the interaction term is zero is $t = 0.0012/0.019 = 0.06$, which is not significant at the 10% level.
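The slope formula in Equation (8.36) and the three-term decomposition of $\Delta Y$ can both be checked numerically. The sketch below uses the estimates in Equation (8.37) for the slopes; the values passed to the decomposition check are arbitrary illustrative numbers, and all function names are assumptions of this sketch.

```python
# Estimates from Equation (8.37); the standard error is for the interaction term.
B0, B_STR, B_PCTEL, B_INTERACT = 686.3, -1.12, -0.67, 0.0012
SE_INTERACT = 0.019


def regression_function(x1, x2, b0, b1, b2, b3):
    """Interacted regression function of Equation (8.35), without the error term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def str_slope(pct_el):
    """Effect on test scores of a one-unit change in STR at a given PctEL (Eq. 8.36)."""
    return B_STR + B_INTERACT * pct_el


def decomposed_change(x1, x2, dx1, dx2, b1, b2, b3):
    """Own effect of X1, own effect of X2, and the extra joint term."""
    return (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2


print("Slope at median PctEL (8.85):", round(str_slope(8.85), 2))
print("Slope at 75th percentile (23.0):", round(str_slope(23.0), 2))
print("t on interaction:", round(B_INTERACT / SE_INTERACT, 2))

# Arbitrary starting point and changes, chosen only to exercise the identity.
x1, x2, dx1, dx2 = 20.0, 8.85, -2.0, 5.0
direct = (regression_function(x1 + dx1, x2 + dx2, B0, B_STR, B_PCTEL, B_INTERACT)
          - regression_function(x1, x2, B0, B_STR, B_PCTEL, B_INTERACT))
print("Direct change equals decomposition:",
      abs(direct - decomposed_change(x1, x2, dx1, dx2, B_STR, B_PCTEL, B_INTERACT)) < 1e-9)
```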
To keep the discussion focused on nonlinear models, the specifications in Sections 8.1 through 8.3 exclude additional control variables such as the students' economic background. Consequently, these results arguably are subject to omitted variable bias. To draw substantive conclusions about the effect on test scores of reducing the student-teacher ratio, these nonlinear specifications must be augmented with control variables, and it is to such an exercise that we now turn.

The Demand for Economic Journals

Professional economists follow the most recent research in their areas of specialization. Most research in economics first appears in economics journals, so economists, or their libraries, subscribe to economics journals.

How elastic is the demand by libraries for economics journals? To find out, we analyzed the relationship between the number of subscriptions to a journal at U.S. libraries ($Y_i$) and its library subscription price using data for the year 2000 for 180 economics journals. Because the product of a journal is not the paper on which it is printed but rather the ideas it contains, its price is logically measured not in dollars per year or dollars per page but instead in dollars per idea. Although we cannot measure "ideas" directly, a good indirect measure is the number of times that articles in a journal are subsequently cited by other researchers. Accordingly, we measure price as the "price per citation" in the journal. The price range is enormous, from ½¢ per citation (the American Economic Review) to 20¢ per citation or more. Some journals are expensive per citation because they have few citations, others because their library subscription price per year is very high: In 2006, a library subscription to the Journal of Econometrics cost more than $2700, almost 9 times the price of a library subscription to the American Economic Review.

Because we are interested in estimating elasticities, we use a log-log specification (Key Concept 8.2). The scatterplots in Figures 8.9a and 8.9b provide empirical support for this transformation. Because some of the oldest and most prestigious journals are the cheapest per citation, a regression of log quantity against log price could have omitted variable bias. Our regressions therefore include two control variables, the logarithm of age and the logarithm of the number of characters per year in the journal.

The regression results are summarized in Table 8.2. Those results yield the following conclusions (see if you can find the basis for these conclusions in the table!):
1. Demand is less elastic for older than for newer journals.
2. The evidence supports a linear, rather than a cubic, function of log price.
3. Demand is greater for journals with more characters, holding price and age constant.

So what is the elasticity of demand for economics journals? It depends on the age of the journal. Demand curves for an 80-year-old journal and a 5-year-old upstart are superimposed on the scatterplot in Figure 8.9c; the older journal's demand elasticity is −0.28 (SE = 0.06), while the younger journal's is −0.67 (SE = 0.08).

This demand is very inelastic: Demand is very insensitive to price, especially for older journals. For libraries, having the most recent research on hand is a necessity, not a luxury. By way of comparison, experts estimate the demand elasticity for cigarettes to be in the range of −0.3 to −0.5. Economics journals are, it seems, as addictive as cigarettes, but a lot better for your health!¹

¹These data were graciously provided by Professor Theodore Bergstrom of the Department of Economics at the University of California, Santa Barbara. If you are interested in learning more about the economics of economics journals, see Bergstrom (2001).

FIGURE 8.9 Library Subscriptions and Prices of Economics Journals
Panels: (a) Subscriptions and price per citation; (b) ln(Subscriptions) and ln(Price per citation); (c) ln(Subscriptions) and ln(Price per citation), with demand curves when Age = 5 and when Age = 80.
There is a nonlinear inverse relation between the number of U.S. library subscriptions (quantity) and the library price per citation (price), as shown in Figure 8.9a for 180 economics journals in 2000. But as seen in Figure 8.9b, the relation between log quantity and log price appears to be approximately linear. Figure 8.9c shows that demand is more elastic for young journals (Age = 5) than for old journals (Age = 80).

TABLE 8.2 Estimates of the Demand for Economic Journals
Dependent variable: logarithm of subscriptions at U.S. libraries in the year 2000; 180 observations.
Columns: regressions (1) through (4).
Regressors: ln(Price per citation); [ln(Price per citation)]²; [ln(Price per citation)]³; ln(Age); ln(Age) × ln(Price per citation); ln(Characters ÷ 1,000,000); Intercept.
F-statistics and summary statistics: F-statistic testing coefficients on quadratic and cubic terms (p-value); SER; $\bar{R}^2$.
Notes: The F-statistic tests the hypothesis that the coefficients on [ln(Price per citation)]² and [ln(Price per citation)]³ are both zero. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.

8.4 Nonlinear Effects on Test Scores of the Student-Teacher Ratio

This section addresses three specific questions about test scores and the student-teacher ratio. First, after controlling for differences in economic characteristics of different districts, does the effect on test scores of reducing the student-teacher ratio depend on the fraction of English learners? Second, does this effect depend on the value of the student-teacher ratio? Third, and most important, after taking economic factors and nonlinearities into account, what is the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher, as our superintendent from Chapter 4 proposes to do?
We answer these questions by considering nonlinear regression specifications of the type discussed in Sections 8.2 and 8.3, extended to include two measures of the economic background of the students: the percentage of students eligible for a subsidized lunch and the logarithm of average district income. The logarithm of income is used because the empirical analysis of Section 8.2 suggests that this specification captures the nonlinear relationship between test scores and income. As in Section 7.6, we do not include expenditures per pupil as a regressor and in so doing we are considering the effect of decreasing the student-teacher ratio, allowing expenditures per pupil to increase (that is, we are not holding expenditures per pupil constant).

Discussion of Regression Results

The OLS regression results are summarized in Table 8.3. The columns labeled (1) through (7) each report separate regressions. The entries in the table are the coefficients, standard errors, certain F-statistics and their p-values, and summary statistics, as indicated by the description in each row.

The first column of regression results, labeled regression (1) in the table, is regression (3) in Table 7.1, repeated here for convenience. This regression does not control for income, so the first thing we do is check whether the results change substantially when log income is included as an additional economic control variable. The results are given in regression (2) in Table 8.3. The log of income is statistically significant at the 1% level and the coefficient on the student-teacher ratio becomes somewhat closer to zero, falling from −1.00 to −0.73, although it remains statistically significant at the 1% level. The change in the coefficient on STR is large enough between regressions (1) and (2) to warrant including the logarithm of income in the remaining regressions as a deterrent to omitted variable bias.

Regression (3) in Table 8.3 is the interacted regression in Equation (8.34) with the binary variable for a high or low percentage of English learners, but with no economic control variables. When the economic control variables (percentage eligible for subsidized lunch and log income) are added [regression (4) in the table], the coefficients change, but in neither case is the coefficient on the interaction term significant at the 5% level. Based on the evidence in regression (4), the hypothesis that the effect of STR is the same for districts with low and high percentages of English learners cannot be rejected at the 5% level (the t-statistic is $t = -0.58/0.50 = -1.16$).

Regression (5) examines whether the effect of changing the student-teacher ratio depends on the value of the student-teacher ratio by including a cubic specification in STR in addition to the other control variables in regression (4) [the interaction term, HiEL × STR, was dropped because it was not significant in regression (4) at the 10% level].
TABLE 8.3 Nonlinear Regression Models of Test Scores
Dependent variable: average test score in district; 420 observations.
Columns: regressions (1) through (7).
Regressors: Student-teacher ratio (STR); STR²; STR³; % English learners; % English learners ≥ 10% (Binary, HiEL); HiEL × STR; HiEL × STR²; HiEL × STR³; % Eligible for subsidized lunch; Average district income (logarithm); Intercept.
F-statistics and p-values on joint hypotheses: (a) All STR variables and interactions = 0; (b) STR², STR³ = 0; (c) HiEL × STR, HiEL × STR², HiEL × STR³ = 0.
Summary statistics: SER; $\bar{R}^2$.
Notes: These regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% or **1% significance level.
Regression (5) examines whether the effect of changing the student-teacher ratio depends on the value of the student-teacher ratio by including a cubic specification in STR in addition to the other control variables in regression (4) [the interaction term, HiEL x STR, was dropped because it was not significant in regression (4) at the 10% level]. The estimates in regression (5) are consistent with the student-teacher ratio having a nonlinear effect. The null hypothesis that the relationship is linear is rejected at the 1% significance level against the alternative that it is cubic (the F-statistic testing the hypothesis that the true coefficients on $STR^2$ and $STR^3$ are zero is 6.17, with a p-value of < 0.001).

Regression (6) further examines whether the effect of the student-teacher ratio depends not just on the value of the student-teacher ratio but also on the fraction of English learners. By including interactions between HiEL and STR, $STR^2$, and $STR^3$, we can check whether the (possibly cubic) population regression functions relating test scores and STR are different for low and high percentages of English learners. To do so, we test the restriction that the coefficients on the three interaction terms are zero. The resulting F-statistic is 2.69, which has a p-value of 0.046 and thus is significant at the 5% but not the 1% significance level. This provides some evidence that the regression functions are different for districts with high and low percentages of English learners; however, comparing regressions (6) and (4) makes it clear that these differences are associated with the quadratic and cubic terms.

Regression (7) is a modification of regression (5), in which the continuous variable PctEL is used instead of the binary variable HiEL to control for the percentage of English learners in the district. The coefficients on the other regressors do not change substantially when this modification is made, indicating that the results in regression (5) are not sensitive to what measure of the percentage of English learners is actually used in the regression.

In all the specifications, the hypothesis that the student-teacher ratio does not enter the regressions is rejected at the 1% level.

The nonlinear specifications in Table 8.3 are most easily interpreted graphically. Figure 8.10 graphs the estimated regression functions relating test scores and the student-teacher ratio for the linear specification (2) and the cubic specifications (5) and (7), along with a scatterplot of the data.[4] These estimated regression functions show the predicted value of test scores as a function of the student-teacher ratio, holding fixed other values of the independent variables in the regression. The estimated regression functions are all close to each other, although the cubic regressions flatten out for large values of the student-teacher ratio.

[4] For each curve, the predicted value was computed by setting each independent variable, other than STR, to its sample average value and computing the predicted value by multiplying these fixed values of the independent variables by the respective estimated coefficients from Table 8.3. This was done for various values of STR, and the graph of the resulting adjusted predicted values is the estimated regression line relating test scores and the STR, holding the other variables constant at their sample averages.
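The calculation described in footnote 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficient values, the sample averages, and the regressor names (`HiEL`, `LunchPct`, `LogIncome`) are hypothetical placeholders, not the estimates in Table 8.3. The sketch evaluates a cubic regression function at several values of STR while holding every other regressor at its sample average.

```python
# Sketch of the "adjusted predicted value" calculation described in footnote 4.
# All coefficients and sample averages below are made-up illustrative numbers,
# NOT the estimates reported in Table 8.3.

def adjusted_prediction(str_value, coefs, other_means):
    """Predicted test score at a given STR, other regressors fixed at their means."""
    # Cubic part in the student-teacher ratio
    cubic = (coefs["STR"] * str_value
             + coefs["STR2"] * str_value ** 2
             + coefs["STR3"] * str_value ** 3)
    # Contribution of the other regressors, each held at its sample average
    controls = sum(coefs[name] * mean for name, mean in other_means.items())
    return coefs["Intercept"] + cubic + controls

# Hypothetical coefficients (assumption, for illustration only)
coefs = {
    "Intercept": 250.0,
    "STR": 60.0, "STR2": -3.2, "STR3": 0.055,
    "HiEL": -5.0, "LunchPct": -0.4, "LogIncome": 12.0,
}
# Hypothetical sample averages of the regressors other than STR
other_means = {"HiEL": 0.4, "LunchPct": 45.0, "LogIncome": 2.6}

# Evaluate the curve on a grid of student-teacher ratios
curve = [(s, adjusted_prediction(s, coefs, other_means)) for s in range(14, 27, 2)]
for s, y in curve:
    print(f"STR = {s:2d}  adjusted predicted score = {y:.2f}")
```

Plotting the resulting pairs against STR traces out a curve of the kind shown in Figure 8.10. Changing the sample averages shifts the curve up or down without changing its shape, because the other regressors enter only through a constant.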
FIGURE 8.10 Three Regression Functions Relating Test Scores and Student-Teacher Ratio
The cubic regressions from columns (5) and (7) of Table 8.3 are nearly identical. They indicate a small amount of nonlinearity in the relation between test scores and student-teacher ratio. [Plot of test score against student-teacher ratio, showing linear regression (2), cubic regression (5), and cubic regression (7) over a scatterplot of the data.]

Regression (6) indicates a statistically significant difference in the cubic regression functions relating test scores and STR, depending on whether the percentage of English learners in the district is large or small. Figure 8.11 graphs these two estimated regression functions so that we can see whether this difference, in addition to being statistically significant, is of practical importance. As Figure 8.11 shows, for student-teacher ratios between 17 and 23, a range that includes 88% of the observations, the two functions are separated by approximately ten points but otherwise are very similar; that is, for STR between 17 and 23, districts with a lower percentage of English learners do better, holding constant the student-teacher ratio, but the effect of a change in the student-teacher ratio is essentially the same for the two groups. The two regression functions are different for student-teacher ratios below 16.5, but we must be careful not to read more into this than is justified. The districts with STR < 16.5 constitute only 6% of the observations, so the differences between the nonlinear regression functions are reflecting differences in these very few districts with very low student-teacher ratios. Thus, based on Figure 8.11, we conclude that the effect on test scores of a change in the student-teacher ratio does not depend on the percentage of English learners for the range of student-teacher ratios for which we have the most data.

FIGURE 8.11 Regression Functions for Districts with High and Low Percentages of English Learners
Districts with low percentages of English learners (HiEL = 0) are shown by gray dots and districts with HiEL = 1 are shown by colored dots. The cubic regression function for HiEL = 1 from regression (6) in Table 8.3 is approximately 10 points below the cubic regression function for HiEL = 0 for $17 \le STR \le 23$, but otherwise the two functions have similar shapes and slopes in this range. The slopes of the regression functions differ most for very large and small values of STR, for which there are few observations. [Plot of test score against student-teacher ratio.]

Summary of Findings

These results let us answer the three questions raised at the start of this section.

First, after controlling for economic background, whether there are many or few English learners in the district does not have a substantial influence on the effect on test scores of a change in the student-teacher ratio. In the linear specifications, there is no statistically significant evidence of such a difference. The cubic specification in regression (6) provides statistically significant evidence (at the 5% level) that the regression functions are different for districts with high and low percentages of English learners; as shown in Figure 8.11, however, the estimated regression functions have similar slopes in the range of student-teacher ratios containing most of our data.

Second, after controlling for economic background, there is evidence of a nonlinear effect on test scores of the student-teacher ratio. This effect is statistically significant at the 1% level (the coefficients on $STR^2$ and $STR^3$ are always significant at the 1% level).

Third, we now can return to the superintendent's problem that opened Chapter 4. She wants to know the effect on test scores of reducing the student-teacher ratio by two students per teacher. In the linear specification (2), this effect does not depend on the student-teacher ratio itself, and the estimated effect of this reduction is to improve test scores by 1.46 ($= -0.73 \times -2$) points. In the nonlinear specifications, this effect depends on the value of the student-teacher ratio. If her district currently has a student-teacher ratio of 20, and she is considering cutting it to 18, then based on regression (5) the estimated effect of this reduction is to improve test scores by 3.00 points, while based on regression (7) this estimate is 2.93. If her district currently has a student-teacher ratio of 22, and she is considering cutting it to 20, then based on regression (5) the estimated effect of this reduction is to improve test scores by 1.93 points, while based on regression (7) this estimate is 1.90. The estimates from the nonlinear specifications suggest that cutting the student-teacher ratio has a somewhat greater effect if this ratio is already small.

8.5 Conclusion

This chapter presented several ways to model nonlinear regression functions. Because these models are variants of the multiple regression model, the unknown coefficients can be estimated by OLS, and hypotheses about their values can be tested using t- and F-statistics as described in Chapter 7. In these models, the expected effect on Y of a change in one of the independent variables, $X_1$, holding the other independent variables $X_2, \ldots, X_k$ constant, in general depends on the values of $X_1, X_2, \ldots, X_k$.

There are many different models in this chapter, and you could not be blamed for being a bit bewildered about which to use in a given application. How should you analyze possible nonlinearities in practice? Section 8.1 laid out a general approach for such an analysis, but this approach requires you to make decisions and exercise judgment along the way. It would be convenient if there were a single recipe you could follow that would always work in every application, but in practice data analysis is rarely that simple.

The single most important step in specifying nonlinear regression functions is to "use your head." Before you look at the data, can you think of a reason, based on economic theory or expert judgment, why the slope of the population regression function might depend on the value of that, or another, independent variable? If so, what sort of dependence might you expect? And, most importantly, which nonlinearities (if any) could have major implications for the substantive issues addressed by your study? Answering these questions carefully will focus your analysis. In the test score application, for example, such reasoning led us to investigate whether hiring more teachers might have a greater effect in districts with a large percentage of students still learning English, perhaps because those students would differentially benefit from more personal attention. By making the question precise, we were able to find a precise answer: After controlling for the economic background of the students, we found no statistically significant evidence of such an interaction.
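The point that the effect of a change in X depends on where the change starts can be illustrated by evaluating a regression function at two values of the regressor, in the spirit of Key Concept 8.1. In the sketch below, the slope -0.73 is the STR coefficient quoted for linear specification (2); the cubic coefficients are hypothetical values chosen only for illustration and are not the estimates from regressions (5) or (7).

```python
# Effect on the predicted test score of changing the student-teacher ratio,
# computed by evaluating the regression function at two values of STR.
# Terms that do not involve STR cancel in the difference, so they are omitted.

def linear_part(str_value, slope=-0.73):
    # -0.73 is the STR coefficient quoted in the text for linear specification (2)
    return slope * str_value

def cubic_part(str_value, b1=60.0, b2=-3.2, b3=0.055):
    # b1, b2, b3 are HYPOTHETICAL coefficients, used only to illustrate the method
    return b1 * str_value + b2 * str_value ** 2 + b3 * str_value ** 3

def effect_of_change(regression_part, old_str, new_str):
    """Change in the predicted value when STR moves from old_str to new_str."""
    return regression_part(new_str) - regression_part(old_str)

# Same two-student reduction, starting from two different ratios
linear_20_to_18 = effect_of_change(linear_part, 20, 18)
linear_22_to_20 = effect_of_change(linear_part, 22, 20)
cubic_20_to_18 = effect_of_change(cubic_part, 20, 18)
cubic_22_to_20 = effect_of_change(cubic_part, 22, 20)

print(f"linear: 20 to 18 gives {linear_20_to_18:.2f}, 22 to 20 gives {linear_22_to_20:.2f}")
print(f"cubic:  20 to 18 gives {cubic_20_to_18:.2f}, 22 to 20 gives {cubic_22_to_20:.2f}")
```

With the linear function the two effects coincide, whereas with the cubic function the effect of the same two-student reduction differs with the starting ratio.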
Summary

1. In a nonlinear regression, the slope of the population regression function depends on the value of one or more of the independent variables.
2. The effect on Y of a change in the independent variable(s) can be computed by evaluating the regression function at two values of the independent variable(s). The procedure is summarized in Key Concept 8.1.
3. A polynomial regression includes powers of X as regressors. A quadratic regression includes $X$ and $X^2$, and a cubic regression includes $X$, $X^2$, and $X^3$.
4. Small changes in logarithms can be interpreted as proportional or percentage changes in a variable. Regressions involving logarithms are used to estimate proportional changes and elasticities.
5. The product of two variables is called an interaction term. When interaction terms are included as regressors, they allow the regression slope of one variable to depend on the value of another variable.

Key Terms

quadratic regression model
nonlinear regression function
polynomial regression model
cubic regression model
elasticity
exponential function
natural logarithm
linear-log model
log-linear model
log-log model
interaction term
interacted regressor
interaction regression model
nonlinear least squares
nonlinear least squares estimators

Review the Concepts

8.1 Sketch a regression function that is increasing (has a positive slope) and is steep for small values of X but less steep for large values of X. Explain how you would specify a nonlinear regression to model this shape. Can you think of an economic relationship with a shape like this?

8.2 A "Cobb-Douglas" production function relates production (Q) to factors of production, capital (K), labor (L), and raw materials (M), and an error term u using the equation $Q = A K^{\beta_1} L^{\beta_2} M^{\beta_3} e^{u}$, where $A$, $\beta_1$, $\beta_2$, and $\beta_3$ are production parameters. Suppose you have data on production and the factors of production from a random sample of firms with the same Cobb-Douglas production function. How would you use regression analysis to estimate the production parameters?

8.3 A standard "money demand" function used by macroeconomists has the form $\ln(m) = \beta_0 + \beta_1 \ln(GDP) + \beta_2 R$, where m is the quantity of (real) money, GDP is the value of (real) gross domestic product, and R is the value of the nominal interest rate measured in percent per year. Suppose that $\beta_1 = 1.0$ and $\beta_2 = -0.02$. What will happen to the value of m if GDP increases by 2%? What will happen to m if the interest rate increases from 4% to 5%?

8.4 You have estimated a linear regression model relating Y to X. Your professor says, "I think that the relationship between Y and X is nonlinear." Explain how you would test the adequacy of your linear regression.

8.5 Suppose that in problem 8.2 you thought that the value of $\beta_2$ was not constant, but rather increased when K increased. How could you use an interaction term to capture this effect?

Exercises

8.1 Sales in a company are $196 million in 2001 and increase to $198 million in 2002.
a. Compute the percentage increase in sales using the usual formula $100 \times (Sales_{2002} - Sales_{2001})/Sales_{2001}$. Compare this value to the approximation $100 \times [\ln(Sales_{2002}) - \ln(Sales_{2001})]$.
b. Repeat (a) assuming $Sales_{2002} = 205$; $Sales_{2002} = 250$; $Sales_{2002} = 500$.
c. How good is the approximation when the change is small? Does the quality of the approximation deteriorate as the percentage change increases?

8.2 Suppose that a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.
a. Using the results in column (1), what is the expected change in price of building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.
Regression Results for Exercise 8.2
Dependent variable: ln(Price)
Each row lists, by regression column, the coefficient followed by its standard error in parentheses.

Size: (1) 0.00042 (0.000038)
ln(Size): (2) 0.69 (0.054); (3) 0.68 (0.087); (4) 0.57 (2.03); (5) 0.69 (0.055)
$[\ln(Size)]^2$: (4) 0.0078 (0.14)
Bedrooms: (3) 0.0036 (0.037)
Pool: (1) 0.082 (0.032); (2) 0.071 (0.034); (3) 0.071 (0.034); (4) 0.071 (0.036); (5) 0.071 (0.035)
View: (1) 0.037 (0.029); (2) 0.027 (0.028); (3) 0.026 (0.026); (4) 0.027 (0.029); (5) 0.027 (0.030)
Pool x View: (5) 0.0022 (0.10)
Condition: (1) 0.13 (0.045); (2) 0.12 (0.035); (3) 0.12 (0.035); (4) 0.12 (0.036); (5) 0.12 (0.035)
Intercept: (1) 10.97 (0.069); (2) 6.60 (0.39); (3) 6.63 (0.53); (4) 7.02 (7.50); (5) 6.60 (0.40)

Summary Statistics
SER: (1) 0.102; (2) 0.098; (3) 0.099; (4) 0.099; (5) 0.099
$\bar{R}^2$: (1) 0.72; (2) 0.74; (3) 0.73; (4) 0.73; (5) 0.73

Variable definitions: Price = sale price ($); Size = house size (in square feet); Bedrooms = number of bedrooms; Pool = binary variable (1 if house has a swimming pool, 0 otherwise); View = binary variable (1 if house has a nice view, 0 otherwise); Condition = binary variable (1 if realtor reports house is in excellent condition, 0 otherwise).

b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?
c. Using column (2), what is the estimated effect of pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.
d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)
e. Is the quadratic term $[\ln(Size)]^2$ important?
f. Use the regression in column (5) to compute the expected change in price when a pool is added to a house without a view. Repeat the exercise for a house with a view. Is there a large difference? Is the difference statistically significant?

8.3 After reading this chapter's analysis of test scores and class size, an educator comments, "In my experience, student performance depends on class size, but not in the way your regressions say. Rather, students do well when class size is less than 20 students and do very poorly when class size is greater than 25. There are no gains from reducing class size below 20 students, the relationship is constant in the intermediate region between 20 and 25 students, and there is no loss to increasing class size when it is already greater than 25." The educator is describing a "threshold effect" in which performance is constant for class sizes less than 20, then jumps and is constant for class sizes between 20 and 25, and then jumps again for class sizes greater than 25. To model these threshold effects, define the binary variables
STRsmall = 1 if STR < 20, and STRsmall = 0 otherwise;
STRmoderate = 1 if $20 \le STR \le 25$, and STRmoderate = 0 otherwise; and
STRlarge = 1 if STR > 25, and STRlarge = 0 otherwise.
a. Consider the regression $TestScore_i = \beta_0 + \beta_1 STRsmall_i + \beta_2 STRlarge_i + u_i$. Sketch the regression function relating TestScore to STR for hypothetical values of the regression coefficients that are consistent with the educator's statement.
b. A researcher tries to estimate the regression $TestScore_i = \beta_0 + \beta_1 STRsmall_i + \beta_2 STRmoderate_i + \beta_3 STRlarge_i + u_i$ and finds that her computer crashes. Why?

8.4 Read the box "The Returns to Education and the Gender Gap" in Section 8.3.
a. Consider a man with 16 years of education and 2 years of experience who is from a western state. Use the results from column (4) of Table 8.1 and the method in Key Concept 8.1 to estimate the expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience.
b. Repeat (a) assuming 10 years of experience.
c. Explain why the answers to (a) and (b) are different.
d. Is the difference in the answers to (a) and (b) statistically significant at the 5% level? Explain.
e. Would your answers to (a)-(d) change if the person was a woman? From the South? Explain.
f. How would you change the regression if you suspected that the effect of experience on earnings was different for men than for women?

8.5 Read the box "The Demand for Economics Journals" in Section 8.3.
a. The box reaches three conclusions. Looking at the results in the table, what is the basis for each of these conclusions?
b. Using the results in regression (4), the box reports that the elasticity of demand for an 80-year-old journal is -0.28.
   i. How was this value determined from the estimated regression?
   ii. The box reports that the standard error for the estimated elasticity is 0.06. How would you calculate this standard error? (Hint: See the discussion "Standard errors of estimated effects" below Key Concept 8.1.)
c. Suppose that the variable Characters had been divided by 1,000 instead of 1,000,000. How would the results in column (4) change?

8.6 Refer to Table 8.3.
a. A researcher suspects that the effect of %Eligible for subsidized lunch has a nonlinear effect on test scores. In particular, he conjectures that increases in this variable from 10% to 20% have little effect on test scores but that changes from 50% to 60% have a much larger effect.
   i. Describe a nonlinear specification that can be used to model this form of nonlinearity.
   ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?
b. A researcher suspects that the effect of income on test scores is different in districts with small classes than in districts with large classes.
   i. Describe a nonlinear specification that can be used to model this form of nonlinearity.
   ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?

8.7 This problem is inspired by a study of the "gender gap" in earnings in top corporate jobs [Bertrand and Hallock (2001)]. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.)
a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings onto Female yields
   $\widehat{\ln(Earnings)} = 6.48 - 0.44\,Female$, SER = 2.65,
   with standard errors 0.01 (intercept) and 0.05 (Female).
   i. The estimated coefficient on Female is -0.44. Explain what this value means.
   ii. The SER is 2.65. Explain what this value means.
   iii. Does this regression suggest that female top executives earn less than top male executives? Explain.
   iv. Does this regression suggest that there is gender discrimination? Explain.
b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:
   $\widehat{\ln(Earnings)} = 3.86 - 0.28\,Female + 0.37\,\ln(MarketValue) + 0.004\,Return$, $n = 46{,}670$, $\bar{R}^2 = 0.345$,
   with standard errors 0.03 (intercept), 0.04 (Female), 0.004 (ln(MarketValue)), and 0.003 (Return).
   i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.
   ii. The coefficient on Female is now -0.28. Explain why it has changed from the regression in (a).
c. Are large firms more likely to have female top executives than small firms? Explain.

8.8 X is a continuous variable that takes on values between 5 and 100. Z is a binary variable. Sketch the following regression functions (with values of X between 5 and 100 on the horizontal axis and values of $\hat{Y}$ on the vertical axis):
a. $\hat{Y} = 2.0 + 3.0 \times \ln(X)$.
b. $\hat{Y} = 2.0 - 3.0 \times \ln(X)$.
c. i. $\hat{Y} = 2.0 + 3.0 \times \ln(X) + 4.0Z$, with Z = 1.
   ii. Same as (i), but with Z = 0.
d. i. $\hat{Y} = 2.0 + 3.0 \times \ln(X) + 4.0Z - 1.0 \times Z \times \ln(X)$, with Z = 1.
   ii. Same as (i), but with Z = 0.
e. $\hat{Y} = 1.0 + 125.0X - 0.01X^2$.

8.9 Explain how you would use "Approach #1" of Section 7.3 to calculate the confidence interval discussed below Equation (8.8). [Hint: This requires estimating a new regression using a different definition of the regressors and the dependent variable. See Exercise (7.9).]

8.10 Consider the regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$. Use Key Concept 8.1 to show:
a. $\Delta Y / \Delta X_1 = \beta_1 + \beta_3 X_2$ (effect of change in $X_1$ holding $X_2$ constant).
b. $\Delta Y / \Delta X_2 = \beta_2 + \beta_3 X_1$ (effect of change in $X_2$ holding $X_1$ constant).
c. If $X_1$ changes by $\Delta X_1$ and $X_2$ changes by $\Delta X_2$, then $\Delta Y = (\beta_1 + \beta_3 X_2)\Delta X_1 + (\beta_2 + \beta_3 X_1)\Delta X_2 + \beta_3 \Delta X_1 \Delta X_2$.

Empirical Exercises

E8.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Run a regression of average hourly earnings (AHE) on age (Age), gender (Female), and education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, $Age^2$, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
e. Do you prefer the regression in (c) to the regression in (b)? Explain.
f. Do you prefer the regression in (d) to the regression in (b)? Explain.
g. Do you prefer the regression in (d) to the regression in (c)? Explain.
h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a high school diploma. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for females with college degrees?
i. Run a regression of ln(AHE) on Age, $Age^2$, Female, Bachelor, and the interaction term Female x Bachelor. What does the coefficient on the interaction term measure? Alexis is a 30-year-old female with a bachelor's degree. What does the regression predict for her value of ln(AHE)? Jane is a 30-year-old female with a high school degree. What does the regression predict for her value of ln(AHE)? What is the predicted difference between Alexis's and Jane's earnings? Bob is a 30-year-old male with a bachelor's degree. What does the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school degree. What does the regression predict for his value of ln(AHE)? What is the predicted difference between Bob's and Jim's earnings?
j. Is the effect of Age on earnings different for males than for females? Specify and estimate a regression that you can use to answer this question.
k. Is the effect of Age on earnings different for high school graduates than college graduates? Specify and estimate a regression that you can use to answer this question.
l. After running all of these regressions (and any others that you want to run), summarize the effect of age on earnings for young workers.

E8.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Estimate a regression of Course_Eval on Beauty, Intro, OneCredit, Female, Minority, and NNEnglish.
b. Add Age and $Age^2$ to the regression. Is there evidence that Age has a nonlinear effect on Course_Eval? Is there evidence that Age has any effect on Course_Eval?
c. Modify the regression in (a) so that the effect of Beauty on Course_Eval is different for men and women. Is the male-female difference in the effect of Beauty statistically significant?
d. Professor Smith is a man. He has cosmetic surgery that increases his beauty index from one standard deviation below the average to one standard deviation above the average. What is his value of Beauty before the surgery? After the surgery? Using the regression in (c), construct a 95% confidence interval for the increase in his course evaluation.
e. Repeat (d) for Professor Jones, who is a woman.

E8.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. Run a regression of ED on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (that is, from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (that is, from 60 to 70 miles), how are years of education expected to change?
b. Run a regression of ln(ED) on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?
c. Run a regression of ED on Dist, $Dist^2$, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?
d. Do you prefer the regression in (c) to the regression in (a)? Explain.
e. Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg80 = $10.06.
   i. Plot the regression relation between Dist and ED from (a) and (c) for Dist in the range of 0 to 10 (from 0 to 100 miles). Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?
   ii. How does the regression function (c) behave for Dist > 10? How many observations are there with Dist > 10?
f. Add the interaction term DadColl x MomColl to the regression in (c). What does the coefficient on the interaction term measure?
g. Mary, Jane, Alexis, and Bonnie have the same values of Dist, Bytest, Tuition, Female, Black, Hispanic, Fincome, Ownhome, Cue80, and Stwmfg80. Neither of Mary's parents attended college. Jane's father attended college, but her mother did not. Alexis's mother attended college, but her father did not. Both of Bonnie's parents attended college. Using the regressions from (f):
   i. What does the regression predict for the difference between Jane's and Mary's years of education?
   ii. What does the regression predict for the difference between Alexis's and Mary's years of education?
   iii. What does the regression predict for the difference between Bonnie's and Mary's years of education?
h. Is there any evidence that the effect of Dist on ED depends on the family's income?
i. After running all of these regressions (and any others that you want to run), summarize the effect of Dist on years of education.

E8.4 Using the data set Growth described in Empirical Exercise 4.4, excluding the data for Malta, run the following five regressions: Growth on (1) TradeShare and YearsSchool; (2) TradeShare and ln(YearsSchool); (3) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60); (4) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, ln(RGDP60), and TradeShare x ln(YearsSchool); and (5) TradeShare, $TradeShare^2$, $TradeShare^3$, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60).
a. Construct a scatterplot of Growth on YearsSchool. Does the relationship look linear or nonlinear? Explain. Use the plot to explain why regression (2) fits better than regression (1).
b. In 1960, a country contemplates an education policy that will increase average years of schooling from 4 years to 6 years. Use regression (1) to predict the increase in Growth. Use regression (2) to predict the increase in Growth.
c. Test whether the coefficients on Assassinations and Rev_Coups are equal to zero using regression (3).
d. Using regression (4), is there evidence that the effect of TradeShare on Growth depends on the level of education in the country?
e. Using regression (5), is there evidence of a nonlinear relationship between TradeShare and Growth?
f. In 1960, a country contemplates a trade policy that will increase the average value of TradeShare from 0.5 to 1. Use regression (3) to predict the increase in Growth. Use regression (5) to predict the increase in Growth.

APPENDIX 8.1 Regression Functions That Are Nonlinear in the Parameters

The nonlinear regression functions considered in Sections 8.2 and 8.3 are nonlinear functions of the X's but are linear functions of the unknown parameters. Because they are linear in the unknown parameters, those parameters can be estimated by OLS after defining new regressors that are nonlinear transformations of the original X's. This family of nonlinear regression functions is both rich and convenient to use. In some applications, however, economic reasoning leads to regression functions that are not linear in the parameters. Although such regression functions cannot be estimated by OLS, they can be estimated using an extension of OLS called nonlinear least squares.

Functions That Are Nonlinear in the Parameters

We begin with two examples of functions that are nonlinear in the parameters. We then provide a general formulation.

Logistic curve. Suppose you are studying the market penetration of a technology, for example the adoption of database management software in different industries. The dependent variable is the fraction of firms in the industry that have adopted the software,
a single independent variable X describes an industry characteristic, and you have data on n industries. The dependent variable is between 0 (no adopters) and 1 (100% adoption). Because a linear regression model could produce predicted values less than 0 or greater than 1, it makes sense to use instead a function that produces predicted values between 0 and 1.

The logistic function smoothly increases from a minimum of 0 to a maximum of 1. The logistic regression model with a single X is

$$Y_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}} + u_i. \tag{8.38}$$

The logistic function with a single X is graphed in Figure 8.12a. As can be seen in the graph, the logistic function has an elongated "S" shape. For small values of X, the value of the function is nearly 0 and the slope is flat; the curve is steeper for moderate values of X; and for large values of X, the function approaches 1 and the slope is flat again.

Negative exponential growth. The functions used in Section 8.2 to model the relation between test scores and income have some deficiencies. For example, the polynomial models can produce a negative slope for some values of income, which is implausible. The logarithmic specification has a positive slope for all values of income; however, as income gets very large, the predicted values increase without bound, so for some incomes the predicted value for a district will exceed the maximum possible score on the test.

The negative exponential growth model provides a nonlinear specification that has a positive slope for all values of income, has a slope that is greatest at low values of income and decreases as income rises, and has an upper bound (that is, an asymptote as income increases to infinity). The negative exponential growth regression model is

$$Y_i = \beta_0\left[1 - e^{-\beta_1 (X_i - \beta_2)}\right] + u_i. \tag{8.39}$$

The negative exponential growth function is graphed in Figure 8.12b. The slope is steep for low values of X, but as X increases it reaches an asymptote of $\beta_0$.

General functions that are nonlinear in the parameters. The logistic and negative exponential growth regression models are special cases of the general nonlinear regression model

$$Y_i = f(X_{1i}, \ldots, X_{ki}; \beta_0, \ldots, \beta_m) + u_i, \tag{8.40}$$

in which there are k independent variables and m + 1 parameters, $\beta_0, \ldots, \beta_m$. In the models of Sections 8.2 and 8.3, the X's entered this function nonlinearly, but the parameters entered linearly. In the examples of this appendix, the parameters enter nonlinearly as well.
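To make the shapes of Equations (8.38) and (8.39) concrete, the following minimal Python sketch evaluates the two population regression functions (without the error term). The function names and the parameter values in the usage lines are our own illustrative choices, not estimates from the text.

```python
import math

def logistic(x, beta0, beta1):
    """Logistic regression function of Equation (8.38), without the error term."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def neg_exp_growth(x, beta0, beta1, beta2):
    """Negative exponential growth function of Equation (8.39), without the error term."""
    return beta0 * (1.0 - math.exp(-beta1 * (x - beta2)))

# Illustrative (hypothetical) parameter values, chosen only to show the shapes.
for x in (-4, -2, 0, 2, 4):
    # logistic values stay strictly between 0 and 1 and trace out an "S" shape
    print("logistic", x, round(logistic(x, 0.0, 1.0), 3))
for x in (0, 10, 20, 40, 80):
    # values rise steeply at first, then flatten toward the asymptote beta0 = 100
    print("neg_exp", x, round(neg_exp_growth(x, 100.0, 0.05, 0.0), 1))
```

With a positive $\beta_1$, the first function never leaves the interval (0, 1), and the second never exceeds $\beta_0$; these are the two properties that motivate the specifications.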
If the parameters are known, then predicted effects may be computed using the method described in Section 8.1. In applications, however, the parameters are unknown and must be estimated from the data. Parameters that enter nonlinearly cannot be estimated by OLS, but they can be estimated by nonlinear least squares.

FIGURE 8.12 Two Functions That Are Nonlinear in Their Parameters
(a) A logistic curve. (b) A negative exponential growth curve.
Part (a) plots the logistic function of Equation (8.38), which has predicted values that lie between 0 and 1. Part (b) plots the negative exponential growth function of Equation (8.39), which has a slope that is always positive and decreases as X increases, and an asymptote of $\beta_0$ as X tends to infinity.

Nonlinear Least Squares Estimation

Nonlinear least squares is a general method for estimating the unknown parameters of a regression function when those parameters enter the population regression function nonlinearly.

Recall the discussion in Section 5.3 of the OLS estimator of the coefficients of the linear multiple regression model. The OLS estimator minimizes the sum of squared prediction mistakes in Equation (5.8), $\sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_{1i} + \cdots + b_k X_{ki})]^2$. In principle, the OLS estimator can be computed by checking many trial values of $b_0, \ldots, b_k$ and settling on the values that minimize the sum of squared mistakes.

This same approach can be used to estimate the parameters of the general nonlinear regression model in Equation (8.40). Because the regression model is nonlinear in the coefficients, this method is called nonlinear least squares. For a set of trial parameter values $b_0, b_1, \ldots, b_m$, construct the sum of squared prediction mistakes:

$$\sum_{i=1}^{n} \left[Y_i - f(X_{1i}, \ldots, X_{ki}; b_0, \ldots, b_m)\right]^2. \tag{8.41}$$
The nonlinear least squares estimators of $\beta_0, \beta_1, \ldots, \beta_m$ are the values of $b_0, b_1, \ldots, b_m$ that minimize the sum of squared prediction mistakes in Equation (8.41).

In linear regression, a relatively simple formula expresses the OLS estimator as a function of the data. Unfortunately, no such general formula exists for nonlinear least squares, so the nonlinear least squares estimator must be found numerically using a computer. Regression software incorporates algorithms for solving the nonlinear least squares minimization problem, which simplifies the task of computing the nonlinear least squares estimator in practice.

Under general conditions on the function f and the X's, the nonlinear least squares estimator shares two key properties with the OLS estimator in the linear regression model: It is consistent and it is normally distributed in large samples. In regression software that supports nonlinear least squares estimation, the output typically reports standard errors for the estimated parameters. As a consequence, inference concerning the parameters can proceed as usual; in particular, t-statistics can be constructed using the general approach in Key Concept 5.1, and a 95% confidence interval can be constructed as the estimated coefficient, plus or minus 1.96 standard errors. Just as in linear regression, the error term in the nonlinear regression model can be heteroskedastic, so heteroskedasticity-robust standard errors should be used.

Application to the Test Score-Income Relation

A negative exponential growth model, fit to district income (X) and test scores (Y), has the desirable features of a slope that is always positive [if $\beta_1$ in Equation (8.39) is positive] and an asymptote of $\beta_0$ as income increases to infinity. The result of estimating $\beta_0$, $\beta_1$, and $\beta_2$ in Equation (8.39) using the California test score data yields $\hat\beta_0 = 703.2$ (heteroskedasticity-robust standard error = 4.44), $\hat\beta_1 = 0.0552$ (SE = 0.0068), and $\hat\beta_2 = -34.0$ (SE = 4.48). Thus the estimated nonlinear regression function (with standard errors reported below the parameter estimates) is

$$\widehat{TestScore} = \underset{(4.44)}{703.2}\left[1 - e^{-\underset{(0.0068)}{0.0552}\,(Income + \underset{(4.48)}{34.0})}\right]. \tag{8.42}$$

This estimated regression function is plotted in Figure 8.13, along with the logarithmic regression function and a scatterplot of the data. The two specifications are, in this case, quite similar. One difference is that the negative exponential growth curve flattens out at the highest levels of income, consistent with having an asymptote.
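The idea of minimizing Equation (8.41) by checking trial parameter values can be sketched in a few lines of Python. This is only an illustration of the principle, under several assumptions of our own: the data are synthetic and noise-free, the model is a two-parameter version of Equation (8.39) with $\beta_2$ fixed at 0, and the search is a crude grid that is repeatedly narrowed around the best trial values. Regression software uses far more efficient algorithms and also reports standard errors, which this sketch does not.

```python
import math

def neg_exp(x, b0, b1):
    """Negative exponential growth curve with beta2 fixed at 0 (a simplification)."""
    return b0 * (1.0 - math.exp(-b1 * x))

def ssr(xs, ys, b0, b1):
    """Sum of squared prediction mistakes, as in Equation (8.41)."""
    return sum((y - neg_exp(x, b0, b1)) ** 2 for x, y in zip(xs, ys))

def grid_nls(xs, ys, b0_range, b1_range, steps=40, rounds=8):
    """Crude nonlinear least squares: evaluate the SSR on a grid of trial
    values, then shrink the grid around the best trial values and repeat."""
    (lo0, hi0), (lo1, hi1) = b0_range, b1_range
    best = None  # (ssr, b0, b1) for the best trial values seen so far
    for _ in range(rounds):
        for i in range(steps + 1):
            b0 = lo0 + (hi0 - lo0) * i / steps
            for j in range(steps + 1):
                b1 = lo1 + (hi1 - lo1) * j / steps
                s = ssr(xs, ys, b0, b1)
                if best is None or s < best[0]:
                    best = (s, b0, b1)
        # halve the width of the search box, centered on the current best point
        w0, w1 = (hi0 - lo0) / 4, (hi1 - lo1) / 4
        lo0, hi0 = best[1] - w0, best[1] + w0
        lo1, hi1 = best[2] - w1, best[2] + w1
    return best[1], best[2], best[0]

# Synthetic, noise-free data generated from hypothetical values b0 = 700, b1 = 0.05.
xs = [5.0 * k for k in range(1, 13)]
ys = [neg_exp(x, 700.0, 0.05) for x in xs]
b0_hat, b1_hat, ssr_min = grid_nls(xs, ys, (500.0, 900.0), (0.01, 0.2))
print(b0_hat, b1_hat, ssr_min)
```

Because the data here contain no error term, the minimized sum of squared mistakes should be close to zero and the trial values that achieve it close to the values used to generate the data.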
FIGURE 8.13 The Negative Exponential Growth and Linear-Log Regression Functions
[Scatterplot of test score (vertical axis) against district income (horizontal axis), with the negative exponential growth and linear-log regression functions.]
The negative exponential growth regression function [Equation (8.42)] and the linear-log regression function [Equation (8.18)] both capture the nonlinear relation between test scores and district income. One difference between the two functions is that the negative exponential growth model has an asymptote as income increases to infinity, but the linear-log regression function does not.
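As a rough check on the properties claimed for the estimated function in Equation (8.42), the sketch below plugs in the reported estimates (703.2, 0.0552, and -34.0) and evaluates the predicted test score and its slope at a few income levels. The income values are arbitrary illustrations in whatever units the data set uses for district income, and the helper names are ours.

```python
import math

# Parameter estimates reported in Equation (8.42)
B0, B1, B2 = 703.2, 0.0552, -34.0

def predicted_test_score(income):
    """Predicted test score from the estimated negative exponential growth function."""
    return B0 * (1.0 - math.exp(-B1 * (income - B2)))

def slope(income):
    """Derivative of the predicted test score with respect to income."""
    return B0 * B1 * math.exp(-B1 * (income - B2))

for income in (10, 20, 40, 60):  # illustrative income levels
    print(income, round(predicted_test_score(income), 1), round(slope(income), 3))
```

The predictions rise with income but at a decreasing rate, and they never exceed the estimated asymptote of 703.2.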
CHAPTER 9 Assessing Studies Based on Multiple Regression

The preceding five chapters explain how to use multiple regression to analyze the relationship among variables in a data set. In this chapter, we step back and ask, What makes a study that uses multiple regression reliable or unreliable? We focus on statistical studies that have the objective of estimating the causal effect of a change in some independent variable, such as class size, on a dependent variable, such as test scores. For such studies, when will multiple regression provide a useful estimate of the causal effect and, just as importantly, when will it fail to do so?

To answer this question, this chapter presents a framework for assessing statistical studies in general, whether or not they use regression analysis. This framework relies on the concepts of internal and external validity. A study is internally valid if its statistical inferences about causal effects are valid for the population and setting studied; it is externally valid if its inferences can be generalized to other populations and settings. In Sections 9.1 and 9.2, we discuss internal and external validity, list a variety of possible threats to internal and external validity, and discuss how to identify those threats in practice. The discussion in Sections 9.1 and 9.2 focuses on the estimation of causal effects from observational data. Section 9.3 discusses a different use of regression models, forecasting, and provides an introduction to the threats to the validity of forecasts made using regression models.

As an illustration of the framework of internal and external validity, in Section 9.4 we assess the internal and external validity of the study of the effect on test scores of cutting the student-teacher ratio presented in Chapters 4 through 8.
9.1 Internal and External Validity

The concepts of internal and external validity, defined in Key Concept 9.1, provide a framework for evaluating whether a statistical or econometric study is useful for answering a specific question of interest.

KEY CONCEPT 9.1: INTERNAL AND EXTERNAL VALIDITY
A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied. The analysis is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

Internal and external validity distinguish between the population and setting studied and the population and setting to which the results are generalized. The population studied is the population of entities (people, companies, school districts, and so forth) from which the sample was drawn. The population to which the results are generalized, or the population of interest, is the population of entities to which the causal inferences from the study are to be applied. For example, a high school (grades 9-12) principal might want to generalize our findings on class sizes and test scores in California elementary school districts (the population studied) to the population of high schools (the population of interest).

By "setting," we mean the institutional, legal, social, and economic environment. For example, it would be important to know whether the findings of a laboratory experiment assessing methods for growing organic tomatoes could be generalized to the field, that is, whether the organic methods that work in the setting of a laboratory also work in the setting of the real world. We provide other examples of differences in populations and settings later in this section.

Threats to Internal Validity

Internal validity has two components. First, the estimator of the causal effect should be unbiased and consistent. For example, if $\hat\beta_{STR}$ is the OLS estimator of the effect on test scores of a unit change in the student-teacher ratio in a certain regression, then $\hat\beta_{STR}$ should be an unbiased and consistent estimator of the true population causal effect of a change in the student-teacher ratio,
$\beta_{STR}$.

Second, hypothesis tests should have the desired significance level (the actual rejection rate of the test under the null hypothesis should equal its desired significance level), and confidence intervals should have the desired confidence level. For example, if a confidence interval is constructed as $\hat\beta_{STR} \pm 1.96\,SE(\hat\beta_{STR})$, this confidence interval should contain the true population causal effect, $\beta_{STR}$, with probability 95% over repeated samples.

In regression analysis, causal effects are estimated using the estimated regression function and hypothesis tests are performed using the estimated regression coefficients and their standard errors. Accordingly, in a study based on OLS regression, the requirements for internal validity are that the OLS estimator is unbiased and consistent, and that standard errors are computed in a way that makes confidence intervals have the desired confidence level. There are various reasons this might not happen, and these reasons constitute threats to internal validity. These threats lead to failures of one or more of the least squares assumptions in Key Concept 6.4. For example, one threat that we have discussed at length is omitted variable bias; it leads to correlation between one or more regressors and the error term, which violates the first least squares assumption. If data on the omitted variable are available, then this threat can be avoided by including that variable as an additional regressor.

Section 9.2 provides a detailed discussion of the various threats to internal validity in multiple regression analysis and suggests how to mitigate them.

Threats to External Validity

Potential threats to external validity arise from differences between the population and setting studied and the population and setting of interest.

Differences in populations. Differences between the population studied and the population of interest can pose a threat to external validity. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest). Whether mice and men differ sufficiently to threaten the external validity of such studies is a matter of debate.

More generally, the true causal effect might not be the same in the population studied and the population of interest. This could be because the population was chosen in a way that makes it different from the population of interest, because of differences in characteristics of the populations, because of geographical differences, or because the study is out of date.
Differences in settings. Even if the population being studied and the population of interest are identical, it might not be possible to generalize the study results if the settings differ. For example, a study of the effect on college binge drinking of an antidrinking advertising campaign might not generalize to another identical group of college students if the legal penalties for drinking at the two colleges differ. In this case, the legal setting in which the study was conducted differs from the legal setting to which its results are applied.

More generally, examples of differences in settings include differences in the institutional environment (public universities versus religious universities), differences in laws (differences in legal penalties), or differences in the physical environment (tailgate-party binge drinking in southern California versus Fairbanks, Alaska).

Application to test scores and the student-teacher ratio. Chapters 7 and 8 reported statistically significant, but substantively small, estimated improvements in test scores resulting from reducing the student-teacher ratio. This analysis was based on test results for California school districts. Suppose for the moment that these results are internally valid. To what other populations and settings of interest could this finding be generalized?

The closer are the population and setting of the study to those of interest, the stronger is the case for external validity. For example, college students and college instruction are very different than elementary school students and instruction, so it is implausible that the effect of reducing class sizes estimated using the California elementary school district data would generalize to colleges. On the other hand, elementary school students, curriculum, and organization are broadly similar throughout the United States, so it is plausible that the California results might generalize to performance on standardized tests in other U.S. elementary school districts.

How to assess the external validity of a study. External validity must be judged using specific knowledge of the populations and settings studied and those of interest. Important differences between the two will cast doubt on the external validity of the study.

Sometimes there are two or more studies on different but related populations. If so, the external validity of both studies can be checked by comparing their results. For example, in Section 9.4 we analyze test score and class size data for elementary school districts in Massachusetts and compare the Massachusetts and California results. In general, similar findings in two or more studies bolster claims to
external validity, while differences in their findings that are not readily explained cast doubt on their external validity.¹

¹ A comparison of many related studies on the same topic is called a meta-analysis. The discussion in the box on the "Mozart effect" in Chapter 6 is based on a meta-analysis, for example. Performing a meta-analysis of many studies has its own challenges. How do you sort the good studies from the bad? How do you compare studies when the dependent variables differ? Should you put more weight on a large study than a small study? A discussion of meta-analysis and its challenges goes beyond the scope of this textbook. The interested reader is referred to Hedges and Olkin (1985) and Cooper and Hedges (1994).

How to design an externally valid study. Because threats to external validity stem from a lack of comparability of populations and settings, these threats are best minimized at the early stages of a study, before the data are collected. Study design is beyond the scope of this textbook, and the interested reader is referred to Shadish, Cook, and Campbell (2002).

9.2 Threats to Internal Validity of Multiple Regression Analysis

Studies based on regression analysis are internally valid if the estimated regression coefficients are unbiased and consistent, and if their standard errors yield confidence intervals with the desired confidence level. This section surveys five reasons why the OLS estimator of the multiple regression coefficients might be biased, even in large samples: omitted variables, misspecification of the functional form of the regression function, imprecise measurement of the independent variables ("errors in variables"), sample selection, and simultaneous causality. All five sources of bias arise because the regressor is correlated with the error term in the population regression, violating the first least squares assumption in Key Concept 6.4. For each, we discuss what can be done to reduce this bias. The section concludes with a discussion of circumstances that lead to inconsistent standard errors and what can be done about it.

Omitted Variable Bias

Recall that omitted variable bias arises when a variable that both determines Y and is correlated with one or more of the included regressors is omitted from the regression. This bias persists even in large samples, so that the OLS estimator is inconsistent. How best to minimize omitted variable bias depends on whether or not data are available for the potential omitted variable.
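The persistence of omitted variable bias in large samples can be seen in a small simulation. The sketch below is an illustration under assumptions of our own: the data are artificial, with Y depending on an included regressor X and an omitted variable Z that is correlated with X, and all parameter values and function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on a single regressor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_ovb(n=20000, beta_x=1.0, beta_z=2.0, rho=0.5, seed=0):
    """Generate Y = beta_x*X + beta_z*Z + u with Z = rho*X + noise, then
    return the slope from regressing Y on X alone (Z is omitted)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        zi = rho * xi + rng.gauss(0, 1)   # omitted variable, correlated with X
        ui = rng.gauss(0, 1)
        x.append(xi)
        y.append(beta_x * xi + beta_z * zi + ui)
    return ols_slope(x, y)

biased_slope = simulate_ovb()              # Z determines Y and is correlated with X
unbiased_slope = simulate_ovb(rho=0.0)     # Z determines Y but is uncorrelated with X
print(biased_slope, unbiased_slope)
```

In this design cov(X, Z)/var(X) equals rho, so with rho = 0.5 the regression that omits Z settles near 1.0 + 2.0 × 0.5 = 2.0 rather than the true coefficient of 1.0, however large n is made. When rho = 0, both conditions for omitted variable bias no longer hold and the estimate is close to 1.0.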
Solutions to omitted variable bias when the omitted variable is observed. If you have data on the omitted variable, then you can include this variable in a multiple regression, thereby addressing the problem. However, adding a new variable has both costs and benefits. On the one hand, omitting the variable could result in omitted variable bias. On the other hand, including the variable when it does not belong (that is, when its population regression coefficient is zero) reduces the precision of the estimators of the other regression coefficients. In other words, the decision whether to include a variable involves a tradeoff between bias and variance of the coefficients of interest. In practice, there are four steps that can help you decide whether to include a variable or set of variables in a regression.

The first step is to identify the key coefficients of interest in your regression. In the test score regressions, this is the coefficient on the student-teacher ratio, because the question originally posed concerns the effect on test scores of reducing the student-teacher ratio.

The second step is to ask yourself: What are the most likely sources of important omitted variable bias in this regression? Answering this question requires applying economic theory and expert knowledge, and should occur before you actually run any regressions; because this is done before analyzing the data, this is referred to as a priori ("before the fact") reasoning. In the test score example, this step entails identifying those determinants of test scores that, if ignored, could bias our estimator of the class size effect. The result of this step is a base regression specification, the starting point for your empirical regression analysis, and a list of additional "questionable" variables that might help to mitigate possible omitted variable bias.

The third step is to augment your base specification with the additional questionable variables identified in the second step and to test the hypotheses that their coefficients are zero. If the coefficients on the additional variables are statistically significant, or if the estimated coefficients of interest change appreciably when the additional variables are included, then they should remain in the specification and you should modify your base specification. If not, then these variables can be excluded from the regression.

The fourth step is to present an accurate summary of your results in tabular form. This provides "full disclosure" to a potential skeptic, who can then draw his or her own conclusions. Tables 7.1 and 8.3 are examples of this strategy. For example, in Table 8.3, we could have presented only the regression in column (7), because that regression summarizes the relevant effects and nonlinearities in the other regressions in that table. Presenting the other regressions, however, permits the skeptical reader to draw his or her own conclusions.

These steps are summarized in Key Concept 9.2.
KEY CONCEPT 9.2: OMITTED VARIABLE BIAS: SHOULD I INCLUDE MORE VARIABLES IN MY REGRESSION?
If you include another variable in your multiple regression, you will eliminate the possibility of omitted variable bias from excluding that variable, but the variance of the estimator of the coefficients of interest can increase. Here are some guidelines to help you decide whether to include an additional variable:
1. Be specific about the coefficient or coefficients of interest.
2. Use a priori reasoning to identify the most important potential sources of omitted variable bias, leading to a base specification and some "questionable" variables.
3. Test whether additional questionable variables have nonzero coefficients.
4. Provide "full disclosure" representative tabulations of your results so that others can see the effect of including the questionable variables on the coefficient(s) of interest. Do your results change if you include a questionable variable?

Solutions to omitted variable bias when the omitted variable is not observed. Adding an omitted variable to a regression is not an option if you do not have data on that variable. Still, there are three other ways to solve omitted variable bias. Each of these three solutions circumvents omitted variable bias through the use of different types of data.

The first solution is to use data in which the same observational unit is observed at different points in time. For example, test score and related data might be collected for the same districts in 1995, then again in 2000. Data in this form are called panel data. As explained in Chapter 10, panel data make it possible to control for unobserved omitted variables as long as those omitted variables do not change over time.

The second solution is to use instrumental variables regression. This method relies on a new variable, called an instrumental variable. Instrumental variables regression is discussed in Chapter 12.

The third solution is to use a study design in which the effect of interest (for example, the effect of reducing class size on student achievement) is studied using a randomized controlled experiment. Randomized controlled experiments are discussed in Chapter 13.
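To see why observing the same units at two points in time can help, consider the following simulation sketch. It rests on assumptions of our own: artificial data for two periods, an unobserved district effect that does not change over time and is correlated with the regressor, and a simple comparison of changes between the two periods as one way, among others, of using panel data. All names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from an OLS regression of y on a single regressor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_panel(n=20000, beta=1.0, seed=1):
    """Two periods of data on n units. Each unit has an unobserved effect a
    that is constant over time, affects Y, and is correlated with X."""
    rng = random.Random(seed)
    x1, y1, dx, dy = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                      # unobserved, time-invariant effect
        xa = a + rng.gauss(0, 1)                 # regressor in period 1
        xb = a + rng.gauss(0, 1)                 # regressor in period 2
        ya = beta * xa + 2.0 * a + rng.gauss(0, 1)
        yb = beta * xb + 2.0 * a + rng.gauss(0, 1)
        x1.append(xa)
        y1.append(ya)
        dx.append(xb - xa)                       # change in X between periods
        dy.append(yb - ya)                       # change in Y: the effect a cancels
    return ols_slope(x1, y1), ols_slope(dx, dy)

cross_section_slope, differenced_slope = simulate_panel()
print(cross_section_slope, differenced_slope)
```

The single-period regression suffers from omitted variable bias because the unobserved effect is left in the error term. In the regression of changes on changes, the unobserved effect drops out, so the estimate is close to the true coefficient of 1.0. This works only because the omitted variable is constant over time, as the text emphasizes.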
Misspecification of the Functional Form of the Regression Function

If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased. This bias is a type of omitted variable bias, in which the omitted variables are the terms that reflect the missing nonlinear aspects of the regression function. For example, if the population regression function is a quadratic polynomial, then a regression that omits the square of the independent variable would suffer from omitted variable bias. Bias arising from functional form misspecification is summarized in Key Concept 9.3.

KEY CONCEPT 9.3: FUNCTIONAL FORM MISSPECIFICATION
Functional form misspecification arises when the functional form of the estimated regression function differs from the functional form of the population regression function. If the functional form is misspecified, then the estimator of the partial effect of a change in one of the variables will, in general, be biased. Functional form misspecification often can be detected by plotting the data and the estimated regression function, and it can be corrected by using a different functional form.

Solutions to functional form misspecification. When the dependent variable is continuous (like test scores), this problem of potential nonlinearity can be solved using the methods of Chapter 8. If, however, the dependent variable is discrete or binary (for example, $Y_i$ equals 1 if the i-th person attended college and equals 0 otherwise), things are more complicated. Regression with a discrete dependent variable is discussed in Chapter 11.

Errors-in-Variables

Suppose that in our regression of test scores against the student-teacher ratio we had inadvertently mixed up our data, so that we ended up regressing test scores for fifth graders on the student-teacher ratio for tenth graders in that district. Although the student-teacher ratio for elementary school students and tenth graders might be correlated, they are not the same, so this mix-up would lead to bias in the estimated coefficient. This is an example of errors-in-variables bias
because its source is an error in the measurement of the independent variable. This bias persists even in very large samples, so that the OLS estimator is inconsistent if there is measurement error.

There are many possible sources of measurement error. If the data are collected through a survey, a respondent might give the wrong answer. For example, one question in the Current Population Survey involves last year's earnings. A respondent might not know his exact earnings, or he might misstate it for some other reason. If instead the data are obtained from computerized administrative records, there might have been typographical errors when the data were first entered.

To see that errors-in-variables results in correlation between the regressor and the error term, suppose there is a single regressor $X_i$ (say, actual income) but that $X_i$ is measured imprecisely by $\tilde X_i$ (the respondent's estimate of income). Because $\tilde X_i$, not $X_i$, is observed, the regression equation actually estimated is the one based on $\tilde X_i$. Written in terms of the imprecisely measured variable $\tilde X_i$, the population regression equation $Y_i = \beta_0 + \beta_1 X_i + u_i$ is

$$Y_i = \beta_0 + \beta_1 \tilde X_i + [\beta_1 (X_i - \tilde X_i) + u_i] = \beta_0 + \beta_1 \tilde X_i + v_i, \tag{9.1}$$

where $v_i = \beta_1 (X_i - \tilde X_i) + u_i$. Thus, the population regression equation written in terms of $\tilde X_i$ has an error term that contains the difference between $X_i$ and $\tilde X_i$. If this difference is correlated with the measured value $\tilde X_i$, then the regressor $\tilde X_i$ will be correlated with the error term and $\hat\beta_1$ will be biased and inconsistent.

The precise size and direction of the bias in $\hat\beta_1$ depend on the correlation between $\tilde X_i$ and $(X_i - \tilde X_i)$. This correlation depends, in turn, on the specific nature of the measurement error.

As an example, suppose that the survey respondent provides her best guess or recollection of the actual value of the independent variable $X_i$. A convenient way to represent this mathematically is to suppose that the measured value of $X_i$ equals the actual, unmeasured value, plus a purely random component, $w_i$. Accordingly, the measured value of the variable, denoted by $\tilde X_i$, is $\tilde X_i = X_i + w_i$. Because the error is purely random, we might suppose that $w_i$ has mean zero and variance $\sigma_w^2$ and is uncorrelated with $X_i$ and the regression error $u_i$. Under this assumption, a bit of algebra² shows that $\hat\beta_1$ has the probability limit

$$\hat\beta_1 \xrightarrow{\;p\;} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1. \tag{9.2}$$

² Under this measurement error assumption, $v_i = \beta_1 (X_i - \tilde X_i) + u_i = -\beta_1 w_i + u_i$, $\mathrm{cov}(\tilde X_i, u_i) = 0$, and $\mathrm{cov}(\tilde X_i, w_i) = \mathrm{cov}(X_i + w_i, w_i) = \sigma_w^2$, so $\mathrm{cov}(\tilde X_i, v_i) = -\beta_1 \mathrm{cov}(\tilde X_i, w_i) + \mathrm{cov}(\tilde X_i, u_i) = -\beta_1 \sigma_w^2$. Thus, from Equation (6.1), $\hat\beta_1 \xrightarrow{p} \beta_1 - \beta_1 \sigma_w^2 / \sigma_{\tilde X}^2$. Now $\sigma_{\tilde X}^2 = \sigma_X^2 + \sigma_w^2$, so $\hat\beta_1 \xrightarrow{p} \beta_1 - \beta_1 \sigma_w^2 / (\sigma_X^2 + \sigma_w^2) = [\sigma_X^2 / (\sigma_X^2 + \sigma_w^2)]\beta_1$.
invariables bias in the OLS estimator aris.2 Threats to Intemol Validity of Multiple Regression Aoofysis 321  ERRORSiNVARiABLES BiAS Ke\' CONCEPT I' .. how ever. then the OLS estimator in a regression with a single righthand variable is biased toward ze ro.'This bias dt:!pends on the nature of the measurement error and persists even if the sample size is large. U'.. and if she knows or can estima T the ratio Q'~. if possible. then ~1 is inco nsistent.9.4. to use 1he resulting fo rm ulas to adj ust the estimates.2) to com pute an estima e lor of {3j that corrects for the downwa rd bias.J!.2) is specifi c to this particular type of mea sureme nt error. if a researcher belie ves that the measured variable is in fac t the sum of the actual value and a random measurement error term. it illustrates the mo r~ general proposition that if the independent variab le is measured imprecisely th en the OLS estimator is biased..". In the e xtre me case that the measurement error is SO large that essen tially no informa tion about X . If the measured variable equals the actual va lue plus a mcanzero. <41' Solutions to errorsinvariables bias.variable ~ bias. when there is no measur~ment error. For example. This method is studied in Ch:lpter 12. Errorsin ·variables bias is summari 7cd in Key Concept 9.. One such method is instrumental varia bles regression . if the meas ure ment imprecision has the effect of simply adding a random element 10 the actu al value of the independent variable. ~1 will be biased toward 0. = 0 so ~\ . then she can use Equation (9.I al. .:s when an independent varj· able is meas ured imprecisely.2).. even in large samp les. Beca use the ratio <1! is less th an 1.2) is 0 and ~l conve rges in probability to O In the o ther . independently d islri buled measuremem error lerro. extrem~. and its probability limit is given in Equation (9. A second mel hod is to deve lop a math ematical model of lhe me asurement e rror and..4 That is. 
Because tRis approach requi res spe ci<tHzed knowledge about the nature of the measurement error.. but is uncorrelaled with the measureme nt error. If th is is impossible. remains. the de tails ty picall y are specific 10 a gi\len data set and its measurem ent probJems an d we shall not pursue this approach furth er in this textbook .i o . econometri c methods can be used to mitigate errors. 9. the ratio of the va riances in the final exprcssion in Equation (9. f31' A lthough the result in Equation (9. The best way to solve the errorsin variables problem is to get an accurate measure of X. It relies on having anothe r va riable (the " inslrumenlaJ" variable) that is correla ted with the aemal value X. even in large sa mples.Errors.
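The probability limit in Equation (9.2), and the correction based on the ratio $\sigma_w^2 / \sigma_X^2$, can be checked with a small simulation. The sketch below is illustrative only: the parameter values, the normal distributions, and the function names `ols_slope` and `simulate_attenuation` are assumptions made for this example, not part of the text.

```python
import random


def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_attenuation(beta0=1.0, beta1=2.0, sigma_x=1.0, sigma_w=1.0,
                         sigma_u=1.0, n=100_000, seed=0):
    """Regress Y on the mismeasured regressor X_tilde = X + w.

    Returns the OLS slope, the probability limit implied by Equation (9.2),
    and the slope rescaled using the (here known) ratio sigma_w^2 / sigma_x^2.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma_u) for xi in x_true]
    # Classical measurement error: mean zero, independent of X and u.
    x_meas = [xi + rng.gauss(0.0, sigma_w) for xi in x_true]

    slope = ols_slope(x_meas, y)
    plim = beta1 * sigma_x ** 2 / (sigma_x ** 2 + sigma_w ** 2)
    # Invert Equation (9.2): beta1 = plim * (1 + sigma_w^2 / sigma_x^2).
    corrected = slope * (1.0 + sigma_w ** 2 / sigma_x ** 2)
    return slope, plim, corrected


if __name__ == "__main__":
    slope, plim, corrected = simulate_attenuation()
    print(f"OLS slope on mismeasured X: {slope:.3f}")
    print(f"Limit from Equation (9.2):  {plim:.3f}")
    print(f"Rescaled estimate:          {corrected:.3f}")
```

With equal variances for $X_i$ and $w_i$, the ratio in Equation (9.2) is one half, so the slope on the mismeasured regressor should settle near half of the true coefficient. The rescaling step works here only because the variance ratio is known by construction.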
Sample Selection

Sample selection bias occurs when the availability of the data is influenced by a selection process that is related to the value of the dependent variable. This selection process can introduce correlation between the error term and the regressor, which leads to bias in the OLS estimator.

Sample selection that is unrelated to the value of the dependent variable does not introduce bias. For example, if data are collected from a population by simple random sampling, the sampling method (being drawn at random from the population) has nothing to do with the value of the dependent variable. Such sampling does not introduce bias.

Bias can be introduced when the method of sampling is related to the value of the dependent variable. An example of sample selection bias in polling was given in a box in Chapter 3. In that example, the sample selection method (randomly selected phone numbers of automobile owners) was related to the dependent variable (who the individual supported for president in 1936), because in 1936 car owners with phones were more likely to be Republicans.

An example of sample selection in economics arises in using a regression of wages on education to estimate the effect on wages of an additional year of education. Only individuals who have a job have wages, by definition. The factors (observable and unobservable) that determine whether someone has a job (education, experience, where one lives, ability, luck, and so forth) are similar to the factors that determine how much that person earns, when employed. Thus, the fact that someone has a job suggests that, all else equal, the error term in the wage equation for that person is positive. Said differently, whether someone has a job is in part determined by the omitted variables in the error term in the wage regression. Thus, the simple fact that someone has a job, and thus appears in the data set, provides information that the error term in the regression is positive, at least on average, and could be correlated with the regressors. This too can lead to bias in the OLS estimator.

Sample selection bias is summarized in Key Concept 9.5. The box "Do Stock Mutual Funds Outperform the Market?" provides an example of sample selection bias in financial economics.

Solutions to selection bias. The methods we have discussed so far cannot eliminate sample selection bias. The methods for estimating models with sample selection are beyond the scope of this book. Those methods build on the techniques introduced in Chapter 11, where further references are provided.
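The contrast between selection that is unrelated to the dependent variable and selection that depends on it can be illustrated with a stylized simulation. This is a deliberately simplified sketch: observations are kept only when $Y$ exceeds a cutoff, which stands in for "wages are observed only for people with a job." The data-generating process, the cutoff, and the function names are assumptions made for the example.

```python
import random


def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_selection(beta0=0.0, beta1=1.0, cutoff=0.0, n=100_000, seed=0):
    """Compare OLS slopes under random sampling and under selection on Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]

    # (a) Selection unrelated to Y: keep each observation with probability 1/2.
    keep = [rng.random() < 0.5 for _ in range(n)]
    x_random = [xi for xi, k in zip(x, keep) if k]
    y_random = [yi for yi, k in zip(y, keep) if k]

    # (b) Selection related to Y: keep only observations with Y above the cutoff.
    x_selected = [xi for xi, yi in zip(x, y) if yi > cutoff]
    y_selected = [yi for yi in y if yi > cutoff]

    return ols_slope(x_random, y_random), ols_slope(x_selected, y_selected)


if __name__ == "__main__":
    slope_random, slope_selected = simulate_selection()
    print(f"Slope, random subsample:       {slope_random:.3f}")
    print(f"Slope, sample selected on Y:   {slope_selected:.3f}")
```

In this setup the random subsample recovers a slope close to the true value of 1, while the sample selected on $Y$ gives a noticeably smaller slope, because among retained observations with low $X$ the error term must be large for $Y$ to clear the cutoff.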
Simultaneous Causality

So far, we have assumed that causality runs from the regressors to the dependent variable ($X$ causes $Y$). But what if causality also runs from the dependent variable to one or more regressors ($Y$ causes $X$)? If so, causality runs "backward" as well as forward, that is, there is simultaneous causality. If there is simultaneous causality, an OLS regression picks up both effects so the OLS estimator is biased and inconsistent.

For example, our study of test scores focused on the effect on test scores of reducing the student-teacher ratio, so that causality is presumed to run from the student-teacher ratio to test scores. Suppose, however, that a government initiative subsidized hiring teachers in school districts with poor test scores. If so, causality would run in both directions: For the usual educational reasons low student-teacher ratios would arguably lead to high test scores, but because of the government program low test scores would lead to low student-teacher ratios.

Simultaneous causality leads to correlation between the regressor and the error term. In the test score example, suppose there is an omitted factor that leads to poor test scores; because of the government program, this factor that produces low scores in turn results in a low student-teacher ratio. Thus, a negative error term in the population regression of test scores on the student-teacher ratio reduces test scores, but because of the government program it also leads to a decrease in the student-teacher ratio. In other words, the student-teacher ratio is positively correlated with the error term in the population regression. This in turn leads to simultaneous causality bias and inconsistency of the OLS estimator.

This correlation between the error term and the regressor can be made precise mathematically by introducing an additional equation that describes the reverse causal link. For convenience, consider just the two variables $X$ and $Y$ and ignore other possible regressors. Accordingly, there are two equations, one in which $X$ causes $Y$, and one in which $Y$ causes $X$:

$$Y_i = \beta_0 + \beta_1 X_i + u_i \quad \text{and} \qquad (9.3)$$
$$X_i = \gamma_0 + \gamma_1 Y_i + v_i. \qquad (9.4)$$

Equation (9.3) is the familiar one in which $\beta_1$ is the effect on $Y$ of a change in $X$, where $u$ represents other factors. Equation (9.4) represents the reverse causal effect of $Y$ on $X$. In the test score problem, Equation (9.3) represents the educational effect of class size on test scores, while Equation (9.4) represents the reverse causal effect of test scores on class size induced by the government program.
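The two-equation system (9.3) and (9.4) can be simulated by solving it for $X_i$. Substituting (9.3) into (9.4) gives $X_i = (\gamma_0 + \gamma_1 \beta_0 + \gamma_1 u_i + v_i)/(1 - \gamma_1 \beta_1)$, so if $u_i$ and $v_i$ are uncorrelated, $\mathrm{cov}(X_i, u_i) = \gamma_1 \sigma_u^2 / (1 - \gamma_1 \beta_1)$. The sketch below checks this numerically; the parameter values and the function name `simulate_simultaneity` are assumptions chosen for illustration.

```python
import random


def simulate_simultaneity(beta0=0.0, beta1=-0.5, gamma0=0.0, gamma1=0.5,
                          sigma_u=1.0, sigma_v=1.0, n=100_000, seed=0):
    """Draw (X, Y) from Equations (9.3)-(9.4) and regress Y on X by OLS.

    Returns the OLS slope, the sample covariance between X and u, and the
    covariance implied by the reduced form, gamma1 * sigma_u^2 / (1 - gamma1 * beta1).
    """
    rng = random.Random(seed)
    denom = 1.0 - gamma1 * beta1
    xs, ys, us = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, sigma_v)
        # Reduced form for X, obtained by substituting (9.3) into (9.4).
        x = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / denom
        y = beta0 + beta1 * x + u
        xs.append(x)
        ys.append(y)
        us.append(u)

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    u_bar = sum(us) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    cov_xu = sum((x - x_bar) * (u - u_bar) for x, u in zip(xs, us)) / n

    slope = s_xy / s_xx
    cov_theory = gamma1 * sigma_u ** 2 / denom
    return slope, cov_xu, cov_theory


if __name__ == "__main__":
    slope, cov_xu, cov_theory = simulate_simultaneity()
    print(f"True beta1: -0.500, OLS slope: {slope:.3f}")
    print(f"Sample cov(X, u): {cov_xu:.3f}, implied value: {cov_theory:.3f}")
```

With these assumed values the regressor and the error term are positively correlated, and the OLS slope is pulled well above the true negative effect.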
Simultaneous causality leads to correlation between $X_i$ and the error term $u_i$ in Equation (9.3). To see this, imagine that $u_i$ is negative, which decreases $Y_i$. However, this lower value of $Y_i$ affects the value of $X_i$ through the second of these equations, and if $\gamma_1$ is positive, a low value of $Y_i$ will lead to a low value of $X_i$. Thus, if $\gamma_1$ is positive, $X_i$ and $u_i$ will be positively correlated.$^3$

$^3$ To show this mathematically, note that Equation (9.4) implies that $\mathrm{cov}(X_i, u_i) = \mathrm{cov}(\gamma_0 + \gamma_1 Y_i + v_i, u_i) = \gamma_1 \mathrm{cov}(Y_i, u_i) + \mathrm{cov}(v_i, u_i)$. Assuming that $\mathrm{cov}(v_i, u_i) = 0$, by Equation (9.3) this in turn implies that $\mathrm{cov}(X_i, u_i) = \gamma_1 \mathrm{cov}(Y_i, u_i) = \gamma_1 \mathrm{cov}(\beta_0 + \beta_1 X_i + u_i, u_i) = \gamma_1 \beta_1 \mathrm{cov}(X_i, u_i) + \gamma_1 \sigma_u^2$. Solving for $\mathrm{cov}(X_i, u_i)$ then yields the result $\mathrm{cov}(X_i, u_i) = \gamma_1 \sigma_u^2 / (1 - \gamma_1 \beta_1)$.

Because this can be expressed mathematically using two simultaneous equations, the simultaneous causality bias is sometimes called simultaneous equations bias. Simultaneous causality bias is summarized in Key Concept 9.6.

KEY CONCEPT 9.6
SIMULTANEOUS CAUSALITY BIAS
Simultaneous causality bias, also called simultaneous equations bias, arises in a regression of $Y$ on $X$ when, in addition to the causal link of interest from $X$ to $Y$, there is a causal link from $Y$ to $X$. This reverse causality makes $X$ correlated with the error term in the population regression of interest.

Solutions to simultaneous causality bias. There are two ways to mitigate simultaneous causality bias. One is to use instrumental variables regression, the topic of Chapter 12. The second is to design and to implement a randomized controlled experiment in which the reverse causality channel is nullified, and such experiments are discussed in Chapter 13.

Sources of Inconsistency of OLS Standard Errors

Inconsistent standard errors pose a different threat to internal validity. Even if the OLS estimator is consistent and the sample is large, inconsistent standard errors will produce hypothesis tests with size that differs from the desired significance level and "95%" confidence intervals that fail to include the true value in 95% of repeated samples.
There are two main reasons for inconsistent standard errors: improperly handled heteroskedasticity and correlation of the error term across observations.

Heteroskedasticity. As discussed in Section 5.4, for historical reasons some regression software report homoskedasticity-only standard errors. If, however, the regression error is heteroskedastic, those standard errors are not a reliable basis for hypothesis tests and confidence intervals. The solution to this problem is to use heteroskedasticity-robust standard errors and to construct F-statistics using a heteroskedasticity-robust variance estimator. Heteroskedasticity-robust standard errors are provided as an option in modern software packages.

Correlation of the error term across observations. In some settings, the population regression error can be correlated across observations. This will not happen if the data are obtained by sampling at random from the population because the randomness of the sampling process ensures that the errors are independently distributed from one observation to the next. Sometimes, however, sampling is only partially random. The most common circumstance is when the data are repeated observations on the same entity over time, for example, the same school district for different years. If the omitted variables that constitute the regression error are persistent (like district demographics), then this induces "serial" correlation in the regression error over time. Serial correlation in the error term can arise in panel data (data on multiple districts for multiple years) and in time series data (data on a single district for multiple years).

Another situation in which the error term can be correlated across observations is when sampling is based on a geographical unit. If there are omitted variables that reflect geographic influences, these omitted variables could result in correlation of the regression errors for adjacent observations.

Correlation of the regression error across observations does not make the OLS estimator biased or inconsistent, but it does violate the second least squares assumption in Key Concept 6.4. The consequence is that the OLS standard errors, both homoskedasticity-only and heteroskedasticity-robust, are incorrect in the sense that they do not produce confidence intervals with the desired confidence level.

In many cases, this problem can be fixed by using an alternative formula for standard errors. We provide such a formula for computing standard errors that are robust to both heteroskedasticity and serial correlation in Chapter 10 (regression with panel data) and in Chapter 15 (regression with time series data).

Key Concept 9.7 summarizes the threats to internal validity of a multiple regression study.
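To make the heteroskedasticity point above concrete, the following sketch computes both kinds of standard error for the slope in a single-regressor OLS regression. It uses the usual single-regressor formulas (the homoskedasticity-only variance $s_{\hat{u}}^2 / \sum (X_i - \bar{X})^2$ and the heteroskedasticity-robust variance with an $n - 2$ degrees-of-freedom adjustment). The simulated data, in which the error variance grows with $X_i^2$, and the function names are assumptions for illustration, not taken from the text.

```python
import math
import random


def ols_with_standard_errors(x, y):
    """Return (slope, homoskedasticity-only SE, heteroskedasticity-robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: residual variance divided by sum of squared deviations of X.
    s2_u = sum(e * e for e in resid) / (n - 2)
    se_homoskedastic = math.sqrt(s2_u / s_xx)

    # Heteroskedasticity-robust: weights each squared residual by (X_i - X_bar)^2.
    numerator = sum((d * e) ** 2 for d, e in zip(dx, resid)) / (n - 2)
    denominator = (s_xx / n) ** 2
    se_robust = math.sqrt(numerator / denominator / n)

    return slope, se_homoskedastic, se_robust


def simulate_heteroskedastic_data(n=20_000, seed=0):
    """Hypothetical data: Y = 1 + 2 X + u, with var(u | X) = X^2."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_heteroskedastic_data()
    slope, se_homo, se_robust = ols_with_standard_errors(x, y)
    print(f"slope = {slope:.3f}")
    print(f"homoskedasticity-only SE   = {se_homo:.4f}")
    print(f"heteroskedasticity-robust SE = {se_robust:.4f}")
```

In this design the homoskedasticity-only standard error understates the sampling variability of the slope, so a confidence interval built from it would be too narrow. The sketch does not address correlation of errors across observations, which requires the different formulas referenced in the text.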
KEY CONCEPT 9.7
THREATS TO THE INTERNAL VALIDITY OF A MULTIPLE REGRESSION STUDY
There are five primary threats to the internal validity of a multiple regression study:
1. Omitted variables
2. Functional form misspecification
3. Errors-in-variables (measurement error in the regressors)
4. Sample selection
5. Simultaneous causality
Each of these, if present, results in failure of the first least squares assumption, $E(u_i \mid X_{1i}, \ldots, X_{ki}) \neq 0$, which in turn means that the OLS estimator is biased and inconsistent.
Incorrect calculation of the standard errors also poses a threat to internal validity. Homoskedasticity-only standard errors are invalid if heteroskedasticity is present. If the variables are not independent across observations, as can arise in panel and time series data, then a further adjustment to the standard error formula is needed to obtain valid standard errors.
Applying this list of threats to a multiple regression study provides a systematic way to assess the internal validity of that study.

9.3 Internal and External Validity When the Regression Is Used for Forecasting

Up to now, the discussion of multiple regression analysis has focused on the estimation of causal effects. Regression models can be used for other purposes, however, including forecasting. When regression models are used for forecasting, concerns about external validity are very important, but concerns about unbiased estimation of causal effects are not.

Using Regression Models for Forecasting

Chapter 4 began by considering the problem of a school superintendent who wants to know how much test scores would increase if she reduced class sizes in her
school district; that is, the superintendent wants to know the causal effect on test scores of a change in class size. Accordingly, Chapters 4-8 focused on using regression analysis to estimate causal effects using observational data.

Now consider a different problem. A parent moving to a metropolitan area plans to choose where to live based in part on the quality of the local schools. The parent would like to know how different school districts perform on standardized tests. Suppose, however, that test score data are not available (perhaps they are confidential) but data on class sizes are. In this situation, the parent must guess at how well the different districts perform on standardized tests based on a limited amount of information. That is, the parent's problem is to forecast average test scores in a given district based on information related to test scores, in particular, class size.

How can the parent make this forecast? Recall the regression of test scores on the student-teacher ratio ($STR$) from Chapter 4:

$$\widehat{TestScore} = 698.9 - 2.28 \times STR. \qquad (9.5)$$

We concluded that this regression is not useful for the superintendent: The OLS estimator of the slope is biased because of omitted variables such as the composition of the student body and students' other learning opportunities outside school.

Nevertheless, Equation (9.5) could be useful to the parent trying to choose a home. To be sure, class size is not the only determinant of test performance, but from the parent's perspective what matters is whether it is a reliable predictor of test performance. The parent interested in forecasting test scores does not care whether the coefficient in Equation (9.5) estimates the causal effect on test scores of class size. Rather, the parent simply wants the regression to explain much of the variation in test scores across districts and to be stable, that is, to apply to the districts to which the parent is considering moving. Although omitted variable bias renders Equation (9.5) useless for answering the causal question, it still can be useful for forecasting purposes.

More generally, regression models can produce reliable forecasts, even if their coefficients have no causal interpretation. This recognition underlies much of the use of regression models for forecasting.
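As a small worked example of the parent's forecasting problem, the sketch below plugs student-teacher ratios into Equation (9.5). The coefficients 698.9 and -2.28 come from the text; the district names and their student-teacher ratios are hypothetical, as is the function name `forecast_test_score`.

```python
def forecast_test_score(student_teacher_ratio, intercept=698.9, slope=-2.28):
    """Predicted district test score from Equation (9.5)."""
    return intercept + slope * student_teacher_ratio


if __name__ == "__main__":
    # Hypothetical districts and student-teacher ratios, for illustration only.
    districts = {"District A": 18.0, "District B": 20.0, "District C": 22.0}
    for name, ratio in districts.items():
        print(f"{name}: STR = {ratio:.1f}, forecast = {forecast_test_score(ratio):.1f}")
```

The forecast is a prediction of the district average, not a statement about what would happen if a given district changed its class size; that causal reading is exactly what omitted variable bias rules out.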
Assessing the Validity of Regression Models for Forecasting

Because the superintendent's problem and the parent's problem are conceptually very different, the requirements for the validity of the regression are different for their respective problems. To obtain credible estimates of causal effects, we must address the threats to internal validity summarized in Key Concept 9.7.

In contrast, if we are to obtain reliable forecasts, the estimated regression must have good explanatory power, its coefficients must be estimated precisely, and it must be stable in the sense that the regression estimated on one set of data can be reliably used to make forecasts using other data. When a regression model is used for forecasting, a paramount concern is that the model is externally valid, in the sense that it is stable and quantitatively applicable to the circumstance in which the forecast is made. In Part IV, we return to the problem of assessing the validity of a regression model for forecasting future values of time series data.

9.4 Example: Test Scores and Class Size

The framework of internal and external validity helps us to take a critical look at what we have learned, and what we have not, from our analysis of the California test score data.

External Validity

Whether the California analysis can be generalized, that is, whether it is externally valid, depends on the population and setting to which the generalization is made. Here, we consider whether the results can be generalized to performance on other standardized tests in other elementary public school districts in the United States.

Section 9.1 noted that having more than one study on the same topic provides an opportunity to assess the external validity of both studies by comparing their results. In the case of test scores and class size, other comparable data sets are, in fact, available. In this section, we examine a different data set, based on standardized test results for fourth graders in 220 public school districts in Massachusetts in 1998. Both the Massachusetts and California tests are broad measures of student knowledge and academic skills, although the details differ. Similarly, the organization of classroom instruction is broadly similar at the elementary school level in the two states (as it is in most U.S. elementary school districts), although aspects of elementary school funding and curriculum differ. Thus, finding similar results about the effect of the student-teacher ratio on test performance in the California and Massachusetts data would be evidence of external validity of the findings in California. Conversely, finding different results in the two states would raise questions about the internal or external validity of at least one of the studies.
Comparison of the California and Massachusetts data. Like the California data, the Massachusetts data are at the school district level. The definitions of the variables in the Massachusetts data set are the same as those in the California data set, or nearly so. More information on the Massachusetts data set, including definitions of the variables, is given in Appendix 9.1.

Table 9.1 presents summary statistics for the California and Massachusetts samples. The average test score is higher in Massachusetts, but the test is different, so a direct comparison of scores is not appropriate. The average student-teacher ratio is higher in California (19.6 versus 17.3). Average district income is 20% higher in Massachusetts, but the standard deviation of income is greater in California, that is, there is a greater spread in average district incomes in California than in Massachusetts. The average percentage of students still learning English and the average percentage of students receiving subsidized lunches are both much higher in the California than in the Massachusetts districts.

TABLE 9.1 Summary Statistics for California and Massachusetts Test Score Data Sets

                                    California                 Massachusetts
                               Average   Std. Dev.         Average   Std. Dev.
Test scores                     654.1      19.1             709.8      15.1
Student-teacher ratio            19.6       1.9              17.3       2.3
% English learners              15.8%     18.3%              1.1%      2.9%
% Receiving lunch subsidy       44.7%     27.1%             15.3%     15.1%
Average district income ($)    $15,317    $7226            $18,747    $5808
Number of observations            420                         220
Year                             1999                        1998

Test scores and average district income. To save space, we do not present scatterplots of all the Massachusetts data. Because it was a focus in Chapter 8, however, it is interesting to examine the relationship between test scores and average district income in Massachusetts. This scatterplot is presented in Figure 9.1. The general pattern of this scatterplot is similar to that in Figure 8.2 for the California data: The relationship between income and test scores appears to be steep for low values of income and flatter for high values. Evidently, the linear regression plotted in the figure misses this apparent nonlinearity. Cubic and logarithmic regression functions are also plotted in Figure 9.1. The cubic regression function has a
slightly higher $\bar{R}^2$ than the logarithmic specification (0.486 versus 0.455). Comparing Figures 8.7 and 9.1 shows that the general pattern of nonlinearity found in the California income and test score data is also present in the Massachusetts data. The precise functional forms that best describe this nonlinearity differ, however, with the cubic specification fitting best in Massachusetts but the linear-log specification fitting best in California.

FIGURE 9.1 Test Scores vs. Income for Massachusetts Data
[Scatterplot of test score against district income (thousands of dollars), with estimated linear, linear-log, and cubic regression functions.] The estimated linear regression function does not capture the nonlinear relation between income and test scores in the Massachusetts data. The estimated linear-log and cubic regression functions are similar for district incomes between $13,000 and $30,000, the region containing most observations.

Multiple regression results. Regression results for the Massachusetts data are presented in Table 9.2. The first regression, reported in column (1) in the table, has only the student-teacher ratio as a regressor. The slope is negative (-1.72), and the hypothesis that the coefficient is zero can be rejected at the 1% significance level ($t = -1.72/0.50 = -3.44$).

The remaining columns report the results of including additional variables that control for student characteristics and of introducing nonlinearities into the estimated regression function. Controlling for the percentage of English learners, the percentage of students eligible for a free lunch, and average district income reduces the estimated coefficient on the student-teacher ratio by 60%, from -1.72 in regression (1) to -0.69 in regression (2) and -0.64 in regression (3).
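The two pieces of arithmetic above, the t-statistic for regression (1) and the percentage reduction in the coefficient once controls are added, can be reproduced directly. The helper names below are hypothetical, and the p-value uses the large-sample normal approximation rather than any exact small-sample distribution.

```python
import math


def t_statistic(estimate, standard_error, null_value=0.0):
    """t-statistic for testing that the coefficient equals null_value."""
    return (estimate - null_value) / standard_error


def two_sided_p_value(t):
    """Two-sided p-value under the standard normal approximation.

    Uses 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


def percent_reduction(before, after):
    """Percentage fall in the magnitude of a coefficient."""
    return 100.0 * (abs(before) - abs(after)) / abs(before)


if __name__ == "__main__":
    t = t_statistic(-1.72, 0.50)
    print(f"t = {t:.2f}, two-sided p-value = {two_sided_p_value(t):.4f}")
    print(f"Reduction from -1.72 to -0.69: {percent_reduction(-1.72, -0.69):.0f}%")
```

The p-value is far below 0.01, consistent with rejection at the 1% level, and the fall from -1.72 to -0.69 is the 60% reduction cited in the text.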
Comparing the $\bar{R}^2$'s of regressions (2) and (3) indicates that the cubic specification (3) provides a better model of the relationship between test scores and income than does the logarithmic specification (2), even holding constant the student-teacher ratio. There is no statistically significant evidence of a nonlinear relationship between test scores and the student-teacher ratio: The F-statistic in regression (4) testing whether the population coefficients on $STR^2$ and $STR^3$ are zero has a p-value of 0.641. Similarly, there is no evidence that a reduction in the student-teacher ratio has a different effect in districts with many English learners than with few (the t-statistic on $HiEL \times STR$ in regression (5) is $0.80/0.56 = 1.43$). Finally, regression (6) shows that the estimated coefficient on the student-teacher ratio does not change substantially when the percentage of English learners [which is insignificant in regression (3)] is excluded. In short, the results in regression (3) are not sensitive to the changes in functional form and specification considered in regressions (4)-(6) in Table 9.2. Therefore we adopt regression (3) as our base estimate of the effect in test scores of a change in the student-teacher ratio based on the Massachusetts data.

TABLE 9.2 Multiple Regression Estimates of the Student-Teacher Ratio and Test Scores: Data from Massachusetts
Dependent variable: average combined English, math, and science test score in the school district, fourth grade; 220 observations.

Regressor                          (1)        (2)        (3)        (4)        (5)        (6)
Student-teacher ratio (STR)      -1.72**    -0.69*     -0.64*     12.4       -1.02**    -0.67*
                                 (0.50)     (0.27)     (0.27)     (14.0)     (0.37)     (0.27)
STR^2                                                             -0.680
                                                                  (0.737)
STR^3                                                              0.011
                                                                  (0.013)
% English learners                          -0.411     -0.437     -0.434
                                            (0.306)    (0.303)    (0.300)
% English learners > median?                                                 -12.6
  (Binary, HiEL)                                                             (9.8)
HiEL x STR                                                                    0.80
                                                                             (0.56)
% Eligible for free lunch                   -0.521**   -0.582**   -0.587**   -0.709**   -0.653**
                                            (0.077)    (0.097)    (0.104)    (0.091)    (0.72)
District income (logarithm)                 16.53**
                                            (3.15)
District income                                        -3.07      -3.38      -3.87*     -3.22
                                                       (2.35)     (2.49)     (2.49)     (2.31)
District income^2                                       0.164      0.174      0.184*     0.165
                                                       (0.085)    (0.089)    (0.090)    (0.085)
District income^3                                      -0.0022*   -0.0023*   -0.0023*   -0.0022*
                                                       (0.0010)   (0.0010)   (0.0010)   (0.0010)
Intercept                        739.6**    682.4**    744.0**    665.5**    759.9**    747.4**
                                 (8.6)      (11.5)     (21.3)     (81.3)     (23.2)     (20.3)

F-Statistics and p-Values Testing Exclusion of Groups of Variables
                                   (1)        (2)        (3)        (4)        (5)        (6)
All STR variables and                                              2.86       4.01
  interactions = 0                                                (0.038)    (0.020)
STR^2, STR^3 = 0                                                   0.45
                                                                  (0.641)
Income^2, Income^3                                      7.74       7.75       5.85       6.55
                                                       (<0.001)   (<0.001)   (0.003)    (0.002)
HiEL, HiEL x STR                                                              1.58
                                                                             (0.208)
SER                               14.64      8.69       8.61       8.63       8.62       8.64
Adjusted R^2                      0.063      0.670      0.676      0.675      0.675      0.674

These regressions were estimated using the data on Massachusetts elementary school districts described in Appendix 9.1. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.

Comparison of Massachusetts and California results. For the California data, we found:

1. Adding variables that control for student background characteristics reduced the coefficient on the student-teacher ratio from -2.28 [Table 7.1, regression (1)] to -0.73 [Table 8.3, regression (2)], a reduction of 68%.
2. The hypothesis that the true coefficient on the student-teacher ratio is zero was rejected at the 1% significance level, even after adding variables that control for student background and district economic characteristics.
3. The effect of cutting the student-teacher ratio did not depend in an important way on the percentage of English learners in the district.
4. There is some evidence that the relationship between test scores and the student-teacher ratio is nonlinear.
Do we find the same things in Massachusetts? For findings (1), (2), and (3), the answer is yes. Including the additional control variables reduces the coefficient on the student-teacher ratio from -1.72 [Table 9.2, regression (1)] to -0.69 [Table 9.2, regression (2)], a reduction of 60%. The coefficients on the student-teacher ratio remain significant after adding the control variables. Those coefficients are only significant at the 5% level in the Massachusetts data, whereas they are significant at the 1% level in the California data. However, there are nearly twice as many observations in the California data, so it is not surprising that the California estimates are more precise. As in the California data, there is no statistically significant evidence in the Massachusetts data of an interaction between the student-teacher ratio and the binary variable indicating a large percentage of English learners in the district.

Finding (4), however, does not hold up in the Massachusetts data: The hypothesis that the relationship between the student-teacher ratio and test scores is linear cannot be rejected at the 5% significance level when tested against a cubic specification.

Because the two standardized tests are different, the coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the same as one point on the California test. If, however, the test scores are put into the same units, then the estimated class size effects can be compared. One way to do this is to transform the test scores by standardizing them: Subtract the sample average and divide by the standard deviation so that they have a mean of 0 and a variance of 1. The slope coefficients in the regression with the transformed test score equal the slope coefficients in the original regression, divided by the standard deviation of the test. Thus the coefficient on the student-teacher ratio, divided by the standard deviation of test scores, can be compared across the two data sets.

This comparison is undertaken in Table 9.3. The first column reports the OLS estimates of the coefficient on the student-teacher ratio in a regression with the percentage of English learners, the percentage of students eligible for a free lunch, and the average district income included as control variables. The second column reports the standard deviation of the test scores across districts. The final two columns report the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher (our superintendent's proposal), first in the units of the test, and second in standard deviation units. For the linear specification, the OLS coefficient estimate using California data is -0.73, so cutting the student-teacher ratio by two is estimated to increase district test scores by $-0.73 \times (-2) = 1.46$ points. Because the standard deviation of test scores is 19.1 points, this corresponds to $1.46/19.1 = 0.076$ standard deviations of the distribution of test scores across districts. The standard error of this estimate is $0.26 \times 2/19.1 = 0.027$. The estimated effects for the nonlinear models and their standard errors were computed using the method described in Section 8.1.

TABLE 9.3 Student-Teacher Ratios and Test Scores: Comparing the Estimates from California and Massachusetts

                                                                  Estimated Effect of Two Fewer
                                                                  Students per Teacher, in Units of:
                                   OLS Estimate   Std. Dev. of    Points on      Standard
                                   (beta_STR)     Test Scores     the Test       Deviations
                                                  Across Districts
California
Linear: Table 8.3(2)               -0.73          19.1            1.46           0.076
                                   (0.26)                         (0.52)         (0.027)
Cubic: Table 8.3(7)                               19.1            2.93           0.153
  Reduce STR from 20 to 18                                        (0.70)         (0.037)
Cubic: Table 8.3(7)                               19.1            1.90           0.099
  Reduce STR from 22 to 20                                        (0.69)         (0.036)
Massachusetts
Linear: Table 9.2(3)               -0.64          15.1            1.28           0.085
                                   (0.27)                         (0.54)         (0.036)

Standard errors are given in parentheses.

Based on the linear model using California data, a reduction of two students per teacher is estimated to increase test scores by 0.076 standard deviation units, with a standard error of 0.027. The nonlinear models for California data suggest a somewhat larger effect, with the specific effect depending on the initial student-teacher ratio. Based on the Massachusetts data, this estimated effect is 0.085 standard deviation unit, with a standard error of 0.036.

These estimates are essentially the same. Cutting the student-teacher ratio is predicted to raise test scores, but the predicted improvement is small. In the California data, for example, the difference in test scores between the median district and a district at the 75th percentile is 12.2 test score points (Table 4.1), or 0.64 (= 12.2/19.1) standard deviations. The estimated effect from the linear model is just over one-tenth this size; in other words, according to this estimate, cutting the student-teacher ratio by two would move a district only one-tenth of the way from the median to the 75th percentile of the distribution of test scores across districts. Reducing the student-teacher ratio by two is a large change for a district, but the estimated benefits shown in Table 9.3, while nonzero, are small.
Based on the lin ea r model using Ca lifornia data.1 2.46 (0.s' Stores Across Dislri<'.3(2) .076 standard deviation unit .. In Units of: OLS Estimate 8Jr.2 test score points (Table 4.
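The conversion from a regression coefficient to an effect in standard deviation units is simple arithmetic, and a short sketch can make it concrete. The Python below reproduces the linear-specification entries of Table 9.3 from the reported coefficients, standard errors, and test-score standard deviations. The function name effect_of_cut and its argument names are hypothetical, chosen only for this illustration.

```python
# Convert a coefficient on the student-teacher ratio (STR) into the estimated
# effect of cutting STR by two students per teacher, both in test points and
# in standard deviation units. Inputs are the linear-specification values
# reported in Table 9.3; all names here are illustrative assumptions.

def effect_of_cut(beta_str, se_beta, sd_scores, cut=2.0):
    """Return (points, se_points, sd_units, se_sd_units) for a cut in STR."""
    points = beta_str * (-cut)      # a cut of `cut` students is a change of -cut
    se_points = se_beta * cut       # the standard error scales with |cut|
    return points, se_points, points / sd_scores, se_points / sd_scores

california = effect_of_cut(beta_str=-0.73, se_beta=0.26, sd_scores=19.1)
massachusetts = effect_of_cut(beta_str=-0.64, se_beta=0.27, sd_scores=15.1)

for name, result in [("California", california), ("Massachusetts", massachusetts)]:
    points, se_points, sd_units, se_sd = result
    print(f"{name}: {points:.2f} points ({se_points:.2f}), "
          f"{sd_units:.3f} SD ({se_sd:.3f})")
```

Dividing by the standard deviation is what puts the two states on a common scale: the raw point estimates (1.46 and 1.28) are not comparable, while the standardized ones (0.076 and 0.085) are.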
This analysis of Massachusetts data suggests that the California results are externally valid, at least when generalized to elementary school districts elsewhere in the United States.

Internal Validity

The similarity of the results for California and Massachusetts does not ensure their internal validity. Section 9.2 listed five possible threats to internal validity that could induce bias in the estimated effect on test scores of class size. We consider these threats in turn.

Omitted variables. The multiple regressions reported in this and previous chapters control for a student characteristic (the percentage of English learners), a family economic characteristic (the percentage of students receiving a subsidized lunch), and a broader measure of the affluence of the district (average district income).

Possible omitted variables remain, such as other school and student characteristics, and their omission might cause omitted variables bias. For example, if the student-teacher ratio is correlated with teacher quality (perhaps because better teachers are attracted to schools with smaller student-teacher ratios), and if teacher quality affects test scores, then omission of teacher quality could bias the coefficient on the student-teacher ratio. Similarly, districts with a low student-teacher ratio might also offer many extracurricular learning opportunities. Also, districts with a low student-teacher ratio might attract families that are more committed to enhancing their children's learning at home. Such omitted factors could lead to omitted variable bias.

One way to eliminate omitted variable bias, at least in theory, is to conduct an experiment. For example, students could be randomly assigned to different size classes, and their subsequent performance on standardized tests could be compared. Such a study was in fact conducted in Tennessee, and we examine it in Chapter 13.

Functional form. The analysis here and in Chapter 8 explored a variety of functional forms. We found that some of the possible nonlinearities investigated were not statistically significant, while those that were did not substantially alter the estimated effect of reducing the student-teacher ratio. Although further functional form analysis could be carried out, this suggests that the main findings of these studies are unlikely to be sensitive to using different nonlinear regression specifications.
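The teacher-quality example in the omitted variables discussion can be illustrated with simulated data. The sketch below is not based on the California or Massachusetts data: the data-generating process, the coefficient values, and the helper names ols_slope and residuals are all assumptions made for the illustration. It generates districts in which higher teacher quality goes with a lower student-teacher ratio, then compares the OLS slope when quality is omitted with the slope when quality is controlled for (computed here by partialling quality out of both variables).

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 20000
true_effect = -1.0       # assumed causal effect of STR on test scores

# Hypothetical data: better teachers are found where STR is lower.
quality = [rng.gauss(0, 1) for _ in range(n)]
str_ratio = [20 - q + rng.gauss(0, 1) for q in quality]
score = [650 + true_effect * s + 5 * q + rng.gauss(0, 1)
         for s, q in zip(str_ratio, quality)]

# Regression that omits teacher quality: the slope absorbs part of its effect.
short_slope = ols_slope(str_ratio, score)

# Controlling for quality: regress the part of score not explained by quality
# on the part of STR not explained by quality.
long_slope = ols_slope(residuals(quality, str_ratio), residuals(quality, score))

print(f"omitting quality:        {short_slope:.2f}")
print(f"controlling for quality: {long_slope:.2f}")
```

Under these assumed parameters the regression that omits quality settles near -3.5 rather than -1, so it overstates the class size effect, while the regression that controls for quality recovers a value close to the assumed -1.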
Errors-in-variables. The average student-teacher ratio in the district is a broad and potentially inaccurate measure of class size. For example, because students move in and out of districts, the student-teacher ratio might not accurately represent the actual class sizes experienced by the students taking the test, which in turn could lead to the estimated class size effect being biased toward zero. Another variable with potential measurement error is average district income. Those data were taken from the 1990 census, while the other data pertain to 1998 (Massachusetts) or 1999 (California). If the economic composition of the district changed substantially over the 1990s, this would be an imprecise measure of the actual average district income.

Selection. The California and the Massachusetts data cover all the public elementary school districts in the state that satisfy minimum size restrictions, so there is no reason to believe that sample selection is a problem here.

Simultaneous causality. Simultaneous causality would arise if the performance on standardized tests affected the student-teacher ratio. This could happen, for example, if there is a bureaucratic or political mechanism for increasing the funding of poorly performing schools or districts, which in turn resulted in hiring more teachers. In Massachusetts, no such mechanism for equalization of school financing was in place during the time of these tests. In California, a series of court cases led to some equalization of funding, but this redistribution of funds was not based on student achievement. Thus, in neither Massachusetts nor California does simultaneous causality appear to be a problem.

Heteroskedasticity and correlation of the error term across observations. All of the results reported here and in earlier chapters use heteroskedasticity-robust standard errors, so heteroskedasticity does not threaten internal validity. Correlation of the error term across observations, however, could threaten the consistency of the standard errors because simple random sampling was not used (the sample consists of all elementary school districts in the state). Although there are alternative standard error formulas that could be applied to this situation, the details are complicated and specialized and we leave them to more advanced texts.

Discussion and Implications

The similarity between the Massachusetts and California results suggests that these studies are externally valid, in the sense that the main findings can be generalized to performance on standardized tests at other elementary school districts in the United States.
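Returning briefly to the errors-in-variables threat discussed above, the claim that measurement error in a regressor biases the estimated effect toward zero can be checked on simulated data. The sketch below is an illustration only: the data-generating process, the noise variances, and the helper name ols_slope are assumptions, not features of the actual district data. It regresses a simulated test score first on the true class size and then on a noisy measure of it.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)   # fixed seed so the sketch is reproducible
n = 20000
true_slope = -2.0        # assumed effect of actual class size on test scores

# Hypothetical data: actual class size, and a proxy measured with error whose
# variance equals the variance of actual class size.
class_size = [rng.gauss(20, 2) for _ in range(n)]
measured = [c + rng.gauss(0, 2) for c in class_size]
score = [700 + true_slope * c + rng.gauss(0, 5) for c in class_size]

slope_actual = ols_slope(class_size, score)   # regressor measured correctly
slope_proxy = ols_slope(measured, score)      # regressor measured with error

print(f"slope using actual class size: {slope_actual:.2f}")
print(f"slope using noisy measure:     {slope_proxy:.2f}")
```

With equal variances for the true regressor and the measurement error, the slope on the noisy measure is pulled roughly halfway toward zero in this simulation, which is the direction of bias described in the text.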
Some of the most important potential threats to internal validity have been addressed by controlling for student background, family economic background, and district affluence, and by checking for nonlinearities in the regression function. Still, some potential threats to internal validity remain. A leading candidate is omitted variable bias, perhaps arising because the control variables do not capture other characteristics of the school districts or extracurricular learning opportunities.

Based on both the California and the Massachusetts data, we are able to answer the superintendent's question from Section 4.1: After controlling for family economic background, student characteristics, and district affluence, and after modeling nonlinearities in the regression function, cutting the student-teacher ratio by two students per teacher is predicted to increase test scores by approximately 0.08 standard deviation of the distribution of test scores across districts. This effect is statistically significant, but it is quite small. This small estimated effect is in line with the results of the many studies that have investigated the effects on test scores of class size reductions.[4]

The superintendent can now use this estimate to help her decide whether to reduce class sizes. In making this decision, she will need to weigh the costs of the proposed reduction against the benefits. The costs include teacher salaries and expenses for additional classrooms. The benefits include improved academic performance, which we have measured by performance on standardized tests, but there are other potential benefits that we have not studied, including lower dropout rates and enhanced future earnings. The estimated effect of the proposal on standardized test performance is one important input into her calculation of costs and benefits.

[4] If you are interested in learning more about the relationship between class size and test scores, see the reviews by Ehrenberg, Brewer, Gamoran, and Willms (2001a, 2001b).

9.5 Conclusion

The concepts of internal and external validity provide a framework for assessing what has been learned from an econometric study.

A study based on multiple regression is internally valid if the estimated coefficients are unbiased and consistent, and if standard errors are consistent. Threats to the internal validity of such a study include omitted variables, misspecification of functional form (nonlinearities), imprecise measurement of the independent variables (errors-in-variables), sample selection, and simultaneous causality. Each of these introduces correlation between the regressor and the error term, which in turn makes OLS estimators biased and inconsistent. If the errors are correlated across observations, as they can be with time series data, or if they are heteroskedastic but the standard errors are computed using the homoskedasticity-only formula, then internal validity is compromised because the standard errors will be inconsistent. These latter problems can be addressed by computing the standard errors properly.

A study using regression analysis, like any statistical study, is externally valid if its findings can be generalized beyond the population and setting studied. Sometimes it can help to compare two or more studies on the same topic. Whether or not there are two or more such studies, however, assessing external validity requires making judgments about the similarities of the population and setting studied and the population and setting to which the results are being generalized.

The next two parts of this textbook develop ways to address threats to internal validity that cannot be mitigated by multiple regression analysis alone. Part III extends the multiple regression model in ways designed to mitigate all five sources of potential bias in the OLS estimator; Part III also discusses a different approach to obtaining internal validity, randomized controlled experiments. Part IV develops methods for analyzing time series data and for using time series data to estimate so-called dynamic causal effects, which are causal effects that vary over time.

Summary

1. Statistical studies are evaluated by asking whether the analysis is internally and externally valid. A study is internally valid if the statistical inferences about causal effects are valid for the population being studied. A study is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.
2. In regression estimation of causal effects, there are two types of threats to internal validity. First, OLS estimators will be inconsistent if the regressors and error terms are correlated. Second, confidence intervals and hypothesis tests are not valid when the standard errors are incorrect.
3. Regressors and error terms may be correlated when there are omitted variables, an incorrect functional form is used, one or more of the regressors is measured with error, the sample is chosen nonrandomly from the population, or there is simultaneous causality between the regressors and dependent variables.
4. Standard errors are incorrect when the errors are heteroskedastic and the computer software uses the homoskedasticity-only standard errors, or when the error term is correlated across different observations.
5. When regression models are used solely for forecasting, it is not necessary for the regression coefficients to be unbiased estimates of causal effects. It is critical, however, that the regression model be externally valid for the forecasting application at hand.

Key Terms

internal validity
external validity
population studied
population of interest
setting
functional form misspecification
errors-in-variables bias
sample selection bias
simultaneous causality
simultaneous equations bias

Review the Concepts

9.1 What is the difference between internal and external validity? Between the population studied and the population of interest?

9.2 Key Concept 9.2 describes the problem of variable selection in terms of a tradeoff between bias and variance. What is this tradeoff? Why could including an additional regressor decrease bias? Increase variance?

9.3 Economic variables are often measured with error. Does this mean that regression analysis is unreliable? Explain.

9.4 Suppose that a state offered voluntary standardized tests to all of its third graders, and these data were used in a study of class size on student performance. Explain how sample selection bias might invalidate the results.

9.5 A researcher estimates the effect on crime rates of spending on police by using city-level data. Explain how simultaneous causality might invalidate the results.

9.6 A researcher estimates a regression using two different software packages. The first uses the homoskedasticity-only formula for standard errors. The second uses the heteroskedasticity-robust formula. The standard errors are very different. Which should the researcher use? Why?
Exercises

9.1 Suppose that you have just read a careful statistical study of the effect of advertising on the demand for cigarettes. Using data from New York during the 1970s, it concluded that advertising on buses and subways was more effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the 1970s; Los Angeles in the 1970s; New York in 2006.

9.2 Consider the one-variable regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$, and suppose that it satisfies the assumption in Key Concept 4.3. Suppose that $Y_i$ is measured with error, so that the data are $\tilde{Y}_i = Y_i + w_i$, where $w_i$ is the measurement error, which is i.i.d. and independent of $Y_i$ and $X_i$. Consider the population regression $\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$, where $v_i$ is the regression error using the mismeasured dependent variable, $\tilde{Y}_i$.
a. Show that $v_i = u_i + w_i$.
b. Show that the regression $\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$ satisfies the assumptions in Key Concept 4.3. (Assume that $w_i$ is independent of $Y_j$ and $X_j$ for all values of $i$ and $j$ and has a finite fourth moment.)
c. Are the OLS estimators consistent?
d. Can confidence intervals be constructed in the usual way?
e. Evaluate these statements: "Measurement error in the X's is a serious problem. Measurement error in Y is not."

9.3 Labor economists studying the determinants of women's earnings discovered a puzzling empirical result. Using randomly selected employed women, they regressed earnings on the women's number of children and a set of control variables (age, education, occupation, and so forth). They found that women with more children had higher wages, controlling for these other factors. Explain how sample selection might be the cause of this result. (Hint: Notice that the sample includes only women who are working.) (This empirical puzzle motivated James Heckman's research on sample selection that led to his 2000 Nobel Prize in economics.)

9.4 Using the regressions shown in column (2) of Table 8.3 and column (2) of Table 9.2, construct a table like Table 9.3 to compare the estimated effects of a 10% increase in district income on test scores in California and Massachusetts.
SER = (1 5.: s~i o n Y. where II de no te!.) is C Ollected.l (. a. and Q. Is the es timated slope too large o r 100 sm all ? (Him: Use the fac t tha l de mand cu rve~ slope down and supply curves slope up. fHint: Use Eq ua tio ns (4.d. and art. observations for (Y"X. is regressed o n Pj' (That is. Solve the twO simulta neous e quations 10 show how Q a nd P de rcnd o n!1 and v. A no the r researche r is interested in (he same regressio n. Deri . and u de no tes fact ors olher than price th. Y. Use yO U( a nswe rs to (b) a.1 + 66.5 The demand fo r a commodity is given by Q = {Jo + {3I P + /I .1) (12. d.81.. SER = _. De ri ve the m ea ns of P and Q.'X.) yie ld the following r~ s ult s : regrl. observatio n 2 e nlered twice. Suppos. (.2) 15. R' = 0.342 CHAPTIR 9 As~sing Studies Bo~ on Multiple Regression 9. but he make. U~ Ihese to dete rmi ne the regressio n slatistics. fac tors o th e r tha n p rice thai determine supply.8). so he has 200 observalio ns (with observati on I entered twice.:h observation twice.ine dem a nd.nd (c) to de rive values of Ihe regres:.. i.. is tb.' a n d ~.l[ de term. Qi is th e regressa nd a nd P. S upply fo r the commodity is given by Q == "Yo + "YIP . e the variance o f P. b. c.) Suppose tha t the sa mple is very la rge.: regressor. have varia nces u.. Using these 200 o bservations. whilt results wi ll be produced by his regression program? (Hint· W rite the '·incorrect'· values of the sam pit" means...s of Ya nd X as funcli o o ~ of the ·'cor· rect"' va lues. A rand om sample uf o bserva tions of (Qj.6 Suppose n :: 100 i. .R~ = _.) 9. and so fort h). J ii... variances and covarianct.: lh at LI a nd v bo th have a mean of ze ro. where Q denOTes quantity.7) a nd (4.l .ion coe ffic ie nts. the variance of Q.32 . and t he covariance between Q and P.1.) y == _+ _ X. Pdenates price. H. A researche r uses the slo pe of Ihis regressio n as an estimate of the s lope of the demand function (f3d. i. P.. 
mOlUa lly uncorrc la led.In e rro r when he e nters the d a ta into his regression program: H e ente~ eiu.
b. Which (if any) of the internal validity conditions are violated?

9.7 Are the following statements true or false? Explain your answer.
a. "An ordinary least squares regression of Y onto X will be internally inconsistent if X is correlated with the error term."
b. "Each of the five primary threats to internal validity implies that X is correlated with the error term."

9.8 Would the regression in Equation (9.5) be useful for predicting test scores in a school district in Massachusetts? Why or why not?

9.9 Consider the linear regression of TestScore on Income shown in Figure 8.2 and the nonlinear regression in Equation (8.18). Would either of these regressions provide a reliable estimate of the effect of income on test scores? Would either of these regressions provide a reliable method for forecasting test scores? Explain.

9.10 Read the box "The Returns to Education and the Gender Gap" in Section 8.3. Discuss the internal and external validity of the estimated effect of education on earnings.

9.11 Read the box "The Demand for Economics Journals" in Section 8.3. Discuss the internal and external validity of the estimated effect of price per citation on subscriptions.

Empirical Exercises

E9.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.1(l). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors-in-variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.
b. The data set CPS92_04 described in Empirical Exercise 3.1 includes data from 2004 and 1992. Use these data to investigate the (temporal) external validity of the conclusions that you reached in Empirical Exercise 8.1(l). [Note: Remember to adjust for inflation as explained in Empirical Exercise 3.1(b).]
E9.2 A committee on improving undergraduate teaching at your college needs your help before reporting to the Dean. The committee seeks your advice, as an econometric expert, about whether your college should take physical appearance into account when hiring teaching faculty. (This is legal as long as doing so is blind to race, religion, age, and gender.) You do not have time to collect your own data, so you must base your recommendations on the analysis of the data set TeachingRatings described in Empirical Exercise 4.2 that has served as the basis for several Empirical Exercises in Part II of the text. Based on your analysis of these data, what is your advice? Justify your advice based on a careful and complete assessment of the internal and external validity of the regressions that you carried out to answer the Empirical Exercises using these data in earlier chapters.

E9.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.3(i). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors-in-variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.
b. The data set CollegeDistance excluded students from western states; data for these students are included in the data set CollegeDistanceWest. Use these data to investigate the (geographic) external validity of the conclusions that you reached in Empirical Exercise 8.3(i).

APPENDIX 9.1 The Massachusetts Elementary School Testing Data

The Massachusetts data are districtwide averages for public elementary school districts in 1998. The test score is taken from the Massachusetts Comprehensive Assessment System (MCAS) test administered to all fourth graders in Massachusetts public schools in the spring of 1998. The test is sponsored by the Massachusetts Department of Education and is mandatory for all public schools. The data analyzed here are the overall total score, which is the sum of the scores on the English, math, and science portions of the test.

Data on the student-teacher ratio, the percentage of students receiving a subsidized lunch, and the percentage of students still learning English are averages for each elementary school district for the 1997-1998 school year and were obtained from the Massachusetts Department of Education. Data on average district income were obtained from the 1990 U.S. Census.
PART THREE
Further Topics in Regression Analysis

CHAPTER 10 Regression with Panel Data
CHAPTER 11 Regression with a Binary Dependent Variable
CHAPTER 12 Instrumental Variables Regression
CHAPTER 13 Experiments and Quasi-Experiments
CHAPTER 10
Regression with Panel Data

Multiple regression is a powerful tool for controlling for the effect of variables on which we have data. If data are not available for some of the variables, however, they cannot be included in the regression and the OLS estimators of the regression coefficients could have omitted variable bias.

This chapter describes a method for controlling for some types of omitted variables without actually observing them. This method requires a specific type of data, called panel data, in which each observational unit, or entity, is observed at two or more time periods. By studying changes in the dependent variable over time, it is possible to eliminate the effect of omitted variables that differ across entities but are constant over time.

The empirical application in this chapter concerns drunk driving: What are the effects of alcohol taxes and drunk driving laws on traffic fatalities? We address this question using data on traffic fatalities, alcohol taxes, drunk driving laws, and related variables for the 48 contiguous U.S. states for each of the seven years from 1982 to 1988. This panel data set lets us control for unobserved variables that differ from one state to the next, such as prevailing cultural attitudes toward drinking and driving, but do not change over time. It also allows us to control for variables that vary through time, like improvements in the safety of new cars, but do not vary across states.

Section 10.1 describes the structure of panel data and introduces the drunk driving data set. Fixed effects regression, the main tool for regression analysis of panel data, is an extension of multiple regression that exploits panel data to control for variables that differ across entities but are constant over time. Fixed effects regression is introduced in Sections 10.2 and 10.3, first for the case of only two time periods, then for multiple time periods. In Section 10.4, these methods are extended to incorporate so-called time fixed effects, which control for unobserved variables that are constant across entities but change over time. Section 10.5 discusses the panel data regression assumptions and standard errors for panel data regression. In Section 10.6, we use these methods to study the effect of alcohol taxes and drunk driving laws on traffic deaths.

10.1 Panel Data

Recall from Section 1.3 that panel data (also called longitudinal data) refers to data for $n$ different entities observed at $T$ different time periods. The state traffic fatality data studied in this chapter are panel data. Those data are for $n = 48$ entities (states), where each entity is observed in $T = 7$ time periods (each of the years 1982, ..., 1988), for a total of 7 × 48 = 336 observations.

When describing cross-sectional data it was useful to use a subscript to denote the entity; for example, $Y_i$ referred to the variable $Y$ for the $i$th entity. When describing panel data, we need some additional notation to keep track of both the entity and the time period. This is done by using two subscripts rather than one: The first, $i$, refers to the entity, and the second, $t$, refers to the time period of the observation. Thus $Y_{it}$ denotes the variable $Y$ observed for the $i$th of $n$ entities in the $t$th of $T$ periods. This notation is summarized in Key Concept 10.1.

KEY CONCEPT 10.1: NOTATION FOR PANEL DATA
Panel data consist of observations on the same $n$ entities at two or more time periods $T$. If the data set contains observations on the variables $X$ and $Y$, then the data are denoted

$(X_{it}, Y_{it}),\ i = 1, \ldots, n \text{ and } t = 1, \ldots, T,$   (10.1)

where the first subscript, $i$, refers to the entity being observed, and the second subscript, $t$, refers to the date at which it is observed.

Some additional terminology associated with panel data describes whether some observations are missing. A balanced panel has all its observations, that is, the variables are observed for each entity and each time period. A panel that has some missing data for at least one time period for at least one entity is called an unbalanced panel. The traffic fatality data set has data for all 48 U.S. states for all seven years, so it is balanced. If, however, some data were missing (for example, if we did not have data on fatalities for some states in 1983), then the data set would be unbalanced. The methods presented in this chapter are described for a balanced panel; however, all these methods can be used with an unbalanced panel, although precisely how to do so in practice depends on the regression software being used.

Example: Traffic Deaths and Alcohol Taxes

There are approximately 40,000 highway traffic fatalities each year in the United States. Approximately one-third of fatal crashes involve a driver who was drinking, and this fraction rises during peak drinking periods. One study (Levitt and Porter, 2001) estimates that as many as 25% of drivers on the road between 1 A.M. and 3 A.M. have been drinking, and that a driver who is legally drunk is at least 13 times as likely to cause a fatal crash as a driver who has not been drinking.

In this chapter, we study how effective various government policies designed to discourage drunk driving actually are in reducing traffic deaths. The panel data set contains variables related to traffic fatalities and alcohol, including the number of traffic fatalities in each state in each year, the type of drunk driving laws in each state in each year, and the tax on beer in each state. The measure of traffic deaths we use is the fatality rate, which is the number of annual traffic deaths per 10,000 people in the population in the state. The measure of alcohol taxes we use is the "real" tax on a case of beer, which is the beer tax, put into 1988 dollars by adjusting for inflation.[1] The data are described in more detail in Appendix 10.1.

[1] To make the taxes comparable over time, they are put into "1988 dollars" using the Consumer Price Index (CPI). For example, because of inflation a tax of $1 in 1982 corresponds to a tax of $1.23 in 1988 dollars.

Figure 10.1a is a scatterplot of the data for 1982 on two of these variables, the fatality rate and the real tax on a case of beer. A point in this scatterplot represents the fatality rate in 1982 and the real beer tax in 1982 for a given state. The OLS regression line obtained by regressing the fatality rate on the real beer tax is also plotted in the figure; the estimated regression line is

$\widehat{FatalityRate} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\,BeerTax$   (1982 data).   (10.2)

The coefficient on the real beer tax is positive, but not statistically significant at the 10% level.
FIGURE 10.1 The Traffic Fatality Rate and the Tax on Beer. Panel a is a scatterplot of traffic fatality rates and the real tax on a case of beer (in 1988 dollars) for 48 states in 1982. Panel b shows the data for 1988. Both plots show a positive relationship between the fatality rate and the real beer tax. [Scatterplots not reproduced. Horizontal axis: beer tax (dollars per case, $1988). Vertical axis: fatality rate (fatalities per 10,000). Fitted lines: (a) 1982 data, FatalityRate = 2.01 + 0.15 BeerTax; (b) 1988 data, FatalityRate = 1.86 + 0.44 BeerTax.]
Because we have data for more than one year, we can reexamine this relationship for another year. This is done in Figure 10.1b, which is the same scatterplot as before, except that it uses the data for 1988. The OLS regression line through these data is

\[ \widehat{\mathit{FatalityRate}} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\,\mathit{BeerTax} \quad \text{(1988 data).} \tag{10.3} \]

In contrast to the regression using the 1982 data, the coefficient on the real beer tax is statistically significant at the 1% level (the t-statistic is 3.43). Curiously, the estimated coefficients for the 1982 and the 1988 data are positive: Taken literally, higher real beer taxes are associated with more, not fewer, traffic fatalities.

Should we conclude that an increase in the tax on beer leads to more traffic deaths? Not necessarily, because these regressions could have substantial omitted variable bias. Many factors affect the fatality rate, including the quality of the automobiles driven in the state, whether the state highways are in good repair, whether most driving is rural or urban, the density of cars on the road, and whether it is socially acceptable to drink and drive. Any of these factors may be correlated with alcohol taxes, and if they are, they will lead to omitted variable bias. One approach to these potential sources of omitted variable bias would be to collect data on all these variables and add them to the annual cross-sectional regressions in Equations (10.2) and (10.3). Unfortunately, some of these variables, such as the cultural acceptance of drinking and driving, might be very hard or even impossible to measure.

If these factors remain constant over time in a given state, however, then another route is available. Because we have panel data, we can in effect hold these factors constant, even though we cannot measure them. To do so, we use OLS regression with fixed effects.

10.2 Panel Data with Two Time Periods: "Before and After" Comparisons

When data for each state are obtained for T = 2 time periods, it is possible to compare values of the dependent variable in the second period to values in the first period.
By focusing on changes in the dependent variable, this "before and after" comparison in effect holds constant the unobserved factors that differ from one state to the next but do not change over time within the state.

Let \(Z_i\) be a variable that determines the fatality rate in the i-th state, but does not change over time (so the t subscript is omitted). For example, \(Z_i\) might be the local cultural attitude toward drinking and driving, which changes slowly and thus could be considered to be constant between 1982 and 1988. Accordingly, the population linear regression relating \(Z_i\) and the real beer tax to the fatality rate is

\[ \mathit{FatalityRate}_{it} = \beta_0 + \beta_1 \mathit{BeerTax}_{it} + \beta_2 Z_i + u_{it}, \tag{10.4} \]

where \(u_{it}\) is the error term and \(i = 1, \ldots, n\) and \(t = 1, \ldots, T\).

Because \(Z_i\) does not change over time, in the regression model in Equation (10.4) it will not produce any change in the fatality rate between 1982 and 1988. Thus, in this regression model, the influence of \(Z_i\) can be eliminated by analyzing the change in the fatality rate between the two periods. To see this mathematically, consider Equation (10.4) for each of the two years, 1982 and 1988:

\[ \mathit{FatalityRate}_{i,1982} = \beta_0 + \beta_1 \mathit{BeerTax}_{i,1982} + \beta_2 Z_i + u_{i,1982}, \tag{10.5} \]

\[ \mathit{FatalityRate}_{i,1988} = \beta_0 + \beta_1 \mathit{BeerTax}_{i,1988} + \beta_2 Z_i + u_{i,1988}. \tag{10.6} \]

Subtracting Equation (10.5) from Equation (10.6) eliminates the effect of \(Z_i\):

\[ \mathit{FatalityRate}_{i,1988} - \mathit{FatalityRate}_{i,1982} = \beta_1 (\mathit{BeerTax}_{i,1988} - \mathit{BeerTax}_{i,1982}) + u_{i,1988} - u_{i,1982}. \tag{10.7} \]

This specification has an intuitive interpretation. Cultural attitudes toward drinking and driving affect the level of drunk driving and thus the traffic fatality rate in a state. If, however, they did not change between 1982 and 1988, then they did not produce any change in fatalities in the state. Rather, any changes in traffic fatalities over time must have arisen from other sources. In Equation (10.7), these other sources are changes in the tax on beer or changes in the error term (which captures changes in other factors that determine traffic deaths).

Specifying the regression in changes in Equation (10.7) eliminates the effect of the unobserved variables \(Z_i\) that are constant over time. In other words, analyzing changes in Y and X has the effect of controlling for variables that are constant over time, thereby eliminating this source of omitted variable bias.
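The differencing argument in Equations (10.5) to (10.7) can be checked numerically. The following is a minimal sketch on made-up data, not the traffic fatality data: the coefficient values, the five hypothetical states, and the helper names `ols_slope` and `slope_through_origin` are all assumptions chosen for illustration. The error term is set to zero so that the only thing distorting the cross-sectional regression is the unobserved, time-invariant variable.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on x that includes an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope from an OLS regression of y on x with no intercept, as in (10.7)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical population coefficients (not estimates from the traffic data).
beta0, beta1, beta2 = 2.0, -1.0, 1.0

# Unobserved, time-invariant Z_i for five made-up states.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
# The first-period tax is correlated with Z_i: X = 0.5 + 0.5 * Z.
x_1982 = [0.5 + 0.5 * zi for zi in z]
tax_change = [0.3, -0.2, 0.1, 0.4, -0.1]
x_1988 = [x + d for x, d in zip(x_1982, tax_change)]

# Equation (10.4) with the error term set to zero, to isolate the role of Z_i.
y_1982 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1982, z)]
y_1988 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1988, z)]

# Cross-sectional regression for one year: Z_i is omitted and biases the slope.
cross_section_slope = ols_slope(x_1982, y_1982)

# "Before and after" regression: Z_i drops out of the changes.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
before_after_slope = slope_through_origin(dx, dy)

print("1982 cross-section slope:", round(cross_section_slope, 6))
print("before-and-after slope:  ", round(before_after_slope, 6))
```

In this constructed example the cross-sectional slope has the wrong sign even though the true coefficient is negative, while the regression in changes recovers the true coefficient. With real data the error term is not zero, so the recovery is only approximate.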
FIGURE 10.2 Changes in Fatality Rates and Beer Taxes, 1982-1988. This is a scatterplot of the change in the traffic fatality rate and the change in real beer taxes between 1982 and 1988 for 48 states. There is a negative relationship between changes in the fatality rate and changes in the beer tax. [Scatterplot not reproduced. Horizontal axis: change in beer tax (dollars per case, $1988). Vertical axis: change in fatality rate (fatalities per 10,000).]

Figure 10.2 presents a scatterplot of the change in the fatality rate between 1982 and 1988 against the change in the real beer tax between 1982 and 1988 for the 48 states in our data set. A point in Figure 10.2 represents the change in the fatality rate and the change in the real beer tax between 1982 and 1988 for a given state. The OLS regression line, estimated using these data and plotted in the figure, is

\[ \widehat{\mathit{FatalityRate}_{1988} - \mathit{FatalityRate}_{1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(\mathit{BeerTax}_{1988} - \mathit{BeerTax}_{1982}), \tag{10.8} \]

where including an intercept allows for the possibility that the mean change in the fatality rate, in the absence of a change in the real beer tax, is nonzero.

In contrast to the cross-sectional regression results, the estimated effect of a change in the real beer tax is negative, as predicted by economic theory. The hypothesis that the population slope coefficient is zero is rejected at the 5% significance level. According to this estimated coefficient, an increase in the real beer tax by $1 per case reduces the traffic fatality rate by 1.04 deaths per 10,000 people. This estimated effect is very large: The average fatality rate is approximately 2 in these data (that is, 2 fatalities per year per 10,000 members of the population), so the estimate suggests that traffic fatalities can be cut in half merely by increasing the real tax on beer by $1 per case.
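The two claims about Equation (10.8), that a zero slope is rejected at the 5% level and that the implied effect is roughly half the average fatality rate, follow from simple arithmetic. The sketch below uses only the figures quoted in the text; the variable names and the use of the large-sample normal critical value 1.96 are assumptions made for illustration.

```python
# Figures quoted in the text for Equation (10.8).
slope = -1.04             # change in deaths per 10,000 per $1 rise in the real beer tax
slope_se = 0.36           # standard error of the slope
mean_fatality_rate = 2.0  # approximate average fatality rate (deaths per 10,000)

# t-statistic for the null hypothesis that the population slope is zero.
t_stat = slope / slope_se

# Two-sided 5% critical value from the standard normal distribution.
critical_value = 1.96
reject_at_5_percent = abs(t_stat) > critical_value

# Implied change as a share of the average fatality rate.
relative_effect = slope / mean_fatality_rate

print(f"t-statistic: {t_stat:.2f}")
print(f"reject zero slope at 5%: {reject_at_5_percent}")
print(f"implied change relative to the mean: {relative_effect:.0%}")
```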
By examining changes in the fatality rate over time, the regression in Equation (10.8) controls for fixed factors such as cultural attitudes toward drinking and driving. But there are many factors that influence traffic safety, and if they change over time and are correlated with the real beer tax, then their omission will produce omitted variable bias. In Section 10.6, we undertake a more careful analysis that controls for several such factors, so for now it is best to refrain from drawing any substantive conclusions about the effect of real beer taxes on traffic fatalities.

This "before and after" analysis works when the data are observed in two different years. Our data set, however, contains observations for seven different years, and it seems foolish to discard those potentially useful additional data. But the "before and after" method does not apply directly when T > 2. To analyze all the observations in our panel data set, we use the method of fixed effects regression.

10.3 Fixed Effects Regression

Fixed effects regression is a method for controlling for omitted variables in panel data when the omitted variables vary across entities (states) but do not change over time. Unlike the "before and after" comparisons of Section 10.2, fixed effects regression can be used when there are two or more time observations for each entity.

The fixed effects regression model has n different intercepts, one for each entity. These intercepts can be represented by a set of binary (or indicator) variables. These binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time.

The Fixed Effects Regression Model

Consider the regression model in Equation (10.4) with the dependent variable (FatalityRate) and observed regressor (BeerTax) denoted as \(Y_{it}\) and \(X_{it}\), respectively:

\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \tag{10.9} \]

where \(Z_i\) is an unobserved variable that varies from one state to the next but does not change over time (for example, \(Z_i\) represents cultural attitudes toward drinking and driving).
We want to estimate \(\beta_1\), the effect on Y of X holding constant the unobserved state characteristics Z.

Because \(Z_i\) varies from one state to the next but is constant over time, the population regression model in Equation (10.9) can be interpreted as having n intercepts, one for each state. Specifically, let \(\alpha_i = \beta_0 + \beta_2 Z_i\). Then Equation (10.9) becomes

\[ Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}. \tag{10.10} \]

Equation (10.10) is the fixed effects regression model, in which \(\alpha_1, \ldots, \alpha_n\) are treated as unknown intercepts to be estimated, one for each state. The interpretation of \(\alpha_i\) as a state-specific intercept in Equation (10.10) comes from considering the population regression line for the i-th state; this population regression line is \(\alpha_i + \beta_1 X_{it}\). The slope coefficient of the population regression line, \(\beta_1\), is the same for all states, but the intercept of the population regression line varies from one state to the next.

Because the intercept \(\alpha_i\) in Equation (10.10) can be thought of as the "effect" of being in entity i (in the current application, entities are states), the terms \(\alpha_1, \ldots, \alpha_n\) are known as entity fixed effects. The variation in the entity fixed effects comes from omitted variables that, like \(Z_i\) in Equation (10.9), vary across entities but not over time.

The state-specific intercepts in the fixed effects regression model also can be expressed using binary variables to denote the individual states. Section 8.3 considered the case in which the observations belong to one of two groups and the population regression line has the same slope for both groups but different intercepts (see Figure 8.8a). That population regression line was expressed mathematically using a single binary variable indicating one of the groups (case #1 in Key Concept 8.4). If we had only two states in our data set, that binary variable regression model would apply here. Because we have more than two states, however, we need additional binary variables to capture all the state-specific intercepts in Equation (10.10).

To develop the fixed effects regression model using binary variables, let \(D1_i\) be a binary variable that equals 1 when i = 1 and equals 0 otherwise, let \(D2_i\) equal 1 when i = 2 and equal 0 otherwise, and so on. We cannot include all n binary variables plus a common intercept, for if we do the regressors will be perfectly multicollinear (this is the "dummy variable trap" of Section 6.7), so we arbitrarily omit the binary variable \(D1_i\) for the first group.
Accordingly, the fixed effects regression model in Equation (10.10) can be written equivalently as

\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}, \tag{10.11} \]

where \(\beta_0, \beta_1, \gamma_2, \ldots, \gamma_n\) are unknown coefficients to be estimated. To derive the relationship between the coefficients in Equation (10.11) and the intercepts in Equation (10.10), compare the population regression lines for each state in the two equations. In Equation (10.11), the population regression equation for the first state is \(\beta_0 + \beta_1 X_{it}\), so \(\alpha_1 = \beta_0\). For the second and remaining states, it is \(\beta_0 + \beta_1 X_{it} + \gamma_i\), so \(\alpha_i = \beta_0 + \gamma_i\) for \(i \ge 2\).

Thus, there are two equivalent ways to write the fixed effects regression model, Equations (10.10) and (10.11). In Equation (10.10), it is written in terms of n state-specific intercepts. In Equation (10.11), the fixed effects regression model has a common intercept and n - 1 binary regressors. In both formulations, the slope coefficient on X is the same from one state to the next. The state-specific intercepts in Equation (10.10) and the binary regressors in Equation (10.11) have the same source: the unobserved variable \(Z_i\) that varies across states but not over time.

Extension to multiple X's. If there are other observed determinants of Y that are correlated with X and that change over time, then these should also be included in the regression to avoid omitted variable bias. Doing so results in the fixed effects regression model with multiple regressors, summarized in Key Concept 10.2.

KEY CONCEPT 10.2  THE FIXED EFFECTS REGRESSION MODEL

The fixed effects regression model is

\[ Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it}, \tag{10.12} \]

where \(i = 1, \ldots, n\) and \(t = 1, \ldots, T\), where \(X_{1,it}\) is the value of the first regressor for entity i in time period t, \(X_{2,it}\) is the value of the second regressor, and so forth, and \(\alpha_1, \ldots, \alpha_n\) are entity-specific intercepts.

Equivalently, the fixed effects regression model can be written in terms of a common intercept, the X's, and n - 1 binary variables representing all but one entity:

\[ Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}, \tag{10.13} \]

where \(D2_i = 1\) if i = 2 and \(D2_i = 0\) otherwise, and so forth.

Estimation and Inference

In principle the binary variable specification of the fixed effects regression model [Equation (10.13)] can be estimated by OLS. This regression, however, has k + n regressors (the k X's, the n - 1 binary variables, and the intercept), so in practice this OLS regression is tedious or, in some software packages, impossible to implement if the number of entities is large. Econometric software therefore has special routines for OLS estimation of fixed effects regression models. These special routines are equivalent to using OLS on the full binary variable regression, but are faster because they employ some mathematical simplifications that arise in the algebra of fixed effects regression.

The "entity-demeaned" OLS algorithm. Regression software typically computes the OLS fixed effects estimator in two steps. In the first step, the entity-specific average is subtracted from each variable. In the second step, the regression is estimated using "entity-demeaned" variables. Specifically, consider the case of a single regressor in the version of the fixed effects model in Equation (10.10) and take the average of both sides of Equation (10.10); then \(\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i\), where \(\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}\), and \(\bar{X}_i\) and \(\bar{u}_i\) are defined similarly. Thus Equation (10.10) implies that \(Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)\). Let \(\tilde{Y}_{it} = Y_{it} - \bar{Y}_i\), \(\tilde{X}_{it} = X_{it} - \bar{X}_i\), and \(\tilde{u}_{it} = u_{it} - \bar{u}_i\); accordingly,

\[ \tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}. \tag{10.14} \]

Thus \(\beta_1\) can be estimated by the OLS regression of the "entity-demeaned" variables \(\tilde{Y}_{it}\) on \(\tilde{X}_{it}\). In fact, this estimator is identical to the OLS estimator of \(\beta_1\) obtained by estimation of the fixed effects model in Equation (10.11) using n - 1 binary variables (see the exercises).

The "before and after" regression vs. the binary variables specification. Although Equation (10.11) with its binary variables looks quite different from the "before and after" regression model in Equation (10.7), in the special case that T = 2 the OLS estimator of \(\beta_1\) from the binary variable specification and from the "before and after" specification are identical if the intercept is excluded from the "before and after" specification. Thus, when T = 2, there are three ways to estimate \(\beta_1\) by OLS: the "before and after" specification in Equation (10.7) (without an intercept), the binary variable specification in Equation (10.11), and the "entity-demeaned" specification in Equation (10.14). These three methods are equivalent; that is, they produce identical OLS estimates of \(\beta_1\).
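The equivalence of the three estimators when T = 2 can be verified on a toy panel. This is a minimal sketch on made-up numbers, using a small hand-written normal-equations solver rather than any particular regression package; the helper names `solve` and `ols` and the panel values are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up panel, n = 3 entities and T = 2 periods:
# entity -> ([X_i1, X_i2], [Y_i1, Y_i2]).
panel = {
    1: ([1.0, 1.5], [3.0, 2.4]),
    2: ([2.0, 2.2], [5.1, 5.0]),
    3: ([0.5, 1.3], [1.0, 0.4]),
}

# (a) Binary variable specification, Equation (10.11): intercept, X, D2, D3.
rows, y_all = [], []
for i, (xs, ys) in panel.items():
    for x, y in zip(xs, ys):
        rows.append([1.0, x, float(i == 2), float(i == 3)])
        y_all.append(y)
beta1_dummies = ols(rows, y_all)[1]

# (b) Entity-demeaned specification, Equation (10.14): no intercept.
x_dm, y_dm = [], []
for xs, ys in panel.values():
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    x_dm += [x - x_bar for x in xs]
    y_dm += [y - y_bar for y in ys]
beta1_demeaned = ols([[x] for x in x_dm], y_dm)[0]

# (c) "Before and after" specification, Equation (10.7): changes, no intercept.
dx = [xs[1] - xs[0] for xs, _ in panel.values()]
dy = [ys[1] - ys[0] for _, ys in panel.values()]
beta1_changes = ols([[d] for d in dx], dy)[0]

print(beta1_dummies, beta1_demeaned, beta1_changes)
```

The three printed values agree up to floating-point rounding. The demeaned and differenced versions never build the entity dummies, which is the practical advantage the text describes when the number of entities is large.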
The sampling distribution, standard errors, and statistical inference. In multiple regression with cross-sectional data, if the four least squares assumptions in Key Concept 6.4 hold, then the sampling distribution of the OLS estimator is normal in large samples. The variance of this sampling distribution can be estimated from the data, and the square root of this estimator of the variance (that is, the standard error) can be used to test hypotheses using a t-statistic and to construct confidence intervals.

Similarly, in multiple regression with panel data, if a set of assumptions, called the fixed effects regression assumptions, hold, then the sampling distribution of the fixed effects OLS estimator is normal in large samples, the variance of that distribution can be estimated from the data, the square root of that estimator is the standard error, and the standard error can be used to construct t-statistics and confidence intervals. Given the standard error, statistical inference (testing hypotheses, including joint hypotheses using F-statistics, and constructing confidence intervals) proceeds in exactly the same way as in multiple regression with cross-sectional data.

The fixed effects regression assumptions and standard errors for fixed effects regression are discussed further in Section 10.5.

Application to Traffic Deaths

The OLS estimate of the fixed effects regression line relating the real beer tax to the fatality rate, based on all seven years of data (336 observations), is

\[ \widehat{\mathit{FatalityRate}} = -\underset{(0.20)}{0.66}\,\mathit{BeerTax} + \mathit{StateFixedEffects}, \tag{10.15} \]

where, as is conventional, the estimated state fixed intercepts are not listed to save space and because they are not of primary interest in this application.

Like the "differences" specification in Equation (10.8), the estimated coefficient in the fixed effects regression in Equation (10.15) is negative, so that, as predicted by economic theory, higher real beer taxes are associated with fewer traffic deaths, which is the opposite of what we found in the initial cross-sectional regressions of Equations (10.2) and (10.3). The two regressions are not identical because the "differences" regression in Equation (10.8) uses only the data for 1982 and 1988 (specifically, the difference between those two years), whereas the fixed effects regression in Equation (10.15) uses the data for all seven years. Because of the additional observations, the standard error is smaller in Equation (10.15) than in Equation (10.8).
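The practical consequence of the smaller standard error is a narrower confidence interval. The sketch below computes large-sample 95% intervals from the coefficients and standard errors quoted in Equations (10.8) and (10.15); the function name `confidence_interval` and the normal critical value 1.96 are assumptions made for illustration.

```python
def confidence_interval(estimate, standard_error, critical_value=1.96):
    """Large-sample two-sided confidence interval (95% by default)."""
    half_width = critical_value * standard_error
    return estimate - half_width, estimate + half_width


# Coefficient on the real beer tax and its standard error, as quoted in the text.
estimates = {
    "Equation (10.8), changes 1982 to 1988": (-1.04, 0.36),
    "Equation (10.15), state fixed effects": (-0.66, 0.20),
}

intervals = {}
for label, (coef, se) in estimates.items():
    low, high = confidence_interval(coef, se)
    intervals[label] = (low, high)
    print(f"{label}: 95% CI = ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

Both intervals exclude zero, and the interval based on all seven years is a little more than half as wide as the one based on the two-year difference.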
Including state fixed effects in the fatality rate regression lets us avoid omitted variables bias arising from omitted factors, such as cultural attitudes toward drinking and driving, that vary across states but are constant over time within a state. Still, a skeptic might suspect that there are other factors that could lead to omitted variables bias. For example, over this period cars were getting safer and occupants were increasingly wearing seat belts; if the real tax on beer rose on average during the mid-1980s, then it could be picking up the effect of overall automobile safety improvements. If, however, safety improvements evolved over time but were the same for all states, then we can eliminate their influence by including time fixed effects.

10.4 Regression with Time Fixed Effects

Just as fixed effects for each entity can control for variables that are constant over time but differ across entities, so can time fixed effects control for variables that are constant across entities but evolve over time.

Because safety improvements in new cars are introduced nationally, they serve to reduce traffic fatalities in all states. So, it is plausible to think of automobile safety as an omitted variable that changes over time but has the same value for all states. The population regression in Equation (10.9) can be modified to include the effect of automobile safety, which we will denote by \(S_t\):

\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}, \tag{10.16} \]

where \(S_t\) is unobserved and where the single "t" subscript emphasizes that safety changes over time but is constant across states. Because \(\beta_3 S_t\) represents variables that determine \(Y_{it}\), if \(S_t\) is correlated with \(X_{it}\), then omitting \(S_t\) from the regression leads to omitted variable bias.

Time Effects Only

For the moment, suppose that the variables \(Z_i\) are not present, so that the term \(\beta_2 Z_i\) can be dropped from Equation (10.16), although the term \(\beta_3 S_t\) remains. Our objective is to estimate \(\beta_1\), controlling for \(S_t\).

Although \(S_t\) is unobserved, its influence can be eliminated because it varies over time but not across states, just as it is possible to eliminate the effect of \(Z_i\), which varies across states but not over time. In the entity fixed effects model, the presence of \(Z_i\) leads to the fixed effects regression model in Equation (10.10), in which each state has its own intercept (or fixed effect). Similarly, because \(S_t\) varies over time but not over states, the presence of \(S_t\) leads to a regression model in which each time period has its own intercept.

The time fixed effects regression model with a single X regressor is

\[ Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}. \tag{10.17} \]

This model has a different intercept, \(\lambda_t\), for each time period. The intercept \(\lambda_t\) in Equation (10.17) can be thought of as the "effect" on Y of year t (or, more generally, time period t), so the terms \(\lambda_1, \ldots, \lambda_T\) are known as time fixed effects. The variation in the time fixed effects comes from omitted variables that, like \(S_t\) in Equation (10.16), vary over time but not across entities.

Just as the entity fixed effects regression model can be represented using n - 1 binary indicators, so, too, can the time fixed effects regression model be represented using T - 1 binary indicators:

\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}, \tag{10.18} \]

where \(\delta_2, \ldots, \delta_T\) are unknown coefficients, and where \(B2_t = 1\) if t = 2 and \(B2_t = 0\) otherwise, and so forth. As in the fixed effects regression model in Equation (10.11), in this version of the time effects model the intercept is included, and the first binary variable (\(B1_t\)) is omitted to prevent perfect multicollinearity.

When there are additional observed "X" regressors, then these regressors appear in Equations (10.17) and (10.18) as well.

In the traffic fatalities regression, the time fixed effects specification allows us to eliminate bias arising from omitted variables like nationally introduced safety standards that change over time but are the same across states in a given year.

Both Entity and Time Fixed Effects

If some omitted variables are constant over time but vary across states (such as cultural norms), while others are constant across states but vary over time (such as national safety standards), then it is appropriate to include both entity (state) and time effects.

The combined entity and time fixed effects regression model is

\[ Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}, \tag{10.19} \]

where \(\alpha_i\) is the entity fixed effect and \(\lambda_t\) is the time fixed effect.
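For the time-effects-only model in Equation (10.17), the period-specific intercepts can be removed by subtracting each period's cross-state average, by analogy with the entity-demeaning step in Equation (10.14). The following is a minimal sketch on made-up data with the error term set to zero; the coefficient, the three hypothetical years and states, and the helper names are assumptions for illustration.

```python
def ols_slope(x, y):
    """Slope from an OLS regression of y on x that includes an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope from an OLS regression of y on x with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical slope and time fixed effects (lambda_t falls as cars get safer).
beta1 = -0.5
time_effects = {1982: 2.0, 1983: 1.6, 1984: 1.1}

# x[t] lists X_it for three made-up states; the tax drifts upward over time.
x = {
    1982: [0.4, 0.9, 1.4],
    1983: [0.8, 1.0, 1.8],
    1984: [1.1, 1.6, 2.1],
}
# Equation (10.17) with the error term set to zero.
y = {t: [beta1 * xi + time_effects[t] for xi in x[t]] for t in x}

# Pooled regression that ignores the time effects.
pooled_x = [xi for t in x for xi in x[t]]
pooled_y = [yi for t in y for yi in y[t]]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Subtract each period's average across states, then regress with no intercept.
x_dev, y_dev = [], []
for t in x:
    x_bar = sum(x[t]) / len(x[t])
    y_bar = sum(y[t]) / len(y[t])
    x_dev += [xi - x_bar for xi in x[t]]
    y_dev += [yi - y_bar for yi in y[t]]
time_fe_slope = slope_through_origin(x_dev, y_dev)

print("pooled slope:            ", round(pooled_slope, 4))
print("time fixed effects slope:", round(time_fe_slope, 4))
```

Because the tax rises over the same years in which the common time effect falls, the pooled slope overstates the size of the tax effect, while the time-demeaned regression recovers the assumed coefficient.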
The model in Equation (10.19) can equivalently be represented using n - 1 entity binary indicators and T - 1 time binary indicators, along with an intercept:

\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}, \tag{10.20} \]

where \(\beta_0, \beta_1, \gamma_2, \ldots, \gamma_n, \delta_2, \ldots, \delta_T\) are unknown coefficients.

When there are additional observed "X" regressors, then these appear in Equations (10.19) and (10.20) as well.

The combined state and time fixed effects regression model eliminates omitted variables bias arising both from unobserved variables that are constant over time and from unobserved variables that are constant across states.

Estimation. The time fixed effects model and the entity and time fixed effects model are both variants of the multiple regression model. Thus their coefficients can be estimated by OLS by including the additional time binary variables. Alternatively, in a balanced panel the coefficients on the X's can be computed by first deviating Y and the X's from their entity and time-period means, then estimating the multiple regression equation of deviated Y on the deviated X's. This algorithm, which is commonly implemented in regression software, eliminates the need to construct the full set of binary indicators that appear in Equation (10.20). An equivalent approach is to deviate Y, the X's, and the time indicators from their state (but not time) means and to estimate k + T coefficients by multiple regression of the deviated Y on the deviated X's and the deviated time indicators. Finally, if T = 2, the entity and time fixed effects regression can be estimated using the "before and after" approach of Section 10.2, including the intercept in the regression. Thus the "before and after" regression reported in Equation (10.8), in which the change in FatalityRate from 1982 to 1988 is regressed on the change in BeerTax from 1982 to 1988 including an intercept, provides the same estimate of the slope coefficient as the OLS regression of FatalityRate on BeerTax, including entity and time fixed effects, estimated using data for the two years 1982 and 1988.

Application to traffic deaths. Adding time effects to the state fixed effects regression results in the OLS estimate of the regression line:

\[ \widehat{\mathit{FatalityRate}} = -\underset{(0.25)}{0.64}\,\mathit{BeerTax} + \mathit{StateFixedEffects} + \mathit{TimeFixedEffects}. \tag{10.21} \]
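The deviation-from-means algorithm described in the estimation paragraph can be sketched for a balanced panel with one regressor. The version below subtracts the entity mean and the time-period mean and adds back the grand mean, which is the usual way to make the deviated variables average to zero; that detail, the function name `two_way_demean`, and the made-up data (with the error term set to zero) are assumptions for illustration.

```python
def two_way_demean(v):
    """Deviate a balanced panel v[i][t] from its entity and time-period means."""
    n, periods = len(v), len(v[0])
    entity_mean = [sum(row) / periods for row in v]
    time_mean = [sum(v[i][t] for i in range(n)) / n for t in range(periods)]
    grand_mean = sum(entity_mean) / n
    return [
        v[i][t] - entity_mean[i] - time_mean[t] + grand_mean
        for i in range(n)
        for t in range(periods)
    ]


# Hypothetical coefficient and fixed effects for n = 3 entities and T = 3 periods.
beta1 = -0.6
alpha = [1.0, 2.5, 0.3]   # entity fixed effects alpha_i
lam = [0.0, -0.4, -0.9]   # time fixed effects lambda_t

# x[i][t]: the regressor varies within entities in ways not explained by time alone.
x = [
    [0.5, 0.9, 1.0],
    [1.2, 1.1, 1.9],
    [0.3, 0.8, 0.6],
]
# Equation (10.19) with the error term set to zero.
y = [[beta1 * x[i][t] + alpha[i] + lam[t] for t in range(3)] for i in range(3)]

x_dev = two_way_demean(x)
y_dev = two_way_demean(y)

# OLS of deviated Y on deviated X with no intercept.
beta1_hat = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)

print(round(beta1_hat, 6))
```

No binary indicators are built at any point: both sets of fixed effects drop out of the deviated variables, which is why this route scales to panels with many entities. The shortcut relies on the panel being balanced.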
The specification in Equation (10.21) includes the beer tax, 47 state binary variables (state fixed effects), 6 year binary variables (time fixed effects), and an intercept, so that this regression actually has 1 + 47 + 6 + 1 = 55 right-hand variables! The coefficients on the time and state binary variables and the intercept are not reported because they are not of primary interest.

Including the time effects has little impact on the estimated relationship between the real beer tax and the fatality rate [compare Equations (10.15) and (10.21)], and the coefficient on the real beer tax remains significant at the 5% level (t = -0.64/0.25 = -2.56).

This estimated relationship between the real beer tax and traffic fatalities is immune to omitted variable bias from variables that are constant either over time or across states. However, many important determinants of traffic deaths do not fall into this category, so this specification could still be subject to omitted variable bias. Section 10.6 therefore undertakes a more complete empirical examination of the effect of the beer tax and of laws aimed directly at eliminating drunk driving, controlling for a variety of factors. Before turning to that study, we first discuss the assumptions underlying panel data regression and the construction of standard errors for fixed effects estimators.

10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression

The standard errors reported so far in this chapter were computed using the usual heteroskedasticity-robust formula. These heteroskedasticity-robust standard errors are valid in panel data when T is moderate or large under a set of five assumptions called the fixed effects regression assumptions. The first four of these assumptions extend the four least squares assumptions, stated for cross-sectional data in Key Concept 6.4, to panel data. The fifth assumption requires the errors \(u_{it}\) to be uncorrelated over time for each entity. In some panel data settings, the fifth assumption is implausible, in which case a different standard error formula should be used. To keep the notation as simple as possible, this section focuses on the entity fixed effects regression model of Section 10.3, in which there are no time effects.

The Fixed Effects Regression Assumptions

The fixed effects regression assumptions are summarized in Key Concept 10.3. The first four of these assumptions extend the four least squares assumptions for cross-sectional data (Key Concept 6.4) to panel data.
KEY CONCEPT 10.3: THE FIXED EFFECTS REGRESSION ASSUMPTIONS

There are five assumptions for the panel data regression model with entity fixed effects (Key Concept 10.2). Stated for a single observed regressor, the five assumptions are as follows:

1. The error term has conditional mean zero: $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.
2. $(X_{i1}, X_{i2}, \ldots, X_{iT}, u_{i1}, u_{i2}, \ldots, u_{iT})$, $i = 1, \ldots, n$, are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: $(X_{it}, u_{it})$ have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
5. The errors for a given entity are uncorrelated over time, conditional on the regressors; specifically, $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$ for $t \neq s$.

For multiple observed regressors, $X_{it}$ should be replaced by the full list $X_{1,it}, X_{2,it}, \ldots, X_{k,it}$.

The first assumption is that the error term has conditional mean zero, given all $T$ values of $X$ for that entity. This assumption plays the same role as the first least squares assumption for cross-sectional data in Key Concept 6.4 and implies that there is no omitted variable bias.

The second assumption is that the variables for one entity are distributed identically to, but independently of, the variables for another entity; that is, the variables are i.i.d. across entities for $i = 1, \ldots, n$. Like the second least squares assumption in Key Concept 6.4, the second assumption for fixed effects regression holds if entities are selected by simple random sampling from the population.

The third and fourth assumptions for fixed effects regression are analogous to the third and fourth least squares assumptions for cross-sectional data in Key Concept 6.4.

The fifth assumption is that the errors $u_{it}$ in the fixed effects regression model are uncorrelated over time, conditional on the regressors. This assumption is new and does not arise in cross-sectional data, which do not have a time dimension. One way to understand this assumption is to recall that $u_{it}$ consists of time-varying factors that are determinants of $Y_{it}$ but are not included as regressors. In the traffic fatalities application, one such factor is the weather: A particularly snowy winter in Minnesota (that is, a winter with more snow than average for Minnesota, because there already is a "Minnesota" fixed effect in the regression) could result in unusually treacherous driving and unusually many fatal accidents. If the amount of snow in Minnesota in one year is uncorrelated with the amount of snow in the next year, then this omitted variable (snowfall) is uncorrelated from one year to the next, conditional on the regressors (the beer tax) and the state (Minnesota) fixed effect. Stated more generally, if $u_{it}$ consists of random factors (such as snowfall) that are uncorrelated from one year to the next, conditional on the regressors, then $u_{it}$ is uncorrelated from year to year, and the fifth assumption holds.

The fifth assumption might not hold in some applications, however. For example, if unusually snowy winters in Minnesota tend to follow in succession, then that omitted factor would be correlated over time. Similarly, a downturn in the local economy might produce layoffs and diminish commuting traffic, thus reducing traffic fatalities for two or more years as the workers look for new jobs and commuting patterns slowly adjust. Similarly, a major road improvement project might reduce traffic accidents not only in the year of completion but also in future years. Omitted factors like these, which persist over multiple years, will produce correlation in the error term over time. If $u_{it}$ is correlated with $u_{is}$ for different values of $s$ and $t$ (that is, if $u_{it}$ is correlated over time for a given entity), then $u_{it}$ is said to be autocorrelated (correlated with itself, at different dates) or serially correlated. Autocorrelation is an essential and pervasive feature of time series data and is discussed in detail in Part IV. Assumption #5 can be stated as requiring a lack of autocorrelation of $u_{it}$, conditional on the $X$'s and the entity fixed effects. If $u_{it}$ is autocorrelated, then Assumption #5 fails.

Standard Errors for Fixed Effects Regression

If Assumption #5 in Key Concept 10.3 holds, then the errors $u_{it}$ are uncorrelated over time, conditional on the regressors. In this case, if $T$ is moderate or large, then the usual (heteroskedasticity-robust) standard errors are valid.

If the errors are autocorrelated, then the usual standard error formula is not valid. One way to see this is to draw an analogy to heteroskedasticity. In a regression with cross-sectional data, if the errors are heteroskedastic, then (as discussed in Section 5.4) the homoskedasticity-only standard errors are not valid because they were derived under the false assumption of homoskedasticity. Similarly, if the errors in panel data are autocorrelated, then the usual standard errors will not be valid because they were derived under the false assumption that the errors are not autocorrelated. Appendix 10.2 provides a mathematical explanation of why the usual standard errors are not valid if the regression errors are autocorrelated.

Standard errors that are valid if $u_{it}$ is potentially heteroskedastic and potentially correlated over time within an entity are referred to as heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.
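The difference between the usual heteroskedasticity-robust standard error and one that also allows the errors to be correlated over time within an entity can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the formula of Appendix 10.2: it handles a single regressor, uses the entity-demeaned ("within") estimator, applies no degrees-of-freedom adjustments, and runs on made-up numbers. The function and variable names are hypothetical.

```python
import math


def demean(values):
    """Subtract the entity-specific mean from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def fixed_effects_se(panel):
    """panel maps each entity to a list of (x, y) pairs, one per time period.

    Returns (beta, se_robust, se_clustered) for the entity-demeaned OLS
    estimator with a single regressor.
    """
    x_t = {i: demean([x for x, _ in obs]) for i, obs in panel.items()}
    y_t = {i: demean([y for _, y in obs]) for i, obs in panel.items()}
    sxx = sum(x * x for i in panel for x in x_t[i])
    sxy = sum(x * y for i in panel for x, y in zip(x_t[i], y_t[i]))
    beta = sxy / sxx
    # Product of demeaned regressor and residual for every entity-period.
    score = {i: [x * (y - beta * x) for x, y in zip(x_t[i], y_t[i])]
             for i in panel}
    # Robust: square each product, then sum (treats periods as uncorrelated).
    meat_robust = sum(s * s for i in panel for s in score[i])
    # Clustered: sum within each entity first, then square (allows correlation).
    meat_cluster = sum(sum(score[i]) ** 2 for i in panel)
    return beta, math.sqrt(meat_robust) / sxx, math.sqrt(meat_cluster) / sxx


# Hypothetical two-entity, two-period panel of (x, y) pairs.
toy = {"A": [(0.0, 0.0), (2.0, 3.0)], "B": [(0.0, 0.0), (2.0, 1.0)]}
beta, se_robust, se_clustered = fixed_effects_se(toy)
print(beta, se_robust, round(se_clustered, 4))  # 1.0 0.25 0.3536
```

The only difference between the two formulas is the order of summing and squaring: summing within an entity before squaring keeps the cross-period terms that the usual formula assumes away.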
The standard errors presented in Appendix 10.2, which are one type of HAC standard errors, are called clustered standard errors, because they allow the errors to be correlated within a cluster, or grouping, but assume that they are uncorrelated for errors not in the same cluster. In the context of autocorrelation in panel data, the cluster consists of the observations for the same entity at all times $t = 1, \ldots, T$. When $u_{it}$ is correlated over time, clustered standard errors should be used.

10.6 Drunk Driving Laws and Traffic Deaths

Alcohol taxes are only one way to discourage drinking and driving. States differ in their punishments for drunk driving, and a state that cracks down on drunk driving could do so across the board by toughening laws as well as raising taxes. If so, omitting these laws could produce omitted variable bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even in regressions with state and time fixed effects. In addition, because vehicle use depends in part on whether drivers have jobs and because tax changes can reflect economic conditions (a state budget deficit can lead to tax hikes), omitting state economic conditions also could result in omitted variable bias. In this section, we extend the preceding analysis to study the effect on traffic fatalities of drinking laws (including beer taxes), holding economic conditions constant. This is done by estimating panel data regressions that include regressors representing other drunk driving laws and state economic conditions.

The results are summarized in Table 10.1. The format of the table is the same as that of the tables of regression results in Chapters 7, 8, and 9: Each column reports a different regression and each row reports a coefficient estimate and standard error, F-statistic and p-value, or other information about the regression.

Column (1) in Table 10.1 presents results for the OLS regression of the fatality rate on the real beer tax without state and time fixed effects. As in the cross-sectional regressions for 1982 and 1988 [Equations (10.2) and (10.3)], the coefficient on the real beer tax is positive (0.36), and the column (1) estimate is statistically significantly different from zero at the 5% level: According to this estimate, increasing beer taxes increases traffic fatalities! However, the regression in column (2) [reported previously as Equation (10.15)], which includes state fixed effects, suggests that the positive coefficient in regression (1) is the result of omitted variable bias (the coefficient on the real beer tax is -0.66). The regression $R^2$ jumps from 0.090 to 0.889 when fixed effects are included; evidently, the state fixed effects account for a large amount of the variation in the data.

Little changes when time effects are added, as reported in column (3) [reported previously as Equation (10.21)]. The results in columns (1)-(3) are consistent with the omitted fixed factors (historical and cultural factors, general road conditions, population density, attitudes toward drinking and driving, and so forth) being important determinants of the variation in traffic fatalities across states.
[Table 10.1: Regression Analysis of the Effect of Drunk Driving Laws on Traffic Deaths]
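The sign reversal between the regression without fixed effects and the regression with state fixed effects can be reproduced on a toy example. The sketch below uses hypothetical numbers (not the traffic fatality data) for two made-up states: the state with the higher tax also has a higher fatality level for reasons unrelated to the tax, so pooled OLS finds a positive slope even though fatalities fall within each state when its tax rises. Function names are illustrative only.

```python
def ols_slope(xs, ys):
    """Slope from an OLS regression of y on x with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def fixed_effects_slope(panel):
    """Slope after subtracting each entity's own means from x and y."""
    xs, ys = [], []
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return ols_slope(xs, ys)


# Hypothetical (tax, fatality rate) pairs for two made-up states over two years.
panel = {
    "high_tax_state": [(2.0, 3.0), (2.5, 2.7)],
    "low_tax_state": [(0.5, 1.5), (1.0, 1.2)],
}
pooled_x = [x for obs in panel.values() for x, _ in obs]
pooled_y = [y for obs in panel.values() for _, y in obs]

print(round(ols_slope(pooled_x, pooled_y), 2))  # 0.84
print(round(fixed_effects_slope(panel), 2))     # -0.6
```

The pooled slope mixes the between-state difference in levels with the within-state response; removing each state's mean isolates the latter.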
The next three regressions in Table 10.1 include additional potential determinants of fatality rates, along with time and state effects. The base specification, reported in column (4), includes two sets of legal variables related to drunk driving plus variables that control for the amount of driving and overall state economic conditions. The first set of variables is the minimum legal drinking age, represented by three binary variables for a minimum legal drinking age of 18, 19, and 20 (so the omitted group is a minimum legal drinking age of 21 or higher). The second set of legal variables is punishment associated with the first conviction for driving under the influence of alcohol, either mandatory jail time or mandatory community service (the omitted group is less severe punishment). The three measures of driving and economic conditions are average vehicle miles per driver, the unemployment rate, and the logarithm of real (1988 dollars) personal income per capita (using the logarithm of income permits the coefficient to be interpreted in terms of percentage changes of income; see Section 8.2).

The regression in column (4) has four interesting results.

1. Including the additional variables reduces the estimated coefficient on the real beer tax, relative to the regression in column (3). The estimated coefficient (-0.45) continues to be negative and statistically significant at the 5% significance level. One way to evaluate the magnitude of the coefficient is to imagine a state with an average real beer tax doubling its tax; because the average real beer tax in these data is approximately $0.50/case, this entails increasing the tax by $0.50/case. According to the estimate in column (4), the effect of a $0.50 increase (in 1988 dollars) in the beer tax is a decrease in the expected fatality rate by 0.45 × 0.50 = 0.23 death per 10,000. This estimated effect is large: Because the average fatality rate is 2 per 10,000, a reduction of 0.23 corresponds to decreasing the fatality rate to 1.77 per 10,000. This said, the estimate is quite imprecise: Because the standard error on this coefficient is 0.22, the 95% confidence interval for this effect is -0.45 × 0.50 ± 1.96 × 0.22 × 0.50 = (-0.44, -0.01). This wide 95% confidence interval includes values of the true effect that are very nearly zero.

2. The minimum legal drinking age is estimated to have very little effect on traffic fatalities. The joint hypothesis that the coefficients on the minimum legal drinking age variables are zero cannot be rejected at the 10% significance level: The F-statistic testing the joint hypothesis that the three coefficients are zero is 0.48, with a p-value of 0.696. Moreover, the estimates are small in magnitude. For example, a state with a minimum legal drinking age of 18 is estimated to have a fatality rate higher by 0.028 death per 10,000 than a state with a minimum legal drinking age of 21, holding the other factors in the regression constant.

3. The coefficients on the first-offense punishment variables are also estimated to be small and jointly insignificantly different from zero at the 10% significance level (the F-statistic is 0.17).

4. The economic variables have considerable explanatory power for traffic fatalities. High unemployment rates are associated with fewer fatalities: An increase in the unemployment rate by one percentage point is estimated to reduce traffic fatalities by 0.063 death per 10,000. Similarly, high values of real per capita income are associated with high fatalities: The coefficient is 1.81, so a 1% increase in real per capita income is associated with an increase in traffic fatalities of 0.0181 death per 10,000 (see Case I in Key Concept 8.2 for interpretation of this coefficient). According to these estimates, good economic conditions are associated with higher fatalities, perhaps because of increased traffic density when the unemployment rate is low or greater alcohol consumption when income is high. The two economic variables are jointly significant at the 0.1% significance level (the F-statistic is 38.29).

Columns (5) and (6) of Table 10.1 report regressions that check the sensitivity of these conclusions to changes in the base specification. The regression in column (5) drops the variables that control for economic conditions. The result is an increase in the estimated effect of the real beer tax, but no appreciable change in the other coefficients; the sensitivity of the estimated beer tax coefficient to including the economic variables, combined with the statistical significance of the coefficients on those variables in column (4), indicates that the economic variables should remain in the base specification. The regression in column (6) examines the sensitivity of the results to using a different functional form for the drinking age (replacing the three indicator variables with the drinking age itself) and combining the two binary punishment variables. The results of regression (4) are not sensitive to these changes.

The final column in Table 10.1 is the regression of column (4), but with clustered standard errors that allow for autocorrelation of the error term within an entity, as discussed in Section 10.5 and Appendix 10.2. The estimated coefficients in columns (4) and (7) are the same; the only difference is the standard errors. The clustered standard errors in column (7) are larger than the standard errors in column (4).
Consequently, the F-statistics in column (7) are smaller than those in column (4), but there are no qualitative differences in the two sets of F-statistics and p-values. Thus, the conclusion from regression (4) that the coefficients on the drinking age and punishment variables are not statistically significant also obtains using the HAC standard errors in column (7). One substantive difference between columns (4) and (7) arises because the HAC standard error on the beer tax coefficient is larger than the standard error in column (4). Consequently, the 95% confidence interval for the effect on fatalities of a change in the beer tax using the HAC standard error, (-1.09, 0.20), is wider than the interval from column (4), (-0.89, -0.01), and the interval computed using the HAC standard error includes zero.

These results present a provocative picture of measures to control drunk driving and traffic fatalities. According to these estimates, neither stiff punishments nor increases in the minimum legal drinking age have important effects on fatalities. In contrast, there is some evidence that increasing alcohol taxes, as measured by the real tax on beer, does reduce traffic deaths. The magnitude of this effect, however, is imprecisely estimated.

The strength of this analysis is that including state and time fixed effects mitigates the threat of omitted variable bias arising from unobserved variables that either do not change over time (like cultural attitudes toward drinking and driving) or do not vary across states (like safety innovations). As always, however, it is important to think about possible threats to validity. One potential source of omitted variable bias is that the measure of alcohol taxes used here, the real tax on beer, could move with other alcohol taxes, which suggests interpreting the results as pertaining more broadly than just to beer. A subtler possibility is that hikes in the real beer tax could be associated with public education campaigns, perhaps in response to political pressure. If so, changes in the real beer tax could pick up the effect of a broader campaign to reduce drunk driving.

If you are interested in seeing further analysis of these data, see Ruhm (1996).
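The interval arithmetic used in this section can be checked directly. The sketch below reproduces the 95% confidence interval for the effect of a $0.50 increase in the beer tax from the column (4) estimate (-0.45) and standard error (0.22) quoted in the text; the helper name and its parameters are hypothetical, and 1.96 is the usual large-sample normal critical value.

```python
def confidence_interval(estimate, standard_error, change=1.0, critical_value=1.96):
    """Interval for the predicted effect of a given change in the regressor.

    Both the point estimate and the standard error are scaled by the size
    of the change, so the interval is (estimate ± cv × SE) × change.
    """
    effect = estimate * change
    half_width = critical_value * standard_error * abs(change)
    return effect - half_width, effect + half_width


# Column (4): coefficient -0.45, standard error 0.22, tax increase of $0.50.
low, high = confidence_interval(-0.45, 0.22, change=0.50)
print(round(low, 2), round(high, 2))  # -0.44 -0.01
```

Because the upper end of the interval is so close to zero, a modestly larger standard error is enough for the interval to include zero.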
10.7 Conclusion

This chapter showed how multiple observations over time on the same entity can be used to control for unobserved omitted variables that differ across entities but are constant over time. The key insight is that if the unobserved variable does not change over time, then any changes in the dependent variable must be due to influences other than these fixed characteristics. If cultural attitudes toward drinking and driving do not change appreciably over seven years within a state, then explanations for changes in the traffic fatality rate over those seven years must lie elsewhere.

To exploit this insight, you need data in which the same entity is observed at two or more time periods; that is, you need panel data. With panel data, the multiple regression model of Part II can be extended to include a full set of entity binary variables; this is the fixed effects regression model, which can be estimated by OLS. A twist on the fixed effects regression model is to include time fixed effects, which control for unobserved variables that change over time but are constant across entities. Both entity and time fixed effects can be included in the regression to control for variables that vary across entities but are constant over time and for variables that vary over time but are constant across entities.

Despite these virtues, entity and time fixed effects regression cannot control for omitted variables that vary both across entities and over time. And, obviously, panel data methods require panel data, which often are not available. Thus there remains a need for a method that can eliminate the influence of unobserved omitted variables when panel data methods cannot do the job. A powerful and general method for doing so is instrumental variables regression, the topic of Chapter 12.

Summary

1. Panel data consist of observations on multiple ($n$) entities (states, firms, people, and so forth) where each entity is observed at two or more time periods ($T$).
2. Regression with entity fixed effects controls for unobserved variables that differ from one entity to the next but remain constant over time.
3. When there are two time periods, fixed effects regression can be estimated by a "before and after" regression of the change in $Y$ from the first period to the second on the change in $X$.
4. Entity fixed effects regression can be estimated by including binary variables for $n - 1$ entities, plus the observable independent variables (the $X$'s) and an intercept.
5. Time fixed effects control for unobserved variables that are the same across entities but vary over time.
6. A regression with time and entity fixed effects can be estimated by including binary variables for $n - 1$ entities, binary variables for $T - 1$ time periods, plus the $X$'s and an intercept.
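The equivalence between the "before and after" regression and entity fixed effects regression when there are two time periods can be verified numerically. The sketch below is a minimal illustration on hypothetical data; it assumes the changes regression is run without an intercept (including an intercept would correspond to also allowing a common time effect), and it implements fixed effects by entity demeaning, which yields the same slope as including the entity binary variables. All names are illustrative.

```python
def before_after_slope(panel):
    """Regress the change in Y on the change in X (no intercept), T = 2."""
    dx = [obs[1][0] - obs[0][0] for obs in panel.values()]
    dy = [obs[1][1] - obs[0][1] for obs in panel.values()]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def within_slope(panel):
    """Regress entity-demeaned Y on entity-demeaned X."""
    sxy = sxx = 0.0
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mx) * (y - my) for x, y in obs)
        sxx += sum((x - mx) ** 2 for x, _ in obs)
    return sxy / sxx


# Hypothetical panel: three entities, two periods, (x, y) pairs.
panel = {
    "A": [(1.0, 2.0), (3.0, 7.0)],
    "B": [(2.0, 10.0), (3.0, 11.0)],
    "C": [(0.0, -3.0), (4.0, 5.0)],
}

print(abs(before_after_slope(panel) - within_slope(panel)) < 1e-12)  # True
```

With two periods, each demeaned observation is plus or minus half the entity's change, so both the numerator and denominator of the within estimator are exactly half those of the changes regression and the ratio is unchanged.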
Key Terms

panel data
balanced panel
unbalanced panel
fixed effects regression model
entity fixed effects
time fixed effects regression model
time fixed effects
entity and time fixed effects regression model
autocorrelated
serially correlated
heteroskedasticity- and autocorrelation-consistent (HAC) standard errors
clustered standard errors

Review the Concepts

10.1 Why is it necessary to use two subscripts, $i$ and $t$, to describe panel data? What does $i$ refer to? What does $t$ refer to?

10.2 A researcher is using a panel data set on $n = 1000$ workers over $T = 10$ years (from 1996 to 2005) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Give some examples of unobserved person-specific variables that are correlated with both education and earnings. Can you think of examples of time-specific variables that might be correlated with education and earnings? How would you control for these person-specific and time-specific effects in a panel data regression?

10.3 Can the regression that you suggested in response to question 10.2 be used to estimate the effect of gender on an individual's earnings? Can that regression be used to estimate the effect of the national unemployment rate on an individual's earnings? Explain.

Exercises

10.1 This question refers to the drunk driving panel data regression summarized in Table 10.1.
a. New Jersey has a population of 8.1 million people. Suppose that New Jersey increased the tax on a case of beer by $1 (in $1988). Use the results in column (4) to predict the number of lives that would be saved over the next year. Construct a 95% confidence interval for your answer.
b. The drinking age in New Jersey is 21. Suppose that New Jersey lowered its drinking age to 18. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 95% confidence interval for your answer.
c. Suppose that real income per capita in New Jersey increases by 1% in the next year. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 90% confidence interval for your answer.
d. Should time effects be included in the regression? Why or why not?
e. The estimate of the coefficient on beer tax in column (5) is significant at the 1% level. The estimate in column (4) is significant at the 5% level. Does this mean that the estimate in (5) is more reliable?
f. A researcher conjectures that the unemployment rate has a different effect on traffic fatalities in the western states than in the other states. How would you test this hypothesis? (Be specific about the specification of the regression and the statistical test you would use.)

10.2 Consider the binary variable version of the fixed effects model in Equation (10.11), except with an additional regressor, $D1_i$; that is, let
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_1 D1_i + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$.
a. Suppose that $n = 3$. Show that the binary regressors and the "constant" regressor are perfectly multicollinear; that is, express one of the variables $D1_i$, $D2_i$, $D3_i$, and $X_{0,it}$ as a perfect linear function of the others, where $X_{0,it} = 1$ for all $i$, $t$.
b. Show the result in (a) for general $n$.
c. What will happen if you try to estimate the coefficients of the regression by OLS?

10.3 Section 9.2 gave a list of five potential threats to the internal validity of a regression study. Apply this list to the empirical analysis in Section 10.6 and thereby draw conclusions about its internal validity.

10.4 Using the regression in Equation (10.11), what is the slope and intercept for
a. Entity 1 in time period 1?
b. Entity 1 in time period 3?
c. Entity 3 in time period 1?
d. Entity 3 in time period 3?
10.5 Consider the model with a single regressor $Y_{it} = \beta_1 X_{1,it} + \alpha_i + \lambda_t + u_{it}$. This model also can be written as
$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \delta_2 B2_t + \cdots + \delta_T BT_t + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$,
where $B2_t = 1$ if $t = 2$ and 0 otherwise, $D2_i = 1$ if $i = 2$ and 0 otherwise, and so forth. How are the coefficients $(\beta_0, \delta_2, \ldots, \delta_T, \gamma_2, \ldots, \gamma_n)$ related to the coefficients $(\alpha_1, \ldots, \alpha_n, \lambda_1, \ldots, \lambda_T)$?

10.6 Suppose that the fixed effects regression assumptions from Section 10.5 are satisfied. Show that $\mathrm{cov}(\tilde{v}_{it}, \tilde{v}_{is}) = 0$ for $t \neq s$ in Equation (10.28).

10.7 A researcher believes that traffic fatalities increase when roads are icy, so that states with more snow will have more fatalities than other states. Comment on the following methods designed to estimate the effect of snow on fatalities:
a. The researcher collects data on the average snowfall for each state and adds this regressor ($AverageSnow_i$) to the regressions given in Table 10.1.
b. The researcher collects data on the snowfall in each state for each year in the sample ($Snow_{it}$) and adds this regressor to the regressions.

10.8 Consider observations $(Y_{it}, X_{it})$ from the linear panel data model
$Y_{it} = X_{it}\beta_1 + \alpha_i + \lambda_i t + u_{it}$,
where $t = 1, \ldots, T$; $i = 1, \ldots, N$; and $\alpha_i + \lambda_i t$ is an unobserved individual-specific time trend. How would you estimate $\beta_1$?

10.9 Explain why Assumption #5 given in Section 10.5 is important for fixed effects regression. What happens if Assumption #5 is not true?

10.10
a. In the fixed effects regression model, are the fixed entity effects, $\alpha_i$, consistently estimated as $n \to \infty$ with $T$ fixed? (Hint: Analyze the model with no $X$'s: $Y_{it} = \alpha_i + u_{it}$.)
b. If $n$ is large (say, $n = 2000$) but $T$ is small (say, $T = 4$), do you think that the estimated values of $\alpha_i$ are approximately normally distributed? Why or why not? (Hint: Analyze the model $Y_{it} = \alpha_i + u_{it}$.)

10.11 In a study of the effect on earnings of education using panel data on annual earnings for a large number of workers, a researcher regresses earnings in a given year on age, education, union status, and the worker's earnings in the previous year using fixed effects regression. Will this regression give reliable estimates of the effects of the regressors (age, education, union status, and previous year's earnings) on earnings? Explain. (Hint: Check the fixed effects regression assumptions in Section 10.5.)

Empirical Exercises

E10.1 Some U.S. states have enacted laws that allow citizens to carry concealed weapons. These laws are known as "shall-issue" laws because they instruct local authorities to issue a concealed weapons permit to all applicants who are citizens, are mentally competent, and have not been convicted of a felony (some states have some additional restrictions). Proponents argue that, if more people carry concealed weapons, crime will decline because criminals are deterred from attacking other people. Opponents argue that crime will increase because of accidental or spontaneous use of the weapon. In this exercise, you will analyze the effect of concealed weapons laws on violent crimes. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Guns that contains a balanced panel of data from 50 U.S. states, plus the District of Columbia, for the years 1977-1999. A detailed description is given in Guns_Description, available on the Web site.
a. Estimate (1) a regression of ln(vio) against shall and (2) a regression of ln(vio) against shall, incarc_rate, density, avginc, pop, pb1064, pw1064, and pm1029.
   i. Interpret the coefficient on shall in regression (2). Is this estimate large or small in a "real-world" sense?
   ii. Does adding the control variables in regression (2) change the estimated effect of a shall-carry law in regression (1), as measured by statistical significance? As measured by the "real-world" significance of the estimated coefficient?
   iii. Suggest a variable that varies across states but plausibly varies little, or not at all, over time, and that could cause omitted variable bias in regression (2).
b. Do the results change when you add fixed state effects? If so, which set of regression results is more credible, and why?
c. Do the results change when you add fixed time effects? If so, which set of regression results is more credible, and why?
d. Repeat the analysis using ln(rob) and ln(mur) in place of ln(vio).
e. In your view, what are the most important remaining threats to the internal validity of this regression analysis?
f. Based on your analysis, what conclusions would you draw about the effects of concealed-weapon laws on these crime rates?

These data were provided by Professor John Donohue of Stanford University and were used in his paper with Ian Ayres, "Shooting Down the 'More Guns Less Crime' Hypothesis," Stanford Law Review, 2003, 55: 1193-1312.
E10.2 Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32. Through various spending policies, the federal government has encouraged states to institute mandatory seat belt laws to reduce the number of fatalities and serious injuries. In this exercise you will investigate how effective these laws are in increasing seat belt use and reducing fatalities. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Seatbelts that contains a panel of data from 50 U.S. states, plus the District of Columbia, for the years 1983-1997. A detailed description is given in Seatbelts_Description, available on the Web site.
a. Estimate the effect of seat belt use on fatalities by regressing FatalityRate on sb_useage, speed65, speed70, ba08, drinkage21, ln(income), and age. Does the estimated regression suggest that increased seat belt use reduces fatalities?
b. Do the results change when you add state fixed effects? Provide an intuitive explanation for why the results changed.
c. Do the results change when you add time fixed effects plus state fixed effects?
d. Which regression specification, (a), (b), or (c), is most reliable? Explain why.
e. Using the results in (c), discuss the size of the coefficient on sb_useage. Is it large? Small? How many lives would be saved if seat belt use increased from 52% to 90%?
f. There are two ways that mandatory seat belt laws are enforced: "Primary" enforcement means that a police officer can stop a car and ticket the driver if the officer observes an occupant not wearing a seat belt; "secondary" enforcement means that a police officer can write a ticket if an occupant is not wearing a seat belt, but must have another reason to stop the car. In the data set, primary is a binary variable for primary enforcement and secondary is a binary variable for secondary enforcement. Run a regression of sb_useage on primary, secondary, speed65, speed70, ba08, drinkage21, ln(income), and age, including fixed state and time effects in the regression. Does primary enforcement lead to more seat belt use? What about secondary enforcement?
g. In 2000, New Jersey changed from secondary enforcement to primary enforcement. Estimate the number of lives saved per year by making this change.

These data were provided by Professor Liran Einav of Stanford University and were used in his paper with Alma Cohen, "The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities," The Review of Economics and Statistics, 2003, 85(4): 828-843.

APPENDIX 10.1 The State Traffic Fatality Data Set

The data are for the "lower 48" U.S. states (excluding Alaska and Hawaii), annually for 1982 through 1988. The traffic fatality rate is the number of traffic deaths in a given state in a given year, per 10,000 people living in that state in that year. Traffic fatality data were obtained from the U.S. Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is a measure of state alcohol taxes more generally. The drinking age variables in Table 10.1 are binary variables indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables in Table 10.1 describe the state's minimum sentencing requirements for an initial drunk driving conviction: "Mandatory jail?" equals 1 if the state requires jail time and equals 0 otherwise, and "Mandatory community service?" equals 1 if the state requires community service and equals 0 otherwise. Data on the total vehicle miles traveled annually by state were obtained from the Department of Transportation. Personal income was obtained from the U.S. Bureau of Economic Analysis, and the unemployment rate was obtained from the U.S. Bureau of Labor Statistics.

These data were graciously provided to us by Professor Christopher J. Ruhm of the Department of Economics at the University of North Carolina.
APPENDIX 10.2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors

This appendix provides formulas for standard errors for fixed effects regression when the errors are serially correlated. Throughout, we focus on the case in which the number of entities is large but the number of time periods T is small; mathematically, this corresponds to treating n as increasing to infinity while T remains fixed. If $u_{it}$ is autocorrelated, conditional on the X's, specifically when Assumption 5 of Key Concept 10.3 does not hold, then the usual standard error formula is inappropriate. This appendix explains why this is so and presents an alternative formula for standard errors that are valid if there is heteroskedasticity and/or autocorrelation, that is, for heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

This appendix considers entity-demeaned fixed effects regression with a single regressor X. The formulas given in this appendix are extended to multiple regression in Exercise 18.15.

The Asymptotic Distribution of the Fixed Effects Estimator

The fixed effects estimator of $\beta_1$ is the OLS estimator obtained using the entity-demeaned regression of Equation (10.14), in which $\tilde{Y}_{it}$ is regressed on $\tilde{X}_{it}$, where $\tilde{Y}_{it} = Y_{it} - \bar{Y}_i$, $\tilde{X}_{it} = X_{it} - \bar{X}_i$, $\bar{Y}_i = T^{-1}\sum_{t=1}^{T} Y_{it}$, and $\bar{X}_i = T^{-1}\sum_{t=1}^{T} X_{it}$. The formula for the OLS estimator is obtained by replacing $X_i - \bar{X}$ by $\tilde{X}_{it}$ and $Y_i - \bar{Y}$ by $\tilde{Y}_{it}$ in Equation (4.7) and by replacing the single summation in Equation (4.7) by two summations, one over entities (i = 1, ..., n) and one over time periods (t = 1, ..., T), so

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\tilde{Y}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2}. \qquad (10.22)$$

(The double summation is the extension to double subscripts of a single summation: $\sum_{i=1}^{n}\sum_{t=1}^{T} X_{it} = \sum_{i=1}^{n}\left(\sum_{t=1}^{T} X_{it}\right) = \sum_{i=1}^{n}(X_{i1} + X_{i2} + \cdots + X_{iT}) = (X_{11} + X_{12} + \cdots + X_{1T}) + (X_{21} + X_{22} + \cdots + X_{2T}) + \cdots + (X_{n1} + X_{n2} + \cdots + X_{nT})$.)

The derivation of the sampling distribution of $\hat{\beta}_1$ parallels the derivation in Appendix 4.3 of the sampling distribution of the OLS estimator with cross-sectional data. First, substitute $\tilde{Y}_{it} = \beta_1\tilde{X}_{it} + \tilde{u}_{it}$ [Equation (10.14)] into the numerator of Equation (10.22), then rearrange the result to obtain

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\tilde{u}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2}. \qquad (10.23)$$

Next, multiply the left side of Equation (10.23) by $\sqrt{nT}$, divide the numerator of the right side by $\sqrt{nT}$, and divide the denominator of the right side by nT:

$$\sqrt{nT}\,(\hat{\beta}_1 - \beta_1) = \frac{\sqrt{\frac{1}{n}}\sum_{i=1}^{n}\eta_i}{\hat{Q}_{\tilde{X}}}, \qquad (10.24)$$

where $\eta_i = \sqrt{\frac{1}{T}}\sum_{t=1}^{T}\tilde{X}_{it}\tilde{u}_{it}$ and $\hat{Q}_{\tilde{X}} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\tilde{X}_{it}^2$. The scaling factor in Equation (10.24), nT, is the total number of observations.

Under the first four assumptions of Key Concept 10.3, $\hat{Q}_{\tilde{X}} \xrightarrow{p} Q_{\tilde{X}} = E\left(T^{-1}\sum_{t=1}^{T}\tilde{X}_{it}^2\right)$. Also, by the central limit theorem, $\sqrt{\frac{1}{n}}\sum_{i=1}^{n}\eta_i$ is distributed $N(0, \sigma_\eta^2)$ for n large, where $\sigma_\eta^2$ is the variance of $\eta_i$. It follows from Equation (10.24) that, under Assumptions 1-4, $\sqrt{nT}\,(\hat{\beta}_1 - \beta_1)$ is distributed $N(0, \sigma_\eta^2/Q_{\tilde{X}}^2)$ (large n). Thus the variance of the large-sample distribution of $\hat{\beta}_1$ is

$$\mathrm{var}(\hat{\beta}_1) = \frac{1}{nT}\,\frac{\sigma_\eta^2}{Q_{\tilde{X}}^2}, \qquad (10.25)$$

where, from Equation (10.24),

$$\sigma_\eta^2 = \mathrm{var}(\eta_i) = \mathrm{var}\!\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T}\tilde{X}_{it}\tilde{u}_{it}\right). \qquad (10.26)$$

Under Assumption 5 of Key Concept 10.3, the expression for $\sigma_\eta^2$ in Equation (10.26) simplifies. Recall that, for two random variables U and V, var(U + V) = var(U) + var(V) + 2cov(U, V). The variance of the sum in Equation (10.26) therefore can be written as the sum of variances, plus covariances. Let $\tilde{v}_{it} = \tilde{X}_{it}\tilde{u}_{it}$. Then

$$\mathrm{var}\!\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T}\tilde{v}_{it}\right) = \frac{1}{T}\,\mathrm{var}(\tilde{v}_{i1} + \tilde{v}_{i2} + \cdots + \tilde{v}_{iT}) = \frac{1}{T}\left[\mathrm{var}(\tilde{v}_{i1}) + \mathrm{var}(\tilde{v}_{i2}) + \cdots + \mathrm{var}(\tilde{v}_{iT}) + 2\,\mathrm{cov}(\tilde{v}_{i1}, \tilde{v}_{i2}) + \cdots + 2\,\mathrm{cov}(\tilde{v}_{iT-1}, \tilde{v}_{iT})\right]. \qquad (10.28)$$

Under Assumption 5, the errors are uncorrelated across time periods, given the X's, so all the covariances in Equation (10.28) are zero (Exercise 10.12). But if $u_{it}$ is autocorrelated, then the covariances in Equation (10.28) are, in general, nonzero. The usual heteroskedasticity-robust variance estimator sets these covariances to zero, so if $u_{it}$ is autocorrelated the usual heteroskedasticity-robust variance estimator does not consistently estimate $\sigma_\eta^2$.

In contrast, the so-called clustered variance estimator is valid even if $u_{it}$ is conditionally autocorrelated. The clustered variance estimator is

$$\hat{\sigma}^2_{\eta,\,clustered} = \frac{1}{n}\sum_{i=1}^{n}\left(\sqrt{\frac{1}{T}}\sum_{t=1}^{T}\hat{\tilde{v}}_{it}\right)^2, \qquad (10.29)$$

where $\hat{\tilde{v}}_{it} = \tilde{X}_{it}\hat{u}_{it}$ and $\hat{u}_{it}$ is the residual from the OLS fixed effects regression. (Some software implements the clustered variance formula with a degrees-of-freedom adjustment.) The clustered panel data standard errors are given by

$$SE(\hat{\beta}_1) = \sqrt{\frac{1}{nT}\,\frac{\hat{\sigma}^2_{\eta,\,clustered}}{\hat{Q}_{\tilde{X}}^2}} \quad \text{(clustered standard error)}. \qquad (10.30)$$

The clustered variance estimator $\hat{\sigma}^2_{\eta,\,clustered}$ is a consistent estimator of $\sigma_\eta^2$ as n increases to infinity and T is a fixed constant, even if there is heteroskedasticity and/or autocorrelation (Exercise 18.15); that is, the variance estimator is heteroskedasticity- and autocorrelation-consistent. This variance estimator is called the clustered variance estimator because the errors are grouped into clusters of observations, where the errors can be correlated within the cluster (here, for different time periods but the same entity), but are assumed to be uncorrelated across clusters.

For the clustered variance estimator in Equation (10.29) to be reliable, n should be large. For empirical examples using HAC standard errors in economic panel data, see Bertrand, Duflo, and Mullainathan (2004).

Standard Errors when $u_{it}$ Is Correlated Across Entities

In some cases, $u_{it}$ might be correlated across entities. For example, in a study of earnings, suppose that the sampling scheme selects families by simple random sampling, then tracks all siblings within a family. Because the omitted factors that enter the error term could have ...
CHAPTER 11 Regression with a Binary Dependent Variable

Two people, identical but for their race, walk into a bank and apply for a mortgage, a large loan so that each can buy an identical house. Does the bank treat them the same way? Are they both equally likely to have their mortgage application accepted? By law they must receive identical treatment. But whether they actually do is a matter of great concern among bank regulators.

Loans are made and denied for many legitimate reasons. For example, if the proposed loan payments take up most or all of the applicant's monthly income, then a loan officer might justifiably deny the loan. Also, even loan officers are human and they can make honest mistakes, so the denial of a single minority applicant does not prove anything about discrimination. Many studies of discrimination thus look for statistical evidence of discrimination, that is, evidence contained in large data sets showing that whites and minorities are treated differently.

But how, precisely, should one check for statistical evidence of discrimination in the mortgage market? A start is to compare the fraction of minority and white applicants who were denied a mortgage. In the data examined in this chapter, gathered from mortgage applications in 1990 in the Boston, Massachusetts, area, 28% of black applicants were denied mortgages but only 9% of white applicants were denied. But this comparison does not really answer the question that opened this chapter, because the black and white applicants were not necessarily "identical but for their race." Instead, we need a method for comparing rates of denial, holding other applicant characteristics constant.
This sounds like a job for multiple regression analysis, and it is, but with a twist. The twist is that the dependent variable, whether the applicant is denied, is binary. In Part II, we regularly used binary variables as regressors, and they caused no particular problems. But when the dependent variable is binary, things are more difficult: What does it mean to fit a line to a dependent variable that can take on only two values, 0 and 1?

The answer to this question is to interpret the regression function as a predicted probability. This interpretation is discussed in Section 11.1, and it allows us to apply the multiple regression models from Part II to binary dependent variables. Section 11.1 goes over this "linear probability model." But the predicted probability interpretation also suggests that alternative, nonlinear regression models can do a better job modeling these probabilities. These methods, called "probit" and "logit" regression, are discussed in Section 11.2. Section 11.3, which is optional, discusses the method used to estimate the coefficients of the probit and logit regressions, the method of maximum likelihood estimation. In Section 11.4, we apply these methods to the Boston mortgage application data set to see whether there is evidence of racial bias in mortgage lending.

The binary dependent variable considered in this chapter is an example of a dependent variable with a limited range; in other words, it is a limited dependent variable. Models for other types of limited dependent variables, for example, dependent variables that take on multiple discrete values, are surveyed in Appendix 11.3.

11.1 Binary Dependent Variables and the Linear Probability Model

Whether a mortgage application is accepted or denied is one example of a binary variable. Many other important questions also concern binary outcomes. What is the effect of a tuition subsidy on an individual's decision to go to college? What determines whether a teenager takes up smoking? What determines whether a country receives foreign aid? What determines whether a job applicant is successful? In all these examples, the outcome of interest is binary: The student does or does not go to college, the teenager does or does not take up smoking, a country does or does not receive foreign aid, the applicant does or does not get a job.

This section discusses what distinguishes regression with a binary dependent variable from regression with a continuous dependent variable, then turns to the simplest model to use with binary dependent variables, the linear probability model.

Binary Dependent Variables

The application examined in this chapter is whether race is a factor in denying a mortgage application; the binary dependent variable is whether a mortgage application is denied. The data are a subset of a larger data set compiled by researchers at the Federal Reserve Bank of Boston under the Home Mortgage Disclosure Act (HMDA), and relate to mortgage applications filed in the Boston, Massachusetts, area in 1990. The Boston HMDA data are described in Appendix 11.1.

Mortgage applications are complicated, and so is the process by which the bank loan officer makes a decision. The loan officer must forecast whether the applicant will make his or her loan payments. One important piece of information is the size of the required loan payments relative to the applicant's income. As anyone who has borrowed money knows, it is much easier to make payments that are 10% of your income than 50%! We therefore begin by looking at the relationship between two variables: the binary dependent variable deny, which equals 1 if the mortgage application was denied and equals 0 if it was accepted, and the continuous variable P/I ratio, which is the ratio of the applicant's anticipated total monthly loan payments to his or her monthly income.

Figure 11.1 presents a scatterplot of deny versus P/I ratio for 127 of the 2380 observations in the data set. (The scatterplot is easier to read using this subset of the data.) This scatterplot looks different than the scatterplots of Part II because the variable deny is binary. Still, it seems to show a relationship between deny and P/I ratio: Few applicants with a payment-to-income ratio less than 0.3 have their application denied, but most applicants with a payment-to-income ratio exceeding 0.4 are denied.

This positive relationship between P/I ratio and deny (the higher the P/I ratio, the greater the fraction of denials) is summarized in Figure 11.1 by the OLS regression line estimated using these 127 observations. As usual, this line plots the predicted value of deny as a function of the regressor, the payment-to-income ratio.
FIGURE 11.1 Scatterplot of Mortgage Application Denial and the Payment-to-Income Ratio. Mortgage applicants with a high ratio of debt payments to income (P/I ratio) are more likely to have their application denied (deny = 1 if denied, deny = 0 if approved). The linear probability model uses a straight line to model the probability of denial, conditional on the P/I ratio. (Axes: deny against P/I ratio.)

For example, when P/I ratio = 0.3, the predicted value of deny is 0.20. But what, precisely, does it mean for the predicted value of the binary variable deny to be 0.20?

The key to answering this question, and more generally to understanding regression with a binary dependent variable, is to interpret the regression as modeling the probability that the dependent variable equals 1. Thus, the predicted value of 0.20 is interpreted as meaning that, when P/I ratio is 0.3, the probability of denial is estimated to be 20%. Said differently, if there were many applications with P/I ratio = 0.3, then 20% of them would be denied.

This interpretation follows from two facts. First, from Part II, the population regression function is the expected value of Y given the regressors, $E(Y \mid X_1, \ldots, X_k)$. Second, from Section 2.2, if Y is a 0-1 binary variable, then its expected value (or mean) is the probability that Y = 1, that is, $E(Y) = \Pr(Y = 1)$. In the regression context the expected value is conditional on the value of the regressors, so the probability is conditional on X. Thus for a binary variable, $E(Y \mid X_1, \ldots, X_k) = \Pr(Y = 1 \mid X_1, \ldots, X_k)$. In short, for a binary variable the predicted value from the population regression is the probability that Y = 1, given X.

The linear multiple regression model applied to a binary dependent variable is called the linear probability model: "linear" because it is a straight line, and "probability model" because it models the probability that the dependent variable equals 1, in our example, the probability of loan denial.
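The second fact used above, that the mean of a 0-1 variable equals the probability that it equals 1, can be checked on a small sample. The sketch below is only an illustration: the values in `deny` are made up for this example and are not taken from the HMDA data.

```python
# Hypothetical 0/1 outcomes (1 = denied, 0 = approved); illustrative values only.
deny = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]

# Sample mean of the binary variable.
sample_mean = sum(deny) / len(deny)

# Fraction of observations for which deny equals 1.
fraction_denied = deny.count(1) / len(deny)

print(sample_mean, fraction_denied)
```

Both quantities are computed from the same count of ones, so they coincide; this is the sample analogue of E(Y) = Pr(Y = 1).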
The Linear Probability Model

The linear probability model is the name for the multiple regression model of Part II when the dependent variable is binary rather than continuous. Because the dependent variable Y is binary, the population regression function corresponds to the probability that the dependent variable equals 1, given X. The population coefficient $\beta_1$ on a regressor X is the change in the probability that Y = 1 associated with a unit change in X. Similarly, the OLS predicted value, $\hat{Y}_i$, computed using the estimated regression function, is the predicted probability that the dependent variable equals 1, and the OLS estimator $\hat{\beta}_1$ estimates the change in the probability that Y = 1 associated with a unit change in X.

Almost all of the tools of Part II carry over to the linear probability model. The coefficients can be estimated by OLS. Ninety-five percent confidence intervals can be formed as ±1.96 standard errors, hypotheses concerning several coefficients can be tested using the F-statistic discussed in Chapter 7, and interactions between variables can be modeled using the methods of Section 8.3. Because the errors of the linear probability model are always heteroskedastic (Exercise 11.8), it is essential that heteroskedasticity-robust standard errors be used for inference.

One tool that does not carry over is the $R^2$. When the dependent variable is continuous, it is possible to imagine a situation in which the $R^2$ equals 1: All the data lie exactly on the regression line. This is impossible when the dependent variable is binary, unless the regressors are also binary. Accordingly, the $R^2$ is not a particularly useful statistic here. We return to measures of fit in the next section.

The linear probability model is summarized in Key Concept 11.1.

Application to the Boston HMDA data. The OLS regression of the binary dependent variable, deny, against the payment-to-income ratio, P/I ratio, estimated using all 2380 observations in our data set is

$$\widehat{deny} = \underset{(0.032)}{-0.080} + \underset{(0.098)}{0.604}\; P/I\ ratio. \qquad (11.1)$$

The estimated coefficient on P/I ratio is positive, and the population coefficient is statistically significantly different from zero at the 1% level (the t-statistic is 6.13). Thus, applicants with higher debt payments as a fraction of income are more likely to have their application denied. This coefficient can be used to compute the predicted change in the probability of denial, given a change in the regressor. For example, according to Equation (11.1), if the P/I ratio increases by 0.1, then the probability of denial increases by 0.604 × 0.1 ≅ 0.060, that is, by 6.0 percentage points.
KEY CONCEPT 11.1 THE LINEAR PROBABILITY MODEL

The linear probability model is the linear multiple regression model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \qquad (11.2)$$

where $Y_i$ is binary, so that

$$\Pr(Y = 1 \mid X_1, X_2, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k.$$

The regression coefficient $\beta_1$ is the change in the probability that Y = 1 associated with a unit change in $X_1$, holding constant the other regressors, and so forth for $\beta_2, \ldots, \beta_k$. The regression coefficients can be estimated by OLS, and the usual (heteroskedasticity-robust) OLS standard errors can be used for confidence intervals and hypothesis tests.

The estimated linear probability model in Equation (11.1) can be used to compute predicted denial probabilities as a function of the P/I ratio. For example, if projected debt payments are 30% of an applicant's income, then the P/I ratio is 0.3 and the predicted value from Equation (11.1) is −0.080 + 0.604 × 0.3 = 0.101. That is, according to this linear probability model, an applicant whose projected debt payments are 30% of income has a probability of 10.1% that his or her application will be denied. (This is different than the probability of 20% based on the regression line in Figure 11.1, because that line was estimated using only 127 of the 2380 observations used to estimate Equation (11.1).)

What is the effect of race on the probability of denial, holding constant the P/I ratio? To keep things simple, we focus on differences between black and white applicants. To estimate the effect of race, holding constant the P/I ratio, we augment Equation (11.1) with a binary regressor that equals 1 if the applicant is black and equals 0 if the applicant is white. The estimated linear probability model is

$$\widehat{deny} = \underset{(0.029)}{-0.091} + \underset{(0.089)}{0.559}\; P/I\ ratio + \underset{(0.025)}{0.177}\; black. \qquad (11.3)$$

The coefficient on black, 0.177, indicates that an African American applicant has a probability of having a mortgage application denied that is 17.7 percentage points higher than that of a white applicant, holding constant their payment-to-income ratio. This coefficient is significant at the 1% level (the t-statistic is 7.11).
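The calculations based on Equations (11.1) and (11.3) can be reproduced with a few lines of Python. This is a sketch rather than part of the original analysis: the function names are hypothetical, and the coefficients are simply copied from the two estimated equations.

```python
def lpm_pi_only(pi_ratio):
    """Predicted denial probability from Equation (11.1)."""
    return -0.080 + 0.604 * pi_ratio


def lpm_pi_and_race(pi_ratio, black):
    """Predicted denial probability from Equation (11.3); black is 1 or 0."""
    return -0.091 + 0.559 * pi_ratio + 0.177 * black


# Predicted probability of denial when debt payments are 30% of income.
p_30 = lpm_pi_only(0.3)

# Change in the predicted probability when the P/I ratio rises by 0.1.
effect_of_01 = lpm_pi_only(0.4) - lpm_pi_only(0.3)

# Black-white difference in predicted probability at the same P/I ratio.
race_gap = lpm_pi_and_race(0.3, 1) - lpm_pi_and_race(0.3, 0)

# The linear formula is not restricted to the interval from 0 to 1.
p_low = lpm_pi_only(0.05)

print(round(p_30, 3), round(effect_of_01, 3), round(race_gap, 3), round(p_low, 3))
```

Because the model is linear, the effect of a 0.1 increase in the P/I ratio and the black-white gap are the same at every starting value. The last line shows that for a sufficiently small P/I ratio the predicted "probability" from Equation (11.1) is negative.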
Taken literally, this estimate suggests that there might be racial bias in mortgage decisions, but such a conclusion would be premature. Although the payment-to-income ratio plays a role in the loan officer's decision, so do many other factors, such as the applicant's earning potential and the individual's credit history. If any of these variables are correlated with the regressors black or P/I ratio, then their omission from Equation (11.3) will cause omitted variable bias. Thus we must defer any conclusions about discrimination in mortgage lending until we complete the more thorough analysis in Section 11.4.

Shortcomings of the linear probability model. The linearity that makes the linear probability model easy to use is also its major flaw. Look again at Figure 11.1: The estimated line representing the predicted probabilities drops below 0 for very low values of the P/I ratio and exceeds 1 for high values. But this is nonsense: A probability cannot be less than 0 or greater than 1. This nonsensical feature is an inevitable consequence of the linear regression. To address this problem, we introduce new nonlinear models specifically designed for binary dependent variables, the probit and logit regression models.

11.2 Probit and Logit Regression

Probit and logit regression are nonlinear regression models specifically designed for binary dependent variables. Because a regression with a binary dependent variable Y models the probability that Y = 1, it makes sense to adopt a nonlinear formulation that forces the predicted values to be between 0 and 1. Because cumulative probability distribution functions (c.d.f.'s) produce probabilities between 0 and 1 (Section 2.1), they are used in logit and probit regressions. Probit regression uses the standard normal c.d.f. Logit regression, also called logistic regression, uses the "logistic" c.d.f.

Probit Regression

Probit regression with a single regressor. The probit regression model with a single regressor X is

$$\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X), \qquad (11.4)$$

where $\Phi$ is the cumulative standard normal distribution function (tabulated in Appendix Table 1).

For example, suppose that Y is the binary mortgage denial variable, deny, X is the payment-to-income ratio (P/I ratio), $\beta_0 = -2$, and $\beta_1 = 3$. What then is the probability of denial if P/I ratio = 0.4? According to Equation (11.4), this probability is $\Phi(\beta_0 + \beta_1\, P/I\ ratio) = \Phi(-2 + 3\, P/I\ ratio) = \Phi(-2 + 3 \times 0.4) = \Phi(-0.8)$. According to the cumulative normal distribution table (Appendix Table 1), $\Phi(-0.8) = \Pr(Z \le -0.8) = 21.2\%$. That is, when the P/I ratio is 0.4, the predicted probability that the application will be denied is 21.2%, computed using the probit model with the coefficients $\beta_0 = -2$ and $\beta_1 = 3$.

In the probit model, the term $\beta_0 + \beta_1 X$ plays the role of "z" in the cumulative standard normal distribution table in Appendix Table 1. Thus, the calculation in the previous paragraph can, equivalently, be done by first computing the "z-value," $z = \beta_0 + \beta_1 X = -2 + 3 \times 0.4 = -0.8$, then looking up the probability in the tail of the normal distribution to the left of z = −0.8, which is 21.2%.

If $\beta_1$ in Equation (11.4) is positive, then an increase in X increases the probability that Y = 1; if $\beta_1$ is negative, an increase in X decreases the probability that Y = 1. Beyond this, however, it is not easy to interpret the probit coefficients $\beta_0$ and $\beta_1$ directly. Instead, the coefficients are best interpreted indirectly by computing probabilities and/or changes in probabilities. When there is just one regressor, the easiest way to interpret a probit regression is to plot the probabilities.

Figure 11.2 plots the estimated regression function produced by the probit regression of deny on P/I ratio for the 127 observations in the scatterplot. The estimated probit regression function has a stretched "S" shape: It is nearly 0 and flat for small values of P/I ratio; it turns and increases for intermediate values; and it flattens out again and is nearly 1 for large values. For small values of the payment-to-income ratio, the probability of denial is small. For example, for P/I ratio = 0.2, the estimated probability of denial based on the estimated probit function in Figure 11.2 is Pr(deny = 1 | P/I ratio = 0.2) = 2.1%. When the P/I ratio is 0.3, the estimated probability of denial is 16.1%. When the P/I ratio is 0.4, the probability of denial increases sharply to 51.9%, and when the P/I ratio is 0.6, the denial probability is 98.3%. According to this estimated probit model, for applicants with high payment-to-income ratios, the probability of denial is nearly 1.

FIGURE 11.2 Probit Model of the Probability of Denial, Given the P/I Ratio. The probit model uses the cumulative normal distribution function to model the probability of denial given the payment-to-income ratio or, more generally, to model Pr(Y = 1 | X). Unlike the linear probability model, the probit conditional probabilities are always between 0 and 1. (Axes: deny against P/I ratio.)

Effect of a change in X. In general, the effect on Y of a change in X is the expected change in Y arising from a change in X. When Y is binary, its conditional expectation is the conditional probability that it equals 1, so the expected change in Y arising from a change in X is the change in the probability that Y = 1.

Recall from Section 8.1 that, when the population regression function is a nonlinear function of X, this expected change is estimated in three steps: First, compute the predicted value at the original value of X using the estimated regression function; next, compute the predicted value at the changed value of X, X + ΔX; then compute the difference between the two predicted values. This procedure is summarized in Key Concept 8.1. As emphasized in Section 8.1, this method always works for computing predicted effects of a change in X, no matter how complicated the nonlinear model. When applied to the probit model, the method of Key Concept 8.1 yields the estimated effect on the probability that Y = 1 of a change in X.

Probit regression with multiple regressors. In all the regression problems we have studied so far, leaving out a determinant of Y that is correlated with the included regressors results in omitted variable bias. Probit regression is no exception. In linear regression, the solution is to include the additional variable as a regressor. This is also the solution to omitted variable bias in probit regression.

The probit model with multiple regressors extends the single-regressor probit model by adding regressors to compute the z-value. Accordingly, the probit population regression model with two regressors, $X_1$ and $X_2$, is

$$\Pr(Y = 1 \mid X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2). \qquad (11.5)$$

For example, suppose that $\beta_0 = -1.6$, $\beta_1 = 2$, and $\beta_2 = 0.5$. If $X_1 = 0.4$ and $X_2 = 1$, then the z-value is $z = -1.6 + 2 \times 0.4 + 0.5 \times 1 = -0.3$. So, the probability that Y = 1 given $X_1 = 0.4$ and $X_2 = 1$ is $\Pr(Y = 1 \mid X_1 = 0.4, X_2 = 1) = \Phi(-0.3) = 38\%$.
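The probit calculations above, which the text performs with a normal distribution table, can also be done numerically. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the coefficients are the illustrative values used in the text (β0 = −2, β1 = 3, and β0 = −1.6, β1 = 2, β2 = 0.5), and the cumulative normal distribution is computed from the error function instead of being looked up in a table.

```python
import math


def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, slopes, xs):
    """Pr(Y = 1 | X) = Phi(z), where z = intercept + sum of slope * regressor."""
    z = intercept + sum(b * x for b, x in zip(slopes, xs))
    return std_normal_cdf(z)


# Single regressor: beta0 = -2, beta1 = 3, P/I ratio = 0.4, so z = -0.8.
p_single = probit_probability(-2.0, [3.0], [0.4])

# Two regressors: beta0 = -1.6, beta1 = 2, beta2 = 0.5, X1 = 0.4, X2 = 1, so z = -0.3.
p_two = probit_probability(-1.6, [2.0, 0.5], [0.4, 1.0])

# Effect of a change in X in three steps, for a change in P/I ratio from 0.3 to 0.4.
p_before = probit_probability(-2.0, [3.0], [0.3])   # step 1: original value of X
p_after = probit_probability(-2.0, [3.0], [0.4])    # step 2: changed value of X
effect = p_after - p_before                          # step 3: difference

print(round(p_single, 3), round(p_two, 3), round(effect, 3))
```

The first two results agree with the table lookups in the text (about 21.2% and 38%). The three-step calculation is an added illustration with the same hypothetical coefficients; because the probit function is nonlinear, the size of the effect would differ for a different starting value of X.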
The probit regression model, predicted probabilities, and estimated effects are summarized in Key Concept 11.2.

KEY CONCEPT 11.2
THE PROBIT MODEL, PREDICTED PROBABILITIES, AND ESTIMATED EFFECTS

The population probit model with multiple regressors is

$$\Pr(Y = 1 \mid X_1, X_2, \ldots, X_k) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k), \tag{11.6}$$

where the dependent variable $Y$ is binary, $\Phi$ is the cumulative standard normal distribution function, and $X_1$, $X_2$, etc., are regressors. The probit coefficients $\beta_0$, $\beta_1$, etc., do not have simple interpretations. The model is best interpreted by computing predicted probabilities and the effect of a change in a regressor.

The predicted probability that $Y = 1$, given values of $X_1, X_2, \ldots, X_k$, is calculated by computing the z-value, $z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$, and then looking up this z-value in the normal distribution table (Appendix Table 1).

The effect of a change in a regressor is computed by (1) computing the predicted probability for the initial value of the regressors; (2) computing the predicted probability for the new or changed value of the regressors; and (3) taking their difference.

Application to the mortgage data. As an illustration, we fit a probit model to the 2380 observations in our data set on mortgage denial (deny) and the payment-to-income ratio (P/I ratio):

$$\widehat{\Pr}(\textit{deny} = 1 \mid \textit{P/I ratio}) = \Phi(\underset{(0.16)}{-2.19} + \underset{(0.47)}{2.97}\,\textit{P/I ratio}). \tag{11.7}$$

The estimated coefficients of $-2.19$ and $2.97$ are difficult to interpret because they affect the probability of denial via the z-value. Indeed, the only thing that can be readily concluded from the estimated probit regression in Equation (11.7) is that the P/I ratio is positively related to the probability of denial (the coefficient on the P/I ratio is positive) and that this relationship is statistically significant ($t = 2.97/0.47 = 6.32$).

What is the change in the predicted probability that an application will be denied when the payment-to-income ratio increases from 0.3 to 0.4? To answer this question, we follow the procedure in Key Concept 8.1: compute the probability of denial for P/I ratio = 0.3, then for P/I ratio = 0.4, and then compute the difference.
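The question just posed can also be checked numerically. The sketch below is an illustration rather than part of the original analysis: it plugs the estimates reported in Equation (11.7) into the three steps and recomputes the t-statistic. The names `denial_prob`, `t_stat`, and `change` are assumptions. Because the code evaluates $\Phi$ at the unrounded z-value rather than at a two-decimal table entry, its results can differ from hand calculations in the third decimal place.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates and standard error as reported in Equation (11.7).
B0, B1 = -2.19, 2.97
SE_B1 = 0.47

def denial_prob(pi_ratio):
    """Predicted probability of denial from the estimated probit model."""
    return std_normal_cdf(B0 + B1 * pi_ratio)

# t-statistic on the P/I ratio coefficient.
t_stat = B1 / SE_B1

p_initial = denial_prob(0.3)     # step 1: initial value of the regressor
p_changed = denial_prob(0.4)     # step 2: changed value of the regressor
change = p_changed - p_initial   # step 3: difference

print(f"t-statistic: {t_stat:.2f}")
print(f"Pr(deny | P/I = 0.3) = {p_initial:.3f}")
print(f"Pr(deny | P/I = 0.4) = {p_changed:.3f}")
print(f"Change = {100 * change:.1f} percentage points")
```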
The probability of denial when P/I ratio = 0.3 is $\Phi(-2.19 + 2.97 \times 0.3) = \Phi(-1.30) = 0.097$. The probability of denial when P/I ratio = 0.4 is $\Phi(-2.19 + 2.97 \times 0.4) = \Phi(-1.00) = 0.159$. The estimated change in the probability of denial is $0.159 - 0.097 = 0.062$. That is, an increase in the payment-to-income ratio from 0.3 to 0.4 is associated with an increase in the probability of denial of 6.2 percentage points, from 9.7% to 15.9%.

Because the probit regression function is nonlinear, the effect of a change in $X$ depends on the starting value of $X$. For example, if P/I ratio = 0.5, then the estimated denial probability based on Equation (11.7) is $\Phi(-2.19 + 2.97 \times 0.5) = \Phi(-0.71) = 0.239$. Thus the change in the predicted probability when P/I ratio increases from 0.4 to 0.5 is $0.239 - 0.159$, or 8.0 percentage points, larger than the increase of 6.2 percentage points when the P/I ratio increases from 0.3 to 0.4.

What is the effect of race on the probability of mortgage denial, holding constant the payment-to-income ratio? To estimate this effect, we estimate a probit regression with both P/I ratio and black as regressors:

$$\widehat{\Pr}(\textit{deny} = 1 \mid \textit{P/I ratio}, \textit{black}) = \Phi(\underset{(0.16)}{-2.26} + \underset{(0.44)}{2.74}\,\textit{P/I ratio} + \underset{(0.083)}{0.71}\,\textit{black}). \tag{11.8}$$

Again, the values of the coefficients are difficult to interpret but the sign and statistical significance are not. The coefficient on black is positive, indicating that an African American applicant has a higher probability of denial than a white applicant, holding constant their payment-to-income ratio. This coefficient is statistically significant at the 1% level (the t-statistic on black is 8.55). For a white applicant with P/I ratio = 0.3, the predicted denial probability is 7.5%, while for a black applicant with P/I ratio = 0.3 it is 23.3%; the difference in denial probabilities between these two hypothetical applicants is 15.8 percentage points.

Estimation of the probit coefficients. The probit coefficients reported here were estimated using the method of maximum likelihood, which produces efficient (minimum variance) estimators in a wide variety of applications, including regression with a binary dependent variable. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

Regression software for estimating probit models typically uses maximum likelihood estimation, so this is a simple method to apply in practice. Standard errors produced by such software can be used in the same way as the standard errors of regression coefficients; for example, a 95% confidence interval for the true probit coefficient can be constructed as the estimated coefficient ± 1.96 standard errors. Similarly, F-statistics computed using maximum likelihood estimators can be used to test joint hypotheses. Maximum likelihood estimation is discussed further in Section 11.3, with additional details given in Appendix 11.2.
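To make the use of the reported estimates and standard errors concrete, the following sketch revisits Equation (11.8). It is an illustration under stated assumptions: the function and variable names are hypothetical, and the 95% confidence interval for the coefficient on black is not reported in the text but is computed here from the reported estimate and standard error using the usual ± 1.96 rule.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates from Equation (11.8): intercept, P/I ratio, black.
B0, B_PI, B_BLACK = -2.26, 2.74, 0.71
SE_BLACK = 0.083  # reported standard error of the coefficient on black

def denial_prob(pi_ratio, black):
    """Predicted denial probability; black is 1 or 0."""
    return std_normal_cdf(B0 + B_PI * pi_ratio + B_BLACK * black)

# Predicted probabilities for two hypothetical applicants with P/I ratio = 0.3.
p_white = denial_prob(0.3, 0)
p_black = denial_prob(0.3, 1)
gap = p_black - p_white

# Inference on the coefficient on black, used like an OLS standard error.
t_black = B_BLACK / SE_BLACK
ci_low = B_BLACK - 1.96 * SE_BLACK
ci_high = B_BLACK + 1.96 * SE_BLACK

print(f"white: {p_white:.3f}, black: {p_black:.3f}, gap: {100 * gap:.1f} points")
print(f"t = {t_black:.2f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The interval excludes zero, which matches the statement that the coefficient is statistically significant.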
Logit Regression

The logit regression model. The logit regression model is similar to the probit regression model, except that the cumulative standard normal distribution function $\Phi$ in Equation (11.6) is replaced by the cumulative standard logistic distribution function, which we denote by $F$. Logit regression is summarized in Key Concept 11.3. The logistic cumulative distribution function has a specific functional form, defined in terms of the exponential function, which is given as the final expression in Equation (11.9).

KEY CONCEPT 11.3
LOGIT REGRESSION

The population logit model of the binary dependent variable $Y$ with multiple regressors is

$$\Pr(Y = 1 \mid X_1, X_2, \ldots, X_k) = F(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}}. \tag{11.9}$$

Logit regression is similar to probit regression, except that the cumulative distribution function is different.

As with probit, the logit coefficients are best interpreted by computing predicted probabilities and differences in predicted probabilities.

The coefficients of the logit model can be estimated by maximum likelihood. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

The logit and probit regression functions are similar. This is illustrated in Figure 11.3, which graphs the probit and logit regression functions for the dependent variable deny and the single regressor P/I ratio, estimated by maximum likelihood using the same 127 observations as in Figures 11.1 and 11.2.
The differences between the two functions are small.

FIGURE 11.3 Probit and Logit Models of the Probability of Denial, Given the P/I Ratio
These logit and probit models produce nearly identical estimates of the probability that a mortgage application will be denied, given the payment-to-income ratio. (Plot of deny on the vertical axis against P/I ratio on the horizontal axis, with observations labeled "Mortgage denied" and "Mortgage approved".)

Historically, the main motivation for logit regression was that the logistic cumulative distribution function could be computed faster than the normal cumulative distribution function. With the advent of more efficient computers, this distinction is no longer important.

Application to the Boston HMDA data. A logit regression of deny against P/I ratio and black, using the 2380 observations in the data set, yields the estimated regression function

$$\widehat{\Pr}(\textit{deny} = 1 \mid \textit{P/I ratio}, \textit{black}) = F(\underset{(0.35)}{-4.13} + \underset{(0.96)}{5.37}\,\textit{P/I ratio} + \underset{(0.15)}{1.27}\,\textit{black}). \tag{11.10}$$

The coefficient on black is positive and statistically significant at the 1% level (the t-statistic is 8.47). The predicted denial probability of a white applicant with P/I ratio = 0.3 is $1/[1 + e^{-(-4.13 + 5.37 \times 0.3 + 1.27 \times 0)}] = 1/[1 + e^{2.52}] = 0.074$, or 7.4%. The predicted denial probability of an African American applicant with P/I ratio = 0.3 is $1/[1 + e^{-(-4.13 + 5.37 \times 0.3 + 1.27 \times 1)}] = 1/[1 + e^{1.25}] = 0.222$, or 22.2%, so the difference between the two probabilities is 14.8 percentage points.
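The logit calculation above follows the same pattern as the probit one, with the logistic function in place of $\Phi$. A minimal sketch, assuming the estimates reported in Equation (11.10); the names `logistic_cdf` and `denial_prob` are illustrative and not from the text:

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function, F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from Equation (11.10): intercept, P/I ratio, black.
B0, B_PI, B_BLACK = -4.13, 5.37, 1.27
SE_BLACK = 0.15  # reported standard error of the coefficient on black

def denial_prob(pi_ratio, black):
    """Predicted denial probability from the estimated logit model; black is 1 or 0."""
    return logistic_cdf(B0 + B_PI * pi_ratio + B_BLACK * black)

p_white = denial_prob(0.3, 0)
p_black = denial_prob(0.3, 1)
gap = p_black - p_white
t_black = B_BLACK / SE_BLACK

print(f"white: {p_white:.3f}, black: {p_black:.3f}")
print(f"gap: {100 * gap:.1f} percentage points, t = {t_black:.2f}")
```

Only the link function changes relative to the probit sketch; the three-step logic for predicted probabilities and their differences is identical.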
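Comparing the Linear Probability, Probit, and Logit Models

All three models (linear probability, probit, and logit) are just approximations to the unknown population regression function $E(Y \mid X) = \Pr(Y = 1 \mid X)$. The linear probability model is easiest to use and to interpret, but it cannot capture the nonlinear nature of the true population regression function. Probit and logit regressions model this nonlinearity in the probabilities, but their regression coefficients are more difficult to interpret. So which should you use in practice?

There is no one right answer, and different researchers use different models. Probit and logit regressions frequently produce similar results. For example, according to the estimated probit model in Equation (11.8), the difference in denial probabilities between a black applicant and a white applicant with P/I ratio = 0.3 was estimated to be 15.8 percentage points, whereas the logit estimate of this gap, based on Equation (11.10), was 14.8 percentage points. For practical purposes the two estimates are very similar.

The linear probability model provides the least sensible approximation to the nonlinear population regression function. Even so, in some data sets there may be few extreme values of the regressors, in which case the linear probability model still can provide an adequate approximation. In the denial probability regression in Equation (11.3), the estimated black/white gap from the linear probability model is 17.7 percentage points, larger than the probit and logit estimates but still qualitatively similar. The only way to know this, however, is to estimate both a linear and nonlinear model and to compare their predicted probabilities.

... regression functions are a nonlinear function of the coefficients. ... regression functions can be estimat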