# Chapter 4

Linear Regression with One Regressor ('Simple Regression')

Outline
1. Scatterplots are Pictures of 1→1 Association
2. Correlation Gives a Number for 1→1 Association
3. Simple Regression is Better than Correlation


Does Having Too Many Students Per Teacher Lower Test Marks?

Scatterplots are Pictures of 1→1 Association

Is there a Number for This Relationship?

What about the Mean? The Variance?

Treat this as a dataset on the student-teacher ratio (STR), called 'X'.

Imagine falling rain.

Collapse onto the 'X' (horizontal) axis.

Ignore 'Y' (vertical).

Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Sample Variance: $S_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

The standard error/deviation is the square root of the variance: $S_x = \sqrt{S_x^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

It is very close to a typical departure of x from the mean: $S_x \approx |x_i - \bar{x}|$ for a typical observation. ('Standard' = 'typical'; 'deviation/error' = departure from the mean.)
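As a quick sketch, the three formulas above can be computed in a few lines of Python (the function names are mine, for illustration only):

```python
import math

def sample_mean(xs):
    # x-bar = (1/n) * sum of x_i
    return sum(xs) / len(xs)

def sample_variance(xs):
    # S_x^2 = sum((x_i - x-bar)^2) / (n - 1)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    # S_x = sqrt(S_x^2), the 'typical' departure of x from its mean
    return math.sqrt(sample_variance(xs))
```

For example, `sample_variance([1, 2, 3, 4, 5])` is 2.5, and the standard deviation is its square root.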

Treat this as a dataset on the test score, called 'Y'.

Collapse onto the Y axis.

Calculate the mean $\bar{y}$ and variance $S_y^2$.

Is there a Number for This Relationship? Not yet.

Break up all observations into four quadrants around the point $(\bar{x}, \bar{y})$: I (top right), II (top left), III (bottom left), IV (bottom right).

Fill in the signs of the deviations from the means in each quadrant:
I: $x_i - \bar{x} > 0$, $y_i - \bar{y} > 0$
II: $x_i - \bar{x} < 0$, $y_i - \bar{y} > 0$
III: $x_i - \bar{x} < 0$, $y_i - \bar{y} < 0$
IV: $x_i - \bar{x} > 0$, $y_i - \bar{y} < 0$

The products $(x_i - \bar{x})(y_i - \bar{y})$ are positive in I and III, and negative in II and IV.

Sample Covariance, $S_{xy}$, describes the relationship between X and Y:
$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
If $S_{xy} > 0$, most data lie in I and III. This concurs with our visual common sense, because it looks like a positive relationship.
If $S_{xy} < 0$, most data lie in II and IV. This concurs with our visual common sense, because it looks like a negative relationship.
If $S_{xy} = 0$, the data are 'evenly spread' across I-IV.

What about our data? [Scatterplots: a large negative $S_{xy}$; a large positive $S_{xy}$; a zero $S_{xy}$.]

Our data has a mild negative covariance: $S_{xy} < 0$.

Correlation, $r_{XY} = \frac{S_{XY}}{S_X S_Y}$, is a measure of the relationship that is unit-less. It can be proved that it lies between -1 and 1: $-1 \le r_{XY} \le 1$. It has the same sign as $S_{XY}$.

Our data shows a mild negative correlation: $r_{XY} = -0.2264$.
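A minimal Python sketch of the covariance and correlation formulas (helper names are mine):

```python
import math

def sample_cov(xs, ys):
    # S_xy = sum((x_i - x-bar)(y_i - y-bar)) / (n - 1)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    # r_XY = S_XY / (S_X * S_Y): unit-free, between -1 and 1,
    # and always with the same sign as the covariance
    sx = math.sqrt(sample_cov(xs, xs))  # cov of x with itself = variance
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)
```

A perfectly positive 1→1 relationship gives r = 1; a perfectly negative one gives r = -1.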

Outline
1. Scatterplots are Pictures of 1→1 Association
2. Correlation Gives a Number for 1→1 Association
3. Simple Regression is Better than Correlation

But…
How much does Y change when X changes?
What is a good guess of Y if X = 25?
What does correlation = -0.2264 mean anyway?

What is Simple Regression? Simple regression allows us to answer all three questions — "How much does Y change when X changes?", "What is a good guess of Y if X = 25?", "What does correlation = -0.2264 mean anyway?" — by fitting a straight line, $Y = b_0 + b_1 X$, to data on two variables, Y and X.

We get our guessed line using '(Ordinary) Least Squares' (OLS). OLS minimises the squared differences between a regression line and the observations. We can view these squared differences as squares; the task then becomes the minimisation of the total area of the squares. Applet: http://hadm.sph.sc.edu/Courses/J716/demos/LeastSquares/LeastSquaresDemo.html
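For the one-regressor case this minimisation has a closed-form answer, $b_1 = S_{xy}/S_x^2$ and $b_0 = \bar{y} - b_1\bar{x}$; a small Python sketch (my own helper, not from the applet):

```python
def ols_fit(xs, ys):
    # Choose (b0, b1) to minimise the sum of squared vertical
    # distances between the line b0 + b1*x and the observations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx     # slope: S_xy / S_x^2
    b0 = my - b1 * mx  # intercept: the line passes through (x-bar, y-bar)
    return b0, b1
```

On data lying exactly on y = 1 + 2x, it returns (1, 2).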

Measures of Fit (Section 4.3)
The regression R² is the proportional reduction in the sum of squares as one moves from modeling Y by a constant (with the LS estimator being the sample mean, and sum of squares equal to the 'total sum of squares', TSS) to a line: R² = [TSS - 'sum of squares']/TSS. If the model fits perfectly, the 'sum of squares' = 0 and R² = 1. If the model does no better than a constant, it equals TSS and R² = 0. The regression R² can be seen in the applet http://hadm.sph.sc.edu/Courses/J716/demos/LeastSquares/LeastSquaresDemo.html
The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.

The Standard Error of the Regression (SER)
The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:
SER = $\sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}$
(the second equality holds because $\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0$).

The SER:
- has the units of u, which are the units of Y
- measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)
Don't worry about the n-2 (instead of n-1 or n) – the reason is too technical, and it doesn't matter if n is large.
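Given a fitted line, R² and the SER follow directly from the residuals; a minimal sketch:

```python
import math

def fit_stats(xs, ys, b0, b1):
    # R^2 = 1 - SSR/TSS: the proportional reduction in the sum of
    # squares relative to modeling y by its sample mean alone.
    # SER = sqrt(SSR / (n - 2)): typical residual size, in units of y.
    n = len(ys)
    my = sum(ys) / n
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ssr = sum(u ** 2 for u in resid)
    tss = sum((y - my) ** 2 for y in ys)
    return 1 - ssr / tss, math.sqrt(ssr / (n - 2))
```

A perfect fit gives R² = 1 and SER = 0; a line no better than the mean gives R² = 0.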

How the Computer Did It (SW Key Concept 4.2)

The OLS Line has a Small Negative Slope
Estimated slope: $\hat{\beta}_1$ = -2.28
Estimated intercept: $\hat{\beta}_0$ = 698.9
Estimated regression line: TestScore = 698.9 - 2.28·STR

Interpretation of the estimated slope and intercept
TestScore = 698.9 - 2.28·STR
Districts with one more student per teacher on average have test scores that are 2.28 points lower; that is, ΔTestScore/ΔSTR = -2.28.
The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9. This interpretation makes no sense – it extrapolates the line outside the range of the data – so here the intercept is not economically meaningful.

Remember Calculus?
TestScore = 698.9 - 2.28·STR. Differentiation gives d TestScore / d STR = -2.28. 'd' means an 'infinitely small change', but for a 'very small change' Δ it will still be pretty close to the truth, so an approximation is ΔTestScore = -2.28·ΔSTR. If STR goes up by one, the test score falls by 2.28. If STR goes up by, say, 20, the test score falls by 2.28 × 20 = 45.6.

Predicted values & residuals: one of the districts in the data set is Antelope, CA, for which STR = 19.33 and TestScore = 657.8.
Predicted value: $\hat{Y}_{Antelope}$ = 698.9 - 2.28 × 19.33 = 654.8
Residual: $\hat{u}_{Antelope}$ = 657.8 - 654.8 = 3.0
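The Antelope calculation is just plugging into the estimated line; in Python:

```python
# Estimates and data from the chapter's Antelope, CA example
b0, b1 = 698.9, -2.28             # estimated intercept and slope
str_antelope = 19.33              # Antelope's student-teacher ratio
test_antelope = 657.8             # Antelope's actual test score

y_hat = b0 + b1 * str_antelope    # predicted score, about 654.8
residual = test_antelope - y_hat  # actual minus predicted, about 3.0
```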

R² and SER evaluate the Model
TestScore = 698.9 - 2.28·STR, R² = .05, SER = 18.6
By using STR you only reduce the sum of squares by 5% compared with just 'modeling' the test score by its average. That is, STR only explains a small fraction of the variation in test scores. And the standard residual size is about 19, which looks large.

Seeing R² and the SER in EViews

Dependent Variable: TESTSCR
Method: Least Squares
Sample: 1 420; Included observations: 420
TESTSCR = C(1) + C(2)*STR

          Coefficient   Std. Error   t-Statistic   Prob.
C(1)      698.9330      9.467491     73.82451      0.0000
C(2)      -2.279808     0.479826     -4.751327     0.0000

R-squared            0.051240    Mean dependent var    654.1565
Adjusted R-squared   0.048970    S.D. dependent var    19.05335
S.E. of regression   18.58097    Akaike info criterion 8.686903
Sum squared resid    144315.5    Schwarz criterion     8.706143
Log likelihood       -1822.250   Durbin-Watson stat    0.129062

TestScore = 698.9 - 2.28·STR

Recap
1. Scatterplots are Pictures of 1→1 Association
2. Correlation Gives a Number for 1→1 Association
3. Simple Regression is Better than Correlation

But…
How much does Y change when X changes? $b_1 \Delta x$
What is a good guess of Y if X = 25? $b_0 + b_1(25)$
What does correlation = -0.2264 mean anyway? Surprise: $R^2 = r_{XY}^2$

# Chapter 5

Regression with a Single Regressor: Hypothesis Tests

What is Simple Regression?
We've used simple regression as a means of describing an apparent relationship between two variables. This is called descriptive statistics. Simple regression also allows us to estimate, and make inferences about, the slope coefficients of an underlying model, under the OLS assumptions. This is called inferential statistics. We do this, as before, by fitting a straight line to data on two variables, Y and X.

The Underlying Model (or 'Population Regression Function')
$Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1, …, n
X is the independent variable or regressor; Y is the dependent variable
$\beta_0$ = intercept; $\beta_1$ = slope
$u_i$ = the regression error or residual
The regression error consists of omitted factors, or possibly measurement error in the measurement of Y. In general, these omitted factors are other factors that influence Y, other than the variable X.

What Does it Look Like in This Case?
$TestScore_i = \beta_0 + \beta_1 STR_i + u_i$, i = 1, …, n
X is the STR; Y is the test score
$\beta_0$ = intercept; $\beta_1$ = change in test score for a unit change in STR
Clearly, we want good guesses (estimates) of $\beta_0$ and $\beta_1$. If we also guess $\beta_0$, we can predict the test score when STR has a particular value.

A Picture is Worth 1000 Words

From now on we use 'b' or '^' to signify our guesses, or 'estimates', of the slope or intercept, and $\hat{u}$ for guesses of u. We never see the true line or the u's. [Figure: fitted line $b_0 + b_1 x$ with residuals $\hat{u}_1$, $\hat{u}_2$.]

Our Estimators are Really Random
Least squares estimators have a distribution: they are different every time you take a different sample (like an average of 5 heights, or 7 exam marks). The estimators are random variables. A random variable generates numbers with a central measure called a mean and a volatility called the standard error. The least squares estimators $b_0$ and $b_1$ have means $\beta_0$ and $\beta_1$.
Hypothesis testing: e.g., how to test whether the slope $\beta_1$ is zero, or -37?
Confidence intervals: e.g., what is a reasonable range of guesses for the slope $\beta_1$?

Outline
1. OLS Assumptions (When will OLS be 'good'?)
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals

Estimator Distributions Depend on Least Squares Assumptions
A key part of the model is the assumptions made about the residuals $u_t$ for t = 1, 2, …, n:
1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$ (note: not $\sigma_t^2$ – invariant over t)
3. $E(u_t u_s) = 0$, t ≠ s
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal

SW has different assumptions; use mine for any discussions. SW assume:
- The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0 (a combination of 1. and 4.)
- $(X_i, Y_i)$, i = 1, …, n, are i.i.d. (unnecessary in many applications)
- Large outliers in X and/or Y are rare (technical assumption)

How reasonable are these assumptions? To answer, we need to understand them. It's all about u: u is everything left out of the model.

1. $E(u_t) = 0$ is not a big deal
Providing the model has a constant, this is not a restrictive assumption. If 'all the other influences' don't have a zero mean, the estimated constant will just adjust to the point where u does have a zero mean. Really, $\beta_0 + u$ could be thought of as everything else that affects y apart from x.

2. $E(u_t^2) = \sigma^2 = SER^2$ is Controversial
If this assumption holds, the errors are said to be homoskedastic. If it is violated, the errors are said to be heteroskedastic (hetero for short). There are many conceivable forms of hetero, but perhaps the most common is when the variance depends upon the value of x.

Hetero related to X is very common. [Figures: homoskedastic vs heteroskedastic scatterplots.] Our data looks OK, but don't be complacent.

3. $E(u_t u_s) = 0$, t ≠ s
A violation of this is called autocorrelation. If the underlying model generates data for a time series, it is highly likely that 'left out' variables will be autocorrelated (i.e., z depends on lagged z – most time series are like this), and so u will be too. But if the model describes a cross-section, assumption 3 is likely to hold.

Aside: Hetero and Auto are not a Disaster
Hetero plagues cross-sectional data; auto plagues time series. Remarkably, heteroskedasticity and autocorrelation do not bias the least squares estimators. This is a very strange result!

Hetero Doesn't Bias
[Figure: draw a least squares line through heteroskedastic points.]

If you could see the true line, you'd realize hetero is bad for OLS. [Figures: (a) homoskedasticity, (b) heteroskedasticity.]

But OLS is Still Unbiased!
In case (b), a large error can pull up the (negative) slope of the least squares line. OLS is still unbiased because the next draw is just as likely to find that error above the true line; on average, the true line would be revealed with many samples. But OLS is no longer 'best', which means minimum variance – we will make an adjustment to our analysis later.

Conquer Hetero and Auto with Just One Click
SW recommend you correct standard errors for hetero and auto. Because OLS is unbiased, the correction only occurs for the standard errors. In EViews you do this by: Estimate/Options/Heteroskedasticity consistent coefficient covariance; leave White ticked if only worried about hetero; tick Newey-West to correct for both. Sometimes we will use standard errors corrected in this way.

4. $Cov(X_t, u_t) = 0$
This will be discussed extensively next lecture. When there is only one variable in a regression, it is highly likely that that variable will be correlated with a variable that is left out of the model, which is implicitly in the error term. Before proceeding with assumption 5, it is worth stating that 1. – 4. are all that are required to prove the so-called Gauss-Markov Theorem: that OLS is the Best Linear Unbiased Estimator (SW Section 5.5).

5. $u_t \sim$ Normal
With this assumption, OLS is the minimum-volatility estimator among all consistent estimators. Many variables are Normal: http://rba.gov.au/Statistics/AlphaListing/index.html
The assumption 'delivers' a known distribution of the OLS estimators (a 't' distribution) if n is small. But if n is large (>30) the OLS estimators become Normal anyway, so it is unnecessary. This is due to the Central Limit Theorem:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
http://www.rand.org/statistics/applets/clt.html

Assessment of Assumptions
1. $E(u_t) = 0$ – harmless if the model has a constant
2. $E(u_t^2) = \sigma^2 = SER^2$ – not too serious, since OLS is still unbiased
3. $E(u_t u_s) = 0$, t ≠ s – not too serious, since OLS is still unbiased
4. $Cov(X_t, u_t) = 0$ – serious; see next lecture
5. $u_t \sim$ Normal – a nice property to have, but if the sample size is big it doesn't matter
We assume 2. and 3. hold, or just adjust the standard errors. We'll also assume n is large and we'll always keep a constant, so 5. and 1. are not relevant. This lecture, we assume 4. holds.

Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals

With OLS Assumptions the CLT Gives Us the Distribution of $\hat{\beta}_1$:
$\hat{\beta}_1 \sim N(\beta_1, SE(\hat{\beta}_1)^2)$

[Figure: t-distribution (small n) vs the Normal density.]


EViews output gives us SE($\hat{\beta}_1$)

Dependent Variable: TESTSCR
Method: Least Squares
Date: 06/04/08  Time: 22:13
Sample: 1 420; Included observations: 420

          Coefficient   Std. Error   t-Statistic   Prob.
C         698.9330      9.467491     73.82451      0.0000
STR       -2.279808     0.479826     -4.751327     0.0000

R-squared            0.051240    Mean dependent var     654.1565
Adjusted R-squared   0.048970    S.D. dependent var     19.05335
S.E. of regression   18.58097    Akaike info criterion  8.686903
Sum squared resid    144315.5    Schwarz criterion      8.706143
Log likelihood       -1822.250   Hannan-Quinn criter.   8.694507
F-statistic          22.57511    Durbin-Watson stat     0.129062
Prob(F-statistic)    0.000003

Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals

EViews Output Can be Summarized in Two Lines
Put standard errors in parentheses below the estimated coefficients to which they apply:
TestScore = 698.9 - 2.28·STR,  R² = .05, SER = 18.6
            (10.4)  (0.52)
This expression gives a lot of information:
- the estimated regression line is TestScore = 698.9 - 2.28·STR
- the standard error of $\hat{\beta}_0$ is 10.4
- the standard error of $\hat{\beta}_1$ is 0.52
- the R² is .05; the standard error of the regression is 18.6
For hypothesis testing, we only need two of these numbers: the estimated slope and its standard error.

Remember Hypothesis Testing?
1. H0 = null hypothesis = 'status quo' belief = what you believe without good reason to doubt it
2. H1 = alternative hypothesis = what you believe if you reject H0
3. Collect evidence and compute a calculated test statistic
4. Decide on a significance level = test size = Prob(type I error) = α
5. The test size defines a rejection region and a critical value (the changeover point)
6. Reject H0 if the test statistic lies in the rejection region

Hypothesis Testing and the Standard Error of $\hat{\beta}_1$ (Section 5.1)
The objective is to test a hypothesis, like $\beta_1 = 0$, using data – to reach a tentative conclusion whether the (null) hypothesis is correct or incorrect.
General setup:
Null hypothesis and two-sided alternative: H0: $\beta_1 = \beta_{1,0}$ vs. H1: $\beta_1 \ne \beta_{1,0}$, where $\beta_{1,0}$ is the hypothesized value under the null.
Null hypothesis and one-sided alternative: H0: $\beta_1 = \beta_{1,0}$ vs. H1: $\beta_1 < \beta_{1,0}$

General approach: construct the t-statistic, and compute the p-value (or compare the t-statistic to the N(0,1) critical value).
In general: t = (estimator - hypothesized value) / (standard error of the estimator), where the SE of the estimator is the square root of an estimator of the variance of the estimator.
For testing $\beta_1$: t = $\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$, where SE($\hat{\beta}_1$) is the square root of an estimator of the variance of the sampling distribution of $\hat{\beta}_1$.
Comparing the distance between the estimate and your hypothesized value is obvious; doing it in units of volatility is less so.

Summary: To test H0: $\beta_1 = \beta_{1,0}$ vs. H1: $\beta_1 \ne \beta_{1,0}$, construct the t-statistic t = $\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$ and reject at the 5% significance level if |t| > 1.96. This procedure relies on the large-n approximation; typically n = 30 is large enough for the approximation to be excellent.
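The two-sided test is a one-liner in Python (the 1.96 critical value assumes large n and a 5% test size):

```python
def t_test(b1_hat, b1_null, se, crit=1.96):
    # t = (estimate - hypothesized value) / standard error;
    # reject H0 at the 5% level (two-sided, large n) if |t| > 1.96
    t = (b1_hat - b1_null) / se
    return t, abs(t) > crit
```

With the chapter's numbers, `t_test(-2.28, 0, 0.52)` gives a t of about -4.38 and a rejection.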

p-values are another method (see textbook pp. 72-81, and look up p-value in the index)
1. What values of the test statistic would make you more determined to reject the null than you are now?
2. If the null is true, what is the probability of obtaining those values? This is the p-value.
"The p-value, also called the significance probability, [not in QBA] is the probability of drawing a statistic at least as adverse to the null hypothesis as the one you actually computed in your sample, assuming the null hypothesis is correct" (p. 73).
For a two-sided test, the p-value is p = Pr[|t| > |t^act|] = the probability in the tails of the normal outside |t^act|. You reject at the 5% significance level if the p-value is < 5% (or < 1% or < 10%, depending on the test size): REJECT H0 IF p-VALUE < α.
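Under the large-n normal approximation, the two-sided p-value needs only the standard normal CDF; a sketch using just Python's math module:

```python
import math

def two_sided_p(t):
    # p = Pr[|Z| > |t|] for Z ~ N(0,1); Phi(|t|) is computed via erf
    phi = 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)
```

`two_sided_p(1.96)` is about 0.05, and `two_sided_p(-4.38)` is on the order of 10⁻⁵, matching the test-score example.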

Example: Test Scores and STR, California data
Estimated regression line: TestScore = 698.9 - 2.28·STR
Regression software reports the standard errors (corrected for heteroskedasticity): SE($\hat{\beta}_0$) = 10.4, SE($\hat{\beta}_1$) = 0.52
t-statistic testing $\beta_{1,0}$ = 0: t = $\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$ = (-2.28 - 0)/0.52 = -4.38
The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level. Alternatively, we can compute the p-value…

The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 (10⁻⁵).

Hypothesis Testing Can be Tricky

Dependent Variable: TESTSCR
Method: Least Squares
Date: 06/05/08  Time: 22:35
Sample: 1 420; Included observations: 420

           Coefficient   Std. Error   t-Statistic   Prob.
C          655.1223      1.126888     581.3553      0.0000
COMPUTER   -0.003183     0.002106     -1.511647     0.1314

'Prob' only equals the p-value for a two-sided test.

Try These Hypotheses
(a) H0: β1 = 0, H1: β1 > 0, with α = .05, using the critical-values approach
(b) H0: β1 = 0, H1: β1 < 0, with α = .05, using the critical-values approach
(c) H0: β1 = 0, H1: β1 ≠ 0, with α = .05, using the critical-values approach
(d) H0: β1 = 0, H1: β1 > 0, with α = .05, using the p-value approach (this is very hard to do with p-values!)
(e) H0: β1 = 0, H1: β1 < 0, with α = .05, using the p-value approach
(f) H0: β1 = 0, H1: β1 ≠ 0, with α = .05, using the p-value approach
(g) H0: β1 = -.05, H1: β1 < -.05, with α = .10

Outline
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals


Confidence Intervals
$\hat{\beta}_1 \sim N(\beta_1, SE(\hat{\beta}_1)^2)$, so $\frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim N(0, 1)$

95% Confidence Intervals Catch the True Parameter 95% of the Time
Prob(-1.96 ≤ ($\hat{\beta} - \beta$)/SE($\hat{\beta}$) ≤ 1.96) = .95
Prob(-1.96·SE($\hat{\beta}$) ≤ $\hat{\beta} - \beta$ ≤ 1.96·SE($\hat{\beta}$)) = .95
Prob($\hat{\beta}$ - 1.96·SE($\hat{\beta}$) ≤ $\beta$ ≤ $\hat{\beta}$ + 1.96·SE($\hat{\beta}$)) = .95
So, with probability .95, $\beta$ will be captured by the random interval $\hat{\beta}$ ± 1.96·SE($\hat{\beta}$).
http://bcs.whfreeman.com/bps4e/content/cat_010/applets/confidenceinterval.html

Confidence Intervals are Reasonable Ranges
If we cannot reject H0: $\beta = \beta_0$ in favour of H1: $\beta \ne \beta_0$ at, say, 5%, it implies -1.96 ≤ ($\hat{\beta} - \beta_0$)/SE($\hat{\beta}$) ≤ 1.96, i.e. $\hat{\beta}$ - 1.96·SE($\hat{\beta}$) ≤ $\beta_0$ ≤ $\hat{\beta}$ + 1.96·SE($\hat{\beta}$). But this just says $\beta_0$ must lie in a 95% CI. Going the other way, we can define a 1 - α confidence interval as the range of values that could not be rejected as nulls in a two-sided test of significance with test size α.

Confidence interval example: Test Scores and STR
Estimated regression line: TestScore = 698.9 - 2.28·STR, with SE($\hat{\beta}_0$) = 10.4, SE($\hat{\beta}_1$) = 0.52
95% confidence interval for $\beta_1$: {$\hat{\beta}_1$ ± 1.96·SE($\hat{\beta}_1$)} = {-2.28 ± 1.96 × 0.52} = (-3.30, -1.26)
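The interval calculation in Python (1.96 again assumes a large sample and 95% coverage):

```python
def ci95(b_hat, se):
    # 95% CI: b_hat +/- 1.96 * SE(b_hat)
    return b_hat - 1.96 * se, b_hat + 1.96 * se

# The chapter's slope estimate: about (-3.30, -1.26)
lo, hi = ci95(-2.28, 0.52)
```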

If You Make 1→1 Associations, Use Simple Regression, not Correlation
1. OLS Assumptions
2. OLS Sampling Distribution
3. Hypothesis Testing
4. Confidence Intervals
But be careful of the simple regression assumption $Cov(X_t, u_t) = 0$.

# Chapter 6

Introduction to Multiple Regression

Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator

It's all about u (SW Section 6.1)
The error u arises because of factors that influence Y but are not included in the regression function; there are always omitted variables. Sometimes, the omission of those variables can lead to bias in the OLS estimator. This occurs because assumption 4, $Cov(X_t, u_t) = 0$, is violated.


Omitted variable bias = OVB
The bias in the OLS estimator that occurs as a result of an omitted factor is called omitted variable bias. Let y = $\beta_0 + \beta_1 x + u$ and let u = f(Z). Omitted variable bias is a problem if the omitted factor Z is:
1. a determinant of Y (i.e., Z is part of u); and
2. correlated with the regressor X (i.e., corr(Z, X) ≠ 0).
Both conditions must hold for the omission of Z to result in omitted variable bias.

What Causes Long Life?
Gapminder (http://www.gapminder.org/world) is an online applet that contains demographic information about each country in the world. Suppose that we are interested in predicting life expectancy, and think that both income per capita and the number of physicians per 1000 people would make good indicators. Our first step would be to graph these predictors against life expectancy. We find that both are positively correlated with life expectancy.

…Doctors or Income or Both?
Simple linear regression only allows us to use one of these predictors to estimate life expectancy. But income per capita is correlated with the number of physicians per 1000 people. Suppose the truth is:
Life = $\beta_0 + \beta_1$·Income + $\beta_2$·Doctors + u
but you run
Life = $\beta_0 + \beta_1$·Income + u*   (u* = $\beta_2$·Doctors + u)

OVB = 'Double Counting'
$\beta_1$ is the impact of Income on Life, holding everything else constant, including the residual. But if correlation exists between Doctors (in the residual) and Income ($r_{IncDoct} \ne 0$), and if the true impact of Doctors is non-zero ($\beta_2 \ne 0$), then the estimate of $\beta_1$ counts both effects – it 'double counts'.
Life = $\beta_0 + \beta_1$·Income + u*   (u* = $\beta_2$·Doctors + u)

Our Test Score Regression Has OVB
In the test score example:
1. English language deficiency (whether the student is learning English) plausibly affects standardized test scores: Z is a determinant of Y.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets – and higher STR: Z is correlated with X.
Accordingly, $\hat{\beta}_1$ is biased.

What is the bias? We have a formula:
Bias: $\hat{\beta}_1 \to \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$ – the volatilities of the error and of the included variable matter.
STR is larger for those classes with a higher PctEL (both being features of poorer areas), so the correlation between STR and PctEL will be positive. PctEL appears in u with a negative sign in front of it – higher PctEL leads to lower scores. Therefore the correlation between STR and u must be negative ($\rho_{Xu}$ < 0). (Standard deviations are always positive.) So the coefficient on the student-teacher ratio is negatively biased by the exclusion of the percentage of English learners: it is 'too big' in absolute value.
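The sign and size of the bias can be checked by simulation. This sketch uses my own illustrative data-generating process (not the California data): an omitted regressor z that is positively correlated with x and has a positive coefficient, so the short-regression slope converges to $\beta_1 + \beta_2 \cdot cov(x,z)/var(x)$, not $\beta_1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical truth: y = 5 + 1*x + 2*z + e, with z correlated with x
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)          # cov(x, z) = 0.8 * var(x)
y = 5.0 + 1.0 * x + 2.0 * z + rng.normal(size=n)

# 'Short' regression of y on x alone: slope = S_xy / S_x^2
b1_short = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Probability limit: 1 + 2 * 0.8 = 2.6, not the true 1 -- 'double counting'
```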

Including PctEL Solves the Problem
Some ways to overcome omitted variable bias:
1. Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then PctEL is still a determinant of TestScore, but PctEL is uncorrelated with STR. (But this is unrealistic in practice.)
2. Adopt the "cross tabulation" approach, with finer gradations of STR and PctEL – within each group, all classes have the same PctEL, so we control for PctEL. (But soon we will run out of data, and what about other determinants like family income and parental education?)
3. Use a regression in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.

Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator

The Population Multiple Regression Model (SW Section 6.2)
Consider the case of two regressors:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, i = 1, …, n
Y is the dependent variable; X1, X2 are the two independent variables (regressors); $(Y_i, X_{1i}, X_{2i})$ denote the i-th observation on Y, X1, and X2.
$\beta_0$ = unknown population intercept
$\beta_1$ = effect on Y of a change in X1, holding X2 constant
$\beta_2$ = effect on Y of a change in X2, holding X1 constant
$u_i$ = the regression error (omitted factors)

Partial Derivatives in Multiple Regression = Ceteris Paribus in Economics
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, i = 1, …, n
We can use calculus to interpret the coefficients:
$\beta_1 = \frac{\partial Y}{\partial X_1}$, holding X2 constant (ceteris paribus)
$\beta_2 = \frac{\partial Y}{\partial X_2}$, holding X1 constant (ceteris paribus)
$\beta_0$ = predicted value of Y when X1 = X2 = 0

The OLS Estimator in Multiple Regression (SW Section 6.3)
With two regressors, the OLS estimator solves:
$\min_{b_0, b_1, b_2} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})]^2$
The OLS estimator minimizes the average squared difference between the actual values of $Y_i$ and the prediction (predicted value) based on the estimated line. This minimization problem is solved using calculus, and yields the OLS estimators of $\beta_0$, $\beta_1$ and $\beta_2$.
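The same minimisation, with any number of regressors, is exactly what NumPy's least-squares solver computes; a sketch (the wrapper is mine):

```python
import numpy as np

def ols_multiple(X, y):
    # Minimise sum((y_i - b0 - b1*x1i - ... - bk*xki)^2) over b.
    # X holds the regressors column by column, WITHOUT a constant;
    # a column of ones is prepended here for the intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b  # array [b0, b1, ..., bk]
```

On data generated exactly as y = 1 + 2·x1 - x2, it recovers the coefficients (1, 2, -1).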

Multiple regression in EViews

Dependent Variable: TESTSCR
Method: Least Squares
Sample: 1 420; Included observations: 420
White Heteroskedasticity-Consistent Standard Errors & Covariance
TESTSCR = C(1) + C(2)*STR + C(3)*EL_PCT

          Coefficient   Std. Error   t-Statistic   Prob.
C(1)      686.0322      8.728224     78.59930      0.0000
C(2)      -1.101296     0.432847     -2.544307     0.0113
C(3)      -0.649777     0.031032     -20.93909     0.0000

R-squared            0.426431    Mean dependent var    654.1565
Adjusted R-squared   0.423680    S.D. dependent var    19.05335
S.E. of regression   14.46448    Akaike info criterion 8.188387
Sum squared resid    87245.29    Schwarz criterion     8.217246
Log likelihood       -1716.561   Durbin-Watson stat    0.685575

TestScore = 686.0 - 1.10·STR - 0.65·PctEL
More on this printout later…

Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator

Measures of Fit for Multiple Regression (SW Section 6.4)
R² now becomes the square of the correlation coefficient between y and predicted y. It is still the proportional reduction in the residual sum of squares as we move from modeling y with just a sample mean to modeling it with a group of variables.

R² and $\bar{R}^2$
The R² is the fraction of the variance explained – the same definition as in regression with a single regressor:
$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$,
where $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{\hat{Y}})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$.
The R² always increases when you add another regressor (why?) – a bit of a problem for a measure of "fit".

The $\bar{R}^2$ (the "adjusted R²") corrects this problem by "penalizing" you for including another regressor – the $\bar{R}^2$ does not necessarily increase when you add another regressor.
Adjusted R²: $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\frac{SSR}{TSS} = 1 - \frac{n-1}{n-k-1}(1 - R^2)$
Note that $\bar{R}^2 < R^2$; however, if n is large the two will be very close.
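The adjustment is a one-liner; a Python sketch:

```python
def adjusted_r2(r2, n, k):
    # R-bar^2 = 1 - ((n - 1) / (n - k - 1)) * (1 - R^2),
    # where k is the number of regressors (excluding the constant)
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)
```

With the test-score printout's numbers (R² = 0.426431, n = 420, k = 2) this gives 0.4237, close to R² because n is large relative to k.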

Measures of fit, ctd. Test score example:
(1) TestScore = 698.9 - 2.28·STR,  R² = .05, SER = 18.6
(2) TestScore = 686.0 - 1.10·STR - 0.65·PctEL,  R² = .426, $\bar{R}^2$ = .424, SER = 14.5
What – precisely – does this tell you about the fit of regression (2) compared with regression (1)? Why are the R² and the $\bar{R}^2$ so close in (2)?

Outline
1. Omitted variable bias
2. Multiple regression and OLS
3. Measures of fit
4. Sampling distribution of the OLS estimator

Sampling Distribution Depends on Least Squares Assumptions (SW Section 6.5)
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i$
1. $E(u_t) = 0$
2. $E(u_t^2) = \sigma^2 = SER^2$ (note: not $\sigma_t^2$ – invariant over t)
3. $E(u_t u_s) = 0$, t ≠ s
4. $Cov(X_t, u_t) = 0$
5. $u_t \sim$ Normal
plus: there is no perfect multicollinearity

Assumption #4: There is no perfect multicollinearity

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.

Example: Suppose you accidentally include STR twice:

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. In such a regression (where STR is included twice), B1 is the effect on TestScore of a unit change in STR, holding STR constant (???)

The standard errors become infinite when perfect multicollinearity exists

OLS Wonder Equation

SE(bi) = (Sû / Sxi) × √[ 1 / ( n (1 – R²xi on X) ) ]

• Multicollinearity increases R²xi on X and therefore increases the variance of bi
• Perfect multicollinearity (R² = 1) makes regression impossible
• Do not expect a low standard error simply because you add more variables to a regression: R²xi on X always rises with extra variables, and the more you add, the higher the R-squared in the denominator becomes
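A small numeric sketch of the wonder equation (all sample quantities below are hypothetical): as R²xi on X climbs toward 1, SE(bi) blows up.

```python
import math

def wonder_se(s_uhat, s_xi, n, r2_xi_on_x):
    # SE(b_i) = (S_uhat / S_xi) * sqrt(1 / (n * (1 - R2 of xi on the other X's)))
    return (s_uhat / s_xi) * math.sqrt(1.0 / (n * (1.0 - r2_xi_on_x)))

n, s_uhat, s_xi = 420, 14.5, 1.9  # hypothetical sample quantities
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, wonder_se(s_uhat, s_xi, n, r2))
```

Each step up in R² raises the standard error; at R² = 1 the formula divides by zero, which is the perfect-multicollinearity case.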

Quality of Slope Estimate (R² and Sû fixed)

[three scatterplots of xi:]
• High SE(bi): xi with variance S²xi, n = 6
• Low SE(bi): xi with the same variance S²xi, n = 20
• Low SE(bi): xi with a larger variance S²xi, n = 6

The Sampling Distribution of the OLS Estimator (SW Section 6.6)
Under the Least Squares Assumptions, the exact (finite sample) distribution of β̂1 has mean β1, and var(β̂1) is inversely proportional to n; so too for β̂2. Other than its mean and variance, the exact (finite-n) distribution of β̂1 is very complicated; but for large n…

β̂1 is consistent: β̂1 →p β1 (law of large numbers)

(β̂1 – β1) / SE(β̂1) is approximately distributed N(0,1) (CLT)

So too for β̂2, …, β̂k. Conceptually, there is nothing new here!

Multicollinearity, Perfect and Imperfect (SW Section 6.7)
Some more examples of perfect multicollinearity

The example from earlier: you include STR twice.

Second example: regress TestScore on a constant, D, and Bel, where Di = 1 if STR ≤ 20, = 0 otherwise; Beli = 1 if STR > 20, = 0 otherwise. Then Beli = 1 – Di, and there is perfect multicollinearity because Bel + D = 1 (the 1 „variable‟ that multiplies the constant). To fix this, drop the constant.


Perfect multicollinearity, ctd.
Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data. If you have perfect multicollinearity, your statistical software will let you know – either by crashing, giving an error message, or by “dropping” one of the variables arbitrarily. The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.


Imperfect multicollinearity
Imperfect and perfect multicollinearity are quite different despite the similarity of the names. Imperfect multicollinearity occurs when two or more regressors are very highly correlated. Why this term? If two regressors are very highly correlated, then their scatterplot will pretty much look like a straight line – they are collinear – but unless the correlation is exactly 1, that collinearity is imperfect.


Imperfect multicollinearity, ctd.

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated. Intuition: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly correlated, there is very little variation in X1 once X2 is held constant – so the data are pretty much uninformative about what happens when X1 changes but X2 doesn‟t, and the variance of the OLS estimator of the coefficient on X1 will be large. Imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients, as described by the OLS wonder equation.

Next topic: hypothesis tests and confidence intervals…

Portion of X that “explains” Y: High R²

[Venn diagram: circles for Y and X; for any two circles, the overlap tells the size of the R²]

Portion of X that “explains” Y: Low R²

[Venn diagram: circles for Y and X; for any two circles, the overlap tells the size of the R²]

Adding Another X Increases R²

[Venn diagram: circles for Y, X1 and X2; now the R² is the overlap of both X1 and X2 with Y]

Imperfect (but high) multicollinearity

[Venn diagram: Y, X1 and X2, with X1 and X2 overlapping heavily] Since X2 and X1 share a lot of the same information, adding X2 allows us to work out independent effects better, but we realize we don‟t have much information (area) to do this with. Larger n makes all circles bigger and, as before, the overlap tells the size of R².

SE(b1) = (Sû / Sx1) × √[ 1 / ( n (1 – R²x1 on x2) ) ]

Chapter 7: Multiple Regression: Multiple Coefficient Testing 151 .

Multiple Coefficients Tests?

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

We know how to obtain estimates of the coefficients, and each one is a ceteris paribus („all other things equal‟) effect. Why would we want to do hypothesis tests about groups of coefficients?

Multiple Coefficients Tests

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

Example 1: Consider the statement that „this whole model is worthless‟. One way of making that statement mathematically formal is to say β1 = β2 = … = βk = 0, because if this is true then none of the variables x1, x2, …, xk helps explain y.

Multiple Coefficients Tests

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

Example 2: Suppose y is the share of the population that votes for the ruling party and x1 and x2 are the spending on TV and radio advertising. The Prime Minister might want to know if TV is more effective than radio, as measured by the impact on the share of the popular vote of an extra dollar spent on each. The way to write this mathematically is β1 > β2.

Multiple Coefficients Tests

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

Example 3: Suppose y is the growth in GDP, x1 and x2 are the cash rate one and two quarters ago, and all the other X‟s are different macroeconomic variables. Suppose we are interested in testing the effectiveness of monetary policy. One way of doing this is asking if the cash rate at any lag has an impact on GDP growth. Mathematically, this is β1 = β2 = 0.

Multiple Coefficients Tests

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

In each case, we are interested in making statements about groups of coefficients. What about just looking at the estimates? Same problem as in t-testing: you ought to care about reliability. What about sequential testing? Errors compound, even if possible (SW Sect. 7.2).

Multiple Coefficients Tests

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

The so-called F-test can do all of these restrictions, except for example 2, which can be done with a t-test. Before turning to the F-test, let‟s do example 2.

Example 2 Solution

If yt = β0 + β1x1t + β2x2t + … + βkxkt + et, then define γ = β1 – β2, so that β1 = γ + β2. Sub this in:

yt = β0 + (γ + β2)x1t + β2x2t + … + et
   = β0 + γx1t + β2(x1t + x2t) + … + et

So, to test β1 > β2, just run a new regression including x1 + x2 instead of x2 (everything else is left the same) and do a t-test for H0: γ = 0 vs. H1: γ > 0. Naturally, if you accept H1: γ > 0, this implies β1 – β2 > 0, which implies β1 > β2.

This technique is called reparameterization.
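The algebra can be checked numerically: for any made-up coefficient values, the original and reparameterized regression functions agree at every (x1, x2).

```python
# gamma = beta1 - beta2; the two parameterizations are the same function.
b0, b1, b2 = 1.0, 2.5, 0.7   # made-up coefficients
gamma = b1 - b2

for x1, x2 in [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.5)]:
    original = b0 + b1 * x1 + b2 * x2
    reparam = b0 + gamma * x1 + b2 * (x1 + x2)
    assert abs(original - reparam) < 1e-12
print("same function")
```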

Restricted Regressions

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

One more thing before we do the F-test: we must define a „restricted regression‟. This is just the model you get when a hypothesis is assumed true.

Restricted Regression: Example 1

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

Example 1: Consider the statement that „this whole model is worthless‟. If β1 = β2 = … = βk = 0 then the model is yt = β0 + et, and the restricted regression would be an OLS regression of y on a constant. The estimate for the constant will just be the sample mean of y.

Restricted Regression: Example 3

yt = β0 + β1x1t + β2x2t + … + βkxkt + et

If β1 = β2 = 0 then the model is yt = β0 + β3x3t + … + βkxkt + et, and the restricted regression is an OLS regression of y on a constant and x3 to xk.

Properties of Restricted Regressions

Imposing a restriction always increases the residual sum of squares, since you are forcing the estimates to take the values implied by the restriction rather than letting OLS choose the values of the estimates to minimize the SSR. If the SSR increases a lot, the model fits a lot worse with the restriction imposed; that is, it implies that the restriction is relatively „unbelievable‟. This last point is the basic intuition of the F-test – impose the restriction and see if the SSR goes up „too much‟.

http://hadm.sph.sc.edu/Courses/J716/demos/LeastSquares/LeastSquaresDemo.html

The F-test

To test a restriction we need to run the restricted regression as well as the unrestricted regression (i.e. the original regression). Intuitively, we want to know if the change in SSR is big enough to suggest the restriction is wrong. Let q be the number of restrictions:

F = [ (SSRr – SSRur) / q ] / [ SSRur / (n – k – 1) ]

where r is restricted and ur is unrestricted.
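As a sketch of the arithmetic (the SSR values, q, n and k below are all hypothetical):

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    # F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# hypothetical: 2 restrictions, 420 observations, 4 regressors
print(f_stat(ssr_r=92000.0, ssr_ur=87245.0, q=2, n=420, k=4))
```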

The F statistic

The F statistic is always positive, since the SSR from the restricted model can‟t be less than the SSR from the unrestricted. Essentially the F statistic is measuring the relative increase in SSR when moving from the unrestricted to the restricted model. q = number of restrictions.

The F statistic (cont)

To decide if the increase in SSR when we move to a restricted model is “big enough” to reject the restrictions, we need to know about the sampling distribution of our F stat. Not surprisingly, F ~ F(q, n–k–1), where q is referred to as the numerator degrees of freedom and n – k – 1 as the denominator degrees of freedom.

The F statistic

[figure: density f(F) with critical value c; the area to the right of c is the rejection region]

Reject H0 at significance level α if F > c; otherwise fail to reject.

Equivalently, using p-values

[figure: density f(F) with critical value c]

Reject H0 if p-value < α.

The R² form of the F statistic

Because the SSR‟s may be large and unwieldy, an alternative form of the formula is useful. We use the fact that SSR = TSS(1 – R²) for any regression, so we can substitute in for SSRr and SSRur:

F = [ (R²ur – R²r) / q ] / [ (1 – R²ur) / (n – k – 1) ]

where again r is restricted and ur is unrestricted.

Overall Significance (example 1)

A special case of exclusion restrictions is to test H0: β1 = β2 = … = βk = 0. The restricted model has only an intercept, so the OLS estimator is just the sample mean, implying R²r = 0 (TSS = SSR). The F statistic is then

F = [ R² / k ] / [ (1 – R²) / (n – k – 1) ]

Dependent Variable: TESTSCR
Method: Least Squares
Date: 06/05/08  Time: 15:29
Sample: 1 420   Included observations: 420
Regressors: C, MEAL_PCT, AVGINC, STR, EL_PCT

R-squared 0.805298            Mean dependent var 654.1565
Adjusted R-squared 0.803421   S.D. dependent var 19.05335
Prob(F-statistic) 0.000000

Checking the reported F statistic by hand, with R² = .8053, k = 4 and n = 420:

F = [.8053/4] / [{1 – .8053}/(420 – 5)] = 429
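The same hand check can be done in Python:

```python
# Overall-significance F statistic from the printout's R-squared:
# F = (R2/k) / ((1 - R2)/(n - k - 1)), with R2 = .8053, k = 4, n = 420.
R2, k, n = 0.8053, 4, 420
F = (R2 / k) / ((1 - R2) / (n - k - 1))
print(round(F))  # → 429, matching the printout
```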

General Linear Restrictions

The basic form of the F statistic will work for any set of linear restrictions. First estimate the unrestricted model and then estimate the restricted model; in each case, make note of the SSR. Imposing the restrictions can be tricky – you will likely have to redefine variables again.

F Statistic Summary

Just as with t statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution. If only one exclusion is being tested, then F = t², and the p-values will be the same. F-tests are done mechanically – you don‟t have to do the restricted regressions (though you have to understand how to do them for this course).

F-tests are Easy in EViews

To test hypotheses like these in EViews, use the Wald test. After you run your regression, type „View, Coefficient tests, Wald‟. Try testing a single restriction (which you can use a t-test for) and see that t² = F, and that the p-values are the same. Try testing that all the coefficients except the intercept are zero, and compare it with the F-test automatically calculated in EViews. SW discusses the shortcomings of F-tests at length: they crucially depend upon the assumption of homoskedasticity.

Start Big and Go Small

General-to-Specific modeling relies upon the fact that omitted variable bias is a serious problem.

• Start with a very big model to avoid OVB
• Do t-tests on individual coefficients; delete the most insignificant variable, run the model again. Delete the most insignificant, run the model again, and so on… until every individual coefficient is significant
• Do an F-test on the original model, excluding all the coefficients required to get to your final model at once. If the null is accepted, you have verified the model
• Finally, test for heteroskedasticity, and correct for it if need be

Chapter 8 Nonlinear Regression Functions 175 .

„Linear‟ Regression = Linear in Parameters, Not Nec. Variables
1. Nonlinear regression functions – general comments
2. Polynomials
3. Logs
4. Nonlinear functions of two variables: interactions

„Linear‟ Regression = Linear in Parameters, Not Nec. Variables
1. Nonlinear regression functions – general comments
2. Polynomials
3. Logs
4. Nonlinear functions of two variables: interactions

Nonlinear Regression Population Regression Functions – General Ideas (SW Section 8.1)

If a relation between Y and X is nonlinear:
• The effect on Y of a change in X depends on the value of X – that is, the marginal effect of X is not constant
• A linear regression is mis-specified – the functional form is wrong
• The estimator of the effect on Y of X is biased – it needn‟t even be right on average

The solution to this is to estimate a regression function that is nonlinear in X.

Nonlinear Functions of a Single Independent Variable (SW Section 8.2)

We‟ll look at two complementary approaches:
1. Polynomials in X – the population regression function is approximated by a quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations – Y and/or X is transformed by taking its logarithm; this gives a “percentages” interpretation that makes sense in many applications

„Linear‟ Regression = Linear in Parameters, Not Nec. Variables
1. Nonlinear regression functions – general comments
2. Polynomials
3. Logs
4. Nonlinear functions of two variables: interactions

Polynomials in X

Approximate the population regression function by a polynomial:

Yi = β0 + β1Xi + β2Xi² + … + βrXi^r + ui

This is just the linear multiple regression model – except that the regressors are powers of X! Estimation, hypothesis testing, etc. proceeds as in the multiple regression model using OLS. The coefficients are difficult to interpret, but the regression function itself is interpretable.

Example: the TestScore – Income relation

Incomei = average district income in the ith district (thousands of dollars per capita)

Quadratic specification:  TestScorei = β0 + β1Incomei + β2(Incomei)² + ui
Cubic specification:      TestScorei = β0 + β1Incomei + β2(Incomei)² + β3(Incomei)³ + ui

Estimation of the quadratic specification in EViews

Dependent Variable: TESTSCR
Method: Least Squares
Sample: 1 420   Included observations: 420
White Heteroskedasticity-Consistent Standard Errors & Covariance
TESTSCR = C(1) + C(2)*AVGINC + C(3)*AVGINC*AVGINC   (create a quadratic regressor)

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     607.3017      2.901754     209.2878     0.0000
C(2)       3.850995    0.268094      14.36434    0.0000
C(3)      -0.042308    0.004780      -8.850509   0.0000

R-squared 0.556173            Mean dependent var 654.1565
Adjusted R-squared 0.554045   S.D. dependent var 19.05335
S.E. of regression 12.72381   Sum squared resid 67510.32

Test the null hypothesis of linearity against the alternative that the regression function is a quadratic…

Interpreting the estimated regression function:

(a) Plot the predicted values

TestScore = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²
           (2.9)   (0.27)         (0.0048)

Interpreting the estimated regression function, ctd:
(b) Compute “effects” for different values of X

TestScore = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²
           (2.9)   (0.27)         (0.0048)

Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:

ΔTestScore = [607.3 + 3.85×6 – 0.0423×6²] – [607.3 + 3.85×5 – 0.0423×5²] = 3.4

TestScore = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²

Predicted “effects” for different values of X:

Change in Income ($1000 per capita)    ΔTestScore
from 5 to 6                            3.4
from 25 to 26                          1.7
from 45 to 46                          0.0

The “effect” of a change in income is greater at low than at high income levels (perhaps, a declining marginal benefit of an increase in school budgets?)

Caution! What is the effect of a change from 65 to 66? Don‟t extrapolate outside the range of the data!

TestScore = 607.3 + 3.85 Incomei – 0.0423 (Incomei)²

Predicted “effects” for different values of X:

Change in Income ($1000 per capita)    ΔTestScore
from 5 to 6                            3.4
from 25 to 26                          1.7
from 45 to 46                          0.0

Alternatively, dTestScore/dIncome = 3.85 – 0.0846 Income gives the same numbers (approx).
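These effects can be checked directly from the fitted quadratic:

```python
# Fitted quadratic from the slides: 607.3 + 3.85*Income - 0.0423*Income^2
def predicted(income):
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

# exact "before and after" changes for the three income changes above
for lo, hi in [(5, 6), (25, 26), (45, 46)]:
    print(lo, hi, round(predicted(hi) - predicted(lo), 1))  # 3.4, 1.7, 0.0

# the derivative gives approximately the same number
print(round(3.85 - 0.0846 * 5.5, 1))  # evaluated at the midpoint of 5 to 6
```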

Summary: polynomial regression functions

Yi = β0 + β1Xi + β2Xi² + … + βrXi^r + ui

• Estimation: by OLS after defining new regressors
• Coefficients have complicated interpretations
• To interpret the estimated regression function: plot predicted values as a function of x; compute predicted ΔY/ΔX at different values of x
• Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of) variable(s)
• Choice of degree r: plot the data; t- and F-tests; check sensitivity of estimated effects; judgment

A Final Warning: Polynomials Can Fit Too Well

When fitting a polynomial regression function, we need to be careful not to fit too many terms, despite the fact that a higher-order polynomial will always fit better. If we do fit too many terms, then any prediction may become unrealistic. The following applet lets us explore fitting different polynomials to some data:

http://www.scottsarra.org/math/courses/na/nc/polyRegression.html

Are Polynomials Enough?

We can investigate the appropriateness of a regression function by graphing the regression function over the top of the scatterplot. For some models, we may need to transform the data – for example, take logs of the response variable. The site below allows us to explore some common regression functions:

http://www.ruf.rice.edu/%7Elane/stat_sim/transformations/index.html

„Linear‟ Regression = Linear in Parameters, Not Nec. Variables
1. Nonlinear regression functions – general comments
2. Polynomials
3. Logs
4. Nonlinear functions of two variables: interactions

3. Logarithmic functions of Y and/or X

ln(X) = the natural logarithm of X

Logarithmic transforms permit modeling relations in “percentage” terms (like elasticities), rather than linearly.

Here‟s why: d ln(x)/dx = 1/x, so Δln(x) ≈ Δx/x = proportional change in x

Numerically:
ln(1.01) – ln(1) = .00995 – 0 = .00995 (correct % Δ = .01)
ln(40) – ln(45) = 3.6889 – 3.8067 = –.1178 (correct % Δ = –.1111)
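The numerical examples above can be reproduced directly:

```python
import math

# log differences vs. true proportional changes
small = math.log(1.01) - math.log(1.0)   # ~ .00995 (true change .01)
large = math.log(40.0) - math.log(45.0)  # ~ -.1178 (true change -.1111)
print(small, large)
```

The approximation is excellent for small changes and noticeably off for large ones, which is why the “percentages” interpretation is a local one.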


differing in whether Y and/or X is transformed by taking logarithms. and plotting predicted values 194 . Hypothesis tests and confidence intervals are now implemented and interpreted “as usual. tests. Choice of specification should be guided by judgment (which interpretation makes the most sense in your application?).Summary: Logarithmic transformations Three cases. The regression is linear in the new variable(s) ln(Y) and/or ln(X).” The interpretation of 1 differs from case to case. and the coefficients can be estimated by OLS.

„Linear‟ Regression = Linear in Parameters, Not Nec. Variables
1. Nonlinear regression functions – general comments
2. Polynomials
3. Logs
4. Nonlinear functions of two variables: interactions

Regression when X is Binary (Section 5.3)

Sometimes a regressor is binary:
• X = 1 if small class size, = 0 if not
• X = 1 if female, = 0 if male
• X = 1 if treated (experimental drug), = 0 if not

Binary regressors are sometimes called “dummy” variables. So far, β1 has been called a “slope,” but that doesn‟t make sense if X is binary. How do we interpret regression with a binary regressor?

Interpreting regressions with a binary regressor

Yi = β0 + β1Xi + ui, where X is binary (Xi = 0 or 1):

When Xi = 0: Yi = β0 + ui, so the mean of Yi is β0; that is, E(Yi|Xi=0) = β0
When Xi = 1: Yi = β0 + β1 + ui, so the mean of Yi is β0 + β1; that is, E(Yi|Xi=1) = β0 + β1

so:  β1 = E(Yi|Xi=1) – E(Yi|Xi=0) = population difference in group means
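A numeric sketch of this result on made-up data: the OLS slope on a binary regressor equals the difference in group means, and the intercept equals the mean of the X = 0 group.

```python
# Toy data (made up): x is the binary regressor, y the outcome.
x = [0, 0, 0, 1, 1, 1, 1]
y = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

print(b0, b1)  # b0 = mean of the X=0 group; b1 = mean1 - mean0
```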

Interactions Between Independent Variables (SW Section 8.3)

Perhaps a class size reduction is more effective in some circumstances than in others… Perhaps smaller classes help more if there are many English learners, who need individual attention. That is, ΔTestScore/ΔSTR might depend on PctEL. More generally, ΔY/ΔX1 might depend on X2.

How to model such “interactions” between X1 and X2? We first consider binary X‟s, then continuous X‟s.

(a) Interactions between two binary variables

Yi = β0 + β1D1i + β2D2i + ui, where D1i, D2i are binary

β1 is the effect of changing D1=0 to D1=1. In this specification, this effect doesn’t depend on the value of D2. To allow the effect of changing D1 to depend on D2, include the “interaction term” D1i×D2i as a regressor:

Yi = β0 + β1D1i + β2D2i + β3(D1i×D2i) + ui

Interpreting the coefficients:

Yi = β0 + β1D1i + β2D2i + β3(D1i×D2i) + ui

It can be shown that ΔY/ΔD1 = β1 + β3D2. The effect of D1 depends on D2 (what we wanted). β3 = increment to the effect of D1 from a unit change in D2.

Example: TestScore, STR, English learners

Let HiSTR = 1 if STR ≥ 20 (= 0 if STR < 20), and HiEL = 1 if PctEL ≥ 10 (= 0 if PctEL < 10).

TestScore = 664.1 – 1.9 HiSTR – 18.2 HiEL – 3.5 (HiSTR×HiEL)
           (1.4)  (1.9)        (2.3)        (3.1)

“Effect” of HiSTR when HiEL = 0 is –1.9
“Effect” of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4

Class size reduction is estimated to have a bigger effect when the percent of English learners is large. This interaction isn‟t statistically significant: t = 3.5/3.1 = 1.1.

(b) Interactions between continuous and binary variables

Yi = β0 + β1Xi + β2Di + ui, where Di is binary and X is continuous

As specified above, the effect on Y of X (holding constant D) = β1, which does not depend on D. To allow the effect of X to depend on D, include the “interaction term” Di×Xi as a regressor:

Yi = β0 + β1Xi + β2Di + β3(Di×Xi) + ui

Binary-continuous interactions: the two regression lines

Yi = β0 + β1Xi + β2Di + β3(Di×Xi) + ui

Observations with Di = 0 (the “D = 0” group):  Yi = β0 + β1Xi + ui   – the D=0 regression line
Observations with Di = 1 (the “D = 1” group):  Yi = β0 + β1Xi + β2 + β3Xi + ui = (β0 + β2) + (β1 + β3)Xi + ui   – the D=1 regression line

Binary-continuous interactions, ctd.

[three panels of D=0 and D=1 regression lines: all βi non-zero (different intercepts and slopes); β3 = 0 (parallel lines with different intercepts); β2 = 0 (same intercept, different slopes)]

Interpreting the coefficients:

Yi = β0 + β1Xi + β2Di + β3(Xi×Di) + ui

Or, using calculus: ΔY/ΔX = β1 + β3D. The effect of X depends on D (what we wanted). β3 = increment to the effect of X from a change in the level of D from D=0 to D=1.

Example: TestScore, STR, HiEL (= 1 if PctEL ≥ 10)

TestScore = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
           (11.9)  (0.59)      (19.5)     (0.97)

When HiEL = 0: TestScore = 682.2 – 0.97 STR
When HiEL = 1: TestScore = 682.2 – 0.97 STR + 5.6 – 1.28 STR = 687.8 – 2.25 STR

Two regression lines: one for each HiEL group. Class size reduction is estimated to have a larger effect when the percent of English learners is large.
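The two lines follow mechanically from the estimated coefficients:

```python
# Coefficients of the fitted interaction model from the slide:
# TestScore = 682.2 - 0.97*STR + 5.6*HiEL - 1.28*(STR*HiEL)
b0, b_str, b_hiel, b_int = 682.2, -0.97, 5.6, -1.28

int0, slope0 = b0, b_str                   # HiEL = 0 line
int1, slope1 = b0 + b_hiel, b_str + b_int  # HiEL = 1 line

print(int0, slope0)  # 682.2 and -0.97
print(int1, slope1)  # approximately 687.8 and -2.25
```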

Example, ctd: Testing hypotheses

TestScore = 682.2 – 0.97 STR + 5.6 HiEL – 1.28 (STR×HiEL)
           (11.9)  (0.59)      (19.5)     (0.97)

The two regression lines have the same slope ⇔ the coefficient on STR×HiEL is zero: t = –1.28/0.97 = –1.32
The two regression lines have the same intercept ⇔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
The two regression lines are the same ⇔ the population coefficients on HiEL and on STR×HiEL are both zero: F = 89.94 (p-value < .001) !!

We reject the joint hypothesis but neither individual hypothesis (how can this be?)

Summary: Nonlinear Regression Functions

Using functions of the independent variables such as ln(X) or X1×X2 allows recasting a large family of nonlinear regression functions as multiple regression. Estimation and inference proceed in the same way as in the linear multiple regression model. Interpretation of the coefficients is model-specific, but the general rule is to compute effects by comparing different cases (different values of the original X‟s). Many nonlinear specifications are possible, so you must use judgment: What nonlinear effect do you want to analyze? What makes sense in your application?

Chapter 9 Misleading Statistics 209 .

Statistics Means Description and Inference

Descriptive Statistics is about describing datasets. Various visual tricks can distort these descriptions.

Inferential Statistics is about statistical inference. You know something about tricks to distort inference (eg. putting in lots of variables to raise R², or lowering to get in a variable you want).

Pitfalls of Analysis

There are several ways that misleading statistics can occur (which affect both inferential and descriptive statistics):
• Obtaining flawed data
• Not understanding the data
• Not choosing appropriate displays of data
• Fitting an inappropriate model
• Drawing incorrect conclusions from analysis

Poor Displays of Data: Chart Junk

Source: Wainer (1984), “How to display data badly”

Poor Displays of Data: 2D picture 213 .

Poor Displays of Data: Axes

[chart callouts: increments of 100,000, then a jump in the scale from 800,000 to 1,500,000]

How to Display Data

• The golden rule for displaying data in a graph is to keep it simple
• Graphs should not have any chart junk – “minimise the ratio of ink to data” (Tufte)
• Axes should be chosen so they do not inflate or deflate the differences between observations
  – Where possible, start the Y-axis at 0
  – If this is not possible then you should consider graphing the change in the observation from one period to the next
• Some general tips on how to properly display data can be found at http://lilt.ilstu.edu/gmklass/pos138/datadisplay/sections/goodcharts.htm

How to Display Data 216 .

Incorrect Conclusions: Causality

Correlation: 0.848

[table: excess money supply (%), 1965–1973, paired with the increase in prices two years later (%), 1967–1975]

Source: Grenville and Macfarlane (1988)

Accompanying Letter

Sir, Professor Lord Kaldor today (March 31) states that “there is no historical evidence whatever” that the money supply determines the future movement of prices with a time lag of two years. May I refer Professor Kaldor to your article in The Times of July 13, 1976. If one calculates the correlation between these two sets of figures the coefficient r = 0.848, and since there are seven degrees of freedom the P value is less than 0.01. If Mr Rees-Mogg‟s figures are correct, this would appear to a biologist to be a highly significant correlation, for it means that the probability of the correlation occurring by chance is less than one in a hundred. Most betting men would think that those were impressive odds. Until Professor Kaldor can show a fallacy in the figures, I think Mr Rees-Mogg has fully established his point.

Yours faithfully,
IVOR H. MILLS, Department of Medicine, University of Cambridge Clinical School

Response

Sir, Professor Mills today (April 4) uses correlation analysis in your columns to attempt to resolve the theoretical dispute over the cause(s) of inflation. He cites a correlation coefficient of 0.848 between the rate of inflation and the rate of change of “excess” money supply two years before. We were rather puzzled by this, for we have always believed that it was Scottish Dysentery that kept prices down (with a one-year lag, of course). To reassure ourselves, we calculated the correlation between the following sets of figures:

Incorrect Conclusions: Causality

Correlation: –0.868

[table: cases of dysentery in Scotland (‟000), 1966–1974, paired with the increase in prices one year later (%), 1967–1975]

Source: Grenville and Macfarlane (1988)

Yours faithfully. E.A Final Warning We have to inform you that the correlation coefficient is -0.” By the same argument. so have we.868 (which is statistically slightly more significant than that obtained by Professor Mills). Professor Mills says that “Until … a fallacy in the figures [can be shown]. 221 . R. M. WITCOMB. G. LLEWELLYN. J. Faculty of Economics and Politics. I think Mr Rees-Mogg has fully established his point.