
Fundamentals of statistics and empirics

Linear Regression with one regressor

SW, Chapter 4

MEPS – Preparatory and Orientation Weeks 31


The simple linear regression model:
Outline

1. Basic Idea
2. Ordinary Least Squares Estimation
3. Measures of Fit
4. Least Squares Assumptions
5. Sampling Distribution
6. Hypothesis Testing
7. Confidence Interval
8. Binary Regressor
9. Heteroskedasticity and Homoskedasticity
10. Concerns regarding OLS Estimator



The simple linear regression model:
1. Basic Idea

• We want to find the regression line that fits our scatter plot best.


• The slope of the regression line is the expected effect on Y of a unit change in X.

• In our example, X is the class size and Y is the test score.

• With the regression model, we determine

– whether there is a (statistically significant) relation between X and Y,

– how strong this relation is,

– whether the effect of X on Y is causal.


• Estimation:
– How should we draw a line through the data to estimate the slope?
• Answer: ordinary least squares (OLS).

• Hypothesis testing:
– How do we test whether the slope is zero, i.e. whether X has no effect on Y?

• Confidence intervals:
– How to construct a confidence interval for the slope?


• The regression line: Test score = b0 + b1 STR

b1 = slope of the regression line

   = ∆Test score / ∆STR

   = change in test score for a unit change in STR

• We would like to know the value of b1.

• Since we don’t know b1, we must estimate it using data.


Yi = b0 + b1 Xi + ui,   i = 1, …, n

• We have n observations (Xi, Yi), i = 1, …, n.

• X is the independent variable or regressor (also called the explanatory variable).

• Y is the dependent variable or regressand (also called the explained variable).
• b0 = intercept
• b1 = slope
• ui = the regression error

• The regression error consists of omitted factors: in general, other factors that influence Y besides the variable X. The regression error also includes errors in the measurement of Y.



The simple linear regression model:
2. The Ordinary Least Squares Estimation

• How can we estimate b0 and b1 from data?

• We will focus on the least squares (“ordinary least squares”, or “OLS”) estimator of the unknown parameters b0 and b1. The OLS estimator solves

min over b0, b1:  ∑_{i=1}^n (Yi − b0 − b1 Xi)²

• The OLS estimator minimizes the average squared difference between the actual values of Yi and the predictions (“predicted values”) based on the estimated line.

• The result is the OLS estimators of b0 and b1.
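As a sketch of what OLS does, the closed-form estimators can be computed directly; the data below are made up for illustration (not the California data set), and numpy is used only for convenience:

```python
import numpy as np

# Made-up illustrative data (not the California test-score data).
x = np.array([15.0, 17.0, 19.0, 21.0, 23.0, 25.0])        # regressor, e.g. class size
y = np.array([680.0, 675.0, 665.0, 662.0, 655.0, 650.0])  # regressand, e.g. test score

# Closed-form solution of  min over b0, b1 of sum (y_i - b0 - b1*x_i)^2:
#   slope     = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
#   intercept = y_bar - slope * x_bar
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```

As a cross-check, `np.polyfit(x, y, 1)` returns the same slope and intercept.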


• Application to the California Test Score – Class Size data

• Estimated slope: b̂1 = −2.28

• Estimated intercept: b̂0 = 698.9

• Estimated regression line: Test score = 698.9 − 2.28 STR


Interpretation of the estimated slope and intercept

• Test Score = 698.9 – 2.28 STR

• An increase in the student-teacher ratio by 1 reduces test scores on average by 2.28 points.

That is, ∆Test score / ∆STR = −2.28
• The intercept (taken literally) means that, according to this estimated
line, districts with zero students per teacher would have a (predicted)
test score of 698.9. But this interpretation of the intercept makes no
sense – it extrapolates the line outside the range of the data – here, the
intercept is not economically meaningful.


Predicted values & residuals:

• One of the districts in the data set is Antelope, CA, for which STR = 19.33 and
Test Score = 657.8.
• predicted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8
• residual: û_Antelope = 657.8 − 654.8 = 3.0
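The Antelope numbers can be reproduced from the estimated line (the coefficients 698.9 and −2.28 are taken from the slide; the script is just the arithmetic):

```python
# Coefficients of the estimated regression line (from the slide).
b0_hat, b1_hat = 698.9, -2.28

# Antelope, CA: observed student-teacher ratio and test score.
str_antelope, score_antelope = 19.33, 657.8

y_hat = b0_hat + b1_hat * str_antelope  # predicted value
u_hat = score_antelope - y_hat          # residual = actual - predicted

print(round(y_hat, 1))  # 654.8
print(round(u_hat, 1))  # 3.0
```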

• STATA output
regress testscr str, robust

Regression with robust standard errors Number of obs = 420


F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
--------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------------------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-----------------------------------------------------------------------------



The simple linear regression model:
3. Measures of fit

Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:

• regression R²: measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit)

• standard error of the regression (SER): measures the magnitude of a typical regression residual in the units of Y


Yi = Ŷi + ûi

• Variance decomposition:

∑_{i=1}^n (Yi − Ȳ)² = ∑_{i=1}^n (Ŷi − Ȳ)² + ∑_{i=1}^n ûi²

TSS = ESS + SSR


• TSS = total sum of squares

• ESS = explained sum of squares

• SSR = sum of squared residuals

• (Here, we use that the sample average of the Ŷi equals Ȳ and that the sample average of the ûi is zero.)
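The identity TSS = ESS + SSR can be checked numerically for any OLS fit; the data here are again made up for illustration:

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS fit in closed form.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # sum of squared residuals

print(np.isclose(tss, ess + ssr))  # the decomposition holds
```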


Regression R² = fraction of the variation of Y that is explained by X:

R² = ESS / TSS = ∑_{i=1}^n (Ŷi − Ȳ)² / ∑_{i=1}^n (Yi − Ȳ)² = 1 − SSR / TSS
• 0 ≤ R² ≤ 1

• R² = 0 means ESS = 0 (b̂1 = 0), i.e. no variation is explained

• R² = 1 means ESS = TSS and SSR = 0, i.e. all data points lie on the regression line

• For a regression with a single X, R² is the square of the correlation coefficient between X and Y:

R² = [corr(X, Y)]²
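For the single-regressor case, R² = [corr(X, Y)]² can likewise be verified numerically (made-up data again):

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.2, 5.9])

# OLS fit and R^2 = ESS / TSS.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Squared sample correlation coefficient.
corr = np.corrcoef(x, y)[0, 1]

print(np.isclose(r2, corr ** 2))  # equal for single-regressor OLS
```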

Standard error of the regression (SER)

• The SER measures the spread of the distribution of u.

SER = √( 1/(n−2) ∑_{i=1}^n (ûi − û̄)² ) = √( 1/(n−2) ∑_{i=1}^n ûi² )

(the second equality holds because the sample average of the ûi is zero)
• The SER is (almost) the sample standard deviation of the OLS residuals.
• The SER has the units of u, which are the units of Y
• It measures the average “size” of the OLS residual (the average “mistake”
made by the OLS regression line)
• Why n − 2? A degrees-of-freedom correction for the number of estimated parameters (here two: b0 and b1). (In large samples it is irrelevant whether we divide by n, n − 1, or n − 2.)

MEPS – Preparatory and Orientation Weeks 1-48


The simple linear regression model:
3. Measures of fit

Root mean squared error (RMSE)

• The root mean squared error (RMSE) is closely related to the SER:

RMSE = √( (1/n) ∑_{i=1}^n ûi² )

• The RMSE measures the same thing as the SER; the minor difference is division by n instead of n − 2.
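The difference between the SER and the RMSE is only the divisor; a small sketch with made-up residuals:

```python
import numpy as np

# Made-up OLS residuals for illustration.
u_hat = np.array([1.5, -2.0, 0.5, -1.0, 1.0, 0.0])
n = len(u_hat)

ser = np.sqrt(np.sum(u_hat ** 2) / (n - 2))  # divides by n - 2
rmse = np.sqrt(np.sum(u_hat ** 2) / n)       # divides by n

print(ser > rmse)  # True: the SER is always slightly larger
```

As n grows the two converge, which is the sense in which the choice of divisor is irrelevant in large samples.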


TestScore = 698.9 − 2.28 STR,  R² = 0.05,  SER = 18.6

• STR explains only a small fraction of the variation in test scores (5.1%).
• The SER is relatively large: the data deviate substantially from the regression line.
• As a result, predictions based on this regression line are quite imprecise.
• There must be other factors affecting test scores. Which ones?

