StockWatson Econ CH 3
SW, Chapter 4
1. Basic Idea
2. Ordinary Least Squares Estimation
3. Measures of Fit
4. Least Squares Assumptions
5. Sampling Distribution
6. Hypothesis Testing
7. Confidence Interval
8. Binary Regressor
9. Heteroskedasticity and Homoskedasticity
10. Concerns regarding OLS Estimator
• We want to find the regression line that fits our scatter plot best.
• Estimation:
– How should we draw a line through the data to estimate the slope?
• Answer: ordinary least squares (OLS).
• Hypothesis testing:
– How to test whether the slope is zero, i.e. that X has no effect on Y?
• Confidence intervals:
– How to construct a confidence interval for the slope?
Yi = b0 + b1Xi + ui,   i = 1, …, n
• We have n observations, (Xi, Yi), i = 1,.., n.
• The OLS estimator minimizes the average squared difference between the
actual values of Yi and the prediction (“predicted value”) based on the
estimated line.
• One of the districts in the data set is Antelope, CA, for which STR = 19.33 and
Test Score = 657.8.
• predicted value: Ŷ_Antelope = 698.9 − 2.28 × 19.33 = 654.8
• residual: û_Antelope = 657.8 − 654.8 = 3.0
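The least-squares formulas and the Antelope calculation can be sketched in pure Python. The helper `ols_fit` and its tiny data set are hypothetical illustrations of the OLS formulas; 698.9 and −2.28 are the estimates quoted in the text.

```python
# Minimal sketch of the OLS formulas; ols_fit and the example data are
# hypothetical, while 698.9 and -2.28 are the estimates quoted in the text.

def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1

# Predicted value and residual for Antelope, CA (STR = 19.33, score = 657.8):
y_hat = 698.9 - 2.28 * 19.33       # predicted test score
residual = 657.8 - y_hat           # actual minus predicted
print(round(y_hat, 1), round(residual, 1))  # prints: 654.8 3.0
```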
MEPS – Preparatory and Orientation Weeks 1-43
The simple linear regression model:
2. Ordinary Least Squares Estimation
• STATA output
regress testscr str, robust
• The regression R2 measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit).
Yi = Ŷi + ûi
• Variance decomposition (TSS = ESS + SSR):

Σ (Yi − Ȳ)² = Σ (Ŷi − Ȳ)² + Σ ûi²    (all sums run over i = 1, …, n)

• R2 = ESS/TSS = Σ (Ŷi − Ȳ)² / Σ (Yi − Ȳ)² = 1 − SSR/TSS
• 0 ≤ R2 ≤ 1
• R2 = 1 means ESS = TSS and SSR = 0, i.e. all data points lie on the regression line
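The decomposition and R2 can be checked numerically. The data below are made up for illustration (not the California test-score data); the decomposition holds because the fitted values come from an OLS regression with an intercept.

```python
# Sketch of the variance decomposition TSS = ESS + SSR and of R2.
# Made-up sample, purely to illustrate the formulas.
x = [10.0, 12.0, 14.0, 16.0, 18.0]
y = [7.1, 6.8, 6.0, 5.9, 5.1]

# OLS fit (same least-squares formulas as in the text)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
ess = sum((fi - ybar) ** 2 for fi in y_hat)            # explained sum of squares
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals

r2 = ess / tss   # equals 1 - ssr/tss for OLS with an intercept
print(round(tss, 4), round(ess + ssr, 4), round(r2, 3))
```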
SER = s_û = √[ 1/(n − 2) · Σ (ûi − mean(û))² ] = √[ 1/(n − 2) · Σ ûi² ]    (sums over i = 1, …, n)

(The second equality holds because OLS residuals from a regression with an intercept have mean zero.)
• The SER is (almost) the sample standard deviation of the OLS residuals.
• The SER has the units of u, which are the units of Y
• It measures the average “size” of the OLS residual (the average “mistake”
made by the OLS regression line)
• Why n − 2? Degrees-of-freedom correction for the two estimated coefficients (intercept and slope). (In large samples it is irrelevant whether we divide by n, n − 1, or n − 2.)
• The root mean squared error (RMSE) is closely related to the SER:
RMSE = √[ 1/n · Σ ûi² ]    (sum over i = 1, …, n)
• This measures the same thing as the SER – the minor difference is division by n instead of n − 2.
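The SER and RMSE formulas can be compared directly in Python. The residual list below is hypothetical (chosen to have mean zero, as OLS residuals with an intercept do):

```python
# Sketch: SER vs. RMSE for a list of OLS residuals (made-up numbers).
import math

u_hat = [3.0, -1.5, 0.5, -2.0, 0.0]   # hypothetical OLS residuals, mean zero

n = len(u_hat)
ser = math.sqrt(sum(u * u for u in u_hat) / (n - 2))   # divides by n - 2
rmse = math.sqrt(sum(u * u for u in u_hat) / n)        # divides by n

print(round(ser, 3), round(rmse, 3))  # SER > RMSE; the gap vanishes as n grows
```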
• STR explains only a small fraction of the variation in test scores (5.1%).
• The SER is relatively large: the observations deviate substantially from the regression line.
• As a result, predictions based on this regression alone are quite imprecise.
• There must be other factors affecting test scores, e.g. ?