Simple Linear Regression
Population Model
• Cross-sectional analysis
• Assume that the sample is collected randomly from the population.
• We want to know how y varies with changes in x.
• What if y is affected by factors other than x?
• What is the functional form?
• How can we distinguish causality from correlation?
• Consider the following model, which holds in the population: y = β0 + β1x + u
Population Model
• We allow for other factors to affect y by including u (the error term).
• If the other factors in u are held fixed, ∆u = 0, then x has a linear effect on y: ∆y = β1∆x.
• Linearity: a one-unit change in x has the same effect on y, regardless of the starting value of x.
• The goal of empirical work is to estimate β0 and β1 (the population parameters).
• β0 and β1 are not directly observable.
• We estimate β0 and β1 using data and ASSUMPTIONS.
A simple assumption
• The average value of u, the error term, in the population is 0: E(u) = 0.
• This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0.
• Show this!
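A sketch of the usual normalization argument (written in LaTeX; the symbol α0 for a possibly nonzero error mean is introduced here for illustration and does not appear on the slides):

```latex
% Suppose the error has mean E(u) = alpha_0, possibly nonzero.
% Define v = u - alpha_0, so E(v) = 0, and fold alpha_0 into the intercept:
\[
  y = \beta_0 + \beta_1 x + u
    = (\beta_0 + \alpha_0) + \beta_1 x + v ,
  \qquad v = u - \alpha_0 ,\ E(v) = 0 .
\]
% The redefined intercept absorbs the mean of the error, so assuming
% E(u) = 0 costs nothing as long as an intercept is included.
```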
Zero conditional mean / Mean independence
• We need to make a crucial assumption about how u and x are related.
• We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated.
• E(u|x) = E(u) = 0, which implies
• E(y|x) = β0 + β1x
• This is the most crucial and most challenging assumption for the interpretation of β1 as a causal parameter.
[Figure: E(y|x) = β0 + β1x as a linear function of x; for any value of x (e.g., x1, x2), the distribution of y, f(y), is centered about E(y|x).]
Ordinary Least Squares
• The basic idea of regression is to estimate the population parameters from a sample.
• Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population.
• For each observation in this sample, it will be the case that yi = β0 + β1xi + ui.
• ui is unobserved.
To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas:
b1 = cov(X, Y) / s²x
b0 = ȳ − b1x̄
The regression equation that estimates the equation of the first-order linear model is:
ŷ = b0 + b1x
Example 17.1: Relationship between odometer reading and a used car's selling price.
• A car dealer wants to find the relationship between the odometer reading (independent variable x) and the selling price (dependent variable y) of used cars.
• A random sample of 100 cars is selected, and the data recorded.
• Find the regression line.

Car   Odometer   Price
1     37388      5318
2     44758      5061
3     45833      5008
4     30862      5795
5     31705      5784
6     34010      5359
…     …          …
Solution
• Solving by hand.
• To calculate b0 and b1 we need to calculate several statistics first (n = 100):
x̄ = 36,009.45;  s²x = Σ(xi − x̄)² / (n − 1) = 43,528,688
ȳ = 5,411.41;  cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1) = −1,356,256
b1 = cov(X, Y) / s²x = −1,356,256 / 43,528,688 = −.0312
b0 = ȳ − b1x̄ = 5,411.41 − (−.0312)(36,009.45) = 6,533
ŷ = b0 + b1x = 6,533 − .0312x
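A minimal Python sketch of the same by-hand computation (numpy assumed available; only the six listed data rows are used, so the numbers will not match the full n = 100 sample):

```python
import numpy as np

# First six observations from the table above (odometer reading, price);
# the full example uses n = 100 cars, so these estimates are only illustrative.
x = np.array([37388, 44758, 45833, 30862, 31705, 34010], dtype=float)
y = np.array([5318, 5061, 5008, 5795, 5784, 5359], dtype=float)

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Sample variance of x and sample covariance of x and y (divide by n - 1).
s2_x = ((x - x_bar) ** 2).sum() / (n - 1)
cov_xy = ((x - x_bar) * (y - y_bar)).sum() / (n - 1)

# OLS slope and intercept.
b1 = cov_xy / s2_x
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}, b0 = {b0:.1f}")  # slope should be negative: higher mileage, lower price
```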
Alternative approach to deriving the OLS estimates
• To derive the OLS estimates we need to realize that our main assumption, E(u|x) = E(u) = 0, also implies that
• Cov(x, u) = E(xu) = 0
• Why? Remember from basic probability that Cov(X, Y) = E(XY) − E(X)E(Y).
• We can write these 2 restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1x.
Alternative approach, continued
• If one uses calculus to solve the minimization problem for the two parameters, you obtain the following first-order conditions, which are the same as those we obtained before, multiplied by n:
Σ (yi − β̂0 − β̂1xi) = 0
Σ xi(yi − β̂0 − β̂1xi) = 0
Deriving OLS, continued
• We can write our 2 restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1x:
E(y − β0 − β1x) = 0
E[x(y − β0 − β1x)] = 0
• These are called moment restrictions.
• β̂0 and β̂1 are the estimates from the data, chosen to satisfy the sample analogues of these restrictions.
More derivation
• The first sample moment condition gives β̂0 = ȳ − β̂1x̄. Plug this into the second equation!
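Carrying out that substitution (standard algebra, shown here as a worked derivation in LaTeX):

```latex
% Plug beta0-hat = ybar - beta1-hat * xbar into the second sample moment
% condition sum_i x_i (y_i - beta0-hat - beta1-hat x_i) = 0 and solve.
\begin{align*}
  \sum_i x_i\bigl(y_i - (\bar y - \hat\beta_1 \bar x) - \hat\beta_1 x_i\bigr) &= 0 \\
  \sum_i x_i (y_i - \bar y) &= \hat\beta_1 \sum_i x_i (x_i - \bar x) \\
  \hat\beta_1 &= \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2},
\end{align*}
% using the identities sum_i x_i (y_i - ybar) = sum_i (x_i - xbar)(y_i - ybar)
% and sum_i x_i (x_i - xbar) = sum_i (x_i - xbar)^2.
```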
Summary of the OLS slope estimate
• The slope estimate is the sample covariance between x and y divided by the sample variance of x.
• If x and y are positively correlated, the slope will be positive.
• If x and y are negatively correlated, the slope will be negative.
• We only need x to vary in our sample (so that the sample variance of x is not zero).
More OLS
• Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible, hence the term "least squares."
• The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (the sample regression function).
Sample regression line, sample data points, and the associated estimated error terms
[Figure: scatter of sample points around the fitted line ŷ = β̂0 + β̂1x, with the residuals ûi shown as vertical distances from each point to the line.]
A short simulation
Residuals and fitted values are uncorrelated, by construction!
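A minimal simulation sketch of this claim (Python with numpy assumed available; the model, coefficients, and sample size below are illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate one sample from an arbitrary linear model (numbers chosen for illustration).
n = 500
x = rng.normal(size=n)
u = rng.normal(scale=2.0, size=n)
y = 1.0 + 0.5 * x + u

# OLS slope and intercept by hand.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x
resid = y - fitted

# Both quantities are (numerically) zero by construction of the OLS first order conditions.
print("mean of residuals:", resid.mean())
print("corr(fitted, residuals):", np.corrcoef(fitted, resid)[0, 1])
```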
Algebraic Properties of OLS
• The sum of the OLS residuals is zero: the coefficients were chosen precisely so that the residuals sum to zero (the first order condition for the intercept).
• Thus, the sample average of the OLS residuals is zero as well.
• The sample covariance (and correlation) between the regressor and the OLS residuals is zero.
• Because fitted values are linear functions of the xi, fitted values and residuals are uncorrelated too.
• The OLS regression line always goes through the mean of the sample.
• If we plug in x̄, we predict ȳ; that is, the point (x̄, ȳ) is on the OLS regression line:
ȳ = β̂0 + β̂1x̄
Algebraic Properties of OLS
• Residuals sum to zero: Σ ûi = 0.
• ȳ = (1/n) Σ ŷi, since yi = ŷi + ûi and the residuals sum to zero.
• The sample covariance between x and the residuals is always zero: Σ xiûi = 0.
• The fitted values and residuals are uncorrelated too: Σ ŷiûi = 0.
• The OLS regression line always goes through the mean of the sample: ȳ = β̂0 + β̂1x̄.
Goodness of Fit
• How do we think about how well our sample regression line fits our sample data?
• We can compute the fraction of the total sum of squares (SST) that is explained by the model; we call this the R-squared of the regression.
• We can think of each observation as being made up of an explained part and an unexplained part: yi = ŷi + ûi.
• We then define the following:
SST = Σ (yi − ȳ)² is the total sum of squares.
SSE = Σ (ŷi − ȳ)² is the explained sum of squares.
SSR = Σ ûi² is the residual sum of squares.
Proving SST = SSE + SSR
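A sketch of the standard decomposition argument, using the residuals ûi and fitted values ŷi defined above:

```latex
% Write y_i - ybar = uhat_i + (yhat_i - ybar), square, and sum over i.
\begin{align*}
  \text{SST} &= \sum_i (y_i - \bar y)^2
             = \sum_i \bigl[\hat u_i + (\hat y_i - \bar y)\bigr]^2 \\
             &= \sum_i \hat u_i^2
               + 2\sum_i \hat u_i(\hat y_i - \bar y)
               + \sum_i (\hat y_i - \bar y)^2 \\
             &= \text{SSR} + 0 + \text{SSE},
\end{align*}
% The cross term vanishes because the residuals sum to zero and are
% uncorrelated with the fitted values (the OLS first order conditions).
```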
Goodness of Fit
• Then SST = SSE + SSR.
• R² = SSE/SST = 1 − SSR/SST
• R² is the coefficient of determination. It is interpreted as the fraction of the sample variation in y that is explained by x.
• An R² of zero means no linear relationship between y and x.
• An R² of one means a perfect linear relationship.
• As R² increases, the data points fall closer and closer to the OLS regression line.
• R² never decreases when another explanatory variable is added to the model.
• R² is a useful summary measure, but it does not tell us about causality.
Unbiasedness of OLS
• So far: when we apply OLS to a sample, the residuals always average to zero, regardless of any underlying model.
• Now we will study the statistical properties of the OLS estimators, referring to a population model and assuming random sampling.
• How do estimators behave across different samples of data?
• Will we get the right answer if we repeatedly sample?
• We need to find the expected value of β̂1 across all possible random samples, and determine whether we are right on average.
• Unbiasedness: E(β̂1) = β1 (and E(β̂0) = β0).
Unbiasedness of OLS
• β̂1 is the estimate from one specific sample.
• Different samples will generate different estimates β̂1.
• Unbiasedness means that if we could take as many random samples as we want and compute β̂1 each time, the average of those estimates would be β1.
Unbiasedness of OLS
• Assume the population model is linear in parameters: y = β0 + β1x + u.
• Assume a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population.
• Thus we can write the sample model yi = β0 + β1xi + ui.
• Assume E(u|x) = 0 and thus E(ui|xi) = 0.
• Assume there is variation in the xi.
• How do we show the OLS estimator is unbiased, i.e., E(β̂1) = β1?
Substituting the population model into the slope formula gives β̂1 = β1 + Σ(xi − x̄)ui / Σ(xi − x̄)² (derivation sketched below). The last term is the slope coefficient from a regression of ui on xi. But this is an imaginary regression, since ui is unobserved.
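A sketch of the conditional-expectation argument (standard steps under the four assumptions above):

```latex
% Substitute y_i = beta_0 + beta_1 x_i + u_i into the slope formula and take
% expectations conditional on x_1, ..., x_n.
\begin{align*}
  \hat\beta_1
    &= \frac{\sum_i (x_i - \bar x)\,y_i}{\sum_i (x_i - \bar x)^2}
     = \beta_1 + \frac{\sum_i (x_i - \bar x)\,u_i}{\sum_i (x_i - \bar x)^2}, \\
  E(\hat\beta_1 \mid x_1,\dots,x_n)
    &= \beta_1 + \frac{\sum_i (x_i - \bar x)\,E(u_i \mid x_i)}{\sum_i (x_i - \bar x)^2}
     = \beta_1 ,
\end{align*}
% since E(u_i | x_i) = 0 under the zero conditional mean assumption; the result
% holds for every realization of the x's, so unconditionally E(beta1-hat) = beta_1.
```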
Monte Carlo Simulation
• Suppose we have the following population model: y = 3 + 2x + u,
where x and u are independent (drawn from some chosen distributions).
• We will estimate the OLS coefficients 1,000 times, once per simulated sample.
Monte Carlo Simulation
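A minimal Python sketch of such a simulation (the distributions of x and u and the sample size are assumptions made here; the slide does not specify them):

```python
import numpy as np

rng = np.random.default_rng(42)

beta0_true, beta1_true = 3.0, 2.0
n, reps = 200, 1000          # sample size and number of replications (assumed)

b1_estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)   # assumed distribution for x
    u = rng.normal(loc=0.0, scale=2.0, size=n)   # assumed distribution for u, independent of x
    y = beta0_true + beta1_true * x + u

    # OLS slope for this sample.
    b1_estimates[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# If OLS is unbiased, the average estimate should be close to the true slope of 2.
print("average of the 1000 slope estimates:", b1_estimates.mean())
print("standard deviation across samples:  ", b1_estimates.std(ddof=1))
```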
Unbiasedness of OLS
• Unbiasedness is a property of the estimation procedure (the rule); it is not a property of any particular estimate!
• The proof of unbiasedness depends on all four of the assumptions listed above.
Variance of the OLS estimators
• Now we need to capture the uncertainty in the sampling process: the dispersion in the sampling distribution of the estimators.
• The assumptions so far are not sufficient to tell us anything about the variance of the estimator.
• To simplify the calculation, assume homoscedasticity (constant variance):
• u has the same variance given any value of x: Var(u|x) = σ².
• σ² is the variance of the unobserved factors, other than x, that influence y.
Homoskedastic Case
[Figure: the conditional densities f(y|x) at x1 and x2 have the same spread around the line E(y|x) = β0 + β1x.]
Heteroskedastic Case
[Figure: the conditional densities f(y|x) at x1, x2, and x3 have different spreads around the line E(y|x) = β0 + β1x.]
Variance of OLS estimators
The average value of y is allowed to change with x.
The variance does not change with x (homoscedastic).
Sampling variance of OLS
Var(β̂1|x) = σ² / SSTx, where SSTx = Σ(xi − x̄)².
Read Wooldridge Section 2-5b (variance of the OLS estimators) to derive this result.
Sampling variance of β̂1
• This formula is not valid if the homoscedasticity assumption does not hold.
• Remember, homoscedasticity is not used to show unbiasedness!
• As σ² increases, so does Var(β̂1); the more noise in the relationship between y and x (i.e., the larger the variability in u), the harder it is to learn something about β1.
• As SSTx rises, Var(β̂1) decreases; more variation in xi is good.
• Now we need to estimate σ², the error variance.
Estimating σ²
• Replace each ui with its estimate, the OLS residual ûi = yi − β̂0 − β̂1xi.
• Note that ûi is not the same as ui: the residuals depend on the estimated coefficients.
• The unbiased estimator of σ² uses a degrees-of-freedom adjustment. Under the FIVE ASSUMPTIONS it is:
σ̂² = (1/(n − 2)) Σ ûi² = SSR / (n − 2)
• The standard error of the regression (an estimate of the standard deviation of the error in the regression) is σ̂ = √σ̂².
• Stata calls it the root mean squared error (RMSE).
• Given σ̂, we can now estimate sd(β̂0) and sd(β̂1); for example, se(β̂1) = σ̂ / √SSTx.
Gauss-Markov Assumptions
1. Linear in parameters
2. Random sampling
3. Sample variation in x: Σ(xi − x̄)² > 0
4. Zero conditional mean / conditional independence: E(u|x) = 0
5. Homoscedasticity: Var(u|x) = σ²
Under these five assumptions, the OLS estimators are the Best Linear Unbiased Estimators (BLUE).
– Best: in the class of linear unbiased estimators (LUE), OLS has the smallest variance; no linear unbiased estimator does better than OLS.
Robust Standard Errors
• Homoscedasticity is the exception. In real life, errors are usually heteroscedastic.
• Unbiasedness does not depend on the assumption about the variance of the error.
• If the errors are heteroscedastic, Var(ui|xi) = σi², and
Var(β̂1) = Σ(xi − x̄)² σi² / (SSTx)²
Robust Standard Errors
• A valid estimator of Var(β̂1) under heteroscedasticity of any form (including homoscedasticity) is
V̂ar(β̂1) = Σ(xi − x̄)² ûi² / (SSTx)²
• Use the option "robust" in Stata.
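A minimal numpy sketch contrasting the usual and the robust variance formulas for the slope (the data are simulated here only so there is something to compute):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with heteroscedastic errors (illustrative only).
n = 400
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=0.5 + 0.3 * x)          # error spread grows with x
y = 1.0 + 0.8 * x + u

# OLS fit by hand.
x_dev = x - x.mean()
sst_x = (x_dev ** 2).sum()
b1 = (x_dev * y).sum() / sst_x
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Usual (homoscedasticity-based) standard error of b1.
sigma2_hat = (resid ** 2).sum() / (n - 2)
se_classic = np.sqrt(sigma2_hat / sst_x)

# Heteroscedasticity-robust standard error of b1 (White-type form above).
se_robust = np.sqrt((x_dev ** 2 * resid ** 2).sum() / sst_x ** 2)

print("classic se(b1):", se_classic)
print("robust  se(b1):", se_robust)
```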
Statistical Inference
Assumptions of the Classical Linear Model (CLM)
• So far, we know that under the Gauss-Markov assumptions, OLS is BLUE.
• To do classical hypothesis testing, we need to add another assumption (beyond the Gauss-Markov assumptions). Why?
• Under the Gauss-Markov assumptions alone, the distribution of β̂1 can have any shape.
• Assume that u is independent of x and that u is normally distributed with zero mean and variance σ²: u ∼ Normal(0, σ²).
• So the CLM assumptions are: Gauss-Markov assumptions + normality assumption.
CLM Assumptions (cont’d)
• Under the CLM assumptions, OLS is not only BLUE, but also has the minimum variance among ALL unbiased estimators (not just among linear estimators).
• We can summarize the population assumptions of the CLM as follows:
• Conditional on x, y has a normal distribution with mean linear in x and constant variance σ²:
y|x ∼ Normal(β0 + β1x, σ²)
• Normality does not always hold in practice.
• Non-normality of the errors is not a serious problem with large sample sizes.
Normal Sampling Distributions
• Under the CLM assumptions, β̂j ∼ Normal(βj, Var(β̂j)), so that (β̂j − βj)/sd(β̂j) ∼ Normal(0, 1).
The t test
• Under the CLM assumptions: (β̂j − βj) / se(β̂j) ∼ t(n − k − 1).
• Note this is a t distribution (rather than a normal) because we estimate σ² by σ̂².
• Knowing the sampling distribution of the standardized estimator allows us to carry out hypothesis tests.
• Start with a null hypothesis.
• For example, H0: βj = 0.
• If we fail to reject the null, we conclude that xj has no statistically detectable effect on y, controlling for the other x's.
T-test (cont’d)
• To perform the test, we first form the t statistic for β̂j: t = β̂j / se(β̂j) (for the null H0: βj = 0).
• We then use the t statistic along with a rejection rule to determine whether to reject or "accept" (fail to reject) the null hypothesis.
T-test: One sided alternatives
• Besides the null, H0: βj = 0, we need an alternative hypothesis, H1, and a significance level.
• H1 may be one-sided or two-sided.
• H1: βj > 0 and H1: βj < 0 are one-sided.
• H1: βj ≠ 0 is two-sided.
• If we want only a 5% probability of rejecting H0 when it is true, then we say our significance level is 5%.
T-test: One sided alternatives
• Having picked a significance level α, we look up the (1 − α)th percentile of a t distribution with n − k − 1 df and call this c, the critical value.
• For H1: βj > 0, we reject the null hypothesis if the t statistic is greater than the critical value.
• If the t statistic is less than the critical value, we fail to reject the null.
One-Sided Alternatives (cont)
One sided vs two-sided
• Because the t distribution is symmetric, testing H1: βj < 0 is straightforward: the critical value is just the negative of the one before.
• We reject the null if the t statistic < −c; if the t statistic > −c, we fail to reject the null.
• For a two-sided test, we set the critical value based on α/2 and reject H0 if the absolute value of the t statistic > c.
Two-Sided Alternatives
Summary for H0: βj = 0
• Unless otherwise stated, the alternative is assumed to be two-sided.
• If we reject the null, we typically say "xj is statistically significant at the α% level."
• If we fail to reject the null, we typically say "xj is statistically insignificant at the α% level."
Computing p-values for t tests
• An alternative to the classical approach is to ask, “what is the smallest
significance level at which the null would be rejected?”
• So, compute the t statistic, and then look up what percentile it is in
the appropriate t distribution – this is the p-value.
• The p-value is the probability of observing a t statistic at least as extreme as the one we did, if the null were true.
Confidence Interval
• Another way to use classical statistical testing is to construct a confidence interval, using the same critical value as for a two-sided test.
• A (1 − α) confidence interval is defined as β̂j ± c · se(β̂j), where c is the (1 − α/2)th percentile of a t(n − k − 1) distribution.
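A small Python sketch of constructing such an interval (scipy assumed available; the coefficient, standard error, and degrees of freedom are made-up numbers for illustration):

```python
from scipy import stats

# Hypothetical regression output (illustrative values only).
b_hat = 0.42      # estimated coefficient
se_b = 0.15       # its standard error
df = 97           # n - k - 1 degrees of freedom
alpha = 0.05      # for a 95% confidence interval

# c is the (1 - alpha/2) percentile of the t distribution with df degrees of freedom.
c = stats.t.ppf(1 - alpha / 2, df)

lower, upper = b_hat - c * se_b, b_hat + c * se_b
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```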
Testing Other Hypotheses
• A more general form of the t statistic recognizes that we may want to test something like H0: βj = aj.
• In this case, the appropriate t statistic is t = (β̂j − aj) / se(β̂j), where aj = 0 for the standard test.
Stata and p-values, t tests, etc.
• Most computer packages will compute the p-value for you, assuming
a two-sided test.
• If you really want a one-sided alternative, just divide the two-sided p-value by 2 (provided the estimate has the sign specified by the alternative).
• Stata provides the t statistic, p-value, and 95% confidence interval for
for you, in columns labeled “t”, “P > |t|” and “[95% Conf. Interval]”,
respectively.
Regression with Stata
The F-stat
• In a regression model with k independent variables, consider:
H0: β1 = β2 = … = βk = 0
H1: H0 is not true (at least one of the βj's is different from zero)
• How to proceed?
– A t statistic tests a hypothesis that puts no restrictions on the other parameters.
– Further, we would have three t statistics (one per regressor). What would constitute a rejection at the 5% level?
The F-stat
• Run the restricted and the unrestricted models and find the SSR of each.
• Ask how much the SSR increases when we drop q variables from the model (the restricted model).
• Ask whether the increase in SSR is large enough relative to the SSR in the model with all of the variables (the unrestricted model).
• The F statistic measures the relative increase in the SSR when moving from the unrestricted to the restricted model:
F = [(SSRr − SSRur) / q] / [SSRur / (n − k − 1)]
F-stat from R-squared
• Sometimes it is more convenient to compute the F statistic using the R-squareds rather than the SSRs.
• Use SSR = SST(1 − R²) for each model, as worked out below.
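A worked substitution showing how the R-squared form follows from the SSR form of the F statistic (standard algebra; q restrictions, n − k − 1 df as above):

```latex
% Substitute SSR_r = SST(1 - R^2_r) and SSR_ur = SST(1 - R^2_ur) into the
% SSR form of the F statistic; the common factor SST cancels.
\begin{align*}
  F &= \frac{(\text{SSR}_r - \text{SSR}_{ur})/q}{\text{SSR}_{ur}/(n - k - 1)}
     = \frac{\bigl[(1 - R^2_r) - (1 - R^2_{ur})\bigr]/q}{(1 - R^2_{ur})/(n - k - 1)} \\
    &= \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} .
\end{align*}
```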
F-stat (cont’d)
• Just as with t statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution.
• If only one exclusion is being tested, then F = t², and the p-values will be the same.
• If we fail to reject H0, this suggests that we should look for other variables to explain y.
The F-statistic for Overall Significance of a Regression
• To test whether the regression is significant overall, we use the F statistic for H0: β1 = β2 = … = βk = 0.
• A small R-squared can still produce a highly significant F statistic.
• That is why we must look at the F statistic for joint significance, and not just at the R-squared.