
ECO 401 Econometrics

SI 2021
Week 2, 14 September

Dr Syed Kanwar Abbas


Email: Syed.Abbas@xjtlu.edu.cn
Consultation hours: BBB, 3:30-5:30 (Tuesday)
Agenda

At the end of this session, you should be able to understand:

• What is the Ordinary Least Squares (OLS) method?


• What are the key assumptions of OLS?
• How to interpret the parameters?
• How to measure goodness-of-fit?
• How to conduct hypothesis testing?

This lecture is based on Chapter 2 of your textbook by Verbeek (2017).


Linear regression model
We start with a linear relationship between y and x1 ≡ 1 (a constant) and x2 to xK, assumed to be generally
valid:  yi = β1 + β2 xi2 + … + βK xiK + εi        (2.24)

In vector form, we write:  yi = xi'β + εi        (2.25)

where xi' = (1, xi2, …, xiK) and β = (β1, β2, …, βK)'.

An econometric model consists of a systematic part and a random, unpredictable component εi that we
call the random error.

𝜷 is a vector of unknown parameters characterizing the population.


Ordinary Least Squares (OLS)

Suppose we want to approximate a variable y by a linear combination of other variables, x2 to xK and a constant.

  b1 + b2 x2 + … + bK xK

Using vector notation, our econometric model is  yi = xi'β + εi ,  i = 1, …, N

Coefficients in this approximation can be determined by Ordinary Least Squares (OLS), which minimizes the
sum of squared differences between y and the linear combination.
  S(β̃) ≡ Σi (yi − xi'β̃)²   (sum over i = 1, …, N)
Ordinary Least Squares (OLS)-Recap
• How to find the 'best' b? Ordinary Least Squares (OLS) minimizes the sum of squared
residuals:  min_b Σi ei² = min_b e'e
• This is a straightforward minimization problem; note that
e'e = (y − Xb)'(y − Xb), so  e'e = y'y − 2y'Xb + b'X'Xb
• Differentiating with respect to b and collecting terms gives the first order conditions
(FOC): −2(X'y − X'Xb) = 0
• The solution to the FOC is simple: bOLS = (X'X)⁻¹ X'y
• Using vector notation this is: bOLS = (Σi xi xi')⁻¹ Σi xi yi
• In the simple case of K = 2, with one regressor and a constant, we can plot the variables
in a graph (x on the horizontal axis and y on the vertical axis). Let x̄ and ȳ be the sample
means; then in this case:

  slope = b2 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² ;   intercept = b1 = ȳ − b2 x̄

The Gauss-Markov assumptions

(A1) Error terms have mean zero: E{εi}=0


(A2) All error terms are independent of all x
variables: {ε1 ,… εN} is independent of {x1,… xN}
(A3) All error terms have the same variance
(homoskedasticity): V{εi} = σ2.
(A4) The error terms are mutually uncorrelated (no
autocorrelation): cov{εi, εj} = 0, i ≠ j.

Under (A2) we can treat the explanatory variables as fixed (deterministic).


Estimator properties

Under assumptions (A1) and (A2):


1. The OLS estimator is unbiased. That is, E{b} = β.

Remember  b = (Σi xi xi')⁻¹ Σi xi yi
Under assumptions (A1), (A2), (A3) and (A4):

2. The variance of the OLS estimator is given by


V{b} = σ² ( Σi xi xi' )⁻¹        (2.33)
3. The OLS estimator is BLUE: best linear unbiased estimator for β.
The variance of the OLS estimator b2

Consider the model  yi = β1 + β2 xi2 + β3 xi3 + εi .

The variance of the OLS estimator b2 can be written as

  V{b2} = [σ² / (1 − r23²)] · [ Σi (xi2 − x̄2)² ]⁻¹

or, equivalently,

  V{b2} = (σ² / N) · [1 / (1 − r23²)] · [ (1/N) Σi (xi2 − x̄2)² ]⁻¹        (2.37)

where r23 is the sample correlation coefficient between x2 and x3.
Estimator properties
We estimate the variance of the error term σ2 by the sampling variance of the
residuals.

We employ a degrees of freedom correction:


s² = (N − K)⁻¹ Σi ei²        (2.35)

Under assumptions (A1)-(A4), s² is unbiased for σ². The square root of the k-th diagonal
element of the estimated covariance matrix s²(Σi xi xi')⁻¹ is the standard error of bk.
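Continuing the same kind of illustrative sketch (simulated data, assumed values), s² from equation (2.35) and the standard errors from the diagonal of s²(Σi xi xi')⁻¹ can be computed as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 2
x = rng.normal(10.0, 2.0, size=N)
y = 5.0 + 0.8 * x + rng.normal(0.0, 1.0, size=N)
X = np.column_stack([np.ones(N), x])

b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS coefficients
e = y - X @ b                             # residuals

s2 = (e @ e) / (N - K)                    # eq. (2.35): s^2 = (N - K)^(-1) * sum of e_i^2
V_b = s2 * np.linalg.inv(X.T @ X)         # estimated covariance matrix of b, cf. eq. (2.33)
se = np.sqrt(np.diag(V_b))                # standard errors of b1, ..., bK

print(s2)
print(se)
```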
Estimated variance of OLS estimator

We can think of the standard error as measuring how precisely we have estimated
the population mean (via the sample mean) or any other statistic: it is a measure of the
accuracy of an estimator.

As the sample size gets bigger and bigger, the standard error will shrink, reflecting the
fact that our estimate for the mean, or another statistic, will become more and more
precise.
Estimated variance of OLS estimator

Recall equation (2.37):

  V{b2} = (σ² / N) · [1 / (1 − r23²)] · [ (1/N) Σi (xi2 − x̄2)² ]⁻¹        (2.37)

What do you observe from 2.37?


Estimated variance of OLS estimator

• The sample variance of x2 matters: more variation in the regressor values leads to a
more accurate estimator

• More observations increase precision

• A larger error variance σ² produces a larger variance of the estimator. A low value
of σ² means the observations lie closer to the regression line.

• Correlation between regressors matters


Example: individual wages
Consider a sample of N=3294 individuals (1569 females). We observe wage
rates (per hour), gender, experience and years of schooling.

wagei = β1 + β2 malei + εi

The model explains wage from a male dummy (= 1 if male, 0 if female).

The interpretation is: the expected wage of a person, given his or her gender, is β1 + β2 malei.
That is, the expected wage of an arbitrary male is β1 + β2, for an arbitrary female it is β1.

Table 2.1 OLS estimates wage equation

The expected hourly wage differential between males and females is $1.17 with a standard error of $0.11.

What do the assumptions mean?

(A1): innocent as long as an intercept is included

(A2): {ε1 ,… εN} is independent of {x1,… xN}: knowing a person’s gender provides
no information about unobservables affecting this person’s wage.

(A3) Homoskedasticity, V{εi} = σ2: variance is the same for males and females.

(A4) No autocorrelation: cov{εi, εj} = 0, i ≠ j: is implied by random sampling.

(A5) Normality: no reason why εi would be normal. (E.g. negative wages are not
possible.)
A convenient fifth assumption is that all error terms are independent and have a normal
distribution. We specify:

(A5): εi ~ NID(0, σ²)

which is shorthand for: all εi are independent drawings from a normal distribution
with mean 0 and variance σ2. (“normally and independently distributed”)
Summary of Assumptions

MR1  E(ei) = 0
MR2  var(ei) = σ²
MR3  cov(yi, yj) = cov(ei, ej) = 0 for i ≠ j
MR4  cov(xi, ei) = 0, and the explanatory variables are not exact linear functions of
one another

If assumptions of the Multiple Regression Model (MR1-MR5) hold, the least squares
estimators are the best linear unbiased estimators (BLUE) of the parameters (Gauss-
Markov Theorem).
How well does the line fit the observations?
The quality of the linear approximation offered by the model can be measured by the R2,
Goodness-of-fit.
• The R² indicates the proportion of the variance in y that can be explained by the linear
combination of x variables. It is the proportion of the variance of y that is explained by the
model. In formula:

  R² = Σi (ŷi − ȳ)² / Σi (yi − ȳ)²

• If the model contains an intercept (as usual), it holds that

  R² = 1 − Σi ei² / Σi (yi − ȳ)²

Goodness-of-fit: Least Squares Prediction

• There are two major reasons for analyzing the model

yi = β1 + β2 xi + ei
1. to explain how the dependent variable (yi) changes as the independent variable (xi)
changes

2. to predict y0 given an x0

• We separate yi into its explainable and unexplainable components.


yi = E(yi) + ei
• E(yi) is the explainable or systematic part
• ei is the random, unsystematic and unexplainable component
Measuring Goodness-of-fit

We can write each observation as  yi = ŷi + êi .

Subtracting the sample mean ȳ from both sides gives (Eq. 4.8):

  yi − ȳ = (ŷi − ȳ) + êi

Squaring and summing both sides, and using the fact that Σi (ŷi − ȳ) êi = 0, we get:

  Σi (yi − ȳ)² = Σi (ŷi − ȳ)² + Σi êi²
Goodness-of-fit…

  Σi (yi − ȳ)² = Σi (ŷi − ȳ)² + Σi êi²

where

  Σi (yi − ȳ)²  = Total Sum of Squares = TSS

  Σi (ŷi − ȳ)²  = Sum of Squares due to regression = ESS

  Σi (yi − ŷi)² = Σi êi²  = Sum of Squares due to Error = RSS
Coefficient of determination

Let’s define the coefficient of determination, or R2 , as the proportion of


variation in y explained by x within the regression model:

  TSS = ESS + RSS

  R² = ESS / TSS    or, equivalently,    R² = 1 − RSS / TSS
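A small illustrative sketch (simulated data) that verifies TSS = ESS + RSS numerically and computes R² both ways; it also checks, anticipating the next slides, that R² equals the squared correlation between yi and ŷi.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b                      # fitted values
e_hat = y - y_hat                  # residuals

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y.mean()) ** 2)
RSS = np.sum(e_hat ** 2)

print(np.isclose(TSS, ESS + RSS))          # True: the decomposition holds (intercept included)
print(ESS / TSS, 1 - RSS / TSS)            # the two R^2 formulas give the same number
print(np.corrcoef(y, y_hat)[0, 1] ** 2)    # ... which also equals the squared corr(y, y_hat)
```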
(Figure: explained and unexplained components of yi.)

Measuring Goodness-of-fit…

We can see that:


The closer R2 is to 1, the closer the sample values yi are to the fitted
regression equation

If R2 = 1, then all the sample data fall exactly on the fitted least
squares line, and the model fits the data ‘‘perfectly’’

If the sample data for y and x are uncorrelated and show no linear
association, then R² = 0
Measuring Goodness-of-fit

The sample correlation coefficient:

  rxy = sxy / (sx sy)

where:

  sxy = Σi (xi − x̄)(yi − ȳ) / (N − 1)

  sx = √[ Σi (xi − x̄)² / (N − 1) ]

  sy = √[ Σi (yi − ȳ)² / (N − 1) ]

• The sample correlation coefficient rxy has a value between -1 and 1, and it measures the strength of the linear
association between observed values of x and y

Measuring Goodness-of-fit: R² and rxy

Two relationships between R2 and rxy:

1. r²xy = R²

2. R² can also be computed as the square of the sample correlation coefficient
between yi and ŷi = b1 + b2 xi
Goodness-of-fit. Some Key Points
• In general, 0 ≤ R² ≤ 1.
• R² = 1 means the model explains the data well: the regression line approximates the real
data points closely, and given the value of one variable one can perfectly predict the value
of the other variable.
• R² = 0 means the model does not explain anything: knowing one variable does not help
you predict the other variable.
• In turn, the higher the R² value, the stronger the linear association between the two
variables.
• There is no general rule to say that an R2 is high or low. This depends upon the
particular context.
• R² will increase if the number of regressors is increased, even if the added regressors
have no explanatory power
→ a drawback of R²
Goodness-of-fit. Some Key Points

• In Table 2.1, the R² of 3.2% means that only approximately 3.2% of the variation in individual wages can
be attributed to gender differences → a low R²

• In other words, gender difference “explains” only 3.2% of the variance of wages.

• Apparently, many other observable and unobservable factors affect a person’s wage beside gender.

• This does not necessarily imply that the estimation in Table 2.1 is incorrect or useless; rather,
amendments to the model are required.

• R2s of 0 or 1 are suspicious.

• R²s cannot be compared across models with different dependent variables y.


Estimating the Parameters of the Multiple Regression Model

Table 5.2: Least Squares Estimates for the Sales Equation for Big Andy's Burger Barn
(least squares estimates using the hamburger chain data)

• Interpretation: let us look at another example, where we estimate the relationship between a firm's sales
revenue and its price and advertising expenditure.

• 44.8% of the variation in sales revenue is explained by the variation in price and by the variation in
the level of advertising expenditure.

• In our sample, 55.2% of the variation in revenue is left unexplained and is due to variation in the error
term or to variation in other variables that implicitly form part of the error term.
Goodness-of-fit. adjusted R2
• R² will never decrease if a variable is added. Therefore we define the adjusted R² as

  adjusted R² = 1 − [ Σi ei² / (N − K) ] / [ Σi (yi − ȳ)² / (N − 1) ]

(this has a penalty for larger K, where K is the number of regressors)
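A brief illustrative sketch (simulated data, assumed values) of the definition above: adding an irrelevant regressor never lowers R², but the adjusted R² typically falls because of the penalty for larger K.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X (X includes the constant)."""
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    RSS = e @ e
    TSS = np.sum((y - y.mean()) ** 2)
    r2 = 1 - RSS / TSS
    adj = 1 - (RSS / (N - K)) / (TSS / (N - 1))
    return r2, adj

rng = np.random.default_rng(2)
N = 50
x = rng.normal(size=N)
junk = rng.normal(size=N)                    # irrelevant regressor, unrelated to y
y = 1.0 + 0.5 * x + rng.normal(size=N)

X1 = np.column_stack([np.ones(N), x])        # K = 2
X2 = np.column_stack([np.ones(N), x, junk])  # K = 3, adds the irrelevant regressor

print(r2_and_adjusted(y, X1))
print(r2_and_adjusted(y, X2))   # R^2 is never lower; adjusted R^2 is typically lower
```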


Tests based on OLS estimates

Provided the Gauss-Markov assumptions hold, OLS is the best (most efficient) linear unbiased
estimation technique.
Often, economic theory implies certain restrictions upon coefficients, for example
βk = 0.
We then have to test whether an estimated non-zero value simply occurred by chance.
We can check whether our estimates deviate 'significantly' from these restrictions by
means of a statistical test.
If they do, we reject the null hypothesis that the restrictions are true.
Tests based on OLS estimates
• To perform a test, we need a test statistic
• A test statistic is something we can compute from our sample and has a known
distribution under the assumption that the null hypothesis is true.
• Next, we must decide if the computed value is likely to come from this
distribution, which indicates that the null hypothesis is likely to hold, or
not.
• The most common test is the 𝑡-test. It can be used to test a single
restriction.
• Suppose the null hypothesis is βk = q for some given value q. Then, consider the
test statistic  t = (bk − q) / se(bk)
• If the null hypothesis is true, and the assumptions (A2) and (A5) hold then 𝑡 has a
𝑡-distribution with 𝑁 − 𝐾 degrees of freedom.
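As an illustration of the mechanics, the sketch below computes the t statistic, the two-sided 5% critical value and the p-value with SciPy, using the rounded wage-example numbers quoted earlier ($1.17 and $0.11); because of rounding, the resulting t differs slightly from the exact value reported later in the lecture.

```python
from scipy import stats

# Rounded numbers quoted earlier for the wage example: b2 ~ 1.17, se(b2) ~ 0.11
b_k, se_k = 1.17, 0.11
N, K = 3294, 2
q = 0.0                                    # value of beta_k under the null hypothesis

t = (b_k - q) / se_k                       # test statistic
crit = stats.t.ppf(0.975, df=N - K)        # two-sided critical value at the 5% level (about 1.96)
p_value = 2 * stats.t.sf(abs(t), df=N - K)

print(round(t, 2), round(crit, 2), p_value)
print("reject H0" if abs(t) > crit else "do not reject H0")
```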

Tests involving one parameter
• Suppose we want to test the null hypothesis (H0) that the true coefficient is
βk = 0
• We will reject the null hypothesis if the absolute value of 𝑡 (the 𝑡- ratio) is
'too large'.
• (critical values: 1.64 at the 10%, 1.96 at the 5% and 2.58 at the 1% significance level)
• If we want to test with 95% confidence, we reject the null hypothesis if the
absolute value of 𝑡 is larger than 1.96.
• The ratio t = bk / se(bk) is the t-value (or t-ratio) and is routinely supplied by any regression
package, for the null hypothesis that βk = 0
• When the null is rejected, it is said that "bk differs significantly from zero" or "the
corresponding x variable has a statistically significant impact on y" or simply "the
x variable is statistically significant"
• It is important to also look at the magnitude of beta (economic significance)
Hypothesis Testing: Tests involving one parameter
We need to ask whether the data provide any evidence to suggest that y
is related to each of the explanatory variables
• Testing this null hypothesis is sometimes called a test of significance for the
explanatory variable xk

• Null hypothesis:  H0 : βk = 0

• Alternative hypothesis:  H1 : βk ≠ 0

• Test statistic:  t = bk / se(bk)  ~  t(N − K)

• Critical t values for a test with level of significance α:

  tc = t(1−α/2, N−K)   and   −tc = t(α/2, N−K)
Table 2.1 OLS estimates wage equation
Do males earn more than females?

• We would like to test the null hypothesis H0 : β2 = 0


• Our test statistic is  t = b2 / se(b2) = 1.1661 / 0.1122 = 10.38
• Since this is much larger than 1.96 or 2.58, we reject the null
hypothesis (at 5% or 1%) that the average wage rate (in the
population) is identical for males and females.
• Note that R² = 0.0317; the simple model only explains 3.2%
of the differences in individual wages.
Why?
• Wage differentials between males and females may be
explainable by other factors (such as education or experience).
Tests involving more parameters
• A standard test that is often supplied is a test for the joint hypothesis that all
coefficients (except the intercept β1) are equal to zero

• Suppose we want to test whether J coefficients are jointly equal to zero, e.g.
H0 : β2 = β3 = … = 0 (J restrictions in total)

• The alternative is that one or more of the restrictions under the null hypothesis
does not hold
• The easiest way to obtain a test statistic for this is to estimate the model twice:
– one without the restrictions (the full model)
– second with the restrictions imposed by omitting the corresponding 𝑥 variables
(because the corresponding bs are zero).

Tests involving more parameters

• Let the R²s of the two models be given by R1² and R0²: the unrestricted and restricted
goodness-of-fit, respectively.
– Note that R1² ≥ R0²
• The restrictions are unlikely to be valid if the difference between the two R²s is
'large'.
• The test can be interpreted as testing whether the increase in R² moving from the
restricted model to the more general model is significant.
• A test statistic can be computed as  F = [ (R1² − R0²) / J ] / [ (1 − R1²) / (N − K) ]

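A minimal sketch of this computation, assuming the two R² values are already available; the helper function and the numbers in the example call are hypothetical, and SciPy is used only for the critical value and p-value.

```python
from scipy import stats

def f_test(R2_unrestricted, R2_restricted, J, N, K, alpha=0.05):
    """F statistic for J restrictions, with its critical value and p-value from F(J, N - K)."""
    F = ((R2_unrestricted - R2_restricted) / J) / ((1 - R2_unrestricted) / (N - K))
    crit = stats.f.ppf(1 - alpha, J, N - K)
    p = stats.f.sf(F, J, N - K)
    return F, crit, p

# Hypothetical example values, purely for illustration
print(f_test(R2_unrestricted=0.40, R2_restricted=0.35, J=2, N=100, K=5))
```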
Tests involving more parameters

• Under the null hypothesis (and assumptions A2 and A5), 𝐹


has an 𝐹-distribution with 𝐽 and 𝑁 − 𝐾 degrees of freedom.

– 𝐽 = the number of restricted coefficients;


– 𝑁 = the number of observations;
– 𝐾 = the total number of coefficients

• We reject H0 if F is too large; for example, with J = 3 and
N − K = 60 we reject if F > 2.76 at the 5% significance level (from
an F-distribution table)
Tests involving more parameters

• A special case of the F-test is to test the significance of all
the regressors, when J = K − 1 and β2 = β3 = … = βK = 0

• Since in this restricted model R0² is zero by construction,
the test statistic can be written as

  F = [ R² / (K − 1) ] / [ (1 − R²) / (N − K) ]

• This is the 𝐹-statistic reported in standard regression


output.
Wage example again
• Consider the more general model
  wagei = β1 + β2 malei + β3 schooli + β4 experi + εi
• Now β2 measures the difference in expected wage between a
male and a female, holding constant (or controlling for)
schooling and experience
– This means male and female with the same schooling and
experience; this is a ceteris paribus (other things equal) condition.
• The method of OLS can be used to estimate the other
coefficients in this multiple regression model as well.
– The key idea is that the coefficients are estimated by minimizing
the sum of squared residuals, just like in the single regression
model.
Wage example again

• The expected wage differential is $1.34, highly significant with s.e.


$0.11.
• The estimated wage increase from one additional year of schooling,
keeping years of experience fixed, is $0.64.
• The null hypothesis for experience is rejected as well.
Wage example again
• The F-statistic (joint test for the "overall" regression) is 167.63, the
appropriate 5% critical value being 2.60.
• Compared with Table 2.1, the R² increased from 0.0317 to 0.1326,
meaning the current model is able to explain 13.3% of the within-
sample variation in wages.
• Joint test on the hypothesis that the two additional variables,
schooling and experience, both have zero coefficients:
– 𝐹-statistic = 191.35
– 5% critical value = 3.00, So, the null is rejected
– Therefore, the model that includes gender, schooling and experience
performs significantly better than the model that only includes gender.
".$%!&'"."%$( $'".$%!&
• / = 191.35
! %!)*'* 43
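A one-line check of this arithmetic (numbers taken from the wage example above):

```python
# F = [(R1^2 - R0^2)/J] / [(1 - R1^2)/(N - K)], with J = 2, N = 3294, K = 4
F = ((0.1326 - 0.0317) / 2) / ((1 - 0.1326) / (3294 - 4))
print(round(F, 2))   # 191.35
```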
Size, power, and p-values
• Type I error: we reject the null hypothesis, while it is
actually true.
– The probability of a type I error (the size, or significance level α, of the test)
is directly controllable by the researcher by choosing the confidence level.
– For example: a confidence level of 95% corresponds with a
size of 5%.

• Type II error: we do not reject the null hypothesis while


it is false.
– The probability of not making a type II error is called the
power of a test.
Size, power, and p-values

• By reducing the size of a test to e.g. 1%, the probability of


rejecting the null hypothesis will decrease, even if it is false.

• Thus, a lower probability of a type I error will imply a higher


probability of a type II error. (There is a trade-off between
the two error types.)
• In general, larger samples imply better power properties.
• Accordingly, in large samples we may prefer to work with a
size of 1% rather than the 'standard' 5%.
Size, power, and p-values

• The p-value denotes the marginal significance level: the smallest significance level
at which the null hypothesis would be rejected.

• If a p-value is smaller than the size 𝛼 (e.g. 0.05) we reject the null
hypothesis.

• Many modern software packages provide p-values with their tests.

• This allows you to perform the test without checking tables of critical
values.
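For instance, a p-value could be obtained as in the illustrative snippet below, using the wage example's t statistic of 10.38 and N − K = 3294 − 2 degrees of freedom:

```python
from scipy import stats

# Two-sided p-value for the wage example: t = 10.38 with N - K = 3294 - 2 degrees of freedom
p = 2 * stats.t.sf(10.38, df=3294 - 2)
print(p)             # far below 0.05
print(p < 0.05)      # True: the null hypothesis is rejected at the 5% level
```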

Next week we will extend this topic and look at:

Marginal Effects
Asymptotic properties of OLS
