
FUNDAMENTALS OF ECONOMETRICS
DR ABDUL WAHEED
PhD Econometrics
Week 3
Lecture 6
Variances and Standard
Errors of OLS Estimators
The OLS estimators, the $\hat{\beta}$s, are random variables: their values vary from sample to sample.
The variability of a random variable is measured by its variance or, equivalently, by its standard deviation (the square root of the variance).
In the regression context, the standard deviation of an estimator is called its standard error. Conceptually it is the same as a standard deviation. It should not be confused with the standard error of the regression (SER), which estimates the standard deviation of the error term.
Variances and Standard Errors of OLS Estimators
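Population:
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$

Sample 1:
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \cdots + \hat{\beta}_k X_k$

Sample 2:
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \cdots + \hat{\beta}_k X_k$

Sample n:
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \cdots + \hat{\beta}_k X_k$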
Variances and Standard
Errors of OLS Estimators
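The variance of the regression error term is estimated from the OLS residuals $e_i$ as:
$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - k}$
That is, the residual sum of squares (RSS) divided by $(n - k)$, which is called the degrees of freedom (df).
Here $\hat{\sigma}^2$ is the estimate of the population error variance $\sigma^2$, $n$ is the number of observations, and $k$ is the number of estimated parameters in the regression model (including the intercept). Its square root, $\hat{\sigma}$, is the standard error of the regression (SER).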
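As a rough check of this formula, the sketch below plugs in the values that the EViews output for the wage regression in this lecture reports (sum of squared residuals, number of observations, number of parameters). The variable names are illustrative only.

```python
import math

# Values copied from the EViews output for the wage regression in this lecture.
rss = 54342.54   # "Sum squared resid"
n = 1289         # "Included observations"
k = 6            # estimated parameters: intercept plus five slope coefficients

df = n - k                    # degrees of freedom
sigma2_hat = rss / df         # estimated error variance
ser = math.sqrt(sigma2_hat)   # standard error of the regression

print(f"df = {df}")
print(f"sigma^2 hat = {sigma2_hat:.4f}")
print(f"SER = {ser:.4f}")
```

The resulting SER should agree, up to rounding, with the "S.E. of regression" line of the output (6.508137).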
Hypothesis Testing: t-test
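Consider the following population regression:
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$
and the estimated model:
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \cdots + \hat{\beta}_k X_{ki}$
Suppose we want to test the hypothesis that a given (population) regression coefficient is zero:
$H_0: \beta_k = 0$ (i.e., $\beta_2 = 0$, $\beta_3 = 0$, ..., each tested individually)
$H_1: \beta_k \neq 0$
The commonly chosen levels of significance (probability values) are 10%, 5%, and 1%.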
Hypothesis Testing: t-test
To test this hypothesis, we use the t statistic:
$t = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)}$
If the absolute value of $t$ is greater than the critical t value (with $n - k$ degrees of freedom at the chosen level of significance), we can reject $H_0$.
Note: There is no need to do this work manually, as statistical packages provide the necessary output. These software packages give not only the estimated t values but also their p (probability) values, which are the exact levels of significance of the t values.
In practice, a low p value (below the chosen level of significance) suggests that the estimated coefficient is statistically significant.
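To make the t ratio concrete, here is a minimal sketch that recomputes it from the coefficients and standard errors reported in the EViews output of this lecture. The critical value of 1.96 is an assumption: it is the large-sample two-sided 5% value, which is a reasonable approximation when $n - k = 1283$.

```python
# Coefficient and standard error pairs copied from the EViews output in this lecture.
estimates = {
    "C": (-7.183338, 1.015788),
    "FEMALE": (-3.074875, 0.364616),
    "NONWHITE": (-1.565313, 0.509188),
    "UNION": (1.095976, 0.506078),
    "EDUCATION": (1.370301, 0.065904),
    "EXPER": (0.166607, 0.016048),
}

# Assumption: large-sample two-sided 5% critical value.
T_CRITICAL_5PCT = 1.96


def t_ratio(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se


results = {}
for name, (coef, se) in estimates.items():
    t = t_ratio(coef, se)
    reject = abs(t) > T_CRITICAL_5PCT   # compare |t| with the critical value
    results[name] = (t, reject)
    print(f"{name:<10} t = {t:8.4f}  reject H0 at 5%: {reject}")
```

The computed ratios should match the "t-Statistic" column of the output up to rounding of the reported standard errors.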
Sum of Squares
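Total Sum of Squares (TSS): squared distance of each observed point from the mean, summed over all observations
$TSS = \sum (Y_i - \bar{Y})^2$

Explained Sum of Squares (ESS, explained variation): squared distance of the corresponding point on the regression line from the mean, summed over all observations
$ESS = \sum (\hat{Y}_i - \bar{Y})^2$

Residual Sum of Squares (RSS, unexplained variation): squared distance from the regression line to the observed point, summed over all observations
$RSS = \sum (Y_i - \hat{Y}_i)^2$

$Y_i$ = observed value, $\hat{Y}_i$ = estimated value from the model, $\bar{Y}$ = mean value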
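The three sums of squares can be illustrated with a small sketch. The five data points below are hypothetical toy values (they are not from the lecture), and the fit is a simple two-variable OLS regression with an intercept, for which TSS = ESS + RSS holds.

```python
# Hypothetical toy data, chosen only to illustrate the decomposition.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for the two-variable model Y = b1 + b2 * X.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Fitted values on the regression line.
y_hat = [b1 + b2 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ess = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained variation
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"ESS + RSS = {ess + rss:.2f}")
```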


The coefficient of determination: $R^2$
The coefficient of determination is an overall measure of goodness of fit of the
estimated regression line.
It gives the proportion or percentage of the total variation in the dependent
variable Y (TSS) that is explained by all the regressors.

$R^2 = \frac{ESS}{TSS}$
$R^2$ lies between 0 and 1, provided there is an intercept term in the model. The closer it is to 1, the better the fit; the closer it is to 0, the worse the fit.
Alternatively,
$R^2 = 1 - \frac{RSS}{TSS}$
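As an illustration, both expressions can be evaluated from the EViews output of the wage regression in this lecture. The output does not report TSS directly, so the sketch recovers it from "S.D. dependent var", assuming that figure is the sample standard deviation computed with an $n - 1$ divisor.

```python
# Values copied from the EViews output in this lecture.
rss = 54342.54    # "Sum squared resid"
n = 1289          # "Included observations"
sd_y = 7.896350   # "S.D. dependent var"

# Assumption: sd_y uses the (n - 1) divisor, so TSS = (n - 1) * sd_y^2.
tss = (n - 1) * sd_y ** 2
ess = tss - rss

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

print(f"TSS = {tss:.2f}")
print(f"R^2 (ESS/TSS)     = {r2_from_ess:.6f}")
print(f"R^2 (1 - RSS/TSS) = {r2_from_rss:.6f}")
```

Both values should be close to the reported R-squared of 0.323339; small differences come from rounding in the reported inputs.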
Adjusted $R^2$ ($\bar{R}^2$)
One disadvantage of $R^2$ is that it is a nondecreasing function of the number of regressors.
That is, as more explanatory variables are added to the model, the value of $R^2$ never falls and generally increases, whether or not the added variables are relevant.
To avoid the temptation of "maximizing" $R^2$, the use of an adjusted $R^2$ is suggested. It is denoted $\bar{R}^2$ and is computed from the (unadjusted) $R^2$ as follows:
$\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - k}$
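A short sketch of the adjustment, using the R-squared, sample size, and number of parameters reported in the EViews output of this lecture. The function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from R-squared, n observations and k estimated parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Values copied from the EViews output in this lecture.
r2 = 0.323339
n = 1289
k = 6

adj = adjusted_r2(r2, n, k)
print(f"Adjusted R^2 = {adj:.6f}")
```

The result should agree with the reported "Adjusted R-squared" of 0.320702. With a sample this large relative to $k$, the adjustment is small.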
Hypothesis Testing: F Test
To test the hypothesis that all the slope coefficients in the model are simultaneously equal to zero:
$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$ (equivalently, $R^2 = 0$)
$H_1$: at least one of $\beta_2, \beta_3, \ldots, \beta_k$ is not zero (equivalently, $R^2 \neq 0$)

Calculate the following statistic and use the F table to obtain the critical F value with $k - 1$ degrees of freedom in the numerator and $n - k$ degrees of freedom in the denominator for a given level of significance:
$F = \frac{ESS/df}{RSS/df} = \frac{ESS/(k - 1)}{RSS/(n - k)} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)}$
If this value is greater than the critical F value, reject $H_0$.
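The $R^2$ form of the F statistic can be sketched as follows, using the values reported in the EViews output of this lecture. The critical value of 2.21 is an assumption: it is the approximate 5% value for 5 numerator degrees of freedom and a very large number of denominator degrees of freedom, and an F table should be consulted for the exact figure.

```python
def f_statistic(r2, n, k):
    """Overall F statistic from R-squared, n observations and k estimated parameters."""
    numerator = r2 / (k - 1)            # explained variation per numerator df
    denominator = (1 - r2) / (n - k)    # unexplained variation per denominator df
    return numerator / denominator


# Values copied from the EViews output in this lecture.
f_value = f_statistic(0.323339, 1289, 6)

# Assumption: approximate 5% critical value for (5, very large) degrees of freedom.
F_CRITICAL_5PCT = 2.21
reject_h0 = f_value > F_CRITICAL_5PCT

print(f"F = {f_value:.4f}, reject H0 at 5%: {reject_h0}")
```

The computed value should be close to the reported "F-statistic" of 122.6149.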
Software

EViews
Case Study
Determinants of hourly wages in the USA
The data come from a survey conducted by the U.S. Census Bureau: a cross-section of 1,289 persons interviewed in March 1995. We use it to study the factors that determine hourly wage (in dollars) in this sample.
The variables used in the analysis are defined as follows:
Wage: Hourly wage in dollars, which is the dependent variable.
The explanatory variables, or regressors, are as follows:
Female: Gender, coded 1 for female, 0 for male
Nonwhite: Race, coded 1 for nonwhite workers, 0 for white workers
Union: Union status, coded 1 if in a union job, 0 otherwise
Education: Education (in years)
Exper: Potential work experience (in years), defined as age minus years of
schooling minus 6. (It is assumed that schooling starts at age 6).
Econometric Model and Estimation

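$Wage_i = \beta_1 + \beta_2 Female_i + \beta_3 Nonwhite_i + \beta_4 Union_i + \beta_5 Education_i + \beta_6 Exper_i + u_i$

We need to estimate $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$ and $\beta_6$ using the OLS method.
Doing this by hand for 1,289 observations is impractical, so we are going to use the EViews software.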
Results
Dependent Variable: WAGE
Method: Least Squares
Date: 02/15/23 Time: 14:20
Sample: 1 1289
Included observations: 1289

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             -7.183338     1.015788     -7.071691     0.0000
FEMALE        -3.074875     0.364616     -8.433184     0.0000
NONWHITE      -1.565313     0.509188     -3.074139     0.0022
UNION          1.095976     0.506078      2.165626     0.0305
EDUCATION      1.370301     0.065904     20.79231      0.0000
EXPER          0.166607     0.016048     10.38205      0.0000

R-squared            0.323339    Mean dependent var      12.36585
Adjusted R-squared   0.320702    S.D. dependent var      7.896350
S.E. of regression   6.508137    Akaike info criterion   6.588627
Sum squared resid    54342.54    Schwarz criterion       6.612653
Log likelihood       -4240.370   Hannan-Quinn criter.    6.597646
F-statistic          122.6149    Durbin-Watson stat      1.897513
Prob(F-statistic)    0.000000
Results
Result 1
Dependent Variable: WAGE
Method: Least Squares
Date: 02/15/23 Time: 14:20
Sample: 1 1289
Included observations: 1289

Basic Information:
✓ Dependent variable
✓ Method used for estimation of the regression model
✓ Date and time of estimation
✓ Size of the sample (total number of observations)
✓ Number of observations included in the model
Results
Result 2
Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             -7.183338     1.015788     -7.071691     0.0000
FEMALE        -3.074875     0.364616     -8.433184     0.0000
NONWHITE      -1.565313     0.509188     -3.074139     0.0022
UNION          1.095976     0.506078      2.165626     0.0305
EDUCATION      1.370301     0.065904     20.79231      0.0000
EXPER          0.166607     0.016048     10.38205      0.0000
Regression model results:
✓ Estimated parameter values
✓ Standard errors, t values, p values
✓ This is the main concern for us at this point in time
✓ What do the different columns represent?
✓ Interpretation of the coefficients
✓ t-statistic values
✓ How are the probability values interpreted?
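To see where the "Prob." column comes from, the sketch below converts a t statistic into a two-sided p value. It is an approximation under a stated assumption: it uses the standard normal distribution instead of the t distribution with $n - k = 1283$ degrees of freedom that the software would use, which changes little at this sample size. The function name is illustrative.

```python
import math


def two_sided_p_normal(t):
    """Two-sided p value for a t statistic, using the standard normal approximation.

    2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


# t statistics copied from the EViews output in this lecture.
p_union = two_sided_p_normal(2.165626)
p_nonwhite = two_sided_p_normal(-3.074139)

print(f"UNION:    p = {p_union:.4f}")
print(f"NONWHITE: p = {p_nonwhite:.4f}")
```

The approximations should land close to the reported values of 0.0305 and 0.0022. A p value below 0.05 means the coefficient is statistically significant at the 5% level.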
Results
Result 3
R-squared            0.323339    Mean dependent var      12.36585
Adjusted R-squared   0.320702    S.D. dependent var      7.896350
S.E. of regression   6.508137    Akaike info criterion   6.588627
Sum squared resid    54342.54    Schwarz criterion       6.612653
Log likelihood       -4240.370   Hannan-Quinn criter.    6.597646
F-statistic          122.6149    Durbin-Watson stat      1.897513
Prob(F-statistic)    0.000000

Additional information:
✓ $R^2$ and adjusted $R^2$
✓ Standard error of the regression
✓ Residual sum of squares
✓ F-statistic and its significance
✓ Mean and S.D. of the dependent variable
✓ AIC, SIC and H-Q criteria values
✓ Durbin-Watson (DW) statistic
Forecasting
Sometimes we may want to use the estimated regression model
for forecasting purposes.
Consider the wage regression we estimated (coefficients rounded from the EViews output):
Predicted wage = -7.18 - 3.07 FEMALE - 1.57 NONWHITE + 1.10 UNION + 1.37 EDUCATION + 0.167 EXPER
Given a person's characteristics and the estimated regression coefficients, we can easily calculate the expected (average) wage of that person.
This is the essence of forecasting at this stage.
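A minimal forecasting sketch, using the coefficients from the EViews output in this lecture. The person described is hypothetical and chosen only for illustration: a white female union member with 12 years of education and 20 years of potential experience.

```python
# Coefficients copied from the EViews output in this lecture.
coefficients = {
    "C": -7.183338,
    "FEMALE": -3.074875,
    "NONWHITE": -1.565313,
    "UNION": 1.095976,
    "EDUCATION": 1.370301,
    "EXPER": 0.166607,
}


def predict_wage(female, nonwhite, union, education, exper):
    """Expected hourly wage (in dollars) for the given characteristics."""
    return (
        coefficients["C"]
        + coefficients["FEMALE"] * female
        + coefficients["NONWHITE"] * nonwhite
        + coefficients["UNION"] * union
        + coefficients["EDUCATION"] * education
        + coefficients["EXPER"] * exper
    )


# Hypothetical person (illustrative values, not from the data set).
wage_hat = predict_wage(female=1, nonwhite=0, union=1, education=12, exper=20)
print(f"Predicted hourly wage: {wage_hat:.2f} dollars")
```

For this hypothetical person the predicted wage is about 10.61 dollars per hour. It is an average for people with these characteristics, not a guarantee for any one individual.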
