Main aim: explaining earnings as a function of years of schooling and years of experience.
Data from the Spanish version of the Wage Structure Survey of 2010.
Observations (n = 12259) on regularly employed salaried workers in the Industry, Services
and Construction sectors.
Available variables:
- yi = monthly earnings of individual i
- si = years of completed schooling of individual i
- agei = age of individual i (in years)
- malei = 1 if individual i is male, 0 if female
- expi = years of (potential) labour market experience of individual i (expi = agei – si – 6)
𝑦𝑖 = 𝛼 + 𝛽1𝑠𝑖 + 𝛽2𝑒𝑥𝑝𝑖 + 𝑢𝑖
Interpretation:
- Earnings with 0 years of experience and 0 years of schooling are equal to 26.1€ (𝛼̂)
- One additional year of schooling increases earnings by 78.8€ (𝛽̂1 )
- One additional year of experience increases earnings by 17.4€ (𝛽̂2 )
- Considering only schooling and experience as determinants of earnings, we are able to explain
about 12.6% of the total variation in earnings (R2 = 0.1256) using a linear model
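As an illustration of this specification (outside GRETL), the regression can be sketched in Python on simulated data. Everything below is illustrative only: the data are generated, not the actual survey microdata, with the data-generating coefficients taken from the estimates above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12259
s = rng.integers(6, 21, size=n)          # years of schooling (simulated)
age = rng.integers(18, 65, size=n)       # age in years (simulated)
exp_ = np.clip(age - s - 6, 0, None)     # potential experience: age - s - 6
y = 26.1 + 78.8 * s + 17.4 * exp_ + rng.normal(0, 300, size=n)

X = np.column_stack([np.ones(n), s, exp_])        # constant, s_i, exp_i
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
print(beta_hat)  # close to [26.1, 78.8, 17.4]
```

With this many observations the OLS estimates land very close to the values used to generate the data, which mirrors the interpretation of the estimated coefficients above.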
Alternative specification using a log-linear model (l_earnings_i = ln(earnings_i)):
Interpretation:
- Logged earnings with 0 years of experience and 0 years of schooling are equal to 6.1 (𝛼̂)
- One additional year of schooling increases earnings by 5.6% (𝛽̂1 )
- One additional year of experience increases earnings by 1.3% (𝛽̂2)
- Considering only schooling and experience as determinants of the log of earnings, we are able to
explain about 16.4% of the total variation in log earnings (R2 = 0.1639); notice that this does not
mean that the log-linear model fits the data better, since the R2 values from the two models are
not comparable (the dependent variable is different).
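Note that in a log-linear model the coefficient is only an approximate percentage effect; the exact effect of one more year of schooling is 100·(exp(β̂1) − 1). A quick check with the reported estimate:

```python
import math

b1 = 0.0564715  # schooling coefficient reported in the text
approx = 100 * b1                  # approximate effect: 5.6%
exact = 100 * (math.exp(b1) - 1)   # exact effect: about 5.8%
print(round(approx, 1), round(exact, 1))
```

For small coefficients the two are very close, which is why the 5.6% reading is the one usually quoted.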
H0: β1 = 0
H1: β1 ≠ 0

t-statistic(β̂1) = β̂1 / s.e.(β̂1) = 0.0564715 / 0.00118102 = 47.82 > 1.96 (= t_{12259−3; 0.025}) ⇒ reject H0
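The arithmetic of the t test can be verified directly from the reported estimate and standard error:

```python
b1, se1 = 0.0564715, 0.00118102  # values reported in the text
t = b1 / se1
print(round(t, 2))  # 47.82 -> far above 1.96, so H0: beta1 = 0 is rejected
```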
Notice that tests of this kind can be implemented directly in the program, by selecting test linear
restriction from the estimation output window and typing the restriction (b[2] stands for the
coefficient of the second regressor); for example, to test H0: β1 = 0.06:
Restriction:
b[schooling] = 0.06
Restricted estimates:
Notice also that the F-statistic is equal to the squared value of the t-statistic corresponding to the
same null and alternative hypothesis (this is always true when the statistic refers to a single
coefficient).
For years of schooling, considering t_df = t_{n−k} = t_{12259−3} ≈ 1.96 (for a 95% confidence interval).
This means that the population (i.e. true) value of the schooling coefficient lies between these two
values with 95% confidence (if the OLS assumptions are satisfied).
The CIs for regression coefficients actually provide information similar to what is obtained from
statistical tests on a single coefficient. For example, the fact that the value 0.06 lies outside the
confidence interval of the schooling coefficient indicates that, using a significance level of 5%
(which yields a 95% CI), we would reject the null hypothesis H0: β1 = 0.06, as we did using the
standard t-statistic. Similarly, the value 0 is not included in the CI, so H0: β1 = 0 is rejected,
whereas a value inside the CI, such as 0.055, would not be rejected.
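The 95% CI for the schooling coefficient can be rebuilt from the reported estimate and standard error, and the test decisions discussed above can be read off directly:

```python
b1, se1 = 0.0564715, 0.00118102  # values reported in the text
lo, hi = b1 - 1.96 * se1, b1 + 1.96 * se1
print(round(lo, 4), round(hi, 4))  # 0.0542 0.0588
print(lo <= 0.06 <= hi)    # False -> reject H0: beta1 = 0.06
print(lo <= 0.0 <= hi)     # False -> reject H0: beta1 = 0
print(lo <= 0.055 <= hi)   # True  -> do not reject H0: beta1 = 0.055
```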
To obtain the confidence intervals for all the coefficients of the estimated regression directly, you
can use the corresponding program option:
- Example: the coefficient for years of schooling is twice the coefficient for years of experience

H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

t = (β̂1 − 2β̂2) / s.e.(β̂1 − 2β̂2) = (β̂1 − 2β̂2) / √(Var(β̂1) + 2²·Var(β̂2) − 2·2·Cov(β̂1, β̂2))

We need Cov(β̂1, β̂2), which can be retrieved from the variance-covariance matrix of the
coefficients:
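The variance and covariance values themselves are not reproduced here, so the inputs in the following sketch are purely illustrative; only the formula matches the derivation above.

```python
import math

def t_linear_combo(b1, b2, var1, var2, cov12):
    # t statistic for H0: beta1 - 2*beta2 = 0
    theta = b1 - 2 * b2
    se = math.sqrt(var1 + 2**2 * var2 - 2 * 2 * cov12)
    return theta / se

# Purely illustrative inputs (the survey's Var/Cov values are not shown here):
t = t_linear_combo(0.0565, 0.0131, 1.39e-6, 1.1e-7, 2.0e-8)
print(t > 1.96)  # compare with the two-sided 5% critical value
```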
H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

Considering θ̂ = β̂1 − 2β̂2, the same null and alternative hypotheses can be formulated as a single
hypothesis on θ, that is:

H0: θ = 0
H1: θ ≠ 0
The same restriction can also be tested using an F test:

H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

ln(y_i) = α̂ + β̂1·s_i + β̂2·exp_i + û_i ⇒ unrestricted model ⇒ SSR_UR

⇒ F = [(SSR_R − SSR_UR)/q] / [SSR_UR/(n − (k+1))]
In order to get the SSR of the unrestricted model, go back to Model 2 (SSR_UR = 3172.579). The
restricted model is the model estimated using only z_i = exp_i + 2s_i as explanatory variable, since
this is the way in which we can impose the restriction that β1 = 2β2 (i.e. the null hypothesis we
want to test), that is:
Model 4: OLS, using observations 1-12259
Dependent variable: l_earnings
Coefficient Std. Error t-ratio p-value
const 6.32792 0.0189004 334.8028 <0.0001 ***
z 0.01561 0.000404778 38.5644 <0.0001 ***
Notice that in this case q = 1, since we are constraining one coefficient to be equal to 2 times another.
Moreover, the number of parameters (k+1) always refers to the unrestricted model, which includes 2
explanatory variables plus the constant. The corresponding p-value is equal to 0.0000001, which is
lower than 0.05, so we reject the null hypothesis at any conventional significance level.
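The F formula above can be wrapped as a small helper; the numbers in the sanity check are simple made-up values, not the survey's, chosen so the result is easy to verify by hand.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, n, k):
    # F statistic from restricted and unrestricted SSR; k = number of slopes
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - (k + 1)))

# Sanity check with simple made-up numbers (not the survey's values):
f = f_stat_ssr(120.0, 100.0, 2, 103, 2)
print(f)  # (20/2) / (100/100) = 10.0
```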
The same F test can be easily implemented using the GRETL command “test linear restrictions”
(remember to do it from the original unrestricted model 2):
Restriction:
b[schooling] - 2*b[potexper] = 0
Restricted estimates:
- How can we test the null hypothesis that schooling and experience are not relevant in explaining
earnings?
H0: β1 = β2 = 0
H1: β1 ≠ 0 and/or β2 ≠ 0 (i.e. at least one of the two coefficients differs from zero)

ln(y_i) = β̂0 + β̂1·s_i + β̂2·exp_i + û_i ⇒ unrestricted model ⇒ SSR_UR = 3172.579 (the same as before)
- The restricted model is now a regression of logged earnings against a constant only:
Notice that now q = 2, since we impose two restrictions (each of the two coefficients equal to
zero), rather than one coefficient being equal to another.
With an F statistic equal to 1201.677 and an associated p-value lower than 0.05 and 0.01, we reject
the null hypothesis that the coefficients of schooling and experience are jointly (i.e. simultaneously)
equal to zero. This means that the two variables are jointly significant.
Notice that the test for joint significance of all the explanatory variables included in the model is
automatically provided after any estimation (see Model 2).
The same test can also be constructed using the R2 from the unrestricted and restricted models. The
general formula is:

F = [(R2_UR − R2_R)/q] / [(1 − R2_UR)/(n − (k+1))]

where R2_UR is the R-squared from the unrestricted model and R2_R is the R-squared from the
restricted model.
In the case of testing the joint significance of all the variables included in the model, the above
formula simplifies to:
F = [R2/q] / [(1 − R2)/(n − (k+1))]
Where the R-squared is the one obtained from the unrestricted model (the R-squared of a model that
includes only a constant is equal to zero).
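As a quick check, the joint-significance F statistic can be recomputed from the reported R2; the small gap with respect to the reported 1201.677 comes from the R2 being rounded to four digits.

```python
def f_joint_significance(r2_ur, q, n, k):
    # Restricted model has only a constant, so its R-squared is zero
    return (r2_ur / q) / ((1 - r2_ur) / (n - (k + 1)))

f = f_joint_significance(0.1639, 2, 12259, 2)
print(round(f, 1))  # about 1201.3, close to the reported 1201.677
```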
Little exercises:
a) Try to check if you are able to obtain the same results using the R-squared formula!
b) Try to check that the F-statistic is equal to the squared value of the t-statistic when it involves a
single hypothesis (H0: β1 = 0).
F-Test for global differences in the coefficients by subsamples (also called Chow Test)
The F test can also be used to check whether all the coefficients of the regression model differ
across subsamples (i.e. different groups defined by observable characteristics in cross-section
data, or different sub-periods in time-series data). This is equivalent to asking whether the effect
of the explanatory variables on the dependent variable differs by subsample.
Considering the example of the wage regression, we could investigate whether the effect of
schooling and experience is different for males and females:
In this case, estimating separate regressions for males and females allows the effect of schooling and
experience, as well as the intercept, to be different according to gender.
It is possible to test for the statistical significance of the difference in the coefficients by considering
the following null and alternative hypothesis:
H0: α_M = α_F; β1_M = β1_F; β2_M = β2_F
H1: at least one of the above equalities does not hold
The above hypothesis can be tested using an F test, in which the model estimated for the whole
sample represents the RESTRICTED MODEL (in which the parameters are assumed to be the
same for males and females), and the model estimated separately by gender represents the
UNRESTRICTED MODEL.
- In order to perform the test, the following steps have to be executed:
1) Estimate the model for the whole sample and compute the SSR (which will be the SSR of the
restricted model):
Notice that now the original model (Model 2) represents the Restricted Model, which assumes that all
the coefficients are the same for males and females.
2) Estimate the following equations and compute the SSR for the two groups (which will be the SSR
of the unrestricted model):
- For males: ln(y_i) = α̂_M + β̂1_M·s_i + β̂2_M·exp_i + û_i_M → SSR_UR_M = 1516.893
- Estimation obtained from the males subsample:
F > F_{3,12253; 0.05} ⇒ reject H0
Notice that now the number of restrictions (q) is equal to the number of coefficients to be estimated
in each model (k), while the number of parameters (which enters the denominator of the expression)
refers to the number of coefficients to be estimated in the unrestricted model (i.e. the sum of the
coefficients for males and for females, so 2k).
The same test can be easily implemented using the following GRETL options (from Model 2):
With these results, we conclude that the null hypothesis that the coefficients of the earnings
regression are the same for males and females is soundly rejected, which means that the slope and
intercept coefficients that relate schooling and experience to earnings are different by gender.
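The Chow test mechanics can be sketched as a helper function. The pooled SSR (3172.579) and the males' SSR (1516.893) come from the text; the females' SSR used below is a placeholder, NOT the survey's actual value, so the resulting F is illustrative only.

```python
def chow_f(ssr_pooled, ssr_m, ssr_f, n, k):
    # Chow test: q = k+1 restrictions, denominator df = n - 2*(k+1)
    ssr_ur = ssr_m + ssr_f
    q = k + 1
    return ((ssr_pooled - ssr_ur) / q) / (ssr_ur / (n - 2 * (k + 1)))

# SSR_R = 3172.579 (pooled Model 2) and SSR_M = 1516.893 come from the text;
# the females' SSR (1500.0) is a placeholder, not the survey's actual value.
f = chow_f(3172.579, 1516.893, 1500.0, 12259, 2)
print(f > 2.61)  # 2.61 ~ 5% critical value of F(3, 12253) -> reject H0
```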