Main aim: explaining earnings as a function of years of schooling and years of experience.
Data from the Spanish version of the Wage Structure Survey of 2010.
Observations (n = 12259) on regularly employed salaried workers in the Industry, Services
and Construction sectors.
Available variables:
- yi = monthly earnings of individual i
- si = years of completed schooling of individual i
- agei = age of individual i (in years)
- malei = 1 if individual i is male, 0 if female
- expi = years of (potential) labour market experience of individual i (expi = agei – si – 6)
𝑦𝑖 = 𝛼 + 𝛽1𝑠𝑖 + 𝛽2𝑒𝑥𝑝𝑖 + 𝑢𝑖
Interpretation:
- Earnings with 0 years of experience and 0 years of schooling are equal to 26.1€ (𝛼̂)
- One additional year of schooling increases earnings by 78.8€ (𝛽̂1 )
- One additional year of experience increases earnings by 17.4€ (𝛽̂2 )
- Considering only schooling and experience as determinants of earnings, we are able to explain
about 12.6% of the total variation in earnings (R2 = 0.1256) using a linear model
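As an illustration of this specification (outside GRETL), the regression can be sketched in Python on simulated data. Everything below is illustrative only: the data are generated, not the actual survey microdata, with the data-generating coefficients taken from the estimates above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12259
s = rng.integers(6, 21, size=n)          # years of schooling (simulated)
age = rng.integers(18, 65, size=n)       # age in years (simulated)
exp_ = np.clip(age - s - 6, 0, None)     # potential experience: age - s - 6
y = 26.1 + 78.8 * s + 17.4 * exp_ + rng.normal(0, 300, size=n)

X = np.column_stack([np.ones(n), s, exp_])        # constant, s_i, exp_i
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates
print(beta_hat)  # close to [26.1, 78.8, 17.4]
```

With this many observations the OLS estimates land very close to the values used to generate the data, which mirrors the interpretation of the estimated coefficients above.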
Alternative specification using a log-linear model (l_earnings_i = ln(earnings_i)):
Interpretation:
- Logged earnings with 0 years of experience and 0 years of schooling are equal to 6.1 (𝛼̂)
- One additional year of schooling increases earnings by 5.6% (𝛽̂1 )
- One additional year of experience increases earnings by 1.3% (𝛽̂2)
- Considering only schooling and experience as determinants of the log of earnings, we are able to
explain about 16.4% of the total variation in log earnings (R2 = 0.1639); notice that this does not
mean that the log-linear model fits the data better, since the R2 values from the two models are
not comparable (the dependent variable is different).
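Note that in a log-linear model the coefficient is only an approximate percentage effect; the exact effect of one more year of schooling is 100·(exp(β̂1) − 1). A quick check with the reported estimate:

```python
import math

b1 = 0.0564715  # schooling coefficient reported in the text
approx = 100 * b1                  # approximate effect: 5.6%
exact = 100 * (math.exp(b1) - 1)   # exact effect: about 5.8%
print(round(approx, 1), round(exact, 1))
```

For small coefficients the two are very close, which is why the 5.6% reading is the one usually quoted.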
H0: β1 = 0
H1: β1 ≠ 0

t-statistic(β̂1) = β̂1 / s.e.(β̂1) = 0.0564715 / 0.00118102 = 47.82 > 1.96 (= t_{12259−3; 0.025}) ⇒ reject H0
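The arithmetic of the t test can be verified directly from the reported estimate and standard error:

```python
b1, se1 = 0.0564715, 0.00118102  # values reported in the text
t = b1 / se1
print(round(t, 2))  # 47.82 -> far above 1.96, so H0: beta1 = 0 is rejected
```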
Notice that tests of this kind can be implemented directly in the program, by selecting test linear
restriction from the estimation output window and typing the restriction (b[2] stands for the
coefficient of the second regressor); for example, to test H0: β1 = 0.06:
Restriction:
b[schooling] = 0.06
Restricted estimates:
Notice also that the F-statistic is equal to the squared value of the t-statistic corresponding to the
same null and alternative hypothesis (this is always true when the statistic refers to a single
coefficient).
For years of schooling, considering t_df = t_{n−k} = t_{12259−3} ≈ 1.96 (for a 95% confidence interval).
This means that the population (i.e. true) value of the schooling coefficient lies between these two
values with 95% confidence (if the OLS assumptions are satisfied).
The CIs for regression coefficients actually provide information similar to what is obtained from
statistical tests on a single coefficient. For example, the fact that the value 0.06 lies outside the
confidence interval of the schooling coefficient indicates that, using a significance level of 5%
(which yields a 95% CI), we would reject the null hypothesis H0: β1 = 0.06, as we did using the
standard t-statistic. Similarly, the value 0 is not included in the CI, so H0: β1 = 0 is rejected,
whereas a value inside the CI, such as 0.055, would not be rejected.
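The 95% CI for the schooling coefficient can be rebuilt from the reported estimate and standard error, and the test decisions discussed above can be read off directly:

```python
b1, se1 = 0.0564715, 0.00118102  # values reported in the text
lo, hi = b1 - 1.96 * se1, b1 + 1.96 * se1
print(round(lo, 4), round(hi, 4))  # 0.0542 0.0588
print(lo <= 0.06 <= hi)    # False -> reject H0: beta1 = 0.06
print(lo <= 0.0 <= hi)     # False -> reject H0: beta1 = 0
print(lo <= 0.055 <= hi)   # True  -> do not reject H0: beta1 = 0.055
```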
To obtain the confidence intervals for all the coefficients of the estimated regression directly, you
can use the corresponding program option:
- Example: the coefficient for years of schooling is twice the coefficient for years of experience

H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

t = (β̂1 − 2β̂2) / s.e.(β̂1 − 2β̂2) = (β̂1 − 2β̂2) / √(Var(β̂1) + 2²·Var(β̂2) − 2·2·Cov(β̂1, β̂2))

We need Cov(β̂1, β̂2), which can be retrieved from the variance-covariance matrix of the
coefficients:
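The variance and covariance values themselves are not reproduced here, so the inputs in the following sketch are purely illustrative; only the formula matches the derivation above.

```python
import math

def t_linear_combo(b1, b2, var1, var2, cov12):
    # t statistic for H0: beta1 - 2*beta2 = 0
    theta = b1 - 2 * b2
    se = math.sqrt(var1 + 2**2 * var2 - 2 * 2 * cov12)
    return theta / se

# Purely illustrative inputs (the survey's Var/Cov values are not shown here):
t = t_linear_combo(0.0565, 0.0131, 1.39e-6, 1.1e-7, 2.0e-8)
print(t > 1.96)  # compare with the two-sided 5% critical value
```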
H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

Considering θ̂ = β̂1 − 2β̂2, the same null and alternative hypotheses can be formulated as a single
hypothesis on θ, that is:

H0: θ = 0
H1: θ ≠ 0
The same restriction can also be tested using an F test:

H0: β1 = 2β2 (≡ β1 − 2β2 = 0)
H1: β1 ≠ 2β2 (≡ β1 − 2β2 ≠ 0)

ln(y_i) = α̂ + β̂1·s_i + β̂2·exp_i + û_i ⇒ unrestricted model ⇒ SSR_UR

⇒ F = [(SSR_R − SSR_UR)/q] / [SSR_UR/(n − (k+1))]
In order to get the SSR of the unrestricted model, go back to Model 2 (SSR_UR = 3172.579). The
restricted model is the model estimated using only z_i = exp_i + 2s_i as explanatory variable, since
this is the way in which we can impose the restriction that β1 = 2β2 (i.e. the null hypothesis we
want to test), that is:
Model 4: OLS, using observations 1-12259
Dependent variable: l_earnings
Coefficient Std. Error t-ratio p-value
const 6.32792 0.0189004 334.8028 <0.0001 ***
z 0.01561 0.000404778 38.5644 <0.0001 ***
Notice that in this case q = 1, since we are constraining one coefficient to be equal to 2 times another.
Moreover, the number of parameters (k+1) always refers to the unrestricted model, which includes 2
explanatory variables plus the constant. The corresponding p-value is equal to 0.0000001, which is
lower than 0.05, so we reject the null hypothesis at any conventional significance level.
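The F formula above can be wrapped as a small helper; the numbers in the sanity check are simple made-up values, not the survey's, chosen so the result is easy to verify by hand.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, n, k):
    # F statistic from restricted and unrestricted SSR; k = number of slopes
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - (k + 1)))

# Sanity check with simple made-up numbers (not the survey's values):
f = f_stat_ssr(120.0, 100.0, 2, 103, 2)
print(f)  # (20/2) / (100/100) = 10.0
```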
The same F test can be easily implemented using the GRETL command “test linear restrictions”
(remember to do it from the original unrestricted model 2):
Restriction:
b[schooling] - 2*b[potexper] = 0
Restricted estimates:
- How can we test the null hypothesis that schooling and experience are not relevant in explaining
earnings?
H0: β1 = β2 = 0
H1: β1 ≠ 0 and/or β2 ≠ 0 (i.e. at least one of the two coefficients differs from zero)

ln(y_i) = β̂0 + β̂1·s_i + β̂2·exp_i + û_i ⇒ unrestricted model ⇒ SSR_UR = 3172.579 (the same as before)
- The restricted model is now a regression of logged earnings against a constant only:
Notice that now q = 2, since we impose two restrictions (each of the two coefficients equal to
zero), rather than one coefficient being equal to another.
With an F statistic equal to 1201.677 and an associated p-value lower than 0.05 and 0.01, we reject
the null hypothesis that the coefficients of schooling and experience are jointly (i.e. simultaneously)
equal to zero. This means that the two variables are jointly significant.
Notice that the test for joint significance of all the explanatory variables included in the model is
automatically provided after any estimation (see Model 2).
The same test can also be constructed using the R2 from the unrestricted and restricted models. The
general formula is:

F = [(R2_UR − R2_R)/q] / [(1 − R2_UR)/(n − (k+1))]

where R2_UR is the R-squared from the unrestricted model and R2_R is the R-squared from the
restricted model.
In the case of testing the joint significance of all the variables included in the model, the above
formula simplifies to:
F = [R2/q] / [(1 − R2)/(n − (k+1))]
Where the R-squared is the one obtained from the unrestricted model (the R-squared of a model that
includes only a constant is equal to zero).
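As a quick check, the joint-significance F statistic can be recomputed from the reported R2; the small gap with respect to the reported 1201.677 comes from the R2 being rounded to four digits.

```python
def f_joint_significance(r2_ur, q, n, k):
    # Restricted model has only a constant, so its R-squared is zero
    return (r2_ur / q) / ((1 - r2_ur) / (n - (k + 1)))

f = f_joint_significance(0.1639, 2, 12259, 2)
print(round(f, 1))  # about 1201.3, close to the reported 1201.677
```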
Little exercises:
a) Try to check if you are able to obtain the same results using the R-squared formula!
b) Try to check that the F-statistic is equal to the squared value of the t-statistic when it involves a
single hypothesis (H0: β1 = 0).
F-Test for global differences in the coefficients by subsamples (also called Chow Test)
The F test can also be used to check whether all the coefficients of the regression model differ
across subsamples (i.e. different groups defined by observable characteristics in cross-section
data, or different sub-periods in time-series data). This is equivalent to asking whether the effect
of the explanatory variables on the dependent variable differs by subsample.
Considering the example of the wage regression, we could investigate whether the effect of
schooling and experience is different for males and females:
In this case, estimating separate regressions for males and females allows the effect of schooling and
experience, as well as the intercept, to be different according to gender.
It is possible to test for the statistical significance of the difference in the coefficients by considering
the following null and alternative hypothesis:
H0: α_M = α_F; β1_M = β1_F; β2_M = β2_F
H1: at least one of the above equalities does not hold
The above hypothesis can be tested using an F test, in which the model estimated for the whole
sample represents the RESTRICTED MODEL (in which the parameters are assumed to be the
same for males and females), and the model estimated separately by gender represents the
UNRESTRICTED MODEL.
- In order to perform the test, the following steps have to be executed:
1) Estimate the model for the whole sample and compute the SSR (which will be the SSR of the
restricted model):
Notice that now the original model (Model 2) represents the Restricted Model, which assumes that all
the coefficients are the same for males and females.
2) Estimate the following equations and compute the SSR for the two groups (which will be the SSR
of the unrestricted model):
- For males: ln(y_i) = α̂_M + β̂1_M·s_i + β̂2_M·exp_i + û_i_M → SSR_UR_M = 1516.893
- Estimation obtained from the males subsample:
F > F_{3,12253; 0.05} ⇒ reject H0
Notice that now the number of restrictions (q) is equal to the number of coefficients to be estimated
in each model (k), while the number of parameters (which enters the denominator of the expression)
refers to the number of coefficients to be estimated in the unrestricted model (i.e. the sum of the
coefficients for males and for females, so 2k).
The same test can be easily implemented using the following GRETL options (from Model 2):
With these results, we conclude that the null hypothesis that the coefficients of the earnings
regression are the same for males and females is soundly rejected, which means that the slope and
intercept coefficients that relate schooling and experience to earnings are different by gender.
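The Chow test mechanics can be sketched as a helper function. The pooled SSR (3172.579) and the males' SSR (1516.893) come from the text; the females' SSR used below is a placeholder, NOT the survey's actual value, so the resulting F is illustrative only.

```python
def chow_f(ssr_pooled, ssr_m, ssr_f, n, k):
    # Chow test: q = k+1 restrictions, denominator df = n - 2*(k+1)
    ssr_ur = ssr_m + ssr_f
    q = k + 1
    return ((ssr_pooled - ssr_ur) / q) / (ssr_ur / (n - 2 * (k + 1)))

# SSR_R = 3172.579 (pooled Model 2) and SSR_M = 1516.893 come from the text;
# the females' SSR (1500.0) is a placeholder, not the survey's actual value.
f = chow_f(3172.579, 1516.893, 1500.0, 12259, 2)
print(f > 2.61)  # 2.61 ~ 5% critical value of F(3, 12253) -> reject H0
```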