Checklist for final exam

1. Verify the suitability of the results (based on the signs of the coefficients)
2. Interpretation of coefficients and R2 (linear and log forms)
3. Confidence interval for a model parameter and for the sum or difference of two parameters
4. Test for individual significance of coefficients (t-test) and test for overall significance (F-test)
5. Restriction tests using t and F statistics (for dropping variables, for constraints)
6. Dummy variables (interpreting the coefficient of a dummy)
7. Building a model with dummy variables
8. Choosing the appropriate model
9. Concepts, consequences, and detection tests for model problems (multicollinearity, serial correlation, heteroscedasticity, functional form, normality)
10. Logistic regression (redo the example in the lecture)
Note: pay attention to the demand function and the Cobb-Douglas production function. Redo all examples given in the lectures.
Example 1:
You are using an econometric model to study the dependence of the annual salaries of
CEOs (Chief Executive Officers) of major private companies on some variables. The
sample data consist of observations for 60 private firms which include the following
variables:
SALi : the annual salary of the CEO of firm i, measured in thousands of dollars;
ARi : the annual total sales revenues of firm i, measured in millions of dollars;
MVi : the market value of firm i, measured in millions of dollars;
EMi : the number of years the CEO has been employed with firm i;
AGEi: the age of the CEO of firm i, in years.
The regression model you propose is:
lnSALi = β0 + β1lnARi + β2lnMVi + β3EMi + β4EMi^2 + β5AGEi + β6AGEi^2 + ui
in which lnXi denotes the natural logarithm of Xi, EMi^2 and AGEi^2 are the squares of the corresponding variables, and ui is the stochastic disturbance.
Using the data, you estimate the following regression models (estimated standard errors in parentheses below the coefficient estimates):
(1) lnSALi (hat) = 5.572 + 0.182 lnARi + 0.102 lnMVi + 0.046 EMi - 0.00122 EMi^2 - 0.042 AGEi + 0.00033 AGEi^2
    se                     (0.0412)      (0.0493)      (0.0142)    (0.000476)      (0.0412)     (0.00036)
    RSS = 42.060; TSS = 64.646
(2) lnSALi (hat) = 4.369 + 0.1646 lnARi + 0.1085 lnMVi + 0.04512 EMi - 0.00121 EMi^2
    RSS = 42.474; TSS = 64.646
1. In model (1) above, interpret the meaning of each estimated coefficient. Does each of the independent variables AR and MV affect the salaries of CEOs?
2. In model (1), how much of the variation in CEO salaries does the model explain? Is it correct to say that all independent variables of model (1) jointly fail to explain the variation in CEO salaries?
3. In model (1), test the hypothesis that the coefficients of AR and MV are equal, given that:

4. State the coefficient restrictions that are imposed on regression equation (1) in estimating model (2) above. Conduct a test of these coefficient restrictions and state the meaning of this test. Based on the outcome of the test, would you choose equation (2) or equation (1)?
5. What are the implications of introducing the squared terms of EM and AGE in model (1)? Present the procedure for using the F-test to test the hypothesis that the two squared terms EM^2 and AGE^2 can be dropped from model (1) (use the form of the population regression model).

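1. In model (1), the estimated coefficients are interpreted as follows, each holding the other variables constant:
• The coefficient of lnARi is 0.182. Salary and revenue are both in logs, so this is an elasticity: a 1% increase in annual sales revenues (AR) is associated with an increase of about 0.182% in CEO salary.
• The coefficient of lnMVi is 0.102: a 1% increase in the market value of the firm (MV) is associated with an increase of about 0.102% in CEO salary.
• The coefficients of EMi and EMi^2 are 0.046 and -0.00122. Salary is in logs and EM is in levels, so one more year of tenure changes salary by approximately 100 * (0.046 - 2 * 0.00122 * EM) percent. At EM = 0 this is about 4.6%; the effect shrinks as tenure grows (diminishing returns) and turns negative beyond EM = 0.046 / (2 * 0.00122), about 18.9 years.
• The coefficients of AGEi and AGEi^2 are -0.042 and 0.00033. One more year of age changes salary by approximately 100 * (-0.042 + 2 * 0.00033 * AGE) percent. The effect is negative at younger ages and turns positive beyond AGE = 0.042 / (2 * 0.00033), about 63.6 years, so the age profile is U-shaped.
• The intercept 5.572 is the predicted value of lnSAL when all regressors are zero; it has no useful economic interpretation.
Whether AR and MV affect salaries is judged with a t-test on each coefficient. Reading the reported standard errors in order (0.0412 for lnAR, 0.0493 for lnMV), t = 0.182 / 0.0412 = 4.42 for lnAR and t = 0.102 / 0.0493 = 2.07 for lnMV. With 60 - 7 = 53 degrees of freedom the 5% two-sided critical value is about 2.006, so both coefficients are statistically significant: AR and MV each have a positive effect on CEO salaries, although the evidence for MV is marginal.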
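The quadratic terms can be made concrete with a short calculation. The sketch below uses only the four coefficients reported for model (1); the helper names and the tenure values 0, 10 and 20 are illustrative choices, not part of the exercise.

```python
# Coefficients of EM, EM^2, AGE and AGE^2 as reported for model (1).
B_EM, B_EM2 = 0.046, -0.00122
B_AGE, B_AGE2 = -0.042, 0.00033


def marginal_effect_pct(b_lin, b_sq, x):
    """Approximate % change in salary for one more unit of x (log-level model)."""
    return 100.0 * (b_lin + 2.0 * b_sq * x)


def turning_point(b_lin, b_sq):
    """Value of x at which the marginal effect b_lin + 2*b_sq*x equals zero."""
    return -b_lin / (2.0 * b_sq)


em_turn = turning_point(B_EM, B_EM2)
age_turn = turning_point(B_AGE, B_AGE2)
print(f"EM turning point:  {em_turn:.1f} years")
print(f"AGE turning point: {age_turn:.1f} years")

# Illustrative tenure values (hypothetical, not taken from the sample).
for years in (0, 10, 20):
    effect = marginal_effect_pct(B_EM, B_EM2, years)
    print(f"EM = {years:2d}: one more year changes salary by about {effect:.2f}%")
```

The sign of each squared term decides the shape: a negative coefficient on EM^2 gives an inverted U (returns to tenure fall and eventually reverse), while a positive coefficient on AGE^2 gives a U.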
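2. The R-squared of model (1) is R2 = 1 - RSS/TSS = 1 - 42.060/64.646 = 0.349, so the model explains about 34.9% of the variation in lnSAL (the log of CEO salary). A non-zero R2 does not by itself show that the regressors matter. The statement that all independent variables jointly fail to explain salaries is the null hypothesis H0: all six slope coefficients are zero, which is tested with the overall F-test: F = (R2/k) / ((1 - R2)/(n - k - 1)) = (0.3494/6) / (0.6506/53) = 4.74, where k = 6 is the number of regressors and n = 60. The 5% critical value of F(6, 53) is about 2.28. Since 4.74 > 2.28, H0 is rejected: it is not correct to say that the independent variables jointly fail to explain the variation in CEO salaries.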
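The R-squared figure, and the overall F statistic that is built from it, can be checked with a few lines of arithmetic. This is a minimal sketch using the RSS, TSS and sample size stated in the exercise; the variable names are my own, and the critical value still has to be read from an F table.

```python
# Figures stated for model (1).
N_OBS = 60        # number of firms
K_SLOPES = 6      # regressors: lnAR, lnMV, EM, EM^2, AGE, AGE^2
RSS, TSS = 42.060, 64.646

r_squared = 1.0 - RSS / TSS

# Overall significance test, H0: all slope coefficients are zero.
df_resid = N_OBS - K_SLOPES - 1
f_overall = (r_squared / K_SLOPES) / ((1.0 - r_squared) / df_resid)

print(f"R-squared = {r_squared:.4f}")
print(f"F({K_SLOPES}, {df_resid}) = {f_overall:.2f}")
```

The statistic is compared with the tabulated F critical value for 6 and 53 degrees of freedom at the chosen significance level.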
3. To test the hypothesis that the coefficients of lnAR and lnMV are equal in model (1), the null hypothesis is that the two coefficients are equal and the alternative is that they are not. One way to test it is an F-test:
F = ((RSSr - RSSu) / q) / (RSSu / (n - k))
where RSSu is the residual sum of squares of the unrestricted model (1), RSSr is the residual sum of squares of the restricted model in which the coefficients of AR and MV are constrained to be equal, q is the number of restrictions (here q = 1), n is the sample size, and k is the number of estimated coefficients in the unrestricted model, including the intercept.
With RSSu = 42.060, RSSr = 42.433, q = 1, n = 60 and k = 7, F = (42.433 - 42.060) / (42.060 / 53) = 0.470. The 5% critical value of F(1, 53) is about 4.02. Since 0.470 < 4.02, we fail to reject the null hypothesis and conclude that there is no significant difference between the coefficients of AR and MV in model (1).
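The same hypothesis can also be written as a t-test on the difference of the two coefficients, which is the usual route when the covariance of the two estimates is supplied. The subscripts AR and MV below are my own labels for the coefficients of lnAR and lnMV. The covariance term is not available in the text here, so the statistic is stated but not evaluated.

```latex
H_0:\ \beta_{AR} = \beta_{MV}
\qquad
t \;=\; \frac{\hat\beta_{AR} - \hat\beta_{MV}}
{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_{AR})
     + \widehat{\mathrm{Var}}(\hat\beta_{MV})
     - 2\,\widehat{\mathrm{Cov}}(\hat\beta_{AR},\hat\beta_{MV})}}
\;\sim\; t_{\,n-k},
\qquad n - k = 60 - 7 = 53
```

With a single restriction the two approaches agree, because the F statistic equals the square of this t statistic.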
4. Model (2) omits AGE and AGE^2, so the restrictions imposed on equation (1) are that the coefficients of AGEi and AGEi^2 are both zero. The null hypothesis is that both coefficients equal zero, meaning that the CEO's age has no effect on salary once the other variables are controlled for; the alternative is that at least one of them is non-zero. The F-statistic is:
F = ((RSSr - RSSu) / q) / (RSSu / (n - k))
where RSSr is the residual sum of squares of the restricted model (2), RSSu is the residual sum of squares of the unrestricted model (1), q is the number of restrictions (here q = 2), n is the sample size, and k is the number of estimated coefficients in the unrestricted model, including the intercept.
Using the values from models (1) and (2), RSSr = 42.474, RSSu = 42.060, q = 2, n = 60 and k = 7, so F = ((42.474 - 42.060) / 2) / (42.060 / 53) = 0.207 / 0.7936 = 0.261. The 5% critical value of F(2, 53) is about 3.17. Since 0.261 < 3.17, we fail to reject the null hypothesis: AGE and AGE^2 are jointly insignificant, so the simpler, more parsimonious model (2) is preferred to model (1).
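The comparison of models (1) and (2) can be reproduced directly from the two reported RSS values. The sketch below assumes, as the regressor lists of the two equations indicate, that model (2) differs from model (1) only by omitting AGE and AGE^2, which gives q = 2 restrictions; the function name is illustrative.

```python
def restricted_f(rss_r, rss_u, q, n, k):
    """F statistic for q linear restrictions.

    rss_r, rss_u: residual sums of squares of the restricted and
    unrestricted models; n: sample size; k: number of coefficients
    in the unrestricted model, including the intercept.
    """
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


# Model (2) is restricted, model (1) is unrestricted.
f_stat = restricted_f(rss_r=42.474, rss_u=42.060, q=2, n=60, k=7)
print(f"F(2, 53) = {f_stat:.3f}")
```

A value this far below any conventional F critical value means the data do not reject dropping the two age terms.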
5. The squared terms of EM and AGE in model (1) capture non-linear relationships between these variables and (log) CEO salary: the effect of one more year of tenure, or of age, is allowed to depend on the current level of the variable rather than being constant. With the estimated signs, salary rises with tenure at a decreasing rate and falls with age at a decreasing rate.
To test the hypothesis that the squared terms EM^2 and AGE^2 can be dropped from model (1), we use an F-test, stated in terms of the population regression model.
Unrestricted model: lnSALi = β0 + β1lnARi + β2lnMVi + β3EMi + β4EMi^2 + β5AGEi + β6AGEi^2 + ui
Hypotheses: H0: β4 = β6 = 0 against H1: at least one of β4, β6 is non-zero.
Restricted model (under H0): lnSALi = β0 + β1lnARi + β2lnMVi + β3EMi + β5AGEi + ui
Estimate both models by OLS and compute:
F = [(RSSr - RSSu)/q] / [RSSu/(n - k - 1)]
where RSSr is the residual sum of squares of the restricted model that excludes the squared terms, RSSu is the residual sum of squares of the unrestricted model (1), q = 2 is the number of restrictions, n = 60, and k = 6 is the number of regressors in the unrestricted model, so n - k - 1 = 53.
Under the null hypothesis, the F-statistic follows an F-distribution with q = 2 and n - k - 1 = 53 degrees of freedom. Reject H0 if F exceeds the critical value (about 3.17 at the 5% level) or, equivalently, if the p-value is below the chosen significance level.
If H0 is rejected, at least one of the squared terms is significant and the squared terms are kept in the model. If H0 is not rejected, the squared terms are jointly insignificant and can be dropped.
The RSS of this restricted model is not among the results given, so only the procedure can be stated here.
Example 2:
You want to study the dependence of beer expenditures of employees in a company on
their incomes, ages and sexes. You have collected a random sample of observations on
40 office employees, 20 of whom are females and 20 of whom are males. Here is the
description of variables in the data set:
BEi : the annual beer expenditures of employee i, measured in dollars per year;
INCi : the annual income of employee i, in thousands of dollars per year;
AGEi : the age of employee i, in years;
SEXi : the dummy variable, SEXi = 1 if employee i is female and SEXi = 0 if
employee i is male.
You propose the following model (model (1)):
BEi = β0 + β1INCi + β2AGEi + β3SEXi + β4(SEXi*INCi) + β5(SEXi*AGEi) + ui
Using the OLS method in EViews, you obtain the following results:

Result (1)
Dependent variable: BE
Included observations: 40

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            489.8631       73.85524      6.632747       0.0000
INC          0.002893       0.000775      3.734180       0.0007
AGE          -10.07924      2.229676      -4.520493      0.0001
SEX          -265.8574      113.3658      -2.345129      0.0250
SEX*INC      -0.001029      0.000971      -1.059491      0.2968
SEX*AGE      4.231494       3.648383      1.159827       0.2542

R-squared    0.6470


Result (2)
BEi = 459.21+ 0.0023 INCi - 8.42 AGEi -169.87 SEXi R2=0.6294
Result (3)
BEi = 342.88+ 0.00238 INCi - 7.575 AGEi R2= 0.3292
1. Write down the sample regression model of model (1) based on result (1). Write down the population regression model and the sample regression model for male and for female employees, and explain the meaning of the estimated regression coefficients.
2. Using result (1), how do the beer expenditures of male employees change if their income increases by 1000 USD/year? Answer the same question for female employees, given that:
3. In model (1), state the null and alternative hypotheses for testing that the beer expenditure models for males and females do not differ in the slope coefficients of both INC and AGE. In other words, you want to conduct a joint test of equal INC slopes and equal AGE slopes for males and females. Perform this test using the appropriate information given above.
4. Use the results above to test the hypothesis that the variable SEX does not affect annual beer expenditures.
5. The Durbin-Watson d statistic is 1.92. Use this value to test for the problem that may exist in the model.

1. Sample regression model based on result (1):
BEi (hat) = 489.8631 + 0.002893 INCi - 10.07924 AGEi - 265.8574 SEXi - 0.001029 (SEXi*INCi) + 4.231494 (SEXi*AGEi)
Population regression model for male employees (SEXi = 0): BEi = β0 + β1INCi + β2AGEi + ui
Population regression model for female employees (SEXi = 1): BEi = (β0 + β3) + (β1 + β4)INCi + (β2 + β5)AGEi + ui
Sample regression model for male employees: BEi (hat) = 489.8631 + 0.002893 INCi - 10.07924 AGEi
Sample regression model for female employees: BEi (hat) = (489.8631 - 265.8574) + (0.002893 - 0.001029) INCi + (-10.07924 + 4.231494) AGEi = 224.0057 + 0.001864 INCi - 5.847746 AGEi
Here β0 is the intercept for males, β1 and β2 are the male slopes of INC and AGE, β3 is the difference in intercept between females and males, and β4 and β5 are the differences in the INC and AGE slopes between females and males.
The estimated coefficient of INC (0.002893) is the change in a male employee's annual beer expenditure, in dollars, for a one-unit increase in income, holding age constant. The estimated coefficient of AGE (-10.07924) means that a male employee who is one year older spends about $10.08 less on beer per year, holding income constant. The estimated coefficient of SEX (-265.8574) is the difference in intercept between females and males: at INC = 0 and AGE = 0 a female employee is predicted to spend about $265.86 less than a male. Because the model contains interactions, the female-male gap at other values is -265.8574 - 0.001029 INC + 4.231494 AGE. The estimated coefficient of SEX*INC (-0.001029) is the amount by which the income slope of females differs from that of males, and the estimated coefficient of SEX*AGE (4.231494) is the amount by which the age slope of females differs from that of males, so beer expenditure falls with age more slowly for females (about $5.85 per year of age) than for males.
2. For male employees (SEX = 0) the income slope in result (1) is 0.002893. For female employees (SEX = 1) the slope is the sum of the INC and SEX*INC coefficients: 0.002893 + (-0.001029) = 0.001864. With INC measured in thousands of dollars, as in the variable description, an income increase of $1,000 per year is a one-unit increase in INC, so predicted annual beer expenditure rises by about $0.0029 for males and about $0.0019 for females, holding age constant. The size of the coefficients suggests that INC may in fact have been recorded in dollars; in that case the same $1,000 increase raises expenditure by 0.002893 * 1000 = $2.89 for males and 0.001864 * 1000 = $1.86 for females. Under either reading the income effect is smaller for females, although the difference (-0.001029) is not statistically significant on its own (p = 0.2968).
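The income slopes for the two groups follow mechanically from the INC and SEX*INC coefficients in result (1). The sketch below is a minimal illustration with hypothetical names; it reports the slope per unit of INC and leaves the unit question (thousands of dollars, as the variable description says, or dollars) to the reader.

```python
# Coefficients from result (1).
B_INC = 0.002893        # INC
B_SEX_INC = -0.001029   # SEX*INC (note the negative sign)


def income_slope(sex):
    """Change in predicted BE per one-unit increase in INC.

    sex = 0 for male employees, 1 for female employees.
    """
    return B_INC + B_SEX_INC * sex


male_slope = income_slope(0)
female_slope = income_slope(1)
print(f"male slope:   {male_slope:.6f}")
print(f"female slope: {female_slope:.6f}")
```

Because the interaction coefficient is negative, the female slope is obtained by subtracting 0.001029 from the male slope, not adding it.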
3. The null hypothesis is that the slope coefficients of INC and AGE are the same for males and females, H0: β4 = 0 and β5 = 0, where β4 and β5 are the coefficients of SEX*INC and SEX*AGE. The alternative hypothesis is that at least one of β4, β5 is non-zero. This is an F-test for the joint significance of the interaction terms SEX*INC and SEX*AGE, in which result (1) is the unrestricted model and result (2), which imposes equal slopes, is the restricted model. The F-statistic is F = ((RSSR - RSSUR) / q) / (RSSUR / (n - k)), where RSSR and RSSUR are the residual sums of squares of the restricted and unrestricted models, q = 2 is the number of restrictions, k = 6 is the number of parameters in the unrestricted model, and n = 40 is the sample size. Because both models have the same dependent variable, the statistic can be computed from the reported R2 values: F = ((0.6470 - 0.6294) / 2) / ((1 - 0.6470) / 34) = 0.0088 / 0.01038 = 0.848. The 5% critical value of F(2, 34) is about 3.28. Since 0.848 < 3.28, we fail to reject the null hypothesis: there is no evidence that the INC and AGE slopes differ between males and females.
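Since only R-squared values are reported for results (1) and (2), the joint test of the two interaction terms can be carried out with the R-squared form of the F statistic, which is valid here because both regressions share the same dependent variable and sample. A minimal sketch, with an illustrative function name:

```python
def f_from_r2(r2_u, r2_r, q, n, k):
    """F statistic for q restrictions, from unrestricted and restricted R-squared.

    n: sample size; k: number of parameters in the unrestricted
    model, including the intercept.
    """
    return ((r2_u - r2_r) / q) / ((1.0 - r2_u) / (n - k))


# Result (1) is unrestricted; result (2) drops SEX*INC and SEX*AGE.
f_slopes = f_from_r2(r2_u=0.6470, r2_r=0.6294, q=2, n=40, k=6)
print(f"F(2, 34) = {f_slopes:.3f}")
```

The statistic is compared with the tabulated F critical value for 2 and 34 degrees of freedom.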
4. In model (1) the variable SEX enters through three terms: SEX, SEX*INC and SEX*AGE. The hypothesis that SEX does not affect annual beer expenditures is therefore the joint hypothesis
H0: β3 = β4 = β5 = 0 against Ha: at least one of β3, β4, β5 is non-zero,
where β3, β4 and β5 are the coefficients of SEX, SEX*INC and SEX*AGE. The unrestricted model is result (1) (R2 = 0.6470) and the restricted model, which excludes SEX altogether, is result (3) (R2 = 0.3292):
F = ((0.6470 - 0.3292) / 3) / ((1 - 0.6470) / (40 - 6)) = 0.1059 / 0.01038 = 10.20
The 5% critical value of F(3, 34) is about 2.88. Since 10.20 > 2.88, we reject the null hypothesis and conclude that SEX has a significant effect on BE.
The t-test on the SEX coefficient alone (t = -2.345, p-value = 0.025) also rejects at the 5% level, but it only tests whether the intercepts of males and females differ, not whether SEX matters overall.
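A point worth checking numerically is that, in a model with interaction terms, the t-test on SEX covers only one of the three places where SEX enters. The sketch below contrasts that single-restriction test with the joint test that compares result (1) with result (3), which contains no SEX terms at all. It uses only figures reported in the results; the variable names are my own.

```python
N_OBS, K_PARAMS = 40, 6          # result (1): 40 observations, 6 parameters
R2_FULL = 0.6470                 # result (1), with SEX, SEX*INC, SEX*AGE
R2_NO_SEX = 0.3292               # result (3), no SEX terms
T_SEX = -2.345129                # t-statistic on SEX in result (1)

# Joint test of the three SEX terms (q = 3 restrictions).
q_joint = 3
f_joint = ((R2_FULL - R2_NO_SEX) / q_joint) / ((1.0 - R2_FULL) / (N_OBS - K_PARAMS))

# The t-test on SEX alone is a one-restriction test; its F equivalent is t^2.
f_intercept_only = T_SEX ** 2

print(f"joint F(3, 34)          = {f_joint:.2f}")
print(f"intercept-shift F(1, 34) = {f_intercept_only:.2f}")
```

Each statistic has its own critical value, so the two numbers are not directly comparable; the point is that they answer different questions.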
5. The Durbin-Watson d statistic tests for first-order autocorrelation in the residuals of a regression model. It ranges from 0 to 4: values close to 2 indicate no autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation. Since d is approximately 2(1 - ρ), d = 1.92 corresponds to an estimated first-order autocorrelation of about 0.04.
The decision uses the tabulated bounds dL and dU. For n = 40 at the 5% level, dL is about 1.23 and dU about 1.79 with five regressors, as in model (1), and dU is smaller with fewer regressors. Because dU < 1.92 < 4 - dU = 2.21, d lies in the region where the null hypothesis of no autocorrelation is not rejected: there is no evidence of positive or negative first-order autocorrelation in the model.
Had autocorrelation been detected, the OLS coefficient estimates would remain unbiased but would no longer be efficient, and the usual standard errors would be biased, so inference about the significance of the coefficients would be unreliable. Remedies would include generalized least squares (GLS), feasible generalized least squares (FGLS), or identifying and correcting the source of the autocorrelation. Note also that the data are a cross-section of employees, so the value of d depends on the ordering of the observations.
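The Durbin-Watson decision rule is a comparison of d against two tabulated bounds, and can be written out explicitly. The sketch below assumes that the reported d = 1.92 belongs to model (1), with 40 observations and five regressors, and uses the 5% table bounds for that case (dL = 1.230, dU = 1.786); if the statistic belongs to a smaller model, the bounds should be replaced by the corresponding table values.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4.0 - d_upper:
        return "no first-order autocorrelation"
    if d <= 4.0 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Assumed bounds: 5% level, n = 40, five regressors (model (1)).
D_LOWER, D_UPPER = 1.230, 1.786
D_STAT = 1.92

decision = durbin_watson_decision(D_STAT, D_LOWER, D_UPPER)
rho_hat = 1.0 - D_STAT / 2.0   # from the approximation d = 2(1 - rho)

print(decision)
print(f"implied first-order autocorrelation: {rho_hat:.2f}")
```

A value of d slightly below 2 is not by itself evidence of positive autocorrelation; what matters is on which side of dU it falls.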
