
Econometrics 1123: Section 4 Handout

Eunice Han
ehan@fas.harvard.edu
September 30th, 2010

1. The interaction between Independent Variables


$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
To allow the effect of X1i on Yi to depend on X2i, we can include the interaction term
X1i*X2i as a regressor.

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$$
Q1. Show that $\Delta Y / \Delta X_{2i}$ depends on X1.

$$\frac{\Delta Y}{\Delta X_{2i}} = \beta_2 + \beta_3 X_{1i}$$
Q2. Plot the regression line of Y and X2, when X1 = a and X1 = b, where a<b. Assume that
all the coefficients are positive.
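When X1 = a, the regression line is
$$Y_i = (\beta_0 + \beta_1 a) + (\beta_2 + \beta_3 a) X_{2i}$$
When X1 = b, the regression line is
$$Y_i = (\beta_0 + \beta_1 b) + (\beta_2 + \beta_3 b) X_{2i}$$
Since all the coefficients are positive and a<b, the line for X1 = b has both a higher
intercept and a steeper slope than the line for X1 = a.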
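The two conditional regression lines can be checked numerically. The sketch below is only an illustration: the coefficient values and the points a and b are hypothetical (the handout gives none), and `conditional_line` is a helper name introduced here.

```python
def conditional_line(b0, b1, b2, b3, x1):
    """Return (intercept, slope) of the line relating Y to X2 when X1 is fixed at x1."""
    intercept = b0 + b1 * x1   # beta_0 + beta_1 * x1
    slope = b2 + b3 * x1       # beta_2 + beta_3 * x1
    return intercept, slope

# Hypothetical positive coefficients and values a < b, chosen only for illustration.
b0, b1, b2, b3 = 1.0, 0.5, 2.0, 0.25
a, b = 1.0, 3.0

line_a = conditional_line(b0, b1, b2, b3, a)
line_b = conditional_line(b0, b1, b2, b3, b)

print("X1 = a: intercept, slope =", line_a)
print("X1 = b: intercept, slope =", line_b)
```

With positive coefficients, moving from a to b raises both the intercept and the slope, which is what the plot in Q2 is meant to show.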

Q3. Let's assume that Y is the wage and X1i is a binary variable that equals 1 if the
observation is black and 0 otherwise. X2i is tenure. We want to test whether the effect of
tenure on the wage of black people is different from that of non-black people. How do we
test whether the intercept is the same for the two groups? How do we test whether the slope
is the same for the two groups?
$$Wage_i = \beta_0 + \beta_{black} Black_i + \beta_{tenure} Tenure_i + \beta_{black\_Tenure} (Black_i \times Tenure_i) + u_i$$

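To test the intercept, we should test whether $\beta_{black} = 0$. If the coefficient is
significant, there is a difference in the intercepts.

To test the slope, we should test whether $\beta_{black\_Tenure} = 0$. If the coefficient is
significant, there is a difference in the slopes.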
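To see why these two coefficients carry the group differences, it may help to write the wage regression from Q3 separately for each group. This is just the model above evaluated at Black = 0 and Black = 1, using the same coefficient names:

```latex
\begin{align*}
Black_i = 0:&\quad Wage_i = \beta_0 + \beta_{tenure}\, Tenure_i + u_i \\
Black_i = 1:&\quad Wage_i = (\beta_0 + \beta_{black}) + (\beta_{tenure} + \beta_{black\_Tenure})\, Tenure_i + u_i
\end{align*}
```

The intercepts differ by $\beta_{black}$ and the slopes differ by $\beta_{black\_Tenure}$, so each t-test targets exactly one of the two differences.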

2 Application: Fuel Price and Electric Power Production


Research Question: Does an increase in fuel price raise the unit cost?
Data Description: Christensen and Greene (1976):
We have production data on 158 electricity generating plants in 1970. The data includes
the prices of inputs (labor, capital, and fuel) used to produce electricity and quantity of
electricity produced and total costs of electricity.
Variable
log_cost: log of total cost
log_unitcost: log of unit cost (unit cost= total cost/output)
log_output: log of total output of electricity, GWH
log_wage: log of wage rate
log_capital: log of capital price index
log_fuel: log of fuel price
Descriptive Statistics

2.1 Univariate Regression


Interpret the slope coefficient in the following regressions.
Q1. Unit cost = 2.152 + 0.132*(Fuel Price)
R2 = 0.06, N=158
A $1 increase in fuel price is expected to increase the unit cost of electricity by $0.132.
Q2. Unit cost = -7.081 + 3.916*Log(Fuel Price)
R2 = 0.06, N=158
A 1% increase in fuel price is expected to increase the unit cost of electricity by $0.039.
Q3. Log(Unit cost) = 0.926 + 0.026*(Fuel Price)
R2 = 0.24, N=158
A $1 increase in fuel price is expected to increase the unit cost of electricity by 2.6%.
Q4. Log(Unit cost) = -0.902 + 0.771*Log(Fuel Price)
R2 = 0.25, N=158
A 1% increase in fuel price is expected to increase the unit cost of electricity by 0.77%.
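The four interpretations follow the usual rules for linear, linear-log, log-linear and log-log specifications. As a quick arithmetic check, the sketch below applies those rules to the slope coefficients reported above; the variable names are introduced here for illustration only.

```python
# Slope coefficients reported in Q1-Q4.
lin_lin = 0.132   # unit cost on fuel price
lin_log = 3.916   # unit cost on log(fuel price)
log_lin = 0.026   # log(unit cost) on fuel price
log_log = 0.771   # log(unit cost) on log(fuel price)

# Linear-linear: a 1-unit change in X changes Y by beta units.
effect_q1 = lin_lin * 1
# Linear-log: a 1% change in X changes Y by about 0.01 * beta units.
effect_q2 = lin_log * 0.01
# Log-linear: a 1-unit change in X changes Y by about 100 * beta percent.
effect_q3 = 100 * log_lin * 1
# Log-log: a 1% change in X changes Y by about beta percent (an elasticity).
effect_q4 = log_log * 1

print(f"Q1: ${effect_q1:.3f} per $1 increase in fuel price")
print(f"Q2: ${effect_q2:.3f} per 1% increase in fuel price")
print(f"Q3: {effect_q3:.1f}% per $1 increase in fuel price")
print(f"Q4: {effect_q4:.2f}% per 1% increase in fuel price")
```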
* Scatterplot of unitcost vs. fuelprice

* Scatterplot of log(unitcost) vs. log(fuelprice)

Q. Using the two scatterplots would you suggest using (1) unitcost and fuelprice or (2)
log_unitcost and log_fuel for modeling linear regression?
The relationship between unitcost and fuelprice looks nonlinear, and the fitted regression line
doesn't seem to represent their relationship well. Taking logs of both variables makes the
scatter look much closer to a linear relationship.

2.2 Multivariate Regressions

Q1. How do regressions (1)-(5) differ?


The first four specifications use different functional forms of the control variable Output.
Specification (5) adds an interaction term between output and fuel price.
Q2. Using regression (2), estimate the effect on log(unit cost) of changing from
log(output)=1.5 to log(output)=2, holding constant the values of the other regressors in
regression (2).
$$\Delta \log(\text{unit cost}) = [-0.597 \times 2 + 0.03 \times 2^2] - [-0.597 \times 1.5 + 0.03 \times 1.5^2] = -0.246$$
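The same calculation can be reproduced in a few lines. This sketch uses the two coefficients from regression (2) quoted in the handout (-0.597 and 0.03); the function name `quadratic_part` is introduced here for illustration.

```python
def quadratic_part(log_output, b_lin=-0.597, b_sq=0.03):
    """Contribution of log(output) and its square to predicted log(unit cost)."""
    return b_lin * log_output + b_sq * log_output ** 2

# The other regressors are held constant, so they cancel in the difference.
change = quadratic_part(2.0) - quadratic_part(1.5)
print(round(change, 3))  # -0.246
```

Because the model is in logs, a change of -0.246 in log(unit cost) corresponds to a fall in unit cost of roughly 25%.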

Q3. Assume that you want to test linear vs. quadratic specification in log(output). How
would you perform the test? What is the result of the test?
We will use model (2) to test the null hypothesis:
H0: $\beta_{logoutput2} = 0$ and H1: $\beta_{logoutput2} \neq 0$
To perform the test, just look at the coefficient of log_output2 in specification (2). If the
value is significantly different from 0 (|t-statistic| > 1.96), we can reject the null hypothesis.
Since it is significant at 5%, we reject the null of linear model in favor of quadratic model
specification.

Q4. For now, we consider the specification (2). Could you interpret the coefficients on
log_output and on log_output2? Could you describe the characteristics of the dependence
of log(unit cost) on log(output)?
We CANNOT interpret the coefficients on log_output and log_output2 separately
because it is not possible to change log_output keeping log_output2 constant. We can,
however, say something about the relationship between log(unit cost) and log(output) by
looking at the coefficients on log_output and log_output2. Since the coefficient on
log_output is negative (-0.597) and on log_output2 is positive (0.030), we know that if
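we plot log_output on the x-axis and log(unit cost) on the y-axis, we would get a U-shaped
parabola that opens upward: log(unit cost) falls as log(output) rises, and the decline
flattens as output grows.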
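The shape of the parabola can be summarized by its slope at a given point and by its turning point. The sketch below uses the two coefficients from specification (2) quoted above; the function name and the evaluation point log(output) = 2 are choices made here for illustration.

```python
b_lin, b_sq = -0.597, 0.030   # coefficients on log_output and log_output2 in (2)

def marginal_effect(log_output):
    """Slope of log(unit cost) with respect to log(output) at a given point."""
    return b_lin + 2 * b_sq * log_output

# The parabola is at its minimum where the slope is zero.
turning_point = -b_lin / (2 * b_sq)

print("slope at log(output) = 2:", round(marginal_effect(2.0), 3))
print("turning point in log(output):", round(turning_point, 2))
```

The slope is negative to the left of the turning point and positive to the right of it, which is why neither coefficient can be interpreted on its own.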

Q5. Test the hypothesis that the dependence of log(unit cost) on log(output) is quadratic
vs. cubic model.
We will use regression (3) to test the following hypothesis:
H0: $\beta_{logoutput3} = 0$ and H1: $\beta_{logoutput3} \neq 0$
To perform this test, just look at the coefficient of log_output3 in regression (3). Since the
coefficient is not significantly different from 0 at 5% significance level, we cannot reject
the null of quadratic model in favor of the cubic model.
Q6. Test the hypothesis that the dependence of log(unit cost) on log(output) is linear vs.
cubic.
We will use regression (3) to test the null hypothesis:
H0: $\beta_{logoutput2} = \beta_{logoutput3} = 0$
H1: at least one of them is not zero
To test the null hypothesis, we use the F-test since we are testing more than one restriction
simultaneously. The reported F-statistic is 70.2 and the p-value is less than 0.001. So we
reject the null of the linear model.
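As a rough check on the reported p-value, one can use the large-sample distribution of the F-statistic. The sketch below assumes the large-sample approximation with q = 2 restrictions, under which 2F is approximately chi-squared with 2 degrees of freedom; the function names are introduced here and are not from the handout.

```python
import math

def f_pvalue_two_restrictions(f_stat):
    """Large-sample p-value for an F-statistic testing q = 2 restrictions.

    Under H0, 2 * F is approximately chi-squared with 2 degrees of freedom,
    whose upper-tail probability at x is exp(-x / 2), so the p-value is exp(-F).
    """
    return math.exp(-f_stat)

def f_critical_two_restrictions(alpha):
    """Large-sample critical value c solving exp(-c) = alpha for q = 2."""
    return -math.log(alpha)

p_value = f_pvalue_two_restrictions(70.2)
crit_5pct = f_critical_two_restrictions(0.05)

print("p-value below 0.001:", p_value < 0.001)
print("5% critical value:", round(crit_5pct, 2))
```

An F-statistic of 70.2 is far above the 5% critical value of about 3.00, consistent with the rejection reported above.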
Q7. Considering all the previous questions, what specification among (1)-(3) would you
choose and why?
I would choose specification (2). We reject the linear model using (2), so (1) is inferior to
(2). We cannot reject the quadratic model in favor of the cubic in (3), so the cubic term is
not needed and (3) is not preferred to (2). (2) wins.

Q8. Compare specification (2) and (4). Do you think there is OVB in the coefficient of
log(fuel price) in specification (4)?
Yes. The coefficient of log(fuel price) in (4) suffers from OVB. To see this, we should
check whether the omitted variables (log_wage and log_capital) satisfy the following two
conditions:
(1) log(wage) and log(capital price) are determinants of log(unitcost)
(2) log(fuel) is correlated with the omitted variables log(wage) and log(capital price)
We already know from regression (2), that log(wage) and log(capital price) are important
determinants of log(unit cost). Log(capital price) is significant at 1%, log(wage) is
significant at 10%. To check the second condition, we can find the correlation between
the variables, or we can run the following regression.

Judging by the F-test and R-squared, both log(wage) and log(capital) are important factors
in explaining log(fuel price), so we conclude that both conditions are satisfied.
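The two OVB conditions can be illustrated with a small simulation. Everything in the sketch below is hypothetical and unrelated to the electricity data: `x` stands in for the included regressor, `z` for an omitted variable that both affects `y` and is correlated with `x`, and the true coefficients are made up.

```python
import random

random.seed(0)
n = 20000
beta_x, beta_z = 2.0, 3.0   # hypothetical true coefficients

# z is correlated with x (condition 2) and enters y (condition 1).
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [1.0 + beta_x * xi + beta_z * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# Regression that omits z: the slope on x absorbs part of the effect of z.
short_slope = ols_slope(x, y)
print("true coefficient on x:", beta_x)
print("estimated slope when z is omitted:", round(short_slope, 2))
```

In this setup the slope from the short regression settles near 2 + 3 × 0.5 = 3.5 rather than 2, so the estimate is biased upward because the omitted variable is positively related to both `x` and `y`.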
Q9. What specification among (1)-(4) is the best in your opinion?
We know that (2) is better than (1) and (3). So we only need to compare specifications (2)
and (4). Even though the R-squared and adjusted R-squared are almost the same, it's better to
include log(wage) and log(capital price) as control variables since they are correlated
with log(fuel price). Therefore, model (2) wins!
Q10. Does a firm with higher output have the same increase in unit cost from a unit
increase in fuel price as a firm with lower output, holding wage and capital price
constant?
No. In (5), the interaction term between output and fuel price is positive and statistically
significant. So the regression line for a firm with higher output has a different intercept
and a different (steeper) slope than the line for a firm with lower output.
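The answer can be written as a formula. The expression below is a sketch under the assumption that specification (5) includes log(fuel price), log(output) and their product as regressors; the symbols $\beta_{fuel}$ and $\beta_{int}$ are labels introduced here for the coefficients on log(fuel price) and on the interaction term.

```latex
\frac{\Delta \log(\text{unit cost})}{\Delta \log(\text{fuel price})}
  = \beta_{fuel} + \beta_{int} \times \log(\text{output})
```

With $\beta_{int} > 0$, the effect of fuel price on unit cost grows with output, which is the steeper slope described above.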