• There is a good argument that during war years GDP should behave differently than during
peacetime.
$Y_t = \beta_0 D_{1t} + \beta_1 D_{2t} + \beta_2 X_t + \varepsilon_t$
• If you estimated this model with an intercept term as well, the estimation would break down because of perfect multicollinearity: D1t + D2t = 1 for every observation, and the “variable” multiplying the constant term is also 1.
• We could run the model with only one dummy variable and use an intercept term:
$Y_t = \beta_0 + \beta_1 D_{2t} + \beta_2 X_t + \varepsilon_t$
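A quick way to see the difference between the two specifications is to compute the rank of the design matrix. The sketch below uses a small hypothetical sample (the values of D2 and X are made up, and we assume D2 = 1 marks war years) and exact rational arithmetic, so a rank below the number of columns k signals perfect collinearity.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a nonzero pivot at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical data: D2 = 1 in war years, D1 = 1 - D2 in peace years.
d2 = [0, 0, 1, 1, 0, 1]
d1 = [1 - d for d in d2]
x = [2.0, 3.5, 1.0, 4.0, 5.5, 2.5]

# Intercept, D1, D2 and X: the intercept column equals D1 + D2.
trap = [[1, a, b, xi] for a, b, xi in zip(d1, d2, x)]
# Intercept, D2 and X: the single-dummy specification.
single = [[1, b, xi] for b, xi in zip(d2, x)]

print(matrix_rank(trap), "of", len(trap[0]))      # 3 of 4: rank(X) < k
print(matrix_rank(single), "of", len(single[0]))  # 3 of 3: full column rank
```

Dropping either the intercept or one of the two dummies restores full column rank; the single-dummy version keeps the intercept.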
• Here, we could run OLS on this model with no problems.
• The dummy variables actually act as a shift parameter for the mean of one group. They allow
• We know that there are various functional forms we can use in econometrics. But they all
have to be linear in parameters. This linearity is constrained because there are no ways to
• Consider the model with the intercept included. If D = 0, the intercept of the model is β0.
• Thus, whenever the condition is satisfied, such that D = 1, the intercept of our regression
line shifts to β0 + β1.
• Whether β1 is greater than, less than or equal to zero determines the direction in which the
line shifts. For example, with β1 > 0 and β2 > 0:
[Figure: Y plotted against X as two parallel lines with common slope β2; the line for D = 0 has intercept β0 and the line for D = 1 has intercept β0 + β1.]
• Here, the intercept shift allows the same marginal influence of X on GDP (Y ) but recognizes
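The intercept shift can be checked numerically. This sketch fits the single-dummy model by OLS (normal equations solved with plain Gauss-Jordan elimination) to hypothetical, noise-free data generated with β0 = 10, β1 = 4 and β2 = 0.5; those parameter values are assumptions for illustration only.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan with partial pivoting.
    a = [r + [v] for r, v in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: D = 1 for the shifted group, same X values in both groups.
d = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [10 + 4 * di + 0.5 * xi for di, xi in zip(d, x)]  # no error term

b0, b1, b2 = ols([[1, di, xi] for di, xi in zip(d, x)], y)
print("intercept when D = 0:", b0)       # about 10
print("intercept when D = 1:", b0 + b1)  # about 14
print("common slope on X:", b2)          # about 0.5
```

Both groups share the slope b2; only the intercept moves, by b1.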
• One must be careful in interpreting intercept shifts by looking at the exact definition of the
dummy variables.
• As mentioned earlier, one cannot estimate a perfectly collinear model (the rank of X is then
less than k, so X′X cannot be inverted). Thus, if all of the dummy variables included in your
model always sum to one, there is perfect collinearity with the intercept.
• Instead, for a qualitative variable with j possible categories, we include j − 1 dummy
variables together with the intercept.
• For example, if sex is our qualitative variable, we include a single dummy variable, SEX,
equal to 0 for males and 1 for females.
• However, if we are looking at highest education, we may have several categories, e.g.,
grade school, high school, some college, undergraduate degree, master’s degree, doctorate.
• The omitted category is the “reference” category, against which all other categories are
compared.
• Thus, if income is the dependent variable and education-level dummy variables are
included, we might expect positive parameter estimates if grade school is the reference
category. On the other hand, if graduate school is the reference category, we might expect
negative parameter estimates.
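The sign flip can be illustrated with a known property of a regression on an intercept plus j − 1 category dummies and nothing else: the intercept equals the reference-group mean and each dummy coefficient equals that group's mean minus the reference mean. The incomes below are made-up numbers, and only four of the education categories are used to keep the sketch short.

```python
from statistics import mean

# Hypothetical incomes ($1,000s) by highest education level; illustrative only.
incomes = {
    "grade school": [22, 25, 24],
    "high school": [31, 29, 33],
    "undergraduate degree": [48, 52, 50],
    "doctorate": [80, 85, 78],
}


def dummy_coefficients(groups, reference):
    """Coefficients of an OLS regression of income on an intercept plus j - 1
    category dummies, with `reference` as the omitted category.

    With only category dummies on the right-hand side, OLS sets the intercept
    to the reference-group mean and each dummy coefficient to
    (group mean - reference-group mean).
    """
    ref_mean = mean(groups[reference])
    coefs = {name: mean(vals) - ref_mean
             for name, vals in groups.items() if name != reference}
    return ref_mean, coefs


for ref in ("grade school", "doctorate"):
    intercept, coefs = dummy_coefficients(incomes, ref)
    print(f"reference = {ref}: intercept = {intercept:.2f}")
    for name, b in coefs.items():
        print(f"  {name}: {b:+.2f}")
```

With grade school omitted every dummy coefficient is positive; with doctorate omitted every one is negative, even though the data are identical.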
• A potential pitfall in the use of dummy variables can be given in an example taken from labor
economics.
• Consider the hypothesis that there is wage discrimination against females. One may wish
to estimate a wage equation that would control for various individual attributes such as
productivity, motivation, teamwork, etc. At the same time, one would want to include a
dummy variable that would control for the sex of a particular worker.
• This is a common approach in labor models. Thus, an example of a wage model would be
where EDU is the education (typically in years), and Zi is a set of variables thought to
influence wages such as the years of tenure at a particular job, the age of the worker and the
• The explanatory variables are often highly correlated with one another. This is a potential
problem (multicollinearity).
• Nonetheless, let’s say that we estimate a model and find that β1 < 0. Some would claim that
this is evidence of wage discrimination against women.
• The estimated equation only states that, given the sample used, women start out at a
lower wage. In this model, the returns to education, tenure, age and experience are assumed
to be the same for men and women.
• Thus, plotting wage against EDU, the picture looks like this:
[Figure: wage plotted against EDUCATION as two parallel lines with common slope ∆W/∆EDU = β2; Males have intercept β0 and Females have the lower intercept β0 + β1.]
• Here, the implication is that women start at a lower wage (as indicated by the lower intercept),
i.e., that β1 < 0 but that the returns to education are the same for men and women alike.
• Perhaps we think that women and men are actually rewarded differently for their education.
• To accommodate this possibility, we then interact our SEX dummy variable with EDU to
obtain:
• In this model, the intercepts are allowed to shift as well as the slope parameter on EDU.
$\Delta W_i / \Delta EDU_i = \beta_2$ (males, SEX = 0)
$\Delta W_i / \Delta EDU_i = \beta_2 + \gamma_0$ (females, SEX = 1)
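To see how the interaction term separates the two slopes, this sketch fits W on SEX, EDU and SEX × EDU by OLS to hypothetical noise-free data. The values β0 = 8, β1 = -3, β2 = 1.5 and γ0 = 0.5 are assumptions, and the Z_i controls are left out for brevity. The ols helper solves the normal equations directly.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    a = [r + [v] for r, v in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


sex = [0] * 4 + [1] * 4              # SEX = 1 for females
edu = [10, 12, 14, 16] * 2           # hypothetical years of education
wage = [8 - 3 * s + 1.5 * e + 0.5 * s * e for s, e in zip(sex, edu)]  # no error term

X = [[1, s, e, s * e] for s, e in zip(sex, edu)]
b0, b1, b2, g0 = ols(X, wage)
print("return to EDU, males:", b2)         # about 1.5
print("return to EDU, females:", b2 + g0)  # about 2.0
```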
• When the slopes differ across groups, we can claim that the returns to education differ
between men and women.
• If the parameter γ0 is insignificant, we cannot conclude that the marginal effect of
education on wages differs across the sexes. This is NOT the same as saying that there is no
wage difference between men and women, since the intercepts (through β1) may still differ.
• What if we think that the returns to education are not exactly linear, but that there may be
some second-order effect of education on wages? This seems a rather straightforward
idea.
• This model allows there to be a second-order effect (probably negative) of education, and
allows that effect to differ between men and women (through γ1).
[Figure: wage plotted against EDUCATION as two parallel curves; the lower curve starts at β0 + β1 and is labeled slope = (β2 + γ0) + 2EDUi(β3 + γ1).]
• In this case, the intercepts differ, but the γ parameters are insignificant (equal to zero), so
the slopes are the same. This is reflected in the parallel course of the two curves.
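The slope formula in the figure follows from differentiating a quadratic wage equation with SEX interactions. The sketch below assumes the form W = β0 + β1 SEX + β2 EDU + β3 EDU² + γ0 SEX·EDU + γ1 SEX·EDU² (other controls omitted), which is consistent with that slope, and uses made-up parameter values. With γ0 = γ1 = 0 the two slopes coincide at every education level and the wage gap stays fixed at β1, which is why the curves run parallel.

```python
def wage(edu, female, b, g0=0.0, g1=0.0):
    """Assumed quadratic wage equation; b = (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = b
    s = 1 if female else 0
    return b0 + b1 * s + b2 * edu + b3 * edu**2 + g0 * s * edu + g1 * s * edu**2


def marginal_return(edu, female, b, g0=0.0, g1=0.0):
    """dW/dEDU = (b2 + g0*SEX) + 2*EDU*(b3 + g1*SEX)."""
    _, _, b2, b3 = b
    s = 1 if female else 0
    return (b2 + g0 * s) + 2 * edu * (b3 + g1 * s)


b = (5.0, -2.0, 1.2, -0.02)   # hypothetical beta0, beta1, beta2, beta3

# With g0 = g1 = 0: equal slopes for both groups and a constant gap of beta1.
for edu in (8, 12, 16):
    gap = wage(edu, True, b) - wage(edu, False, b)
    print(edu, marginal_return(edu, False, b), marginal_return(edu, True, b), round(gap, 6))
```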
• Note that this graph would still show a “glass ceiling”: the returns to education are the
same on the margin, but wages differ because the two groups start from different intercepts.
[Figure: Wi plotted against EDUCATION as two curves. Males start at β0 with slope = β2 + 2β3 EDUi; Females start lower, at β0 + β1, with slope = (β2 + γ0) + 2EDUi(β3 + γ1), and their curve eventually rises above the male curve.]
• Here, we see that β1 < 0, so women start out at a lower wage. We also see that the
returns to education on the margin are greater for women (γ1 > 0) than for men, so that
beyond some level of education women begin to earn more than men. The question then is how
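Where the curves cross can be computed directly. Under the same assumed quadratic form, the female-minus-male wage gap is β1 + γ0 EDU + γ1 EDU², so the crossing point is a positive root of that polynomial. A minimal sketch with made-up values β1 = -4, γ0 = 0.1, γ1 = 0.02:

```python
import math


def wage_gap(edu, b1, g0, g1):
    """Female minus male wage at a given EDU under the assumed quadratic model."""
    return b1 + g0 * edu + g1 * edu**2


def crossing_education(b1, g0, g1):
    """Smallest positive EDU at which the gap is zero, or None if there is none."""
    if g1 == 0:
        roots = [-b1 / g0] if g0 != 0 else []
    else:
        disc = g0**2 - 4 * g1 * b1
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-g0 - sq) / (2 * g1), (-g0 + sq) / (2 * g1)]
    positive = [r for r in roots if r > 0]
    return min(positive) if positive else None


edu_star = crossing_education(-4.0, 0.1, 0.02)
print(round(edu_star, 2))   # 11.86
```

Beyond roughly 11.86 years of education in this made-up example the gap turns positive; with real estimates one would also want a confidence interval for this point.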
• Thus, dummy variables and interaction terms let us test a wide range of additional
hypotheses.
• These pictures do not necessarily represent general conclusions. Different data samples will
• In certain applications we recognize that there may be a time trend in the data.
• A good example is prices or GDP which tend to grow over time. We would like to control for
• One may think to use dummy variables to control for different time periods. However, this
could be problematic.
• If you have T time periods in your sample and T observations, then treating each time period
as different would require T − 1 dummy variables plus an intercept term. At this point, you
would be estimating T parameters from T observations, leaving no degrees of freedom.
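The degrees-of-freedom problem can be made concrete. This sketch, with a hypothetical T = 6 and a made-up regressor X, builds an intercept plus T − 1 period dummies (period 1 as the reference) and checks the rank of the design matrix.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


T = 6
# Row t: intercept, then dummies for periods 2..T (period 1 is the reference).
design = [[1] + [1 if t == s else 0 for s in range(2, T + 1)] for t in range(1, T + 1)]
k = len(design[0])
print(k, matrix_rank(design), T - k)   # 6 parameters, rank 6, 0 residual df

# Adding any regressor X_t pushes k above T, so the model cannot be estimated.
x = [2.0, 3.5, 1.0, 4.0, 5.5, 2.5]
with_x = [row + [xi] for row, xi in zip(design, x)]
print(len(with_x[0]), matrix_rank(with_x))   # 7 parameters, rank 6
```

The time-dummy design alone fits the data perfectly, leaving nothing to estimate an error variance from.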
• A trend variable is a monotonically increasing variable, typically just the time index.
$y_t = \beta_0 + \beta_1 X_t + \beta_2 TIME_t + \varepsilon_t$
where $TIME_t = t$ for $t = 1, 2, \ldots, T$.
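As a sketch of what the trend regression does, the code below fits y_t on an intercept, X_t and TIME_t by OLS, using hypothetical noise-free data that drift upward by 0.5 per period (β0 = 3, β1 = 2, β2 = 0.5 are assumed values). It also fits the model without TIME for comparison: in this made-up sample X happens to be positively correlated with time, so omitting the trend lets the X coefficient absorb part of the drift.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    a = [r + [v] for r, v in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


x = [2.0, 3.5, 1.0, 4.0, 5.5, 2.5]
trend = list(range(1, len(x) + 1))                       # TIME_t = 1, 2, ..., T
y = [3 + 2 * xi + 0.5 * t for xi, t in zip(x, trend)]    # no error term

with_trend = ols([[1, xi, t] for xi, t in zip(x, trend)], y)
without_trend = ols([[1, xi] for xi in x], y)
print("with TIME:   ", with_trend)     # about [3, 2, 0.5]
print("without TIME:", without_trend)  # slope on X about 2.23
```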
• Note: there is a potential problem with time trends in log-log or lin-log models.
• If you take the log of time, you obtain the following relative to a linear time trend:
[Figure: trend and lntrend (the log of the trend) plotted against trend, with trend running from 0 to 100.]
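• Thus, when you take the log of time, you may be discounting the effect of time on your
dependent variable: each additional period raises ln(TIME) by ln((t + 1)/t), roughly 1/t, so
later periods contribute less and less.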
• Note: the log of zero is undefined, so be careful in how you set up your time trend (e.g., start it at 1 rather than 0).
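The shrinking effect of a log trend is easy to see by comparing one-period increments. This sketch builds a linear trend from 1 to 100 (starting at 1 because the log of 0 is undefined) and its log.

```python
import math

T = 100
trend = list(range(1, T + 1))            # start at 1 because log(0) is undefined
lntrend = [math.log(t) for t in trend]

# One-period increments of each trend.
step_linear = [b - a for a, b in zip(trend, trend[1:])]
step_log = [b - a for a, b in zip(lntrend, lntrend[1:])]

print(step_linear[0], step_linear[-1])                 # 1 1: every period counts the same
print(round(step_log[0], 3), round(step_log[-1], 3))   # 0.693 0.01: ln((t+1)/t) shrinks

try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raises ValueError:", err)
```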
8.4 Example: The Taft-Hartley Act of 1947
• It is very apparent that something dramatic happened around the mid-1940s: work stoppages
declined significantly.
• In fact, the Taft-Hartley Act was passed in 1947. From Infoplease.com: “the act qualified
or amended much of the National Labor Relations (Wagner) Act of 1935, the federal law […]
parts of the Federal Anti-Injunction (Norris-LaGuardia) Act of 1932. The act established
control of labor disputes on a new basis by enlarging the National Labor Relations Board and
providing that the union or the employer must, before terminating a collective-bargaining
agreement, serve notice on the other party and on a government mediation service.”
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
strikes | 73 829.2466 1330.081 17 4985
realgdp | 73 3627.236 2473.242 603.3 9214.54
unemp | 73 7.306849 5.179692 1.2 25.2
minwage | 73 1.846575 1.621499 0 5.15
time | 73 36 21.21713 0 72
. reg strikes realgdp unemp time minwage
---------------------------------------------------------------------
strikes | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+-----------------------------------------------------------
realgdp | .530194 .3803215 1.39 0.168 -.2287257 1.289114
unemp |-27.50805 29.85787 -0.92 0.360 -87.0885 32.07241
time |-106.8932 25.38961 -4.21 0.000 -157.5574 -56.22906
minwage | 84.74099 515.9023 0.16 0.870 -944.726 1114.208
_cons | 2798.781 577.0888 4.85 0.000 1647.218 3950.344
---------------------------------------------------------------------
. reg strikes realgdp unemp time minwage pre47
---------------------------------------------------------------------
strikes | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+-----------------------------------------------------------
realgdp |-.8260032 .2303428 -3.59 0.001 -1.285769 -.366237
unemp |-100.4215 17.05749 -5.89 0.000 -134.4684 -66.37464
time | 112.2559 21.79502 5.15 0.000 68.75289 155.759
minwage | 24.52896 278.1735 0.09 0.930 -530.7077 579.7656
pre47 | 4519.136 349.7473 12.92 0.000 3821.037 5217.234
_cons | -641.696 409.5055 -1.57 0.122 -1459.072 175.6804
---------------------------------------------------------------------
• If we do not include the pre47 dummy variable, the constant term is overstated and the
impacts of unemployment and real gross domestic product are muted (insignificant).
• If we include the pre47 dummy variable, the constant term during the pre-1947 period is
equal to
. lincom _b[pre47]+_b[_cons]
( 1) pre47 + _cons = 0
-----------------------------------------------------------------
strikes | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+------------------------------------------------------
(1) | 3877.44 649.3468 5.97 0.000 2581.338 5173.542
whereas after 1947 the constant term is -641.7, which is not significantly different from zero (p = 0.122).
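The lincom result is simply the sum of two coefficients. A quick arithmetic check using the point estimates and standard errors printed above (nothing is re-estimated here):

```python
# Point estimates copied from the regression that includes pre47.
b_pre47 = 4519.136
b_cons = -641.696

# lincom _b[pre47] + _b[_cons]: the intercept for the pre-1947 period.
pre1947_intercept = b_pre47 + b_cons
print(round(pre1947_intercept, 2))   # 3877.44

# t = estimate / standard error. The standard error of a sum also needs
# Cov(b_pre47, b_cons), which the log does not report, so the 649.3468
# below is copied from the lincom output rather than recomputed.
t_pre1947 = pre1947_intercept / 649.3468
print(round(t_pre1947, 2))           # 5.97

# The post-1947 intercept is _cons alone, with standard error 409.5055.
t_post1947 = b_cons / 409.5055
print(round(t_post1947, 2))          # -1.57
```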
• Notice that after we include the pre47 dummy variable, REALGDP is negatively related to
work stoppages, as is UNEMP. Both of these results make sense: with greater income and
---------------------------------------------------------------------
strikes | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+-----------------------------------------------------------
realgdp | -.8260032 .2303428 -3.59 0.001 -1.285769 -.366237
unemp | -100.4215 17.05749 -5.89 0.000 -134.4684 -66.37464
time | 112.2559 21.79502 5.15 0.000 68.75289 155.759
minwage | 24.52896 278.1735 0.09 0.930 -530.7077 579.7656
post47 | -4519.136 349.7473 -12.92 0.000 -5217.234 -3821.037
_cons | 3877.44 322.1265 12.04 0.000 3234.473 4520.407
---------------------------------------------------------------------
• Notice that the marginal impacts of REALGDP and UNEMP have not changed. The only
things that have changed are the intercept term and the sign of the dummy coefficient. The
post-1947 intercept is now obtained as
. lincom _b[post47]+_b[_cons]
( 1) post47 + _cons = 0
------------------------------------------------------------------
strikes | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------
(1) |-641.696 739.3518 -0.87 0.389 -2117.448 834.0564
------------------------------------------------------------------
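The switch from pre47 to post47 is a pure reparameterization: since post47 = 1 − pre47, the fitted intercept _cons + b_pre47·pre47 can be rewritten as (_cons + b_pre47) − b_pre47·post47. The arithmetic below uses only the point estimates from the pre47 regression and reproduces the post47 coefficients, with both versions giving the same intercept in each period.

```python
# Point estimates from the regression with pre47.
b = {"pre47": 4519.136, "_cons": -641.696}

# Implied coefficients once post47 = 1 - pre47 replaces pre47.
implied = {"post47": -b["pre47"], "_cons": b["_cons"] + b["pre47"]}
print(implied)   # post47 = -4519.136 and _cons about 3877.44, as in the post47 output

# Intercept in each period under both parameterizations.
for pre47 in (1, 0):
    post47 = 1 - pre47
    via_pre = b["_cons"] + b["pre47"] * pre47
    via_post = implied["_cons"] + implied["post47"] * post47
    print(pre47, round(via_pre, 3), round(via_post, 3))
```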