DUMMY INDEPENDENT VARIABLES
Qualitative independent variables
(such as wartime vs. peacetime, periods
of strike vs. nonstrike, male vs. female,
etc.) can be introduced into regression
analysis by assigning the value of 1 for
one classification (e.g., wartime) and 0
for the other (e.g., peacetime).
• They are also called dummy variables
and are treated as any other variable.
1
Dummy (indicator, binary, categorical, qualitative, dichotomous) variables
can be used to capture changes (shifts):
• in the intercept: Y = β0 + β1X + β2D + u
• in the slope: Y = γ0 + γ1X + γ2XD + v
• in both intercept and slope:
Y = δ0 + δ1X + δ2D + δ3XD + w,
where D is 1 for one classification (e.g., female) and 0 otherwise
(e.g., male), and X is an ordinary quantitative explanatory variable.
Ex. To capture a change (shift) in the intercept:
Y = β0 + β1X + β2D + u
• If D = 1, Y = β0 + β1X + β2 + u = (β0 + β2) + β1X + u, hence
the intercept is β0 + β2.
• If D = 0, Y = β0 + β1X + u, hence the intercept is β0.
• There is no difference between the two slopes (both equal β1).
• The difference between the two intercepts is β2.
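The intercept shift can be checked numerically. The sketch below is only an illustration: the data are simulated, and the "true" parameter values, sample size, and noise level are assumptions that do not come from the slides. It fits the model by least squares with NumPy and reads off the two intercepts.

```python
import numpy as np

# Simulated data; every number here is an assumption made for illustration.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, n)
D = (np.arange(n) % 2).astype(float)      # dummy: alternates 0, 1, 0, 1, ...
beta0, beta1, beta2 = 2.0, 1.5, -3.0      # assumed "true" parameters
Y = beta0 + beta1 * X + beta2 * D + rng.normal(0, 0.5, n)

# Design matrix with a constant, X, and the dummy D.
Z = np.column_stack([np.ones(n), X, D])
b, *_ = np.linalg.lstsq(Z, Y, rcond=None)

intercept_d0 = b[0]           # estimated intercept for the D = 0 group
intercept_d1 = b[0] + b[2]    # estimated intercept for the D = 1 group
common_slope = b[1]           # the same slope applies to both groups
print(intercept_d0, intercept_d1, common_slope)
```

The gap between the two estimated intercepts is exactly the coefficient on D, b[2].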

Example: Consider the estimated model
Ct = 101.36 + 0.86Yt – 204.95Dt
(3.98) (58.73) (-10.91)
(t-statistics in parentheses)
• where Ct = real per-capita personal consumption in year t;
Yt = real per-capita disposable income in year t; Dt = 1 if t = 1941,
…, 1946 (the war period in the USA), and 0 otherwise.
• Based on the t-statistic of -10.91, we can reject H0: β2 = 0
(the war had no effect on C) in favor of H1: β2 < 0 at the 1% level.
• Conclusion: during the war years the level of consumption
spending dropped significantly.
• If we consider two years, one from the war period and the other
not, with the same real per-capita disposable income, the level of
consumption spending during the war year was, on average,
$204.95 less per person than in the non-war year (b2 = –204.95).
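As a quick arithmetic check of this interpretation, the fitted equation can be evaluated for a war year and a non-war year at the same income. The function name and the income level of 1000 are hypothetical, chosen only for illustration.

```python
def predicted_consumption(income, war):
    """Fitted consumption from the estimated equation; war is 1 or 0."""
    return 101.36 + 0.86 * income - 204.95 * war

income = 1000.0  # hypothetical income level, not taken from the slides
gap = predicted_consumption(income, 1) - predicted_consumption(income, 0)
print(round(gap, 2))  # -204.95
```

The gap does not depend on the income level chosen, because the dummy shifts only the intercept.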
Ex. To capture a change (shift) in the slope:
Y = γ0 + γ1X + γ2XD + v
• If D = 1, Y = γ0 + γ1X + γ2X + v = γ0 + (γ1 + γ2)X + v, hence
the slope is γ1 + γ2.
• If D = 0, Y = γ0 + γ1X + v, hence the slope is γ1.
• There is no difference between the intercepts (both equal γ0).
• The difference between the two slopes is γ2.
Example: TEST_SCORE (Y) = 682.2 – 0.97 STR – 1.28 (STR×D)
• STR = Student/Teacher Ratio;
• D = 1 if the % of students still learning English in the district
is high (>10%), and 0 otherwise.
If D = 0: TEST_SCORE = 682.2 – 0.97 STR
If D = 1: TEST_SCORE = 682.2 – 0.97 STR – 1.28 STR
= 682.2 – 2.25 STR
• Reducing STR by 1 raises the predicted test score (Y) by 0.97
points in districts with a low % of students still learning English,
but by 2.25 points in districts with a high %.
• Note: The difference between the two effects, 2.25 – 0.97 = 1.28,
is the absolute value of the interaction coefficient.
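The two effects can be reproduced directly from the fitted equation. In this sketch the function names and the baseline ratio of 20 are hypothetical; only the three coefficients come from the example.

```python
def predicted_score(str_ratio, high_el):
    """Fitted test score; high_el is the dummy D (1 = high share of English learners)."""
    return 682.2 - 0.97 * str_ratio - 1.28 * str_ratio * high_el

def gain_from_cutting_str(high_el, base=20.0):
    """Change in the predicted score when STR falls by 1 from a baseline ratio."""
    return predicted_score(base - 1.0, high_el) - predicted_score(base, high_el)

low_gain = gain_from_cutting_str(0)
high_gain = gain_from_cutting_str(1)
print(round(low_gain, 2), round(high_gain, 2), round(high_gain - low_gain, 2))
```

Because the model is linear in STR, the gains do not depend on the baseline ratio.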

Ex. To capture changes (shifts) in both intercept and slope:
Y = δ0 + δ1X + δ2D + δ3XD + w
• If D = 1, Y = δ0 + δ1X + δ2 + δ3X + w,
hence Y = (δ0 + δ2) + (δ1 + δ3)X + w.
• If D = 0, Y = δ0 + δ1X + w.
• The difference in the intercepts is δ2.
• The difference in the slopes is δ3.
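One way to see what the model with both an intercept dummy and a slope dummy does: its least-squares fit reproduces the two straight lines obtained by fitting each group separately. The sketch below illustrates this on simulated data; the parameter values, sample size, and noise level are assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.uniform(0, 10, n)
D = (np.arange(n) % 2).astype(float)   # group indicator, alternating 0/1
# Assumed "true" parameters, chosen only for illustration.
Y = 1.0 + 2.0 * X + 0.5 * D - 1.0 * X * D + rng.normal(0, 0.3, n)

# Pooled model with the intercept dummy D and the slope dummy X*D.
Z = np.column_stack([np.ones(n), X, D, X * D])
d, *_ = np.linalg.lstsq(Z, Y, rcond=None)

# Separate straight-line fits per group; polyfit returns [slope, intercept].
slope0, int0 = np.polyfit(X[D == 0], Y[D == 0], 1)
slope1, int1 = np.polyfit(X[D == 1], Y[D == 1], 1)

matches_d0 = np.allclose([d[0], d[1]], [int0, slope0])
matches_d1 = np.allclose([d[0] + d[2], d[1] + d[3]], [int1, slope1])
print(matches_d0, matches_d1)
```

The pooled form is still useful because the coefficients on D and XD give the differences between the groups directly, together with their t-statistics.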

Example: Consider the estimated model
UNt = 2.75 – 1.53Vt + 1.15Dt – 0.85VtDt
(26.9) (-12.55) (3.63) (-1.98)
(t-statistics in parentheses)
• UN = unemployment rate (%)
• V = job vacancy rate (%)
• D = 1 for t from Q4 1966 onward (when the British government
increased the level of unemployment benefits), and 0 for t before
Q4 1966
• All the coefficients are statistically significant ⇒ there was,
indeed, a shift in the UN-to-V relationship in Q4 1966.
If D = 0: UNt = 2.75 – 1.53Vt
If D = 1: UNt = 2.75 – 1.53Vt + 1.15 – 0.85Vt,
hence UNt = 3.90 – 2.38Vt
The (D = 1) curve has a much steeper slope and a higher intercept
than the curve for the period before Q4 1966.
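• For a given job-vacancy rate (V), at least at low vacancy rates
where the higher intercept outweighs the steeper slope, there is
more unemployment (UN) after the 4th quarter of 1966 than before.
Why? If unemployment benefits are increased, the unemployed are
likely to take a longer time to look for a job.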
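The post-1966 line and the size of the shift follow from simple arithmetic on the four estimated coefficients. The sketch below does that arithmetic; the variable and function names are hypothetical, and the break-even vacancy rate is derived from the fitted equation rather than stated in the slides.

```python
# Estimated coefficients from the example: constant, V, D, and V*D.
b0, b1, b2, b3 = 2.75, -1.53, 1.15, -0.85

intercept_before, slope_before = b0, b1
intercept_after, slope_after = b0 + b2, b1 + b3

def unemployment_gap(v):
    """Fitted UN after Q4 1966 minus fitted UN before, at vacancy rate v."""
    return b2 + b3 * v

# Vacancy rate at which the two fitted lines cross.
break_even_v = -b2 / b3

print(round(intercept_after, 2), round(slope_after, 2))  # 3.9 -2.38
print(round(break_even_v, 2))                            # 1.35
```

Under the fitted equation the gap is positive for vacancy rates below about 1.35% and negative above it.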

Qualitative Variable with Many Categories. The number of possible
categories in a qualitative variable may be more than two.
Ex. Consider the following model:
Y = β0 + β1X + β2A1 + β3A2 + u
where Y = Savings, X = Income, and
• A1 = 1 if Age is 25-55, and 0 otherwise;
• A2 = 1 if Age >55, and 0 otherwise.
• Notice that the third group (Age<25) is not explicitly
represented in the model.
• This approach follows a general rule: when the model includes
a constant, the number of dummy variables included in the model
is always one less than the number of categories (groups).

• In our case, we have three age categories; therefore, only two
of them are specified explicitly in the model.
• What about the third one? This category (unspecified in the
model) is called the control group.
• The choice of the control group is up to the researcher and is
usually guided by the objective of the study.

• Notice that, based on the definition of A1 and A2, this control
group (i.e., Age<25) corresponds to the situation where A1 = 0
and A2 = 0.
• For the group (Age 25-55), A1 = 1 and A2 = 0,
• whereas for the group (Age>55), A1 = 0 and A2 = 1.
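The coding of the three age groups can be written as a small helper. This is a minimal sketch; the function name is hypothetical, and the cut-offs are the ones given in the definitions of A1 and A2.

```python
def age_dummies(age):
    """Return (A1, A2) for a household, with Age<25 as the control group."""
    a1 = 1 if 25 <= age <= 55 else 0   # middle group: Age 25-55
    a2 = 1 if age > 55 else 0          # oldest group: Age > 55
    return a1, a2

for age in (20, 40, 70):
    print(age, age_dummies(age))
```

The three calls print (0, 0), (1, 0), and (0, 1): one pattern per group, with the control group coded by zeros in both dummies.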
• The initial model,
Y = β0 + β1X + β2A1 + β3A2 + u,
after its estimation with OLS, can be rewritten in a separate form
for each group:
(i) For Age<25: Ŷ = b0 + b1X (both A1 and A2 are zero)
(ii) For Age 25-55: Ŷ = b0 + b1X + b2A1 with A1 = 1, hence
Ŷ = (b0 + b2) + b1X
(iii) For Age>55: Ŷ = b0 + b1X + b3A2 with A2 = 1, hence
Ŷ = (b0 + b3) + b1X
• In model (ii), b2 is an estimate of the difference in intercept
between a middle-aged (Age 25-55) and a young (Age<25)
household. Indeed, (b0 + b2) – b0 = b2.
• In model (iii), b3 is an estimate of the difference in intercept
between an old (Age>55) and a young (Age<25) household.
Indeed, (b0 + b3) – b0 = b3.
• The coefficients b2 and b3 thus measure how far the intercepts
in models (ii) and (iii) deviate from that of the control group.
• There is a special reason for not defining a third dummy
variable, A3 (= 1 if Age<25, and 0 otherwise), when there are
three categories.
• If we had done this, we would have had exact multicollinearity,
because then A1 + A2 + A3 = 1 for every observation, which
duplicates the constant term.
• This situation is called the dummy variable trap.
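The trap shows up as a rank deficiency in the matrix of regressors. The sketch below uses six made-up households (the ages and incomes are assumptions chosen only for illustration) and compares the rank of the design matrix with two and with three age dummies.

```python
import numpy as np

# Hypothetical households; the numbers are not from the slides.
age = np.array([20, 30, 60, 45, 22, 70])
X = np.array([15.0, 40.0, 35.0, 52.0, 18.0, 28.0])   # income
const = np.ones(len(age))

A1 = ((age >= 25) & (age <= 55)).astype(float)
A2 = (age > 55).astype(float)
A3 = (age < 25).astype(float)     # the extra dummy that causes the trap

two_dummies = np.column_stack([const, X, A1, A2])        # 4 columns
three_dummies = np.column_stack([const, X, A1, A2, A3])  # 5 columns

rank_two = np.linalg.matrix_rank(two_dummies)
rank_three = np.linalg.matrix_rank(three_dummies)
print(rank_two, rank_three)  # 4 4
```

Adding A3 adds a column but no new information: the rank stays at 4, so the five coefficients cannot be determined uniquely by OLS.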
• For example, if we wanted to capture seasonal differences in
the relationship between electricity consumption and temperature,
we would define three dummy variables (because there are four
seasons).
• To capture monthly differences, we need 11 dummy variables.
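The "one less than the number of categories" rule can be sketched as a small helper that builds the dummy columns and leaves out the control category. The function name is hypothetical, and taking quarter 1 and month 1 as the control categories is an arbitrary assumption.

```python
import numpy as np

def category_dummies(labels, categories, control):
    """Return the kept categories and one 0/1 column for each of them."""
    labels = np.asarray(labels)
    kept = [c for c in categories if c != control]
    columns = [(labels == c).astype(float) for c in kept]
    return kept, np.column_stack(columns)

quarters = [1, 2, 3, 4, 1, 2, 3, 4]            # two years of quarterly data
kept_q, Dq = category_dummies(quarters, [1, 2, 3, 4], control=1)

months = list(range(1, 13))                    # one year of monthly data
kept_m, Dm = category_dummies(months, months, control=1)

print(Dq.shape, Dm.shape)  # (8, 3) (12, 11)
```

Observations in the control category have zeros in every dummy column.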

• Note: However, if the model is formulated without a constant,
there is no problem with exact multicollinearity, even when this
third dummy variable is included in the model:
Y = β1X + β2A1 + β3A2 + β4A3 + u
• Additional examples of the use of dummy variables: CN, p. 101
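The model without a constant and the model with a constant and two dummies describe the same fit in different terms. In the sketch below, the six households and their savings figures are made up for illustration. The coefficients on A1, A2, and A3 in the no-constant model come out as the group intercepts b0 + b2, b0 + b3, and b0 of the model with a constant.

```python
import numpy as np

# Hypothetical data: age, income X, and savings Y for six households.
age = np.array([20, 30, 60, 45, 22, 70])
X = np.array([15.0, 40.0, 35.0, 52.0, 18.0, 28.0])
Y = np.array([1.0, 6.5, 7.0, 8.5, 1.2, 6.0])

A1 = ((age >= 25) & (age <= 55)).astype(float)
A2 = (age > 55).astype(float)
A3 = (age < 25).astype(float)

# Model with a constant and two dummies: coefficients b0, b1, b2, b3.
with_const = np.column_stack([np.ones(len(age)), X, A1, A2])
b, *_ = np.linalg.lstsq(with_const, Y, rcond=None)

# Model without a constant and with all three dummies.
no_const = np.column_stack([X, A1, A2, A3])
g, *_ = np.linalg.lstsq(no_const, Y, rcond=None)

same_fit = np.allclose(g, [b[1], b[0] + b[2], b[0] + b[3], b[0]])
print(same_fit)
```

The choice between the two forms is a matter of convenience: with a constant, the dummy coefficients are differences from the control group; without it, they are the group intercepts themselves.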
Estimating Seasonal Effects.
Another example of the use of
dummy variables occurs in
estimating seasonal effects of
independent variables (see CN, p.
101)