VARIABLES
Qualitative independent variables
(such as wartime vs. peacetime, periods
of strike vs. nonstrike, male vs. female,
etc.) can be introduced into regression
analysis by assigning the value of 1 for
one classification (e.g., wartime) and 0
for the other (e.g., peacetime).
Such variables are called dummy variables and are treated like any other variable.
Dummy (indicator, binary, categorical, qualitative, dichotomous) variables can be used to capture changes (shifts) in:
• the intercept: $Y = \beta_0 + \beta_1 X + \beta_2 D + u$
• the slope: $Y = \gamma_0 + \gamma_1 X + \gamma_2 XD + v$
• both the intercept and the slope: $Y = \delta_0 + \delta_1 X + \delta_2 D + \delta_3 XD + w$,
where D is 1 for one classification (e.g., female) and 0 otherwise (e.g., male), and X is an ordinary quantitative explanatory variable.
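As a minimal Python sketch of how one observation enters each of the three specifications (the function name `design_row` and the spec labels are hypothetical): the interaction regressor XD is just the product of X and D, so it equals X in the D = 1 group and 0 in the D = 0 group.

```python
def design_row(x: float, d: int, spec: str) -> list:
    """Regressors (constant first) for one observation under each specification."""
    if spec == "intercept":  # Y = b0 + b1*X + b2*D + u
        return [1.0, x, float(d)]
    if spec == "slope":      # Y = g0 + g1*X + g2*(X*D) + v
        return [1.0, x, x * d]
    if spec == "both":       # Y = d0 + d1*X + d2*D + d3*(X*D) + w
        return [1.0, x, float(d), x * d]
    raise ValueError(f"unknown specification: {spec}")

# One D = 1 observation with X = 12 under each specification
for spec in ("intercept", "slope", "both"):
    print(spec, design_row(12.0, 1, spec))
```

Stacking such rows for all observations gives the regressor matrix passed to OLS.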
Ex. To capture changes (shifts) in the intercept: $Y = \beta_0 + \beta_1 X + \beta_2 D + u$
Note: if D = 1, $Y_{D=1} = \beta_0 + \beta_1 X + \beta_2 + u = (\beta_0 + \beta_2) + \beta_1 X + u$, hence the intercept is $\beta_0 + \beta_2$;
and if D = 0, $Y_{D=0} = \beta_0 + \beta_1 X + u$, hence the intercept is $\beta_0$.
There is no difference between the two slopes (both equal $\beta_1$).
The difference between the two intercepts is $\beta_2$.
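A quick numerical check of this derivation, as a minimal Python sketch with made-up coefficients (b0 = 10, b1 = 0.5, b2 = -3 are hypothetical): the vertical gap between the D = 1 and D = 0 lines is the same at every X, so the lines are parallel and differ only by b2.

```python
# Hypothetical coefficients for Y = b0 + b1*X + b2*D (illustration only)
B0, B1, B2 = 10.0, 0.5, -3.0

def predict(x: float, d: int) -> float:
    """Fitted Y for a given X and dummy value D."""
    return B0 + B1 * x + B2 * d

# Gap between the D = 1 and D = 0 lines at several X values
gaps = [predict(x, 1) - predict(x, 0) for x in (0.0, 10.0, 25.0)]
print(gaps)  # [-3.0, -3.0, -3.0]
```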
Example: Consider the estimated model
$\hat{C}_t = 101.36 + 0.86\,Y_t - 204.95\,D_t$
(3.98) (58.73) (-10.91)   [t-statistics in parentheses]
where Ct = real per-capita personal consumption in year t; Yt = real per-capita disposable income in year t; Dt = 1 if t = 1941, …, 1946 (the war period in the USA), and 0 otherwise.
• Based on the t-statistic of -10.91, we can reject H0: β2 = 0 (the war had no effect on C) in favor of H1: β2 < 0 at the 1% level.
Conclusion: during the war years, the level of consumption spending was significantly lower.
• If we consider two years, one from the war period and one not, with the same real per-capita disposable income, consumption spending in the war year was, on average, 204.95 dollars less per person than in the non-war year (b2 = -204.95).
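The comparison above can be reproduced with the reported coefficients. In this minimal Python sketch the income level of 1000 is a hypothetical value; any income gives the same war/non-war gap, because D only shifts the intercept.

```python
# Reported fitted equation: C_hat = 101.36 + 0.86 * Y - 204.95 * D
B0, B1, B2 = 101.36, 0.86, -204.95

def predicted_consumption(income: float, war: int) -> float:
    """Fitted real per-capita consumption for a given income and war dummy D."""
    return B0 + B1 * income + B2 * war

income = 1000.0  # hypothetical real per-capita disposable income, same in both years
c_peace = predicted_consumption(income, war=0)
c_war = predicted_consumption(income, war=1)
print(round(c_war - c_peace, 2))  # -204.95
```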
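Ex. To capture changes (shifts) in the slope: $Y = \gamma_0 + \gamma_1 X + \gamma_2 XD + v$
Note: if D = 1, $Y_{D=1} = \gamma_0 + \gamma_1 X + \gamma_2 X + v = \gamma_0 + (\gamma_1 + \gamma_2) X + v$, hence the slope is $\gamma_1 + \gamma_2$;
and if D = 0, $Y_{D=0} = \gamma_0 + \gamma_1 X + v$, hence the slope is $\gamma_1$.
There is no difference between the two intercepts (both equal $\gamma_0$).
The difference between the two slopes is $\gamma_2$.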
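A minimal Python sketch checking this with hypothetical coefficients (g0 = 5, g1 = 2, g2 = 1.5 are made up): raising X by one unit moves Y by g1 in the D = 0 group and by g1 + g2 in the D = 1 group, while both lines share the intercept g0.

```python
# Hypothetical coefficients for Y = g0 + g1*X + g2*X*D (illustration only)
G0, G1, G2 = 5.0, 2.0, 1.5

def predict(x: float, d: int) -> float:
    return G0 + G1 * x + G2 * x * d

def slope(d: int, x: float = 3.0) -> float:
    """Change in Y when X rises by one unit, holding the group D fixed."""
    return predict(x + 1.0, d) - predict(x, d)

def intercept(d: int) -> float:
    """Value of Y at X = 0 for group D."""
    return predict(0.0, d)

print(slope(0), slope(1))          # 2.0 3.5
print(intercept(0), intercept(1))  # 5.0 5.0
```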
Example (Y = TEST_SCORE): $\text{TEST\_SCORE} = 682.2 - 0.97\,\text{STR} - 1.28\,(\text{STR} \times D)$
STR = student/teacher ratio;
D = 1 if the percentage of students still learning English in the district is high (>10%), and 0 otherwise.
If D = 0: $\text{TEST\_SCORE} = 682.2 - 0.97\,\text{STR}$
If D = 1: $\text{TEST\_SCORE} = 682.2 - 0.97\,\text{STR} - 1.28\,\text{STR} = 682.2 - 2.25\,\text{STR}$
Reducing STR by 1 raises the predicted Y by 0.97 points in districts with a low percentage of students still learning English, but by 2.25 points in districts with a high percentage.
Note: the difference between the two effects, 2.25 - 0.97 = 1.28, is the absolute value of the coefficient of the interaction term STR × D (-1.28).
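The effects quoted above follow directly from the reported coefficients. In this minimal Python sketch the starting ratio of 20 is a hypothetical value; because the model is linear in STR, the one-unit effect does not depend on it.

```python
# Reported fitted equation: TEST_SCORE = 682.2 - 0.97*STR - 1.28*(STR*D)
B0, B_STR, B_INTER = 682.2, -0.97, -1.28

def predicted_score(str_ratio: float, d: int) -> float:
    return B0 + B_STR * str_ratio + B_INTER * str_ratio * d

def gain_from_smaller_classes(d: int, str_ratio: float = 20.0) -> float:
    """Change in predicted score when STR falls by 1 (starting STR is hypothetical)."""
    return predicted_score(str_ratio - 1.0, d) - predicted_score(str_ratio, d)

low = gain_from_smaller_classes(0)   # low share of English learners
high = gain_from_smaller_classes(1)  # high share of English learners
print(round(low, 2), round(high, 2), round(high - low, 2))  # 0.97 2.25 1.28
```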
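Ex. To capture changes (shifts) in both the intercept and the slope: $Y = \delta_0 + \delta_1 X + \delta_2 D + \delta_3 XD + w$
Note: if D = 1, $Y_{D=1} = \delta_0 + \delta_1 X + \delta_2 + \delta_3 X + w$, hence $Y_{D=1} = (\delta_0 + \delta_2) + (\delta_1 + \delta_3) X + w$;
and if D = 0, $Y_{D=0} = \delta_0 + \delta_1 X + w$.
The difference in the intercepts is $\delta_2$.
The difference in the slopes is $\delta_3$.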
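To see OLS recover these differences, here is a minimal Python sketch on hypothetical, noise-free data: group D = 0 lies on Y = 1 + 2X and group D = 1 on Y = 4 + 0.5X, so the fit should give δ2 = 3 and δ3 = -1.5. The small `ols` helper (normal equations solved by Gauss-Jordan elimination) is written here for illustration only; in practice a statistics library would be used.

```python
def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting (adequate for tiny examples)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical noise-free data from two different lines
rows, y = [], []
for d in (0, 1):
    for x in (1.0, 2.0, 3.0, 4.0, 5.0):
        rows.append([1.0, x, float(d), x * d])  # constant, X, D, X*D
        y.append(1.0 + 2.0 * x if d == 0 else 4.0 + 0.5 * x)

d0, d1, d2, d3 = ols(rows, y)
print([round(v, 6) for v in (d0, d1, d2, d3)])  # [1.0, 2.0, 3.0, -1.5]
print(round(d0 + d2, 6), round(d1 + d3, 6))     # 4.0 0.5 (the D = 1 line)
```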
Example: Consider the estimated model
$\widehat{UN}_t = 2.75 - 1.53\,V_t + 1.15\,D_t - 0.85\,V_t D_t$
(26.9) (-12.55) (3.63) (-1.98)   [t-statistics in parentheses]
where UN = unemployment rate (%); V = job vacancy rate (%); D = 1 for Q4 1966 and later quarters, and 0 for quarters before Q4 1966 (when the British government increased the level of unemployment benefits).
All the coefficients are statistically significant, so there was, indeed, a shift in the relationship between UN and V in Q4 1966.
If D = 0: $\widehat{UN}_t = 2.75 - 1.53\,V_t$
If D = 1: $\widehat{UN}_t = 2.75 - 1.53\,V_t + 1.15 - 0.85\,V_t$, hence $\widehat{UN}_t = 3.90 - 2.38\,V_t$
The D = 1 line has a much steeper (more negative) slope and a higher intercept than the line for the period before Q4 1966.
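For a given job-vacancy rate (V), there is more unemployment (UN) after Q4 1966 than before, at least for vacancy rates below about 1.35%, where the two fitted lines cross (1.15 - 0.85V = 0 at V ≈ 1.35). Why? If unemployment benefits are increased, the unemployed are likely to take longer to look for a job.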
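The post-1966 line and the crossing point can be computed from the reported coefficients. A minimal Python sketch (the function names are hypothetical); the gap `post_minus_pre(v)` is positive below the crossing point and negative above it.

```python
# Reported coefficients of UN = d0 + d1*V + d2*D + d3*V*D
delta0, delta1, delta2, delta3 = 2.75, -1.53, 1.15, -0.85

def fitted_line(d: int) -> tuple:
    """(intercept, slope) of the fitted UN-V line for D = 0 or D = 1."""
    return delta0 + delta2 * d, delta1 + delta3 * d

a0, s0 = fitted_line(0)  # before Q4 1966
a1, s1 = fitted_line(1)  # Q4 1966 and later
print(round(a1, 2), round(s1, 2))  # 3.9 -2.38

def post_minus_pre(v: float) -> float:
    """Extra fitted unemployment after Q4 1966 at vacancy rate v (= 1.15 - 0.85 v)."""
    return (a1 + s1 * v) - (a0 + s0 * v)

crossing = -delta2 / delta3  # vacancy rate at which the two fitted lines intersect
print(round(crossing, 2))    # 1.35
```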
Qualitative Variable with Many Categories. A qualitative variable may have more than two categories.
Ex. Consider the following model:
$Y = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2 + u$
where Y = savings, X = income, and
A1 = 1 if age is 25-55, and 0 otherwise;
A2 = 1 if age > 55, and 0 otherwise.
Notice that the third group (Age < 25) is not explicitly represented in the model.
This follows a general rule: in a model with a constant term, the number of dummy variables is one less than the number of categories (groups).
In our case there are three age categories; therefore, only two of them are specified explicitly in the model. What about the third one? This category (not specified in the model) is called the control group (also called the reference or base group).
The choice of the control group is arbitrary and is usually based on the objective of the study.
Notice that, given the definitions of A1 and A2, the control group (Age < 25) corresponds to A1 = 0 and A2 = 0.
For the group Age 25-55, A1 = 1 and A2 = 0, whereas for the group Age > 55, A1 = 0 and A2 = 1.
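The coding rule above can be written as a small function. This is a minimal Python sketch; the name `age_dummies` is hypothetical, and it assumes the 25-55 band includes both endpoints, so "Age > 55" starts strictly above 55.

```python
def age_dummies(age: float) -> tuple:
    """Return (A1, A2) for one household; Age < 25 is the control group (0, 0).

    Assumes the 25-55 band includes both endpoints.
    """
    a1 = 1 if 25 <= age <= 55 else 0
    a2 = 1 if age > 55 else 0
    return a1, a2

for age in (22, 40, 55, 63):
    print(age, age_dummies(age))
```

This prints `(0, 0)` for 22, `(1, 0)` for 40 and 55, and `(0, 1)` for 63; no household ever gets `(1, 1)`.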
The initial model
$Y = \beta_0 + \beta_1 X + \beta_2 A_1 + \beta_3 A_2 + u$,
after estimation by OLS, can be rewritten in several forms:
(i) For Age < 25 (A1 = A2 = 0): $\hat{Y} = b_0 + b_1 X$
(ii) For Age 25-55 (A1 = 1, A2 = 0): $\hat{Y} = b_0 + b_1 X + b_2$, hence $\hat{Y} = (b_0 + b_2) + b_1 X$
(iii) For Age > 55 (A1 = 0, A2 = 1): $\hat{Y} = b_0 + b_1 X + b_3$, hence $\hat{Y} = (b_0 + b_3) + b_1 X$
In model (ii), b2 is an estimate of the difference in intercept between a middle-aged (Age 25-55) and a young (Age < 25) household. Indeed, (b0 + b2) - b0 = b2.
In model (iii), b3 is an estimate of the difference in intercept between an old (Age > 55) and a young (Age < 25) household. Indeed, (b0 + b3) - b0 = b3.
The dummy coefficients b2 and b3 are thus the deviations of the intercepts in models (ii) and (iii) from the intercept of the control group, b0.
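A minimal Python sketch of these three fitted forms, using made-up estimates (b0 = 200, b1 = 0.25, b2 = 500, b3 = 300 are hypothetical, not from any data set). At the same income, each group's predicted savings differ from the control group's by exactly its dummy coefficient.

```python
# Hypothetical OLS estimates (illustration only, not from any data set)
b0, b1, b2, b3 = 200.0, 0.25, 500.0, 300.0

def predicted_savings(income: float, a1: int, a2: int) -> float:
    """Fitted savings: Y_hat = b0 + b1*X + b2*A1 + b3*A2."""
    return b0 + b1 * income + b2 * a1 + b3 * a2

groups = {"Age<25": (0, 0), "Age 25-55": (1, 0), "Age>55": (0, 1)}
income = 40000.0  # the same hypothetical income for every household
base = predicted_savings(income, 0, 0)  # control group
gaps = {name: predicted_savings(income, a1, a2) - base
        for name, (a1, a2) in groups.items()}
print(gaps)  # {'Age<25': 0.0, 'Age 25-55': 500.0, 'Age>55': 300.0}
```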
There is a specific reason for not defining a third dummy variable, A3 (= 1 if Age < 25, and 0 otherwise), when there are three categories.
If we did, we would have exact multicollinearity, because then A1 + A2 + A3 = 1 for every observation, which duplicates the constant term.
This situation is called the dummy variable trap.
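The trap can be verified by computing the exact rank of the regressor matrix. In this minimal Python sketch, the household data are hypothetical and `rank` is a small helper written for illustration (exact arithmetic via `fractions.Fraction`). With a constant plus all three dummies, the five columns have rank 4, so X'X is singular; dropping A3 restores full column rank.

```python
from fractions import Fraction

def rank(matrix):
    """Exact rank of a small matrix by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c is a combination of the pivot columns found so far
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical households: (income, age)
households = [(30, 20), (35, 22), (45, 30), (50, 40), (60, 50), (55, 60), (40, 70)]

def groups(age):
    """(A1, A2, A3): Age 25-55, Age > 55, Age < 25."""
    return (int(25 <= age <= 55), int(age > 55), int(age < 25))

with_trap = [[1, inc, *groups(age)] for inc, age in households]   # constant + A1, A2, A3
no_trap = [[1, inc, *groups(age)[:2]] for inc, age in households]  # constant + A1, A2

print(len(with_trap[0]), rank(with_trap))  # 5 4 -> X'X singular
print(len(no_trap[0]), rank(no_trap))      # 4 4 -> full column rank
```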
For example, if we wanted to capture seasonal differences in a regression of electricity consumption on temperature, we would define three dummy variables (because there are four seasons).
To capture monthly differences, we would need 11 dummy variables.
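A minimal Python sketch of this "one fewer dummy than categories" coding. It assumes seasons are coded 1 to 4 and months 1 to 12, with season 1 and month 1 chosen (arbitrarily) as the control groups; the function names are hypothetical.

```python
def season_dummies(season: int) -> list:
    """Three dummies for seasons 2, 3, 4; season 1 is the control group."""
    if season not in (1, 2, 3, 4):
        raise ValueError("season must be 1, 2, 3 or 4")
    return [1 if season == s else 0 for s in (2, 3, 4)]

def month_dummies(month: int) -> list:
    """Eleven dummies for months 2-12; month 1 is the control group."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if month == m else 0 for m in range(2, 13)]

print(season_dummies(1), season_dummies(3))          # [0, 0, 0] [0, 1, 0]
print(len(month_dummies(7)), sum(month_dummies(7)))  # 11 1
```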
Note: if the model is formulated without a constant, however, there is no exact multicollinearity even when the third dummy variable is included in the model:
$Y = \beta_1 X + \beta_2 A_1 + \beta_3 A_2 + \beta_4 A_3 + u$
In this case, each dummy coefficient is the intercept of its own group.
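A minimal Python sketch of the no-constant version on hypothetical, noise-free data: savings = group intercept + 0.5 × income, with made-up group intercepts 5 (A1), 8 (A2) and 2 (A3). Without a constant column, OLS returns each group's intercept directly as its dummy coefficient. The `ols` helper is written here for illustration only.

```python
def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting (adequate for tiny examples)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical group intercepts: A1 = Age 25-55, A2 = Age > 55, A3 = Age < 25
group_intercepts = {"A1": 5.0, "A2": 8.0, "A3": 2.0}
obs = [(10.0, "A1"), (20.0, "A1"), (15.0, "A2"), (30.0, "A2"), (12.0, "A3"), (25.0, "A3")]
rows, y = [], []
for income, group in obs:
    dummies = [1.0 if group == g else 0.0 for g in ("A1", "A2", "A3")]
    rows.append([income] + dummies)  # X, A1, A2, A3 and no constant column
    y.append(group_intercepts[group] + 0.5 * income)

b = ols(rows, y)
print([round(v, 6) for v in b])  # [0.5, 5.0, 8.0, 2.0]
```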
Additional examples of the use of
dummy variables: CN, p. 101
Estimating Seasonal Effects.
Another example of the use of
dummy variables occurs in
estimating seasonal effects of
independent variables (see CN, p.
101)