Dummy Variable Regression Models (Lec 8-9) : 1 Nguyen Thu Hang, BMNV, FTU CS2

CHAPTER 3
Dummy Variable Regression Models

(Lec 8-9)
Nguyen Thu Hang, BMNV, FTU CS2

Outline
1. Multiple regression analysis with qualitative
information
1.1. The nature of qualitative information
1.2. Describing Qualitative Information
1.3. A single dummy independent variable
1.4. Interaction with dummy variables
1.5. Dummies for Multiple Categories
2. Testing for Differences Across Groups
2.1. The Chow test
2.2. The use of dummy variables
3. The dummy variables in seasonal analysis
1.1. The Nature of Qualitative Information
Sometimes we cannot obtain numerical values for all the variables we want to use in a model.
Examples:
(a) Gender may play a role in determining salary levels
(b) Different ethnic groups may follow different
consumption patterns
(c) Educational levels can affect earnings from
employment

1.1. The Nature of Qualitative Information
It is easier to define dummies for cross-sectional variables, but sometimes we use them for time series as well.
Examples:
(a) Changes in a political regime may affect production
(b) A war can have an impact on economic activities
(c) Certain days in a week or certain months in a year can have different effects on the fluctuation of stock prices
(d) Seasonal effects are often observed in the demand for various products
1.2. Describing Qualitative Information
A dummy variable (binary variable) is a variable that takes on the value 1 or 0.
• The name of a dummy variable should indicate the event coded as one.
Example (see Table 3.1):
• female (= 1 if the person is female, 0 otherwise),
• married (= 1 if the person is married, 0 otherwise), etc.

Table 3.1 A Partial Listing of the Data in WAGE1.RAW
person   wage   educ   exper   female   married
1        3.10   11     2       1        0
2        3.24   12     22      1        1
3        3.00   11     2       0        0
4        6.00   8      44      0        1
5        5.30   12     7       0        1

1.3. A single dummy independent variable
• A simple model with one continuous variable (x) and one dummy (d):
$y = \beta_0 + \delta_0 d + \beta_1 x + u$
$y = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + u$
• This can be interpreted as an intercept shift:
If d = 0, then $y = \beta_0 + \beta_1 x + u$
If d = 1, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$
• The case of d = 0 is the base group.

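The intercept-shift interpretation can be checked numerically. The sketch below is only an illustration: it drops the continuous variable x, regresses wage on the female dummy alone, and uses just the five rows of Table 3.1, so the numbers are not estimates for the full WAGE1 sample. The helper names `ols_single_dummy` and `group_mean` are made up for this example. In this special case the OLS intercept equals the mean of the base group (d = 0) and the dummy coefficient equals the difference in group means.

```python
# (wage, female) pairs copied from the five rows of Table 3.1.
rows = [(3.10, 1), (3.24, 1), (3.00, 0), (6.00, 0), (5.30, 0)]


def ols_single_dummy(data):
    """OLS of y on an intercept and one 0/1 regressor d (no other regressors)."""
    n = len(data)
    y_bar = sum(y for y, _ in data) / n
    d_bar = sum(d for _, d in data) / n
    cov = sum((d - d_bar) * (y - y_bar) for y, d in data)
    var = sum((d - d_bar) ** 2 for _, d in data)
    slope = cov / var                  # estimate of delta_0
    intercept = y_bar - slope * d_bar  # estimate of beta_0 (base group, d = 0)
    return intercept, slope


def group_mean(data, d_value):
    """Average y over the observations with d equal to d_value."""
    ys = [y for y, d in data if d == d_value]
    return sum(ys) / len(ys)


b0_hat, d0_hat = ols_single_dummy(rows)
print(f"intercept = {b0_hat:.4f}, dummy coefficient = {d0_hat:.4f}")
print(f"mean(d=0) = {group_mean(rows, 0):.4f}, mean(d=1) = {group_mean(rows, 1):.4f}")
```

Once other regressors such as educ are added, the dummy coefficient is no longer a raw difference in means; it becomes the difference holding those regressors fixed.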
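Example of $\delta_0 > 0$
[Figure: y plotted against x, two parallel lines with slope $\beta_1$. The line for d = 0, $y = \beta_0 + \beta_1 x$, has intercept $\beta_0$; the line for d = 1, $y = (\beta_0 + \delta_0) + \beta_1 x$, lies above it with intercept $\beta_0 + \delta_0$.]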
Example
• Model of wage determination
$y = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$
• H0: there is no difference between the earnings of men and women:
$\delta_0 = 0$
• H1: there is discrimination against women:
$\delta_0 < 0$
Variables
wage = average hourly earnings (the dependent variable y)
female = 1 if female, 0 otherwise
educ = years of education
exper = years of potential experience
tenure = years with current employer
Example
• Output
. reg wage female educ exper tenure

Source       |         SS    df          MS      Number of obs =     526
Model        | 2603.10658     4  650.776644      F(4, 521)     =   74.40
Residual     | 4557.30771   521   8.7472317      Prob > F      =  0.0000
Total        | 7160.41429   525  13.6388844      R-squared     =  0.3635
                                                 Adj R-squared =  0.3587
                                                 Root MSE      =  2.9576

wage         |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -1.810852    .2648252   -6.84    0.000   -2.331109  -1.290596
educ         |   .5715048    .0493373   11.58    0.000    .4745802   .6684293
exper        |   .0253959    .0115694    2.20    0.029    .0026674   .0481243
tenure       |   .1410051    .0211617    6.66    0.000    .0994323   .1825778
_cons        |  -1.567939    .7245511   -2.16    0.031   -2.991339   -.144538

Example- interpreting the output
- Negative intercept: the intercept (the intercept for men) is not very meaningful because no one in the sample has zero values of educ, exper, and tenure.
- Negative coefficient on female: this is the average difference in hourly earnings between a woman and a man with the same levels of educ, exper, and tenure. The woman earns, on average, USD 1.81 less per hour than the man.
The intercept for men: $\beta_0$; for women: $\beta_0 + \delta_0$
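To make the "same levels of educ, exper and tenure" reading concrete, here is a minimal sketch that plugs the estimated coefficients from the output into the fitted equation. The function name `predicted_wage` and the worker profile (12 years of education, 10 of experience, 5 of tenure) are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the Stata output (wage in USD per hour).
coef = {
    "_cons": -1.567939,
    "female": -1.810852,
    "educ": 0.5715048,
    "exper": 0.0253959,
    "tenure": 0.1410051,
}


def predicted_wage(female, educ, exper, tenure):
    """Fitted hourly wage from the estimated equation."""
    return (coef["_cons"] + coef["female"] * female + coef["educ"] * educ
            + coef["exper"] * exper + coef["tenure"] * tenure)


# Hypothetical profile, identical for both workers except for the dummy.
man = predicted_wage(0, educ=12, exper=10, tenure=5)
woman = predicted_wage(1, educ=12, exper=10, tenure=5)
gap = woman - man  # equals the coefficient on female for any shared profile
print(f"man: {man:.2f}, woman: {woman:.2f}, gap: {gap:.2f}")
```

Whatever profile is chosen, the gap stays the same, because the dummy shifts only the intercept.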
Example
Do not include both male and female in a model with an intercept: this is the dummy variable trap, caused by perfect collinearity (female + male = 1 for every observation, which duplicates the constant).
$y = \beta_0 + \delta_0\,\text{female} + \delta_1\,\text{male} + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$
. reg wage female male educ exper tenure
note: male omitted because of collinearity

Source       |         SS    df          MS      Number of obs =     526
Model        | 2603.10658     4  650.776644      F(4, 521)     =   74.40
Total        | 7160.41429   525  13.6388844      Prob > F      =  0.0000
                                                 Root MSE      =  2.9576

wage         |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -1.810852    .2648252   -6.84    0.000   -2.331109  -1.290596
male         |          0  (omitted)
educ         |   .5715048    .0493373   11.58    0.000    .4745802   .6684293
exper        |   .0253959    .0115694    2.20    0.029    .0026674   .0481243
tenure       |   .1410051    .0211617    6.66    0.000    .0994323   .1825778
_cons        |  -1.567939    .7245511   -2.16    0.031   -2.991339   -.144538
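Why Stata drops male can be seen from the cross-product matrix X'X. The sketch below is an illustration under simplifying assumptions: it uses only the five female values from Table 3.1, builds a complementary male dummy, and ignores the other regressors. The helpers `gram` and `det3` are hypothetical names written for this example. With a constant, female, and male together, X'X is singular, so the OLS normal equations have no unique solution.

```python
female = [1, 1, 0, 0, 0]            # female column of Table 3.1
male = [1 - f for f in female]      # complementary dummy, built for illustration
const = [1] * len(female)           # the intercept column


def gram(columns):
    """Cross-product matrix X'X for a list of regressor columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Constant + female + male: perfectly collinear, determinant is zero.
det_trap = det3(gram([const, female, male]))

# female + male without a constant: X'X is invertible.
g = gram([female, male])
det_no_constant = g[0][0] * g[1][1] - g[0][1] * g[1][0]

print(det_trap, det_no_constant)
```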
Example
• There is no dummy variable trap if we do not have an overall intercept.
$y = \delta_0\,\text{female} + \delta_1\,\text{male} + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$
. reg wage female male educ exper tenure, nocons

Source       |         SS    df          MS      Number of obs =     526
Model        | 20888.9846     5  4177.79693      F(5, 521)     =  477.61
Total        | 25446.2924   526  48.3769817      Prob > F      =  0.0000
                                                 Root MSE      =  2.9576

wage         |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -3.378791    .7028798   -4.81    0.000   -4.759618  -1.997964
male         |  -1.567939    .7245511   -2.16    0.031   -2.991339   -.144538
educ         |   .5715048    .0493373   11.58    0.000    .4745802   .6684293
exper        |   .0253959    .0115694    2.20    0.029    .0026674   .0481243
tenure       |   .1410051    .0211617    6.66    0.000    .0994323   .1825778
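The two parameterisations describe the same fitted model. As a quick check, using only numbers printed in the outputs, the group intercepts in the no-constant regression can be rebuilt from the regression with an intercept; the variable names below are made up for the illustration.

```python
# From the regression with an intercept (male is the base group).
cons_with_intercept = -1.567939     # _cons
female_with_intercept = -1.810852   # coefficient on female

# Implied group-specific intercepts, as reported with the nocons option.
male_nocons = cons_with_intercept
female_nocons = cons_with_intercept + female_with_intercept

print(f"male: {male_nocons:.6f}, female: {female_nocons:.6f}")
```

The slope coefficients on educ, exper, and tenure are identical in both outputs; only the way the two intercepts are reported changes.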

Example
• Reestimate the wage equation using log(wage) as the dependent variable and adding quadratics in exper and tenure.
. reg lwage female educ exper expersq tenure tenursq

Source       |         SS    df          MS      Number of obs =     526
Model        | 65.3791009     6  10.8965168      F(6, 519)     =   68.18
Residual     | 82.9506505   519  .159827843      Prob > F      =  0.0000
Total        | 148.329751   525   .28253286      R-squared     =  0.4408
                                                 Root MSE      =  .39978

lwage        |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |   -.296511    .0358055   -8.28    0.000   -.3668524  -.2261696
educ         |   .0801967    .0067573   11.87    0.000    .0669217   .0934716
exper        |   .0294324    .0049752    5.92    0.000    .0196585   .0392063
expersq      |  -.0005827    .0001073   -5.43    0.000   -.0007935  -.0003719
tenure       |   .0317139    .0068452    4.63    0.000    .0182663   .0451616
tenursq      |  -.0005852    .0002347   -2.49    0.013   -.0010463  -.0001241
_cons        |    .416691    .0989279    4.21    0.000    .2223425   .6110394
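Example- Interpreting output
• The coefficient on female implies that, for the same levels of educ, exper, and tenure, women earn about 29.7% less than men. This uses the approximation that 100 times the coefficient is the percentage difference when the dependent variable is log(wage).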
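The 29.7% figure reads the log-wage coefficient directly as a proportion, which is only an approximation when the coefficient is not small. A short sketch of the exact conversion, using the standard formula 100·[exp(coefficient) − 1], is given below; the variable names are illustrative.

```python
import math

delta_female = -0.296511  # coefficient on female in the lwage regression

approx_pct = 100 * delta_female                 # direct reading of the coefficient
exact_pct = 100 * (math.exp(delta_female) - 1)  # exact percentage difference

print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The exact difference is smaller in absolute value than the approximation, roughly 25.7% rather than 29.7%.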

1.4. Interactions with Dummies
An interaction term is an independent variable that is the product of two other independent variables. These independent variables can be continuous or dummy variables.
$Y_t = \beta_1 + \beta_2 X_t + \beta_3 Z_t + \beta_4 X_t Z_t + e_t$
In this model, the effect of X on Y will depend on the level of Z.
In this model, the effect of Z on Y will depend on the level of X.
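The dependence of each effect on the other variable follows from differentiating the model. A short derivation, treating X and Z as continuous and assuming the coefficients are labelled β1 to β4 in the order of the terms in the model:

```latex
\frac{\partial Y_t}{\partial X_t} = \beta_2 + \beta_4 Z_t ,
\qquad
\frac{\partial Y_t}{\partial Z_t} = \beta_3 + \beta_4 X_t
```

When Z is a dummy, the first expression reduces to a slope of β2 for Z = 0 and β2 + β4 for Z = 1.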
1.4. Interactions with Dummies
• We can also interact a dummy variable, d, with a continuous variable, x:
• $y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 d \cdot x + u$
• If d = 0, then $y = \beta_0 + \beta_1 x + u$
• If d = 1, then $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x + u$
• The interaction term is interpreted as a change in the slope

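Example of $\delta_0 > 0$ and $\delta_1 < 0$
[Figure: y plotted against x, two lines with different intercepts and slopes. Line for d = 0: $y = \beta_0 + \beta_1 x$. Line for d = 1: $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$.]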
Example
• Test whether the return to education is the same for men and women, allowing for a constant wage differential between men and women.
$y = (\beta_0 + \delta_0\,\text{female}) + (\beta_1 + \delta_1\,\text{female})\,\text{educ} + u$

Example
• Output
. reg lwage female educ femaleeduc

Source       |         SS    df          MS      Number of obs =     526
Model        |  44.531522     3  14.8438407      F(3, 522)     =   74.65
Total        | 148.329751   525   .28253286      Prob > F      =  0.0000
                                                 Root MSE      =  .44592

lwage        |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -.3600645    .1854296   -1.94    0.053   -.7243444   .0042154
educ         |   .0772279    .0089875    8.59    0.000    .0595718   .0948841
femaleeduc   |  -.0000641    .0145035   -0.00    0.996   -.0285565   .0284283
_cons        |   .8259547    .1180502    7.00    0.000    .5940427   1.057867

Example
H0: the return to education is the same for men and women, $\delta_1 = 0$
Interpretation of output: the estimated return to education is 7.72% for men and (7.72 - 0.006)% for women, which is still about 7.72%. The difference, about -0.006 percentage points, is neither economically large nor statistically significant (t = -0.00). There is no evidence against the hypothesis that the return to education is the same for men and women.

Example
• Reestimate the wage equation with the female*educ interaction, using log(wage) as the dependent variable and adding quadratics in exper and tenure.

lwage        |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -.2267886    .1675394   -1.35    0.176   -.5559289   .1023517
educ         |   .0823692    .0084699    9.72    0.000    .0657296   .0990088
femaleeduc   |  -.0055645    .0130618   -0.43    0.670   -.0312252   .0200962
exper        |   .0293366    .0049842    5.89    0.000     .019545   .0391283
expersq      |  -.0005804    .0001075   -5.40    0.000   -.0007916  -.0003691
tenure       |   .0318967     .006864    4.65    0.000     .018412   .0453814
tenursq      |    -.00059    .0002352   -2.51    0.012    -.001052   -.000128
_cons        |    .388806    .1186871    3.28    0.001    .1556388   .6219732

Example
• Interpreting output: the estimated return to education is 8.2% for men and 0.082 - 0.0056 = 0.0764, or about 7.6%, for women. The difference, -0.56 percentage points, is neither economically large nor statistically significant. There is no evidence against the hypothesis that the return to education is the same for men and women.
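The group-specific returns and the t statistic on the interaction can be recomputed from the reported coefficients. This is a minimal sketch with illustrative variable names; it uses the unrounded estimates from the output with quadratics in exper and tenure, so the return for women comes out slightly above the figure obtained from rounded inputs.

```python
b_educ = 0.0823692          # coefficient on educ (return for men)
b_femaleeduc = -0.0055645   # coefficient on femaleeduc
se_femaleeduc = 0.0130618   # its standard error

return_men = b_educ
return_women = b_educ + b_femaleeduc          # slope for female = 1
t_interaction = b_femaleeduc / se_femaleeduc  # t statistic for H0: delta_1 = 0

print(f"men: {100 * return_men:.2f}%, women: {100 * return_women:.2f}%")
print(f"t on femaleeduc: {t_interaction:.2f}")
```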

Example
• H0: $\delta_0 = 0$ and $\delta_1 = 0$
• Output
test female= femaleeduc=0
( 1) female - femaleeduc = 0
( 2) female = 0
F( 2, 518) = 34.33
Prob > F = 0.0000
• H0 is rejected: female and femaleeduc are jointly significant, so the model should allow for a wage differential between women and men.
1.5. Dummies with Multiple Categories
• We can use dummy variables to control for something with multiple categories
• Estimate a model that allows for differences among five groups: primary, secondary, tertiary, BSc, MSc
D1 = 1 if primary; 0 otherwise
D2 = 1 if secondary; 0 otherwise
D3 = 1 if tertiary; 0 otherwise
D4 = 1 if BSc; 0 otherwise
D5 = 1 if MSc; 0 otherwise
• Any categorical variable can be turned into a set of dummy variables
• Because the base group is represented by the intercept, if there are n categories there should be n - 1 dummy variables
• If there are a lot of categories, it may make sense to group some together
• Example: top 10 ranking, 11-25, etc.
So we estimate:
$Y = \beta_1 + \beta_2 X_2 + a_1 D_2 + a_2 D_3 + a_3 D_4 + a_4 D_5 + u$
Note that one dummy (in this case D1) is excluded from the model in order to avoid the dummy variable trap.
Consider various cases, e.g. D2 = 1, D3 = D4 = D5 = 0, etc.
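The n - 1 rule can be sketched as a small encoding function. This is an illustration only: the function name `encode` and the constants `CATEGORIES` and `BASE` are made up for the example, and primary is taken as the base group because D1 is the dummy excluded from the model.

```python
CATEGORIES = ["primary", "secondary", "tertiary", "BSc", "MSc"]
BASE = "primary"  # base group: represented by the intercept, so it gets no dummy


def encode(level):
    """Return the values of (D2, D3, D4, D5) for one education level."""
    if level not in CATEGORIES:
        raise ValueError(f"unknown category: {level}")
    # One 0/1 indicator per non-base category.
    return [1 if level == c else 0 for c in CATEGORIES if c != BASE]


for level in CATEGORIES:
    print(level, encode(level))
```

An observation in the base group has all four dummies equal to zero, so its expected Y is given by the intercept and X2 alone; each coefficient a_j then measures the difference from the base group.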

Example-Seasonal Dummy Variables
The number of dummies depends on the frequency of the data:
Quarterly – 4 dummies – DQ1, DQ2, DQ3, DQ4
Monthly – 12 dummies – one for each month
Daily – 5 dummies – Dmon, Dtue, Dwed, etc.
Again, we either exclude one dummy and include a constant (always better), or we use all the dummies and never include a constant; including all of them together with a constant is the dummy variable trap.
2. Testing for Differences Across Groups
• Testing whether a regression function is different for one group versus another:
2.1. The Chow test.
2.2. Test for the joint significance of the dummy and its interactions with all other x variables

Example
• Whether the same regression model describes college grade point averages for male and female college athletes.
$\text{cumgpa} = \beta_0 + \beta_1\,\text{sat} + \beta_2\,\text{hsperc} + \beta_3\,\text{tothrs} + u$
• Where cumgpa = cumulative GPA
sat = SAT score
hsperc = high school rank percentile
tothrs = total hours of college courses
The Chow test
• Suppose we have 2 groups, g = 1 and g = 2.
• We test whether the intercept and all slopes are the same across the two groups.
$y = \beta_{g,0} + \beta_{g,1} x_1 + \beta_{g,2} x_2 + \dots + \beta_{g,k} x_k + u$

The Chow test
• SSR1 = the sum of squared residuals for group 1 (n1 observations).
• SSR2 = the sum of squared residuals for group 2 (n2 observations).
• SSR = the sum of squared residuals for the whole sample (n observations).

The Chow test
$SSR_1/\sigma^2 \sim \chi^2(n_1 - k - 1)$
$SSR_2/\sigma^2 \sim \chi^2(n_2 - k - 1)$
$SSR_U = SSR_1 + SSR_2$, with $SSR_U/\sigma^2 \sim \chi^2(n - 2(k+1))$
• $SSR/\sigma^2 \sim \chi^2(n - k - 1)$

The Chow Test
• The F-statistic has (k+1, n-2(k+1)) degrees of freedom
• This F-statistic is called the Chow statistic
$F = \frac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$
• If $F > F_{\alpha,(k+1,\,n-2(k+1))}$, H0 is rejected.
The Chow Test- Example
• Output for female = 1
. reg cumgpa sat hsperc tothrs if female==1

Source       |         SS    df          MS      Number of obs =     180
Model        | 83.4816253     3  27.8272084      F(3, 176)     =   34.08
Total        | 227.171352   179   1.2691137      Prob > F      =  0.0000
                                                 Root MSE      =  .90356

cumgpa       |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
sat          |   .0017281    .0004642    3.72    0.000    .0008119   .0026442
hsperc       |  -.0059167    .0038895   -1.52    0.130   -.0135927   .0017594
tothrs       |   .0158603    .0018485    8.58    0.000    .0122122   .0195085
_cons        |   .1003465    .4810947    0.21    0.835   -.8491105   1.049803

• Output for female = 0
. reg cumgpa sat hsperc tothrs if female==0

Source       |         SS    df          MS      Number of obs =     552
Model        | 89.6937042     3  29.8979014      F(3, 548)     =   41.94
Total        | 480.313125   551  .871711661      Prob > F      =  0.0000
                                                 Root MSE      =  .84428

cumgpa       |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
sat          |   .0006113     .000231    2.65    0.008    .0001576    .001065
hsperc       |  -.0059675    .0017459   -3.42    0.001   -.0093969   -.002538
tothrs       |   .0103004     .001074    9.59    0.000    .0081907   .0124101
_cons        |   1.213984    .2602697    4.66    0.000    .7027359   1.725233

• Output for the whole sample
. reg cumgpa sat hsperc tothrs

Source       |         SS    df          MS
Model        | 168.533658     3  56.1778861      F(3, 728)     =   74.72
Total        | 715.898555   731  .979341389      Prob > F      =  0.0000
                                                 Root MSE      =  .86711

cumgpa       |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
sat          |   .0009028    .0002079    4.34    0.000    .0004947   .0013109
hsperc       |  -.0063791    .0015678   -4.07    0.000   -.0094572  -.0033011
tothrs       |   .0119779    .0009314   12.86    0.000    .0101494   .0138064
_cons        |   .9291105    .2285515    4.07    0.000    .4804118   1.377809

The Chow Test
• The F-statistic has (4, 724) degrees of freedom
• This F-statistic is called the Chow statistic
$F = \frac{547.36 - (143.68 + 390.62)}{143.68 + 390.62} \cdot \frac{732 - 2(3+1)}{3+1} \approx 4.42$
• $F > F_{0.05,(4,724)} = 2.384$, so H0 is rejected.
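The Chow statistic can be reproduced from the three regression outputs. The sketch below is an illustration under one stated assumption: each SSR is recovered as Total SS minus Model SS from the corresponding Stata table, since the residual rows are not shown. The function name `chow_f` is hypothetical.

```python
def chow_f(ssr_pooled, ssr1, ssr2, n, k):
    """Chow statistic for two groups with k slope regressors plus an intercept."""
    ssr_ur = ssr1 + ssr2            # unrestricted SSR: separate regressions
    df_num = k + 1                  # restrictions: intercept and k slopes
    df_den = n - 2 * (k + 1)        # residual df of the unrestricted model
    return ((ssr_pooled - ssr_ur) / ssr_ur) * (df_den / df_num)


# SSR = Total SS - Model SS, read off the three outputs.
ssr_female = 227.171352 - 83.4816253   # female == 1, n1 = 180
ssr_male = 480.313125 - 89.6937042     # female == 0, n2 = 552
ssr_pooled = 715.898555 - 168.533658   # whole sample, n = 732

f_stat = chow_f(ssr_pooled, ssr_female, ssr_male, n=732, k=3)
print(f"Chow F(4, 724) = {f_stat:.2f}")
```

Comparing `f_stat` with the 5% critical value quoted on the slide (2.384) gives the same rejection decision.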

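Test for the joint significance of the dummy and its interactions with all other x variables
• Model
$\text{cumgpa} = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{sat} + \delta_1\,\text{female} \cdot \text{sat} + \beta_2\,\text{hsperc} + \delta_2\,\text{female} \cdot \text{hsperc} + \beta_3\,\text{tothrs} + \delta_3\,\text{female} \cdot \text{tothrs} + u$
Null hypothesis:
$\delta_0 = 0,\ \delta_1 = 0,\ \delta_2 = 0,\ \delta_3 = 0$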

Example-Output
• Look at the t-statistics of female and its interactions!

Source       |         SS    df          MS
Model        | 181.589407     7  25.9413439      F(7, 724)     =   35.15
Total        | 715.898555   731  .979341389      Prob > F      =  0.0000
                                                 Root MSE      =  .85907

cumgpa       |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
female       |  -1.113638     .528539   -2.11    0.035    -2.15129  -.0759859
sat          |   .0006113     .000235    2.60    0.009    .0001499   .0010727
femalesat    |   .0011167       .0005    2.23    0.026    .0001351   .0020984
hsperc       |  -.0059675    .0017765   -3.36    0.001   -.0094551  -.0024798
femalehsperc |   .0000508    .0041025    0.01    0.990   -.0080035    .008105
tothrs       |   .0103004    .0010928    9.43    0.000    .0081549   .0124459
femaletothrs |   .0055599    .0020696    2.69    0.007    .0014968    .009623
_cons        |   1.213984    .2648281    4.58    0.000    .6940617   1.733907
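Example-Output
The coefficients on female, female*sat, and female*tothrs are individually significant, which suggests that H0 should be rejected. The formal test is the F test.
Estimate the restricted model by dropping female and its interactions: $R_r^2 = 0.2354$. With $R_{ur}^2 = 0.2537$, q = 4 restrictions, and $df_{ur} = 724$:
$F = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/df_{ur}} = \frac{(0.2537 - 0.2354)/4}{(1 - 0.2537)/724} \approx 4.44$
With unrounded R-squared values the statistic is 4.42, the same as the Chow statistic. It exceeds the 5% critical value $F_{0.05,(4,724)} = 2.384$, so H0 is rejected.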
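The R-squared form of the F statistic is easy to evaluate directly. The sketch below is illustrative: `f_from_r2` is a hypothetical helper name, and the unrounded R-squared values are an assumption in the sense that they are reconstructed as Model SS divided by Total SS from the unrestricted and pooled regression outputs rather than read from a reported R-squared line.

```python
def f_from_r2(r2_ur, r2_r, q, df_ur):
    """F statistic for q exclusion restrictions, R-squared form."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


# Using the rounded R-squared values quoted on the slide.
f_rounded = f_from_r2(0.2537, 0.2354, q=4, df_ur=724)

# Using R-squared = Model SS / Total SS from the two regression outputs.
total_ss = 715.898555
r2_unrestricted = 181.589407 / total_ss   # model with female and interactions
r2_restricted = 168.533658 / total_ss     # pooled model without them
f_unrounded = f_from_r2(r2_unrestricted, r2_restricted, q=4, df_ur=724)

print(f"rounded inputs: {f_rounded:.2f}, unrounded inputs: {f_unrounded:.2f}")
```

The small gap between the two results comes only from rounding the R-squared values to four decimals.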
Example-Output
test female= femalesat= femalehsperc= femaletothrs=0
( 1) female - femalesat = 0
( 2) female - femalehsperc = 0
( 3) female - femaletothrs = 0
( 4) female = 0
F( 4, 724) = 4.42
Prob > F = 0.0015
3. The dummy variables in seasonal analysis
• Many economic time series based on monthly or quarterly data exhibit seasonal patterns.
Examples:
• Sales of department stores at Christmas and other major holiday times
• Demand for money by households at holiday times
• Demand for ice cream and soft drinks during summer
• Prices of crops right after the harvesting season, demand for air travel, etc.
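Seasonal variables (Q1 is the base quarter):
D1 = 1 if Q2, 0 otherwise
D2 = 1 if Q3, 0 otherwise
D3 = 1 if Q4, 0 otherwise
Regression model:
$Y = \alpha_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta_0 X + U$
Regression model with interactions:
$Y = \alpha_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + (\beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3) X + U$
Y = sales of refrigerators (in thousands)
• Q1: $\hat{Y} = \hat{\alpha}_0 + \hat{\beta}_0 X$
• Q2: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 + \hat{\beta}_0 X + \hat{\beta}_1 X$
• Q3: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_2 + \hat{\beta}_0 X + \hat{\beta}_2 X$
• Q4: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_3 + \hat{\beta}_0 X + \hat{\beta}_3 X$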
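The four quarter-specific lines can be generated mechanically from the estimated coefficients. In this sketch the function name `quarter_line` and all coefficient values are hypothetical, chosen only so the arithmetic is easy to follow; Q1 is treated as the base quarter, with D1, D2, D3 switching on in Q2, Q3, Q4.

```python
def quarter_line(quarter, intercepts, slopes):
    """Return (intercept, slope) of the fitted line for a given quarter.

    intercepts = [a0, a1, a2, a3]: base intercept and the three dummy shifts.
    slopes     = [b0, b1, b2, b3]: base slope and the three interaction shifts.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    if quarter == 1:                # base quarter: all dummies are zero
        return intercepts[0], slopes[0]
    j = quarter - 1                 # index of the dummy that equals one
    return intercepts[0] + intercepts[j], slopes[0] + slopes[j]


# Hypothetical estimates, for illustration only.
intercept_hat = [10.0, 2.0, 3.0, -1.0]
slope_hat = [0.5, 0.25, 0.125, 0.0]

for q in (1, 2, 3, 4):
    a, b = quarter_line(q, intercept_hat, slope_hat)
    print(f"Q{q}: Y_hat = {a} + {b} X")
```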

4. Piecewise Linear Regression
• Example: hypothetical relationship between sales commission and sales volume. (Note: the intercept on the Y axis denotes the minimum guaranteed commission.)
$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$
where
Yi = sales commission
Xi = volume of sales generated by the salesperson
X* = threshold value of sales
D = 1 if Xi > X*
  = 0 if Xi < X*

4. Piecewise Linear Regression
Parameters of the piecewise linear regression:
• $\beta_1$ gives the slope of the regression line in segment I,
• $\beta_1 + \beta_2$ gives the slope of the regression line in segment II.
The hypothesis that there is no break in the regression at the threshold value X* can be tested easily by checking the statistical significance of the estimated differential slope coefficient $\hat{\beta}_2$ of the piecewise linear regression.
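The two slopes and the continuity of the line at the threshold can be seen by evaluating the systematic part of the model. The sketch below is an illustration with made-up parameter values; `expected_commission` is a hypothetical name, and the dummy is set to zero at exactly X = X*, which does not matter because the term (X - X*) is zero there.

```python
def expected_commission(x, alpha1, beta1, beta2, x_star):
    """Systematic part of the piecewise model: alpha1 + beta1*x + beta2*(x - x*)*D."""
    d = 1 if x > x_star else 0      # D = 1 above the threshold, 0 otherwise
    return alpha1 + beta1 * x + beta2 * (x - x_star) * d


# Hypothetical parameter values, for illustration only.
params = dict(alpha1=100.0, beta1=0.5, beta2=0.25, x_star=1000.0)

slope_segment_1 = (expected_commission(200.0, **params)
                   - expected_commission(100.0, **params)) / 100.0
slope_segment_2 = (expected_commission(1200.0, **params)
                   - expected_commission(1100.0, **params)) / 100.0

print(slope_segment_1, slope_segment_2)      # beta1 and beta1 + beta2
print(expected_commission(1000.0, **params)) # value at the threshold
```

If beta2 were zero, both segments would have the same slope and the kink at X* would disappear, which is the no-break hypothesis described on the slide.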

Assignments
• Questions 9.1, 9.2, 9.3 on p. 324, Gujarati
• Problems 9.21, 9.22, Gujarati
• Problems 7.1-7.6 on pp. 255-257, Wooldridge
• Computer Exercises C7.1-C7.6, Wooldridge
