
CHAPTER

Dummy Variable Regression Models
Outline
1. Multiple regression analysis with qualitative information
1.1. The nature of qualitative information
1.2. Describing qualitative information
1.3. A single dummy independent variable
1.4. Interactions with dummy variables
1.5. Dummies for multiple categories
2. Testing for differences across groups
2.1. The Chow test
2.2. The use of dummy variables
3. The dummy variables in seasonal analysis
4. Piecewise linear regression
1.1. The Nature of Qualitative Information
Sometimes we cannot obtain numerical values for all the variables we want to use in a model.
Examples:
(a) Gender may play a role in determining salary levels
(b) Different ethnic groups may follow different
consumption patterns
(c) Educational levels can affect earnings from
employment

1.1. The Nature of Qualitative Information
Dummies are most common for cross-sectional variables, but they are also used in time series.
Examples
(a) Changes in a political regime may affect production
(b) A war can have an impact on economic activities
(c) Certain days of the week or certain months of the year can have different
effects on the fluctuation of stock prices
(d) Seasonal effects are often observed in the demand for various products
1.2. Describing Qualitative Information

A dummy variable (binary variable) is a variable that takes on the value 1 or 0.
→ The name of the dummy variable should indicate the event associated with the value one.
Example (see Table 3.1):
• female (= 1 if female, 0 otherwise)
• married (= 1 if married, 0 otherwise), etc.
Table 3.1 A Partial Listing of the Data in WAGE1.RAW

person   wage   educ   exper   female   married
   1     3.10    11      2       1        0
   2     3.24    12     22       1        1
   3     3.00    11      2       0        0
   4     6.00     8     44       0        1
   5     5.30    12      7       0        1
1.3. A single dummy independent variable
• A simple model with one continuous variable (x) and one dummy (d):
y = β0 + δ0 d + β1 x + u
wage = β0 + δ0 female + β1 educ + u
• This can be interpreted as an intercept shift:
→ If d = 0, then y = β0 + β1 x + u
→ If d = 1, then y = (β0 + δ0) + β1 x + u
→ The case d = 0 is the base group.
Example of 0 > 0

y y = (0 + 0) + 1x


d=1
slope = 1
0
{ 0
d=0
y = 0 + 1x
}
x
8
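The intercept-shift interpretation can be checked with a small simulation. This is a sketch with made-up coefficients, not an estimate from the WAGE1.RAW data:

```python
import numpy as np

# Simulated sketch of the intercept-shift model; the coefficients below are
# invented for illustration and are NOT estimates from WAGE1.RAW.
rng = np.random.default_rng(0)
n = 200
educ = rng.integers(8, 18, size=n).astype(float)     # x: years of education
female = rng.integers(0, 2, size=n).astype(float)    # d: dummy variable
# True model: wage = 1.0 - 1.8*female + 0.55*educ + u
wage = 1.0 - 1.8 * female + 0.55 * educ + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), female, educ])      # [const, d, x]
b0, d0, b1 = np.linalg.lstsq(X, wage, rcond=None)[0]
print(f"intercept for men: {b0:.2f}, shift for women: {d0:.2f}, slope: {b1:.2f}")
# Both groups share the slope b1; women's fitted intercept is b0 + d0.
```

The OLS fit recovers a single common slope and two intercepts, exactly the parallel-lines picture above.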
Example
• Model of wage determination:
wage = β0 + δ0 female + β1 educ + β2 exper + β3 tenure + u
• H0: there is no difference between the earnings of men and women: δ0 = 0
• H1: there is discrimination against women: δ0 < 0
Control variables:
wage = average hourly earnings
female = 1 if female
educ = years of education
exper = years of potential experience
tenure = years with current employer
Example
• Output
. reg wage female educ exper tenure

Source     SS          df    MS            Number of obs = 526
Model      2603.10658    4   650.776644    F(4, 521)     = 74.40
Residual   4557.30771  521   8.7472317     Prob > F      = 0.0000
Total      7160.41429  525   13.6388844    R-squared     = 0.3635
                                           Adj R-squared = 0.3587
                                           Root MSE      = 2.9576

wage       Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female     -1.810852   .2648252    -6.84  0.000   -2.331109  -1.290596
educ        .5715048   .0493373    11.58  0.000    .4745802   .6684293
exper       .0253959   .0115694     2.20  0.029    .0026674   .0481243
tenure      .1410051   .0211617     6.66  0.000    .0994323   .1825778
_cons      -1.567939   .7245511    -2.16  0.031   -2.991339   -.144538
Example - interpreting the output
- Negative intercept → the intercept for men → not very meaningful, because no one in the sample has zero values for all of educ, exper and tenure.
- Negative coefficient on female → the average difference in hourly earnings between a woman and a man with the same levels of educ, exper and tenure → a woman earns, on average, USD 1.81 less per hour than a comparable man.
The intercept for men: β0; for women: β0 + δ0.
Example
Do not include both male and female in a model with an intercept → dummy variable trap due to perfect collinearity.
wage = β0 + δ0 female + γ0 male + β1 educ + β2 exper + β3 tenure + u
. reg wage female male educ exper tenure
note: male omitted because of collinearity

Source     SS          df    MS            Number of obs = 526
Model      2603.10658    4   650.776644    F(4, 521)     = 74.40
Residual   4557.30771  521   8.7472317     Prob > F      = 0.0000
Total      7160.41429  525   13.6388844    R-squared     = 0.3635
                                           Adj R-squared = 0.3587
                                           Root MSE      = 2.9576

wage       Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female     -1.810852   .2648252    -6.84  0.000   -2.331109  -1.290596
male        0  (omitted)
educ        .5715048   .0493373    11.58  0.000    .4745802   .6684293
exper       .0253959   .0115694     2.20  0.029    .0026674   .0481243
tenure      .1410051   .0211617     6.66  0.000    .0994323   .1825778
_cons      -1.567939   .7245511    -2.16  0.031   -2.991339   -.144538
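The collinearity behind the trap is easy to see numerically: with an intercept, the female and male columns sum to the constant column, so the design matrix loses a rank. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up data: with an intercept, female + male equals the constant column,
# so the three columns are perfectly collinear and (X'X) is singular.
female = np.array([1., 1., 0., 0., 1., 0.])
male = 1.0 - female
const = np.ones_like(female)

X_trap = np.column_stack([const, female, male])
print(np.linalg.matrix_rank(X_trap))          # rank 2, not 3

# Fixes: drop one dummy (male becomes the base group), or drop the
# intercept and keep both dummies (Stata's `nocons` option).
X_drop = np.column_stack([const, female])
X_nocons = np.column_stack([female, male])
print(np.linalg.matrix_rank(X_drop), np.linalg.matrix_rank(X_nocons))  # 2 2
```

Both fixed design matrices have full column rank, which is why either model can be estimated.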
Example
• There is no dummy variable trap if we do not include an overall intercept:
wage = δ0 female + γ0 male + β1 educ + β2 exper + β3 tenure + u
. reg wage female male educ exper tenure, nocons

Source     SS          df    MS            Number of obs = 526
Model      20888.9846    5   4177.79693    F(5, 521)     = 477.61
Residual   4557.30771  521   8.7472317     Prob > F      = 0.0000
Total      25446.2924  526   48.3769817    R-squared     = 0.8209
                                           Adj R-squared = 0.8192
                                           Root MSE      = 2.9576

wage       Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female     -3.378791   .7028798    -4.81  0.000   -4.759618  -1.997964
male       -1.567939   .7245511    -2.16  0.031   -2.991339   -.144538
educ        .5715048   .0493373    11.58  0.000    .4745802   .6684293
exper       .0253959   .0115694     2.20  0.029    .0026674   .0481243
tenure      .1410051   .0211617     6.66  0.000    .0994323   .1825778
Example
• Reestimate the wage equation using
log(wage) as the dependent variable and
adding quadratics in exper and tenure.
. reg lwage female educ exper expersq tenure tenursq

Source     SS          df    MS            Number of obs = 526
Model      65.3791009    6   10.8965168    F(6, 519)     = 68.18
Residual   82.9506505  519   .159827843    Prob > F      = 0.0000
Total      148.329751  525   .28253286     R-squared     = 0.4408
                                           Adj R-squared = 0.4343
                                           Root MSE      = .39978

lwage      Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female     -.296511    .0358055    -8.28  0.000   -.3668524  -.2261696
educ        .0801967   .0067573    11.87  0.000    .0669217   .0934716
exper       .0294324   .0049752     5.92  0.000    .0196585   .0392063
expersq    -.0005827   .0001073    -5.43  0.000   -.0007935  -.0003719
tenure      .0317139   .0068452     4.63  0.000    .0182663   .0451616
tenursq    -.0005852   .0002347    -2.49  0.013   -.0010463  -.0001241
_cons       .416691    .0989279     4.21  0.000    .2223425   .6110394
Example - interpreting the output
• The coefficient on female implies that, for the same levels of educ, exper, and tenure, women earn about 29.7% less than men (using the log approximation).
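For log(wage) equations, the coefficient itself is only an approximation to the percentage difference; for a coefficient this large, the exact figure 100·(e^β − 1) is noticeably smaller:

```python
import math

coef_female = -0.2965  # coefficient on female from the log(wage) output above
approx_pct = 100 * coef_female                  # log approximation, -29.7%
exact_pct = 100 * (math.exp(coef_female) - 1)   # exact difference, about -25.7%
print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The approximation overstates the gap because the percentage interpretation of a log coefficient is only accurate for small coefficients.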
1.4. Interactions with Dummies
An interaction term is an independent variable that is the product of two other independent variables; these can be continuous or dummy variables.
Yt = β1 + β2 Xt + β3 Zt + β4 Xt Zt + et
In this model, the effect of X on Y depends on the level of Z, and the effect of Z on Y depends on the level of X.
1.4. Interactions with Dummies
• We can also interact a dummy variable, d, with a continuous variable, x:
y = β0 + δ0 d + β1 x + δ1 d·x + u
• If d = 0, then y = β0 + β1 x + u
• If d = 1, then y = (β0 + δ0) + (β1 + δ1) x + u
• This is interpreted as a change in both the intercept and the slope.
Example of 0 > 0 and 1 < 0

y
y = 0 + 1x
d=0

d=1
y = (0 + 0) + (1 + 1) x

x 18
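The slope-shift mechanics can be illustrated with a simulation. The true coefficients here are made up for illustration:

```python
import numpy as np

# Simulated sketch of a slope-and-intercept shift; the true coefficients
# (2, 1, 0.8, -0.3) are invented for illustration.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0.0, 10.0, n)
d = rng.integers(0, 2, n).astype(float)
y = 2.0 + 1.0 * d + 0.8 * x - 0.3 * d * x + rng.normal(0.0, 0.5, n)

X = np.column_stack([np.ones(n), d, x, d * x])   # interaction column d*x
b0, d0, b1, d1 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"slope for d = 0: {b1:.2f}, slope for d = 1: {b1 + d1:.2f}")
```

The estimated slope for the d = 1 group is the base slope plus the interaction coefficient, matching the algebra above.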
Example
• Test whether the return to education is the same for men and women, allowing for a constant wage differential between men and women:
log(wage) = (β0 + δ0 female) + (β1 + δ1 female) educ + u
Example
• Output
. reg lwage female educ femaleeduc

Source     SS          df    MS            Number of obs = 526
Model      44.531522     3   14.8438407    F(3, 522)     = 74.65
Residual   103.798229  522   .198847183    Prob > F      = 0.0000
Total      148.329751  525   .28253286     R-squared     = 0.3002
                                           Adj R-squared = 0.2962
                                           Root MSE      = .44592

lwage       Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female      -.3600645   .1854296    -1.94  0.053   -.7243444   .0042154
educ         .0772279   .0089875     8.59  0.000    .0595718   .0948841
femaleeduc  -.0000641   .0145035    -0.00  0.996   -.0285565   .0284283
_cons        .8259547   .1180502     7.00  0.000    .5940427  1.057867
Example
H0: the return to education is the same for men and women, δ1 = 0.
Interpretation of the output: the estimated return to education is 7.72% for men and (7.72 − 0.00)% for women. The difference, −0.00%, is neither economically large nor statistically significant (t = −0.00). → There is no evidence against the hypothesis that the return to education is the same for men and women.
Example
• Reestimate the wage equation using
log(wage) as the dependent variable and
adding quadratics in exper and tenure.

lwage       Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female      -.2267886   .1675394    -1.35  0.176   -.5559289   .1023517
educ         .0823692   .0084699     9.72  0.000    .0657296   .0990088
femaleeduc  -.0055645   .0130618    -0.43  0.670   -.0312252   .0200962
exper        .0293366   .0049842     5.89  0.000    .019545    .0391283
expersq     -.0005804   .0001075    -5.40  0.000   -.0007916  -.0003691
tenure       .0318967   .006864      4.65  0.000    .018412    .0453814
tenursq     -.00059     .0002352    -2.51  0.012   -.001052   -.000128
_cons        .388806    .1186871     3.28  0.001    .1556388   .6219732
Example
• Interpreting the output: the estimated return to education is 8.2% for men and 0.082 − 0.0056 = 7.6% for women. The difference, −0.56%, is neither economically large nor statistically significant. → There is no evidence against the hypothesis that the return to education is the same for men and women.
Example
• H0: δ0 = 0 and δ1 = 0
• Output
. test female=femaleeduc=0

 ( 1)  female - femaleeduc = 0
 ( 2)  female = 0

       F( 2, 518) = 34.33
       Prob > F   = 0.0000

• H0 is rejected → women and men do not share the same wage equation; since the interaction term alone is insignificant, the difference takes the form of a constant wage differential between women and men.
1.5. Dummies with Multiple Categories
• We can use dummy variables to control for a variable with multiple categories.
• Estimate a model that allows for differences among five education groups: primary, secondary, tertiary, BSc, MSc.
D1 = 1 if primary; 0 otherwise
D2 = 1 if secondary; 0 otherwise
D3 = 1 if tertiary; 0 otherwise
D4 = 1 if BSc; 0 otherwise
D5 = 1 if MSc; 0 otherwise
1.5. Dummies with Multiple Categories

• Any categorical variable can be turned into a set of dummy variables.
• Because the base group is represented by the intercept, if there are n categories there should be n − 1 dummy variables.
• If there are a lot of categories, it may make sense to group some together.
• Example: top 10 ranking, 11-25, etc.
1.5. Dummies with Multiple Categories
So we estimate:
Y = β1 + β2 X2 + α1 D2 + α2 D3 + α3 D4 + α4 D5 + u

Note that one dummy (in this case D1) is excluded from the model in order to avoid the dummy variable trap.

Consider the various cases, i.e. D2 = 1, D3 = D4 = D5 = 0, etc.
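Building the D2…D5 columns from a categorical variable can be sketched as follows; the category labels and data values are hypothetical:

```python
import numpy as np

# Hypothetical education categories; primary is the base group, so only
# n - 1 = 4 dummy columns (D2..D5) enter a model with an intercept.
levels = ["primary", "secondary", "tertiary", "BSc", "MSc"]
educ_cat = np.array(["secondary", "MSc", "primary", "BSc", "secondary"])

D = np.column_stack([(educ_cat == lev).astype(float) for lev in levels[1:]])
print(D)
# Each row has at most one 1; a row of all zeros (here the third
# observation) belongs to the base group, primary.
```

The base group never gets its own column: its effect is absorbed by the intercept.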
Example - Seasonal Dummy Variables

The number of dummies depends on the frequency of the data:
Quarterly – 4 dummies – DQ1, DQ2, DQ3, DQ4
Monthly – 12 dummies – one for each month
Daily – 5 dummies – Dmon, Dtue, Dwed, etc.

Again, either exclude one dummy and include a constant (usually the better choice), or include all the dummies and omit the constant (otherwise the dummy variable trap).
2. Testing for Differences Across Groups
• Testing whether a regression function is
different for one group versus another:
2.1. The Chow test.
2.2. Test for the joint significance of the dummy
and its interactions with all other x variables

Example
• Does the same regression model describe college grade point averages for male and female college athletes?
cumgpa = β0 + β1 sat + β2 hsperc + β3 tothrs + u

• where cumgpa = cumulative GPA
        sat    = SAT score
        hsperc = high school rank percentile
        tothrs = total hours of college courses
The Chow test
• Suppose we have two groups, g = 1 and g = 2.
• We test whether the intercept and all slopes are the same across the two groups in the k-variable model:
y = βg,1 + βg,2 x2 + βg,3 x3 + ... + βg,k xk + u
The Chow test
• RSS1 = the sum of squared residuals for group 1 (n1 observations)
• RSS2 = the sum of squared residuals for group 2 (n2 observations)
• RSS = the sum of squared residuals for the whole pooled sample (n observations)
The Chow test

In the k-variable model:
RSS1/σ² ~ χ²(n1 − k)
RSS2/σ² ~ χ²(n2 − k)
RSSU/σ² = (RSS1 + RSS2)/σ² ~ χ²(n − 2k)
RSS/σ² ~ χ²(n − k)
The Chow Test
• The F-statistic has (k, n − 2k) degrees of freedom and is called the Chow statistic:

F = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (n − 2k)/k

• If F > Fα,(k, n−2k) → H0 is rejected.
The Chow Test- Example
• Output for female =1
. reg cumgpa sat hsperc tothrs if female==1

Source     SS          df    MS            Number of obs = 180
Model      83.4816253    3   27.8272084    F(3, 176)     = 34.08
Residual   143.689727  176   .816418902    Prob > F      = 0.0000
Total      227.171352  179   1.2691137     R-squared     = 0.3675
                                           Adj R-squared = 0.3567
                                           Root MSE      = .90356

cumgpa     Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
sat         .0017281   .0004642     3.72  0.000    .0008119   .0026442
hsperc     -.0059167   .0038895    -1.52  0.130   -.0135927   .0017594
tothrs      .0158603   .0018485     8.58  0.000    .0122122   .0195085
_cons       .1003465   .4810947     0.21  0.835   -.8491105  1.049803
The Chow Test- Example
• Output for female =0
. reg cumgpa sat hsperc tothrs if female==0

Source     SS          df    MS            Number of obs = 552
Model      89.6937042    3   29.8979014    F(3, 548)     = 41.94
Residual   390.619421  548   .712809162    Prob > F      = 0.0000
Total      480.313125  551   .871711661    R-squared     = 0.1867
                                           Adj R-squared = 0.1823
                                           Root MSE      = .84428

cumgpa     Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
sat         .0006113   .000231      2.65  0.008    .0001576   .001065
hsperc     -.0059675   .0017459    -3.42  0.001   -.0093969  -.002538
tothrs      .0103004   .001074      9.59  0.000    .0081907   .0124101
_cons      1.213984    .2602697     4.66  0.000    .7027359  1.725233

Nguyen Thu Hang, BMNV, FTU CS2


The Chow Test- Example
• Output for the whole sample
. reg cumgpa sat hsperc tothrs

Source     SS          df    MS            Number of obs = 732
Model      168.533658    3   56.1778861    F(3, 728)     = 74.72
Residual   547.364897  728   .751874858    Prob > F      = 0.0000
Total      715.898555  731   .979341389    R-squared     = 0.2354
                                           Adj R-squared = 0.2323
                                           Root MSE      = .86711

cumgpa     Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
sat         .0009028   .0002079     4.34  0.000    .0004947   .0013109
hsperc     -.0063791   .0015678    -4.07  0.000   -.0094572  -.0033011
tothrs      .0119779   .0009314    12.86  0.000    .0101494   .0138064
_cons       .9291105   .2285515     4.07  0.000    .4804118  1.377809


The Chow Test
• The F-statistic (the Chow statistic) has (4, 724) degrees of freedom:

F = [547.36 − (143.68 + 390.62)] / (143.68 + 390.62) × (732 − 2·4)/4 = 4.42

• F = 4.42 > F0.05,(4,724) = 2.384 → H0 is rejected.
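Plugging the three RSS values from the outputs above into the Chow formula reproduces the statistic:

```python
# RSS values from the three regressions above (rounded as on the slide).
RSS = 547.36                   # pooled sample, n = 732
RSS1, RSS2 = 143.68, 390.62    # female == 1 and female == 0 subsamples
n, k = 732, 4                  # k = intercept + 3 slope coefficients

F = (RSS - (RSS1 + RSS2)) / (RSS1 + RSS2) * (n - 2 * k) / k
print(f"Chow F = {F:.2f}")     # 4.42, well above F(0.05; 4, 724) ≈ 2.38
```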
Test for the joint significance of the dummy and its interactions with all other x variables

• Model:
cumgpa = β0 + δ0 female + β1 sat + δ1 female·sat + β2 hsperc + δ2 female·hsperc + β3 tothrs + δ3 female·tothrs + u

Null hypothesis:
δ0 = 0, δ1 = 0, δ2 = 0, δ3 = 0


Example-Output
• Look at the t-statistics of female and its interactions!
Source     SS          df    MS            Number of obs = 732
Model      181.589407    7   25.9413439    F(7, 724)     = 35.15
Residual   534.309148  724   .73799606     Prob > F      = 0.0000
Total      715.898555  731   .979341389    R-squared     = 0.2537
                                           Adj R-squared = 0.2464
                                           Root MSE      = .85907

cumgpa        Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
female       -1.113638    .528539     -2.11  0.035   -2.15129   -.0759859
sat           .0006113    .000235      2.60  0.009    .0001499   .0010727
femalesat     .0011167    .0005        2.23  0.026    .0001351   .0020984
hsperc       -.0059675    .0017765    -3.36  0.001   -.0094551  -.0024798
femalehsperc  .0000508    .0041025     0.01  0.990   -.0080035   .008105
tothrs        .0103004    .0010928     9.43  0.000    .0081549   .0124459
femaletothrs  .0055599    .0020696     2.69  0.007    .0014968   .009623
_cons        1.213984     .2648281     4.58  0.000    .6940617  1.733907
Example - Output
The coefficients on female, female·sat and female·tothrs are individually significant → H0 is rejected.
Equivalently, estimate the restricted model by dropping female and its interactions → R²r = 0.2354, and use the R-squared form of the F statistic:

F = [(R²ur − R²r)/q] / [(1 − R²ur)/df_ur] = [(0.2537 − 0.2354)/4] / [(1 − 0.2537)/724] = 4.43
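The same test in R-squared form, using the values reported above:

```python
# R-squared form of the F test for dropping female and its interactions.
R2_ur, R2_r = 0.2537, 0.2354   # unrestricted and restricted R-squared
q, df_ur = 4, 724              # number of restrictions, residual df

F = ((R2_ur - R2_r) / q) / ((1 - R2_ur) / df_ur)
print(f"F = {F:.2f}")          # agrees with the Chow statistic up to rounding
```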
Example - Output
. test female=femalesat=femalehsperc=femaletothrs=0

 ( 1)  female - femalesat = 0
 ( 2)  female - femalehsperc = 0
 ( 3)  female - femaletothrs = 0
 ( 4)  female = 0

       F( 4, 724) = 4.42
       Prob > F   = 0.0015
3. The dummy variables in seasonal analysis
• Many economic time series based on monthly or
quarterly data exhibit seasonal patterns.
Examples:
→ Sales of department stores at Christmas and
other major holiday times
→ Demand for money by households at holiday
times
→ Demand for ice cream and soft drinks during
summer
→ Prices of crops right after the harvesting season,
demand for air travel, etc.
3. The dummy variables in seasonal analysis
Seasonal dummy variables (Q1 is the base quarter):
D1 = 1 if Q2, 0 otherwise
D2 = 1 if Q3, 0 otherwise
D3 = 1 if Q4, 0 otherwise

Regression model:
Y = α0 + α1 D1 + α2 D2 + α3 D3 + β0 X + U

Regression model with interactions:
Y = α0 + α1 D1 + α2 D2 + α3 D3 + (β0 + β1 D1 + β2 D2 + β3 D3) X + U

where Y = sales of refrigerators (in thousands)
3. The dummy variables in seasonal analysis

• Q1: Ŷ = α̂0 + β̂0 X
• Q2: Ŷ = (α̂0 + α̂1) + (β̂0 + β̂1) X
• Q3: Ŷ = (α̂0 + α̂2) + (β̂0 + β̂2) X
• Q4: Ŷ = (α̂0 + α̂3) + (β̂0 + β̂3) X
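The four quarterly equations can be generated mechanically from the dummy and interaction coefficients; the numeric values below are made up for illustration:

```python
# Hypothetical coefficient estimates for the interacted seasonal model:
# alpha = [a0, a1, a2, a3], beta = [b0, b1, b2, b3]; Q1 is the base quarter.
alpha = [10.0, 2.0, -1.0, 3.0]
beta = [0.5, 0.1, -0.2, 0.3]

# Quarter q > 0 adds its dummy coefficient to the base intercept and slope.
intercepts = [alpha[0] + (alpha[q] if q > 0 else 0.0) for q in range(4)]
slopes = [beta[0] + (beta[q] if q > 0 else 0.0) for q in range(4)]
for q in range(4):
    print(f"Q{q + 1}: Yhat = {intercepts[q]:.1f} + {slopes[q]:.1f} X")
```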
4. Piecewise Linear Regression
• Example: hypothetical relationship between sales commission and sales volume. (Note: the intercept on the Y axis denotes the minimum guaranteed commission.)

Yi = α1 + β1 Xi + β2 (Xi − X*) Di + ui
where
Yi = sales commission
Xi = volume of sales generated by the salesperson
X* = threshold value of sales
Di = 1 if Xi > X*
   = 0 if Xi ≤ X*
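A small simulation shows how the spline regressor (Xi − X*)Di recovers the two slopes; the threshold and true coefficients are hypothetical:

```python
import numpy as np

# Simulated sketch of the piecewise model with a hypothetical threshold X*.
rng = np.random.default_rng(2)
n = 400
X = rng.uniform(0.0, 20.0, n)
Xstar = 10.0
D = (X > Xstar).astype(float)
# True schedule: slope 1.0 below X*, slope 1.0 + 0.5 above it.
Y = 3.0 + 1.0 * X + 0.5 * (X - Xstar) * D + rng.normal(0.0, 0.5, n)

Z = np.column_stack([np.ones(n), X, (X - Xstar) * D])
a1, b1, b2 = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(f"slope in segment I: {b1:.2f}, slope in segment II: {b1 + b2:.2f}")
# A significant b2 estimate indicates a break in slope at X*.
```

Note that the fitted line is continuous at X* by construction, which is what distinguishes this spline setup from simply running two separate regressions.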
4. Piecewise Linear Regression
Parameters of the piecewise linear regression:
• β1 gives the slope of the regression line in segment I,
• β1 + β2 gives the slope of the regression line in segment II.

A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted by examining the statistical significance of the estimated differential slope coefficient β̂2.
Assignments
• Questions 9.1, 9.2, 9.3, p. 324, Gujarati
• Problems 9.21, 9.22, Gujarati
• Problems 7.1-7.6, pp. 255-257, Wooldridge
• Computer Exercises C7.1-C7.6, Wooldridge
