
CHAPTER 3

Dummy Variable Regression Models


(Lec 8-9)

Nguyen Thu Hang, BMNV, FTU CS2 1


Outline
1. Multiple regression analysis with qualitative information
1.1. The nature of qualitative information
1.2. Describing Qualitative Information
1.3. A single dummy independent variable
1.4. Interaction with dummy variables
1.5. Dummies for Multiple Categories
2. Testing for Differences Across Groups
2.1. The Chow test
2.2. The use of dummy variables
3. The dummy variables in seasonal analysis
1.1. The Nature of Qualitative Information

Sometimes we cannot obtain a set of numerical values for all the variables we want to use in a model.
Examples:
(a) Gender may play a role in determining salary levels
(b) Different ethnic groups may follow different
consumption patterns
(c) Educational levels can affect earnings from
employment

Dummies are easier to construct for cross-sectional variables, but they are sometimes needed for time series as well.
Examples:
(a) Changes in a political regime may affect production
(b) A war can have an impact on economic activities
(c) Certain days of the week or certain months of the year can have different effects on the fluctuation of stock prices
(d) Seasonal effects are often observed in the demand for various products
1.2. Describing Qualitative Information

A dummy variable (binary variable) is a variable that takes on the value 1 or 0.

The name of a dummy variable should indicate the event that takes the value one.
Example (see Table 3.1):
• female (= 1 if the person is female, 0 otherwise),
• married (= 1 if the person is married, 0 otherwise), etc.


Table 3.1 A Partial Listing of the Data in WAGE1.RAW
person    wage    educ    exper    female    married
     1    3.10      11        2         1          0
     2    3.24      12       22         1          1
     3    3.00      11        2         0          0
     4    6.00       8       44         0          1
     5    5.30      12        7         0          1


1.3. A single dummy independent variable
• A simple model with one continuous variable (x) and one dummy (d):
$y = \beta_0 + \delta_0 d + \beta_1 x + u$
$y = \beta_0 + \delta_0\,female + \beta_1\,educ + u$
• This can be interpreted as an intercept shift:
If d = 0, then $y = \beta_0 + \beta_1 x + u$
If d = 1, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$
• The case of d = 0 is the base group.
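The intercept-shift reading can be written out as a short derivation. This is a sketch that assumes the usual zero conditional mean condition, $E(u \mid d, x) = 0$, which the slide does not state explicitly:

```latex
\begin{aligned}
E(y \mid d = 1, x) - E(y \mid d = 0, x)
  &= (\beta_0 + \delta_0 + \beta_1 x) - (\beta_0 + \beta_1 x) \\
  &= \delta_0 .
\end{aligned}
```

Under that assumption, $\delta_0$ is the difference in the expected value of y between the two groups at any given value of x, which is why the two regression lines are parallel.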


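[Figure: Example of $\delta_0 > 0$. Two parallel lines with slope $\beta_1$: $y = (\beta_0 + \delta_0) + \beta_1 x$ for d = 1, lying above $y = \beta_0 + \beta_1 x$ for d = 0. The intercepts are $\beta_0 + \delta_0$ and $\beta_0$. Axes: y (vertical), x (horizontal).]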
Example
• Model of wage determination (y = wage):
$y = \beta_0 + \delta_0\,female + \beta_1\,educ + \beta_2\,exper + \beta_3\,tenure + u$
• H0: there is no difference between the earnings of men and women: $\delta_0 = 0$
• H1: there is discrimination against women: $\delta_0 < 0$
• Variables (educ, exper and tenure are the control variables):
wage = average hourly earnings
female = 1 if female, 0 otherwise
educ = years of education
exper = years of potential experience
tenure = years with current employer
Example
• Output
. reg wage female educ exper tenure

      Source |         SS     df          MS      Number of obs =     526
-------------+----------------------------------  F(4, 521)     =   74.40
       Model | 2603.10658      4  650.776644      Prob > F      =  0.0000
    Residual | 4557.30771    521   8.7472317      R-squared     =  0.3635
-------------+----------------------------------  Adj R-squared =  0.3587
       Total | 7160.41429    525  13.6388844      Root MSE      =  2.9576

        wage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -1.810852   .2648252   -6.84   0.000   -2.331109   -1.290596
        educ |   .5715048   .0493373   11.58   0.000    .4745802    .6684293
       exper |   .0253959   .0115694    2.20   0.029    .0026674    .0481243
      tenure |   .1410051   .0211617    6.66   0.000    .0994323    .1825778
       _cons |  -1.567939   .7245511   -2.16   0.031   -2.991339    -.144538


Example: interpreting the output
- Negative intercept: the intercept for men is not very meaningful, because no one in the sample has zero values of educ, exper and tenure all at once.
- Negative coefficient on female: this is the average difference in hourly earnings between a woman and a man with the same levels of educ, exper and tenure. The woman earns, on average, USD 1.81 less per hour than the man.
- The intercept for men is $\beta_0$; for women it is $\beta_0 + \delta_0$.
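The interpretation can be checked with a little arithmetic. The sketch below plugs the coefficients from the output into the fitted equation for a man and a woman with the same characteristics. The worker profile (12 years of education, 10 of experience, 5 of tenure) and the function name `predicted_wage` are illustrative assumptions, not part of the original example.

```python
# Coefficients copied from the regression output (wage in USD per hour).
COEF = {
    "_cons": -1.567939,
    "female": -1.810852,
    "educ": 0.5715048,
    "exper": 0.0253959,
    "tenure": 0.1410051,
}


def predicted_wage(female, educ, exper, tenure):
    """Fitted hourly wage from the estimated equation (hypothetical helper)."""
    return (COEF["_cons"]
            + COEF["female"] * female
            + COEF["educ"] * educ
            + COEF["exper"] * exper
            + COEF["tenure"] * tenure)


# Hypothetical profile: same educ, exper and tenure, different gender.
man = predicted_wage(female=0, educ=12, exper=10, tenure=5)
woman = predicted_wage(female=1, educ=12, exper=10, tenure=5)
gap = woman - man

print(round(man, 2), round(woman, 2), round(gap, 2))
```

Whatever profile is chosen, the gap equals the coefficient on female, because the other terms cancel.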
Example
• Do not include both male and female in a model with an intercept: this is the dummy variable trap, caused by perfect collinearity.
$y = \beta_0 + \delta_0\,female + \delta_1\,male + \beta_1\,educ + \beta_2\,exper + \beta_3\,tenure + u$
. reg wage female male educ exper tenure
note: male omitted because of collinearity

      Source |         SS     df          MS      Number of obs =     526
-------------+----------------------------------  F(4, 521)     =   74.40
       Model | 2603.10658      4  650.776644      Prob > F      =  0.0000
    Residual | 4557.30771    521   8.7472317      R-squared     =  0.3635
-------------+----------------------------------  Adj R-squared =  0.3587
       Total | 7160.41429    525  13.6388844      Root MSE      =  2.9576

        wage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -1.810852   .2648252   -6.84   0.000   -2.331109   -1.290596
        male |          0  (omitted)
        educ |   .5715048   .0493373   11.58   0.000    .4745802    .6684293
       exper |   .0253959   .0115694    2.20   0.029    .0026674    .0481243
      tenure |   .1410051   .0211617    6.66   0.000    .0994323    .1825778
       _cons |  -1.567939   .7245511   -2.16   0.031   -2.991339    -.144538
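A small numerical illustration of why the trap arises, using the female column from Table 3.1. The `male` column is constructed here as 1 − female (an assumption for illustration, since the table does not list it). Because the constant equals female + male in every row, the cross-product matrix X'X is singular and OLS has no unique solution.

```python
# female column from Table 3.1; male is a hypothetical complementary dummy.
female = [1, 1, 0, 0, 0]
male = [1 - f for f in female]
const = [1] * len(female)  # column of ones that carries the intercept

# Perfect collinearity: const - female - male is zero in every row.
residual = [c - f - m for c, f, m in zip(const, female, male)]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


columns = [const, female, male]
gram = [[dot(u, v) for v in columns] for u in columns]  # X'X

print(residual)    # → [0, 0, 0, 0, 0]
print(det3(gram))  # → 0
```

A zero determinant means X'X cannot be inverted, which is why Stata drops one of the collinear regressors.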
Example
• There is no dummy variable trap if we do not have an overall intercept.
$y = \delta_0\,female + \delta_1\,male + \beta_1\,educ + \beta_2\,exper + \beta_3\,tenure + u$
. reg wage female male educ exper tenure, nocons

      Source |         SS     df          MS      Number of obs =     526
-------------+----------------------------------  F(5, 521)     =  477.61
       Model | 20888.9846      5  4177.79693      Prob > F      =  0.0000
    Residual | 4557.30771    521   8.7472317      R-squared     =  0.8209
-------------+----------------------------------  Adj R-squared =  0.8192
       Total | 25446.2924    526  48.3769817      Root MSE      =  2.9576

        wage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -3.378791   .7028798   -4.81   0.000   -4.759618   -1.997964
        male |  -1.567939   .7245511   -2.16   0.031   -2.991339    -.144538
        educ |   .5715048   .0493373   11.58   0.000    .4745802    .6684293
       exper |   .0253959   .0115694    2.20   0.029    .0026674    .0481243
      tenure |   .1410051   .0211617    6.66   0.000    .0994323    .1825778
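The no-intercept estimates carry the same information as the earlier regression with a constant: each dummy's coefficient is that group's own intercept. In the worked check below, the superscript "nc" (for the no-constant regression) is a label introduced here for illustration:

```latex
\begin{aligned}
\hat{\delta}_1^{\,nc} &= \hat{\beta}_0 = -1.567939 \quad \text{(intercept for men)} \\
\hat{\delta}_0^{\,nc} &= \hat{\beta}_0 + \hat{\delta}_0
  = -1.567939 + (-1.810852) = -3.378791 \quad \text{(intercept for women)}
\end{aligned}
```

The slope estimates and the residual sum of squares are unchanged. The R-squared of 0.8209 should not be compared with the earlier 0.3635: without a constant, the total sum of squares in the output (25446.29 on 526 degrees of freedom) is not taken around the mean of wage.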


Example
• Reestimate the wage equation using log(wage) as the dependent variable and adding quadratics in exper and tenure.
. reg lwage female educ exper expersq tenure tenursq

      Source |         SS     df          MS      Number of obs =     526
-------------+----------------------------------  F(6, 519)     =   68.18
       Model | 65.3791009      6  10.8965168      Prob > F      =  0.0000
    Residual | 82.9506505    519  .159827843      R-squared     =  0.4408
-------------+----------------------------------  Adj R-squared =  0.4343
       Total | 148.329751    525   .28253286      Root MSE      =  .39978

       lwage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |   -.296511   .0358055   -8.28   0.000   -.3668524   -.2261696
        educ |   .0801967   .0067573   11.87   0.000    .0669217    .0934716
       exper |   .0294324   .0049752    5.92   0.000    .0196585    .0392063
     expersq |  -.0005827   .0001073   -5.43   0.000   -.0007935   -.0003719
      tenure |   .0317139   .0068452    4.63   0.000    .0182663    .0451616
     tenursq |  -.0005852   .0002347   -2.49   0.013   -.0010463   -.0001241
       _cons |    .416691   .0989279    4.21   0.000    .2223425    .6110394
Example: interpreting the output
• The coefficient on female implies that, for the same levels of educ, exper and tenure, women earn about 100 × 0.297 = 29.7% less than men (using the log approximation).
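The 29.7% figure reads the log-wage coefficient directly as a percentage, which is an approximation that becomes less accurate for large coefficients. A minimal sketch comparing it with the exact percentage difference, 100·(exp(δ̂) − 1), using the coefficient on female from the output (the variable names are illustrative):

```python
import math

delta_female = -0.296511  # coefficient on female in the lwage regression

# Approximation: read the log-point difference as a percentage.
approx_pct = 100 * delta_female

# Exact percentage difference implied by a log-linear model.
exact_pct = 100 * (math.exp(delta_female) - 1)

print(round(approx_pct, 1), round(exact_pct, 1))  # → -29.7 -25.7
```

On this calculation, a woman is predicted to earn roughly 25.7% less than a comparable man, somewhat smaller in magnitude than the approximate figure.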


1.4. Interactions with Dummies
An interaction term is an independent variable that is the product of two other independent variables. These independent variables can be continuous or dummy variables.
$Y_t = \beta_1 + \beta_2 X_t + \beta_3 Z_t + \beta_4 X_t Z_t + e_t$
In this model, the effect of X on Y depends on the level of Z, and the effect of Z on Y depends on the level of X.
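The dependence of each effect on the other variable can be made explicit by differentiating the model, treating X and Z as continuous for the purpose of the sketch:

```latex
\frac{\partial Y_t}{\partial X_t} = \beta_2 + \beta_4 Z_t ,
\qquad
\frac{\partial Y_t}{\partial Z_t} = \beta_3 + \beta_4 X_t .
```

If $\beta_4 = 0$ the two effects are constant, so a test of $\beta_4 = 0$ is a test for the presence of an interaction.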
• We can also interact a dummy variable, d, with a continuous variable, x:
$y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 d \cdot x + u$
• If d = 0, then $y = \beta_0 + \beta_1 x + u$
• If d = 1, then $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x + u$
• This is interpreted as a change in both the intercept and the slope.


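[Figure: Example of $\delta_0 > 0$ and $\delta_1 < 0$. Line for d = 0: $y = \beta_0 + \beta_1 x$. Line for d = 1: $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$, with a higher intercept and a smaller slope. Axes: y (vertical), x (horizontal).]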
Example
• Test whether the return to education is the same for men and women, allowing for a constant wage differential between men and women.
$y = (\beta_0 + \delta_0\,female) + (\beta_1 + \delta_1\,female)\,educ + u$


Example
• Output
. reg lwage female educ femaleeduc

      Source |         SS     df          MS      Number of obs =     526
-------------+----------------------------------  F(3, 522)     =   74.65
       Model |  44.531522      3  14.8438407      Prob > F      =  0.0000
    Residual | 103.798229    522  .198847183      R-squared     =  0.3002
-------------+----------------------------------  Adj R-squared =  0.2962
       Total | 148.329751    525   .28253286      Root MSE      =  .44592

       lwage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -.3600645   .1854296   -1.94   0.053   -.7243444    .0042154
        educ |   .0772279   .0089875    8.59   0.000    .0595718    .0948841
  femaleeduc |  -.0000641   .0145035   -0.00   0.996   -.0285565    .0284283
       _cons |   .8259547   .1180502    7.00   0.000    .5940427    1.057867


Example
• H0: the return to education is the same for men and women, $\delta_1 = 0$.
• Interpretation of output: the estimated return to education is 7.72% for men and essentially the same for women. The difference, about −0.006 percentage points, is neither economically large nor statistically significant (t = −0.00). There is no evidence against the hypothesis that the return to education is the same for men and women.


Example
• Reestimate the equation with the female·educ interaction, adding quadratics in exper and tenure.

       lwage |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -.2267886   .1675394   -1.35   0.176   -.5559289    .1023517
        educ |   .0823692   .0084699    9.72   0.000    .0657296    .0990088
  femaleeduc |  -.0055645   .0130618   -0.43   0.670   -.0312252    .0200962
       exper |   .0293366   .0049842    5.89   0.000     .019545    .0391283
     expersq |  -.0005804   .0001075   -5.40   0.000   -.0007916   -.0003691
      tenure |   .0318967    .006864    4.65   0.000     .018412    .0453814
     tenursq |    -.00059   .0002352   -2.51   0.012    -.001052    -.000128
       _cons |    .388806   .1186871    3.28   0.001    .1556388    .6219732


Example
• Interpreting the output: the estimated return to education is about 8.2% for men and 0.082 − 0.0056 = 0.0764, or about 7.6%, for women. The difference, −0.56 percentage points, is neither economically large nor statistically significant (t = −0.43). There is no evidence against the hypothesis that the return to education is the same for men and women.


Example
• H0: $\delta_0 = 0$ and $\delta_1 = 0$
• Output
. test female = femaleeduc = 0

 ( 1)  female - femaleeduc = 0
 ( 2)  female = 0

       F(  2,   518) =   34.33
            Prob > F =    0.0000
• H0 is rejected: female and female·educ are jointly significant even though neither is individually significant, so the model should allow for a wage differential between women and men.
1.5. Dummies with Multiple Categories
• We can use dummy variables to control for something with multiple categories.
• Estimate a model that allows for differences among five groups: primary, secondary, tertiary, BSc, MSc
D1 = 1 if primary; 0 otherwise
D2 = 1 if secondary; 0 otherwise
D3 = 1 if tertiary; 0 otherwise
D4 = 1 if BSc; 0 otherwise
D5 = 1 if MSc; 0 otherwise
• Any categorical variable can be turned into a set of dummy variables.
• Because the base group is represented by the intercept, if there are n categories there should be n – 1 dummy variables.
• If there are a lot of categories, it may make sense to group some together.
• Example: top 10 ranking, 11 – 25, etc.


So we estimate:
$Y = \beta_1 + \beta_2 X_2 + a_1 D_2 + a_2 D_3 + a_3 D_4 + a_4 D_5 + u$

Note that one dummy (in this case D1) is excluded from the model in order to avoid the dummy variable trap.

Consider various cases, e.g. D2 = 1, D3 = D4 = D5 = 0, etc.
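The coding of a five-category variable into four dummies can be sketched as follows. The function name `education_dummies` and the string labels are assumptions made for illustration; primary is treated as the base group, matching the exclusion of D1.

```python
# Education categories in the order D1..D5; the first one is the base group.
LEVELS = ["primary", "secondary", "tertiary", "BSc", "MSc"]


def education_dummies(level):
    """Return [D2, D3, D4, D5] for one observation (D1 is the omitted base)."""
    if level not in LEVELS:
        raise ValueError(f"unknown education level: {level!r}")
    return [1 if level == other else 0 for other in LEVELS[1:]]


print(education_dummies("primary"))    # → [0, 0, 0, 0]
print(education_dummies("secondary"))  # → [1, 0, 0, 0]
print(education_dummies("MSc"))        # → [0, 0, 0, 1]
```

For the base group all four dummies are zero, so its intercept is β1. For secondary education the intercept is β1 + a1, and so on for the other categories.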


Example: Seasonal Dummy Variables

The number of dummies depends on the frequency of the data:
Quarterly – 4 dummies – DQ1, DQ2, DQ3, DQ4
Monthly – 12 dummies – one for each month
Daily – 5 dummies – Dmon, Dtue, Dwed, etc.

Again, we either exclude one dummy and include a constant (usually the more convenient choice), or we use all of the dummies and omit the constant. Including all of the dummies together with a constant leads to the dummy variable trap.
2. Testing for Differences Across Groups
• Testing whether a regression function is different for one group versus another:
2.1. The Chow test
2.2. Test for the joint significance of the dummy and its interactions with all other x variables


Example
• Does the same regression model describe college grade point averages for male and female college athletes?
$cumgpa = \beta_0 + \beta_1\,sat + \beta_2\,hsperc + \beta_3\,tothrs + u$
• where
cumgpa = cumulative GPA
sat = SAT score
hsperc = high school rank percentile
tothrs = total hours of college courses


The Chow test
• Suppose we have 2 groups, g = 1 and g = 2.
• We test whether the intercept and all slopes are the same across the two groups.
$y = \beta_{g,0} + \beta_{g,1} x_1 + \beta_{g,2} x_2 + \dots + \beta_{g,k} x_k + u$


• SSR1 = the sum of squared residuals for group 1 (n1 observations).
• SSR2 = the sum of squared residuals for group 2 (n2 observations).
• SSR = the sum of squared residuals for the whole sample (n observations).


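Assuming normal errors with the same variance $\sigma^2$ in both groups:
• $SSR_1/\sigma^2 \sim \chi^2(n_1 - k - 1)$
• $SSR_2/\sigma^2 \sim \chi^2(n_2 - k - 1)$
• $SSR_U = SSR_1 + SSR_2$, with $SSR_U/\sigma^2 \sim \chi^2(n - 2(k+1))$
• $SSR/\sigma^2 \sim \chi^2(n - k - 1)$ under H0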


The Chow Test
• The F-statistic has (k+1, n − 2(k+1)) degrees of freedom:
$F = \dfrac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \dfrac{n - 2(k+1)}{k+1}$
• This F-statistic is called the Chow statistic.
• If $F > F_{\alpha,\,(k+1,\ n-2(k+1))}$, H0 is rejected.
The Chow Test: Example
• Output for female = 1
. reg cumgpa sat hsperc tothrs if female==1

      Source |         SS     df          MS      Number of obs =     180
-------------+----------------------------------  F(3, 176)     =   34.08
       Model | 83.4816253      3  27.8272084      Prob > F      =  0.0000
    Residual | 143.689727    176  .816418902      R-squared     =  0.3675
-------------+----------------------------------  Adj R-squared =  0.3567
       Total | 227.171352    179   1.2691137      Root MSE      =  .90356

      cumgpa |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         sat |   .0017281   .0004642    3.72   0.000    .0008119    .0026442
      hsperc |  -.0059167   .0038895   -1.52   0.130   -.0135927    .0017594
      tothrs |   .0158603   .0018485    8.58   0.000    .0122122    .0195085
       _cons |   .1003465   .4810947    0.21   0.835   -.8491105    1.049803


• Output for female = 0
. reg cumgpa sat hsperc tothrs if female==0

      Source |         SS     df          MS      Number of obs =     552
-------------+----------------------------------  F(3, 548)     =   41.94
       Model | 89.6937042      3  29.8979014      Prob > F      =  0.0000
    Residual | 390.619421    548  .712809162      R-squared     =  0.1867
-------------+----------------------------------  Adj R-squared =  0.1823
       Total | 480.313125    551  .871711661      Root MSE      =  .84428

      cumgpa |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         sat |   .0006113    .000231    2.65   0.008    .0001576     .001065
      hsperc |  -.0059675   .0017459   -3.42   0.001   -.0093969    -.002538
      tothrs |   .0103004    .001074    9.59   0.000    .0081907    .0124101
       _cons |   1.213984   .2602697    4.66   0.000    .7027359    1.725233


• Output for the whole sample
. reg cumgpa sat hsperc tothrs

      Source |         SS     df          MS      Number of obs =     732
-------------+----------------------------------  F(3, 728)     =   74.72
       Model | 168.533658      3  56.1778861      Prob > F      =  0.0000
    Residual | 547.364897    728  .751874858      R-squared     =  0.2354
-------------+----------------------------------  Adj R-squared =  0.2323
       Total | 715.898555    731  .979341389      Root MSE      =  .86711

      cumgpa |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         sat |   .0009028   .0002079    4.34   0.000    .0004947    .0013109
      hsperc |  -.0063791   .0015678   -4.07   0.000   -.0094572   -.0033011
      tothrs |   .0119779   .0009314   12.86   0.000    .0101494    .0138064
       _cons |   .9291105   .2285515    4.07   0.000    .4804118    1.377809


The Chow Test
• The F-statistic has (4, 724) degrees of freedom:
$F = \dfrac{547.36 - (143.69 + 390.62)}{143.69 + 390.62} \cdot \dfrac{732 - 2(3+1)}{3+1} \approx 4.42$
• This F-statistic is called the Chow statistic.
• $F = 4.42 > F_{0.05,\,(4,\,724)} = 2.384$, so H0 is rejected.


Test for the joint significance of the dummy and its interactions with all other x variables

• Model
$cumgpa = \beta_0 + \delta_0\,female + \beta_1\,sat + \delta_1\,female \cdot sat + \beta_2\,hsperc + \delta_2\,female \cdot hsperc + \beta_3\,tothrs + \delta_3\,female \cdot tothrs + u$

• Null hypothesis: $\delta_0 = 0,\ \delta_1 = 0,\ \delta_2 = 0,\ \delta_3 = 0$


Example-Output
• Look at the t-statistics of female and its interactions!

      Source |         SS     df          MS      Number of obs =     732
-------------+----------------------------------  F(7, 724)     =   35.15
       Model | 181.589407      7  25.9413439      Prob > F      =  0.0000
    Residual | 534.309148    724   .73799606      R-squared     =  0.2537
-------------+----------------------------------  Adj R-squared =  0.2464
       Total | 715.898555    731  .979341389      Root MSE      =  .85907

      cumgpa |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female |  -1.113638    .528539   -2.11   0.035    -2.15129   -.0759859
         sat |   .0006113    .000235    2.60   0.009    .0001499    .0010727
   femalesat |   .0011167      .0005    2.23   0.026    .0001351    .0020984
      hsperc |  -.0059675   .0017765   -3.36   0.001   -.0094551   -.0024798
femalehsperc |   .0000508   .0041025    0.01   0.990   -.0080035     .008105
      tothrs |   .0103004   .0010928    9.43   0.000    .0081549    .0124459
femaletothrs |   .0055599   .0020696    2.69   0.007    .0014968     .009623
       _cons |   1.213984   .2648281    4.58   0.000    .6940617    1.733907
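One way to see what the fully interacted model does is to compare it with the separate group regressions. The coefficients on sat, hsperc, tothrs and _cons reproduce the female = 0 regression, and adding the female terms gives the female = 1 regression. The sketch below checks this with the reported numbers; the dictionary names are illustrative.

```python
# Base (male) coefficients and female shifts from the interacted model.
base = {"_cons": 1.213984, "sat": 0.0006113,
        "hsperc": -0.0059675, "tothrs": 0.0103004}
shift = {"_cons": -1.113638, "sat": 0.0011167,
         "hsperc": 0.0000508, "tothrs": 0.0055599}

# Coefficients from the separate regression with female == 1.
female_only = {"_cons": 0.1003465, "sat": 0.0017281,
               "hsperc": -0.0059167, "tothrs": 0.0158603}

# Implied female-group coefficients: base plus shift, term by term.
implied = {name: base[name] + shift[name] for name in base}

# Largest discrepancy; differences of this size reflect rounding in the output.
max_gap = max(abs(implied[name] - female_only[name]) for name in base)

for name in base:
    print(name, round(implied[name], 7), female_only[name])
```

This is also why the residual sum of squares of the interacted model (534.309148) equals SSR1 + SSR2 from the two group regressions.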
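Example-Output
The coefficients on female, female*sat and female*tothrs are individually significant at the 5% level, which suggests that H0 will be rejected. The joint F-test confirms this.
Estimate the restricted model by dropping female and its interactions: $R_r^2 = 0.2354$. With $R_{ur}^2 = 0.2537$, q = 4 and $df_{ur} = 724$:
$F = \dfrac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/df_{ur}} = \dfrac{(0.2537 - 0.2354)/4}{(1 - 0.2537)/724} \approx 4.44$
With unrounded R-squareds the statistic is 4.42, the same as the Chow statistic.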
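The R-squared form of the F-statistic is sensitive to rounding. The sketch below evaluates it twice: once with the four-decimal R-squareds shown in the outputs, and once with R-squareds recomputed from the reported sums of squares. The function name `f_from_r2` is illustrative.

```python
def f_from_r2(r2_ur, r2_r, q, df_ur):
    """F-statistic for q exclusion restrictions, R-squared form."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


# R-squareds rebuilt from the Model and Total sums of squares in the outputs.
total_ss = 715.898555
r2_unrestricted = 181.589407 / total_ss  # model with female and interactions
r2_restricted = 168.533658 / total_ss    # model without them

F_unrounded = f_from_r2(r2_unrestricted, r2_restricted, q=4, df_ur=724)
F_rounded = f_from_r2(0.2537, 0.2354, q=4, df_ur=724)

print(round(F_unrounded, 2), round(F_rounded, 2))  # → 4.42 4.44
```

The unrounded value agrees with the Chow statistic and with the `test` command output. The small gap in the rounded version comes entirely from using R-squareds to four decimals.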
Example-Output
. test female = femalesat = femalehsperc = femaletothrs = 0

 ( 1)  female - femalesat = 0
 ( 2)  female - femalehsperc = 0
 ( 3)  female - femaletothrs = 0
 ( 4)  female = 0

       F(  4,   724) =    4.42
            Prob > F =    0.0015
3. The dummy variables in seasonal analysis
• Many economic time series based on monthly or quarterly data exhibit seasonal patterns.
Examples:
• Sales of department stores at Christmas and other major holiday times
• Demand for money by households at holiday times
• Demand for ice cream and soft drinks during summer
• Prices of crops right after the harvesting season, demand for air travel, etc.
Seasonal variables:
D1 = 1 if Q2, 0 otherwise
D2 = 1 if Q3, 0 otherwise
D3 = 1 if Q4, 0 otherwise

Regression model:
$Y = \alpha_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta_0 X + U$

Regression model with interactions:
$Y = \alpha_0 + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + (\beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3) X + U$

Y = sales of refrigerators (in thousands)


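Fitted regression line by quarter (model with interactions):
• Q1: $\hat{Y} = \hat{\alpha}_0 + \hat{\beta}_0 X$
• Q2: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 + \hat{\beta}_0 X + \hat{\beta}_1 X$
• Q3: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_2 + \hat{\beta}_0 X + \hat{\beta}_2 X$
• Q4: $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_3 + \hat{\beta}_0 X + \hat{\beta}_3 X$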
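The quarter-by-quarter lines can be generated mechanically from the dummy coding. In the sketch below the coefficient values are made up purely for illustration (no estimates are given in the slides), and the function names are hypothetical.

```python
def quarter_dummies(quarter):
    """Return (D1, D2, D3) for quarters 2, 3 and 4; Q1 is the base quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1..4, got {quarter!r}")
    return tuple(1 if quarter == q else 0 for q in (2, 3, 4))


def fitted_value(quarter, x, alpha, beta):
    """Fitted Y from the model with seasonal intercept and slope dummies.

    alpha = (alpha_0, alpha_1, alpha_2, alpha_3)
    beta  = (beta_0,  beta_1,  beta_2,  beta_3)
    """
    d = (1,) + quarter_dummies(quarter)  # leading 1 selects alpha_0 and beta_0
    intercept = sum(a * di for a, di in zip(alpha, d))
    slope = sum(b * di for b, di in zip(beta, d))
    return intercept + slope * x


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
alpha = (10.0, 2.0, 3.0, -1.0)
beta = (0.5, 0.25, 0.0, -0.125)

print(fitted_value(1, 100.0, alpha, beta))  # → 60.0  (10 + 0.5 * 100)
print(fitted_value(2, 100.0, alpha, beta))  # → 87.0  (12 + 0.75 * 100)
print(fitted_value(4, 100.0, alpha, beta))  # → 46.5  (9 + 0.375 * 100)
```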


4. Piecewise Linear Regression
• Example: hypothetical relationship between sales commission and sales volume. (Note: the intercept on the Y axis denotes the minimum guaranteed commission.)

$Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + u_i$
where
$Y_i$ = sales commission
$X_i$ = volume of sales generated by the salesperson
$X^*$ = threshold value of sales
$D_i$ = 1 if $X_i > X^*$
      = 0 if $X_i < X^*$


Parameters of the piecewise linear regression:
• $\beta_1$ gives the slope of the regression line in segment I,
• $\beta_1 + \beta_2$ gives the slope of the regression line in segment II.

A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted easily by noting the statistical significance of the estimated differential slope coefficient $\hat{\beta}_2$ of the piecewise linear regression.
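Writing out the two segments shows where these slopes come from. The sketch assumes $E(u_i \mid X_i) = 0$, which is not stated on the slide:

```latex
\begin{aligned}
\text{Segment I } (D_i = 0):\quad
  E(Y_i \mid X_i) &= \alpha_1 + \beta_1 X_i \\
\text{Segment II } (D_i = 1):\quad
  E(Y_i \mid X_i) &= \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) \\
                  &= (\alpha_1 - \beta_2 X^*) + (\beta_1 + \beta_2) X_i
\end{aligned}
```

At $X_i = X^*$ both expressions equal $\alpha_1 + \beta_1 X^*$, so the two segments join at the threshold and only the slope changes. If $\beta_2 = 0$ the two segments coincide, which is the hypothesis of no break.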


Assignments
• Questions 9.1, 9.2, 9.3 on p. 324, Gujarati
• Problems 9.21, 9.22, Gujarati
• Problems 7.1-7.6 on pp. 255-257, Wooldridge
• Computer Exercises C7.1-C7.6, Wooldridge
