
Hyunchul Kim
Department of Economics, Sungkyunkwan University

Undergraduate Econometrics
Lecture Note 5: Using the Multiple Regression Model

This note is based on the lecture note series by Kyoo il Kim.

Dummy Variables

• Dummy variables can be used to measure differences in intercepts or slope coefficients across groups

• Consider the following linear regression of log wage on years of schooling and years of job experience:

Y = β0 + β1 S + β2 E + ε

where Y = log(wage), S = years of schooling, and E = years of job experience

Question: Does the same regression line apply to both men and women? We may answer this question using a dummy variable

Di = 1 if the i-th individual is male, and Di = 0 otherwise

Example A.

Specification 1

• Suppose

Y = β0 + γ1 D + β1 S + β2 E + ε

• What does γ1 measure?

• Note the wage equation becomes

Y = (β0 + γ1) + β1 S + β2 E   for men
Y = β0 + β1 S + β2 E          for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations

• Here we implicitly assume that the marginal effects of schooling and experience are the same for men and women
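Specification 1 can be checked on simulated data. The sketch below assumes numpy is available; every coefficient value and variable range is an illustrative assumption, not an estimate from real data.

```python
import numpy as np

# Simulate data satisfying Specification 1: Y = b0 + g1*D + b1*S + b2*E + eps.
# The true values (g1 = 0.25, etc.) are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1000
D = rng.integers(0, 2, n)            # 1 if male, 0 if female
S = rng.uniform(8, 16, n)            # years of schooling
E = rng.uniform(0, 30, n)            # years of job experience
Y = 1.0 + 0.25 * D + 0.08 * S + 0.02 * E + rng.normal(0, 0.3, n)

# OLS on the design matrix [1, D, S, E]; the coefficient on D estimates g1,
# the male/female intercept difference.
X = np.column_stack([np.ones(n), D, S, E])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
g1_hat = beta_hat[1]
print(g1_hat)                        # close to the true value 0.25
```

The t-statistic on this estimate is then what tests whether the male and female intercepts actually differ.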

Specification 2

• Now let

D1i = 1 if the i-th individual is male, and D1i = 0 otherwise
D2i = 1 if the i-th individual is female, and D2i = 0 otherwise

and suppose

Y = γ1 D1 + γ2 D2 + β1 S + β2 E + ε

• Note the wage equation becomes

Y = γ1 + β1 S + β2 E   for men
Y = γ2 + β1 S + β2 E   for women

• The difference γ1 − γ2 measures the difference in intercepts between the male and female wage equations

• Thus, if we are interested in testing the significance of the difference, Specification 1 is preferred: a simple t-test on γ1 does the job there, whereas here we would need to test γ1 = γ2
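The two specifications are numerically equivalent, which can be verified on simulated data. A sketch assuming numpy, with illustrative data-generating numbers:

```python
import numpy as np

# Simulated data; the data-generating coefficients are illustrative.
rng = np.random.default_rng(1)
n = 500
D1 = rng.integers(0, 2, n)          # male dummy
D2 = 1 - D1                         # female dummy
S = rng.uniform(8, 16, n)
E = rng.uniform(0, 30, n)
Y = 1.0 + 0.25 * D1 + 0.08 * S + 0.02 * E + rng.normal(0, 0.3, n)

# Specification 1: intercept plus the male dummy
X1 = np.column_stack([np.ones(n), D1, S, E])
b_spec1, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# Specification 2: no intercept, one dummy per group
X2 = np.column_stack([D1, D2, S, E])
b_spec2, *_ = np.linalg.lstsq(X2, Y, rcond=None)

# gamma1 (Spec 1) equals gamma1 - gamma2 (Spec 2): since 1 = D1 + D2, the two
# designs span the same column space and the fitted lines are identical.
print(np.isclose(b_spec1[1], b_spec2[0] - b_spec2[1]))   # True
```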

Specification 3

• Suppose

Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + ε

• Note the wage equation becomes

Y = (β0 + γ1) + (β1 + γ2)S + β2 E   for men
Y = β0 + β1 S + β2 E                for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations, while γ2 measures the difference in the marginal effect of schooling between the two equations

• Here we implicitly assume that the marginal effect of experience is the same for men and women

Specification 4

• Suppose

Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + γ3 D · E + ε

• Note the wage equation becomes

Y = (β0 + γ1) + (β1 + γ2)S + (β2 + γ3)E   for men
Y = β0 + β1 S + β2 E                      for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations, γ2 the difference in the marginal effect of schooling, and γ3 the difference in the marginal effect of experience

• This specification does not restrict any coefficient to be equal across genders

• We could instead estimate the wage equation separately for men and women. However, using dummies is still preferred if we are interested in measuring differences and testing the significance of such differences
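The claim that the fully interacted model imposes no cross-gender restriction can be illustrated numerically: a sketch (numpy assumed, simulated illustrative data) showing that Specification 4 reproduces exactly the coefficients of a separate regression on the women-only subsample.

```python
import numpy as np

# Simulated data; all data-generating numbers are illustrative.
rng = np.random.default_rng(2)
n = 400
D = rng.integers(0, 2, n)
S = rng.uniform(8, 16, n)
E = rng.uniform(0, 30, n)
Y = (1.0 + 0.2 * D + 0.08 * S + 0.01 * D * S
     + 0.02 * E + 0.005 * D * E + rng.normal(0, 0.3, n))

# Specification 4: intercept and both slopes interacted with D
Xc = np.column_stack([np.ones(n), D, S, D * S, E, D * E])
bc, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

# Separate regression on the women-only subsample (D == 0)
w = D == 0
Xw = np.column_stack([np.ones(w.sum()), S[w], E[w]])
bw, *_ = np.linalg.lstsq(Xw, Y[w], rcond=None)

# (beta0, beta1, beta2) from Specification 4 match the women-only fit
print(np.allclose([bc[0], bc[2], bc[4]], bw))   # True
```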

Example B.

• We now want to specify the wage equation for three different race groups, e.g., Whites, Blacks, and Others (everyone who is neither white nor black)

• For simplicity, we only consider potential intercept differences (the extension to slope parameter differences is analogous)

• This can be done by creating two dummy variables:

Bi = 1 if the i-th individual is black, and Bi = 0 otherwise
Wi = 1 if the i-th individual is white, and Wi = 0 otherwise

Note that Bi Wi = 0 for every i

• Now consider the specification

Y = β0 + γ1 B + θ1 W + β1 S + β2 E + ε

• This implies that

Y = (β0 + θ1) + β1 S + β2 E   for Whites
Y = (β0 + γ1) + β1 S + β2 E   for Blacks
Y = β0 + β1 S + β2 E          for Others

• Note that θ1 measures the difference in intercepts between Whites and Others, and γ1 measures the difference in intercepts between Blacks and Others. What measures the difference in intercepts between Whites and Blacks? (It is θ1 − γ1)

• In general, when there are G different groups, we can handle intercept differences by creating (G − 1) dummy variables, with the omitted group serving as the base category
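Constructing the (G − 1) dummies is mechanical. A minimal sketch with a hypothetical sample of group labels:

```python
import numpy as np

# Hypothetical sample of G = 3 race groups, encoded with G - 1 = 2 dummies;
# "other" is the omitted base group, absorbed by the intercept.
race = np.array(["white", "black", "other", "white", "other", "black"])
B = (race == "black").astype(float)   # B_i
W = (race == "white").astype(float)   # W_i

print(B)   # [0. 1. 0. 0. 0. 1.]
print(W)   # [1. 0. 0. 1. 0. 0.]
# As in the text, B_i * W_i = 0 for every i: no one is in two groups.
```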

• What happens if we include all G dummies? In Example B, consider

Y = β0 + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε,   (1)

where

Oi = 1 if the i-th individual is neither black nor white, and Oi = 0 otherwise

– Equation (1) implies that

Y = (β0 + θ1 ) + β1 S + β2 E (Whites)
Y = (β0 + γ1 ) + β1 S + β2 E (Blacks)
Y = (β0 + δ1 ) + β1 S + β2 E (Others)

– Note that it is not clear what β0 means. More seriously, Equation (1) is not estimable due to exact multicollinearity. To see this, let Xi = 1 for every i (equivalently, the constant term). Then (1) can be written as

Y = β0 X + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε

and notice that, for every i,

Bi + Wi + Oi = 1 = Xi

so the regressors X, B, W, and O are linearly dependent

– Lesson: Although it is fine to work with G dummy variables when there are
G groups, we have to be careful not to include an intercept term. The same
comment applies to slope coefficients when we want to allow for the possibility
that the slope coefficients differ across groups
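The rank deficiency behind this "dummy-variable trap" is easy to exhibit numerically. A sketch assuming numpy, with an arbitrary deterministic group assignment:

```python
import numpy as np

# 30 hypothetical observations cycling through G = 3 groups
n = 30
g = np.arange(n) % 3
B = (g == 0).astype(float)
W = (g == 1).astype(float)
O = (g == 2).astype(float)
S = np.random.default_rng(3).uniform(8, 16, n)   # one continuous regressor

# Intercept plus all three dummies: B + W + O equals the constant column,
# so the 5-column design matrix has rank 4 (exact multicollinearity).
X_trap = np.column_stack([np.ones(n), B, W, O, S])
# Dropping either one dummy or the intercept restores full column rank.
X_ok = np.column_stack([np.ones(n), B, W, S])

print(np.linalg.matrix_rank(X_trap), np.linalg.matrix_rank(X_ok))   # 4 4
```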

Tests involving the equality of coefficients in different regressions

• One may want to know whether a given model applies to two different data sets (or two different time periods)

• To test whether the two regression models are identical, we can again rely on dummy variables

• Consider the regression models

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi,   i = 1, . . . , n   (2)
Yj = α1 + α2 X2,j + · · · + αk Xk,j + εj,   j = 1, . . . , m   (3)

Assume that

εi ∼ N(0, σ²),   i = 1, . . . , n
εj ∼ N(0, σ²),   j = 1, . . . , m

That is, the error terms in both regressions have the same variance. We also assume that the two error terms are independently distributed

• We want to test whether

β1 = α1, . . . , βk = αk

• For this purpose, relabel the second data set so that we now have

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi,   i = 1, . . . , n
Yi = α1 + α2 X2,i + · · · + αk Xk,i + εi,   i = n + 1, . . . , n + m

• Create a dummy variable

Di = 1 if n + 1 ≤ i ≤ n + m, and Di = 0 otherwise

• We may then write the overall model as

Yi = β1 + β2 X2,i + · · · + βk Xk,i + γ1 Di + γ2 Di X2,i + · · · + γk Di Xk,i + εi,   i = 1, . . . , n + m

and the hypothesis of identical regressions is

H0 : γ1 = · · · = γk = 0

• The F -test is conducted as follows:

– Restricted model (obtain RSSR )

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

– Unrestricted model (obtain RSSU )

Yi = β1 + β2 X2,i + · · · + βk Xk,i + γ1 Di + γ2 Di X2,i + · · · + γk Di Xk,i + εi

– Number of restrictions: k

– Number of observations: n + m

– Degrees of freedom: n + m − 2k

– The F -statistic has (k, n + m − 2k) degrees of freedom and is given by

F = [(RSSR − RSSU )/k] / [RSSU /(n + m − 2k)]

– Note that, alternatively, RSSU can be obtained as RSS1 + RSS2 , where RSS1 and RSS2 denote the residual sums of squares from two separate estimations of (2) and (3), respectively.
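The whole procedure, including the RSS1 + RSS2 shortcut, can be sketched on simulated data (numpy assumed; sample sizes and coefficients below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 80, 60, 2                  # k regressors, including the intercept

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

# Two samples from models (2) and (3); here only the intercepts differ.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(m), rng.normal(size=m)])
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(0, 0.4, n)
y2 = X2 @ np.array([1.3, 0.5]) + rng.normal(0, 0.4, m)

# Restricted model: pool the samples, common coefficients
Xr = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
RSS_R = rss(Xr, y)

# Unrestricted model: D = 1 for the second sample, interacted with everything
D = np.concatenate([np.zeros(n), np.ones(m)])
Xu = np.column_stack([Xr, D[:, None] * Xr])
RSS_U = rss(Xu, y)

F = ((RSS_R - RSS_U) / k) / (RSS_U / (n + m - 2 * k))

# The pooled interacted fit gives the same RSS_U as two separate fits
print(np.isclose(RSS_U, rss(X1, y1) + rss(X2, y2)), F > 0)   # True True
```

Comparing F against the F(k, n + m − 2k) critical value then decides whether the two regressions are identical.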
