
Hyunchul Kim
Department of Economics, Sungkyunkwan University

Undergraduate Econometrics
Lecture Note 5: Using the Multiple Regression Model

This note is based on the lecture note series by Kyoo il Kim.

Dummy Variables

• Dummy variables can be used to measure differences in intercepts or slope coefficients across groups

• Consider the following linear regression of log wage on years of schooling and years of job experience:

Y = β0 + β1 S + β2 E + ε

where Y = log(wage), S = years of schooling, and E = years of job experience

Question: Does the same regression line apply to both men and women? We may answer this question using a dummy variable

Di = 1 if the i-th individual is male, and Di = 0 otherwise

Example A.

Specification 1

• Suppose

Y = β0 + γ1 D + β1 S + β2 E + ε

• What does γ1 measure?

• Note the wage equation becomes

Y = (β0 + γ1) + β1 S + β2 E   for men
Y = β0 + β1 S + β2 E          for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations

• Here we implicitly assume that the marginal effects of schooling and experience are the same for men and women
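Specification 1 can be checked on simulated data. The sketch below assumes numpy is available; every coefficient value and variable range is an illustrative assumption, not an estimate from real data.

```python
import numpy as np

# Simulate data satisfying Specification 1: Y = b0 + g1*D + b1*S + b2*E + eps.
# The true values (g1 = 0.25, etc.) are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1000
D = rng.integers(0, 2, n)            # 1 if male, 0 if female
S = rng.uniform(8, 16, n)            # years of schooling
E = rng.uniform(0, 30, n)            # years of job experience
Y = 1.0 + 0.25 * D + 0.08 * S + 0.02 * E + rng.normal(0, 0.3, n)

# OLS on the design matrix [1, D, S, E]; the coefficient on D estimates g1,
# the male/female intercept difference.
X = np.column_stack([np.ones(n), D, S, E])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
g1_hat = beta_hat[1]
print(g1_hat)                        # close to the true value 0.25
```

The t-statistic on this estimate is then what tests whether the male and female intercepts actually differ.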

Specification 2

• Now let

D1i = 1 if the i-th individual is male, and D1i = 0 otherwise
D2i = 1 if the i-th individual is female, and D2i = 0 otherwise

and suppose

Y = γ1 D1 + γ2 D2 + β1 S + β2 E + ε

• Note the wage equation becomes

Y = γ1 + β1 S + β2 E   for men
Y = γ2 + β1 S + β2 E   for women

• The difference γ1 − γ2 measures the difference in intercepts between the male and female wage equations

• Thus, if we are interested in testing the significance of the difference, Specification 1 is preferred: a simple t-test on γ1 does the job there, whereas here we would need to test γ1 = γ2
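The two specifications are numerically equivalent, which can be verified on simulated data. A sketch assuming numpy, with illustrative data-generating numbers:

```python
import numpy as np

# Simulated data; the data-generating coefficients are illustrative.
rng = np.random.default_rng(1)
n = 500
D1 = rng.integers(0, 2, n)          # male dummy
D2 = 1 - D1                         # female dummy
S = rng.uniform(8, 16, n)
E = rng.uniform(0, 30, n)
Y = 1.0 + 0.25 * D1 + 0.08 * S + 0.02 * E + rng.normal(0, 0.3, n)

# Specification 1: intercept plus the male dummy
X1 = np.column_stack([np.ones(n), D1, S, E])
b_spec1, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# Specification 2: no intercept, one dummy per group
X2 = np.column_stack([D1, D2, S, E])
b_spec2, *_ = np.linalg.lstsq(X2, Y, rcond=None)

# gamma1 (Spec 1) equals gamma1 - gamma2 (Spec 2): since 1 = D1 + D2, the two
# designs span the same column space and the fitted lines are identical.
print(np.isclose(b_spec1[1], b_spec2[0] - b_spec2[1]))   # True
```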

Specification 3

• Suppose

Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + ε

• Note the wage equation becomes

Y = (β0 + γ1) + (β1 + γ2)S + β2 E   for men
Y = β0 + β1 S + β2 E                for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations, while γ2 measures the difference in the marginal effect of schooling between the two equations

• Here we implicitly assume that the marginal effect of experience is the same for men and women

Specification 4

• Suppose

Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + γ3 D · E + ε

• Note the wage equation becomes

Y = (β0 + γ1) + (β1 + γ2)S + (β2 + γ3)E   for men
Y = β0 + β1 S + β2 E                      for women

• This means that γ1 measures the difference in intercepts between the male and female wage equations, γ2 the difference in the marginal effect of schooling, and γ3 the difference in the marginal effect of experience

• This specification does not restrict any coefficient to be equal across genders

• We could instead estimate the wage equation separately for men and women. However, using dummies is still preferred if we are interested in measuring differences and testing the significance of such differences
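The claim that the fully interacted model imposes no cross-gender restriction can be illustrated numerically: a sketch (numpy assumed, simulated illustrative data) showing that Specification 4 reproduces exactly the coefficients of a separate regression on the women-only subsample.

```python
import numpy as np

# Simulated data; all data-generating numbers are illustrative.
rng = np.random.default_rng(2)
n = 400
D = rng.integers(0, 2, n)
S = rng.uniform(8, 16, n)
E = rng.uniform(0, 30, n)
Y = (1.0 + 0.2 * D + 0.08 * S + 0.01 * D * S
     + 0.02 * E + 0.005 * D * E + rng.normal(0, 0.3, n))

# Specification 4: intercept and both slopes interacted with D
Xc = np.column_stack([np.ones(n), D, S, D * S, E, D * E])
bc, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

# Separate regression on the women-only subsample (D == 0)
w = D == 0
Xw = np.column_stack([np.ones(w.sum()), S[w], E[w]])
bw, *_ = np.linalg.lstsq(Xw, Y[w], rcond=None)

# (beta0, beta1, beta2) from Specification 4 match the women-only fit
print(np.allclose([bc[0], bc[2], bc[4]], bw))   # True
```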

Example B.

• We now want to specify the wage equation for three different race groups, e.g., Whites, Blacks, and Others (everyone who is neither white nor black)

• For simplicity, we only consider potential intercept differences (the extension to slope parameter differences is analogous)

• This can be done by creating two dummy variables:

Bi = 1 if the i-th individual is black, and Bi = 0 otherwise
Wi = 1 if the i-th individual is white, and Wi = 0 otherwise

Note that Bi Wi = 0 for every i

• Now consider the specification

Y = β0 + γ1 B + θ1 W + β1 S + β2 E + ε

• This implies that

Y = (β0 + θ1) + β1 S + β2 E   for Whites
Y = (β0 + γ1) + β1 S + β2 E   for Blacks
Y = β0 + β1 S + β2 E          for Others

• Note that θ1 measures the difference in intercepts between Whites and Others, and γ1 measures the difference in intercepts between Blacks and Others. What measures the difference in intercepts between Whites and Blacks? (It is θ1 − γ1)

• In general, when there are G different groups, we can handle intercept differences by creating (G − 1) dummy variables, with the omitted group serving as the base category
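Constructing the (G − 1) dummies is mechanical. A minimal sketch with a hypothetical sample of group labels:

```python
import numpy as np

# Hypothetical sample of G = 3 race groups, encoded with G - 1 = 2 dummies;
# "other" is the omitted base group, absorbed by the intercept.
race = np.array(["white", "black", "other", "white", "other", "black"])
B = (race == "black").astype(float)   # B_i
W = (race == "white").astype(float)   # W_i

print(B)   # [0. 1. 0. 0. 0. 1.]
print(W)   # [1. 0. 0. 1. 0. 0.]
# As in the text, B_i * W_i = 0 for every i: no one is in two groups.
```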

• What happens if we include all G dummies? In Example B, consider

Y = β0 + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε,   (1)

where

Oi = 1 if the i-th individual is neither black nor white, and Oi = 0 otherwise

– Equation (1) implies that

Y = (β0 + θ1 ) + β1 S + β2 E (Whites)
Y = (β0 + γ1 ) + β1 S + β2 E (Blacks)
Y = (β0 + δ1 ) + β1 S + β2 E (Others)

– Note that it is not clear what β0 means. More seriously, Equation (1) is not estimable due to exact multicollinearity. To see this, let Xi = 1 for every i (equivalently, the constant term). Then (1) can be written as

Y = β0 X + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε

and notice that, for every i,

Bi + Wi + Oi = 1 = Xi

so the regressors X, B, W, and O are linearly dependent

– Lesson: Although it is fine to work with G dummy variables when there are
G groups, we have to be careful not to include an intercept term. The same
comment applies to slope coefficients when we want to allow for the possibility
that the slope coefficients differ across groups
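The rank deficiency behind this "dummy-variable trap" is easy to exhibit numerically. A sketch assuming numpy, with an arbitrary deterministic group assignment:

```python
import numpy as np

# 30 hypothetical observations cycling through G = 3 groups
n = 30
g = np.arange(n) % 3
B = (g == 0).astype(float)
W = (g == 1).astype(float)
O = (g == 2).astype(float)
S = np.random.default_rng(3).uniform(8, 16, n)   # one continuous regressor

# Intercept plus all three dummies: B + W + O equals the constant column,
# so the 5-column design matrix has rank 4 (exact multicollinearity).
X_trap = np.column_stack([np.ones(n), B, W, O, S])
# Dropping either one dummy or the intercept restores full column rank.
X_ok = np.column_stack([np.ones(n), B, W, S])

print(np.linalg.matrix_rank(X_trap), np.linalg.matrix_rank(X_ok))   # 4 4
```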

Tests involving the equality of coefficients in different regressions

• One may want to know whether a given model applies to two different data sets (or two different time periods)

• To test whether the two regression models are identical, we can again rely on dummy variables

• Consider the regression models

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi,   i = 1, . . . , n   (2)
Yj = α1 + α2 X2,j + · · · + αk Xk,j + εj,   j = 1, . . . , m   (3)

Assume that

εi ∼ N(0, σ²),   i = 1, . . . , n
εj ∼ N(0, σ²),   j = 1, . . . , m

That is, the error terms in both regressions have the same variance. We also assume that the two error terms are independently distributed

• We want to test whether

β1 = α1, . . . , βk = αk

• For this purpose, relabel the second data set so that we now have

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi,   i = 1, . . . , n
Yi = α1 + α2 X2,i + · · · + αk Xk,i + εi,   i = n + 1, . . . , n + m

• Create a dummy variable

Di = 1 if n + 1 ≤ i ≤ n + m, and Di = 0 otherwise

• We may then write the overall model as

Yi = β1 + β2 X2,i + · · · + βk Xk,i + γ1 Di + γ2 Di X2,i + · · · + γk Di Xk,i + εi,   i = 1, . . . , n + m

and the hypothesis of identical regressions is

H0 : γ1 = · · · = γk = 0

• The F -test is conducted as follows:

– Restricted model (obtain RSSR )

Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi

– Unrestricted model (obtain RSSU )

Yi = β1 + β2 X2,i + · · · + βk Xk,i + γ1 Di + γ2 Di X2,i + · · · + γk Di Xk,i + εi

– Number of restrictions: k

– Number of observations: n + m

– Degrees of freedom: n + m − 2k

– The F -statistic has (k, n + m − 2k) degrees of freedom and is given by

F = [(RSSR − RSSU )/k] / [RSSU /(n + m − 2k)]

– Note that, alternatively, RSSU can be obtained as RSS1 + RSS2 , where RSS1 and RSS2 denote the residual sums of squares from two separate estimations of (2) and (3), respectively.
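The whole procedure, including the RSS1 + RSS2 shortcut, can be sketched on simulated data (numpy assumed; sample sizes and coefficients below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 80, 60, 2                  # k regressors, including the intercept

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

# Two samples from models (2) and (3); here only the intercepts differ.
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(m), rng.normal(size=m)])
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(0, 0.4, n)
y2 = X2 @ np.array([1.3, 0.5]) + rng.normal(0, 0.4, m)

# Restricted model: pool the samples, common coefficients
Xr = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
RSS_R = rss(Xr, y)

# Unrestricted model: D = 1 for the second sample, interacted with everything
D = np.concatenate([np.zeros(n), np.ones(m)])
Xu = np.column_stack([Xr, D[:, None] * Xr])
RSS_U = rss(Xu, y)

F = ((RSS_R - RSS_U) / k) / (RSS_U / (n + m - 2 * k))

# The pooled interacted fit gives the same RSS_U as two separate fits
print(np.isclose(RSS_U, rss(X1, y1) + rss(X2, y2)), F > 0)   # True True
```

Comparing F against the F(k, n + m − 2k) critical value then decides whether the two regressions are identical.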
