Lecture Note 5
Using the Multiple Regression Model
Dummy Variables
Consider the following linear regression of the log wage (Y) on years of schooling (S) and years
of job experience (E):
Y = β0 + β1 S + β2 E + ε
Question: Does the same regression line apply to both men and women? We may
answer this question using a dummy variable
$$D_i = \begin{cases} 1 & \text{if the } i\text{-th individual is male} \\ 0 & \text{otherwise} \end{cases}$$
Example A.
Specification 1
Suppose
Y = β0 + γ1 D + β1 S + β2 E + ε
Here γ1 measures the difference in intercepts between the male and female wage equations:
the intercept is β0 + γ1 for men and β0 for women.
We implicitly assume that the marginal effects of schooling and experience are the
same for men and women.
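To make Specification 1 concrete, here is a minimal sketch, assuming NumPy is available, that simulates wage data with a hypothetical intercept gap of 0.2 between men and women and recovers it by OLS on a constant, D, S, and E. The parameter values, sample size, and variable names are illustrative assumptions, not estimates from real data.

```python
import numpy as np

# Simulated data; all parameter values are hypothetical
rng = np.random.default_rng(0)
n = 500
D = rng.integers(0, 2, size=n).astype(float)  # 1 if male, 0 otherwise
S = rng.uniform(8, 20, size=n)                # years of schooling
E = rng.uniform(0, 30, size=n)                # years of experience
beta0, gamma1, beta1, beta2 = 1.0, 0.2, 0.08, 0.02
Y = beta0 + gamma1 * D + beta1 * S + beta2 * E + rng.normal(0, 0.05, size=n)

# Specification 1: constant, D, S, E (common slopes for men and women)
X = np.column_stack([np.ones(n), D, S, E])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0_hat, g1_hat, b1_hat, b2_hat = coef

print(f"estimated intercept gap (men minus women): {g1_hat:.3f}")
```

The estimate of γ1 should land close to the simulated gap of 0.2; the slopes on S and E are shared by both groups by construction.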
Specification 2
Now let
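$$D_{1i} = \begin{cases} 1 & \text{if the } i\text{-th individual is male} \\ 0 & \text{otherwise} \end{cases}$$
$$D_{2i} = \begin{cases} 1 & \text{if the } i\text{-th individual is female} \\ 0 & \text{otherwise} \end{cases}$$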
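and suppose
Y = γ1 D1 + γ2 D2 + β1 S + β2 E + ε
with no separate intercept β0. Then
Y = γ1 + β1 S + β2 E for men
Y = γ2 + β1 S + β2 E for women
so γ1 and γ2 are the male and female intercepts, and γ1 − γ2 plays the role of γ1 in Specification 1.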
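A minimal NumPy sketch of Specification 2, on hypothetical simulated data: both dummies enter the design matrix and the column of ones is left out, so each dummy coefficient is a group intercept. The true values γ1 = 1.2 and γ2 = 1.0 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
D1 = rng.integers(0, 2, size=n).astype(float)  # 1 if male
D2 = 1.0 - D1                                   # 1 if female
S = rng.uniform(8, 20, size=n)
E = rng.uniform(0, 30, size=n)
gamma1, gamma2, beta1, beta2 = 1.2, 1.0, 0.08, 0.02  # hypothetical values
Y = gamma1 * D1 + gamma2 * D2 + beta1 * S + beta2 * E + rng.normal(0, 0.05, size=n)

# Specification 2: both dummies and NO column of ones
X = np.column_stack([D1, D2, S, E])
g1_hat, g2_hat, b1_hat, b2_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

print(f"male intercept   gamma1: {g1_hat:.3f}")
print(f"female intercept gamma2: {g2_hat:.3f}")
print(f"difference gamma1 - gamma2: {g1_hat - g2_hat:.3f}")
```

Dropping the column of ones is what keeps the design matrix of full rank here, since D1 + D2 = 1 for every observation.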
Specification 3
Suppose
Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + ε
Here γ1 measures the difference in intercepts between the male and female wage
equations, while γ2 measures the difference in the marginal effects of schooling:
one more year of schooling raises Y by β1 + γ2 for men and by β1 for women.
We still implicitly assume that the marginal effect of experience is the same for
men and women.
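The interaction term in Specification 3 can be sketched in NumPy as follows, again on hypothetical simulated data in which men receive an assumed extra 0.02 return per year of schooling. The return for women is read off β1 and the return for men off β1 + γ2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
D = rng.integers(0, 2, size=n).astype(float)  # 1 if male
S = rng.uniform(8, 20, size=n)
E = rng.uniform(0, 30, size=n)
# Hypothetical values: men get an extra 0.02 return per year of schooling
beta0, gamma1, beta1, gamma2, beta2 = 1.0, 0.1, 0.08, 0.02, 0.02
Y = (beta0 + gamma1 * D + beta1 * S + gamma2 * D * S + beta2 * E
     + rng.normal(0, 0.05, size=n))

# Specification 3: the interaction D*S lets the schooling slope differ by sex
X = np.column_stack([np.ones(n), D, S, D * S, E])
b0_hat, g1_hat, b1_hat, g2_hat, b2_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

print(f"return to schooling, women: {b1_hat:.4f}")
print(f"return to schooling, men:   {b1_hat + g2_hat:.4f}")
```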
Specification 4
Suppose
Y = β0 + γ1 D + β1 S + γ2 D · S + β2 E + γ3 D · E + ε
Here γ1 measures the difference in intercepts between the male and female wage
equations, γ2 measures the difference in the marginal effects of schooling, and γ3
measures the difference in the marginal effects of experience.
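Alternatively, we may estimate the wage equation separately for men and women;
Specification 4 yields the same point estimates. However, using dummies is still preferred
if we are interested in measuring the differences and testing their significance, since the
differences are estimated directly as γ1, γ2, and γ3 in a single regression.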
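The claim that the fully interacted Specification 4 reproduces the separate regressions can be checked numerically. This NumPy sketch uses hypothetical simulated data and an illustrative helper `ols`; the pooled base coefficients should match the female regression, and γ1, γ2, γ3 should match the male-minus-female differences.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(3)
n = 400
D = rng.integers(0, 2, size=n).astype(float)  # 1 if male
S = rng.uniform(8, 20, size=n)
E = rng.uniform(0, 30, size=n)
# Hypothetical process in which the intercept and both slopes differ by sex
Y = ((1.0 + 0.2 * D) + (0.08 + 0.02 * D) * S + (0.02 - 0.005 * D) * E
     + rng.normal(0, 0.05, size=n))

# Specification 4: one pooled regression with all interactions
X_full = np.column_stack([np.ones(n), D, S, D * S, E, D * E])
b0, g1, b1, g2, b2, g3 = ols(X_full, Y)

# Separate regressions of Y on (1, S, E) for each sex
Z = np.column_stack([np.ones(n), S, E])
female = ols(Z[D == 0], Y[D == 0])
male = ols(Z[D == 1], Y[D == 1])

print("female (b0, b1, b2):", np.round(female, 4))
print("pooled (b0, b1, b2):", np.round([b0, b1, b2], 4))
print("male - female:      ", np.round(male - female, 4))
print("pooled (g1, g2, g3):", np.round([g1, g2, g3], 4))
```

The point estimates agree exactly; the standard errors generally do not, because the pooled regression imposes a common error variance on both groups.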
Example B.
Suppose now that we want to let the wage equation differ across three race groups:
Whites, Blacks, and Others (everyone who is neither black nor white).
For simplicity, we only consider potential intercept differences (the same approach
extends to slope parameters).
This can be done by creating two dummy variables:
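$$B_i = \begin{cases} 1 & \text{if the } i\text{-th individual is black} \\ 0 & \text{otherwise} \end{cases} \qquad W_i = \begin{cases} 1 & \text{if the } i\text{-th individual is white} \\ 0 & \text{otherwise} \end{cases}$$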
Note that Bi Wi = 0 for every i, since no individual is both black and white. The wage equation is
Y = β0 + γ1 B + θ1 W + β1 S + β2 E + ε
Note that θ1 measures the difference in intercept between Whites and Others and γ1
measures the difference in intercept between Blacks and Others. What will measure
the difference in intercept between Whites and Blacks?
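One way to answer the question is θ1 − γ1. The NumPy sketch below, on hypothetical simulated data with an assumed White-Black gap of 0.10, computes θ1 − γ1 with Others as the reference group and confirms that re-estimating with Blacks as the reference group gives the same number directly as the coefficient on W. The group coding 0/1/2 is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 600
group = rng.integers(0, 3, size=n)  # hypothetical coding: 0 = Others, 1 = Blacks, 2 = Whites
B = (group == 1).astype(float)
W = (group == 2).astype(float)
O = (group == 0).astype(float)
S = rng.uniform(8, 20, size=n)
E = rng.uniform(0, 30, size=n)
# Hypothetical true values: gamma1 = 0.05 (Blacks vs Others), theta1 = 0.15 (Whites vs Others)
Y = 1.0 + 0.05 * B + 0.15 * W + 0.08 * S + 0.02 * E + rng.normal(0, 0.05, size=n)
ones = np.ones(n)

# Others as the reference group, as in the text
coef_o = np.linalg.lstsq(np.column_stack([ones, B, W, S, E]), Y, rcond=None)[0]
gamma1_hat, theta1_hat = coef_o[1], coef_o[2]
white_minus_black = theta1_hat - gamma1_hat

# Same model with Blacks as the reference group: the W coefficient is the gap directly
coef_b = np.linalg.lstsq(np.column_stack([ones, O, W, S, E]), Y, rcond=None)[0]

print(f"theta1 - gamma1:               {white_minus_black:.4f}")
print(f"W coefficient, Blacks as base: {coef_b[2]:.4f}")
```

Changing the reference group only reparameterizes the model, so the fitted values are identical and the two numbers coincide.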
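In general, when there are G different groups, we can handle intercept differences by
creating (G − 1) dummy variables, leaving one group (here, Others) as the reference
group. To see why we should not create G dummies, suppose we also include a dummy for Others:
Y = β0 + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε    (1)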
where
$$O_i = \begin{cases} 1 & \text{if the } i\text{-th individual is neither black nor white} \\ 0 & \text{otherwise} \end{cases}$$
Y = (β0 + θ1 ) + β1 S + β2 E (Whites)
Y = (β0 + γ1 ) + β1 S + β2 E (Blacks)
Y = (β0 + δ1 ) + β1 S + β2 E (Others)
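– Note that it is not clear what β0 means, since only the group intercepts β0 + θ1,
β0 + γ1, and β0 + δ1 are determined. More seriously, Equation (1) cannot be estimated
because of exact multicollinearity. To see this, let Xi = 1 for every i (the regressor
attached to the constant term). Then (1) can be written as
Y = β0 X + γ1 B + θ1 W + δ1 O + β1 S + β2 E + ε
and, because every individual belongs to exactly one group,
Bi + Wi + Oi = 1 = Xi for every i,
so the constant regressor X is an exact linear combination of the three dummies.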
– Lesson: It is fine to work with G dummy variables when there are G groups, but
then we must not include an intercept term (as in Specification 2); otherwise, keep the
intercept and use G − 1 dummies. The same comment applies to slope coefficients when
we want to allow them to differ across groups: if S is interacted with all G dummies,
the common S term must be dropped.
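The dummy variable trap can be seen directly in the rank of the design matrix. In this NumPy sketch (hypothetical data, with the group coding assumed for illustration), stacking the constant with all three dummies gives 6 columns but rank 5, so X'X is singular and OLS has no unique solution; dropping the constant restores full column rank.

```python
import numpy as np

n = 12
group = np.arange(n) % 3  # hypothetical coding: 0 = Others, 1 = Blacks, 2 = Whites
B = (group == 1).astype(float)
W = (group == 2).astype(float)
O = (group == 0).astype(float)
X = np.ones(n)            # the regressor attached to beta0
rng = np.random.default_rng(5)
S = rng.uniform(8, 20, size=n)
E = rng.uniform(0, 30, size=n)

# Equation (1): intercept plus all G = 3 dummies, 6 columns in total
with_const = np.column_stack([X, B, W, O, S, E])
rank_with_const = np.linalg.matrix_rank(with_const)

# G dummies and no intercept (as in Specification 2)
no_const = np.column_stack([B, W, O, S, E])
rank_no_const = np.linalg.matrix_rank(no_const)

print(np.allclose(B + W + O, X))                  # exact collinearity: B + W + O = X
print(rank_with_const, "vs", with_const.shape[1], "columns")
print(rank_no_const, "vs", no_const.shape[1], "columns")
```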
One may want to know whether a given model applies to two different data sets (or
two different time periods).
To test whether the two regression models are identical, we can rely on dummy
variables again.
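Suppose that the first data set (n observations) and the second data set (m observations) satisfy
$$Y_i = \beta_1 + \beta_2 X_{2,i} + \cdots + \beta_k X_{k,i} + \varepsilon_i, \quad i = 1, \ldots, n$$
$$Y_j = \alpha_1 + \alpha_2 X_{2,j} + \cdots + \alpha_k X_{k,j} + \varepsilon_j, \quad j = 1, \ldots, m$$
and assume that
$$\varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n$$
$$\varepsilon_j \sim N(0, \sigma^2), \quad j = 1, \ldots, m$$
That is, the error terms in both regressions have the same variance. We also assume
that the two error terms are independently distributed.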
We want to test whether the two regressions are identical, that is,
H0 : β1 = α1 , . . . , βk = αk
For this purpose, relabel the observations of the second data set as i = n + 1, . . . , n + m,
so that we now have
Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi , i = 1, . . . , n    (2)
Yi = α1 + α2 X2,i + · · · + αk Xk,i + εi , i = n + 1, . . . , n + m    (3)
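Let Di = 1 for i = n + 1, . . . , n + m and Di = 0 for i = 1, . . . , n, and estimate the unrestricted model
Yi = β1 + β2 X2,i + · · · + βk Xk,i + γ1 Di + γ2 Di X2,i + · · · + γk Di Xk,i + εi , i = 1, . . . , n + m
where γj = αj − βj. The null hypothesis β1 = α1 , . . . , βk = αk then becomes
H0 : γ1 = . . . = γk = 0
under which the model reduces to the restricted (pooled) regression
Yi = β1 + β2 X2,i + · · · + βk Xk,i + εi , i = 1, . . . , n + m
The test statistic is
F = [(RSSR − RSSU)/k] / [RSSU/(n + m − 2k)]
where RSSR and RSSU are the residual sums of squares from the restricted and unrestricted regressions.
– Number of restrictions: k
– Number of observations: n + m
– Degrees of freedom of the unrestricted model: n + m − 2k
– Under H0, the F-statistic has (k, n + m − 2k) degrees of freedom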
– Note that, alternatively, RSSU can be obtained as RSS1 + RSS2, where RSS1
and RSS2 denote the RSSs from the separate estimations of (2) and (3), respectively.
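This dummy-variable procedure is known as the Chow test. A minimal NumPy sketch follows, using two hypothetical simulated data sets that share the same coefficients (so H0 holds). It builds the unrestricted model by interacting every regressor, including the constant, with D, computes the F-statistic, and checks that RSSU equals RSS1 + RSS2. The sample sizes, coefficients, and the helper `rss` are illustrative assumptions.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from the OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


rng = np.random.default_rng(6)
n, m, k = 120, 80, 3  # k counts the intercept, so there are k - 1 = 2 slope regressors
beta = np.array([1.0, 0.5, -0.3])  # hypothetical common coefficients (H0 holds)
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
X2 = np.column_stack([np.ones(m), rng.normal(size=(m, k - 1))])
y1 = X1 @ beta + rng.normal(0, 1.0, size=n)
y2 = X2 @ beta + rng.normal(0, 1.0, size=m)

# Stack the data sets; D = 1 for the relabeled observations n+1, ..., n+m
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
D = np.concatenate([np.zeros(n), np.ones(m)])

# Unrestricted model: every regressor, including the constant, also interacted with D
X_u = np.column_stack([X, D[:, None] * X])
rss_u = rss(X_u, y)
rss_r = rss(X, y)  # restricted (pooled) model: gamma_1 = ... = gamma_k = 0

df = n + m - 2 * k
F = ((rss_r - rss_u) / k) / (rss_u / df)
print(f"F({k}, {df}) = {F:.3f}")

# RSS_U equals RSS_1 + RSS_2 from the two separate regressions
print(np.isclose(rss_u, rss(X1, y1) + rss(X2, y2)))
```

Compare F with a critical value of the F(k, n + m − 2k) distribution; if SciPy is available, `scipy.stats.f.sf(F, k, df)` gives the corresponding p-value.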