
Lecture 4

Multiple regression analysis with qualitative information: dummy variables

Đinh Thị Thanh Bình, PhD


Faculty of International Economics, FTU
Definition

• Quantitative variable: its values are measured with numbers.
• Qualitative variable: reflects some characteristic of the subject, e.g., the gender or race of an individual, or the industry of a firm.
• To incorporate qualitative factors into regression models, we have to convert them into numbers → dummy variables.
Example

• female = 1 if the person is female, and female = 0 if the person is male.
• married = 1 if the person is married, and married = 0 otherwise.
• construction = 1 if the person works in the construction industry, and construction = 0 otherwise.
1. A single dummy independent variable

$wage = \beta_0 + \delta_0\, female + \beta_1\, educ + u$  (1)

$\delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ)$

female = 1 corresponds to women and female = 0 corresponds to men, so

$\delta_0 = E(wage \mid female, educ) - E(wage \mid male, educ)$

The level of education is the same in both expectations, so the difference, $\delta_0$, is due to gender only.
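To make the interpretation of the dummy coefficient concrete, here is a minimal sketch that fits model (1) by ordinary least squares. The data are simulated, and the "true" parameter values (1.0, -2.0, 0.5) are assumptions chosen only for illustration; they do not come from the lecture or from WAGE1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated (hypothetical) sample: a 0/1 gender dummy and years of education
female = rng.integers(0, 2, size=n)
educ = rng.integers(8, 19, size=n)

# Assumed population parameters, for illustration only
beta0, delta0, beta1 = 1.0, -2.0, 0.5
wage = beta0 + delta0 * female + beta1 * educ + rng.normal(0.0, 1.0, size=n)

# Design matrix: intercept, dummy, education
X = np.column_stack([np.ones(n), female, educ])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
b0_hat, d0_hat, b1_hat = coef

# d0_hat estimates the wage gap between women and men at any given educ
print(f"intercept={b0_hat:.3f}, delta0={d0_hat:.3f}, slope={b1_hat:.3f}")
```

The estimate of the dummy coefficient should land close to the assumed gap of -2.0, while the education slope is shared by both groups.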
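Figure 6.1: Graph of $wage = \beta_0 + \delta_0\, female + \beta_1\, educ + u$ with $\delta_0 < 0$. (Vertical axis: Y; horizontal axis: X. The line for men, $wage = \beta_0 + \beta_1\, educ$, has intercept $\beta_0$; the line for women, $wage = (\beta_0 + \delta_0) + \beta_1\, educ$, has intercept $\beta_0 + \delta_0$; both lines have slope $\beta_1$.)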


- Men earn a fixed amount more per hour than women → the intercepts differ.
- More education means a higher wage for both men and women.
- The slopes are the same, so the wage gap does not depend on the amount of education.
Note: If a qualitative variable has n categories → include only n-1 dummy variables in the regression.
The category whose dummy variable is left out of the model → base group or benchmark group.
E.g.: Gender has 2 categories, male and female → use only 1 dummy variable, male or female.
- If female is the base group, the model is:

$wage = \alpha_0 + \gamma_0\, male + \beta_1\, educ + u$

- Using both dummy variables together with an intercept would introduce perfect collinearity because female + male = 1, which means that male is a perfect linear function of female.
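The collinearity problem can be checked numerically. The sketch below uses a tiny made-up sample (the six observations are hypothetical) and compares the rank of the design matrix with and without the redundant dummy.

```python
import numpy as np

# Hypothetical six-person sample
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female                      # female + male = 1 for everyone
educ = np.array([12, 16, 10, 14, 12, 18])
const = np.ones_like(female)

# Intercept plus BOTH dummies: the columns are linearly dependent
X_trap = np.column_stack([const, female, male, educ])
# Intercept plus ONE dummy: male is the base group
X_ok = np.column_stack([const, female, educ])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(f"both dummies: rank {rank_trap} with {X_trap.shape[1]} columns")
print(f"one dummy:    rank {rank_ok} with {X_ok.shape[1]} columns")
```

When the rank is smaller than the number of columns, the OLS coefficients are not uniquely determined, which is why one category must be left out as the base group.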
2. Using multiple dummy variables in the model

- We can include more than one dummy variable in the model:

$wage = \beta_0 + \delta_0\, female + \delta_1\, married + \beta_1\, educ + u$  (2)

However, an important limitation of this model is that the effect of married on wage is assumed to be the same for men and women.
- We can overcome this limitation by defining 4 groups: married men, married women, single men, single women.

- If the base group is single men, the model is:
$wage = \beta_0 + \delta_0\, marrmale + \delta_1\, marrfem + \delta_2\, singfem + \beta_1\, educ + u$  (3)

Note: the variables female and married must be excluded from this model.
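As a small illustration of how the four group dummies can be built from the two original dummies, here is a sketch on a made-up sample. The six observations are hypothetical, and the name singmale is introduced only to show the base group explicitly.

```python
import numpy as np

# Hypothetical sample of six people
female = np.array([0, 0, 1, 1, 1, 0])
married = np.array([1, 0, 1, 0, 1, 0])

# One dummy per group; each person belongs to exactly one group
marrmale = (1 - female) * married
marrfem = female * married
singfem = female * (1 - married)
singmale = (1 - female) * (1 - married)  # base group, left out of the regression

# The four dummies sum to 1 for every observation, which is why one
# of them must be dropped when the model contains an intercept
group_sum = marrmale + marrfem + singfem + singmale
print(marrmale, marrfem, singfem, singmale, group_sum)
```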

Practice with file WAGE1


- For example, suppose we obtain the results:

$\widehat{\log(wage)} = 0.321 + 0.213\, marrmale - 0.198\, marrfem - 0.110\, singfem + \dots$

- The coefficients give the approximate proportional difference in wage relative to the base group, single men.
- Married men are estimated to earn about 21.3% more than single men, holding other factors fixed.
- Single women are estimated to earn about 8.8% more than married women (-0.110 - (-0.198) = 0.088).
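The comparisons between any two groups follow from differences of coefficients. The sketch below redoes that arithmetic with the three reported estimates. The conversion exp(d) - 1 to an exact proportional difference is a standard property of log models and is an addition here, not something stated on the slide; the helper name log_diff is hypothetical.

```python
import math

# Reported coefficients; single men are the base group (coefficient 0)
coef = {"singmale": 0.0, "marrmale": 0.213, "marrfem": -0.198, "singfem": -0.110}

def log_diff(group_a: str, group_b: str) -> float:
    """Estimated difference in log(wage) between group_a and group_b."""
    return coef[group_a] - coef[group_b]

# Single women versus married women
d = log_diff("singfem", "marrfem")
approx_pct = 100 * d                 # log-point approximation
exact_pct = 100 * (math.exp(d) - 1)  # exact proportional difference

print(f"difference in logs: {d:.3f}")
print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

A formal test of whether this difference is statistically significant needs the standard error of the difference, which is easiest to obtain by re-estimating the model with one of the two groups as the base group.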
3. Incorporating ordinal information by using dummy variables
- Ownership of firms
- Qualification of students
- Physical appearance (looks)
4. Interactions between dummy variables
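- Instead of using model (3),
$wage = \beta_0 + \delta_0\, marrmale + \delta_1\, marrfem + \delta_2\, singfem + \beta_1\, educ + u$  (3)

we can generate an interaction variable from the 2 dummy variables:
$wage = \beta_0 + \delta_0\, female + \delta_1\, married + \delta_2\, female \cdot married + \beta_1\, educ + u$  (4)

- The two models give the same estimated results: they are different ways of writing the same four group intercepts, so the fitted values and the estimated differences between groups are identical.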
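The equivalence can be made explicit by writing out the intercept of each group under both parameterizations. In the sketch below, the coefficients of model (3) are relabeled $\theta_0, \theta_1, \theta_2$ (for marrmale, marrfem and singfem); this relabeling is an assumption made here only to avoid a clash with the $\delta$'s of model (4).

```latex
\begin{aligned}
\text{single men:}    \quad & \beta_0 \\
\text{married men:}   \quad & \beta_0 + \theta_0 = \beta_0 + \delta_1 \\
\text{single women:}  \quad & \beta_0 + \theta_2 = \beta_0 + \delta_0 \\
\text{married women:} \quad & \beta_0 + \theta_1 = \beta_0 + \delta_0 + \delta_1 + \delta_2
\end{aligned}
```

So $\theta_0 = \delta_1$, $\theta_2 = \delta_0$ and $\theta_1 = \delta_0 + \delta_1 + \delta_2$. One practical advantage of model (4) is that a test of $\delta_2 = 0$ directly checks whether the effect of marriage differs by gender.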


5. Interaction between dummy and quantitative variables
- This interaction allows us, for example, to check whether the effect of education on wage is the same for men and women.
$wage = \beta_0 + \delta_0\, female + \beta_1\, educ + \delta_1\, female \cdot educ + u$

$wage = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u$  (5)

- If female = 0, the intercept for men is $\beta_0$ and the slope is $\beta_1$.
- If female = 1, the intercept for women is $\beta_0 + \delta_0$ and the slope is $\beta_1 + \delta_1$.
- $\delta_0$ is the difference in intercepts between women and men.
- $\delta_1$ is the difference in the effect of education on wage between women and men.
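Case 1: $wage = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u$, with $\delta_0 < 0$, $\delta_1 < 0$
[Figure: wage (vertical axis) against educ (horizontal axis). The line for men starts at intercept $\beta_0$; the line for women starts at the lower intercept $\beta_0 + \delta_0$ and is flatter.]
- More education means a higher wage for both men and women.
- Women have lower wages than men at all levels of education.
- The marginal effect of education on wage is higher for men than for women → the higher the level of education, the larger the wage gap between men and women.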
$wage = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u$, with $\delta_0 < 0$, $\delta_1 > 0$
- The intercept for women is below that for men, but the slope on education is larger for women.
- This means that women earn less than men at low levels of education, but the gap narrows as education increases.
- Beyond some level of education, women are predicted to earn more than men.
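The level of education at which the two lines cross can be worked out from model (5) by equating the expected wage of men and women. The symbol $educ^{*}$ for the crossing point is introduced here for illustration.

```latex
\begin{aligned}
\beta_0 + \beta_1\, educ &= (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\, educ \\
0 &= \delta_0 + \delta_1\, educ \\
educ^{*} &= -\frac{\delta_0}{\delta_1}
\end{aligned}
```

With $\delta_0 < 0$ and $\delta_1 > 0$ this value is positive. Whether it is economically relevant depends on whether $educ^{*}$ falls inside the range of education levels observed in the sample.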
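Case 2: $wage = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u$, with $\delta_0 < 0$, $\delta_1 > 0$
[Figure: wage (vertical axis) against educ (horizontal axis). The line for men starts at intercept $\beta_0$; the line for women starts at the lower intercept $\beta_0 + \delta_0$ but is steeper, so the two lines cross.]
- More education means a higher wage for both men and women.
- At low levels of education, men have higher wages than women.
- The marginal effect of education on wage is higher for women than for men → beyond a particular level of education, women have higher wages than men.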
Hypothesis test:
Hypothesis 1: The return to education is the same for men and women.
$H_0: \delta_1 = 0$

- There is no constraint on $\delta_0$. This means that wages may still differ between men and women, but the return to education is the same (Figure 6.1).

- Use a t-test.
Hypothesis 2: The average wage is the same for men and women at every level of education.

$H_0: \delta_0 = 0,\ \delta_1 = 0$

- Use an F-test.
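The joint hypothesis can be tested by comparing the sum of squared residuals (SSR) of the unrestricted model (5) with that of the restricted model without female and female·educ. The sketch below is a minimal version on simulated data; the sample, the parameter values and the helper name ssr are assumptions for illustration only.

```python
import numpy as np

def ssr(X: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 500
female = rng.integers(0, 2, size=n)
educ = rng.integers(8, 19, size=n)
# Assumed parameters: delta0 = -1.5, delta1 = 0.1 (illustrative only)
wage = 1.0 - 1.5 * female + 0.5 * educ + 0.1 * female * educ + rng.normal(size=n)

ones = np.ones(n)
X_ur = np.column_stack([ones, female, educ, female * educ])  # unrestricted, model (5)
X_r = np.column_stack([ones, educ])                          # restricted under H0

ssr_ur, ssr_r = ssr(X_ur, wage), ssr(X_r, wage)
q = 2                         # number of restrictions: delta0 = 0 and delta1 = 0
df = n - X_ur.shape[1]        # residual degrees of freedom, unrestricted model
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)

print(f"SSR_r={ssr_r:.2f}, SSR_ur={ssr_ur:.2f}, F={F:.2f} with ({q}, {df}) df")
```

The statistic is compared with the critical value of the F distribution with (q, df) degrees of freedom; a large F leads to rejecting the hypothesis that the wage equations of men and women are identical.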
6.5 An example application of dummy variables
Savings and personal income in the United Kingdom, 1946-63 (million pounds)

Period I   Savings   Income      Period II   Savings   Income
1946       0.36       8.8        1955        0.59      15.5
1947       0.21       9.4        1956        0.90      16.7
1948       0.08      10.0        1957        0.95      17.7
1949       0.20      10.6        1958        0.82      18.6
1950       0.10      11.0        1959        1.04      19.7
1951       0.12      11.9        1960        1.53      21.1
1952       0.41      12.7        1961        1.94      22.8
1953       0.50      13.5        1962        1.75      23.9
1954       0.43      14.3        1963        1.99      25.2
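Goal: Test whether the savings function underwent a structural change between the 2 periods.
Approach 1: Estimate a separate savings model for each of the 2 periods

- Reconstruction period, 1946-54: $Y_i = \lambda_1 + \lambda_2 X_i + u_{1i}$

- Post-reconstruction period, 1955-63: $Y_i = \gamma_1 + \gamma_2 X_i + u_{2i}$

- Then test which of the following cases holds:

(a) $\lambda_1 = \gamma_1,\ \lambda_2 = \gamma_2$
(b) $\lambda_1 \neq \gamma_1,\ \lambda_2 = \gamma_2$
(c) $\lambda_1 = \gamma_1,\ \lambda_2 \neq \gamma_2$
(d) $\lambda_1 \neq \gamma_1,\ \lambda_2 \neq \gamma_2$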
Approach 2: Use a dummy variable

Step 1. Specify a general savings function covering both periods:

$Y_i = \beta_1 + \beta_2 X_i + \beta_3 Z_i + \beta_4 X_i Z_i + u_i$

with n = n1 + n2 observations, where
Z = 1 if the observation belongs to the reconstruction period
Z = 0 if the observation belongs to the post-reconstruction period

Step 2. Test the hypothesis $H_0: \beta_3 = 0$

If $H_0$ is not rejected: drop $Z_i$ from the model

Step 3. Test the hypothesis $H_0: \beta_4 = 0$

If $H_0$ is not rejected: drop $Z_i X_i$ from the model
The regression results for this model are:

$\hat{Y}_i = -1.75 + 0.15045 X_i + 1.4839 Z_i - 0.1034 X_i Z_i$

t = (-5.27) (9.238) (3.155) (-3.109)
p = (0.000) (0.000) (0.007) (0.008)

$\hat{Y}_i = (-1.75 + 1.4839 Z_i) + (0.15045 - 0.1034 Z_i) X_i$
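The pooled regression can be reproduced directly from the savings and income table. The sketch below uses plain NumPy least squares; the variable names (savings, income, z) are chosen here for illustration, and the data are typed in from the table.

```python
import numpy as np

# Savings (Y) and income (X), 1946-1963, from the table in this section
savings = np.array([0.36, 0.21, 0.08, 0.20, 0.10, 0.12, 0.41, 0.50, 0.43,
                    0.59, 0.90, 0.95, 0.82, 1.04, 1.53, 1.94, 1.75, 1.99])
income = np.array([8.8, 9.4, 10.0, 10.6, 11.0, 11.9, 12.7, 13.5, 14.3,
                   15.5, 16.7, 17.7, 18.6, 19.7, 21.1, 22.8, 23.9, 25.2])
# Z = 1 for the reconstruction period (1946-54), 0 afterwards (1955-63)
z = np.array([1] * 9 + [0] * 9)

# Columns: intercept, X, Z, X*Z
X = np.column_stack([np.ones(len(income)), income, z, income * z])
b, *_ = np.linalg.lstsq(X, savings, rcond=None)
b1, b2, b3, b4 = b

# Implied period-specific lines
recon_intercept, recon_slope = b1 + b3, b2 + b4   # Z = 1
post_intercept, post_slope = b1, b2               # Z = 0

print(f"pooled: {b1:.4f} + {b2:.5f} X + {b3:.4f} Z + {b4:.4f} XZ")
print(f"reconstruction:      {recon_intercept:.4f} + {recon_slope:.4f} X")
print(f"post-reconstruction: {post_intercept:.4f} + {post_slope:.4f} X")
```

Because the model contains both the dummy and its interaction with income, the two implied lines coincide with what separate regressions on each period would give; the advantage of the pooled form is that the differences between the periods can be tested with ordinary t-tests on the coefficients of Z and XZ.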

Remarks
• The differential intercept and the differential slope are both statistically significant.
• The regressions for the two periods are therefore different.
Reconstruction period: Z = 1

$\hat{Y}_i = -1.75 + 0.15045 X_i + 1.4839 - 0.1034 X_i$

$\hat{Y}_i = -0.2661 + 0.0470 X_i$
Post-reconstruction period: Z = 0

$\hat{Y}_i = -1.75 + 0.15045 X_i$


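Figure 6.4: Regression lines for the 2 periods. (Vertical axis: savings; horizontal axis: income. Post-reconstruction period: $\hat{Y}_i = -1.75 + 0.15045 X_i$, with intercept -1.75. Reconstruction period: $\hat{Y}_i = -0.2661 + 0.0470 X_i$, with intercept of about -0.27.)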
