
ECONOMETRIC MODEL WITH QUALITATIVE VARIABLES

How can qualitative variables be converted into quantitative variables? Why do we need to do this? An econometric model needs quantitative variables in order to estimate its parameters.

What are the differences among these kinds of variables: dummy, indicator, binary, dichotomous, categorical?

ECONOMETRIC MODEL WITH DUMMY VARIABLES


Specifically:
What if the variables are not quantitative, for example:
I. Male-female; urban-rural; yes-no; foreign-domestic
II. Level of education: SD, SLTP, SLTA, D3, S1, S2, S3
    Choice of investment: stock, Bank Indonesia certificate, gold, etc.

Other uses:
How do we model an unstable regression?
- Jumping regression
- Shifting regression

Technically speaking, do we have problems with our model if:
- one or more independent variables are dummies
- the dependent variable is a dummy

Illustration:
We would like to analyze whether graduate and undergraduate students differ in weekly entertainment spending.
Y: weekly entertainment spending per student
PS: student status (graduate or undergraduate)
PS = 1 ; graduate student
PS = 0 ; undergraduate student
Model: $Y = \alpha + \beta\,PS + u$
From the model, average spending is:
Graduate student: $E(Y \mid PS = 1) = \alpha + \beta$
Undergraduate student: $E(Y \mid PS = 0) = \alpha$

For example, using data from a survey, the estimated model is:
$\hat{Y} = 9.4 + 16\,PS$
t = (53.22) (6.245)
$R^2$ = 96.54%
The model indicates that $\alpha \neq 0$ and $\beta \neq 0$ (both statistically significant).
Interpretation:
average spending of graduate students: 9.4 + 16 = 25.4
average spending of undergraduate students: 9.4
(There is a difference in spending between the two groups.)
The next question is whether graduate students spend more on entertainment because they can afford more, or because they are more consumptive, than undergraduate students.
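The interpretation above rests on a general property: when the only regressor is a 0/1 dummy, OLS returns the mean of the zero group as the intercept and the difference in group means as the dummy coefficient. A minimal sketch of this, using made-up spending figures (the data are an assumption for illustration, not the survey used in the lecture):

```python
import numpy as np

# Hypothetical weekly entertainment spending (NOT the survey data from the lecture).
spending = np.array([8.0, 10.0, 9.0, 11.0, 24.0, 26.0, 25.0, 27.0])
ps = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = graduate, 0 = undergraduate

# Design matrix: a column of ones (intercept) and the dummy PS.
X = np.column_stack([np.ones_like(spending), ps])
coef, *_ = np.linalg.lstsq(X, spending, rcond=None)
intercept, dummy_coef = coef

mean_undergrad = spending[ps == 0].mean()
mean_grad = spending[ps == 1].mean()

# The intercept matches the undergraduate mean; intercept + dummy
# coefficient matches the graduate mean.
print(intercept, mean_undergrad)
print(intercept + dummy_coef, mean_grad)
```

The same logic explains why 9.4 and 9.4 + 16 are read directly as the two group averages.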

Professors' salary = f(experience, sex)


Is there discrimination in salary policy against female professors?
Y = yearly salary of a professor
X = years of teaching
G = 1 ; male professor
    0 ; female professor

A model that relates X and G to Y:
$Y = \alpha_1 + \alpha_2 G + \beta X + u$
From the model:
Average salary of a female professor = $\alpha_1 + \beta X$
Average salary of a male professor = $\alpha_1 + \alpha_2 + \beta X$

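Geometrically:
[Figure: annual salary (Y) against teaching experience (X); two parallel lines, one for male professors and one for female professors, with $\alpha_1$ and $\alpha_2$ marked on the vertical axis.]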

Suppose that, based on data, we obtain:
$\hat{Y} = 19.21 + 0.373\,G + 1.453\,X$
t = (11.33) (1.141) (37.997)
$R^2$ = 89.75%
Is there discrimination?
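One way to approach the question is to compare each reported t-statistic with a critical value. The sketch below uses 1.96, the large-sample two-sided 5% value; this is an assumption, because the sample size (and hence the exact degrees of freedom) is not given in the slides.

```python
# Reported t-statistics from the estimated salary equation.
reported_t = {"intercept": 11.33, "G": 1.141, "X": 37.997}

# Assumption: large-sample two-sided 5% critical value. The exact value
# depends on the degrees of freedom, which the slides do not report.
CRITICAL_5PCT = 1.96

significant = {name: abs(t) > CRITICAL_5PCT for name, t in reported_t.items()}
for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```

Under this assumption the coefficient on G is not statistically different from zero, so the estimate by itself gives no evidence of a salary gap by sex once experience is held fixed. A smaller sample would only raise the critical value, which does not change that reading.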

What if we define the dummy differently?


S = 1 ; female professor
    0 ; male professor

Since the dummy variable is defined differently, will the result differ in substance? Model with the new definition:

$Y = \alpha_1 + \alpha_2 S + \beta X + u$

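Drawn as a graph, this becomes:
[Figure: annual salary (Y) against teaching experience (X); two parallel lines, one for female professors and one for male professors, with $\alpha_1$ and $\alpha_2$ marked on the vertical axis.]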

Note that under the new definition:
Average salary of a female professor = $\alpha_1 + \alpha_2 + \beta X$
Average salary of a male professor = $\alpha_1 + \beta X$

Remark
In defining a dummy variable, it does not matter which category is represented by one and which by zero, as long as the estimated model is interpreted consistently.
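The remark can be checked numerically. The sketch below fits the same made-up salary data twice, once with a male dummy and once with a female dummy; the data and the helper `ols` are illustrative assumptions. The experience slope is unchanged, the dummy coefficient only changes sign, and the intercept absorbs the difference, so both fits imply the same group averages.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data (not from the lecture).
experience = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
male = np.array([1, 1, 1, 1, 0, 0, 0, 0])
female = 1 - male
noise = np.array([0.1, -0.2, 0.05, 0.05, -0.1, 0.2, -0.05, -0.05])
salary = 20.0 + 0.5 * male + 1.5 * experience + noise

a1_g, a2_g, b_g = ols(salary, male, experience)    # dummy = 1 for male
a1_s, a2_s, b_s = ols(salary, female, experience)  # dummy = 1 for female

print("slope:", b_g, b_s)                        # identical
print("dummy coefficient:", a2_g, a2_s)          # same size, opposite sign
print("female intercept:", a1_g, a1_s + a2_s)    # identical
print("male intercept:", a1_g + a2_g, a1_s)      # identical
```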

What happens if we define the dummy variables as follows:


D2 = 1 ; male professor
     0 ; female professor
D3 = 1 ; female professor
     0 ; male professor
The model with this definition:

$Y = \alpha_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta X + u$
When we estimate this model with OLS, what will happen?

Table. Professor names by sex (F = female, M = male)


Name      Sex   D2   D3
Ana Hen   F     0    1
Annisa    F     0    1
Budi      M     1    0
Bambang   M     1    0
Badrun    M     1    0
Betty     F     0    1

Relationship between the regressors: $D_2 = 1 - D_3$, or equivalently $D_3 = 1 - D_2$.
Consequence: perfect collinearity.
Rule: if a variable has m categories, we need only m - 1 dummy variables.
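The collinearity can be seen directly in the design matrix. The sketch below uses the D2 and D3 columns from the table together with a made-up experience column (the experience values are an assumption): with an intercept, D2 and D3, the matrix has four columns but only rank three, so OLS has no unique solution.

```python
import numpy as np

d2 = np.array([0, 0, 1, 1, 1, 0])  # 1 = male (from the table)
d3 = 1 - d2                        # 1 = female (from the table)
x = np.array([2.0, 5.0, 3.0, 8.0, 6.0, 4.0])  # hypothetical years of teaching
ones = np.ones(6)

X_trap = np.column_stack([ones, d2, d3, x])  # intercept + both dummies
X_ok = np.column_stack([ones, d2, x])        # intercept + one dummy

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(X_trap.shape[1], rank_trap)  # more columns than rank: not identified
print(X_ok.shape[1], rank_ok)      # full column rank: identified
```

Dropping either dummy (or, alternatively, the intercept) restores full column rank, which is the m - 1 rule in practice.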

Qualitative Variables with more than two categories


Levels of education: SD, SLTP, SLTA, D3, S1, S2, S3
Choices of investment: stock, saving deposits, property, gold
Can we represent these types of variables with dummy variables? How?
Suppose we have 3 categories of education level: (i) secondary school graduate or lower, (ii) high school graduate, (iii) university graduate.

Can we represent such a variable with a single variable that takes the values 1, 2, and 3, one per category? Or should we define it differently? Try defining:
D2 = 1 ; if the highest level of education is high school
     0 ; otherwise
D3 = 1 ; if the highest level of education is university
     0 ; otherwise

Do we need to define the other category explicitly?

Illustration: 3 categories with 2 dummy variables
(SD = elementary school, SLTP = junior high school, SMU = high school, PT = university)


Name      Education   D2   D3
Ana Hen   SD          0    0
Annisa    PT          0    1
Budi      SMU         1    0
Bambang   SD          0    0
Badrun    SLTP        0    0
Betty     SMU         1    0
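The coding in this illustration can be sketched as a small routine. The education labels come from the table; the set names `HIGH_SCHOOL` and `UNIVERSITY` are introduced here only for illustration. Anyone with D2 = 0 and D3 = 0 falls into the base category, which therefore needs no dummy of its own.

```python
# Education labels for the six people in the illustration.
education = ["SD", "PT", "SMU", "SD", "SLTP", "SMU"]

# Hypothetical helper sets mapping labels to the two dummy categories.
HIGH_SCHOOL = {"SMU"}
UNIVERSITY = {"PT"}

d2 = [1 if level in HIGH_SCHOOL else 0 for level in education]
d3 = [1 if level in UNIVERSITY else 0 for level in education]

# Base category: neither dummy is switched on.
base = [1 if (a == 0 and b == 0) else 0 for a, b in zip(d2, d3)]

print(d2)    # [0, 0, 1, 0, 0, 1]
print(d3)    # [0, 1, 0, 0, 0, 0]
print(base)  # [1, 0, 0, 1, 1, 0]
```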

Life Insurance Consumption = f (income, education)


Consider the following model:
$Y = \alpha_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta X + u$
Y = life insurance expenses per year
X = income per year
D2 = 1 ; high school degree
     0 ; others
D3 = 1 ; college degree (S1)
     0 ; others
Average spending by education:
less than high school: $\alpha_1 + \beta X$ (base category)
high school: $\alpha_1 + \alpha_2 + \beta X$
university/college (S1): $\alpha_1 + \alpha_3 + \beta X$
Note: the reference group is less than high school. Why?

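How do we choose the base group?
Insurance spending by education level and income
[Figure: insurance spending (Y) against income (X); three parallel lines for S1, high school (SMU), and less than high school, with $\alpha_1$, $\alpha_2$ and $\alpha_3$ marked on the vertical axis. Assumed: $\alpha_3 > \alpha_2$.]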

Model with Several Qualitative Variables


Salary = f(experience, sex, faculty)
$Y = \alpha_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta X + u$
Y = salary per year
X = teaching experience (years)
D2 = 1 ; male professor
     0 ; female professor
D3 = 1 ; professor at FE (Faculty of Economics)
     0 ; others
Average salary of a female professor outside FE = $\alpha_1 + \beta X$
Average salary of a male professor outside FE = $\alpha_1 + \alpha_2 + \beta X$
Average salary of a female professor at FE = $\alpha_1 + \alpha_3 + \beta X$
Average salary of a male professor at FE = $\alpha_1 + \alpha_2 + \alpha_3 + \beta X$

Based on the data, we obtain:


$\hat{Y} = 7.43 + 0.207\,D_2 + 0.164\,D_3 + 1.226\,X$
$R^2$ = 91.22%
What does it mean if:
(i) the t-tests show that D2 and D3 are not significant?
(ii) the t-tests show that all coefficients are significant?

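Average salary (the figures below correspond to X = 1 year of experience):
Female professor outside FE = 7.43 + 1.226 = Rp 8.656 million.
Male professor outside FE = 7.43 + 0.207 + 1.226 = Rp 8.863 million.
Female professor at FE = 7.43 + 0.164 + 1.226 = Rp 8.820 million.
Male professor at FE = 7.43 + 0.207 + 0.164 + 1.226 = Rp 9.027 million.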
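The four averages follow from switching the two dummies on and off in the estimated equation. A small sketch that reproduces the arithmetic; the function name `predicted_salary` is hypothetical, and X = 1 is the value implied by the figures, since each sum includes 1.226 exactly once:

```python
def predicted_salary(d2, d3, x):
    """Estimated salary (Rp million) from the reported coefficients."""
    return 7.43 + 0.207 * d2 + 0.164 * d3 + 1.226 * x

X_YEARS = 1  # implied by the reported sums, not stated explicitly in the slides

groups = {
    "female, outside FE": (0, 0),
    "male, outside FE": (1, 0),
    "female, at FE": (0, 1),
    "male, at FE": (1, 1),
}
averages = {
    name: round(predicted_salary(d2, d3, X_YEARS), 3)
    for name, (d2, d3) in groups.items()
}
for name, value in averages.items():
    print(f"{name}: Rp {value:.3f} million")
```

Changing `X_YEARS` shifts all four averages by the same amount, which is why the gaps between groups do not depend on experience in this specification.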

Wage modeling: Moonlighting


A moonlighter is a person who holds one main job and one or more side jobs. Conjecture: such workers earn too little from their main job and are therefore forced to look for other sources of income. What drives this?
Wm = moonlighting wage per hour
Wu = wage in the main job
Ras (race) = 0 ; non-indigenous
           = 1 ; indigenous
Kota (urban) = 0 ; rural
             = 1 ; urban
SMU (high school) = 0 ; did not graduate from high school
                  = 1 ; graduated from high school
Wilayah (region) = 0 ; Eastern Region
                 = 1 ; Western Region
Umur (age) = age in years

The proposed model:

$W_m = \beta_1 + \beta_2 W_u + \beta_3\,Ras + \beta_4\,Kota + \beta_5\,SMU + \beta_6\,Wilayah + \beta_7\,Umur + u$
Suppose that, based on a sample, the estimated model is:
$\hat{W}_m = 37.07 + 1.403\,W_u - 90.06\,Ras + 75.51\,Kota + 47.33\,SMU + 113.64\,Wilayah + 2.26\,Umur$
What does it mean if the F-test and the t-tests show that all variables are significant at the 5% level?
Average wage of a non-indigenous worker in a rural area of the Eastern Region (KTI) who did not graduate from high school:
$W_m = 37.07 + 1.403\,W_u + 2.26\,Umur$
Average wage of an indigenous worker in an urban area of the Western Region (KBI) who graduated from high school:
$W_m = (37.07 - 90.06 + 75.51 + 113.64 + 47.33) + 1.403\,W_u + 2.26\,Umur$
$W_m = 183.49 + 1.403\,W_u + 2.26\,Umur$
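Each combination of the four dummies simply moves the intercept, while the coefficients on Wu and Umur stay the same. A minimal sketch that reproduces the two intercepts above from the reported estimates (the dictionary layout and the function name `group_intercept` are illustrative assumptions):

```python
# Reported coefficient estimates from the moonlighting equation.
COEF = {
    "const": 37.07,
    "Wu": 1.403,
    "Ras": -90.06,
    "Kota": 75.51,
    "SMU": 47.33,
    "Wilayah": 113.64,
    "Umur": 2.26,
}

def group_intercept(ras, kota, smu, wilayah):
    """Intercept of the Wm equation for a given combination of 0/1 dummies."""
    total = (COEF["const"] + COEF["Ras"] * ras + COEF["Kota"] * kota
             + COEF["SMU"] * smu + COEF["Wilayah"] * wilayah)
    return round(total, 2)

base_group = group_intercept(0, 0, 0, 0)   # non-indigenous, rural, no SMU, East
other_group = group_intercept(1, 1, 1, 1)  # indigenous, urban, SMU, West

print(base_group)   # 37.07
print(other_group)  # 183.49
```

With four dummies there are 16 such intercepts; the same function covers all of them.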

Comparing 2 regressions

Saving (Y) = $\lambda_1 + \lambda_2$ Income (X) + u
The model above assumes that the relationship between saving and income is the same across the sample and over time. In reality, however, the model may differ before and after a certain event. Suppose saving behavior differs before and after an economic crisis. How can this change be accommodated? The following model can be used.

Period I, before the crisis: $Y_i = \lambda_1 + \lambda_2 X_i + u_i$ ; i = 1, 2, ..., n
Period II, after the crisis: $Y_i = \gamma_1 + \gamma_2 X_i + \varepsilon_i$ ; i = n+1, n+2, ..., N

Possible outcomes when comparing the two models:
Case 1: $\lambda_1 = \gamma_1$ and $\lambda_2 = \gamma_2$
Case 2: $\lambda_1 \neq \gamma_1$ and $\lambda_2 = \gamma_2$
Case 3: $\lambda_1 = \gamma_1$ and $\lambda_2 \neq \gamma_2$
Case 4: $\lambda_1 \neq \gamma_1$ and $\lambda_2 \neq \gamma_2$

Case 1: both models are the same; there is no shift.
Case 4: the models differ in both intercept and slope; there is a shift.

Dummy variables can be used in addressing this type of change.

Comparing 2 regressions with a dummy variable
To allow for a shift in the regression model:
$Y_i = \alpha_1 + \alpha_2 D_i + \beta_1 X_i + \beta_2 D_i X_i + u_i$
D_i = 1 ; observation in period I
      0 ; observation in period II
Hence, average saving (Y) in each period is:
I : $Y_i = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_i$
II : $Y_i = \alpha_1 + \beta_1 X_i$

How do we detect a shift in the model?


1: If $\alpha_2 = 0$ and $\beta_2 = 0$: Model I = Model II
2: If $\alpha_2 \neq 0$ and $\beta_2 = 0$: same slope, different intercept
3: If $\alpha_2 = 0$ and $\beta_2 \neq 0$: same intercept, different slope
4: If $\alpha_2 \neq 0$ and $\beta_2 \neq 0$: different intercept and different slope
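The pooled model with a dummy and a dummy-income interaction carries the same information as two separate regressions, one per period. The sketch below illustrates this with made-up saving and income figures (the data are an assumption, not from the lecture): the coefficients on D and on D times X equal the differences in intercept and slope between the two period-specific fits.

```python
import numpy as np

# Hypothetical data: first four observations are period I, last four period II.
income = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0])
d = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = period I, 0 = period II
saving = np.array([1.0, 1.3, 1.5, 1.9, 3.1, 3.6, 4.2, 4.6])

# Pooled model: intercept, D, X, and the interaction D*X.
X = np.column_stack([np.ones(len(income)), d, income, d * income])
(a1, a2, b1, b2), *_ = np.linalg.lstsq(X, saving, rcond=None)

# Separate regressions per period (np.polyfit returns slope first).
slope_1, intercept_1 = np.polyfit(income[d == 1], saving[d == 1], 1)
slope_2, intercept_2 = np.polyfit(income[d == 0], saving[d == 0], 1)

print("period II:", a1, b1, "vs", intercept_2, slope_2)
print("period I :", a1 + a2, b1 + b2, "vs", intercept_1, slope_1)
print("intercept shift:", a2, " slope shift:", b2)
```

In practice the four cases are distinguished by testing whether the coefficient on D, the coefficient on the interaction, or both differ significantly from zero.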

Piecewise linear regression


Application: modeling sales commission
Scenario: a bonus is given to salespeople who exceed a target, say X*.
Y = sales commission
X = sales volume achieved by the salesperson
X* = sales target
D = 1 ; if X > X*
    0 ; if X ≤ X*
Average sales commission when the target is not exceeded:
Commission = $\alpha_1 + \beta_1 X$ ; X ≤ X*

Average sales commission when the target is exceeded:
Commission = $\alpha_1 + (\beta_1 + \beta_2) X - \beta_2 X^*$ ; X > X*
The two pieces can therefore be combined into a single model:
$Y = \alpha_1 + \beta_1 X + \beta_2 (X - X^*) D$
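The combined equation can be checked against the two separate pieces. The sketch below uses made-up values for the target and the coefficients (all numbers are assumptions for illustration) and confirms that the single formula with the regressor (X - X*)D gives the same commission as the two-branch rule, with no jump at X*.

```python
import numpy as np

# Hypothetical target and coefficients (not from the lecture).
x_star = 100.0
a1, b1, b2 = 5.0, 0.10, 0.05

sales = np.array([60.0, 80.0, 100.0, 120.0, 140.0])
d = (sales > x_star).astype(float)  # 1 when the target is exceeded

# Single combined equation.
commission = a1 + b1 * sales + b2 * (sales - x_star) * d

# Two-branch version of the same rule.
below = a1 + b1 * sales
above = a1 + (b1 + b2) * sales - b2 * x_star
two_branch = np.where(sales > x_star, above, below)

print(commission)
print(np.allclose(commission, two_branch))
```

Because the extra regressor is zero up to X* and grows linearly afterwards, the fitted line is continuous at the target and only its slope changes there.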

Geometrically:
[Figure: commission against sales; a piecewise linear line starting at intercept $\alpha_1$ with a kink at X*.]

The end of the lesson


by Nachrowi D. Nachrowi
