
ECONOMETRICS ASSIGNMENT

HIMANI PASRICHA
17617008
Question 1

The data in file mus16data.dta come from the 2001 US Medical Expenditure
Survey. Variables: ambexp (ambulatory expenditure = physician and hospital outpatient); dambexp (=1 if
ambexp>0 and =0 if ambexp=0). Regressors: age (in tens of years), female, educ (years of completed
schooling), blhisp (=1 if black or Hispanic), totchr (number of chronic conditions), and ins (=1 if
PPO or HMO health insurance).
a) Estimate a Tobit model of ambulatory expenditure on the regressors. How do we
interpret the coefficients?
b) Compute and interpret the marginal effects for three mean specifications:
1. MEs for the left truncated mean, E(y|x, y>0)
2. MEs for the censored mean, E(y|x)
3. MEs when additionally there is right censoring at the median value of y, E(y|x,
0<y<535).
4. Calculate the impact of a change in a regressor on the probability that "ambexp"
lies in the range (4000, 80000).
c) Obtain McDonald and Moffitt's decomposition of the marginal effects for the Tobit estimates.
d) Estimate the Tobit model using the log of ambulatory expenditure, and compare the results
with those obtained from OLS regression.
e) Compare and contrast the results of the Tobit, left truncated and OLS methods.

Table 1: Result of OLS Regression with Dummy Variables


Source: Author’s Calculation
NOTE: Value in the bracket indicates the standard error.
*, **, *** indicate that the variable is significant at the 10%, 5% and 1% level of significance respectively.

The data used here are from the file mus16data.dta, which describes the 2001 US Medical Expenditure
Survey. The Tobit model is used to see the effect of different variables on ambulatory expenditure. The
first model considered in the analysis is the censored Tobit regression model:

Define a latent variable: $Y_i^* = \beta X_i + u_i$

where $Y_i = 1$ if $Y_i^* > 0$

$Y_i = 0$ if $Y_i^* \le 0$

If $Y_i$ is not a binary variable but is observed as $Y_i^*$ when $Y_i^* > 0$ and is not observed when $Y_i^* \le 0$, then

$Y_i = Y_i^* = \beta X_i + u_i$ if $Y_i^* > 0$

$Y_i = 0$ otherwise
After regressing the dependent variable on the explanatory variables, the estimates given in
Table 1, Column 1 are obtained. Tobit coefficients measure the effect of a regressor on the latent
variable (latent ambulatory expenditure), holding the other variables constant. Because age is measured
in tens of years, a ten-year increase in age raises latent ambulatory expenditure by $314.1479. Similarly,
being female raises it by $684.9918, and an additional year of education raises it by $70.8656. Being
black or Hispanic lowers it by $530.311. Each additional chronic condition raises it by $1244.578, and
holding an insurance policy (PPO or HMO) lowers it by $167.4717.
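
The marginal effects asked for in parts (b) and (c) rescale each Tobit coefficient. The Python sketch below illustrates the standard formulas for a Tobit model left-censored at zero: the effect on the truncated mean E(y|x, y>0), the effect on the censored mean E(y|x), and the two McDonald and Moffitt components. It is not the author's computation. The coefficient 1244.578 is the totchr estimate quoted above; the index value `xb` and the error standard deviation `sigma` are hypothetical placeholders, since the text does not report them.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_marginal_effects(beta_j, xb, sigma):
    """Marginal effects of regressor j in a Tobit model left-censored at 0.

    beta_j: Tobit coefficient; xb: linear index x'beta; sigma: error std. dev.
    """
    w = xb / sigma
    prob_pos = norm_cdf(w)                  # P(y > 0 | x)
    lam = norm_pdf(w) / prob_pos            # inverse Mills ratio
    cond_mean = xb + sigma * lam            # E(y | x, y > 0)
    me_truncated = beta_j * (1.0 - w * lam - lam ** 2)
    me_prob = beta_j * norm_pdf(w) / sigma  # dP(y > 0 | x)/dx_j
    me_censored = beta_j * prob_pos         # dE(y | x)/dx_j
    return {
        "latent": beta_j,
        "truncated": me_truncated,
        "censored": me_censored,
        # McDonald-Moffitt: censored = mm_intensive + mm_extensive
        "mm_intensive": prob_pos * me_truncated,
        "mm_extensive": cond_mean * me_prob,
    }


# totchr coefficient from the text; xb and sigma are hypothetical placeholders.
effects = tobit_marginal_effects(beta_j=1244.578, xb=500.0, sigma=2500.0)
for name, value in effects.items():
    print(f"{name:>13}: {value:10.3f}")
```

Both rescaled effects are smaller in absolute value than the coefficient itself, which is why the raw Tobit coefficients should not be read as effects on observed expenditure.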

If $Male_i = 1$, i.e. the person is male, the model becomes:

$Wage_i = 224.843 + 5.076\,IQ_i + 4.496\,Exper_i + 498.049$

If $Male_i = 0$, i.e. the person is female, the model becomes:

$Wage_i = 224.843 + 5.076\,IQ_i + 4.496\,Exper_i$
The graph below shows the effect of differential intercept on wage.
[Figure 1: Effect of differential intercept. Fitted wage (yhat) plotted against IQ, with separate lines for MALE == 0 and MALE == 1.]


This implies that, on average, the wage of a male is 498.0493 (1254) higher than that of a female, i.e.
244.8438. Also, adding one more explanatory variable makes the model a good fit, with an R-square value of
0.4552.

The third model is given by:

$Wage_i = \beta_0 + \beta_1 Age_i + \beta_2 Educ2_i + \beta_3 Educ3_i + \beta_4 Educ4_i + u_i$

The results of this model can be interpreted as follows:

i. A one-unit increase in age increases the wage by 22.8297 on average.
ii. If $Educ2_i = 1$ and $Educ3_i = Educ4_i = 0$, the model becomes

$Wage_i = 375.099 + 22.828\,Age_i + 104.559$

This shows that a person with education level 2 earns, on average, 104.5598 dollars more
than the base intercept of 375.0996 dollars.
iii. If $Educ3_i = 1$ and $Educ2_i = Educ4_i = 0$, the model becomes

$Wage_i = 375.099 + 22.828\,Age_i + 255.256$

This shows that a person with education level 3 earns, on average, 255.2564 dollars more
than the base intercept of 375.0996 dollars.

iv. If $Educ4_i = 1$ and $Educ2_i = Educ3_i = 0$, the model becomes

$Wage_i = 375.099 + 22.828\,Age_i + 385.731$

This shows that a person with education level 4 earns, on average, 385.7313 dollars more
than the base intercept of 375.0996 dollars.
v. If $Educ2_i = Educ3_i = Educ4_i = 0$, the model becomes

$Wage_i = 375.099 + 22.828\,Age_i$

This shows that for a person in the base education category (all three dummies equal to zero),
holding everything else constant, the baseline wage is 375.09 dollars on average.
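
The five cases above can be reproduced with simple arithmetic. The Python sketch below uses the estimates quoted in the text. The function name `wage_by_education` and the label "level 1" for the omitted base category are assumptions introduced for illustration.

```python
# Estimates quoted in the text for the third model.
INTERCEPT = 375.099
AGE_COEF = 22.828
# Intercept shift per education dummy; level 1 is the assumed label
# for the omitted base category (Educ2 = Educ3 = Educ4 = 0).
EDUC_SHIFT = {1: 0.0, 2: 104.559, 3: 255.256, 4: 385.731}


def wage_by_education(age, educ_level):
    """Predicted wage for a given age and education level."""
    return INTERCEPT + EDUC_SHIFT[educ_level] + AGE_COEF * age


for level in sorted(EDUC_SHIFT):
    # With age set to 0 the prediction is the level-specific intercept.
    print(level, round(wage_by_education(0, level), 3))
```

Each dummy shifts the intercept relative to the base category, while the age slope is the same for every education level.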

Model 4 includes an interaction term, which captures the interaction effect of two variables, male and
married. Here the effect on wages of being male and married is analyzed. The model is stated as

$Wage_i = \beta_0 + \beta_1 Male_i + \beta_2 Married_i + \beta_3 MM_i + u_i$

where $Male_i = 1$ if the person is male

$Male_i = 0$ if the person is female
$Married_i = 1$ if the person is married
$Married_i = 0$ if the person is unmarried
$MM_i = Male_i \times Married_i = 1$ if the person is male and married, and 0 otherwise.

i. If $Male_i = 0$, $Married_i = 0$ => $MM_i = 0$, then the model becomes:

$Wage_i = 622.843$

This shows that an unmarried female earns, on average, 622.843 dollars.
ii. If $Male_i = 0$, $Married_i = 1$ => $MM_i = 0$, then the model becomes:

$Wage_i = 622.843 + 117.206$

This shows that a married female earns, on average, 117.206 dollars
more than 622.8437 dollars.
iii. If $Male_i = 1$, $Married_i = 0$ => $MM_i = 0$, then the model becomes:

$Wage_i = 622.843 + 487.767$

This shows that an unmarried male earns, on average, 487.767
dollars more than 622.8437 dollars.
iv. If $Male_i = 1$, $Married_i = 1$ => $MM_i = 1$, then the model becomes:
$Wage_i = 622.843 + 487.767 + 117.206 + 39.947$
This shows that a married male earns, on average, 644.921 dollars
(487.767 + 117.206 + 39.947) more than 622.843 dollars.
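
The four group means follow from switching the two dummies and their product on and off. The Python sketch below uses the estimates quoted in the text; the function name `wage_by_group` is an assumption introduced for illustration.

```python
# Estimates quoted in the text for Model 4.
B0 = 622.843         # unmarried female (base group)
B_MALE = 487.767
B_MARRIED = 117.206
B_MM = 39.947        # extra effect of being both male and married


def wage_by_group(male, married):
    """Predicted average wage for a male/married combination (0 or 1 each)."""
    mm = male * married  # interaction dummy
    return B0 + B_MALE * male + B_MARRIED * married + B_MM * mm


for male in (0, 1):
    for married in (0, 1):
        print(male, married, round(wage_by_group(male, married), 3))
```

For a married male all three terms apply, so the gap relative to the base group is the sum of the male, married and interaction coefficients.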
Question 2:

Using the data file smoker.xls containing the data of a random sample of 1196 males, analyze the
following. The variables used in the analysis are as follows:
Smoker= 1 for smokers and 0 for nonsmokers
Age=in years
Education=number of years of schooling
Income=family income
Pcigs= price of cigarettes in individual states
1. Treating the smoker as the dependent variable and other variables as regressors, see whether
Logit model fits the data.
2. See whether the probit model does well as the logit model.
3. Fit the LPM to these data.
4. Compare the results of the three models
5. Investigate the effect of interactive terms like Education*Income, Age*Education.

Table 2: Results of Logit, Probit and LPM Models

                    LOGIT                                        PROBIT                        LPM
Variables           Coefficient    Odds ratio     Marg. effect   Coefficient    Marg. effect   Coefficient
Education           -0.0909**      0.913**        -0.021**       -0.056**       -0.021**       -0.0206**
                    (-0.206)       (0.018)        (0.004)        (0.012)        (0.0047)       (0.004)
Age                 -0.020**       0.979**        -0.008**       -0.012**       -0.004**       -0.004**
                    (0.003)        (0.003)        (0.0008)       (0.0022)       (0.0008)       (0.0008)
Income              0.0000047      1.000          0.000001       0.000002       0.000001       0.000001
                    (0.000007)     (0.000007)     (0.0000016)    (0.000004)     (0.0000016)    (0.000001)
Pcigs79             -0.0223        0.977*         -0.005*        -0.013*        -0.0052        -0.0051*
                    (0.0124)       (0.012)        (0.002)        (0.0076)       (0.0029)       (0.0028)
Cons                2.745**        15.565**                      1.701**                       1.123**
                    (0.829)        (12.907)                      (0.5105)                      (0.1883)
LL                  -770.84086                                   -770.5881                     -809.18848
AIC                 1551.6817                                    1551.1762                     1628.377
BIC                 1577.1154                                    1576.6099                     1653.8107
Pseudo R-square /   0.029                                        0.0301                        0.038
R-square

Source: Author's Calculation
NOTE: Values in brackets are standard errors.
*, **, *** indicate that the variable is significant at the 10%, 5% and 1% level of significance respectively.

The data used here are from the file smoker.xls; we test whether a person is a smoker or
not depending on some explanatory factors. First, we use the logit model, where

$L_i = \ln\left(\frac{P_i}{1 - P_i}\right)$

where $P_i$ is the probability of the person being a smoker.
The four explanatory variables are Education, Age, Income and Price of Cigarettes. The results of the logit
model are given in Table 2, Column 1, which reports its coefficients, odds ratios and marginal effects.
The coefficients are interpreted as follows:
i. A one-unit increase in the education level decreases the log odds of being a smoker by 0.09.
ii. A one-unit increase in the age of a person decreases the log odds by 0.0208.
iii. A one-unit increase in the income of an individual increases the log odds by 0.0000047.
iv. A one-unit increase in the price of cigarettes decreases the log odds by 0.0223.
The odds ratios are interpreted as follows:
i. A one-unit increase in the education level of an individual multiplies the odds in favor of being a
smoker by 0.91304, i.e. a decrease of about 8.7 percent.
ii. A one-unit increase in the age of an individual multiplies the odds in favor of being a smoker
by 0.97936, i.e. a decrease of about 2.1 percent.
iii. A one-unit increase in the income of an individual leaves the odds in favor of being a
smoker essentially unchanged (odds ratio of 1.000).
iv. A one-unit increase in the price of cigarettes multiplies the odds in favor of being a smoker by
0.97792, i.e. a decrease of about 2.2 percent.
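
An odds ratio is the exponential of the corresponding logit coefficient, so a ratio below 1 means the odds fall and the percentage change in the odds is 100 × (odds ratio − 1). The Python sketch below applies this to the rounded logit coefficients quoted in the text, so the results match Table 2 only up to rounding. The helper names are assumptions introduced for illustration.

```python
import math

# Logit coefficients as quoted in the text (rounded).
logit_coefs = {
    "educ": -0.0909,
    "age": -0.0208,
    "income": 0.0000047,
    "pcigs79": -0.0223,
}


def odds_ratio(coef):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(coef)


def pct_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase in the regressor."""
    return 100.0 * (math.exp(coef) - 1.0)


for name, coef in logit_coefs.items():
    print(f"{name:>8}: OR = {odds_ratio(coef):.5f}, "
          f"change in odds = {pct_change_in_odds(coef):+.3f}%")
```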
The marginal effects give the change in probability; the interpretation for each variable is:
i. A one-unit increase in the education level of an individual decreases the probability of the
individual being a smoker by 0.0206 on average.
ii. A one-unit increase in the age of an individual decreases the probability of the individual being a
smoker by 0.00472 on average.
iii. A one-unit increase in the income of an individual increases the probability of the
individual being a smoker by 0.00000107 on average.
iv. A one-unit increase in the price of cigarettes decreases the probability of the individual being a
smoker by 0.005 on average.
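
For a continuous regressor, the logit marginal effect is the coefficient times P(1 − P), and the probit marginal effect is the coefficient times the standard normal density at the index. The Python sketch below evaluates both at a single point using the education coefficients from Table 2. The smoking probability of 0.35 is a hypothetical value chosen for illustration; the text does not report the probability at which the marginal effects were evaluated.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logit_marginal_effect(coef, prob):
    """Logit marginal effect at probability `prob`: coef * p * (1 - p)."""
    return coef * prob * (1.0 - prob)


def probit_marginal_effect(coef, prob):
    """Probit marginal effect at probability `prob`: coef * phi(Phi^{-1}(p))."""
    index = STD_NORMAL.inv_cdf(prob)
    return coef * STD_NORMAL.pdf(index)


P_SMOKER = 0.35  # hypothetical evaluation point, not taken from the text

print(logit_marginal_effect(-0.0909, P_SMOKER))   # education, logit
print(probit_marginal_effect(-0.056, P_SMOKER))   # education, probit
```

The two models have coefficients on different scales, but their marginal effects are directly comparable because both are expressed as changes in probability.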

Similarly, the probit model is used to analyze the data; the results are given in Column 2 of Table 2.
Probit uses the normal distribution, and its marginal effects also give the change in probability. A
one-unit increase in the education level of an individual decreases the probability of the individual
being a smoker by 0.0207 on average. Also, a one-unit increase in the age of an individual decreases the
probability of the individual being a smoker by 0.00477 on average. Correspondingly, a one-unit
increase in income increases the probability of the individual being a smoker by 0.00000106, while a
one-unit increase in the price of cigarettes decreases it by 0.0058, on average.
Next we analyze the data using the LPM; the results are tabulated in Table 2, Column 3.
The results are interpreted as follows:
i. A one-unit increase in the education level of an individual decreases the probability of smoking
by 0.0206131.
ii. A one-unit increase in the age of an individual decreases the probability of smoking by 0.00472.
iii. A one-unit increase in the income of an individual increases the probability of smoking by
0.00000106.
iv. A one-unit increase in the price of cigarettes decreases the probability of smoking by 0.00513 on
average.
The three models can be compared using criteria such as AIC, BIC and the log likelihood.
The LPM is not preferred over logit and probit because it has the following
limitations:
1. Non-normality of the disturbance term ui.
2. Heteroscedastic variance of the disturbance term.
3. E[Yi|Xi] sometimes does not lie between 0 and 1.
4. The R-square is of limited value as a measure of fit.
The AIC for probit is 1551.1762, which is lower than the AIC for logit (1551.6817). Similarly,
the BIC for probit (1576.6099) is lower than the BIC for logit (1577.1154). The log likelihood
is highest for the probit model, i.e. -770.5881. Also, the pseudo R-square is 0.029 for logit and 0.0301
for probit. Therefore, probit is preferred over logit.
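
The reported information criteria follow from the log likelihoods through AIC = −2·LL + 2k and BIC = −2·LL + k·ln(N). The Python sketch below checks the logit and probit values, taking k = 5 (four regressors plus a constant) and N = 1196 from the question statement. The logit log likelihood of -770.84086 is inferred from its reported AIC, so treat that pairing as an assumption.

```python
import math

N_OBS = 1196    # sample size given in the question
K_PARAMS = 5    # four regressors plus the constant

log_lik = {"logit": -770.84086, "probit": -770.5881}


def aic(ll, k):
    """Akaike information criterion."""
    return -2.0 * ll + 2.0 * k


def bic(ll, k, n):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * ll + k * math.log(n)


for model, ll in log_lik.items():
    print(f"{model:>6}: AIC = {aic(ll, K_PARAMS):.4f}, "
          f"BIC = {bic(ll, K_PARAMS, N_OBS):.4f}")
```

Because both models have the same number of parameters, ranking them by AIC or BIC is equivalent to ranking them by log likelihood.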

In the next model we add two new variables, age*education and education*income, to see their
effect on a person's smoking pattern. Adding these variables raises the pseudo R-square to
0.039, which implies that this model is a better fit. Both the logit and probit results are tabulated in
Table 3. The results are interpreted as follows:
i. A one-unit increase in the variable age*education increases the probability of being a
smoker by 0.000481 on average.
ii. A one-unit increase in the variable education*income decreases the probability of being a smoker by
0.00000015 on average.

Table 3: Marginal Effects (dy/dx) of Logit and Probit Models with Interaction Terms

Variables           LOGIT (dy/dx)    PROBIT (dy/dx)
Education           -.0154           -.016
                    (0.016)          (0.168)
Age                 -.0095**         -.009**
                    (0.003)          (0.003)
Income              .00001**         .000019**
                    (0.000006)       (0.000005)
Pcigs79             -.0048*          -.0049**
                    (0.0027)         (0.0028)
Age*Education       .000481*         .00049*
                    (0.00027)        (0.00027)
Education*Income    -0.000001**      -0.000001**
                    (0.00000048)     (0.00000047)
Pseudo R-square     0.0396           0.0401

Source: Author's Calculation
NOTE: Values in brackets are standard errors.
*, **, *** indicate that the variable is significant at the 10%, 5% and 1% level of significance respectively.

The probit model is preferred over the logit model because its pseudo R-square is higher.

APPENDIX

Question 1.

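* Baseline: wage on IQ and experience
regress WAGE IQ EXPER
* Add the MALE dummy (differential intercept)
regress WAGE IQ EXPER MALE
* Fitted values, split by MALE and plotted against IQ (Figure 1)
predict yhat
separate yhat, by(MALE)
line yhat0 yhat1 IQ, sort
* Third model: age and education-level dummies
regress WAGE AGE EDUC2 EDUC3 EDUC4
* Model 4: MALE x MARRIED interaction
generate MM = MALE*MARRIED
regress WAGE MALE MARRIED MM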

Question 2.

* Logit: coefficients, then odds ratios
logit smoker educ age income pcigs79
logit smoker educ age income pcigs79, or
* Logit marginal effects at the means
quietly logit smoker educ age income pcigs79
margins, dydx(*) atmean
* Probit and its marginal effects at the means
probit smoker educ age income pcigs79
margins, dydx(*) atmean
* Linear probability model (OLS)
regress smoker educ age income pcigs79
* Store each fit and compare log likelihood, AIC and BIC
quietly logit smoker educ age income pcigs79
estimates store blogit
quietly probit smoker educ age income pcigs79
estimates store bprobit
quietly regress smoker educ age income pcigs79
estimates store bols
estimates table blogit bprobit bols, stats(chi2 df N aic bic rank ll)
* Interaction terms
generate eduinc = educ*income
generate ageedu = age*educ
logit smoker educ age income pcigs79 eduinc ageedu
probit smoker educ age income pcigs79 eduinc ageedu
