HIMANI PASRICHA
17617008
Question 1
Use the data in file mus16data.dta, which comes from the 2001 US Medical Expenditure
Survey. Dependent variables: ambexp (ambulatory expenditure = physician and hospital outpatient) and
dambexp (=1 if ambexp>0 and =0 if ambexp=0). Regressors: age (in tens of years), female, educ (years of
completed schooling), blhisp (=1 if black or hispanic), totchr (number of chronic conditions), and ins (=1 if
PPO or HMO health insurance).
a) Estimate a Tobit model of ambulatory expenditure on the regressors. How do we
interpret the coefficients?
b) Compute and interpret the marginal effects for three mean specifications:
1. MEs for the left truncated mean, E(y|x, y>0)
2. MEs for the censored mean, E(y|x)
3. MEs when additionally there is right censoring at the median value of y, E(y|x,
0<y<535).
4. Calculate the impact of a change in regressor on the probability that “ambexp”
lies in the range of (4000, 80000).
c) Obtain McDonald and Moffitt’s decomposition of the marginal effects for the Tobit estimates.
d) Estimate the Tobit model using the log of ambulatory expenditure, and compare the results
with those obtained from OLS regression.
e) Compare and contrast the results of the Tobit, left-truncated and OLS methods.
The data used here are from the file mus16data.dta, which describes the 2001 US Medical Expenditure
Survey. The Tobit model is used to see the effect of different variables on ambulatory expenditure. The
first model considered in the analysis is the censored Tobit regression model.
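As a rough illustration of the quantities asked for in parts (b) and (c), the standard formulas for a Tobit model left-censored at zero with normal errors can be sketched in Python. The index value xb, the scale sigma and the coefficient beta below are hypothetical placeholders, not estimates from mus16data.dta; the function names are also illustrative only.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def tobit_effects(xb, sigma, beta):
    """Marginal effects of one regressor in a Tobit censored at zero."""
    z = xb / sigma
    cdf = norm_cdf(z)
    lam = norm_pdf(z) / cdf  # inverse Mills ratio
    return {
        "prob_positive": cdf,                            # P(y > 0 | x)
        "mean_truncated": xb + sigma * lam,              # E(y | x, y > 0)
        "me_latent": beta,                               # effect on E(y* | x)
        "me_truncated": beta * (1.0 - lam * (z + lam)),  # effect on E(y | x, y > 0)
        "me_censored": beta * cdf,                       # effect on E(y | x)
        "me_prob_positive": beta * norm_pdf(z) / sigma,  # effect on P(y > 0 | x)
    }


def mcdonald_moffitt(xb, sigma, beta):
    """Split the censored-mean effect into intensity and participation parts."""
    e = tobit_effects(xb, sigma, beta)
    intensity = e["prob_positive"] * e["me_truncated"]
    participation = e["mean_truncated"] * e["me_prob_positive"]
    return intensity, participation


# Hypothetical values chosen only to make the sketch run.
xb, sigma, beta = 1000.0, 2000.0, 300.0
effects = tobit_effects(xb, sigma, beta)
intensity, participation = mcdonald_moffitt(xb, sigma, beta)
print(effects)
print(intensity + participation, effects["me_censored"])
```

The two printed values on the last line agree, which is the McDonald and Moffitt identity: the effect on the censored mean equals the effect on the truncated mean weighted by the probability of a positive outcome, plus the truncated mean weighted by the effect on that probability.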
[Figure: plot with IQ on the horizontal axis]
i. A one-unit increase in age raises the wage by 22.8297 dollars on average.
ii. If 𝐸𝑑𝑢𝑐2𝑖 = 1 and 𝐸𝑑𝑢𝑐3𝑖 = 𝐸𝑑𝑢𝑐4𝑖 = 0, then the model becomes
𝑊𝑎𝑔𝑒𝑖 = 375.099 + 22.828𝐴𝑔𝑒𝑖 + 104.560
A person with education level 2 earns on average 104.5598 dollars more than the base-category
intercept of 375.0996 dollars, holding age constant.
iii. If 𝐸𝑑𝑢𝑐3𝑖 = 1 and 𝐸𝑑𝑢𝑐2𝑖 = 𝐸𝑑𝑢𝑐4𝑖 = 0, then the model becomes
𝑊𝑎𝑔𝑒𝑖 = 375.099 + 22.828𝐴𝑔𝑒𝑖 + 255.256
A person with education level 3 earns on average 255.2564 dollars more than the base-category
intercept of 375.0996 dollars, holding age constant.
iv. If 𝐸𝑑𝑢𝑐4𝑖 = 1 and 𝐸𝑑𝑢𝑐2𝑖 = 𝐸𝑑𝑢𝑐3𝑖 = 0, then the model becomes
𝑊𝑎𝑔𝑒𝑖 = 375.099 + 22.828𝐴𝑔𝑒𝑖 + 385.731
A person with education level 4 earns on average 385.7313 dollars more than the base-category
intercept of 375.0996 dollars, holding age constant.
v. If 𝐸𝑑𝑢𝑐2𝑖 = 𝐸𝑑𝑢𝑐3𝑖 = 𝐸𝑑𝑢𝑐4𝑖 = 0, then the model becomes
𝑊𝑎𝑔𝑒𝑖 = 375.099 + 22.828𝐴𝑔𝑒𝑖
For a person in the base education category (all three dummies equal to 0), the intercept is
375.0996 dollars, and the age term is added to it.
Model 4 includes an interaction term that captures the joint effect of two variables, male and
married, so the effect of being a male and married employee on wages is analyzed. With
𝑀𝑀𝑖 = 𝑀𝑎𝑙𝑒𝑖 × 𝑀𝑎𝑟𝑟𝑖𝑒𝑑𝑖, the estimates imply the following cases:
i. If 𝑀𝑎𝑙𝑒𝑖 = 0, 𝑀𝑎𝑟𝑟𝑖𝑒𝑑𝑖 = 0 => 𝑀𝑀𝑖 = 0, then the model becomes:
𝑊𝑎𝑔𝑒𝑖 = 622.843
If the person is female and unmarried, then on average her wage will be 622.843
dollars.
ii. If 𝑀𝑎𝑙𝑒𝑖 = 0, 𝑀𝑎𝑟𝑟𝑖𝑒𝑑𝑖 = 1 => 𝑀𝑀𝑖 = 0, then the model becomes:
𝑊𝑎𝑔𝑒𝑖 = 622.843 + 117.206
If the person is female and married, then on average her wage will be 117.206 dollars
more than 622.8437 dollars.
iii. If 𝑀𝑎𝑙𝑒𝑖 = 1, 𝑀𝑎𝑟𝑟𝑖𝑒𝑑𝑖 = 0 => 𝑀𝑀𝑖 = 0, then the model becomes:
𝑊𝑎𝑔𝑒𝑖 = 622.843 + 487.767
If the person is male and unmarried, then on average his wage will be 487.767
dollars more than 622.8437 dollars.
iv. If 𝑀𝑎𝑙𝑒𝑖 = 1, 𝑀𝑎𝑟𝑟𝑖𝑒𝑑𝑖 = 1 => 𝑀𝑀𝑖 = 1, then the model becomes:
𝑊𝑎𝑔𝑒𝑖 = 622.843 + 487.767 + 117.206 + 39.947
If the person is male and married, then on average his wage will be 644.921 dollars
(487.767+117.206+39.947) more than 622.843 dollars.
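The four cases above can be checked with a short Python sketch that plugs the reported coefficients into the interaction model. The function name predicted_wage is illustrative, and the coefficients are the rounded values quoted in the text, so the last digit can differ slightly from figures computed from unrounded estimates.

```python
# Rounded coefficients as quoted in the text: intercept, male, married, male*married.
B0, B_MALE, B_MARRIED, B_MM = 622.843, 487.767, 117.206, 39.947


def predicted_wage(male, married):
    """Average wage implied by the dummy-interaction model for one group."""
    mm = male * married  # interaction dummy, equal to 1 only for married men
    return B0 + B_MALE * male + B_MARRIED * married + B_MM * mm


for male in (0, 1):
    for married in (0, 1):
        wage = predicted_wage(male, married)
        print(f"male={male} married={married}: wage={wage:.3f}, "
              f"difference from base={wage - B0:.3f}")
```

The interaction coefficient only enters the last case, which is why the premium for married men exceeds the sum of the separate male and married premiums by 39.947.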
Question 2:
Using the data file smoker.xls containing the data of a random sample of 1196 males, analyze the
following. The variables used in the analysis are as follows:
Smoker= 1 for smokers and 0 for nonsmokers
Age=in years
Education=number of years of schooling
Income=family income
Pcigs= price of cigarettes in individual states
1. Treating the smoker as the dependent variable and other variables as regressors, see whether
Logit model fits the data.
2. See whether the probit model does as well as the logit model.
3. Fit the LPM to these data.
4. Compare the results of the three models
5. Investigate the effect of interactive terms like Education*Income, Age*Education.
The data used here come from the file smoker.xls, and we test whether a person is a smoker
depending on several explanatory factors. First, we use the logit model, where
𝐿𝑖 = ln(𝑃𝑖 / (1 − 𝑃𝑖))
where 𝑃𝑖 is the probability of person i being a smoker.
The four explanatory variables are Education, Age, Income and Price of Cigarettes. The results of the logit
model are given in Table 2, Column 1, which reports its coefficients, odds ratios and marginal effects.
The coefficients are interpreted as:
i. A one-unit increase in the education level decreases the log odds of being a smoker by 0.09.
ii. A one-unit increase in the age of a person decreases the log odds by 0.0208.
iii. A one-unit increase in the income of an individual increases the log odds by 0.0000047.
iv. A one-unit increase in the price of cigarettes decreases the log odds by 0.0223.
The odds ratios are interpreted as follows (an odds ratio below 1 means the odds fall):
i. A one-unit increase in the education level of an individual multiplies the odds in favor of being
a smoker by 0.91304, a decrease of 8.696 percent.
ii. A one-unit increase in the age of an individual multiplies the odds in favor of being a smoker
by 0.97936, a decrease of 2.064 percent.
iii. A one-unit increase in the income level of an individual leaves the odds in favor of being
a smoker essentially unchanged (odds ratio of about 1.000).
iv. A one-unit increase in the price of cigarettes multiplies the odds in favor of being a smoker by
0.97792, a decrease of 2.208 percent.
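An odds ratio is the exponential of the corresponding logit coefficient, so the percentage change in the odds is 100 × (odds ratio − 1). A minimal Python sketch using the rounded coefficients quoted above shows the conversion; because the coefficients are rounded, the results differ slightly in the later decimals from the reported odds ratios. The dictionary keys are illustrative labels, not necessarily the variable names in smoker.xls.

```python
from math import exp

# Rounded logit coefficients as quoted in the text.
logit_coef = {
    "education": -0.09,
    "age": -0.0208,
    "income": 0.0000047,
    "pcigs": -0.0223,
}

# Odds ratio = exp(coefficient); percent change in odds = 100 * (odds ratio - 1).
odds_ratio = {name: exp(b) for name, b in logit_coef.items()}
pct_change = {name: 100.0 * (ratio - 1.0) for name, ratio in odds_ratio.items()}

for name in logit_coef:
    print(f"{name}: odds ratio {odds_ratio[name]:.5f}, "
          f"change in odds {pct_change[name]:+.3f}%")
```

A negative coefficient always gives an odds ratio below 1 and therefore a negative percentage change in the odds.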
The marginal effects give the change in probability, and the interpretation for each variable is:
i. A one-unit increase in the education level of an individual decreases the probability of the
individual being a smoker by 0.0206 on average.
ii. A one-unit increase in the age of an individual decreases the probability of the individual being a
smoker by 0.00472 on average.
iii. A one-unit increase in the income level of an individual increases the probability of the
individual being a smoker by 0.00000107 on average.
iv. A one-unit increase in the price of cigarettes decreases the probability of the individual being a
smoker by 0.005 on average.
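For a logit model, the marginal effect of a regressor at a given point equals its coefficient times P(1 − P). As a consistency check, the Python sketch below divides each reported marginal effect by its rounded coefficient; the ratios should all be close to the same value of P(1 − P). The implied probability at the end is a back-of-the-envelope figure under the assumption that the smaller root applies (the larger root, 1 − P, fits the same numbers equally well).

```python
from math import sqrt

# Rounded logit coefficients and marginal effects as quoted in the text.
logit_coef = {"education": -0.09, "age": -0.0208,
              "income": 0.0000047, "pcigs": -0.0223}
marginal_effect = {"education": -0.0206, "age": -0.00472,
                   "income": 0.00000107, "pcigs": -0.005}

# Marginal effect = coefficient * P(1 - P), so the ratio recovers P(1 - P).
implied_scale = {name: marginal_effect[name] / logit_coef[name]
                 for name in logit_coef}
avg_scale = sum(implied_scale.values()) / len(implied_scale)

# Solve P(1 - P) = avg_scale for P, taking the smaller root (an assumption).
implied_p = (1.0 - sqrt(1.0 - 4.0 * avg_scale)) / 2.0

for name, scale in implied_scale.items():
    print(f"{name}: implied P(1-P) = {scale:.4f}")
print(f"average P(1-P) = {avg_scale:.4f}, implied P = {implied_p:.3f}")
```

All four ratios come out near 0.227, which indicates that the reported coefficients and marginal effects are mutually consistent.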
Similarly, the probit model is used to analyze the data; the results are given in Column 2 of Table 2.
Probit uses the normal distribution, and its marginal effects also give the change in probability. A
one-unit increase in the education level of an individual decreases the probability of the individual
being a smoker by 0.0207 on average. A one-unit increase in the age of an individual decreases the
probability of the individual being a smoker by 0.00477 on average. A one-unit increase in the income
level increases the probability of being a smoker by 0.00000106 on average, while a one-unit increase
in the price of cigarettes decreases it by 0.0058 on average.
Next we analyze the data using the LPM; the results are tabulated in Table 2, Column 3.
The results are interpreted as:
i. A one-unit increase in the education level of an individual decreases the probability of smoking
by 0.0206131 on average.
ii. A one-unit increase in the age of an individual decreases the probability of smoking by 0.00472
on average.
iii. A one-unit increase in the income level of an individual increases the probability of smoking by
0.00000106 on average.
iv. A one-unit increase in the price of cigarettes decreases the probability of smoking by 0.00513 on
average.
The three models can be compared using criteria such as the AIC, the BIC and the log likelihood.
The LPM is not preferred over logit and probit because of its limitations:
1. Non-normality of the disturbance term ui.
2. Heteroscedastic variance of the disturbance term.
3. E[Yi|Xi] sometimes does not lie between 0 and 1.
4. The R-square is of limited value as a measure of fit.
The AIC for probit is 1551.1762, which is lower than the AIC for logit (1551.6817). Similarly,
the BIC for probit (1576.6099) is lower than the BIC for logit (1577.1154). The log likelihood
is highest for the probit model, at -770.5881. The pseudo R-square is 0.029 for logit and 0.0301 for
probit. Therefore, probit is preferred over logit.
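The reported information criteria can be reproduced from the log likelihood with AIC = 2k − 2 ln L and BIC = k ln(N) − 2 ln L. The Python sketch below assumes k = 5 estimated parameters (four regressors plus a constant) and N = 1196 observations; with those assumptions the probit figures match the values quoted above. The logit log likelihood is not quoted in the text, so it is backed out from the reported logit AIC.

```python
from math import log

N = 1196  # sample size stated in the question
K = 5     # assumed: four regressors plus a constant


def aic(loglik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return k * log(n) - 2 * loglik


ll_probit = -770.5881                 # reported log likelihood for probit
ll_logit = (2 * K - 1551.6817) / 2    # backed out from the reported logit AIC

for name, ll in (("probit", ll_probit), ("logit", ll_logit)):
    print(f"{name}: ll={ll:.4f}, AIC={aic(ll, K):.4f}, BIC={bic(ll, K, N):.4f}")
```

Because both models have the same number of parameters and the same sample, ranking them by AIC, BIC or log likelihood gives the same ordering.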
In the next model we add two interaction variables, age*education and education*income, to see their
effect on a person's smoking pattern. Adding these variables raises the pseudo R-square to
0.039, which suggests that this model fits better. Both logit and probit results are tabulated in Table 3.
These results are interpreted as:
i. A one-unit increase in the variable age*education increases the probability of being a
smoker by 0.000481 on average.
ii. A one-unit increase in the variable education*income decreases the probability of being a smoker by
0.00000015 on average.
The probit model is preferred over the logit model because its pseudo R-square is higher.
APPENDIX
Question 1.
predict yhat
Question 2.
margins, dydx(*) atmeans
margins, dydx(*) atmeans
estimates table blogit bprobit bols, stats(chi2 df N aic bic rank ll)