Health Insurance Enrollment Regression Analysis

This document discusses a health economics study that uses data on health insurance enrollment and doctor visits to estimate a population regression function. It provides details on the least squares optimization problem, including derivations of the first order conditions and implications for the residuals. The document is technical and mathematical in nature.


ECON 485 SECTION A/ ECON 585 SECTION A winter 2020 – PROBLEM SET 1

Question 1 (60 pts.)

A health economist uses data on enrollment in a health insurance plan, denoted by the binary variable
𝐷𝑖 , and the number of doctor visits, 𝑌𝑖 , to estimate the population regression function (PRF) 𝑌𝑖 = 𝛽1 +
𝛽2 𝐷𝑖 + 𝑢𝑖 by using the following sample:

𝑖 1 2 3
𝐷𝑖 0 1 1
𝑌𝑖 3 3 5

The benchmark group is individuals not enrolled in a health insurance plan.


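a) (2 pts.) Write down the function $S(\hat{\beta}_1, \hat{\beta}_2)$ that minimizes the sum of squared residuals, where
$\hat{\beta}_1$ is the sample estimator of $\beta_1$ and $\hat{\beta}_2$ is the sample estimator of $\beta_2$.

$$\min_{\hat{\beta}_1, \hat{\beta}_2} S(\hat{\beta}_1, \hat{\beta}_2) = \sum_i \hat{u}_i^2$$

Inserting the SRF yields:

$$S(\hat{\beta}_1, \hat{\beta}_2) = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$$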
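
As a rough numerical illustration of the objective function in part a), the sketch below evaluates the sum of squared residuals on the three-observation sample, taking $X_i = D_i$. The function name `ssr` and the two trial coefficient pairs are illustrative assumptions, not part of the problem set.

```python
# Sample from the problem set: D_i is enrollment (0/1), Y_i is doctor visits.
D = [0, 1, 1]
Y = [3, 3, 5]


def ssr(b1, b2):
    """Sum of squared residuals S(b1, b2) for the candidate line Y = b1 + b2 * D."""
    return sum((y - b1 - b2 * d) ** 2 for y, d in zip(Y, D))


# Two arbitrary trial pairs (hypothetical values chosen only for illustration).
print(ssr(0.0, 0.0))  # 9 + 9 + 25 = 43.0
print(ssr(2.0, 2.0))  # 1 + 1 + 1 = 3.0
```

Least squares searches over all such pairs for the one that makes this value as small as possible.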

b) (2 pts.) State the dependent, explanatory and exogenous variables of the optimization problem
from a statistical standpoint.

Dependent variable: $\sum_i \hat{u}_i^2$

Explanatory variables: $\hat{\beta}_1, \hat{\beta}_2$

Exogenous variables: $X_i$ and $Y_i$ for all $i$

c) (2 pts.) Briefly describe the nature of the optimization problem of the method of least squares.

To have the regression best fit the data, minimize the sum of squared residuals, $\sum_i \hat{u}_i^2$
(dependent variable), by choosing the optimal values of the sample estimators of the intercept and
the slope coefficient, i.e. $\hat{\beta}_1^*, \hat{\beta}_2^*$ (explanatory variables), as a function of the numerical values of the
exogenous variables available in the data, $X_i$ and $Y_i$.
d) (1 pt.) Briefly explain why $\hat{\beta}_1$ and $\hat{\beta}_2$ are treated as constants but $D_i$ and $Y_i$ are treated as
variables when expanding the summation sign $\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i)^2$.

When expanding the summation sign, $\hat{\beta}_1$ and $\hat{\beta}_2$ are treated as constants as their respective numerical
values do not vary across observations, while $D_i$ and $Y_i$ are treated as variables because they take on
different values across observations.
e) (3 pts.) Show that $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2(\sum_i Y_i - n\hat{\beta}_1 - \hat{\beta}_2 \sum_i D_i) = 0$ by carefully illustrating the chain
rule and all relevant rules of the summation operator. Do NOT skip steps.

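$$S(\hat{\beta}_1, \hat{\beta}_2) = \min_{\hat{\beta}_1, \hat{\beta}_2} \sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i)^2$$

$$= (Y_1 - \hat{\beta}_1 - \hat{\beta}_2 D_1)^2 + (Y_2 - \hat{\beta}_1 - \hat{\beta}_2 D_2)^2 + (Y_3 - \hat{\beta}_1 - \hat{\beta}_2 D_3)^2$$

The chain rule:

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = \frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{u}_1}\frac{\partial \hat{u}_1}{\partial \hat{\beta}_1} + \frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{u}_2}\frac{\partial \hat{u}_2}{\partial \hat{\beta}_1} + \frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{u}_3}\frac{\partial \hat{u}_3}{\partial \hat{\beta}_1},$$

where $\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i$.

$$\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = 2(Y_1 - \hat{\beta}_1 - \hat{\beta}_2 D_1)(-1) + 2(Y_2 - \hat{\beta}_1 - \hat{\beta}_2 D_2)(-1) + 2(Y_3 - \hat{\beta}_1 - \hat{\beta}_2 D_3)(-1) = 0$$

Collecting terms yields: $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i) = 0$.

Distributing the summation operator yields: $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2(\sum_i Y_i - n\hat{\beta}_1 - \hat{\beta}_2 \sum_i D_i) = 0$, where
$\hat{\beta}_1$ is an additive constant and $\hat{\beta}_2$ is a multiplicative constant.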
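
The derivative obtained in part e) can be sanity-checked numerically. The sketch below compares the analytic expression with a central finite difference of the objective at an arbitrary trial point; the helper names (`ssr`, `dS_db1`, `central_diff`), the step size and the trial point are assumptions made for this illustration only.

```python
D = [0, 1, 1]
Y = [3, 3, 5]
n = len(Y)


def ssr(b1, b2):
    """Sum of squared residuals for the candidate line Y = b1 + b2 * D."""
    return sum((y - b1 - b2 * d) ** 2 for y, d in zip(Y, D))


def dS_db1(b1, b2):
    """Analytic derivative from part e): -2 * (sum Y - n * b1 - b2 * sum D)."""
    return -2 * (sum(Y) - n * b1 - b2 * sum(D))


def central_diff(b1, b2, h=1e-6):
    """Numerical derivative of ssr with respect to b1."""
    return (ssr(b1 + h, b2) - ssr(b1 - h, b2)) / (2 * h)


b1, b2 = 1.5, 0.5                 # arbitrary trial point, not the optimum
analytic = dS_db1(b1, b2)         # -2 * (11 - 4.5 - 1) = -11.0
numeric = central_diff(b1, b2)    # should agree up to rounding error
print(analytic, numeric)
```

The derivative is nonzero at this trial point, which simply says the point is not the minimizer; the first-order condition sets the same expression to zero.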

f) (2 pts.) Show that your answer in e) implies that Σi 𝑢̂𝑖 = 0, where 𝑢̂𝑖 is the residual of obs. 𝑖.

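The SRF $Y_i = \hat{\beta}_1 + \hat{\beta}_2 D_i + \hat{u}_i$ implies that $\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i$. Substituting this expression into the
FOC in part e) yields: $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = -2\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i) = -2\sum_i \hat{u}_i = 0$.
Dividing both sides by $-2$ gives $\sum_i \hat{u}_i = 0$.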

g) (1 pt.) If the method of least squares implies that Σi 𝑢̂𝑖 = 0, briefly explain why it is
inappropriate to min Σi 𝑢̂𝑖 .

The objective of squaring each residual is to penalize larger distances from the regression line,
irrespective of whether these distances are positive or negative. If we were to min Σi 𝑢̂𝑖 , positive distances
would be offset by negative distances. This would be equivalent to entirely ignoring the distances for those
observations, akin to excluding them from the dataset.
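
The offsetting argument can be made concrete on the sample. In the sketch below, every candidate line is forced through the point of sample means, so its residuals sum to exactly zero, yet the sums of squared residuals differ widely. The slope values tried and the variable names are hypothetical choices for illustration; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

D = [0, 1, 1]
Y = [3, 3, 5]
n = len(Y)
D_bar = Fraction(sum(D), n)
Y_bar = Fraction(sum(Y), n)

results = []
for b2 in [Fraction(-3), Fraction(0), Fraction(1), Fraction(5)]:
    b1 = Y_bar - b2 * D_bar                      # line passes through (D_bar, Y_bar)
    u = [y - b1 - b2 * d for y, d in zip(Y, D)]  # residuals of this candidate line
    results.append((b2, sum(u), sum(e * e for e in u)))

for b2, sum_u, ssr in results:
    print(b2, sum_u, ssr)
# Every line has residual sum 0, but the squared sums are 38/3, 8/3, 2 and 38/3.
```

Only the squared criterion distinguishes the slope of 1 from the clearly worse slopes of -3 and 5.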

h) (2 pts.) Briefly explain whether Σi 𝑢̂𝑖 = 0 implies that 𝐸(𝑢𝑖 ) = 0

Σi 𝑢̂𝑖 = 0 is a property of OLS implied by the FOC required to obtain the best fit of the data.

In contrast, 𝐸(𝑢𝑖 ) = 0 is an assumption imposed on the error term, 𝑢𝑖 , that may or may not hold for this
dataset.

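i) (3 pts.) Show that $\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i) = n(\bar{Y} - \hat{\beta}_1 - \hat{\beta}_2 \bar{D}) = n(\bar{\hat{Y}} - \hat{\beta}_1 - \hat{\beta}_2 \bar{D})$ by carefully
illustrating all relevant rules of the summation operator used. Do NOT skip steps.

In part e), we showed that $\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i) = \sum_i Y_i - n\hat{\beta}_1 - \hat{\beta}_2 \sum_i D_i$. Then, multiply each sum by $\frac{n}{n}$ to obtain
the formula for an average: $n\frac{\sum_i Y_i}{n} - n\hat{\beta}_1 - \hat{\beta}_2 n\frac{\sum_i D_i}{n} = n(\bar{Y} - \hat{\beta}_1 - \hat{\beta}_2 \bar{D}) = 0$.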

For each observation, the dependent variable can be decomposed into the predicted value $\hat{Y}_i$ (amount of
$Y_i$ explained by the regression) and the residual $\hat{u}_i$ (amount of $Y_i$ not explained by the regression) by
$Y_i = \hat{Y}_i + \hat{u}_i$, where $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 D_i$.

Summing over observations and dividing by $n$ yields $\frac{\sum_i Y_i}{n} = \frac{\sum_i \hat{Y}_i}{n} + \frac{\sum_i \hat{u}_i}{n}$,
which is equivalent to $\bar{Y} = \bar{\hat{Y}}$ since $\sum_i \hat{u}_i = 0$.

j) (2 pts.) Briefly interpret $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{D}$ by making use of the relationship $\bar{Y} = \bar{\hat{Y}}$.

Given the estimated coefficients, the average share of individuals enrolled in a health insurance plan predicts
the average number of doctor visits.

If we did not establish the relationship 𝑌̅ = 𝑌̅̂ , we wouldn’t have been able to make the claim that this
relationship is due to the explanatory power of the regression.

k) (2 pts.) Calculate $\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} Y_i^2$, $\sum_{i=1}^{n} D_i$, $\sum_{i=1}^{n} D_i^2$, $\sum_{i=1}^{n} D_i Y_i$.

𝑖 𝑌𝑖 𝐷𝑖 𝑌𝑖2 𝐷𝑖2 𝐷𝑖 𝑌𝑖
1 3 0 9 0 0
2 3 1 9 1 3
3 5 1 25 1 5
Σ𝑖 11 2 43 2 8
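
The column totals in the table can be reproduced with a few lines of code. This is only a convenience check on the arithmetic; the variable names are illustrative.

```python
D = [0, 1, 1]
Y = [3, 3, 5]

sum_Y = sum(Y)
sum_Y2 = sum(y * y for y in Y)
sum_D = sum(D)
sum_D2 = sum(d * d for d in D)
sum_DY = sum(d * y for d, y in zip(D, Y))

# Printed in the order requested in part k).
print(sum_Y, sum_Y2, sum_D, sum_D2, sum_DY)  # 11 43 2 2 8
```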

l) (4 pts.) Use your results from parts e) and k) as well as $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2} = -2\sum_i (Y_i - \hat{\beta}_1 - \hat{\beta}_2 D_i) D_i = 0$
to calculate the estimates of $\hat{\beta}_1$ and $\hat{\beta}_2$.

FOCs become: $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_1} = 6\hat{\beta}_1 - 22 + 4\hat{\beta}_2 = 0$ (1) and $\frac{\partial S(\hat{\beta}_1, \hat{\beta}_2)}{\partial \hat{\beta}_2} = 4\hat{\beta}_2 - 16 + 4\hat{\beta}_1 = 0$ (2).

Solve the FOCs as a system of two equations with two unknowns.

Subtract equation (2) from (1) to obtain: $2\hat{\beta}_1 - 6 = 0$. Solve this resulting equation for $\hat{\beta}_1$: $\hat{\beta}_1 = 3$.

Substitute $\hat{\beta}_1 = 3$ into equation (2): $4\hat{\beta}_2 - 16 + 4(3) = 0$. Solve this resulting equation for $\hat{\beta}_2$: $\hat{\beta}_2 = 1$.

To check your work: For a regression with a single binary regressor, $\hat{\beta}_1$ is always equal to the conditional
mean of the dependent variable for the $D_i = 0$ group and $\hat{\beta}_1 + \hat{\beta}_2$ is always equal to the conditional
mean of the dependent variable for the $D_i = 1$ group.
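
The two first-order conditions can also be solved mechanically. The sketch below rewrites them as a pair of linear equations in the sums from part k), solves the system with Cramer's rule, and cross-checks the result against the group means. Cramer's rule is a substitute for the elimination steps used above, and all variable names are illustrative.

```python
from fractions import Fraction

D = [0, 1, 1]
Y = [3, 3, 5]
n = len(Y)
sum_Y, sum_D = sum(Y), sum(D)
sum_D2 = sum(d * d for d in D)
sum_DY = sum(d * y for d, y in zip(D, Y))

# The FOCs divided by -2 give the linear system:
#   n * b1     + sum_D  * b2 = sum_Y      (3*b1 + 2*b2 = 11)
#   sum_D * b1 + sum_D2 * b2 = sum_DY     (2*b1 + 2*b2 = 8)
det = n * sum_D2 - sum_D * sum_D
b1 = Fraction(sum_Y * sum_D2 - sum_D * sum_DY, det)
b2 = Fraction(n * sum_DY - sum_D * sum_Y, det)

# Cross-check with the conditional means of Y in each group.
y0 = [y for y, d in zip(Y, D) if d == 0]
y1 = [y for y, d in zip(Y, D) if d == 1]
mean0 = Fraction(sum(y0), len(y0))
mean1 = Fraction(sum(y1), len(y1))

print(b1, b2, mean0, mean1)  # 3 1 3 4
```

The intercept equals the mean for the uninsured group and the intercept plus the slope equals the mean for the insured group, as the check requires.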

m) (2 pts.) Briefly interpret the estimates of $\hat{\beta}_1$ and $\hat{\beta}_2$.

$\hat{\beta}_1 = 3$ states that individuals without health insurance have on average 3 doctor visits.

$\hat{\beta}_2 = 1$ states that the possession of health insurance increases on average the number of doctor visits
by 1 visit.

n) (2 pts.) Briefly explain how the PRF must be modified to estimate the price semi-elasticity of
demand for health insurance, where the change in the number of doctor visits, 𝑌, is measured
as a percentage.

The PRF is modified to $\ln Y_i = \beta_1 + \beta_2 D_i + u_i$ since $\frac{d \ln Y_i}{d D_i}$ is interpreted as a semi-elasticity and $d \ln Y_i$ is
measured as a percentage change. Nonetheless, you have to be careful with the interpretation of $d D_i$ as
it is not interpreted as a magnitude but as a switch in the status of possessing health insurance.
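
As a hedged illustration of the log-linear specification, the sketch below re-estimates the model on the same three observations with $\ln Y_i$ as the dependent variable. It relies on the property that, with a single binary regressor, the OLS coefficients equal group means. The problem set does not ask for this calculation, and the conversion `exp(b2) - 1` to an exact percentage change is a standard result added here as an assumption, not something stated in the source.

```python
import math

D = [0, 1, 1]
Y = [3, 3, 5]
ln_Y = [math.log(y) for y in Y]

# Group means of ln(Y): with a binary regressor these pin down the OLS estimates.
ln0 = [v for v, d in zip(ln_Y, D) if d == 0]
ln1 = [v for v, d in zip(ln_Y, D) if d == 1]
b1_log = sum(ln0) / len(ln0)              # intercept: mean of ln(Y) when D = 0
b2_log = sum(ln1) / len(ln1) - b1_log     # slope: difference in mean ln(Y)

approx_pct = 100 * b2_log                    # usual semi-elasticity reading
exact_pct = 100 * (math.exp(b2_log) - 1)     # exact change for a 0-to-1 switch
print(round(b2_log, 3), round(approx_pct, 1), round(exact_pct, 1))
```

Because the regressor switches from 0 to 1 rather than changing marginally, the approximate and exact percentage readings differ noticeably, which is the caution raised in the answer.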
o) (2 pts.) Briefly explain why it is not possible to estimate the price elasticity of demand for health
insurance for this dataset.

A percentage change is impossible to measure for a qualitative variable such as $D_i$ as it measures a
switch in the status of possessing health insurance.

p) (4 pts.) Suppose that the true values of the population parameters are $\beta_1 = 3$ and $\beta_2 = 2$. For
each data point $(Y_i, D_i)$, calculate the deterministic component $E(Y_i|D_i)$, the error term $u_i$, the
predicted value $\hat{Y}_i$ and the residual $\hat{u}_i$.

$i$   $D_i$   $Y_i$   $\hat{Y}_i$   $\hat{u}_i$   $E(Y_i|D_i)$   $u_i$
1     0       3       3             0             3              0
2     1       3       4             -1            5              -2
3     1       5       4             1             5              0

The process of arriving at these answers is illustrated below for observation 2:

𝐸(𝑌2 |𝐷2 ) = 𝛽1 + 𝛽2 𝐷2 = 3 + 2(1) = 5


𝑢2 = 𝑌2 − 𝐸(𝑌2 |𝐷2) = 3 − 5 = −2
$\hat{Y}_2 = \hat{\beta}_1 + \hat{\beta}_2 D_2 = 3 + 1(1) = 4$

$\hat{u}_2 = Y_2 - \hat{Y}_2 = 3 - 4 = -1$
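
The same four calculations can be repeated for every observation. The sketch below uses the true parameters given in part p) and the estimates from part l); the tuple layout and variable names are illustrative choices.

```python
D = [0, 1, 1]
Y = [3, 3, 5]
beta1, beta2 = 3, 2            # true population parameters given in part p)
beta1_hat, beta2_hat = 3, 1    # OLS estimates obtained in part l)

rows = []
for i, (d, y) in enumerate(zip(D, Y), start=1):
    y_hat = beta1_hat + beta2_hat * d   # predicted value from the SRF
    u_hat = y - y_hat                   # residual
    cond_mean = beta1 + beta2 * d       # deterministic component E(Y|D) from the PRF
    u = y - cond_mean                   # error term
    rows.append((i, d, y, y_hat, u_hat, cond_mean, u))

# Columns: i, D, Y, Y_hat, u_hat, E(Y|D), u
for row in rows:
    print(row)
```

For observation 2 this gives a deterministic component of 5 and an error term of -2, matching the worked steps.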

q) (2 pts.) Based on your answers in part p), briefly explain if you find evidence of relevant
variables being omitted from the PRF.

The strict exogeneity assumption is violated since $E(u_i|D_i) = \frac{0+(-2)+0}{3} = -\frac{2}{3} \neq 0$. This implies that
relevant variable(s) are omitted from the PRF.

r) (4 pts.) In a single graph:


- Draw the sample regression function, the population regression function and all data points,
i.e. (𝐷1 , 𝑌1 ), (𝐷2 , 𝑌2 ) and (𝐷3 , 𝑌3 ).
- For each data point $(D_i, Y_i)$, illustrate the decomposition of $Y_i$ into its predicted value $\hat{Y}_i$ and
its residual $\hat{u}_i$.
s) (4 pts.) In a single graph:
- Draw the sample regression function, the population regression function and all data points,
i.e. (𝐷1 , 𝑌1 ), (𝐷2 , 𝑌2 ) and (𝐷3 , 𝑌3 ).
- For each data point (𝐷𝑖 , 𝑌𝑖 ), illustrate the decomposition of 𝑌𝑖 into its deterministic
component 𝐸(𝑌𝑖 |𝐷𝑖 ) and its error term 𝑢𝑖 ;

t) (4 pts.) Suppose that the sample $var(\hat{\beta}_2) = 6$. Test the hypothesis that $D_i$ has a positive impact
on $Y_i$. In your answer, you must specify the appropriate hypotheses and statistical test,
demonstrate your calculations and provide a conclusion.

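$H_0: \beta_2 \leq 0$
$H_A: \beta_2 > 0$

$$t = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2 - \beta_2)} = \frac{\hat{\beta}_2^*}{se(\hat{\beta}_2)} = \frac{\hat{\beta}_2}{\sqrt{\frac{var(\hat{\beta}_2)}{N-K}}} = \frac{1 - 0}{\sqrt{\frac{6}{3-2}}} = \frac{1}{2.45} = 0.41$$

$t = 0.41 < 6.314 = t_{\alpha=0.05,\, N-K=1}$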

Note: If no level of significance is specified in the question, the default level of significance is 𝛼 = 0.05.
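
The test arithmetic can be reproduced as follows. The sketch follows the standard-error formula exactly as written in the answer above, and it hard-codes the critical value 6.314 quoted there rather than computing it from a statistics library; the variable names are illustrative.

```python
import math

beta2_hat = 1.0        # estimate from part l)
beta2_null = 0.0       # boundary value under H0
var_beta2_hat = 6.0    # sample variance given in part t)
N, K = 3, 2            # observations and estimated coefficients

se = math.sqrt(var_beta2_hat / (N - K))   # formula as used in the answer key
t_stat = (beta2_hat - beta2_null) / se
t_crit = 6.314                            # one-sided 5 % critical value, 1 df (from the answer key)
reject_h0 = t_stat > t_crit

print(round(se, 2), round(t_stat, 2), reject_h0)  # 2.45 0.41 False
```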

u) (2 pts.) Interpret your result in part t) without using any mathematical notation.

There is insufficient evidence to claim that health insurance coverage increases the number of doctor
visits. Although the coefficient has the expected sign, the probability of wrongly concluding that there is a
positive effect is larger than the maximum acceptable probability of such an error, i.e. 5 %.
v) (2 pts.) Briefly explain the economic significance of your result in part t).

Having health insurance does not have a discernible impact on health care utilization. For this reason,
one might be tempted to recommend that health insurance programs not be adopted.

w) (2 pts.) Briefly explain how the result in part q) casts doubt on your answer in part v).

The violation of the strict exogeneity assumption suggests that relevant variables could be omitted
from the PRF. Their omission could lead to an incorrect magnitude and/or an incorrect sign of the
estimated slope coefficient.

x) (2 pts.) Briefly explain if your results in part t) are qualitatively similar to those obtained in the
Oregon experiment with respect to the direction of the effect and its statistical significance.

The results are qualitatively similar to those in the Oregon experiment. The estimated slope coefficient
also indicates that having health insurance leads to greater health care utilization (e.g. number of
doctor visits, likelihood of at least one doctor visit). However, the results are not statistically significant at
either the 1 % level or the 5 % level.

y) (2 pts.) Briefly explain if your results in part t) are qualitatively similar to those obtained in the
RAND experiment with respect to the direction of the effect and its statistical significance.

The results are qualitatively similar to those for outpatient care in the RAND experiment with respect to
the direction of the effect but not with respect to statistical significance.

The estimated slope coefficient also indicates that having health insurance leads to greater health care
utilization (e.g. number of doctor visits, likelihood of at least one doctor visit). The results are statistically
significant at the 5 % level if the treatment group is holders of health insurance with a 25 % co-payment
rate and at the 1 % level if the treatment group is holders of health insurance with a 95 % co-payment
rate.

The largest impact is estimated for the treatment group with a free plan (0 % co-payment rate) but
surprisingly there is too much variability to detect a statistically significant effect.
