EXPENDITURES
Author(s): Chan Shen
Source: The Review of Economics and Statistics, March 2013, Vol. 95, No. 1, pp. 142-153
Published by: The MIT Press
Abstract. This paper studies three interrelated health care decisions: insurance, utilization, and expenditures. The model treats insurance as an endogenous variable with respect to both utilization and expenditures, addresses potential selection issues, and takes into account that the decisions to use health care and the level of treatment are determined by different decision makers. We employ semiparametric methods to avoid making distributional assumptions. Using the Medical Expenditure Panel Survey 2005 data, the semiparametric approach predicts insurance to increase the level of expenditures by 48%, a number in accord with an important experimental study and less than half that obtained using parametric methods.

I. Introduction

A major health care policy issue in the United States today is the growing population without insurance. The key questions are: How does health insurance coverage affect the likelihood an individual seeks medical care? and How does health insurance affect health care expenditures?

There are many empirical challenges in studying people's health care decisions. An individual's decision about whether to use health care may depend on his or her insurance coverage. The level of use likely also depends on whether the individual has insurance. However, because insurance is a choice variable for the individual, we must allow the possibility that this variable is endogenous. For example, people who have a greater need for health care have more incentive to buy health insurance. Some papers deal with this endogeneity by using instrumental variables (Vera-Hernandez, 1999; Holly, Gardiol, & Huguenin, 2002; Wooldridge, 2002); others use experimental data to avoid this problem (Manning et al., 1987; Newhouse & Insurance Experiment Group, 1993). However, instruments that are correlated with insurance coverage but not with use are difficult to find. Experimental data are scarce and often out of date. For example, the RAND Health Insurance Experiment, which remains the largest health policy study in U.S. history, started in 1971 and lasted for 15 years (RAND, 1974-1982). The structure, practice, and philosophy of medicine have changed dramatically since the 1980s, as has the insurance industry.

Another empirical challenge lies in expenditure decisions, where we observe positive expenditures only from individuals who decide to see a doctor. One standard parametric approach deals with this problem by making distributional assumptions about error terms and then using a Heckman correction for sample selection (Heckman, 1976, 1979). An alternative approach is to use a two-part model (Duan et al., 1983, 1984, 1985). Both of these may be problematic: the Heckman correction approach can be sensitive to the distributional assumptions on error terms, while the two-part model approach also makes implicit distributional assumptions (Puhani, 2000). The literature addressing health economics and economics in general does not provide a theoretical foundation or justification for these distributional assumptions. Moreover, if incorrect, they can result in incorrect inferences and policy conclusions with respect to health care decisions.

Yet another challenge is the complicated nature of the decision-making process. In health care, both the patient and the doctor are involved in making decisions. The patient decides whether to visit a doctor (or, more generally, a health care provider), and then the patient and doctor jointly decide what treatment the patient will have. These decisions are interrelated. Some papers deal with the two-part decision-making process in health care utilization (Newhouse, 1993; Mullahy, 1998), but none addresses the whole process of insurance choice, utilization, and expenditure level.

This paper contributes to the literature by taking into account the interrelated nature of health care decisions and using a semiparametric approach to address the empirical challenges. We study three health care decisions: insurance coverage, utilization, and the level of expenditures. Using the Medical Expenditure Panel Survey (MEPS) 2005 data, we formulate and estimate a model for these three health care decisions. Because there is not a strong justification for normality assumptions underlying a traditional parametric formulation, we employ a semiparametric approach in which these assumptions are not made. As an additional advantage to a semiparametric approach, since marginal effects in general will not be constant in nonlinear models, we will report the impact of changing a variable of interest at several different points in its distribution. The semiparametric approach will also allow greater flexibility in the pattern of these effects than in the parametric case. Nevertheless, as a convenient benchmark, we also estimate the model using a standard parametric approach.

While the focus of this paper is on health care decisions, the methods used would also apply to other endogenous treatment models. For example, in labor economics, a woman's fertility and marriage decisions, the decision to join the workforce, and wage level have a similar structure.

The paper is organized as follows. Section II introduces the model and explains the parametric and semiparametric approaches; section III describes the data set; section IV gives the main results; and section V provides conclusions, discussions, and future research directions.

Received for publication April 2, 2009. Revision accepted for publication May 2, 2011.
* Georgetown University.
I thank Roger Klein, Carolyn Moehling, John Landon-Lane, Francis Vella, the editor and the referees for all their helpful comments and suggestions. I also thank Louise Russell and Usha Sambamoorthi for helpful discussions. I have also benefited from comments at various seminars. All mistakes are mine.
A supplemental appendix is available online at http://www.mitpressjournals.org/doi/suppl/10.1162/REST_a_00232.
II. The Model

We study a set of three equations to examine the effects of different factors on health care decisions: health insurance, utilization, and expenditures. The first equation deals with the health insurance choice. Let $I$ be an indicator of whether an individual selects private health insurance coverage. In the model, an individual selects insurance if the net value to so doing, $V_I - \varepsilon_I$, is greater than 0. With $V_I$ determined by a set of exogenous variables $X_I$ and $1\{\cdot\}$ as an indicator function, the model is as follows:

$$I = 1\{V_I > \varepsilon_I\}, \quad \text{where } V_I = X_I\beta_I.$$

The second equation describes the decision to seek health care. Let $A$ be an indicator of whether an individual seeks access to health care from a doctor or other health care providers, and let $X_A$ be a set of exogenous variables that determine the net value of utilizing health care. Then:

$$A = 1\{V_A + I\theta_A > \varepsilon_A\}, \quad \text{where } V_A = X_A\beta_A.$$

Notice that the insurance coverage enters this utilization (access) equation. There is a vast literature about the effects of moral hazard and adverse selection (Arrow, 1963; Rothschild & Stiglitz, 1976; Chiappori & Salanie, 2001; Cardon & Hendel, 2001). On the one hand, people who have insurance are much more likely to use health care than their uninsured counterparts. On the other hand, people who have greater demand for health care (e.g., those with high comorbidity levels) may have more incentive to obtain insurance coverage. Consequently in our estimations, we will use methods that deal with this endogeneity issue.

The last equation explains the level of expenditures. Denote $Y_E$ as the log of level of expenditures and $X_E$ as a set of exogenous variables that affects expenditures for individuals who access health care services. Then the model is given as

$$Y_E = X_E\beta_E + I\theta_E + u : A = 1.$$

An individual incurs positive expenditures only if a visit is made. The patient decides whether to visit a doctor, and then a joint decision is made by both the doctor and the patient. We address this two-part decision-making process by separating the two equations and allowing them to have different explanatory variables and parameters. Again, insurance, health care, and the individual's health status are interrelated. Insurance coverage is included in this model because it may affect the patient's and doctor's joint decision about treatment plans. For example, insured people are much more likely to buy brand-name medications instead of their generic counterparts. There could also be an adverse selection problem here, because people who are less healthy might have more incentive to purchase insurance. Hence, our model will account for the interrelations between these variables and will employ estimation methods that deal with both sample selection and endogeneity issues.

To avoid making strong distributional assumptions that are hard to justify, in this paper we employ a semiparametric method to estimate the three health care equations discussed above. Indeed, we will find that standard parametric distributional assumptions (e.g., joint normality) do not hold. Nevertheless, as a convenient benchmark, we also provide the parametric formulation and results. There are many methods for estimating the parametric model. To make the role of the parametric assumptions transparent, we estimate the parametric model in a manner that parallels the semiparametric approach.

A. Parametric Model

In the parametric model, we assume that the error terms in the system of three equations follow a trivariate normal distribution. A two-step estimation method is then employed to estimate the three equations. In the first step, the insurance and utilization decisions are jointly estimated by maximum likelihood (bivariate probit).

To identify the parameters without relying on nonlinearities, we require restrictions on the model. The insurance equation will depend on only exogenous variables, $X_I$, while the access decision will depend on exogenous variables, $X_A$, and whether the individual has insurance. In this triangular system of binary equations, the insurance equation is identified, as it is essentially a reduced form. However, to identify the access equation, we impose exclusion restrictions on it (we discuss these in section III).

In addition to the parameters in the joint model for the two decisions, the likelihood depends on the correlation between the errors. A nonzero correlation between the two error terms would indicate the endogeneity of insurance with respect to the utilization decision. As will be described below, we find this correlation to be small in absolute magnitude and not statistically different from 0.

In the second step, we estimate the expenditure equation by employing a Heckman correction (Heckman, 1976; Lee, 1982) that controls for both sample selection and endogeneity. To simplify this correction, we employ a form for it that is applicable when, as was found empirically, utilization and insurance errors are not correlated.1 For individuals who use health care, recall the form of the expenditure model in the previous section. With $u$ as the error term in the log expenditure model and denoting $X_S$ as the set of all the exogenous variables in the system of three equations, for $d \in \{1,0\}$, define

$$\lambda_d G_d(V_A, V_I) = E(u \mid X_S, A = 1, I = d).$$

In a parametric model with jointly normal errors, the $G$ functions above are known and the $\lambda$s are parameters whose values are unknown. Typically the above expectations are not 0, so we rewrite the expenditure equation as

$$Y_E = X_E\beta_E + I\theta_E + \lambda_d G_d + u_d^* : A = 1,\ I = d, \qquad u_d^* = u - \lambda_d G_d(V_A, V_I).$$

By construction, the conditional expectation of the recentered error is 0: $E(u_d^* \mid X_S, A = 1, I = d) = 0$.

Provided that the above equation is identified and joint normality holds, OLS estimation provides consistent estimates. To identify it without relying on nonlinearities in the $G$ controls, we impose exclusion restrictions on the exogenous variables $X_E$ that enter this equation. (Detailed discussions about these and other restrictions are provided in section III.)

We conclude this discussion about the parametric model by emphasizing the importance of its restrictive parametric assumptions. Both the bivariate probit specification and the form of the correction term depend on the (joint) normality assumption. If this assumption is incorrectly imposed, the resulting estimator is typically inconsistent. In the next section, we propose a semiparametric approach that does not make distributional assumptions.

B. Semiparametric Model

While the semiparametric model generalizes the parametric model, it does retain a parametric (index) restriction to ensure that the estimator "works well" in moderately sized samples. To illustrate this restriction, return to the insurance model. In a commonly employed probit specification,

$$P(I = 1 \mid X) = \Phi(X_I\beta_I),$$

where the function $\Phi$ is the cumulative distribution function for the model's standard normal error component, $\varepsilon_I$. In a semiparametric formulation, this function need not be specified and indeed can be estimated from the data along with parameters of interest. In such a formulation, the model is semiparametric because it makes no parametric assumptions on the error distribution but does assume a parametric index, $V_I = X_I\beta_I$. This index, $V_I$, need not be linear, but it is important that it has a parametric form. In a more general nonparametric formulation, we might write

$$P(I = 1 \mid X) = F(X_1, X_2, \ldots, X_k) = E(I \mid X).$$

However, when the dimension of $X$ is large, it is difficult to "reliably" estimate the above probability (expectation).2 Index restrictions serve to keep the relevant dimension of the problem small and thereby improve the finite sample behavior

In this form, not only is the function $F_1$ left unspecified but the model also permits very flexible interactions between errors and the index.

In some problems, a single index may not adequately describe the underlying behavior of interest. Given that the access model is not linear, when insurance is endogenous with respect to access, the access probability depends not only on its own index but also on the exogenous index driving the insurance decision. In this case, a double index model would be appropriate:

$$E(Y \mid X) = E(Y \mid V_I, V_A) = F_2(V_I, V_A),$$

where $V_I$, $V_A$ are now two indices. Again, there are methods for reliably estimating the above expectation under this double index structure. As discussed below, estimators for both single and double index models will be employed here.

Throughout, we use the notation $\hat{E}(Y \mid V)$ to denote an estimated conditional expectation for $Y$ conditioned on $V$, where $V$ may be a single index or a vector containing two indices. When this estimated expectation is evaluated at an estimate of $V$, as we do below, we will write $\hat{E}(Y \mid \hat{V})$.

Before continuing, it is important to discuss identification of both index parameters and marginal effects of interest. Recall that in the parametric case, the original parameters are identified under exclusion restrictions. However, as in all nonlinear models, parameters do not translate directly into marginal effects, which are of primary interest. Marginal effects are recovered by comparing estimated probabilities based on parametric distributional assumptions. In the semiparametric case, however, it is well known in the literature that the index parameters can at most be identified up to location and scale. For simplicity, we illustrate the issue for the insurance decision. As will be discussed below, the estimates are based in part on an estimate of the probability:

$$\Pr(I = 1 \mid X_I\beta_I) = \Pr(I = 1 \mid a + b(X_I\beta_I)),$$

where $a$ and $b \neq 0$ are constants.

The probability does not depend on $a$ or $b$. Therefore, only ratios of index parameters are identified. Nevertheless, the scaled parameters enable us to recover probabilities and, hence, marginal effects of interest.

Unlike the insurance decision, the utilization decision depends on the endogenous insurance decision with coefficient $\theta_A$. Although this parameter is not identified, we can recover the corresponding marginal effect by looking at an appropriate probability change. One possibility is to report the difference in access probabilities conditioned on insurance and no insurance, which are estimable semiparametrically, as we discuss below. It is easy to justify this calculation if the insurance error does not depend on the

1 As discussed in the next section, in a semiparametric formulation, we will not need to make any assumptions on the functional form of this correction factor.
2 If $X$ is continuous, then the convergence rate of the estimated expectation to the truth becomes slower as the dimension of $X$ increases. If $X$ is discrete, there may be few observations to estimate $E(Y \mid X)$ at each value of $X$.
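The location and scale point in section II.B can be checked numerically. The sketch below is an illustration only and is not part of the paper: the coefficient values, the variable names, and the helper `rescale_index` are hypothetical. It applies an affine map $a + bV$ to a linear index (taking $b > 0$ so that orderings can be compared directly) and confirms that coefficient ratios and the ranking of observations by index are unchanged, which is why an unknown function of the index cannot tell the two parameterizations apart.

```python
def rescale_index(coefs, a, b):
    """Coefficients and intercept of the transformed index a + b * (x . coefs)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    return {name: b * value for name, value in coefs.items()}, a


def index_value(coefs, x, intercept=0.0):
    """Evaluate a linear index for one observation x (a dict of covariates)."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


# Hypothetical index parameters and observations, for illustration only.
beta = {"education": 0.10, "income": 0.40, "age": 0.02}
people = [
    {"education": 12, "income": 3.0, "age": 30},
    {"education": 16, "income": 4.0, "age": 45},
    {"education": 10, "income": 2.5, "age": 60},
]

scaled, shift = rescale_index(beta, a=1.5, b=3.0)

# Ratios of index parameters survive the transformation.
ratio_before = beta["income"] / beta["education"]
ratio_after = scaled["income"] / scaled["education"]

# So does the ordering of observations by their index value.
order_before = sorted(range(len(people)), key=lambda i: index_value(beta, people[i]))
order_after = sorted(range(len(people)), key=lambda i: index_value(scaled, people[i], shift))

print(ratio_before, ratio_after, order_before, order_after)
```

Because only ratios are pinned down, any one coefficient can be fixed to a convenient value for presentation without changing the implied probabilities.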
$$Y_{rs}(i) = 1\{A(i) = r,\ I(i) = s\},$$

with the corresponding probabilities:

$$P_{rs}(i) = \Pr(Y_{rs}(i) = 1 \mid V_A(i), V_I(i)).$$

where $d \in \{1,0\}$. Notice that this adjustment is similar to that in the parametric case, but now we do not make any assumptions on its functional form here in the semiparametric formulation. With $c$ as a constant, let $X_E\beta_E = X_c\beta_c + c$, and rewrite the expenditure equation as

Since the control functions are unknown, we extend Robinson's differencing method (Robinson, 1988) to eliminate the unknown control functions:

$$Y_E - E(Y_E \mid A = 1, I = d, V_0) = [X_c - E(X_c \mid A = 1, I = d, V_0)]\beta_c + u^*.$$

With $*$ denoting a differenced variable, we can rewrite the above equation as

$$Y^* = X^*\beta_c + u^*.$$

Before proceeding to estimate the above differenced model, several identification issues need to be discussed. First, it is clear that the constant term and the insurance variable disappear from the model. Second, as in the parametric model, we require additional identifying restrictions. To this end, we impose the same exclusion restrictions as in the parametric model discussed above. To see that these restrictions are needed, suppose that there are no variables excluded from $X_c$ that appear in the indices $V_I$ and $V_A$. Without such restrictions, it will be possible to take linear combinations of the $X_c$ variables and reproduce one of the indices.

Replacing true expectations and index parameter values with their estimates, we first use OLS to estimate the expenditure equation and get consistent estimates and residuals. Second, employing squared residuals, in a semiparametric regression, we estimate the variance for the error conditioned on the $X$ variables through the two indices. We then employ these conditional variances in a GLS approach to obtain the final results. Notice that the GLS estimator deals with the heteroskedasticity but not the first-stage estimation uncertainty. This uncertainty comes from the fact that estimated expectations are employed in place of true expectations and estimated index parameters are substituted for the true ones. It can be shown that the estimated expectations may be taken as known and do not affect standard errors for the estimated expenditure parameters, while the uncertainty from estimated index parameters must be taken into account. In particular, as in standard parametric sample selection models, the covariance matrix for these second-stage expenditure estimates will depend on the covariance matrix for the first-stage, joint-binary estimates. The reported standard errors here appropriately reflect this dependence (see the appendix).

Notice that in the above approach we cannot directly estimate the impact of insurance coverage on expenditures ($\theta_E$). Therefore, we next describe a strategy for indirectly obtaining this marginal effect. Having described an estimator for the coefficient on $X_c$ above, define residual expenditures:

$$R = Y_E - X_c\beta_c = c + \theta_E I + u.$$

Heckman (1990) developed a method for estimating the constant term in a semiparametric sample selection model, which could be applied if we did not have the endogenous insurance variable. Andrews and Schafgans (1998; hereafter A&S) subsequently established the large-sample properties of a variant of this estimator. We extend this method to estimate both the constant term and the marginal effect of the endogenous insurance.

To set the intuition for the proposed estimator, notice that if there were no selection issues, we could proceed to develop an IV estimator for these parameters. To deal with selection, with $P_A = \Pr(A = 1 \mid V_0)$, for the error in the expenditure equation,

$$E(u \mid V_0) = 0 = P_A E(u \mid A = 1, V_0) + (1 - P_A) E(u \mid A = 0, V_0).$$

For individuals with an access probability of 1 ($P_A = 1$), there would not be a selection problem in that from above,

$$E(u \mid A = 1, V_0) = E(u \mid V_0) = 0,$$

and we could proceed with IV estimation, employing

$$E(I \mid A = 1, V_0) = \Pr(I = 1 \mid A = 1, V_0)$$

as an instrument for $I$. Two implementation issues now need to be solved. First, because the above probability is unknown, we require a semiparametric estimate of this function as a feasible instrument. Second, we need an appropriate definition of a high-access probability.

With $a > 0$, define a high-probability set as one for which $P_A > 1 - N^{-a}$. In implementing this rule, we use estimated semiparametric probabilities described in the appendix.6 In setting $a$, as in A&S, there is a bias-variance trade-off that guides its selection. If $a$ is set very high, then the bias will be very low. However, the sample size available for IV estimation on the high-probability set will then effectively be very small, resulting in a high variance. Similarly, if $a$ is set too low, the variance can be made small, but the bias will not vanish sufficiently fast. To set $a$, let $S$ be a smoothed indicator of the form in A&S that is 0 unless observations are in the high-probability set. Then that paper shows that the bias will vanish appropriately fast if

$$B = N^{1/2} \left| E[uAS] \big/ \sqrt{E(AS^2)} \right| \to 0.$$

The value of $a$ must be set large enough so that this bias factor converges to 0 but small enough to keep the variance of the estimator low. Letting $Z^*$ be the instrument with its mean removed, Klein, Shen, and Vella (2012) show that the following similar bound holds:

$$B = N^{1/2} \left| E[uASZ^*] \big/ \sqrt{E(AS^2Z^{*2})} \right| \to 0.$$

The choice of $a$ is dictated by the same considerations as in A&S. To set $a$ in this application and in the Monte Carlo experiment described below, we employ an upper bound for $B$ that can be estimated.7 To balance bias and variance, we then select the smallest value of $a$ such that this bound tends to 0. ($a$ is approximately .4 in this application.)

To get some sense as to how well the method described above works in practice, we conduct a small-scale Monte Carlo experiment where we find that this method performs very well. We generate data from the following design, which has the same structure as our model:

$$I = 1\{V_I > \varepsilon_I\} : V_I = X_1 + X_2 + X_3 + 1,$$
$$A = 1\{V_A + 2I > \varepsilon_A\} : V_A = X_1 + X_2,$$
$$Y_E = 4I + 2X_1 + 1 + u : A = 1,$$

where the $X$s are all distributed as normal and the errors are jointly normal with nonzero correlations between them. The sample size we use is $n = 2000$, and the number of Monte Carlo replications is 1,000. As we can see, the true $\theta_E = 4$. The average $\hat{\theta}_E$ from the Monte Carlo is 4.02, and the standard deviation is 0.14. In other words, the percentage bias is almost 0, and the variance is also small, taking into account that the truth is 4.

III. Data

The Medical Expenditure Panel Survey (MEPS) is an ongoing nationally representative survey of the U.S. civilian noninstitutionalized population started in 1996 by the U.S. Department of Health and Human Services. Surveys of households, employers, and medical providers are conducted to collect information on health care expenditures and health insurance coverage, as well as demographic and socioeconomic characteristics.8

We consider the subsample of obese adults between the ages of 22 and 64 who are employed. People who have a body mass index (BMI) greater than 30 are considered obese (Centers for Disease Control and Prevention, 1985-2007). We focus on the obese population because this is a growing population that might have different health care needs and patterns than other groups do. We also focus on individuals who are employed, because in the United States, insurance is often linked with employment. In fact, health insurance plans are often offered by employers. We exclude individuals who have public insurance, because having public insurance is not expected to be a consumer's choice for working adults between the ages of 22 and 64. The final sample consists of 2,771 individuals.9

The key endogenous variables that we seek to explain are insurance coverage, utilization of the health care system, and the level of expenditures. The insurance variable here is an indicator of whether the individual has private health insurance coverage. It would be important to take the generosity of insurance plans into account in terms of copayments and deductibles. However, such information is not available in the MEPS data used here. The expenditures are the total amount paid for health care services, including both out-of-pocket payments and payments by insurance but not including payments for over-the-counter drugs. Note that the expenditures are derived from the MEPS Household and Medical Provider Components. Since both the health care providers and the consumers are surveyed, these data are more reliable than typical surveys. We define utilization of the health care system as having positive health care expenditures.10

The explanatory variables are demographics, socioeconomic status, and health-related characteristics. The demographics are age, gender, race/ethnicity (white, nonwhite), marital status (married, other), family size, and region (Northeast, Midwest, South, West). Years of education, income, occupation class, and industry insurance rates are included as socioeconomic characteristics. We use an indicator for white-collar jobs (professional, management, business, and financial operations) to reflect the impact of occupation and the percentage of people having insurance in each industry in the Kaiser study as a variable to reflect the impact of industry (Kaiser Family Foundation, 2006). The health-related characteristics are number of comorbidities, presence of mental illnesses, and whether they are current smokers. All individuals were asked whether they had any of a number of conditions. The comorbidity variable then counts the following health problems: Alzheimer's disease, asthma, arthritis, cancer, emphysema, diabetes, heart disease, high blood pressure, osteoarthritis, and stroke. This variable is included to capture differences in people's physical health status and is often employed in health studies (Klabunde et al., 2000). Presence of mental illnesses is an indicator of whether an individual has depression, anxiety, or schizophrenia.

Recalling the exclusion restrictions discussed in the previous section, we use the following restrictions in this paper. The industry insurance rate and occupation are excluded from both utilization and expenditure equations, and marital status and region are excluded from the expenditure equation. As is known in the literature, occupation and industry have important effects on people's insurance (Kaiser Family Foundation, 2006). In the United States, insurance plans for working adults often come as part of the compensation package. Different jobs might offer varied choices of insurance packages at different prices. Hence it affects the insurance decision by affecting the cost of buying insurance. However,

6 Tail assumptions similar to A&S enable us to keep index density denominators from being too small while remaining in a high probability set. The appendix develops an appropriate trimming strategy.
7 Assume that the error, $u$, has finite $r$ absolute moments. Then, Klein et al. (2012) show that an upper bound on the bias is given by
$$B = N^{1/2} \left| N^{-a(r-1)/r} E[ASZ^*] \big/ \sqrt{E(AS^2Z^{*2})} \right|.$$
In our application, we replace the expectations with sample averages.
8 We note that the semiparametric model can be less sensitive to reporting errors than parametric models (Hausman, Abrevaya, & Scott-Morton, 1998).
9 Other exclusion criteria include individuals who died during the year and missing values on the exogenous variables used. Various robustness checks indicate that there are no selection issues in this sample.
10 We use this indicator instead of the self-reported health care utilization because the self-reported utilization may suffer from recall errors, whereas the expenditure data were collected by both sides and hence are more reliable.
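The Monte Carlo design described in section II can be sketched in a few lines. This is a minimal illustration, not the author's code: the text only says that the $X$s are normal and that the errors are jointly normal with nonzero correlations, so the standard normal covariates, the common pairwise error correlation `rho = 0.5`, and the function name `simulate_design` are all assumptions made here. The sketch draws one sample of size 2,000 and leaves log expenditures unobserved whenever $A = 0$.

```python
import math
import random


def simulate_design(n=2000, rho=0.5, seed=0):
    """Draw one sample from the three-equation design.

    rho is an ASSUMED common pairwise correlation among the three errors;
    the paper does not report the values it used.
    """
    rng = random.Random(seed)
    load, own = math.sqrt(rho), math.sqrt(1.0 - rho)
    sample = []
    for _ in range(n):
        # Assumed: independent standard normal covariates.
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        # One-factor construction: each error has unit variance and every
        # pair of errors has correlation rho.
        common = rng.gauss(0.0, 1.0)
        eps_i, eps_a, u = (load * common + own * rng.gauss(0.0, 1.0) for _ in range(3))
        insured = int(x1 + x2 + x3 + 1.0 > eps_i)      # I = 1{V_I > eps_I}
        access = int(x1 + x2 + 2.0 * insured > eps_a)  # A = 1{V_A + 2I > eps_A}
        # Y_E = 4I + 2X_1 + 1 + u is observed only when A = 1.
        log_exp = 4.0 * insured + 2.0 * x1 + 1.0 + u if access else None
        sample.append({"x1": x1, "x2": x2, "x3": x3,
                       "I": insured, "A": access, "Y_E": log_exp})
    return sample


sample = simulate_design()
share_insured = sum(row["I"] for row in sample) / len(sample)
share_access = sum(row["A"] for row in sample) / len(sample)
print(f"insured: {share_insured:.2f}, access: {share_access:.2f}")
```

A full replication would apply the estimator to each of 1,000 such samples and average the resulting estimates of $\theta_E$; that estimation step is not sketched here.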
and patient jointly decide on the level of treatment, with the doctor being the main decision maker. Once a patient decides to visit a health care provider, we assume that the prescribed treatment does not depend on marriage or region. Hence, the level of expenditures may not depend on these variables. We recognize the difficulty in finding appropriate restrictions for the type of model that we estimate but view the exclusion restrictions discussed above as being plausible.

Some summary statistics of the data are provided in table 1. Note that the continuous variables are categorized into groups to show the distribution of those variables. However, they remain continuous in estimating the model. Of the 2,771 individuals in our data set, 488 (18%) are uninsured and 262 (10%) have no utilization. The level of expenditures for those who use health care is very skewed. About 40% of them have expenditures of less than $1,000, while 8% of them incur more than $10,000 in health care expenditures.

Table 1 (rows recovered)

Variable                          N        %
Expenditures
  $1,000-$2,000                   443     16.0
  $2,000-$5,000                   607     21.9
  $5,000-$10,000                  265      9.6
  Over $10,000                    204      7.4
Education
  Less than high school           471     17.0
  High school                     946     34.1
  College or higher             1,354     48.9
Age
  Below 40                      1,022     36.9
  40-49                           856     30.9
  50 and over                     893     32.2
Income
  Less than $20,000               781     28.2
  $20,000-$30,000                 569     20.5
  $30,000-$50,000                 794     28.7
  Over $50,000                    627     22.6
Gender
  Female                        1,460     52.7
  Male                          1,311     47.3
Race
  White                         1,551     56.0
  Nonwhite                      1,220     44.0
Number of comorbidities
  0                             1,352     48.8
  1                               881     31.8
  2 or more                       538     19.4
Mental illness
  Yes                             540     19.5
  No                            2,231     80.5
Current smoker
  Yes                             542     19.6
  No                            2,229     80.4
Marital status
  Married                       1,714     61.9
  Other                         1,057     38.1
Family size
  1-2                           1,219     44.0
  3-4                           1,069     38.6
  5 or more                       483     17.4
Region
  Northeast                       381     13.7
  Midwest                         611     22.0
  South                         1,206     43.5
  West                            573     20.7
Industry insurance rate
  Less than 75% insured           519     18.7
  75%-90% insured               1,326     47.9
  Over 90% insured                926     33.4
Occupation
  White collar                    830     30.0
  Other                         1,941     70.0

IV. Results

Before we discuss the results, we recall that parameters are identified only up to location and scale in the semiparametric case. After estimation, we normalize the parameter of education to the corresponding parametric estimate for presentation purposes.12 We examine both parametric and semiparametric results for the three decisions. We compare not only the normalized estimates and average marginal effects but also patterns of marginal effects calculated at different levels of certain continuous variables of interest. Most of the normalized estimates and average marginal effects are close between the two approaches for insurance and utilization decisions. However, the two estimation methods yield very different estimated effects of insurance on expenditures. Furthermore, the semiparametric approach gives richer patterns of marginal effects. Detailed results are provided in tables 2 to 5.

As shown in table 2, for the insurance decision, both the normalized estimates and the average marginal effects are similar for parametric and semiparametric approaches. The biggest marginal effect on the probability of having insurance comes from marital status, with the p-values of the coefficient on married in both approaches being less than 0.01: marriage
Table columns (given twice per row): Estimate (SE); p-value; ME (percentage points).

Notes (tables reporting ME in percentage points): Estimate = parameter estimate; SE = standard error; ME (percentage points) = average marginal effect in percentage points. Expenditure and income are in $1,000 and are logged. Reference group for region = West. Marginal effects of continuous variables are calculated by increasing the variable by one unit for everyone in the sample, except income and industry insurance rate, which were moved by 10% and 5%, respectively. Marginal effects of discrete variables are calculated by moving everyone in the sample from 0 to 1.

Notes (table reporting ME in percentages): Estimate = parameter estimate; SE = standard error; ME (%) = average marginal effect in percentages. Expenditure and income are in $1,000 and are logged. Reference group for region = West. Marginal effects of continuous variables are calculated by increasing the variable by one unit for everyone in the sample, except income and industry insurance rate, which were moved by 10% and 5%, respectively. Marginal effects of discrete variables are calculated by moving everyone in the sample from 0 to 1.
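The rule in the table notes for average marginal effects can be written out as a short calculation. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses a probit link as in the parametric benchmark, and the coefficient values, the three-person data set, and the function names are hypothetical. For a discrete variable everyone is moved from 0 to 1; for a continuous variable everyone is moved up by a fixed step; in both cases the change in predicted probability is averaged over the sample and reported in percentage points.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict(beta, row):
    """Probit probability for one observation."""
    return norm_cdf(sum(beta[name] * row[name] for name in beta))


def ame_discrete(beta, data, name):
    """Average change in probability (percentage points) when `name` is
    switched from 0 to 1 for everyone in the sample."""
    diffs = [predict(beta, {**row, name: 1}) - predict(beta, {**row, name: 0})
             for row in data]
    return 100.0 * sum(diffs) / len(diffs)


def ame_continuous(beta, data, name, step=1.0):
    """Average change in probability (percentage points) when `name` is
    increased by `step` for everyone in the sample."""
    diffs = [predict(beta, {**row, name: row[name] + step}) - predict(beta, row)
             for row in data]
    return 100.0 * sum(diffs) / len(diffs)


# Hypothetical coefficients and data, for illustration only.
beta = {"const": -0.5, "married": 0.5, "education": 0.05}
data = [
    {"const": 1, "married": 0, "education": 12},
    {"const": 1, "married": 1, "education": 16},
    {"const": 1, "married": 0, "education": 10},
]

print(round(ame_discrete(beta, data, "married"), 2))
print(round(ame_continuous(beta, data, "education"), 2))
```

For a logged variable such as income, a 10% increase corresponds to adding `math.log(1.10)` to the logged value, which can be passed as `step`.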
expenditures. Results are presented in table 4. Note that most of the marginal effects are the same as the coefficient estimates here. With the exception of the impact of insurance, estimates in the two approaches are similar. Both the number of comor

Table fragment: More than 90% insured | 1.41 | 1.37. Recoverable column labels: ME on insurance (percentage points); ME on utilization (percentage points).