# A Flexible Instrumental Variable Approach∗

Giampiero Marra

Department of Statistical Science, University College London, Gower Street, London WC1E 6BT

Rosalba Radice

Department of Health Services Research & Policy, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT

October 26, 2010

## Abstract

Classical regression model literature has generally assumed that measured and unmeasured or unobservable covariates are statistically independent. For many applications this assumption is clearly tenuous. When unobservables are associated with included regressors and have an impact on the response, standard estimation methods will not be valid. This means, for example, that estimation results from observational studies, whose aim is to evaluate the impact of a treatment of interest on a response variable, will be biased and inconsistent in the presence of unmeasured confounders. One method for obtaining consistent estimates of treatment effects when dealing with linear models is the instrumental variable (IV) approach. Linear models have been extended to generalized linear models (GLMs) and generalized additive models (GAMs), and although IV methods have been proposed to deal with GLMs, fitting methods to carry out IV analysis within the GAM context have not been developed. We propose a two-stage procedure for IV estimation when dealing with GAMs represented using any penalized regression spline approach, and a correction procedure for confidence intervals. We explain under which conditions the proposed method works and illustrate its empirical validity through an extensive simulation experiment and a health study where unmeasured confounding is suspected to be present.

Keywords: Generalized additive model; Instrumental variable; Two-stage estimation approach; Unmeasured confounding.

∗ Research Report No. 309, Department of Statistical Science, University College London. Date: October 2010.

## 1 Introduction

Observational data are often used in statistical analysis to infer the effects of one or more predictors of interest (which can also be referred to as treatments) on a response variable. The main characteristic of observational studies is a lack of treatment randomization, which usually leads to selection bias. In a regression context, the most common solution to this problem is to account for confounding variables that are associated with both treatments and response (see, e.g., Becher, 1992). However, the researcher might fail to adjust for pertinent confounders as they might be either unknown or not readily quantifiable. This constitutes a serious limitation to covariate adjustment since the use of standard estimators typically yields biased and inconsistent estimates. Hence, a major concern when estimating treatment effects is how to account for unmeasured confounders. This problem is known in econometrics as endogeneity of the predictors of interest.

The most commonly used econometric method to model data that are affected by the unobservable confounding issue is the instrumental variable (IV) approach (Wooldridge, 2002). This technique has only recently received some attention in the applied statistical literature. It can yield consistent parameter estimates and can be used in any kind of analysis in which unmeasured confounding is suspected to be present (e.g., Beck et al., 2003; Leigh and Schembri, 2004; Linden and Adams, 2006; Wooldridge, 2002). The IV approach can be thought of as a means to achieve pseudo-randomization in observational studies (Frosini, 2006). It relies on the existence of one or more IVs that induce substantial variation in the endogenous/treatment variables, are independent of unobservables, and are independent of the response conditional on all measured and unmeasured confounders. Provided that such variables are available, IV regression analysis can split the variation in the endogenous predictors into two parts, one of which is associated with the unmeasured confounders (Wooldridge, 2002). This fact can then be used to obtain consistent estimates of the effects of the variables of interest.

The applied and theoretical literature on the use of IVs in parametric and nonparametric regression models with Gaussian response is large and well understood (Ai and Chen, 2003; Das, 2005; Hall and Horowitz, 2005; Newey and Powell, 2003; Wooldridge, 2002). In many applications, Gaussian regression models have been replaced by generalized linear and generalized additive models (GLMs and GAMs; McCullagh and Nelder, 1989, and Hastie and Tibshirani, 1990, respectively), as they allow researchers to model data using the response variable distribution which best fits the features of the outcome of interest, and to make use of nonparametric smoothers since the functional shape of any relationship is rarely known a priori. Amemiya (1974) proposed an IV generalized method of moments (GMM) approach to consistently estimate the parameters of a GLM. Here it is not clear how such an approach can be implemented for GAMs so that reliable smooth component estimates can be obtained in practice. This is because when fitting a GAM the amount of smoothing for the smooth components in the model has to be selected with a certain degree of precision, and it might be difficult to develop a reliable computational multiple smoothing parameter method by taking an IV GMM approach. Simultaneous maximum-likelihood estimation methods for GLMs in which selection bias is suspected to be present have also been proposed. Here consistent and efficient estimates can be obtained by jointly modelling the distribution of the response and the endogenous variables (Heckman, 1978; Maddala, 1983; Wooldridge, 2002); however, the main drawbacks are typically computational cost and the derivation of the joint distribution, issues that are likely to become even more severe in the GAM context. An epidemiological example is provided by Johnston et al. (2008). The IV extension to the GAM context is a topic under construction; to the best of our knowledge, such a procedure has not been developed to date.

The aim of this paper is to extend the IV approach to GAMs by exploiting the two-stage procedure idea first proposed by Hausman (1978, 1983) and employing one of the reliable smoothing approaches available in the GAM literature. This generalization is important because even if we use an IV approach to account for unmeasured confounders, we can still obtain biased estimates if the functional relationship between predictors and outcome is not modelled flexibly. The proposed approach can be efficiently implemented using some standard existing software. For simplicity of exposition, we first discuss a two-step estimator for GLMs which can then be easily extended to GAMs. Our proposal is illustrated through an extensive simulation study and in the context of a health study.

The rest of the article is structured as follows. Section 2 discusses the IV properties, the classical two-stage least squares (2SLS) method, and Hausman's endogeneity testing approach. Section 3 illustrates the main ideas using GLMs, which are then extended to the GAM context in Section 4. Section 5 proposes a confidence interval correction procedure for the two-stage approach of Section 4. Section 6 evaluates the empirical properties of the two-step GAM estimator through an extensive simulation experiment, whereas Section 7 illustrates the method via a health observational study of medical care utilization where unmeasured confounding is suspected to be present.
## 2 Preliminaries and motivation
In empirical studies, endogeneity typically arises in three ways: omitted variables, measurement error, and simultaneity (see Wooldridge (2002, p. 50) for more details on these forms of endogeneity). To simplify matters, we approach the problem of endogenous explanatory variables from an omitted variables perspective. To fix ideas, let us consider the model

$$Y = \beta_0 + \beta_e X_e + \beta_o X_o + \beta_u X_u + \epsilon_Y, \qquad (1)$$

where $\epsilon_Y$ is an error term normally distributed with mean 0 and constant variance, $\beta_0$ represents the intercept of the model, and $X_e$, $X_o$ and $X_u$ are the endogenous variable, observable confounder, and unmeasured confounder, with parameters $\beta_e$, $\beta_o$ and $\beta_u$. We assume that $X_u$ influences the response variable $Y$ and is associated with $X_e$. Since $X_u$ can not be observed, (1) can be written as

$$Y = \beta_0 + \beta_e X_e + \beta_o X_o + \zeta, \qquad (2)$$

where $\zeta = \beta_u X_u + \epsilon_Y$. OLS estimation of equation (2) results in inconsistent estimators of all the parameters, with $\beta_e$ generally the most affected. In order to obtain consistent parameter estimates, an IV approach can be employed. Here, to clear up the endogeneity of $X_e$, we need an observable variable $X_{IV}$, called instrument or IV, that satisfies three conditions (e.g., Greenland, 2000):

1. $X_{IV}$ must be associated with $X_e$ conditional on the remaining covariates in the model. This first requirement can be better understood by making use of the following model

   $$X_e = \alpha_0 + \alpha_o X_o + \alpha_{IV} X_{IV} + \alpha_u X_u + \epsilon_{X_e}, \qquad (3)$$

   where $\epsilon_{X_e}$ has the same features as $\epsilon_Y$. Specifically, $\alpha_{IV}$ must be significantly different from 0. Since $X_u$ can not be observed, (3) can also be written as $X_e = \alpha_0 + \alpha_o X_o + \alpha_{IV} X_{IV} + \xi_u$, where $\xi_u$, defined as $\alpha_u X_u + \epsilon_{X_e}$, is assumed to be uncorrelated with $X_o$ and $X_{IV}$, i.e. $E(\xi_u|X_o, X_{IV}) = 0$.

2. The second requirement is that $X_{IV}$ is independent of $Y$ conditional on the other regressors in the model and $X_u$, i.e. $E(\epsilon|X_e, X_o, X_{IV}, X_u) = 0$.

3. The third condition requires $X_{IV}$ to be independent of $X_u$.

Assuming that an appropriate instrument can be found, several methods can be employed to correctly quantify the impact that a predictor of interest has on the response variable, 2SLS being the most common. In 2SLS estimation, least squares regression is applied twice. Specifically, the first stage involves fitting a linear regression of $X_e$ on $X_o$ and $X_{IV}$ to obtain $\hat E(X_e|X_o, X_{IV})$, or $\hat X_e$. In the second stage, a regression of $Y$ on $\hat X_e$ and $X_o$ is performed. We see why this procedure yields consistent estimates of the parameters by taking the conditional expectation of (2) given $X_o$ and $X_{IV}$. That is, $E(Y|X_o, X_{IV}) = \beta_0 + \beta_e \hat X_e + \beta_o X_o$. Thus, the 2SLS estimator can produce an estimate of the original parameter of interest. The following argument better explains this point. 2SLS implies the replacement of $\beta_e X_e$ with $\beta_e(\hat X_e + \xi_u)$. Thus, the error of model (2) is allowed to become $(\beta_e \xi_u + \beta_u X_u + \epsilon_Y)$, which can be readily shown to be uncorrelated with $\hat X_e$ and $X_o$. This result does not hold for GLMs because $\beta_e \xi_u$ and $\beta_u X_u$ can not become part of the error term given the presence of a link function that has to be employed when dealing with GLMs: the unobservable is not additively separable from the systematic part of the model. In other words, this approach does not yield consistent estimates of the coefficients when dealing with generalized models (Amemiya, 1974).

The selection of an instrument has to be based on subject-matter knowledge, not statistical testing, and is usually heavily dependent on the specific problem at hand. This is because some of the necessary assumptions can not be verified empirically. As an example, let us consider the study by Leigh and Schembri (2004). The aim of their analysis was to estimate the effect of smoking on physical functional status. Smoking was considered as an endogenous variable since it was assumed to be associated with health risk factors which could not be observed. The IV was cigarette price as it was believed to be logically and statistically associated with smoking, and not to be directly related to any individual's health. Also, it was logically assumed to be unrelated to those unmeasured health risk confounders which could affect physical functional status. Cigarette price therefore appeared to satisfy the conditions for a valid and strong instrument. In many situations identification of a valid instrument is less clear than in the case above.
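As a concrete aside, the difference between 2SLS and the residual-inclusion (Hausman-type) strategy used in the rest of the paper can be seen in a few lines of R. The sketch below is ours, not the paper's: the data-generating values (true $\beta_e = 2$) and variable names are hypothetical, and in the linear Gaussian case both estimators recover $\beta_e$.

```r
# Minimal sketch (our own simulated example): 2SLS versus Hausman-style
# residual inclusion for the linear model (2). True beta_e is 2.
set.seed(1)
n    <- 10000
x_o  <- runif(n); x_iv <- runif(n); x_u <- runif(n)
x_e  <- 0.5 * x_o + x_iv + x_u + rnorm(n)       # endogenous: depends on x_u
y    <- 1 + 2 * x_e + x_o + 2 * x_u + rnorm(n)  # outcome: also depends on x_u

first <- lm(x_e ~ x_o + x_iv)                   # common first stage
tsls  <- lm(y ~ fitted(first) + x_o)            # 2SLS: replace x_e by x_e-hat
haus  <- lm(y ~ x_e + x_o + residuals(first))   # Hausman: keep x_e, add residual
c(naive   = coef(lm(y ~ x_e + x_o))["x_e"],
  tsls    = coef(tsls)[2],
  hausman = coef(haus)["x_e"])
```

The naive fit is biased because $x_u$ is omitted, while both IV variants are consistent here; note that the standard errors printed by these plain `lm()` calls are not corrected for the two-stage nature of the estimators.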
## 3 IV estimation for GLMs
The purpose of this section is to discuss a two-step IV estimator for GLMs which can then be easily extended to GAMs. As explained in Section 1, several valid methods have already been proposed to deal with GLMs in which selection bias is suspected to be present. However, our aim is not to discuss an alternative IV approach for GLMs, but to illustrate the main ideas using this simpler class of models. The generalization to the GAM context will then easily follow.

The developments of the next two sections are based on the two-stage approach introduced by Hausman (1978, 1983) as a means of directly testing the endogeneity hypothesis for the class of linear models. His procedure has the same first stage as 2SLS, but in the second stage $X_e$ is not replaced by $\hat X_e$. Instead, the first-stage residual is included as an additional predictor in the second-stage regression, and its parameter significance tested. It is well known in the IV literature that 2SLS and Hausman's procedure are equivalent for Gaussian models in terms of estimated parameters. However, they do not yield the same results when dealing with generalized models since 2SLS would produce biased and inconsistent estimates (for the reasons given in Section 2), whereas a Hausman-like approach would consistently estimate the parameters of interest, as will be discussed in the next section.

A GLM has the model structure

$$g(\mu) = \eta = X\beta, \qquad (4)$$

where $g(\cdot)$ is a smooth monotonic link function, $\mu \equiv E(y|X)$, $y$ is a vector of independent response variables $(Y_1, \dots, Y_n)^T$, $\eta$ is called the linear predictor, $X$ is an $n \times k$ matrix of $k$ covariates, and $\beta$ represents the $k \times 1$ vector of unknown regression coefficients. The generic response variable $Y$ follows an exponential family distribution whose probability density functions are of the form

$$\exp[\{y\vartheta - b(\vartheta)\}/a(\phi) + c(y, \phi)],$$

where $b(\cdot)$, $a(\cdot)$ and $c(\cdot)$ are arbitrary functions, $\vartheta$ is the natural parameter, and $\phi$ the dispersion parameter. For practical modelling, $a(\phi)$ is usually set to $\phi$. The expected value and variance of such a distribution are $E(Y) = \partial b(\vartheta)/\partial\vartheta = \mu$ and $\mathrm{var}(Y) = \phi\,\partial\mu/\partial\vartheta = \phi V(\mu)$, where $V(\mu)$ denotes the variance function.

Model (4) can also be written as

$$y = g^{-1}(\eta) + \epsilon, \qquad (5)$$

where $g^{-1}(\eta) = E(y|X)$, and $\epsilon$ is an additive, unobservable error trivially defined as $\epsilon \equiv y - g^{-1}(\eta)$, $E(\epsilon|X) = 0$. Here we assume three types of covariates, $X = (X_e, X_o, X_u)$, where $X_e$ is an $n \times h$ matrix of endogenous variables, $X_o$ an $n \times j$ matrix of observable confounders, and $X_u$ an $n \times h$ matrix of unmeasured confounders that influence the response variable and are associated with the endogenous predictors. Correspondingly, $\beta^T$ can be written as $(\beta_e^T, \beta_o^T, \beta_u^T)$. Notice that, for simplicity, we assume to have as many endogenous variables as there are unobservables. To simplify notation we do not write the intercept vector in $X$ even though we assume it is included.

The problem with equation (5) is that we can not observe $X_u$, hence it can not be included in the model. If $X_u$ were available, then $\hat\beta$ could yield consistent estimates of $\beta$. Since $X_u$ is not available, the error term may have some undesired properties: omitting $X_u$ violates the assumption that $E(X^T\epsilon) = 0$, therefore leading to biased and inconsistent estimates.

As explained in Section 2, it is useful to model the variables in $X_e$ through the following set of auxiliary (or reduced form) equations (e.g., as in Terza et al., 2008)

$$x_{ep} = g_p^{-1}(Z_p\alpha_p) + \xi_{up}, \qquad p = 1, \dots, h, \qquad (6)$$

where $x_{ep}$ represents the $p$th column vector from $X_e$, $Z_p = (X_o, X_{IVp})$, $X_{IVp}$ is the $p$th matrix of dimension $n \times n_{ivp}$, where $n_{ivp}$ indicates the number of identifying instrumental variables available for $x_{ep}$, $\alpha_p$ denotes the $(j + n_{ivp}) \times 1$ vector of unknown parameters, $g_p^{-1}$ is the inverse of the link function chosen for the $p$th endogenous/treatment variable, depending on the nature of $x_{ep}$, and $\xi_{up}$ is a term containing information about both structured and unstructured terms. Notice that, in order to identify the set of reduced form equations, there must be at least as many instruments as there are endogenous regressors. That is, each $n_{ivp}$ must be equal to or greater than 1. This will be assumed to be the case throughout the article.

The reason why the equations in (6) can be used to "correct" the parameter estimates of the equation of interest is as follows. Once the measured confounders have been accounted for, and provided the instruments meet the conditions discussed in Section 2, the $\xi_{up}$ contain information about the unmeasured confounders that can be used to obtain corrected parameter estimates of the endogenous variables. To shed light on this last point, using an argument similar to that of Johnston et al. (2008), let us assume that the true model underlying the $p$th reduced form equation is

$$x_{ep} = E(x_{ep}|Z_p, x_u) + \upsilon_p, \qquad (7)$$

where $E(x_{ep}|Z_p, x_u) = h_p(Z_p\alpha_p + x_u)$, $h_p = g_p^{-1}$, and $\upsilon_p$ is an error term. $h_p(\cdot)$ can be replaced by the Taylor approximation of order 1

$$h_p(Z_p\alpha_p + x_u) \approx h_p(Z_p\alpha_p) + x_u h'_p(Z_p\alpha_p), \qquad (8)$$

hence (7) can be written as $x_{ep} = h_p(Z_p\alpha_p) + x_u h'_p(Z_p\alpha_p) + \upsilon_p$, which in turn leads to model (6) where $\xi_{up} = x_u h'_p(Z_p\alpha_p) + \upsilon_p$. Notice that in the Gaussian case approximation (8) is not needed since $x_u$ would enter the error term linearly. The next section shows how the fact that the $\xi_{up}$ contain information about the unobservables can be used to clear up the endogeneity of the treatment variables in the model.
### 3.1 The two-step GLM estimator
In order to obtain consistent estimates for model (5) in the context defined earlier, we employ a Hausman-like approach. Specifically, the following two-step generalized linear model (2SGLM) procedure can estimate the parameters of interest consistently:

1. For each endogenous predictor in the model, obtain consistent estimates of $\alpha_p$ by fitting the corresponding auxiliary equation through a GLM method. Then, calculate the following set of quantities

   $$\hat\xi_{up} = x_{ep} - g_p^{-1}(Z_p\hat\alpha_p), \qquad p = 1, \dots, h. \qquad (9)$$

2. Fit a GLM defined by

   $$y = g^{-1}(X_e\beta_e + X_o\beta_o + \hat\Xi_u\beta_{\Xi_u}) + \varsigma, \qquad (10)$$

   where $\hat\Xi_u$ is an $n \times h$ matrix containing the $\hat\xi_{up}$ obtained in the previous step, with parameter vector $\beta_{\Xi_u}$, and $\varsigma$ represents an error term, $E(\varsigma|X) = 0$.

2SGLM works since, if the $\alpha_p$ were known, then by using (6) the column vectors of $\Xi_u$ would be known. Hence information about the unobservables could be incorporated into the model by using $\Xi_u$. As a result, the endogeneity issue would disappear since the assumption that the error term is uncorrelated with the predictors would be satisfied. However, we do not know the $\alpha_p$. By using (9) we can get consistent estimates for the $\alpha_p$, thereby obtaining a good estimate for $\Xi_u$. Provided the IVs meet the assumptions discussed in Section 2, we have that

$$E(y|X_e, X_o, X_{IV}) = g^{-1}(X_e\beta_e + X_o\beta_o + \Xi_u\beta_{\Xi_u}). \qquad (11)$$

In (11) we ignore estimation for $\Xi_u$ as the endogeneity issue only concerns the second-step equation. Hence, we can replace equation (5) by (10) since in this context $\beta_e$ is the parameter vector of interest, and because consistent estimates for it can be obtained. Following, e.g., Terza et al. (2008), it can be readily shown that $\hat\beta^T$, now defined as $(\hat\beta_e^T, \hat\beta_o^T, \hat\beta_{\Xi_u}^T)$, is consistent for the vector value $\gamma^T = (\gamma_e^T, \gamma_o^T, \gamma_u^T)$ that solves the population problem

$$\text{minimize } E\big[\,\| y - g^{-1}(X_e\gamma_e + X_o\gamma_o + \Xi_u\gamma_u) \|^2\,\big] \text{ w.r.t. } \gamma,$$

from which it follows that $\beta = \gamma$. The sample analogue follows similar principles. These arguments are standard and can be found in Wooldridge (2002, pp. 341–345, 353–354).

It is important to stress that better empirical results are expected when the endogenous variables in the first step can be modelled using Gaussian regression models. In this case approximation (8) does not come into play, which means that we can better control for unobservables. Finally, the parameter vector $\beta_{\Xi_u}$ can not be used as a means to explain the impact that the unmeasured confounders have on the outcome (see Section 4.1 for an explanation). However, this is not problematic since we are not interested in $\beta_{\Xi_u}$. All that is needed is to account for the presence of unobservables, and this can be achieved by including a set of quantities which contain information about them.
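As a minimal illustration, the 2SGLM steps translate directly into two `glm()` calls in R. The sketch below is ours and assumes a single continuous endogenous variable `x_e` (so the auxiliary equation (6) is Gaussian with identity link, and approximation (8) is not needed), a binary outcome `y`, one observed confounder `x_o` and one instrument `x_iv`; all variable names are hypothetical.

```r
# Step 1: fit the auxiliary (reduced form) equation and compute the
# quantities (9), i.e. the first-stage residuals.
stage1 <- glm(x_e ~ x_o + x_iv, family = gaussian)
xi_hat <- x_e - fitted(stage1)

# Step 2: fit the outcome GLM (10), with xi_hat as an extra regressor.
stage2 <- glm(y ~ x_e + x_o + xi_hat, family = binomial)
coef(stage2)["x_e"]   # corrected estimate of beta_e
```

In the spirit of Hausman (1978), the significance of the coefficient on `xi_hat` also provides a simple informal check of endogeneity.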
## 4 The GAM extension
The IV extension to the GAM context is important because even if we use an IV approach to account for unmeasured confounders, we can still obtain biased estimates if the functional relationship between predictors and outcome is not modelled flexibly. GAMs extend GLMs by allowing the determination of possible non-linear effects of predictors on the response variable. A GAM has the model structure

$$y = g^{-1}(\eta) + \epsilon, \quad E(\epsilon|X) = 0, \qquad (12)$$

where the linear predictor is typically given by

$$\eta = X^*\beta^* + \sum_j f_j(x_j^+), \qquad (13)$$

where $\beta^*$ represents the vector of unknown regression coefficients for $X^*$, and the $f_j$ are unknown smooth functions of the covariates, represented using regression splines (see, e.g., Marra and Radice, 2010). The generic regression spline for the $j$th continuous variable can be written as $f_j(x_j^+) = X_j^+\theta_j$, where $X_j^+$ is the model matrix containing the regression spline bases for $f_j$, with parameter vector $\theta_j$. Recall that, in order to identify (12), the $f_j(x_j^+)$ are subject to identifiability constraints, such as $\sum_i f_j(x_{ji}^+) = 0\ \forall j$. Here, $X = (X^*, X^+)$, $X^* = (X_e^*, X_o^*, X_u^*)$ and $X^+ = (X_e^+, X_o^+, X_u^+)$, where the symbols $*$ and $+$ indicate whether the matrix considered refers to discrete predictors (such as dummy variables) or continuous regressors. Matrix dimensions can be defined following the same criterion adopted in the previous section.

Since we can not observe $X_u^*$ and $X_u^+$, inconsistent estimates are expected. However, provided that IVs are available to correct for endogeneity, consistent estimates can be obtained by modelling the endogenous variables in the model. In the GAM context, this can be achieved through the following set of flexible auxiliary regressions

$$x_{ep} = g_p^{-1}\Big\{Z_p^*\alpha_p^* + \sum_j f_j(z_{jp}^+)\Big\} + \xi_{up}, \qquad p = 1, \dots, h, \qquad (14)$$

where $x_{ep}$ represents either the $p$th discrete or continuous endogenous predictor, and $Z_p^* = (X_o^*, X_{IVp}^*)$ and $Z_p^+ = (X_o^+, X_{IVp}^+)$, with corresponding vector of unknown parameters $\alpha_p^*$.
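For readers unfamiliar with the regression spline representation $f_j(x_j^+) = X_j^+\theta_j$, the basis matrix and the associated roughness penalty can be inspected directly. The following small sketch is our own example, using mgcv's smooth constructor with a thin plate basis of dimension 10.

```r
# A smooth f_j is a basis matrix times coefficients, with a quadratic
# penalty theta' S_j theta (see Section 4.1 for the role of the penalty).
library(mgcv)
dat <- data.frame(x = runif(200))
sm  <- smoothCon(s(x, bs = "tp", k = 10), data = dat)[[1]]
dim(sm$X)       # 200 x 10: the spline basis evaluated at the observed x
dim(sm$S[[1]])  # 10 x 10: the penalty matrix S_j
```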
### 4.1 The two-step GAM estimator
The 2SGLM estimator can now be extended to the GAM context. The following two-step generalized additive model (2SGAM) approach can be employed:

1. For each endogenous variable in the model, obtain consistent estimates of $\alpha_p^*$ and the $f_j$ by fitting the corresponding reduced form equation (14) through a GAM method. Then, calculate the following set of quantities

   $$\hat\xi_{up} = x_{ep} - g_p^{-1}\Big\{Z_p^*\hat\alpha_p^* + \sum_j \hat f_j(z_{jp}^+)\Big\}, \qquad p = 1, \dots, h.$$

2. Fit a GAM defined by

   $$y = g^{-1}\Big\{X_{eo}^*\beta_{eo}^* + \sum_j f_j(x_{jeo}^+) + \sum_p f_p(\hat\xi_{up})\Big\} + \varsigma, \qquad (15)$$

   where $X_{eo}^* = (X_e^*, X_o^*)$ with parameter vector $\beta_{eo}^*$, and $X_{eo}^+ = (X_e^+, X_o^+)$.

Let us now consider equation (14). The estimated residuals $\hat\xi_{up}$, $p = 1, \dots, h$, will contain the linear/nonlinear impacts of the unobservables on the endogenous variables $x_{ep}$. These effects can be partly or completely different from those that the same unmeasured confounders have on the outcome. However, this is not problematic since all that is needed is to account for information about those unobservables that have a detrimental impact on the estimation of the effects of interest. The use of the $f_p(\hat\xi_{up})$ in (15) allows us to properly account for the impacts of unmeasured confounders on the response. Specifically, the $f_p$ in (15) will automatically yield smooth function estimates that (i) take into account the non-linearity already present in the $\hat\xi_{up}$, and (ii) recover the residual amount of non-linearity needed to clear up the endogeneity of the endogenous variables in the model. This means that the linear/nonlinear effects of the endogenous regressors can be estimated consistently (see Section 6). As mentioned in Section 3.1, the presence of a relationship between the outcome and unobservables that are associated with endogenous predictors can lead to bias in the estimated impacts of the latter variables. This also explains why the $\hat f_p(\hat\xi_{up})$ can not be used to display the relationship between the unobservables and the response.

In practice, the 2SGAM estimator can be implemented using GAMs represented via any penalized regression spline approach. For instance, the models in (14) and (15) can be fitted through penalized likelihood, which can be maximized by penalized iteratively reweighted least squares (P-IRLS; e.g., Wood, 2006). Recall that the use of a roughness penalty during the model-fitting process usually avoids the problem of overfitting, which is likely to occur at finite sample sizes when using spline models. Specifically, the use of the quadratic penalty $\lambda_j\theta^T S_j\theta$, where $S_j$ is a matrix measuring the roughness of the $j$th smooth function, allows for the control of the trade-off between fit and smoothness through the smoothing parameters $\lambda_j$. This is particularly appealing since the amount of smoothing for the smooth components in the models of the two-step approach can be selected reliably by taking advantage of the recent computational developments in the GAM literature. Implementation of 2SGAM is therefore straightforward: it just involves applying a penalized regression spline approach twice by using one of the reliable packages available to fit GAMs.

In principle, the consistency arguments for the 2SGLM estimator could be extended to 2SGAM by recalling that a GAM can be seen as a GLM whose design matrix contains the basis functions of the smooth components in the model, and by adapting the asymptotic results of Kauermann et al. (2009) to this context. The discussion of these properties is beyond the scope of this paper, hence we do not pursue it further.
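In R, the two steps translate almost literally into two `gam()` calls from mgcv. The sketch below is ours and assumes a binary response `y`, one continuous endogenous regressor `x_e`, observed confounders `x_o1` (continuous) and `x_o2` (binary), and one instrument `x_iv`; the Gaussian first step corresponds to the favourable case discussed in Section 3.1.

```r
library(mgcv)
# Step 1: reduced form (14) for the endogenous variable, then the residuals.
stage1 <- gam(x_e ~ x_o2 + s(x_o1) + s(x_iv))
xi_hat <- x_e - fitted(stage1)
# Step 2: outcome model (15), with an extra smooth of the residuals.
stage2 <- gam(y ~ x_o2 + s(x_o1) + s(x_e) + s(xi_hat), family = binomial)
plot(stage2, select = 2)   # corrected smooth estimate for x_e
```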
## 5 Confidence intervals
The well known Bayesian 'confidence' intervals originally proposed by Wahba (1983) or Silverman (1985) in the univariate spline model context, and then generalized to the component-wise case when dealing with GAMs (e.g., Gu, 1992; Gu and Wahba, 1993; Gu, 2002; Wood, 2006), are typically used to reliably represent the uncertainty of smooth terms. This is because such intervals include both a bias and variance component (Nychka, 1988), a fact that makes these intervals have good observed frequentist coverage probabilities across the function. For simplicity and without loss of generality, let us consider a generic GAM whose linear predictor is made up of smooth terms only. The large sample posterior for the generic parameter vector $\theta$ containing all regression spline coefficients is given by

$$\theta\,|\,y, \lambda, \phi \sim N(\hat\theta, V_\theta), \qquad (16)$$

where $\hat\theta$ is the maximum penalized likelihood estimate of $\theta$, which is of the form $(X^TWX + S)^{-1}X^TWz$, $V_\theta = (X^TWX + S)^{-1}\phi$, and $S = \sum_j \lambda_j S_j$. Here, $X$ contains the columns associated with the regression spline bases for the $f_j$, and $W$ and $z$ are the diagonal weight matrix and the pseudodata vector at convergence of the P-IRLS algorithm used to fit a GAM.

Since the second stage of 2SGAM can not automatically account for the additional source of variability introduced via the quantities calculated in the first step, the confidence intervals for the component functions of the second-step model will be too narrow, hence leading to poor coverage probabilities. As explained by Ruppert et al. (2003), this can be rectified via posterior simulation. Since in this context it is not straightforward to correct the second-step estimated standard errors analytically, we can exploit result (16). The algorithm we propose is as follows:

1. Fit the first-step models, where $p = 1, \dots, h$ depending on the number of reduced form equations in the first step, and let the first-stage parameter estimates and corresponding Bayesian covariance matrices be $\hat\alpha^{[p]}$ and $\hat V_\alpha^{[p]}$. Then obtain the $\hat\xi_{up}$.

2. Fit the second-stage model, and let $\hat\beta^{[1]}$ and $\hat V_\beta^{[1]}$ be the corresponding parameter estimates and Bayesian covariance matrix.

3. Repeat the following steps for $k = 2, \dots, N_b$:

   (a) For each first-stage model $p$, simulate a random draw from $N(\hat\alpha^{[p]}, \hat V_\alpha^{[p]})$, calculate new predicted values $\hat x_{ep}^*$, and then obtain $\hat\xi_{up}^*$.

   (b) Fit the second-stage model where the $\hat\xi_{up}$ are replaced with the $\hat\xi_{up}^*$. Then store $\hat\beta^{[k]}$ and $\hat V_\beta^{[k]}$.

4. For $k = 1, \dots, N_b$, simulate $N_d$ random draws from $N(\hat\beta^{[k]}, \hat V_\beta^{[k]})$, and then find approximate Bayesian intervals for the component functions of the second-stage model.

In words, samples from the posterior distribution of each first-step model are used to obtain samples from the posterior of the quantities of interest $\xi_{up}$. Then, given the $N_b$ replicates for each $\xi_{up}$, $N_d$ random draws from the $N_b$ posterior distributions of the second-stage model are used to obtain approximate Bayesian intervals for the smooth functions in the model. In this way, the extra source of variability introduced via the quantities calculated in the first step models can be accounted for. In practice, depending on the number of reduced form equations in the first step, small values for $N_b$ and $N_d$ will be tolerable. Simulation experience suggests that, as a rule of thumb, $N_b = 25 \times p$ and $N_d = 100$ yield good coverage probabilities.

Regarding variable selection, it is possible to carry out selection using information criteria, test statistics and shrinkage methods; note that result (16) and, as a consequence, our correction procedure can not be used for variable selection purposes. In the presence of several candidate predictors, we suggest to use a shrinkage method. This is because shrinkage approaches are based on the estimated components of a model, hence we can exploit the fact that 2SGAM yields consistent term estimates, which can in turn lead to consistent covariate selection. An example of shrinkage smoother is provided by Wood (2006), who suggests modifying the smoothing penalty matrices associated with the smooth components of a GAM so that the terms can be estimated as zero. The discussion of this topic is beyond the aim of this paper and we refer the reader to Guisan et al. (2002) for an overview.

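The algorithm is straightforward to implement with mgcv, whose `vcov()` method returns the Bayesian covariance matrix in (16). The sketch below is ours: it handles a single reduced form equation (so $N_b = 25$ and $N_d = 100$) and assumes the Gaussian first-step fit `stage1` and second-step fit `stage2` of the kind shown at the end of Section 4.1, together with the same hypothetical variable names.

```r
library(mgcv)
library(MASS)   # mvrnorm
Nb <- 25; Nd <- 100
X1    <- predict(stage1, type = "lpmatrix")   # first-stage design matrix
a_hat <- coef(stage1); Va <- vcov(stage1)     # posterior (16) for step one
draws <- vector("list", Nb)
draws[[1]] <- mvrnorm(Nd, coef(stage2), vcov(stage2))
for (k in 2:Nb) {
  a_star  <- mvrnorm(1, a_hat, Va)            # step 3(a): draw first-stage coefs
  xi_star <- x_e - as.vector(X1 %*% a_star)   # new predicted values / residuals
  fit_k   <- gam(y ~ x_o2 + s(x_o1) + s(x_e) + s(xi_star), family = binomial)
  draws[[k]] <- mvrnorm(Nd, coef(fit_k), vcov(fit_k))  # steps 3(b) and 4
}
beta_draws <- do.call(rbind, draws)  # Nb * Nd coefficient draws
```

Pointwise quantiles of the fitted component curves computed from `beta_draws` (via the second-stage `lpmatrix`) then give the corrected intervals.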
## 6 Simulation study

To explore the empirical properties of the 2SGAM estimator, a Monte Carlo simulation study was conducted. The proposed two-stage approach was tested using data generated according to four response variable distributions and two data generating processes (DGP1 and DGP2). The performance of 2SGAM was compared with naive GAM estimation (i.e. the case in which the model is fitted without accounting for unmeasured confounding) and complete GAM estimation (i.e. the case in which the unobservable is included in the model). For both DGPs the number of endogenous variables in the model was equal to one. No competing methods were employed since, to the best of our knowledge, there are no available IV alternatives that can deal with GAMs in which the amount of smoothing for the smooth components can be selected via a reliable numerical method. All computations were performed in R, with the GAM setup based on the mgcv package.

### 6.1 DGP1

The linear predictor was generated as follows

$$\eta = f_1(x_{o1}) + f_2(x_e) + f_3(x_u) + x_{o2}. \qquad (17)$$

The test functions used for both DGPs are displayed and defined in Figure 1 and Table 1. For each set of correlations, sample size and response distribution, we carried out the following steps:

1. Simulate $x_{o1}$ and $x_{o2}$ from a multivariate uniform distribution on the unit square. This was achieved using the algorithm from Gentle (2003). Specifically, two uniform variables with correlation approximately equal to 0.5 were obtained as follows:

   ```r
   library(mvtnorm)
   cor <- array(c(1, 0.5, 0.5, 1), dim = c(2, 2))
   var <- pnorm(rmvnorm(n, sigma = cor))
   xo1 <- var[, 1]
   xo2 <- var[, 2]
   ```

2. Simulate $x_u$, $x_{IV1}$ and $x_{IV2}$ from independent uniform distributions on (0, 1).

3. Simulate the endogenous/treatment variable of interest as follows

   $$x_e = \theta_1 f_4(x_u) + \theta_2 f_5(x_{IV1}) + \theta_3 f_6(x_{IV2}) + \zeta,$$

   where $\zeta \sim N(0, 1)$, and $\theta = (\theta_1, \theta_2, \theta_3)$ was chosen to obtain the set of correlations $\rho\{f_2(x_e), f_3(x_u)\} \in \{-0.4, -0.6\}$ and $\rho\{x_e, f_5(x_{IV})\} \in \{0.4, 0.7\}$. The three functions were scaled to have the same importance, and $x_e$ was scaled so that its values were between 0 and 1.

4. Scale the model terms in (17) to have the same magnitude, and then generate the linear predictor.

5. Generate data according to the chosen outcome distribution.

*Figure 1: The six test functions used in the linear predictors.*

Table 1: Test function definitions.

$$f_1(x) = \cos(2\pi x) \qquad f_2(x) = 0.5\{x^3 + \sin(\pi x^3)\} \qquad f_3(x) = -0.5\{x + \sin(\pi x^{2.5})\}$$
$$f_4(x) = -e^{-3x} \qquad f_5(x) = e^{3x} \qquad f_6(x) = x^{11}\{10(1-x)\}^6 + 10(10x)^3(1-x)^{10}$$

### 6.2 DGP2

Here, the linear predictor was defined as

$$\eta = f_1(x_{o1}) + \beta_e x_e + f_3(x_u) + x_{o2}, \qquad (18)$$

where $x_e$ was a binary variable with corresponding parameter $\beta_e$. Specifically, we followed the same steps as in DGP1, but steps 3 and 4 were replaced with:

3. Simulate $x_e$ according to the following mechanism

   $$x_e = 1 \text{ if } x_e^* = \phi_1 + \phi_2 f_4(x_u) + \phi_3 f_5(x_{IV1}) + \phi_4 f_6(x_{IV2}) + \zeta > 0, \quad x_e = 0 \text{ if } x_e^* \le 0,$$

   where $\phi = (\phi_1, \phi_2, \phi_3, \phi_4)$ was chosen to obtain the set of correlations $\rho\{x_e^*, f_3(x_u)\} \in \{-0.4, -0.6\}$ and $\rho\{x_e^*, f_5(x_{IV})\} \in \{0.4, 0.7\}$. The three functions were scaled to have the same magnitude.

4. Generate (18) by setting $\beta_e = 2$ and scaling all model terms (except for $x_e$) to have the same magnitude.

### 6.3 Common parameter settings

One-thousand replicate data sets were generated for each DGP, combination of correlations, sample size and response distribution. The 2SGAM approach, naive GAM estimation and complete GAM estimation were employed using penalized thin plate regression splines (Wood, 2006) based on second-order derivatives and with basis dimensions equal to 10. The smoothing parameters were selected by the computational methods for multiple smoothing parameter estimation of Wood (2006, 2008). Step 1 of 2SGAM was achieved by fitting an additive model for DGP1 and a GAM with probit link for DGP2; the first step models did not include $x_{IV2}$. Complete GAM estimation results represented our benchmark. For each data set and estimation procedure, we obtained the mean squared error (MSE) for the estimated smooth function/dummy parameter of the treatment variable of interest. Then, from the resulting 1000 MSEs, an overall mean was taken and its standard deviation calculated.

| distribution | $n$ | $g(\mu)$ | $l \le \eta \le u$ | s/n |
|---|---|---|---|---|
| binomial | 250, 500, 1000, 2000, 4000, 8000 | logit | $[0.02, 0.98]$ | $n_{\rm bin} = 1$ |
| gamma | 250, 500, 1000, 2000, 4000, 8000 | log | $[0.2, 3]$ | $\phi = 0.6$ |
| Gaussian | 250, 500, 1000, 2000, 4000, 8000 | identity | $[0.2, 1]$ | $\sigma = 0.4$ |
| Poisson | 250, 500, 1000, 2000, 4000, 8000 | log | $[0.2, p_{\max}]$ | $p_{\max} = 3$ |

Table 2: Observations were generated from the appropriate distribution with true response means, obtained by transforming the linear predictors by the inverse of the chosen link function, lying in the specified range. l, u and s/n stand for lower bound, upper bound and signal-to-noise ratio parameter, respectively. Notice that the chosen signal-to-noise ratio parameters yielded low informative responses.
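Putting Sections 6.1 and 6.3 together, one replicate of DGP1 with a Bernoulli response can be sketched as follows. This is our own condensed version: $\theta = (1, 1, 1)$ is used purely for illustration rather than the correlation-matched values of step 3, and term rescaling is omitted.

```r
library(mvtnorm)
n  <- 1000
f1 <- function(x) cos(2 * pi * x)
f2 <- function(x) 0.5 * (x^3 + sin(pi * x^3))
f3 <- function(x) -0.5 * (x + sin(pi * x^2.5))
f4 <- function(x) -exp(-3 * x)
f5 <- function(x) exp(3 * x)
f6 <- function(x) x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
u    <- pnorm(rmvnorm(n, sigma = matrix(c(1, 0.5, 0.5, 1), 2)))  # step 1
xo1  <- u[, 1]; xo2 <- u[, 2]
xu   <- runif(n); xiv1 <- runif(n); xiv2 <- runif(n)             # step 2
xe   <- f4(xu) + f5(xiv1) + f6(xiv2) + rnorm(n)                  # step 3
xe   <- (xe - min(xe)) / (max(xe) - min(xe))                     # rescale to [0, 1]
eta  <- f1(xo1) + f2(xe) + f3(xu) + xo2                          # step 4, eq. (17)
y    <- rbinom(n, 1, binomial()$linkinv(eta))                    # step 5
```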
### 6.4 Results

To save space, not all simulation results are shown. Missing plots convey the same information, hence it suffices to use those selected here to draw conclusions. Excluding $x_{IV2}$ from the first step auxiliary regressions did not significantly affect the 2SGAM performance. This is not surprising since all that is usually required to obtain consistent parameter estimates is that at least one IV is available for each endogenous regressor in the model (Wooldridge, 2002).

Figure 2 shows the MSE results for $\hat f_2(x_e)$ when data are simulated from a Bernoulli distribution using DGP1. Naive GAM yields MSEs that appear to be rather high for all cases, and it can not produce better estimates as the sample size increases since unmeasured confounding is not accounted for. The 2SGAM results indicate that the proposed method performs as well as complete GAM provided that the IV is strong: 2SGAM yields MSEs that converge to those of complete GAM, which in turn converge to zero as the sample size increases. For the cases in which the IV is not strong, 2SGAM still performs better than the naive method, but worse than complete GAM. This is to be expected since the proposed approach, as well as any other IV method, works satisfactorily provided that the IV induces substantial variation in the endogenous variable of interest (Wooldridge, 2002). The effect of the endogenous variable will always be better estimated when all confounders can be observed and included in the model, as in complete GAM; in fact, all we can hope for is to have a method which is as good as complete GAM when valid and strong instruments are available. This can be clearly seen in Figure 3.

*Figure 2: MSE results for $\hat f_2(x_e)$ when data are simulated from a Bernoulli distribution using DGP1. ◦ indicates the 2SGAM estimator results, whereas • and ∗ refer to the cases in which estimation is carried out without accounting for unmeasured confounding, and that in which the unobservable is available and included in the model. ∗ represents our benchmark since the right model is fitted. The vertical lines show ±2 standard error bands, which are only reported for the cases in which they are substantial. Details are given in Sections 6.1 and 6.3.*

*Figure 3: Typical estimated smooth functions for $f_2(x_e)$ (thicker solid black line) when employing the 2SGAM approach (black lines) and naive GAM estimation (grey lines). The dotted and solid lines indicate the results for the cases in which n = 1000 and n = 8000, respectively. Notice the convergence of the proposed method to the true function as opposed to the naive approach.*

Figure 4 shows the MSE results for $\hat\beta_e$ when data are simulated from a gamma distribution using DGP2. When the instrument is not strong, the naive method seems to outperform 2SGAM for low sample sizes; for sample sizes greater than 2000, all previous considerations apply. These findings complement the results discussed above. This is not surprising since the correction achieved by using the proposed approach when data are generated using DGP2 is approximate (as discussed in Section 3.1, approximation (8) comes into play), and this requirement becomes even more relevant when dealing with DGP2, where more information is usually needed to obtain consistent estimates of the parameter of interest, a case in which a strong instrument can help to obtain better adjusted estimates. Notice the good overall performance of the proposed method for all sets of correlations and sample sizes when the instrument is strong.

As pointed out by Staiger and Stock (1997) and Bound et al. (1995), IV methods can be ill-behaved if the instruments are not highly correlated with the endogenous variables of interest. This is because seemingly small correlations between instruments and unmeasured confounders can cause severe inconsistency. Given that there will always be some empirical correlation at finite sample sizes, biased estimates can be avoided if the IVs are strong, hence severe finite sample bias is expected if the IVs are weak.

The results obtained by using 2SGLM, naive GLM estimation and complete GLM estimation (not reported here) were similar to those reported above, but obviously none of the methods could yield estimates converging to the true values. In fact, for the DGPs considered here, full parametric modelling could not account for the non-linear effects of the confounders, nor model the non-linearities of the treatment variable of interest for DGP1.
*Figure 4: MSE results for $\hat\beta_e$ when data are simulated from a gamma distribution using DGP2. Details are given in Sections 6.2 and 6.3, and in the caption of Figure 2.*

Table 3 shows some of the across-the-function coverage probabilities for $\hat f_2(x_e)$ when using the proposed two-step approach without correction for the Bayesian intervals, and the two-stage approach employing the interval correction introduced in Section 5, with $N_b = 25$ and $N_d = 100$. 2SGAM without correction yields intervals which are too narrow; this results in undercoverage. However, as the sample size increases the coverage probabilities improve. This is because, as $n$ increases, first step quantities are estimated more reliably, hence the neglect of the variability of these quantities might not have a substantial detrimental impact on the model term coverages of the second step model. In other words, if the data have high information content, first step estimated quantities will be more accurate and, as a result, uncorrected intervals will be more likely to yield better coverages. The results show that the proposed correction produces Bayesian intervals with coverage probabilities very close to the nominal level. The coverage probability results for $\hat\beta_e$ (not reported here) led to the same conclusions.

*Table 3: Across-the-function coverage probability results for $\hat f_2(x_e)$, for the binomial, gamma, Gaussian and Poisson cases, at four sample sizes (250, 500, 1000 and 4000), for the nominal level 95%, when the correlation between instrument and endogenous variable is 0.7 and that between endogenous variable and unobservable is equal to −0.6. 2SGAM and AD.2SGAM stand for the proposed two-step approach without correction for the Bayesian intervals, and the two-step approach with the correction described in Section 5, with $N_b = 25$ and $N_d = 100$. Notice the good coverage probabilities obtained when employing the correction. See Section 6.4 for an explanation of this result.*

## 7 Illustration of 2SGAM

In order to illustrate the 2SGAM approach, we investigated the effect of private health insurance on private medical care utilization using data from an Italian population-based survey. Private health insurance coverage is not randomly assigned as in a controlled trial but rather is the result of supply and demand, including individual preferences and health status. As a consequence, differences in outcomes for insured and uninsured individuals might be due not only to the effect of health insurance, but also to the effect of unobservables which are associated with insurance coverage. If this fact is not accounted for, the estimated impact of private health insurance will not be realistic, leading to biased health policy conclusions. Buchmueller et al. (2005) provide an excellent review of these issues.

### 7.1 Data
We used data from the Survey on Health, Aging and Wealth (SHAW; Brugiavini, Jappelli and Weber, 2002), which was conducted by the leading Italian polling agency DOXA in 2001. The SHAW sample consists of 1068 households whose head is over 50 years old, and mainly provides information about individual health status, utilization of health services, as well as socio-economic features. The outcome was utilization of private health care: an indicator variable that takes value 1 if the subject had private examinations and 0 otherwise. The endogenous/treatment variable was private health insurance: a dummy variable with value 1 if the respondent had private insurance coverage and 0 otherwise. The measured confounders were given by five factors (consumption of strong alcohol, marital status, self-reported health status, sex, smoking status) and three continuous variables (age, body mass index (bmi), income).

As pointed out in Section 2, identification of a valid instrument may not be straightforward because this choice has to be based on subject-matter knowledge, not statistical testing. Despite the effort of many researchers in trying to correctly quantify the impact that private health insurance has on utilization of private health care, there is not a general agreement on which instrument should be selected for statistical analysis (see Buchmueller et al., 2005; Harmon and Nolan, 2001; Hofter, 2006; Reidpath et al., 2002, and references therein for a review of the relevant literature). Taking these findings into account, on the basis of the variables already included in the model, and depending on the remaining predictors available in the data set at hand, indemnity insurance (which is a binary variable) was suggested as an instrument possibly meeting the three conditions discussed in Section 2. Of course, since some of the necessary assumptions can not be verified in practice, we can not be certain about the empirical validity of this instrument.
### 7.2 Health care modelling

The aim is to quantify the effect that private health insurance has on utilization of private health care by accounting for unmeasured risk factors. That is, the target is to obtain an adjusted estimate of the impact of private health insurance on utilization of private health care. Two logistic GAM models were employed to implement the 2SGAM approach. Thin plate regression splines of the continuous regressors with basis dimension 10 and penalties based on second-order derivatives were used. The factor variables were kept as parametric model components. Smoothing parameters were automatically selected as explained in Section 6.3. To keep the illustration simple, the response variables of the two models were modelled considering all main effects only.

Naive GAM estimation yielded $\hat\beta_e = 0.39$ $(-0.16, 0.95)$, whereas 2SGAM with corrected intervals produced $\hat\beta_e = 0.94$ $(0.03, 1.83)$. Although there is not any statistical difference between the two estimates, the former shows no significant effect whereas the latter exhibits a statistically significant estimate. This suggests that the unobservable confounding issue affects the parameter of interest. The differences between the other parametric parameter estimates of naive GAM and 2SGAM were minimal. These findings were not unexpected since it is well known that the endogenous parameter of interest is generally the most affected (e.g., Wooldridge, 2002, ch. 5). The estimated smooth functions of bmi and $\hat\xi_u$ support the presence of non-linearities (see Figure 5). The plot depicting the smooth of bmi for naive GAM has not been reported as it was similar to that in Figure 5.
Overall, our results are consistent with those reported in the health care utilization literature (Buchmueller et al., 2005), confirming that the presence of unmeasured confounding has to be accounted for in order to obtain a consistent estimate of $\beta_e$. It should be pointed out that IV methods are never unbiased when at least one explanatory variable is endogenous in the model. IV approaches solve this problem but only asymptotically, since they are based on the assumption that the instruments are asymptotically uncorrelated with the unobservables (Wooldridge, 2002). Unfortunately, we can not observe the unmeasured confounders, hence we can not know to what extent the issues above affect our empirical analysis. However, as pointed out by Johnston et al. (2008), "in observational settings where unmeasured confounding is suspected ... analysis using an imperfect instrument can still help in providing a more complete picture than regression alone."

*Figure 5: Smooth function estimates of body mass index (bmi) and $\hat\xi_u$ on the scale of the linear predictor, for the second stage equation. The numbers in brackets in the y-axis captions are the estimated degrees of freedom or effective number of parameters of the smooth curves.*
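For completeness, the two logistic GAMs of this section can be written in mgcv syntax as follows. This is a hedged sketch: the data frame and variable names (`shaw`, `insurance`, `indemnity`, `utilization`, and so on) are hypothetical placeholders for the SHAW variables described in Section 7.1.

```r
library(mgcv)
# First stage: private insurance on measured confounders plus the instrument.
first <- gam(insurance ~ indemnity + alcohol + marital + health + sex +
               smoking + s(age) + s(bmi) + s(income),
             family = binomial, data = shaw)
shaw$res <- shaw$insurance - fitted(first)   # first-stage residuals
# Second stage: utilization on insurance, confounders and the residual smooth.
second <- gam(utilization ~ insurance + alcohol + marital + health + sex +
                smoking + s(age) + s(bmi) + s(income) + s(res),
              family = binomial, data = shaw)
coef(second)["insurance"]   # adjusted effect of private insurance
```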
## 8 Conclusions

The unobservable confounding issue is likely to affect the majority of observational studies in which the researcher is interested in evaluating the effect of one or more predictors of interest on a response variable. When unmeasured confounding is not controlled for, any standard estimation method will yield biased and inconsistent parameter estimates. The IV approach represents a valid means to account for unmeasured confounding. This technique, first proposed in econometrics, has only recently received some attention in the applied statistical literature.

We have proposed a flexible procedure to carry out IV analysis within the GAM context, together with a Bayesian interval correction procedure for 2SGAM. Our proposal is backed up with an extensive simulation experiment whose results confirmed that 2SGAM represents a flexible, theoretically sound means of obtaining consistent curve/parameter estimates in the presence of unmeasured confounding. In simulation, the resulting intervals performed well in terms of coverage probabilities.

The major drawback in all IV methods (including ours) is the difficulty in choosing an appropriate instrument. As a rule of thumb, IV methods should be used if the instruments are believed to satisfy the IV assumptions. Given that not all IV assumptions can be tested empirically, logical arguments must be presented to justify the instrument choice. However, statistical analysis using an imperfect instrument can still help in providing insights into the possible effect that unmeasured confounding has on the estimated relationship of interest.

## Acknowledgements

We would like to thank Simon N. Wood, whose suggestions have led to the Bayesian interval correction procedure of Section 5, and David Lawrence Miller, the Associate Editor and one anonymous reviewer for helpful comments that have improved the presentation of the article.
## References
[1] Ai C and Chen X (2003) Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71, 1795–1843.

[2] Amemiya T (1974) The nonlinear two-stage least-squares estimator. Journal of Econometrics, 2, 105–110.

[3] Becher H (1992) The concept of residual confounding in regression models and some applications. Statistics in Medicine, 11, 1747–1758.

[4] Beck CA, Penrod J, Gyorkos TW, Shapiro S and Pilote L (2003) Does aggressive care following acute myocardial infarction reduce mortality? Analysis with instrumental variables to compare effectiveness in Canadian and United States patient populations. Health Services Research, 38, 1423–1440.

[5] Bound J, Jaeger DA and Baker RM (1995) Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90, 443–450.

[6] Brugiavini A, Jappelli T and Weber G (2002) The survey of health, aging and wealth. Università di Salerno, Italy.

[7] Buchmueller TC, Grumbach K, Kronick R and Kahn JG (2005) Book review: The effect of health insurance on medical care utilization and implications for insurance expansion: A review of the literature. Medical Care Research and Review, 62, 3–30.

[8] Das M (2005) Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics, 124, 335–361.

[9] Frosini BV (2006) Causality and causal models: A conceptual perspective. International Statistical Review, 74, 305–334.

[10] Gentle JE (2003) Random Number Generation and Monte Carlo Methods. New York: Springer-Verlag.

[11] Greenland S (2000) An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology, 29, 722–729.

[12] Gu C (1992) Penalized likelihood regression: A Bayesian analysis. Statistica Sinica, 2, 255–264.

[13] Gu C (2002) Smoothing Spline ANOVA Models. London: Springer-Verlag.

[14] Gu C and Wahba G (1993) Smoothing spline ANOVA with component-wise Bayesian confidence intervals. Journal of Computational and Graphical Statistics, 2, 97–117.

[15] Guisan A, Edwards TC and Hastie T (2002) Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecological Modelling, 157, 89–100.

[16] Hall P and Horowitz JL (2005) Nonparametric methods for inference in the presence of instrumental variables. The Annals of Statistics, 33, 2904–2929.

[17] Harmon C and Nolan B (2001) Health insurance and health services utilization in Ireland. Health Economics, 10, 135–145.

[18] Hastie T and Tibshirani R (1990) Generalized Additive Models. London: Chapman & Hall.

[19] Hausman JA (1978) Specification tests in econometrics. Econometrica, 46, 1251–1271.

[20] Hausman JA (1983) Specification and estimation of simultaneous equations models. In Griliches Z and Intriligator MD, eds, Handbook of Econometrics. Amsterdam: North Holland, 391–448.

[21] Heckman J (1978) Dummy endogenous variables in a simultaneous equation system. Econometrica, 46, 931–959.

[22] Hofter RH (2006) Private health insurance and utilization of health services in Chile. Applied Economics, 38, 423–439.

[23] Johnston KM, Gustafson P, Levy AR and Grootendorst P (2008) Use of instrumental variables in the analysis of generalized linear models in the presence of unmeasured confounding with applications to epidemiological research. Statistics in Medicine, 27, 1539–1556.

[24] Kauermann G, Krivobokova T and Fahrmeir L (2009) Some asymptotic results on generalized penalized spline smoothing. Journal of the Royal Statistical Society Series B, 71, 487–503.

[25] Leigh JP and Schembri M (2004) Instrumental variables technique: Cigarette price provided better estimate of effects of smoking on SF-12. Journal of Clinical Epidemiology, 57, 284–293.

[26] Linden A and Adams JL (2006) Evaluating disease management programme effectiveness: an introduction to instrumental variables. Journal of Evaluation in Clinical Practice, 12, 148–154.

[27] Maddala G (1983) Limited-Dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press.

[28] Marra G and Radice R (2010) Penalised regression splines: Theory and application to medical research. Statistical Methods in Medical Research, 19, 107–125.

[29] McCullagh P and Nelder JA (1989) Generalized Linear Models. London: Chapman & Hall.

[30] Newey WK and Powell JL (2003) Instrumental variable estimation of nonparametric models. Econometrica, 71, 1565–1578.

[31] Nychka D (1988) Bayesian confidence intervals for smoothing splines. Journal of the American Statistical Association, 83, 1134–1143.

[32] Reidpath DD, Crawford D, Tilgner L and Gibbons C (2002) Relationship between body mass index and the use of healthcare services in Australia. Obesity Research, 10, 526–531.

[33] Ruppert D, Wand MP and Carroll RJ (2003) Semiparametric Regression. London: Cambridge University Press.

[34] Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. Journal of the Royal Statistical Society Series B, 47, 1–52.

[35] Staiger D and Stock JH (1997) Instrumental variables regression with weak instruments. Econometrica, 65, 557–586.

[36] Terza JV, Basu A and Rathouz PJ (2008) Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics, 27, 531–543.

[37] Wahba G (1983) Bayesian 'confidence intervals' for the cross-validated smoothing spline. Journal of the Royal Statistical Society Series B, 45, 133–150.

[38] Wood SN (2006) Generalized Additive Models: An Introduction with R. London: Chapman & Hall.

[39] Wood SN (2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society Series B, 70, 495–518.

[40] Wooldridge JM (2002) Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.