
Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders
Author(s): James M. Robins, Steven D. Mark and Whitney K. Newey
Source: Biometrics, Vol. 48, No. 2 (Jun., 1992), pp. 479-495
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2532304 .
Accessed: 25/06/2014 06:30


This content downloaded from 62.122.73.86 on Wed, 25 Jun 2014 06:30:31 AM


All use subject to JSTOR Terms and Conditions
BIOMETRICS 48, 479-495
June 1992

Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders

James M. Robins and Steven D. Mark
Harvard School of Public Health, 665 Huntington Avenue,
Boston, Massachusetts 02115, U.S.A.

and

Whitney K. Newey
Department of Economics, Massachusetts Institute of Technology,
Cambridge, Massachusetts 02139, U.S.A.

SUMMARY

In order to estimate the causal effects of one or more exposures or treatments on an outcome of interest, one has to account for the effect of "confounding factors" which both covary with the exposures or treatments and are independent predictors of the outcome. In this paper we present regression methods which, in contrast to standard methods, adjust for the confounding effect of multiple continuous or discrete covariates by modelling the conditional expectation of the exposures or treatments given the confounders. In the special case of a univariate dichotomous exposure or treatment, this conditional expectation is identical to what Rosenbaum and Rubin have called the propensity score. They have also proposed methods to estimate causal effects by modelling the propensity score. Our methods generalize those of Rosenbaum and Rubin in several ways. First, our approach straightforwardly allows for multivariate exposures or treatments, each of which may be continuous, ordinal, or discrete. Second, even in the case of a single dichotomous exposure, our approach does not require subclassification or matching on the propensity score so that the potential for "residual confounding," i.e., bias, due to incomplete matching is avoided. Third, our approach allows a rather general formalization of the idea that it is better to use the "estimated propensity score" than the true propensity score even when the true score is known. The additional power of our approach derives from the fact that we assume the causal effects of the exposures or treatments can be described by the parametric component of a semiparametric regression model. To illustrate our methods, we reanalyze the effect of current cigarette smoking on the level of forced expiratory volume in one second in a cohort of 2,713 adult white males. We compare the results with those obtained using standard methods.

1. Introduction

1.1 The Problem

In order to estimate the causal effect of one or more exposures or treatments on an outcome of interest, one has to account for the effect of "confounding factors" which both covary with the exposures or treatments and are independent predictors of the outcome. If few in number, categorical confounding factors are commonly dealt with by stratification. When there are many confounding factors or when some of the factors are continuous, regression methods are used. In this paper we present regression methods which, in contrast to standard methods, adjust for confounding by modelling aspects of the marginal association of the exposures of interest with the confounders rather than by modelling the independent

Key words: Causal inference; Covariance adjustment; Epidemiologic methods; Propensity score; Semiparametric efficiency; Semiparametric regression.

association of the confounders with the outcome. Specifically, we will model the conditional expectation of the exposures given the confounders. These methods of estimation will be particularly useful when prior knowledge regarding the association of the confounders with exposure status is more precise than knowledge regarding their association with the outcome.

For concreteness, we shall attempt to estimate the effect of being a current cigarette smoker on the level of forced expiratory volume in one second (FEV1) in a cohort of 2,713 adult white male former and current cigarette smokers from the initial cross-sectional data collected in the Harvard Six Cities Study (Dockery et al., 1988). We shall estimate this effect while adjusting for the presence of the 22 potential confounding factors listed in Table 1 that include past smoking history, past respiratory symptoms, age, height, and coexistent heart disease. In this example the exposure of interest is dichotomous and we assume that there is no interaction between that exposure and the confounders. That is, we assume that the absolute effect of current smoking on FEV1 does not depend on a subject's age, weight, previous smoking history, etc. In this setting the most common approach to estimating the effect of current smoking on FEV1 would be to postulate a linear regression model
$$Y_i = \beta_1 + \beta S_i + \sum_{k=2}^{K} \beta_k X_{k,i} + \epsilon_i, \qquad E[\epsilon_i \mid S_i, X_i] = 0, \tag{1}$$

where $Y_i$, $S_i$, $X_i = (X_{2,i}, \ldots, X_{K,i})$ are respectively random variables representing subject $i$'s FEV1 level, current smoking status ($S_i = 1$ if a current smoker and $S_i = 0$ otherwise), and values on a vector $X_i$ of potential confounding factors. Note that the parameter of interest, $\beta$, is distinguished from the "nuisance" parameters $(\beta_1, \ldots, \beta_K)$ by the absence of a subscript. For notational simplicity, we shall assume that $(Y_i, S_i, X_i)$ are independent and identically distributed random vectors, although, with minor modifications, our results will hold if the $X_i$ are fixed constants and the $(\epsilon_i, S_i)$ are independent across subjects.

Define $\sigma^2(S, X) = \mathrm{var}[\epsilon_i \mid S, X]$. We write $\sigma^2(S, X) = \sigma^2$ if the errors $\epsilon_i$ are homoscedastic. Unless stated otherwise, we shall assume homoscedastic errors, although we do not assume that this fact is known to the data analyst. The $\epsilon_i$ are not assumed to be independent of the $(S_i, X_i)$.

Suppose we are unwilling to assume that the independent association of the confounders $X_i$ with the outcome $Y_i$ has a known functional form. In that case, we would generalize model (1) to
$$Y_i = \beta S_i + h(X_i) + \epsilon_i, \qquad E[\epsilon_i \mid S_i, X_i] = 0, \tag{2}$$
where $h(X_i)$ is an unknown real-valued function of the vector $X_i$. Model (2) has a semiparametric regression function with parametric component $\beta S_i$ and nonparametric component $h(X_i)$. This paper is concerned with the estimation of $\beta$ from model (2). Robinson (1988) has provided an asymptotically normal and unbiased estimator of $\beta$ under a large-sample limiting model in which the number of confounding factors remains fixed as the sample size grows. His estimator relies on the fact that, under such a limiting model, the unknown function $h(X_i)$ can be consistently estimated by nonparametric regression techniques. In epidemiologic research, the number of confounding variables can be quite large. In these instances, the more appropriate limiting model would be one in which we allowed the number of confounding factors contained in $X_i$ to increase with the sample size (Huber, 1981).

Table 1
Twenty-two potential confounders of the effect of current smoking on FEV1

Age                                       History of emphysema
Age-squared                               Past history of asthma
Height                                    Current asthma
Body mass index                           Former cigar smoker
Chronic cough                             Current cigar smoker level = hi
Recurrent bouts of coughing               Current cigar smoker level = medium
History of treatment for heart disease    Current cigar smoker level = lo
Chronic phlegm production                 Former pipe smoker
Chronic wheeze                            Current pipe smoker level = hi
Total years of cigarette smoking          Current pipe smoker level = medium
Lifetime pack-years smoked                Current pipe smoker level = lo
It is difficult to generalize Robinson's approach based on nonparametric estimation of $h(X_i)$ when the dimension of $X_i$ is large. As a consequence, to obtain consistent estimators of $\beta$, we shall consider making additional a priori assumptions beyond those specified by model (2). The standard approach would be to assume that $h(X_i)$ is known a priori except for a finite number of unknown parameters. As an example, the linear regression model (1) assumes that

$$h(X_i) = \beta_1 + \sum_{k=2}^{K} \beta_k X_{k,i}.$$
In contrast to the standard approach, in this paper we shall suppose that prior information concerning the marginal association of $S_i$ with $X_i$ is sharper than that concerning the form of $h(X_i)$. Thus we shall leave $h(X_i)$ completely unspecified and instead specify parametric models for the marginal association of $S_i$ and $X_i$. Specifically, we shall consider parametric models for $E(S \mid X_i) = p(S = 1 \mid X_i)$ such as the logistic regression model

$$p[S = 1 \mid X_i; \alpha] = \frac{\exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k X_{k,i}\right)}{1 + \exp\left(\alpha_1 + \sum_{k=2}^{K} \alpha_k X_{k,i}\right)}, \tag{3}$$

where $\alpha = (\alpha_1, \ldots, \alpha_K)$. We shall show that we can obtain asymptotically normal and unbiased estimators of $\beta$ in model (2) provided our model (3) for $p(S = 1 \mid X_i)$ is correctly specified.
Although correctly specified parametric models for either $h(X_i)$ or $p(S = 1 \mid X_i)$ will provide asymptotically normal and unbiased estimates of $\beta$, nonetheless, as discussed in the next paragraph, least squares estimators of $\beta$ based on models for $h(X_i)$ will always be at least as efficient as any estimator of $\beta$ based on models for $p(S = 1 \mid X_i)$. This suggests that, for reasons of efficiency, it is always preferable to model $h(X_i)$ rather than $p(S = 1 \mid X_i)$. But if, as we assume in this paper, our prior information concerning $h(X_i)$ is less sharp than that concerning $p(S = 1 \mid X_i)$, we would choose not to model $h(X_i)$ in order to protect against specification bias.
In order to explain why the ordinary least squares estimator of $\beta$ based on a correctly specified model for $h(X_i)$ is always at least as efficient as any estimator of $\beta$ based on models for $E[S \mid X]$, we need to review some results from the theory of semiparametric efficiency bounds derived by Chamberlain (1987; and Discussion Paper 1494, Harvard Institute of Economic Research, 1990) and exposited by Newey (1990). For the moment suppose again that, as in equation (1), we were able to correctly specify a parametric model, say, $q(X_i; \theta)$ for $h(X_i)$ depending on a parameter vector $\theta$. In equation (1), $\theta = (\beta_1, \ldots, \beta_K)$. Chamberlain (1987) showed that the estimator of $\beta$ obtained by fitting the model $Y_i = \beta S_i + q(X_i; \theta) + \epsilon_i$ by unweighted, possibly nonlinear, least squares is the most efficient possible estimator of $\beta$ that is guaranteed to be asymptotically normal and unbiased under the sole prior restrictions that $E[\epsilon_i \mid S_i, X_i] = 0$ and $h(X_i) = q(X_i; \theta)$. [If, as in equation (1), $q(X_i; \theta)$ is linear in $\theta$, we fit using ordinary least squares. Otherwise, we fit using nonlinear least squares.] Therefore if, as in model (2), we are unwilling to specify a parametric form for $h(X_i)$ and yet want our estimator of $\beta$ to be asymptotically normal and unbiased whatever be $h(X_i)$, the asymptotic variance of any such estimator clearly cannot be less than the supremum of the asymptotic variances of the least squares estimators of $\beta$ taken over the set of all possible parametric models for $h(X_i)$. This supremum is called the semiparametric efficiency bound for an estimator of $\beta$ under model (2) (Bickel et al., 1992) and was shown by Chamberlain (discussion paper cited previously) to equal $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$, where $n$ is the sample size.
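As a worked illustration added for the reader (it is not part of the original derivation), the bound simplifies when $S$ is dichotomous, because then $\mathrm{var}(S \mid X) = p(X)[1 - p(X)]$ with $p(X) = p(S = 1 \mid X)$. The no-confounder case $p(X) \equiv \pi$ is an assumption made here only to give a familiar reference point.

```latex
\frac{n^{-1}\sigma^2}{E[\mathrm{var}(S \mid X)]}
  = \frac{\sigma^2}{n\,E\{p(X)[1 - p(X)]\}},
\qquad
p(X) \equiv \pi \;\Longrightarrow\;
\frac{\sigma^2}{n\,\pi(1 - \pi)}
  = \frac{\sigma^2}{n\pi} + \frac{\sigma^2}{n(1 - \pi)} .
```

In that special case the bound is the usual large-sample variance of a difference between two group means with common error variance $\sigma^2$. Because $p(1 - p) \le 1/4$, the bound can never fall below $4\sigma^2/n$.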
Thus, if we are able to correctly specify a parametric model for $h(X_i)$, the least squares estimator of $\beta$ always has variance no greater than the efficiency bound $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$. In contrast, if under model (2), we are unable to specify a parametric model for $h(X_i)$, but instead correctly specify a model for $E[S \mid X]$, no estimator that is asymptotically unbiased for $\beta$ for all $h(X_i)$ can have variance less than the bound $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$. This is a consequence of the fact that $\{(S_i, X_i),\ i \in (1, \ldots, n)\}$ is ancillary for $\beta$ under model (2) (Cox and Hinkley, 1974) and, as discussed by Newey (1990), knowledge concerning the marginal distribution of an ancillary statistic does not affect the semiparametric efficiency bound for the estimation of $\beta$.
It needs to be stressed that, even when we can obtain a consistent estimator of $\beta$ in model (2), it does not follow that the parameter $\beta$ can be interpreted as the causal effect of current cigarette smoking on FEV1. We now describe conditions under which $\beta$ does have a causal interpretation.

1.2 A Causal Model


Following Rubin (1978), let $Y_{S=1,i}$ be subject $i$'s FEV1 had subject $i$ been a current smoker. If subject $i$ is a current smoker in the actual study, then $Y_{S=1,i}$ equals his observed FEV1 $Y_i$. If subject $i$ is not a current smoker, $Y_{S=1,i}$ is missing. Similarly, $Y_{S=0,i}$ is subject $i$'s FEV1 if subject $i$ were, possibly contrary to fact, a current nonsmoker. Rubin defined the average causal effect of current smoking among subjects with observed covariates level $X_i$ to be $E[Y_{S=1} \mid X_i] - E[Y_{S=0} \mid X_i]$. Now, under our model (2), we know that $E[Y \mid X_i, S = 1] - E[Y \mid X_i, S = 0] = \beta$ since $E[Y \mid X_i, S = 1] = \beta + h(X_i)$ and $E[Y \mid X_i, S = 0] = h(X_i)$.

Thus a sufficient condition for $\beta$ to equal the average causal effect of current smoking at each level $X_i$ is that, for each $X_i$,

$$E[Y_{S=s} \mid X_i] = E[Y \mid X_i, S = s], \qquad s \in \{0, 1\}. \tag{4a}$$

Under Rubin's causal model, equation (4a) is equivalent to

$$E[Y_{S=s} \mid X_i] = E[Y_{S=s} \mid X_i, S = s]. \tag{4b}$$
We shall assume that equation (4b) holds and thus $\beta$ has a causal interpretation when $X_i$ is the vector of 22 potential confounding variables described above. The assumption that equation (4b) holds is nonidentifiable in the sense that it is compatible with any joint distribution for the observable random variables $(S_i, X_i, Y_i)$. When equation (4b) holds, we shall call model (2) a semiparametric causal regression model. Equation (4b) says that, conditional on the joint level of the 22 potential independent risk factors $X_i$, the mean of $Y_{S=s}$ among subjects who actually receive treatment $S = 1$ equals that among subjects who actually receive treatment $S = 0$. We do not assume that equation (4b) holds when $X_i$ is a proper subset of the 22 potential confounding variables.

The mathematical results in this paper are concerned only with the estimation of $\beta$ in model (2) and do not depend on whether equation (4b) holds. Of course, in general, we are interested in the estimation of $\beta$ only when we believe it has a causal interpretation.

1.3 Relationship to the Propensity Score
Rosenbaum and Rubin (1983, 1984, 1985) and Rosenbaum (1984, 1987, 1988) have also considered estimating the causal effect of a dichotomous treatment such as $S_i$ on an outcome $Y_i$ by modelling $p(S = 1 \mid X_i)$ when equation (4) holds. These authors call $p[S = 1 \mid X_i]$ the propensity score. In contrast to their approach, our approach straightforwardly allows the treatment or exposure $S_i$ to be continuous or ordinal rather than simply dichotomous. Furthermore, as discussed in the Appendix, our approach allows $S_i$ to be multivariate so that we can, say, estimate the independent effects of current cigarette smoking and past cigarette smoking. In addition, our "regression" approach does not require subclassification or matching on the propensity score $p[S = 1 \mid X_i]$ even when $X_i$ has continuous components so that the potential for "residual" confounding, i.e., bias, due to the fact that one has not precisely matched on $p[S = 1 \mid X_i]$ is avoided. The additional power of our approach derives from the fact that we assume the causal effect of exposure can be described by the parametric component of a semiparametric causal regression model such as model (2).
Rosenbaum (1984, 1988) also considered specifying causal models to avoid the need to match or subclassify on the propensity score. In general, Rosenbaum is concerned with small-sample (exact) rather than large-sample (asymptotic) inference. As a consequence, his causal models tend to be even more restrictive than model (2). Specifically, he assumes a constant treatment effect model (that is, $Y_{S=1,i} = \beta + Y_{S=0,i}$ for all subjects $i$), although his results would still hold under the weaker assumption that the distributions of $Y_{S=1,i}$ and $Y_{S=0,i}$ differed by a "shift" parameter $\beta$. Furthermore, as he points out, his "exact" methods do not allow one to adjust for the confounding effects of continuous covariates.
Finally, as discussed in Section 2, our approach allows a rather general formalization of the idea that it is better to use the "estimated" propensity score than the "true" propensity score even when the true score is known (Rosenbaum, 1987).

2. Estimators Based on Models for the Conditional Expectation of Exposure Given Confounders

2.1 An Infeasible Estimator
In this section, we consider estimators of $\beta$ under model (2) when we can specify accurate models for $E(S \mid X_i)$. Note that when $S_i$ is dichotomous, models for $E(S \mid X_i)$ are models for $p(S = 1 \mid X_i)$. Initially, for pedagogic purposes, we shall assume that we know $E(S \mid X_i)$ exactly. That is, we assume exact prior knowledge of the expected value of $S$ for every combination of the confounders $X_i$. Subsequently we make the more tenable assumption that we know $E(S \mid X_i)$ up to a finite vector of unknown parameters. We allow $\sigma^2(S, X)$ to depend on $(S, X)$. Henceforth, we adopt the following notational convention: $\beta$ will refer to the true but unknown value of the coefficient of $S_i$ in model (2); $\beta^{\dagger}$ will refer to any hypothesized, possibly incorrect, value for $\beta$.

The estimator we shall consider, which we call the E-estimator,

$$\tilde{\beta}_E = \frac{\sum_{i=1}^{n} Y_i [S_i - E(S \mid X_i)]}{\sum_{i=1}^{n} S_i [S_i - E(S \mid X_i)]}, \tag{5}$$

is based on a suggestion by Newey (1990).
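The ratio in equation (5) can be tried out on simulated data. The following is a minimal sketch, not the authors' code: the data-generating process, the function names `e_estimator_known` and `simulate`, and the particular nonlinear $h(X)$ are all hypothetical choices made for illustration, with the true $E(S \mid X_i)$ treated as known, as the text assumes at this stage.

```python
import numpy as np

def e_estimator_known(y, s, e_s_given_x):
    """Equation (5): ratio of sums using the known E(S | X_i)."""
    resid = s - e_s_given_x
    return np.sum(y * resid) / np.sum(s * resid)

def simulate(n, beta, rng):
    """Hypothetical data from model (2) with a deliberately nonlinear h(X)."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-0.5 * x))      # true p(S = 1 | X), assumed known
    s = rng.binomial(1, p).astype(float)
    h = np.sin(x) + x ** 2                   # never modelled by the estimator
    y = beta * s + h + rng.normal(size=n)
    return y, s, p

rng = np.random.default_rng(0)
y, s, p = simulate(20_000, beta=-0.5, rng=rng)
beta_tilde = e_estimator_known(y, s, p)
print(beta_tilde)  # expected to lie near the simulated beta of -0.5
```

The estimator never touches $h(X)$: only the outcome, the exposure, and the exposure's conditional expectation enter the ratio.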


It is shown in Theorem A.1 in the Appendix that $\tilde{\beta}_E$ has a limiting normal distribution with mean $\beta$.

The consistency of $\tilde{\beta}_E$ is based on the fact that model (2) implies that

$$E[z_i \mid X_i, S_i] = E[z_i \mid X_i], \tag{6}$$

where $z_i = Y_i - S_i\beta$. In the proof of Theorem A.1 in the Appendix, it is shown that equation (6) implies the identity

$$E[U(\beta)] = 0, \tag{7}$$

where, for any $\beta^{\dagger}$, $U(\beta^{\dagger}) = \sum_{i} (Y_i - S_i\beta^{\dagger})(S_i - E(S \mid X_i))$. The E-estimator $\tilde{\beta}_E$ is the solution $\beta^{\dagger}$ to the unbiased estimating equation $U(\beta^{\dagger}) = 0$.
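The step from equation (6) to equation (7) can be outlined for a single summand by iterated expectations. This outline is added for the reader; the formal argument remains the one in Theorem A.1. Write $U_i(\beta) = z_i[S_i - E(S \mid X_i)]$.

```latex
\begin{aligned}
E[U_i(\beta)]
  &= E\bigl\{E[z_i \mid X_i, S_i]\,[S_i - E(S \mid X_i)]\bigr\} \\
  &= E\bigl\{E[z_i \mid X_i]\,[S_i - E(S \mid X_i)]\bigr\}
     && \text{by (6)} \\
  &= E\bigl\{E[z_i \mid X_i]\;E[S_i - E(S \mid X_i) \mid X_i]\bigr\} = 0 .
\end{aligned}
```

The last expectation vanishes because $S_i - E(S \mid X_i)$ has conditional mean zero given $X_i$, whatever the form of $h(X_i)$.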

2.2 A Feasible Estimator

Of course, the estimator $\tilde{\beta}_E$ is not feasible since, in practice, $E(S \mid X_i)$ is unknown. We can overcome this difficulty if we assume a priori that the logistic regression model equation (3) holds. We then estimate $E(S \mid X_i)$ by logistic regression and subsequently estimate $\beta$ by

$$\hat{\beta}_E = \frac{\sum_{i=1}^{n} Y_i [S_i - \hat{E}(S \mid X_i)]}{\sum_{i=1}^{n} S_i [S_i - \hat{E}(S \mid X_i)]}, \tag{8}$$

where $\hat{E}(S \mid X_i)$ is the fitted value $\hat{p}_i \equiv p[S = 1 \mid X_i; \hat{\alpha}]$ of $p[S = 1 \mid X_i]$, and $\hat{\alpha}$ is the maximum likelihood estimator of $\alpha$ from the logistic regression. Note that we use the symbol $\hat{\beta}_E$ rather than $\tilde{\beta}_E$ to represent the feasible estimator of equation (8).
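The two steps behind equation (8) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Newton-Raphson routine `fit_logistic`, the helper `e_estimator`, and the simulated data (one covariate, a nonlinear $h$) are all hypothetical.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25):
    """Newton-Raphson maximum likelihood for model (3).

    X must carry a leading column of ones for the intercept alpha_1.
    """
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ alpha)))
        score = X.T @ (s - p)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        alpha = alpha + np.linalg.solve(info, score)
    p_hat = 1.0 / (1.0 + np.exp(-(X @ alpha)))
    return alpha, p_hat

def e_estimator(y, s, p_hat):
    """Equation (8): equation (5) with E(S | X_i) replaced by its fitted value."""
    resid = s - p_hat
    return np.sum(y * resid) / np.sum(s * resid)

# Hypothetical simulated data: logistic exposure model, nonlinear h(X).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_alpha = np.array([-0.3, 0.8])
s = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_alpha)))).astype(float)
y = -0.5 * s + np.sin(x) + x ** 2 + rng.normal(size=n)

alpha_hat, p_hat = fit_logistic(X, s)
beta_hat = e_estimator(y, s, p_hat)
print(alpha_hat, beta_hat)
```

Any routine that returns maximum likelihood fitted probabilities could stand in for `fit_logistic`; a hand-written Newton-Raphson loop is used here only to keep the sketch free of extra dependencies.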
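As shown in Theorem A.1 in the Appendix, it follows from Pierce (1982) and Newey (1990) that when the logistic model of equation (3) is true, $\hat{\beta}_E$ is asymptotically normal and unbiased and its asymptotic covariance matrix can be consistently estimated by

$$\mathrm{var}^{\mathrm{est}}(\hat{\beta}_E) = \mathrm{var}^{\mathrm{est}}(\tilde{\beta}_E) - \hat{Q}\,[\mathrm{var}^{\mathrm{est}}(\hat{\alpha})]\,\hat{Q}^{T}, \tag{9}$$

where

$$\mathrm{var}^{\mathrm{est}}(\tilde{\beta}_E) = \frac{\sum_{i=1}^{n} \hat{z}_i^{2} (S_i - \hat{p}_i)^{2}}{\left[\sum_{i=1}^{n} S_i (S_i - \hat{p}_i)\right]^{2}} \tag{10}$$

with $\hat{z}_i = Y_i - \hat{\beta}_E S_i$, $\hat{Q}^{T}$ is the $K$-vector with $j$th component

$$\hat{Q}_j = \frac{\sum_{i=1}^{n} \hat{z}_i\, \hat{p}_i (1 - \hat{p}_i) X_{j,i}}{\sum_{i=1}^{n} S_i (S_i - \hat{p}_i)}$$

(where we define $X_{j,i} = 1$ when $j = 1$), and $\mathrm{var}^{\mathrm{est}}(\hat{\alpha})$ is the estimated covariance matrix (i.e., the inverse of the observed information matrix) from the fit of the logistic model equation (3). The observed information matrix has $(j, k)$ entry $\sum_{i=1}^{n} \hat{p}_i (1 - \hat{p}_i) X_{j,i} X_{k,i}$. The estimator $\mathrm{var}^{\mathrm{est}}(\hat{\beta}_E)$ is not guaranteed to be positive-definite. A positive-definite consistent variance estimator is obtained by replacing $\hat{p}_i(1 - \hat{p}_i)$ by $(S_i - \hat{p}_i)^2$ both in the numerator of $\hat{Q}_j$ and in the observed information matrix.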
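Equations (9) and (10) translate into a few lines of array code. The sketch below is an illustration under assumptions rather than the authors' program: the function name `e_estimator_variance` is hypothetical, and the demonstration uses the constant-only exposure model, for which the logistic maximum likelihood fit is simply the sample proportion of exposed subjects, so no iterative fitting is needed.

```python
import numpy as np

def e_estimator_variance(y, s, X, p_hat):
    """Return (beta_hat, var_tilde, var_hat) following equations (8)-(10).

    X is the n x K design matrix of the logistic model (first column ones);
    p_hat holds the fitted values from that model.
    """
    resid = s - p_hat
    denom = np.sum(s * resid)
    beta_hat = np.sum(y * resid) / denom
    z_hat = y - beta_hat * s
    var_tilde = np.sum(z_hat ** 2 * resid ** 2) / denom ** 2   # equation (10)
    w = p_hat * (1.0 - p_hat)
    Q = (X.T @ (z_hat * w)) / denom                            # K-vector Q_j
    info = X.T @ (X * w[:, None])                              # observed information
    var_hat = var_tilde - Q @ np.linalg.solve(info, Q)         # equation (9)
    return beta_hat, var_tilde, var_hat

# Hypothetical data with no confounding: constant-only exposure model.
rng = np.random.default_rng(0)
n = 5_000
s = rng.binomial(1, 0.4, size=n).astype(float)
y = 2.0 - 0.5 * s + rng.normal(size=n)          # unit error variance
X = np.ones((n, 1))
p_hat = np.full(n, s.mean())                     # logistic MLE, intercept only

beta_hat, var_tilde, var_hat = e_estimator_variance(y, s, X, p_hat)
print(beta_hat, var_tilde, var_hat)
```

In this constant-only setting the corrected variance is close to the textbook value $\sigma^2/[n\hat{p}(1 - \hat{p})]$, while the uncorrected `var_tilde` is several times larger, the same qualitative pattern the text attributes to estimating rather than knowing the propensity score.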
Even though $\tilde{\beta}_E$ is infeasible when $p[S = 1 \mid X]$ is unknown, $\mathrm{var}^{\mathrm{est}}(\tilde{\beta}_E)$ is still a feasible consistent estimator of its asymptotic variance. Therefore it follows from equation (9) that one generates a more precise estimate of $\beta$ by estimating the propensity score $E(S \mid X_i)$ than by using the true population value of the propensity score even were the latter known. That is, $\mathrm{var}^{\mathrm{est}}(\hat{\beta}_E)$ is always less than or equal to $\mathrm{var}^{\mathrm{est}}(\tilde{\beta}_E)$. As discussed in the Appendix, this result depends on the fact that $\hat{\alpha}$ is an efficient estimator of $\alpha$. The preference for $\hat{\beta}_E$ compared to $\tilde{\beta}_E$ when the parameter $\alpha$ of model (3) is known can also be viewed in terms of conditional bias. Specifically, it can be shown that, conditional on the ancillary statistic $[\mathrm{var}^{\mathrm{est}}(\hat{\alpha})]^{-1/2}(\hat{\alpha} - \alpha)$, $\tilde{\beta}_E$ becomes asymptotically biased while $\hat{\beta}_E$ remains asymptotically unbiased (Robins and Morgenstern, 1987; Rosenbaum, 1987; Efron and Hinkley, 1978).
In Table 2 we present four different estimates $\hat{\beta}_E$ of $\beta$ based on specifying four different logistic regression models for $p[S = 1 \mid X_i]$. In the first analysis in Table 2, we assume no confounding. That is, we fit only a constant term $\alpha_1$ in equation (3). In the second analysis $X_i$ in equation (3) is the single binary covariate: history of chronic cough. In the third analysis $X_i$ is the single continuous covariate: lifetime number of pack-years. In the fourth analysis $X_i$ in equation (3) is the 22-vector of potential confounders. The striking efficiency advantage attributable to estimating the propensity score $p[S = 1 \mid X_i]$ can be seen by comparing $\mathrm{var}^{\mathrm{est}}(\hat{\beta}_E)$ to $\mathrm{var}^{\mathrm{est}}(\tilde{\beta}_E)$ in Table 2.

Table 2
Estimates $\hat{\beta}_E$ under four different specifications for $p[S = 1 \mid X_i]$

          Covariates X_{k,i} included in logistic                 var^est(\hat{\beta}_E)   var^est(\tilde{\beta}_E)
Analysis  model of equation (3) for p[S = 1 | X_i]   \hat{\beta}_E   x 10^-4                  x 10^-4

(1)       Constant term only                          -.0580          9.49                     159.0
(2)       Constant, chronic cough (Yes, No)            .0429          9.36                     167.0
(3)       Constant, pack-years of smoking              .0520          7.45                     157.9
(4)       Constant, 22 covariates in Table 1          -.1133          8.82                     332.6
Under the assumptions that (a) the coefficient $\beta$ in equation (2) has a causal interpretation [i.e., equation (4b) holds] when $X_i$ is the 22-vector of confounders and (b) the model for $p[S = 1 \mid X_i]$ used in analysis (4) is true, analysis (4) provides a consistent estimator of this causal $\beta$. Therefore, we estimate that current smoking causes a decrease of .1133 liter in FEV1. A 95% confidence interval for $\beta$ is $-.113 \pm (1.96)(.00088)^{1/2} = (-.170, -.056)$.

Under assumptions (a) and (b), we now provide sufficient conditions for the simpler analyses (1)-(3) also to provide consistent estimators of the "causal" $\beta$ associated with model (2) with $X_i$ the 22-vector of covariates.

We shall restrict attention to analysis (3) since the conditions for analyses (1) and (2) are similar. Let $X_{k^*}$ be the covariate "lifetime number of pack-years" used in analysis (3). $\hat{\beta}_E$ from analysis (3) will be consistent for the causal $\beta$ if either of the following is true:
Sufficient condition (1): With $X_i$ the 22-vector of covariates, $\alpha_k = 0$ for the 21 covariates $X_k$ in the logistic model (3) other than $X_{k^*}$ (i.e., lifetime pack-years is the only predictor of current smoking among the 22 potential confounding factors).

Sufficient condition (2): The unknown function $h(X_i) = h(X_{2,i}, \ldots, X_{K,i})$, $K = 22$, is actually only a function of $X_{k^*,i}$ (i.e., lifetime pack-years is the only independent risk factor among the 22 potential confounding factors) and $p[S = 1 \mid X_{k^*,i}]$ follows a linear logistic model.

In general it would be unlikely that an investigator would be willing to assume that either of the above sufficient conditions held, and thus would tend to rely on analysis (4).
Suppose equation (4b) holds and consider the test of the null hypothesis $\beta = 0$ that rejects if $\hat{\beta}_E \pm 1.96[\mathrm{var}^{\mathrm{est}}(\hat{\beta}_E)]^{1/2}$ fails to include 0. Then except for the assumption that the model (3) for $p[S = 1 \mid X_i]$ is correctly specified, this test is an "otherwise asymptotically distribution-free" .05 $\alpha$-level test of the sharp null hypothesis of no causal effect of exposure, i.e., of the hypothesis $Y_{S=1,i} = Y_{S=0,i} = Y_i$ for all subjects $i$. Rosenbaum (1984, §4.2) proposes a test of this null hypothesis that will be "otherwise asymptotically distribution-free" under the condition that $(Y_{S=0,i}, Y_{S=1,i})$ and $S$ are conditionally independent given $X_i$.


3. Relationship of E-Estimators to Ordinary Least Squares
The ordinary least squares (OLS) estimator of $\beta$ in equation (1) can be written

$$\hat{\beta}_{OLS} = \frac{\sum Y_i (S_i - \hat{P}(S \mid X_i))}{\sum [S_i - \hat{P}(S \mid X_i)]^{2}}, \tag{11}$$

where summation signs without indexes will refer to sums over individuals and where $\hat{P}(S \mid X_i)$ is the fitted value from the OLS regression of $S$ on $X_i$ and the constant one. Now the right-hand side of equation (11) can be written as

$$\hat{\beta}_{OLS} = \frac{\sum Y_i (S_i - \hat{P}(S \mid X_i))}{\sum S_i (S_i - \hat{P}(S \mid X_i))}$$

using the fact that, for OLS, the empirical correlation of the fitted values and the residuals is zero. Now suppose we had modelled $E(S \mid X_i) = p[S = 1 \mid X_i]$ by the linear probability model $p[S = 1 \mid X_i; \alpha] = \alpha_1 + \sum_{k=2}^{K} \alpha_k X_{k,i}$ rather than by a logistic model, and we fit the linear probability model by least squares. Then $\hat{E}(S \mid X_i) = \hat{P}(S \mid X_i)$. Therefore, from its definition $\hat{\beta}_E = \hat{\beta}_{OLS}$. In the previous section we showed that $\hat{\beta}_E$ is consistent if our model for $E(S \mid X_i)$ is true. It follows, as pointed out by Newey (1990), that if, in truth, $E(S \mid X_i) = \alpha_1 + \sum_{k=2}^{K} \alpha_k X_{k,i}$ [i.e., $E(S \mid X_i)$ is linear in $X_i$], then $\hat{\beta}_{OLS}$ is consistent for $\beta$ even if $h(X_i)$ is nonlinear and thus equation (1) is false. Nonetheless if $h(X_i)$ is nonlinear, the estimate of the variance of $\hat{\beta}_{OLS}$ provided by standard software packages is inconsistent, and equation (9) must be used. If $E(S \mid X_i)$ is not linear in $X_i$, the ordinary least squares estimate of $\beta$ would, in general, be inconsistent if the unknown function $h(X_i)$ is, in truth, nonlinear in $X_i$.
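The algebraic identity behind equation (11) and its rewritten form can be checked numerically. The sketch below uses hypothetical simulated data and hypothetical helper names; it compares the coefficient of $S$ from a full least squares fit of equation (1) with the two ratio forms built from the residuals of a least squares linear probability fit of $S$ on $X$.

```python
import numpy as np

def ols_coef_on_s(y, s, x):
    """Coefficient of S in the OLS regression of Y on (1, X, S)."""
    design = np.column_stack([np.ones(len(y)), x, s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]

def lpm_residuals(s, x):
    """Residuals S_i - P_hat(S | X_i) from the OLS regression of S on (1, X)."""
    design = np.column_stack([np.ones(len(s)), x])
    coef, *_ = np.linalg.lstsq(design, s, rcond=None)
    return s - design @ coef

# Hypothetical data; the nonlinear cos(x) term plays the role of h(X).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
s = rng.binomial(1, 1.0 / (1.0 + np.exp(-x))).astype(float)
y = -0.5 * s + np.cos(x) + rng.normal(size=n)

beta_ols = ols_coef_on_s(y, s, x)
r = lpm_residuals(s, x)
form_11 = np.sum(y * r) / np.sum(r ** 2)      # equation (11)
form_e = np.sum(y * r) / np.sum(s * r)        # E-estimator form

print(beta_ols, form_11, form_e)  # the three numbers agree up to rounding
```

The agreement is exact in any sample, not only asymptotically, because least squares residuals are orthogonal to the fitted values.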
Table 3 shows $\hat{\beta}_{OLS}$ from the fit of equation (1) for the four choices of $X_i$ as in Table 2. Note that $\hat{\beta}_{OLS} = \hat{\beta}_E$ in analysis (2). This reflects the fact that when $X_i$ is a single dichotomous covariate, $E(S \mid X_i)$ is simultaneously linear and linear logistic, and $\hat{E}(S \mid X_i) = \hat{P}(S \mid X_i)$. For similar reasons $\hat{\beta}_{OLS} = \hat{\beta}_E$ in analysis (1). $\hat{\beta}_E$ from analysis (3) using the continuous variable "pack-years" is not identical to the OLS estimate since $\hat{E}(S \mid X_i) \neq \hat{P}(S \mid X_i)$. The fact that $\hat{\beta}_E$ and $\hat{\beta}_{OLS}$ are close can be explained by the near linearity of $\hat{E}(S \mid X_i)$ in our data, which can be checked by plotting $\hat{E}(S \mid X_i)$ versus $X_i$.
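The equality noted for analysis (2) can be reproduced on toy data. With an intercept and one binary covariate the exposure model is saturated, so the logistic fit and the least squares linear probability fit both return the within-stratum proportion of exposed subjects (a standard property of saturated models, stated here as background rather than taken from the paper). The data and function names in this sketch are hypothetical.

```python
import numpy as np

def stratum_proportions(s, x):
    """Fitted E(S | X) for binary x: proportion with S = 1 in each stratum."""
    p_hat = np.empty(len(s))
    for level in (0, 1):
        mask = x == level
        p_hat[mask] = s[mask].mean()
    return p_hat

def e_estimator(y, s, p_hat):
    """Equation (8) with the supplied fitted values."""
    resid = s - p_hat
    return np.sum(y * resid) / np.sum(s * resid)

def ols_coef_on_s(y, s, x):
    """Coefficient of S in the OLS regression of Y on (1, X, S)."""
    design = np.column_stack([np.ones(len(y)), x, s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]

# Hypothetical data with a single dichotomous confounder.
rng = np.random.default_rng(0)
n = 400
x = rng.binomial(1, 0.3, size=n)
s = rng.binomial(1, np.where(x == 1, 0.7, 0.4)).astype(float)
y = 0.2 * s - 0.4 * x + rng.normal(size=n)

beta_e = e_estimator(y, s, stratum_proportions(s, x))
beta_ols = ols_coef_on_s(y, s, x)
print(beta_e, beta_ols)  # identical up to rounding
```

With a continuous covariate the two fitted exposure models differ, and so do the two estimates, as the text notes for analysis (3).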
We now discuss a modification of the estimator $\hat{\beta}_E$ that has an even closer connection to OLS than does $\hat{\beta}_E$. Define

$$\hat{\beta}_{Em} = \frac{\sum Y_i (S_i - \hat{E}(S \mid X_i))}{\sum [S_i - \hat{E}(S \mid X_i)]^{2}}. \tag{12}$$

When $\hat{E}(S \mid X_i)$ is nonlinear (e.g., logistic), $\hat{\beta}_{Em}$ will not in general equal $\hat{\beta}_E$. Nonetheless, $\hat{\beta}_{Em}$ and $\hat{\beta}_E$ have the same asymptotic distribution. One obtains $\hat{\beta}_{Em}$ by regressing $Y_i$ versus "the residuals" $S_i - \hat{E}(S \mid X_i)$ using OLS regression with no intercept.

Table 3
Estimates $\hat{\beta}_{OLS}$ under four different specifications for covariates included in $\beta_1 + \sum \beta_k X_{k,i}$ in model equation (1)

          Covariates X_{k,i}                                         var^est(\hat{\beta}_{OLS}) (a)
Analysis  included in equation (1)             \hat{\beta}_{OLS}     x 10^-4

(1)       Constant term only                    -.0580                9.50
(2)       Constant, chronic cough (Yes, No)      .0429                9.39
(3)       Constant, pack-years of smoking        .0492                7.64
(4)       Constant, 22 covariates in Table 1    -.1199                8.68

(a) Using White's (1980) heteroscedastic consistent variance estimator.

4. Two-Stage E-Estimators
Throughout this section we assume that the logistic model equation (3) is correctly specified with $X_i$ the vector of 22 covariates. Then $\hat{\beta}_E$ is a consistent estimator of $\beta$ in equation (2) without making any assumptions about the form of $h(X_i)$. Suppose now that we have an a priori guess as to the shape of $h(X_i)$. For concreteness, suppose we believed that $h(X_i)$ was linear or at least nearly linear in $X_i$, i.e., $h(X_i) = \beta_1 + \sum_{k=2}^{K} \beta_k X_{k,i}$. We can now consider how to develop an estimator, say $\hat{\beta}^*$, that may be much more efficient than $\hat{\beta}_E$ if our guess concerning the shape of $h(X_i)$ is correct or nearly correct, and will remain consistent, asymptotically normal no matter how wrong our guess may be. To construct $\hat{\beta}^*$, we proceed in two steps. First we compute $\hat{E}(S \mid X_i)$ and $\hat{\beta}_E$ as before. We then regress $\hat{z}_i = Y_i - \hat{\beta}_E S_i$ on $X_i$. We then define $\hat{\beta}^*$ to be the solution $\beta^{\dagger}$ to the estimating equation

$$0 = U^*(\beta^{\dagger}) \equiv \sum_{i} \Bigl(Y_i - \beta^{\dagger} S_i - \sum_{k=2}^{K} \hat{\beta}_k X_{k,i}\Bigr)(S_i - \hat{E}(S \mid X_i)),$$

where $(\hat{\beta}_1, \ldots, \hat{\beta}_K)$ are the OLS estimates from the regression of $\hat{z}_i$ on $X_i$. Therefore,

$$\hat{\beta}^* = \frac{\sum \bigl(Y_i - \sum_{k=2}^{K} \hat{\beta}_k X_{k,i}\bigr)[S_i - \hat{E}(S \mid X_i)]}{\sum S_i (S_i - \hat{E}(S \mid X_i))}.$$

In Theorems A.1 and A.3 in the Appendix we show that $\hat{\beta}^*$ is asymptotically normal and unbiased even if the proposed linear model for $h(X_i)$ is incorrect.
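The two-step construction can be sketched in code. This is an illustrative sketch on hypothetical simulated data, not the authors' program; `fit_logistic` and `two_stage_e_estimator` are hypothetical names, and the simulated $h(X)$ is taken to be exactly linear so that the guess in the second step is correct.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25):
    """Newton-Raphson MLE for model (3); X has a leading column of ones."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ alpha)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        alpha = alpha + np.linalg.solve(info, X.T @ (s - p))
    return 1.0 / (1.0 + np.exp(-(X @ alpha)))

def two_stage_e_estimator(y, s, X):
    """Return (beta_E, beta_star) for design matrix X (first column ones)."""
    resid = s - fit_logistic(X, s)
    denom = np.sum(s * resid)
    beta_e = np.sum(y * resid) / denom                  # step 1: equation (8)
    z_hat = y - beta_e * s
    coef, *_ = np.linalg.lstsq(X, z_hat, rcond=None)    # step 2: OLS of z_hat on X
    # The fitted intercept is subtracted too; it contributes nothing because
    # a logistic fit with an intercept forces sum(resid) = 0.
    beta_star = np.sum((y - X @ coef) * resid) / denom
    return beta_e, beta_star

# Hypothetical data in which h(X) really is linear.
rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
s = rng.binomial(1, 1.0 / (1.0 + np.exp(-x))).astype(float)
y = -0.5 * s + 2.0 + 3.0 * x + rng.normal(size=n)

beta_e, beta_star = two_stage_e_estimator(y, s, X)
print(beta_e, beta_star)
```

Repeating the simulation over many seeds would be one way to see the efficiency gain described in the text; a single run only shows that both estimates land near the simulated value.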
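A consistent estimator of $\mathrm{var}^{A}(\hat{\beta}^*)$ is

$$\frac{\sum \hat{\epsilon}_i^{2} (S_i - \hat{p}_i)^{2}}{\left[\sum S_i (S_i - \hat{p}_i)\right]^{2}} - (\hat{Q}^*)\,\mathrm{var}^{\mathrm{est}}(\hat{\alpha})\,(\hat{Q}^*)^{T}, \tag{13}$$

where $\hat{\epsilon}_i = Y_i - \hat{\beta}^* S_i - \sum_{k=2}^{K} \hat{\beta}_k X_{k,i} - \hat{\beta}_1$ and $(\hat{Q}^*)^{T}$ has components

$$\hat{Q}_j^* = \frac{\sum \hat{\epsilon}_i\, \hat{p}_i (1 - \hat{p}_i) X_{j,i}}{\sum S_i (S_i - \hat{p}_i)}.$$

In our example $\hat{\beta}^*$ is $-.117$ with $\mathrm{var}^{\mathrm{est}}(\hat{\beta}^*) = 8.79 \times 10^{-4}$ when $X_i$ is the 22-vector of covariates.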
In the final paragraph of the Appendix we show that, if the linear model postulated for $h(X_i)$ were correct, then (1) $\hat{Q}^*$ converges to zero in probability so the correction term could be ignored; and (2) if $\sigma^2(S, X) = \sigma^2$, then $\mathrm{var}^{A}(\hat{\beta}^*) = n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$.

When $h(X_i)$ is linear, $\hat{\beta}_{OLS}$ will be consistent asymptotically normal and $\mathrm{var}^{A}(\hat{\beta}_{OLS})$ will be less than or equal to $\mathrm{var}^{A}(\hat{\beta}^*)$, with equality when $E(S \mid X_i)$ is linear in $X_i$. Of course, if neither $h(X_i)$ nor $E(S \mid X_i)$ is linear, $\hat{\beta}^*$ but not $\hat{\beta}_{OLS}$ remains consistent [provided the nonlinear model for $E(S \mid X_i)$ is correctly specified]. When $\sigma^2(S, X) = \sigma^2$ and $h(X_i)$ is, in truth, linear, $\hat{\beta}^*$ has the smallest asymptotic variance among all estimators that remain asymptotically unbiased even were $h(X_i)$ nonlinear (Chamberlain, discussion paper cited previously). That is, it attains the semiparametric efficiency bound for model (2).

5. Discussion
Suppose again that $\beta$ in model (2) is causal [i.e., equation (4b) holds] when $X_i$ is the 22-vector of covariates. Then the validity of our E-estimators of the causal effect of current smoking on FEV1 requires that the semiparametric regression model (2) and logistic regression (3) be correctly specified. Specification of (3) can be checked using the techniques described by Landwehr, Pregibon, and Shoemaker (1984). The no-interaction assumption of model (2) can be checked by nesting (2) in the more general semiparametric regression model of the Appendix that includes interactions between current smoking $S_i$ and the covariates in $X_i$, and then testing whether the interaction coefficients are nonzero.
We note that, rather than simply modelling $p[S = 1 \mid X_i]$ by the linear no-interaction logistic model equation (3), we could continue to add to equation (3) additional terms such as powers of the $X_{k,i}$ (e.g., $X_{k,i}^{2}$, $X_{k,i}^{3}$) for continuous covariates and all orders of interaction between the various covariates and their powers (e.g., $X_{4,i} \cdot X_{3,i}$). This will greatly increase the number of free coefficients in our model for $p[S = 1 \mid X_i]$. As we add these additional terms, we derive two benefits. First, we decrease any asymptotic bias in $\hat{\beta}_E$ (or $\hat{\beta}^*$) due to possible misspecification of the linear no-interaction model for $p[S = 1 \mid X_i]$. Second, when the linear no-interaction logistic model is correctly specified and thus the additional terms are not necessary to make $\hat{\beta}_E$ unbiased, the asymptotic variance of $\hat{\beta}_E$ (or $\hat{\beta}^*$) is nonincreasing and will usually decrease as the number of free parameters in the model for $p[S = 1 \mid X_i]$ increases [see Pierce (1982) and Corollary A.1 of the Appendix]. Thus, rather than having the usual tradeoff between efficiency and bias, we find that increasing the number of free parameters can lead to improvements in both bias and efficiency. This apparent "free lunch" must be tempered by two facts. First, no matter how many terms we add, $\mathrm{var}^{A}(\hat{\beta}_E)$ and $\mathrm{var}^{A}(\hat{\beta}^*)$ will always exceed $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$ (with homoscedastic errors) (Chamberlain, 1987). Second, the results we have derived require that the estimates of the free parameters in the model for $p[S = 1 \mid X_i]$ are $n^{1/2}$-consistent. [Newey (1990) suggests that $n^{1/4}$-consistency is sufficient.] This limits the number of free parameters we may have in our model for $p[S = 1 \mid X_i]$ as a function of sample size. For example, we could not allow the number of free parameters to equal the total sample size. Cross-validation techniques for model selection should be useful in choosing a proper ratio of sample size to parameters. Moderate and small-sample simulation studies are needed as a guide to practice.
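Enriching the exposure model in the way just described amounts to widening the design matrix passed to the logistic fit. The helper below is a hypothetical sketch of one such expansion: powers are added only for columns flagged as continuous, and only pairwise products are formed, which is a simplification of the "all orders of interaction" mentioned in the text.

```python
from itertools import combinations

import numpy as np

def expand_design(X, continuous, max_power=2):
    """Build an enriched design matrix for the model of p[S = 1 | X].

    X          : (n, d) array of covariates without the constant column
    continuous : indices of columns that should receive powers 2..max_power
    Returns the columns [1, X, powers of continuous columns, pairwise products].
    """
    n, d = X.shape
    columns = [np.ones(n)]
    columns += [X[:, j] for j in range(d)]
    columns += [X[:, j] ** q for j in continuous for q in range(2, max_power + 1)]
    columns += [X[:, j] * X[:, k] for j, k in combinations(range(d), 2)]
    return np.column_stack(columns)

# Hypothetical covariates: one continuous column and two binary columns.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=6),
    rng.binomial(1, 0.5, size=6),
    rng.binomial(1, 0.5, size=6),
])
design = expand_design(X, continuous=[0], max_power=3)
print(design.shape)  # (6, 9): 1 constant + 3 linear + 2 powers + 3 products
```

The column count is the number of free parameters in the enlarged model, which is the quantity the text says must stay small relative to the sample size.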
We note that when the linear no-interaction logistic model (3) is misspecified, the asymptotic variance of the (now potentially biased) estimator $\hat{\beta}_E$ based on a misspecified model for $p[S = 1 \mid X_i]$ can be less than the asymptotic variance of the estimator $\hat{\beta}_E$ based on a more richly parameterized, correctly specified model in which the misspecified model is nested. This phenomenon is evident in a comparison of analyses (3) and (4) in Table 2. The estimated variance of $\hat{\beta}_E$ in analysis (3) is less than that in analysis (4), because covariates other than "pack-years of smoking" are also important predictors of current smoking.
The results described in the preceding three paragraphs help to clarify both when E-estimation will and will not be preferable to standard covariance adjustment by least squares. Consider first the case in which the sample size is quite large and the dimension of $X_i$ is small, so that richly parameterized models for either $h(X_i)$ or $p[S = 1 \mid X_i]$ can be used. Then, as discussed above and in technical detail by Newey (1990), as one adds power and interaction terms to the model (3) for $p[S = 1 \mid X_i]$, any bias in $\hat\beta^*$ and $\hat\beta_E$ would tend to zero and the asymptotic variance of $\hat\beta^*$, and even $\hat\beta_E$, will approach the semiparametric efficiency bound of $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$. Similarly, in this setting, if we expanded the linear regression model (1) by adding additional terms such as powers of $X_{k,i}$ and interactions between the $X_{k,i}$ and their powers, the bias of $\hat\beta_{OLS}$ from the least squares fit of (1) would tend to zero, and the variance of $\hat\beta_{OLS}$ would approach the efficiency bound $n^{-1}\sigma^2/E[\mathrm{var}(S \mid X)]$. Thus, in this setting, the use of highly parameterized models for $h(X_i)$ fit by least squares or the use of highly parameterized models for $p[S = 1 \mid X_i]$ fit by E-estimation leads to estimators of $\beta$ with similar properties.

Now, consider the case in which the dimension of $X_i$ is large and/or the sample size is moderate. One is then restricted to choosing parsimonious parametric models for $h(X_i)$ and/or $p[S = 1 \mid X_i]$. Further, since the ratio of the sample size to the dimension of $X_i$ is small, the power to discriminate between correct and incorrectly specified models for $h(X_i)$ and/or $p[S = 1 \mid X_i]$ will be poor. If, as is often the case in an etiologic study, our primary interest is in obtaining valid inferences concerning $\beta$ (e.g., confidence intervals that cover at their nominal rate), it is essential to try to obtain asymptotically unbiased estimators of $\beta$. Since, in general, unbiased estimation of $\beta$ requires that the model used in the analysis be correct, we would prefer E-estimation over least squares estimation if we believed that our ability to specify nearly correct parsimonious models for $p[S = 1 \mid X_i]$ exceeded our ability to specify such models for $h(X_i)$. This would be the case when the investigator thinks, based on substantive considerations, that his or her knowledge of the shape of the regression surface $p[S = 1 \mid X_i]$ is sharper than knowledge of the shape of the function $h(X_i)$. In the special case, represented by our example in Section 3, in which the fitted regression surface $\hat p[S = 1 \mid X_i]$ is nearly linear in the $X_i$, E-estimation and standard covariance adjustment by least squares will provide similar estimates irrespective of whether $h(X_i)$ is or is not linear.
We next consider whether it might be possible to develop robust E-estimators. Even if the linear model (1) were true, the efficiency of $\hat\beta_{OLS}$ would be poor if the errors $\varepsilon_i$ have heavy-tailed distributions (Huber, 1981). If we are willing to assume that, in addition to (1), the errors were independent of the $(S_i, X_i)$, efficient robust estimation based on M, L, or R estimators is possible (Huber, 1981). If $\varepsilon_i$ is independent of $(S_i, X_i)$ but model (1) were not true, robust E-estimation of model (2) could be based on solving an unbiased estimating equation of the form $\sum_i m(Y_i - \beta^\dagger S_i, X_i)(S_i - E[S \mid X_i]) = 0$, where the function $m(Y_i - \beta^\dagger S_i, X_i)$ would be chosen to downweight observations for which $Y_i - \beta S_i$ differs greatly from its expected value given $X_i$. (Such observations will be associated with large values of the residuals.) How to choose the function $m(Y_i - \beta^\dagger S_i, X_i)$ in this setting is outside the scope of this paper.
If $\sigma^2(S, X)$ depends on $X$ alone or on $S$ and $X$, it is possible to develop "weighted" E-estimators that will be more efficient than the E-estimators $\hat\beta_E$ or $\hat\beta^*$ (Chamberlain, 1987).
Suppose next that the outcome of interest is a dichotomous disease variable. Then $Y_i$ will be a Bernoulli random variable. In that case, one might no longer wish to specify the semiparametric model (2), i.e.,
$$E[Y_i \mid X_i, S_i] = h(X_i) + \beta S_i,$$
since the model does not naturally obey the restriction that probabilities must lie in the interval [0, 1]. Therefore one might specify a semiparametric logistic model
$$E[Y_i \mid X_i, S_i] = \frac{\exp[h(X_i) + \beta S_i]}{1 + \exp[h(X_i) + \beta S_i]}. \qquad \text{(14)}$$

Unfortunately, the approach developed in this paper will not allow us to consistently estimate the $\beta$ of equation (14) even though Bickel et al. (1992) and Chamberlain (discussion paper cited previously) show that, in principle, there should exist an $n^{1/2}$-consistent estimator of $\beta$ based on data $(X_i, S_i, Y_i)$ [at least when the dimension of $X_i$ is fixed as the sample size increases]. Our approach fails because it is fundamentally based on the fact that, for model (2), $\beta$ is identified from the "pseudo-data" $(S_i, V_i, Y_i)$, where $V_i = E(S \mid X_i)$. We call $V_i$ "pseudo-data." For example, if $V_i$ were known, our estimator $\hat\beta_E$ does not require data on $X_i$. It can be shown that $\beta$ in equation (14) is not identified from pseudo-data $(S_i, V_i, Y_i)$ due to the "noncollapsibility" of the logistic parameter $\beta$ when we collapse from the "raw data" $X_i$ to $V_i$. Indeed, Gail, Wieand, and Piantadosi (1984) essentially prove this nonidentifiability result in the special case for which $V_i = 1/2$ for all subjects. In fact, suppose $X_i$ were dichotomous and thus $e^\beta$ was the common exposure ($S$)-disease ($Y$) odds ratio in the two 2 x 2 tables indexed by the levels of $X$. In this special case, the nonidentifiability of $\beta$ when $V_i$ is a fixed constant for all subjects $i$ is simply a restatement of the following well-known fact. Even when $S$ and $X$ are (marginally) independent, the common odds ratio $e^\beta$ is not identified from data $(S_i, Y_i)$ since the marginal exposure-disease odds ratio (ignoring $X$) may differ from $e^\beta$ and the magnitude of the difference depends on the distribution of $X$ (Gail et al., 1984). However, in contrast to our nonidentifiability results for the $\beta$ of model (14), if equation (4b) holds, the average causal effect of $S$ on disease $Y$, i.e., $E[Y_{s=1}] - E[Y_{s=0}]$, is identified from $(S_i, V_i, Y_i)$ (Rosenbaum and Rubin, 1983).
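The noncollapsibility argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a dichotomous X, an exposure S that is independent of X, and the logistic model (14) with illustrative (hypothetical) values of h(X) and of the common odds ratio. The function names are invented for this illustration.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(p1, p0):
    """Odds ratio comparing risk p1 (exposed) with risk p0 (unexposed)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

def marginal_odds_ratio(beta, h_values, x_probs):
    """Exposure-disease odds ratio after collapsing over X.

    Assumes S is independent of X, so that
    P(Y=1 | S=s) = sum_x P(X=x) * expit(h(x) + beta * s).
    """
    p1 = sum(w * expit(h + beta) for h, w in zip(h_values, x_probs))
    p0 = sum(w * expit(h) for h, w in zip(h_values, x_probs))
    return odds_ratio(p1, p0)

# Hypothetical values: common (conditional) odds ratio e^beta = 3,
# and two strata of X with h(0) = -2 and h(1) = 2.
beta = math.log(3.0)
h_values = [-2.0, 2.0]

for x_probs in ([0.5, 0.5], [0.9, 0.1]):
    collapsed = marginal_odds_ratio(beta, h_values, x_probs)
    print(x_probs, round(collapsed, 3))
```

In this sketch the collapsed odds ratio is smaller than the common odds ratio and changes with the distribution of X, even though S and X are independent, which is the sense in which the logistic parameter is not recoverable from (S, Y) alone.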
Suppose next that $Y_i$ has a Poisson or overdispersed Poisson distribution. We might then wish to specify semiparametric log-linear models, e.g.,
$$E[Y_i \mid X_i, S_i] = \exp[h(X_i) + \beta S_i]. \qquad \text{(15)}$$
For log-linear models, a simple modification of our approach can be used to consistently estimate $\beta$ from pseudo-data $(S_i, V_i, Y_i)$. Specifically, since, under model (15), $E[U(\beta)] = 0$, where
$$U(\beta^\dagger) \equiv \sum_{i=1}^{n} Y_i e^{-\beta^\dagger S_i}\,(S_i - E(S \mid X_i)), \qquad \text{(16)}$$
the solution $\tilde\beta_E$ to $U(\beta^\dagger) = 0$ will be consistent, asymptotically normal. A feasible consistent estimator $\hat\beta_E$ can be obtained from data $(S_i, X_i, Y_i)$ by specifying a (correct) model for $E[S \mid X_i]$.
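As a rough illustration of equation (16), the sketch below solves U = 0 by bisection. It is an assumption-laden toy, not the authors' implementation: the pseudo-data values V_i are treated as known, the outcomes are set equal to their model means so that the root can be checked by hand, and the function names and the bracket [-10, 10] are invented for the example.

```python
import math

def estimating_function(beta, s, v, y):
    """U(beta) = sum_i Y_i * exp(-beta * S_i) * (S_i - V_i), as in equation (16)."""
    return sum(yi * math.exp(-beta * si) * (si - vi)
               for si, vi, yi in zip(s, v, y))

def solve_beta(s, v, y, lo=-10.0, hi=10.0, tol=1e-10):
    """Find a root of U by bisection; assumes U changes sign on [lo, hi]."""
    f_lo = estimating_function(lo, s, v, y)
    f_hi = estimating_function(hi, s, v, y)
    if f_lo * f_hi > 0:
        raise ValueError("U does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = estimating_function(mid, s, v, y)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Hypothetical toy data: two strata of X with h = 0 and h = 1, true beta = 0.5.
# In each stratum the exposed fraction equals V, and Y is set to its model mean
# exp(h + beta * S), so U(0.5) is exactly zero.
true_beta = 0.5
s = [1, 0, 0, 0, 1, 1, 1, 0]
v = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
h = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
y = [math.exp(hi_ + true_beta * si) for hi_, si in zip(h, s)]

beta_hat = solve_beta(s, v, y)
print(beta_hat)
```

With real count data the V_i would be replaced by fitted values from a model for E[S | X_i], and a dichotomous S admits a closed-form root, so bisection is used here only to keep the sketch general.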
The methods of E-estimation can be extended to estimate the causal effect of a time-varying treatment. Specifically, Robins (1989a, 1992a, 1992b, 1992c, 1992d) and Robins et al. (1992) use an extension of E-estimation, which they call G-estimation, to estimate, from observational data, the causal effect of a time-varying treatment both on a survival time outcome and on the evolution of the mean of a continuous outcome variable measured repeatedly over time in the presence of time-dependent confounding factors. Robins (1989a, 1992b, 1992d) uses G-estimation to correct for noncompliance in randomized trials studying the effect of a time-varying treatment both on survival time outcomes and on the evolution of the mean of a continuous outcome variable when noncompliance depends on time-dependent prognostic factors. G-estimation is of particular importance in estimating the causal effect of a time-varying treatment in the presence of time-varying prognostic factors because standard covariance adjustment based on time-dependent Cox proportional hazard models for survival time outcomes or generalized estimating equations (Liang and Zeger, 1986) for repeated measures outcomes cannot consistently estimate the treatment effect (Robins, 1986, 1989a, 1989b, 1992a, 1992b, 1992c).

ACKNOWLEDGEMENTS

This work was supported in part by National Institutes of Health Grants 2 P30 ES00002, R01-ES03405, K04-ES00180, ES01108, and CA09001. We would like to thank Doug Dockery, Frank Speizer, Benjamin Ferris, and other contributors to the Harvard Six Cities Study for their generous sharing of time and data.

RÉSUMÉ
To estimate the influence of one or more factors on a variable of interest, one must take into account the effects of covariates which, on the one hand, vary with those factors and, on the other hand, help to predict the variable of interest independently of those factors. In this article, we present regression methods which, unlike the usual methods, adjust for the confounding effect of several covariates (continuous or discrete) by modelling the conditional expectation of the various factors as a function of the covariates. In the special case of a single two-level factor, this conditional expectation is identical to what Rosenbaum and Rubin have called the propensity score. These authors have also proposed estimation methods based on modelling this propensity score. Our methods generalize those of Rosenbaum and Rubin in several ways. First, our approach extends immediately to all possible cases for the factors, each of which may be continuous, ordinal, or discrete. Second, even in the case of a single two-level factor, our approach does not require stratification or matching on the propensity score, so that the risk of "residual confounding" (that is, of bias) associated with these methods is avoided. Finally, our approach lends support to the idea that it is better to use the estimated propensity score than the true propensity score, even when the true score is known. The additional power of our approach comes from the fact that we assume that the influence of the factors can be described by the parametric component of a semiparametric regression model. As an illustration, we reanalyse, in a cohort of 2,713 white adult males, the effect of smoking on forced expiratory volume in one second, and we compare the results obtained with those of classical methods.

REFERENCES
Begun, J. M., Hall, W. J., Huang, W. M., and Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Annals of Statistics 11, 432-452.
Bickel, P., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1992). Efficient and Adaptive Inference in Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press.
Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34, 305-334.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
Dockery, D. W., Speizer, F. E., Ferris, B. G., Ware, J. H., Louis, T. A., and Spiro, A. (1988). Cumulative and reversible effects of lifetime smoking on simple tests of lung function in adults. American Review of Respiratory Diseases 137, 286-292.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65, 657-687.
Gail, M. H., Wieand, S., and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71, 431-444.
Huber, P. (1981). Robust Statistics. New York: Wiley.
Landwehr, J. M., Pregibon, D., and Shoemaker, A. C. (1984). Graphical methods for assessing logistic regression models. Journal of the American Statistical Association 79, 61-71.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Manski, C. F. (1988). Analog Estimation Methods in Econometrics. New York: Chapman and Hall.
Newey, W. K. (1990). Semiparametric efficiency bounds. Journal of Applied Econometrics 5, 99-135.
Pierce, D. A. (1982). The asymptotic effect of substituting estimators for parameters in certain types of statistics. Annals of Statistics 10, 475-478.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393-1512.
Robins, J. M. (1989a). The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Health Service Research Methodology: A Focus on AIDS, L. Sechrest, H. Freeman, and A. Mulley (eds), 113-159. Washington, D.C.: NCHSR, U.S. Public Health Service.
Robins, J. M. (1989b). The control of confounding by intermediate variables. Statistics in Medicine 8, 679-701.
Robins, J. M. (1992a). Correcting for noncompliance in randomized trials using structural nested mean models. Communications in Statistics, in press.
Robins, J. M. (1992b). Estimating the causal effect of a time-varying treatment on survival using a new class of failure time models. Communications in Statistics, in press.
Robins, J. M. (1992c). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, in press.
Robins, J. M. (1992d). Analytic methods for HIV treatment and cofactor effects. In Methodological Issues of AIDS Behavioral Research, D. G. Ostrow and R. Kessler (eds). New York: Plenum.


Robins, J. M., Blevins, D., Ritter, G., and Wulfsohn, M. (1992). G-estimation of the effect of prophylaxis therapy for pneumocystis carinii pneumonia (PCP) on the survival of AIDS patients. Epidemiology, in press.
Robins, J. M. and Morgenstern, H. (1987). The foundations of confounding in epidemiology. Computers and Mathematics with Applications 14, 869-916.
Robinson, P. (1988). Root-N-consistent semiparametric regression. Econometrica 56, 931-954.
Rosenbaum, P. R. (1984). Conditional permutation tests and the propensity score in observational studies. Journal of the American Statistical Association 79, 565-574.
Rosenbaum, P. R. (1987). Model-based direct adjustment. Journal of the American Statistical Association 82, 387-394.
Rosenbaum, P. R. (1988). Permutation tests for matched pairs with adjustments for covariates. Applied Statistics 37, 401-411.
Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55.
Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516-524.
Rosenbaum, P. R. and Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician 39, 33-38.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics 6, 34-58.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-838.

Received December 1989; revised October 1990 and January 1991; accepted February 1991.

APPENDIX

In this Appendix, we prove the results stated in the text. We assume that
$$Y_i = f(S_i, X_i, \beta) + h(X_i) + \varepsilon_i, \qquad E[\varepsilon_i \mid S_i, X_i] = 0, \qquad \text{(A.0)}$$
where $f(S_i, X_i, \beta)$ is a linear function of a $V$-dimensional parameter vector $\beta$ that takes the value zero when $S_i = 0$. [Extension of our results to nonlinear functions of $\beta$ is straightforward.] Model (2) in the text is the special case in which $f(S_i, X_i, \beta) = \beta S_i$ for univariate $\beta$ and dichotomous $S_i$. (A.0) generalizes (2) by allowing for multivariate exposures, each component of which may be categorical, ordinal, or continuous. For example, we might suppose $S_i = (S_{1i}, \ldots, S_{Mi})$ and $f(S_i, X_i, \beta) = \sum_{m=1}^{M} \beta_m S_{mi} + \beta_{M+1} S_{1i} X_{4i}$ with $V = M + 1$. If equation (4b) holds when $s$ is any value of $S_i$, then $f(S_i, X_i, \beta)$ is the average effect of joint exposure level $S_i$ compared to the baseline level $S_i = 0$ among subjects with covariate level $X_i$. If $f(S_i, X_i, \beta)$ depends on $X_i$, we say there is an exposure-covariate interaction.
Define $f_\beta(S_i, X_i)$ to be the $V$-vector of partial derivatives of $f(S_i, X_i, \beta)$ with respect to the components of $\beta$ and let
$$E[f_\beta(S_i, X_i) \mid X_i] = r(X_i; \alpha), \qquad \text{(A.1)}$$
where $r(\cdot\,; \cdot)$ is a known function and $\alpha$ is an unknown parameter. Define $R(S_i, X_i; \alpha) = f_\beta(S_i, X_i) - r(X_i; \alpha)$. Note that $E[R(S_i, X_i; \alpha) \mid X_i] = 0$. If, as in the text, $f_\beta(S_i, X_i) = S_i$ is a Bernoulli random variable, (A.1) is a fully parametric model for $S_i$ given $X_i$; otherwise, (A.1) is a semiparametric model for the density $f(S_i \mid X_i)$, since the distribution of $R(S_i, X_i; \alpha)$ is completely unrestricted except for having mean zero given $X_i$.
Now for any nonrandom function $g(x)$, define
$$n^{-1/2}U(\beta^\dagger, g, \hat\alpha) \equiv n^{-1/2}\sum_i [Y_i - f(S_i, X_i, \beta^\dagger) - g(X_i)]\,R(S_i, X_i; \hat\alpha) \equiv n^{-1/2}\sum_i U_i(\beta^\dagger, g, \hat\alpha), \qquad \text{(A.2)}$$
where $\hat\alpha$ is asymptotically equivalent to an $n^{1/2}$-consistent solution to $0 = \sum_i M_i(\alpha^\dagger) \equiv \sum_i m(S_i, X_i, \alpha^\dagger)$ for some $M_i(\alpha^\dagger)$ satisfying $E[M_i(\alpha)] = 0$. That is, when $M_i(\alpha)$ is continuously differentiable, $n^{1/2}(\hat\alpha - \alpha) = -\{E[\partial M_i(\alpha)/\partial\alpha']\}^{-1} n^{-1/2}\sum_i M_i(\alpha) + o_p(1)$, and we say that $-\{E[\partial M_i(\alpha)/\partial\alpha']\}^{-1}M_i(\alpha)$ is the influence function of $\hat\alpha$. Chamberlain (1987) proves that $\hat\alpha$ is semiparametric efficient for $\alpha$ under the sole restriction (A.1) on the conditional distribution of $S_i$ given $X_i$ only if $M_i(\alpha)$ equals
$$M_i^{\mathrm{eff}}(\alpha) \equiv \left\{\frac{\partial r(X_i; \alpha)}{\partial\alpha}\right\}\{\mathrm{var}[R(S_i, X_i; \alpha) \mid X_i]\}^{-1} R(S_i, X_i; \alpha).$$
That is, $\hat\alpha$ is semiparametric efficient only if it is asymptotically equivalent to the optimal weighted (possibly nonlinear) least squares estimator of $\alpha$. Henceforth, we shall say that $\hat\alpha$ is semiparametric efficient under (A.1) if $\hat\alpha$ has influence function $-\{E[\partial M_i^{\mathrm{eff}}(\alpha)/\partial\alpha']\}^{-1}M_i^{\mathrm{eff}}(\alpha)$. If $f_\beta(S_i, X_i) = S_i$ is Bernoulli, semiparametric efficiency under (A.1) is just ordinary parametric efficiency. Our main result is given as Theorem A.1.

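Theorem A.1 Under regularity conditions given in Corollary 1, Chapter 8 of Manski (1988), there exists a solution $\hat\beta_E(g) \equiv \hat\beta_E(g, \hat\alpha)$ to $n^{-1/2}U(\beta^\dagger, g, \hat\alpha) = 0$ such that $n^{1/2}(\hat\beta_E(g) - \beta)$ is asymptotically normal with mean 0 and variance that can be consistently estimated by
$$\hat I^{-1}\hat A(g)(\hat I')^{-1}, \qquad \text{(A.3a)}$$
where, writing $\hat\beta$ for $\hat\beta_E(g)$,
$$\hat I' \equiv -n^{-1}\sum_i \partial U_i(\hat\beta, g, \hat\alpha)'/\partial\beta = n^{-1}\sum_i f_\beta(S_i, X_i)\,R(S_i, X_i; \hat\alpha)',$$
$$\hat A(g) = n^{-1}\sum_i \hat K_i(\hat\beta, g, \hat\alpha)\,\hat K_i'(\hat\beta, g, \hat\alpha),$$
$$\hat K_i(\hat\beta, g, \hat\alpha) = U_i(\hat\beta, g, \hat\alpha) - \hat B(g)\,\hat C^{-1}M_i(\hat\alpha),$$
$$\hat B(g) \equiv n^{-1}\sum_i \partial U_i(\hat\beta, g, \hat\alpha)/\partial\alpha' = n^{-1}\sum_i [Y_i - f(S_i, X_i, \hat\beta) - g(X_i)]\,\frac{\partial R(S_i, X_i; \hat\alpha)}{\partial\alpha'},$$
$$\hat C = n^{-1}\sum_i \partial M_i(\hat\alpha)/\partial\alpha'.$$
If $\hat\alpha$ is semiparametric efficient under (A.1), the asymptotic variance of $n^{1/2}(\hat\beta_E(g) - \beta)$ can be consistently estimated by
$$\hat I^{-1}\hat\Sigma(g)(\hat I')^{-1} - \hat Q(g)\,\hat\Omega\,\hat Q'(g), \qquad \text{(A.3b)}$$
where $\hat Q(g) \equiv \hat I^{-1}\hat B(g)$, $\hat\Sigma(g) = n^{-1}\sum_i U_i(\hat\beta_E(g), g, \hat\alpha)\,U_i(\hat\beta_E(g), g, \hat\alpha)'$, and $\hat\Omega$ is a consistent estimator of $\mathrm{var}^A[n^{1/2}(\hat\alpha - \alpha)]$.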
Except when $f_\beta(S_i, X_i)$ equals a dichotomous $S_i$ (as in the text), $\mathrm{var}[R(S_i, X_i; \alpha) \mid X_i]$ may be an unknown function of $\alpha$ and $X_i$. Hence, if one chooses to estimate $\alpha$ by the unweighted (possibly nonlinear) least squares regression of $f_\beta(S_i, X_i)$ on $X_i$, it is necessary to use formula (A.3a) rather than (A.3b), since $\hat\alpha$ will then be efficient only if the (unknown) variance of $R(S_i, X_i; \alpha)$ does not depend on $X_i$.
However, if one has a correctly specified model $\mathrm{var}[R(S_i, X_i; \alpha) \mid X_i] = \psi(X_i; \phi)$, where $\psi(X_i; \phi^\dagger)$ is a known function and $\phi$ is an unknown parameter, then it is well known that the estimate $\hat\alpha$ that solves $0 = \sum_i \{\partial r(X_i; \alpha)/\partial\alpha\}\{\psi(X_i; \hat\phi)\}^{-1}R(S_i, X_i; \alpha)$ has influence function $-\{E[\partial M_i^{\mathrm{eff}}(\alpha)/\partial\alpha']\}^{-1}M_i^{\mathrm{eff}}(\alpha)$ and (A.3b) can be used. Here $\hat\phi$ is the (possibly nonlinear) multivariate least squares regression estimate of $\phi$ obtained by regressing $R(S_i, X_i; \tilde\alpha)R'(S_i, X_i; \tilde\alpha)$ on $X_i$, where $\tilde\alpha$ is obtained from a preliminary unweighted least squares regression of $f_\beta(S_i, X_i)$ on $X_i$.
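Application of Theorem A.1 Consider equation (9) in the text. In that setting, $g(X_i) \equiv 0$; $f(S_i, X_i, \beta) = \beta S_i$; $f_\beta(S_i, X_i) = S_i$; $R(S_i, X_i; \hat\alpha) = S_i - \hat p_i$, where $\hat p_i = e^{\hat\alpha' X_i}/(1 + e^{\hat\alpha' X_i})$;
$$\frac{\partial R(S_i, X_i; \hat\alpha)}{\partial\alpha'} = -\left.\frac{\partial\,[e^{\alpha^{\dagger\prime} X_i}/(1 + e^{\alpha^{\dagger\prime} X_i})]}{\partial\alpha^{\dagger\prime}}\right|_{\alpha^\dagger = \hat\alpha} = -\hat p_i(1 - \hat p_i)X_i';$$
$Y_i - f(S_i, X_i, \hat\beta_E(g)) - g(X_i) = \hat\varepsilon_i$; $U_i(\hat\beta_E(g), g, \hat\alpha) = \hat\varepsilon_i(S_i - \hat p_i)$; $\hat I = n^{-1}\sum_i S_i(S_i - \hat p_i)$; $\hat\Sigma(g) = n^{-1}\sum_i \hat\varepsilon_i^2(S_i - \hat p_i)^2$; $\hat Q(g) = -\sum_i \hat\varepsilon_i\,\hat p_i(1 - \hat p_i)X_i' \big/ \sum_i S_i(S_i - \hat p_i)$. Substituting into equation (A.3b), we obtain equation (9).
The reader can check that substituting in (A.3b) also gives equation (13) if we set $g(X_i) = \hat\beta_0 + \sum_k \hat\beta_k X_{k,i}$ above. [As we shall see in Theorem (A.3) below, the fact that $g(X_i)$ is based on estimates $\hat\beta_k$ does not affect the asymptotic variance (A.3b).]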
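The substitutions in the application above can be sketched in code. This is an illustrative sketch under explicit assumptions, not the authors' software: X_i is taken to be (1, x_i) with a single covariate, the propensity model is a logistic regression fitted by Newton-Raphson maximum likelihood, the estimated variance of the logistic coefficients is taken to be the inverse of the average Fisher information, g is set to 0, and the data are simulated with arbitrary (hypothetical) parameter values. All function and variable names are invented for the example.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, s, iters=25):
    """Newton-Raphson MLE for p_i = expit(a0 + a1 * x_i)."""
    a0 = a1 = 0.0
    for _ in range(iters):
        p = [expit(a0 + a1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(si - pi for si, pi in zip(s, p))
        g1 = sum((si - pi) * xi for si, pi, xi in zip(s, p, x))
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1

def e_estimate(x, s, y):
    """E-estimate of beta with g = 0 and a standard error based on (A.3b)."""
    n = len(y)
    a0, a1 = fit_logistic(x, s)
    p = [expit(a0 + a1 * xi) for xi in x]
    r = [si - pi for si, pi in zip(s, p)]            # R_i = S_i - p_i
    denom = sum(si * ri for si, ri in zip(s, r))
    beta = sum(yi * ri for yi, ri in zip(y, r)) / denom
    eps = [yi - beta * si for yi, si in zip(y, s)]   # residuals with g = 0
    i_hat = denom / n
    # Middle matrix of (A.3b): average of squared U_i = eps_i * (S_i - p_i).
    mid = sum((ei * ri) ** 2 for ei, ri in zip(eps, r)) / n
    w = [pi * (1.0 - pi) for pi in p]
    # B-hat uses dR/dalpha' = -p(1 - p) X', with X = (1, x).
    b0 = -sum(ei * wi for ei, wi in zip(eps, w)) / n
    b1 = -sum(ei * wi * xi for ei, wi, xi in zip(eps, w, x)) / n
    q0, q1 = b0 / i_hat, b1 / i_hat
    # Assumed Omega-hat: inverse of the average Fisher information (2 x 2).
    f00 = sum(w) / n
    f01 = sum(wi * xi for wi, xi in zip(w, x)) / n
    f11 = sum(wi * xi * xi for wi, xi in zip(w, x)) / n
    det = f00 * f11 - f01 * f01
    quad = (q0 * q0 * f11 - 2.0 * q0 * q1 * f01 + q1 * q1 * f00) / det
    avar = mid / i_hat ** 2 - quad
    return beta, math.sqrt(avar / n)

# Simulated data with hypothetical values: h(x) = 0.5 x and beta = 1.5.
random.seed(0)
n = 4000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
s = [1.0 if random.random() < expit(0.5 * xi) else 0.0 for xi in x]
y = [0.5 * xi + 1.5 * si + random.gauss(0.0, 1.0) for xi, si in zip(x, s)]

beta_hat, se = e_estimate(x, s, y)
print(beta_hat, se)
```

The subtraction of the quadratic form is what distinguishes (A.3b) from a naive sandwich variance: it reflects the gain from using the estimated rather than the true propensity score.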


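Proof of Theorem A.1 For pedagogic purposes we first sketch a proof. We then show how Corollary 1 of Manski's Chapter 8 can be used to formally prove the theorem. Since
$$E[U_i(\beta, g, \alpha)] = 0 \qquad \text{(A.4)}$$
by (A.0), we have that, under the regularity conditions discussed below, a Taylor expansion and the weak law of large numbers (WLLN) gives
$$0 = n^{-1/2}U(\hat\beta_E(g), g, \hat\alpha) = n^{-1/2}U(g) + I[n^{1/2}(\hat\beta_E(g) - \beta)] + B(g)[n^{1/2}(\hat\alpha - \alpha)] + o_p(1),$$
where $n^{-1/2}U(g) \equiv n^{-1/2}U(\beta, g, \alpha)$, $I \equiv E[\partial U_i(\beta, g, \alpha)/\partial\beta']$, which does not depend on $g$, and $B(g) \equiv E[\partial U_i(\beta, g, \alpha)/\partial\alpha']$. Thus
$$n^{1/2}(\hat\beta_E(g) - \beta) = -I^{-1}[B(g)\,n^{1/2}(\hat\alpha - \alpha) + n^{-1/2}U(g)] + o_p(1). \qquad \text{(A.5)}$$
By assumption, $n^{1/2}(\hat\alpha - \alpha) = -C^{-1}n^{-1/2}\sum_i M_i + o_p(1)$, where $C \equiv E[\partial M_i(\alpha)/\partial\alpha']$, $M_i \equiv M_i(\alpha)$. Hence $n^{1/2}(\hat\beta_E(g) - \beta) = -I^{-1}n^{-1/2}\sum_i [U_i(g) - B(g)C^{-1}M_i] + o_p(1)$. Thus $n^{1/2}(\hat\beta_E(g) - \beta)$ is asymptotically normal with mean zero and variance $I^{-1}A(g)(I')^{-1}$, where $A(g) = \mathrm{var}[U_i(g) - B(g)C^{-1}M_i]$, since $n^{1/2}(\hat\beta_E(g) - \beta)$ is a sum of independent mean-zero random variables plus a term of $o_p(1)$. Formula (A.3a) follows by the WLLN.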
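We next establish (A.3b) for a semiparametric efficient $\hat\alpha$ under (A.1) using arguments similar to those in Pierce (1982) and Newey (1990). Let $L_i(\alpha^\dagger, \eta^\dagger) \equiv f(S_i \mid X_i; \alpha^\dagger, \eta^\dagger)$ be any (regular) parametric submodel with true values $\alpha$, $\eta$ for the density of $S_i$ given $X_i$ consistent with the restriction (A.1). Let $S_{\alpha i} = \partial \ln L_i(\alpha, \eta)/\partial\alpha^\dagger$. Let
$$T = \{a(S_i, X_i);\ a(S_i, X_i) = \partial \ln L_i(\alpha, \eta)/\partial\eta^\dagger \text{ for some parametric submodel}\}.$$
Note $T = \{a(S_i, X_i);\ E[a(S_i, X_i) \mid X_i] = 0 \text{ and } E[R(S_i, X_i; \alpha)\,a(S_i, X_i)' \mid X_i] = 0\}$ since the scores $a(S_i, X_i)$ are restricted only by having a conditional mean of zero and by being conditionally uncorrelated with $R(S_i, X_i; \alpha)$. It follows from Chamberlain (1987), Begun et al. (1983), and Newey (1990) that (a) $S_{\alpha i} - M_i^{\mathrm{eff}} \in T$ and $E[M_i^{\mathrm{eff}}a(S_i, X_i)'] = 0$ for all $a(S_i, X_i) \in T$ and (b) $\mathrm{var}^A[n^{1/2}(\hat\alpha - \alpha)] = \{E[M_i^{\mathrm{eff}}(M_i^{\mathrm{eff}})']\}^{-1}$. $M_i^{\mathrm{eff}}$ is called the efficient score in the semiparametric model (A.1) for the law of $S_i$ given $X_i$.
Now by differentiating the identity $E_{\beta, \alpha^\dagger, \eta^\dagger}[U_i(\beta, g, \alpha^\dagger)] = 0$ with respect to $\alpha^\dagger$ using the chain rule and evaluating at the true values $(\alpha, \eta)$, we obtain $B(g) = -E[U_i(g)S_{\alpha i}']$, where $E_{\beta, \alpha^\dagger, \eta^\dagger}$ refers to expectation with respect to a density that differs from the truth only in that the law of $S_i$ given $X_i$ is $f(S_i \mid X_i; \alpha^\dagger, \eta^\dagger)$. Similarly differentiating this identity with respect to $\eta^\dagger$, we obtain $E[U_i(g)a'(S_i, X_i)] = 0$ for all $a(S_i, X_i) \in T$. Thus, by (a) in the last paragraph, we conclude $B(g) = -E[U_i(g)(M_i^{\mathrm{eff}})']$. Similarly, the identity $E_{\alpha^\dagger, \eta^\dagger}[M_i(\alpha^\dagger)] = 0$ implies $C = -E[M_i(M_i^{\mathrm{eff}})']$. Hence $K_i(g) \equiv U_i(g) - B(g)C^{-1}M_i = U_i(g) - E[U_i(g)(M_i^{\mathrm{eff}})']\{E[M_i(M_i^{\mathrm{eff}})']\}^{-1}M_i$. In the special case in which $M_i = M_i^{\mathrm{eff}}$, $K_i(g)$ is the residual from the (population) least squares regression of $U_i(g)$ on $M_i$, and a standard calculation gives $\mathrm{var}[K_i(g)] = \mathrm{var}[U_i(g)] - B(g)\{E[M_i^{\mathrm{eff}}(M_i^{\mathrm{eff}})']\}^{-1}B'(g)$, which equals $\mathrm{var}[U_i(g)] + B(g)C^{-1}B'(g)$ because $C = -E[M_i^{\mathrm{eff}}(M_i^{\mathrm{eff}})']$ in this case. (A.3b) then follows by (b) in the last paragraph and the WLLN.
Theorem A.1 is formally proved by noting that it is an immediate consequence of Corollary 1 in Manski's Chapter 8 and the above variance calculations when we set Manski's function $g(z, b)$ equal to $(U_i(\beta^\dagger, g, \alpha^\dagger)', M_i(\alpha^\dagger)')'$ and Manski's function $r(x)$ equal to $x'x$, where $x$ is a vector.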
Corollary A.1 If $\hat\alpha^{(j)}$ is semiparametric efficient under the $j$th of $J$ nested correctly specified models $E[f_\beta(S_i, X_i) \mid X_i] = r(X_i; \alpha^{(j)})$, $(j = 1, \ldots, J)$, with the dimension of $\alpha^{(j)}$ increasing with $j$, then the asymptotic variance of $\hat\beta_E^{(j)}(g) \equiv \hat\beta_E(g, \hat\alpha^{(j)})$ is nonincreasing with $j$.
Proof Correct specification implies that, for $j > j^*$, $M_i^{\mathrm{eff}(j^*)}$ is the first $j^*$ components of $M_i^{\mathrm{eff}(j)}$, the efficient score for the $j$th model. But, by standard least squares theory, the variance of the residual $K_i^{(j)}(g)$ based on the $j$th model must be less than or equal to that based on model $j^*$.
The following theorem will be used in proving the claims made in the paragraph following equation (13).
Theorem A.2
(a) $\mathrm{var}^A[n^{1/2}(\hat\beta_E(g) - \beta)] \ge \mathrm{var}^A[n^{1/2}(\hat\beta_E(h) - \beta)]$.
(b) $\mathrm{var}^A[n^{1/2}(\hat\beta_E(h) - \beta)] = \mathrm{var}^A[n^{1/2}(\tilde\beta_E(h) - \beta)]$, where $\hat\beta_E(h) \equiv \hat\beta_E(h, \hat\alpha)$ and $\tilde\beta_E(h) \equiv \hat\beta_E(h, \alpha)$.

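Proof of (a) (a) is an immediate consequence of the following two lemmas.
Lemma A.1 The function $g$ minimizing $\mathrm{var}^A[n^{1/2}(\hat\beta_E(g) - \beta)]$ also minimizes $\mathrm{var}^A[n^{-1/2}U(\beta, g, \hat\alpha)]$.
Proof By a Taylor expansion, we have
$$0 = n^{-1/2}U(\hat\beta_E(g), g, \hat\alpha) = n^{-1/2}U(\beta, g, \hat\alpha) + n^{-1}\frac{\partial U(\beta, g, \hat\alpha)}{\partial\beta'}[n^{1/2}(\hat\beta_E(g) - \beta)] + o_p(1). \qquad \text{(A.6)}$$
A further Taylor expansion of $n^{-1}\partial U(\beta, g, \hat\alpha)/\partial\beta'$ around $\alpha$ and the WLLN proves $n^{-1}\partial U(\beta, g, \hat\alpha)/\partial\beta' = I + o_p(1)$, proving the lemma.
Lemma A.2 The function $h$ minimizes $\mathrm{var}^A[n^{-1/2}U(\beta, g, \hat\alpha)]$.
Proof $n^{-1/2}U(\beta, g, \hat\alpha) = n^{-1/2}\sum_i \varepsilon_i R(S_i, X_i; \hat\alpha) + n^{-1/2}\sum_i [h(X_i) - g(X_i)]R(S_i, X_i; \hat\alpha) \equiv A_1 + A_2(g)$, say, where we have used (A.0) to substitute $\varepsilon_i + h(X_i)$ for $Y_i - f(S_i, X_i, \beta)$. If we can show $\mathrm{cov}^A(A_1, A_2(g)) = 0$, then $\mathrm{var}^A[n^{-1/2}U(\beta, g, \hat\alpha)] = \mathrm{var}^A(A_1) + \mathrm{var}^A[A_2(g)]$, which is minimized at $g = h$ since $\mathrm{var}^A[A_2(h)] = 0$. Now $A_1$ and $A_2(g)$ have zero covariance since (a) $E[A_1 \mid (S, X)] = 0$ and (b) $A_2(g)$ is fixed given $(S, X) \equiv \{(S_i, X_i);\ i = 1, \ldots, n\}$. (a) and (b) follow from the fact that $E[\varepsilon \mid (S, X)] = 0$ and $\hat\alpha$ depends on the data only through $(S, X)$.
Proof of (b) (b) follows from the fact that $B(h) = 0$ by (A.0).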
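The step from (A.0) to $B(h) = 0$ can be spelled out as follows. This is a short worked derivation added for clarity, using only the definitions of $B(g)$, $R$, and (A.5) given in this Appendix; it is a sketch of the argument, not a quotation of the original proof.

```latex
\begin{aligned}
B(h) &= E\!\left[\{Y_i - f(S_i, X_i, \beta) - h(X_i)\}\,
        \frac{\partial R(S_i, X_i; \alpha)}{\partial \alpha'}\right]
      = E\!\left[\varepsilon_i\,
        \frac{\partial R(S_i, X_i; \alpha)}{\partial \alpha'}\right] \\
     &= E\!\left[E(\varepsilon_i \mid S_i, X_i)\,
        \frac{\partial R(S_i, X_i; \alpha)}{\partial \alpha'}\right] = 0, \\[4pt]
\text{so that, by (A.5),}\quad
n^{1/2}\{\hat\beta_E(h) - \beta\}
     &= -I^{-1} n^{-1/2} U(h) + o_p(1).
\end{aligned}
```

Because the term involving $n^{1/2}(\hat\alpha - \alpha)$ drops out when $g = h$, the limiting distribution is the same whether $\alpha$ is estimated or known.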
In general, we do not know $h(X_i)$. Therefore, as in Section 4, we shall hypothesize a model $h(X_i) = g(X_i; \theta)$ where $g(\cdot, \cdot)$ is a known function and $\theta$ is a vector of parameters to be estimated. We estimate $\theta$ by (possibly nonlinear) least squares regression of $Y_i - f(S_i, X_i, \hat\beta_E)$ on $X_i$, where $\hat\beta_E$ is $\hat\beta_E(g)$ for $g(X_i) \equiv 0$. Let $\hat\theta$ be the (possibly nonlinear) least squares estimator of $\theta$. It is clear that since $\hat\beta_E$ is an $n^{1/2}$-consistent estimator of $\beta$, if the model for $h(X_i)$ were correctly specified, $n^{1/2}(\hat\theta - \theta)$ would have a nondegenerate limiting distribution with mean 0. If the model for $h(X_i)$ were misspecified, there still exists $\theta^*$ such that $n^{1/2}(\hat\theta - \theta^*)$ has a nondegenerate limiting distribution with mean 0. The following theorem shows that we can then use $\hat\theta$ to construct an adaptive estimator of $\beta$ that (1) has the same limiting distribution as $\hat\beta_E(h)$ if our model $h(X_i)$ is correctly specified and (2) remains consistent, asymptotically normal even if our model is misspecified.
Theorem A.3 If $n^{1/2}(\hat\theta - \theta^*)$ has a nondegenerate limiting distribution with mean 0, then $\hat\beta_E[g(X_i, \hat\theta)]$ has the same limiting distribution as $\hat\beta_E[g(X_i, \theta^*)]$. In particular, it will be consistent and asymptotically normal whether or not the hypothesized model for $h(X_i)$ is correct, and it will have the same limiting distribution as $\hat\beta_E(h)$ if the model for $h(X_i)$ is correct.
Proof For notational convenience, assume that $\theta$ is one-dimensional. It will be sufficient to show that
$$n^{-1/2}U(\beta^\dagger, \hat\alpha, g(\hat\theta)) = n^{-1/2}U(\beta^\dagger, \hat\alpha, g(\theta^*)) + o_p(1) \qquad \text{(A.7)}$$
for $|\beta^\dagger - \beta| = O(n^{-1/2})$. By a Taylor series expansion
$$n^{-1/2}U(\beta^\dagger, \hat\alpha, g(\hat\theta)) = n^{-1/2}U(\beta^\dagger, \hat\alpha, g(\theta^*))$$
$$\qquad + n^{1/2}(\hat\theta - \theta^*)\,[n^{-1}U[\beta^\dagger, \hat\alpha, g'(\theta^*)]] \qquad \text{(A.8)}$$
$$\qquad + n^{1/2}(\hat\theta - \theta^*)^2\,[n^{-1}U[\beta^\dagger, \hat\alpha, g''(\bar\theta)]] \qquad \text{(A.9)}$$
for some $\bar\theta$ between $\hat\theta$ and $\theta^*$. Now, if $\beta^\dagger = \beta$, by Theorem A.1 and Pierce (1982), $[n^{-1}U[\beta^\dagger, \hat\alpha, g'(\theta^*)]]$ converges to 0 in probability since it has mean 0 to $o_p(n^{-1/2})$ with variance converging to 0 as $n \to \infty$. Further, under regularity conditions, this remains true if $|\beta^\dagger - \beta| = O(n^{-1/2})$. It then follows from Slutsky's theorem that expression (A.8) converges in law to 0 and thus in probability to 0. Further, since $n^{-1}U[\beta^\dagger, \hat\alpha, g''(\bar\theta)]$ is at most $O_p(1)$ and $n^{1/2}(\hat\theta - \theta^*)^2$ is $O_p(n^{-1/2})$, it follows that expression (A.9) is $O_p(n^{-1/2})$. Thus equation (A.7) is true.
Theorem (A.3) and part (b) of Theorem (A.2) imply proposition (1) in the paragraph following equation (13). Proposition (2) is an easy calculation.
