
Statistics & Probability Letters 55 (2001) 413 – 419

Robust inference for generalized linear models with application to logistic regression
Gianfranco Adimari ∗ , Laura Ventura
Dipartimento di Scienze Statistiche, Università di Padova, Via C. Battisti, 241–243, 35121 Padova, Italy

Received January 2001; received in revised form July 2001; accepted August 2001

Abstract
In this paper we consider a suitable scale adjustment of the estimating function which defines a class of robust
M-estimators for generalized linear models. This leads to a robust version of the quasi-profile loglikelihood which
allows us to derive robust likelihood ratio type tests for inference and model selection having the standard asymptotic
behaviour. An application to logistic regression is discussed. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Likelihood ratio test; Logistic regression; M-estimator; Quasi-likelihood; Robustness

1. Introduction

Generalized linear models (GLM) (McCullagh and Nelder, 1989) are a technique for modelling the
relationship between $p + 1$ predictors $x_i = (1, x_{1i}, \ldots, x_{pi})^T$ and a function of the means $\mu_i$ of continuous
or discrete response variables $y_i$, $i = 1, \ldots, n$. More precisely, a GLM assumes that
$$\eta_i = g(\mu_i) = \beta^T x_i,$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is a vector of unknown parameters belonging to $\mathbb{R}^{p+1}$ and $g(\cdot)$ is the link function.
The distribution of the response is usually supposed to belong to an exponential family.
As is well known, the maximum likelihood estimator for $\beta$ and the usual quasi-likelihood estimator are
not robust (see, e.g. McCullagh and Nelder, 1989; Heyde, 1997). Both estimators are defined as solutions of
estimating equations of the form
$$\sum_{i=1}^{n} \frac{(y_i - \mu_i)}{V(\mu_i)}\, \mu_i' = 0, \tag{1}$$

∗ Corresponding author. Fax: +39-49-827-4170.

E-mail addresses: adimari@stat.unipd.it (G. Adimari), ventura@stat.unipd.it (L. Ventura).

0167-7152/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved
PII: S 0 1 6 7 - 7 1 5 2 ( 0 1 ) 0 0 1 5 7 - 2

where $\mu_i = g^{-1}(\beta^T x_i)$, $\mu_i' = \partial\mu_i/\partial\beta^T$, and $V(\mu_i) = \mathrm{Var}(y_i)$ is the variance function, which is assumed to be
known in the quasi-likelihood approach. These estimators are M-estimators with unbounded influence functions
(see, e.g. Hampel et al., 1986), which means that large deviations of the response from its mean or outlying
points in the explanatory variables may have a large influence on the estimators. In view of this, estimators
with better robustness properties are needed and several robust alternatives have been proposed in the literature
(Pregibon, 1982; Stefanski et al., 1986; Künsch et al., 1989; Carroll and Pederson, 1993; Preisser and Qaqish,
1999). In particular, Preisser and Qaqish (1999) propose a class of robust estimators in the more general
setting of generalized estimating equations.
Starting from the class of robust estimators discussed by Preisser and Qaqish (1999), Cantoni and Ronchetti
(2001) propose a set of robust inferential tools which apply to the whole class of GLM and are based on a
natural generalization of the quasi-likelihood approach. They define in particular robust deviances and related
tests for model selection that play the same role as the classical tests based on quasi-deviances. However,
unlike their classical counterparts, these tests do not have the standard asymptotic distribution. This is because
the estimating function considered by Cantoni and Ronchetti does not satisfy a condition equivalent to the
second Bartlett identity.
The aim of this paper is to discuss a suitable scale adjustment of the estimating function that defines the
class of robust M-estimators considered by Cantoni and Ronchetti (2001). Such an adjustment allows us to
derive a robust quasi-profile loglikelihood function which can be used as an ordinary profile loglikelihood to
make inference about a scalar parameter of interest, in the presence of nuisance parameters. In particular, the
related likelihood ratio type tests for inference and model selection present the classical asymptotic behaviour.

2. M-estimators and robust inference

Cantoni and Ronchetti (2001) consider a class of Mallows-type robust estimators for GLM, where the influences
of deviations in the response and in the predictors are bounded separately. A Mallows quasi-likelihood
estimator for $\beta$ is defined as the solution $\hat\beta$ of the estimating equation
$$\Psi(\beta) = \sum_{i=1}^{n} \psi(y_i, \mu_i) = 0, \tag{2}$$
with
$$\psi(y_i, \mu_i) = \psi_k(r_i)\, w(x_i)\, \frac{1}{\sqrt{V(\mu_i)}}\, \mu_i' - a(\beta),$$
where $\psi_k(u) = \min\{k, \max\{u, -k\}\}$ is Huber's $\psi$-function, for fixed $k > 0$, and $r_i = (y_i - \mu_i)/\sqrt{V(\mu_i)}$ are the
Pearson residuals. The correction term $a(\beta) = (1/n) \sum_{i=1}^{n} E[\psi_k(r_i)]\, w(x_i)\, \mu_i'/\sqrt{V(\mu_i)}$ ensures Fisher consistency
of the estimator $\hat\beta$ and can be computed explicitly for binomial, Poisson and logistic models. Here
the expectation is taken with respect to the conditional distribution of $y | x$. The estimating equation (2) has
a structure suggested by the classical quasi-likelihood equation (1) and is a special case of a more general
estimating equation discussed in Preisser and Qaqish (1999).
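As a minimal illustration of the two building blocks of the estimating equation, the following Python sketch implements Huber's $\psi$-function and the Pearson residual. The function names and the Bernoulli variance function are assumptions made for this example only; they are not taken from the original analysis.

```python
import math

def huber_psi(u, k):
    """Huber's psi-function: clip u to the interval [-k, k]."""
    return min(k, max(u, -k))

def pearson_residual(y, mu, variance):
    """Pearson residual (y - mu) / sqrt(V(mu)) for a given variance function."""
    return (y - mu) / math.sqrt(variance(mu))

def bernoulli_variance(mu):
    """Variance function of a binary response: V(mu) = mu * (1 - mu)."""
    return mu * (1.0 - mu)

# A response y = 1 with a very small fitted mean gives a large residual,
# which the psi-function truncates at k.
k = 1.55
r = pearson_residual(1, 0.01, bernoulli_variance)
clipped = huber_psi(r, k)
```

Here the residual is close to 9.95, while its contribution to the estimating equation is capped at $k = 1.55$, which is how deviations in the response are bounded.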
The influence function for the M-estimator defined by (2) is $M^{-1}\psi(y, \mu)$, where $M = X^T B X/n$, with
$B = \mathrm{diag}(b_1, \ldots, b_n)$,
$$b_i = E\left[\psi_k(r_i)\, \frac{\partial}{\partial\mu_i} \log h(y_i, \mu_i)\right] \frac{w(x_i)}{\sqrt{V(\mu_i)}} \left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2$$
and $h(\cdot, \cdot)$ is the conditional density or probability of $y_i | x_i$. The shape of $\psi(\cdot, \cdot)$ ensures robustness by putting
a bound on the influence function. In particular, the function $\psi_k(\cdot)$ controls deviations in the $y$-space, and leverage
points are down-weighted by the weights $w(x_i)$. A simple choice for $w(x_i)$ is $\sqrt{1 - h_i}$, where $h_i$ is the $i$th
diagonal element of $H = X(X^T X)^{-1} X^T$ and $X$ denotes the design matrix. This choice is suggested by the
classical linear model theory (see Staudte and Sheather, 1990, Section 7). More sophisticated choices for $w(\cdot)$
are also available (see Cantoni and Ronchetti, 2001, for discussion and some references). Subject to some
regularity conditions, $\hat\beta$ admits an asymptotic normal distribution with mean $\beta$ and variance $M^{-1} Q M^{-1}$, where
$Q = X^T A X/n - a(\beta) a(\beta)^T$, with $A = \mathrm{diag}(a_1, \ldots, a_n)$ and
$$a_i = E[\psi_k(r_i)^2]\, \frac{w^2(x_i)}{V(\mu_i)} \left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2.$$
Solving Eq. (2) is equivalent to minimizing, with respect to $\beta$, the quantity
$$l_M(\beta) = \int^{\beta} \Psi(b)\, db,$$
which can be seen as the robust counterpart of the classical quasi-loglikelihood function. Cantoni and Ronchetti
(2001) use $l_M(\beta)$ to define robust quasi-deviances and obtain robust tests for model selection. Such tests
are generalizations of the quasi-deviance tests for GLM. In particular, in order to test $H_0: \beta_{(1)} = 0$ against
$H_1: \beta_{(1)} \neq 0$, where $\beta_{(1)}$ is a subset of $q \leq p$ components of $\beta$, a suitable robust statistic based on $l_M(\beta)$ is
$$\omega_M = 2[l_M(\hat\beta) - l_M(\dot\beta)],$$
where $\dot\beta$ is the estimate of $\beta$ under $H_0$. In view of the structure of the function $\psi(\cdot, \cdot)$, $l_M(\cdot)$ takes the form
$$l_M(\beta) = \sum_{i=1}^{n} \int^{\mu_i} \left[\psi_k\left(\frac{y_i - t}{\sqrt{V(t)}}\right) - E\left\{\psi_k\left(\frac{y_i - t}{\sqrt{V(t)}}\right)\right\}\right] \frac{w(x_i)}{\sqrt{V(t)}}\, dt,$$
where $\mu_i = g^{-1}(\beta^T x_i)$. Consequently, the computation of the statistic $\omega_M$ involves $n$ one-dimensional
integrations, which can be performed numerically. Note that $\omega_M = 2[l_{MP}(\hat\beta_{(1)}) - l_{MP}(0)]$, where $l_{MP}(\cdot)$ is the
quasi-profile loglikelihood for $\beta_{(1)}$.
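The one-dimensional integrations mentioned above can be sketched as follows for a binary response with $V(t) = t(1 - t)$. This is only an illustration under stated assumptions: the function names, the midpoint rule and the number of steps are choices made for this example, and the code evaluates the contribution of a single observation to the difference of the robust quasi-loglikelihood between two fitted means.

```python
import math

def huber_psi(u, k):
    """Huber's psi-function: clip u to the interval [-k, k]."""
    return min(k, max(u, -k))

def integrand(t, y, w, k):
    """Centred, weighted Huber score of a Bernoulli response at mean t."""
    sd = math.sqrt(t * (1.0 - t))
    score = huber_psi((y - t) / sd, k)
    # E[psi_k(r)] over y in {0, 1} with pr(y = 1) = t
    expected = (huber_psi((1.0 - t) / sd, k) * t
                + huber_psi(-t / sd, k) * (1.0 - t))
    return (score - expected) * w / sd

def robust_term(y, mu_from, mu_to, w=1.0, k=1.55, steps=2000):
    """Midpoint-rule integral of the integrand between two fitted means."""
    h = (mu_to - mu_from) / steps
    return h * sum(integrand(mu_from + (j + 0.5) * h, y, w, k)
                   for j in range(steps))
```

Summing `robust_term` over the observations, with `mu_from` the fitted mean under the null model and `mu_to` the fitted mean under the full model, and doubling the result gives the robust statistic. As a sanity check, for a very large `k` (no truncation), $y = 1$ and unit weight, the integrand reduces to $1/t$, so the integral is the classical loglikelihood difference $\log(\mu_{\mathrm{to}}) - \log(\mu_{\mathrm{from}})$.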
A difficulty in using tests based on $l_M(\beta)$ is that, unlike their classical counterparts, they do not have
the standard asymptotic $\chi^2$ distribution. Indeed, $\omega_M$ is asymptotically distributed as a linear combination of
$q$ independent $\chi^2_1$ variables, whose coefficients are the eigenvalues of a suitable matrix. In general, these
coefficients depend on the unknown parameter $\beta$; see Proposition 1 in Cantoni and Ronchetti (2001) and
Heritier and Ronchetti (1994).
The discrepancy between the asymptotic behaviour of quasi-likelihood ratio type tests in the classical and
in the robust framework occurs because $l_M(\beta)$ does not verify the relation
$$\mathrm{Var}(\Psi(\beta)) = -E\left(\frac{\partial}{\partial\beta^T} \Psi(\beta)\right),$$
which corresponds to the second Bartlett identity when $\Psi(\beta)$ is the usual score function. However, Adimari
and Ventura (2001) show that in the presence of a scalar parameter of interest, it is possible to modify
the estimating function (2) so as to obtain quasi-profile loglikelihood functions with the standard asymptotic
behaviour.
Let $\beta_j$ be a scalar component of $\beta$ of interest. Let $\Psi_{\beta_j}(\beta_j, \lambda)$ be the estimating function corresponding to
$\beta_j$. Here $\lambda$ indicates the vector $\beta$ without its $j$th element. The adjusted quasi-profile loglikelihood function for
$\beta_j$ (Adimari and Ventura, 2001) can be written as
$$l_{QP}(\beta_j) = \int^{\beta_j} \omega(b, \hat\lambda_b)\, \Psi_{\beta_j}(b, \hat\lambda_b)\, db, \tag{3}$$
with
$$\omega(\beta_j, \lambda) = \frac{M_{\beta_j,\beta_j} - \nu_j^T M_{(-j)}^{-1} \nu_j}{Q_{\beta_j,\beta_j} - 2\nu_j^T M_{(-j)}^{-1} \kappa_j + \nu_j^T M_{(-j)}^{-1} Q_{(-j)} M_{(-j)}^{-T} \nu_j},$$
where $M_{\beta_j,\beta_j}$ is the $j$th diagonal element of the matrix $M$, $\nu_j$ is the $j$th column of $M$ without its $j$th element,
$M_{(-j)}$ denotes $M$ without the $j$th column and the $j$th row, $\kappa_j$ is the $j$th column of $Q$ without its $j$th element,
and $\hat\lambda_{\beta_j}$ is the estimate of $\lambda$ for $\beta_j$ fixed. Function (3) is obtained from a scale adjustment which corrects the
quasi-profile score $\Psi_{\beta_j}(\beta_j, \hat\lambda_{\beta_j})$ to have information bias of the proper order $O(1)$ (see Adimari and Ventura,
2001; McCullagh and Tibshirani, 1990). Consequently, $l_{QP}(\beta_j)$ has properties similar to those of the ordinary
profile loglikelihood. In particular, for setting quasi-likelihood confidence regions or for hypothesis testing,
the adjusted quasi-likelihood ratio statistic
$$W_{QP}(\beta_j) = 2[l_{QP}(\hat\beta_j) - l_{QP}(\beta_j)] = 2\int_{\beta_j}^{\hat\beta_j} \omega(b, \hat\lambda_b)\, \Psi_{\beta_j}(b, \hat\lambda_b)\, db \tag{4}$$
can be used. Unlike $\omega_M$, under $H_0: \beta_j = 0$ and usual regularity conditions, $W_{QP}(0)$ is approximately $\chi^2_1$
distributed (see Adimari and Ventura, 2001). Asymptotic confidence regions with nominal coverage $1 - \alpha$
can be constructed as $\{\beta_j : W_{QP}(\beta_j) \leq \chi^2_{1;1-\alpha}\}$, where $\chi^2_{1;1-\alpha}$ is the $(1 - \alpha)$ quantile of the $\chi^2_1$ distribution.
Alternatively, the quasi-directed likelihood $r_{QP}(\beta_j) = \mathrm{sgn}(\hat\beta_j - \beta_j)\sqrt{W_{QP}(\beta_j)}$, which is approximately standard
normal, can be used. Note that the statistic (4) suffices to verify the significance of a variable in a model
selection procedure.
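Once a value of the adjusted quasi-likelihood ratio statistic is available, the test and the confidence region described above need only the $\chi^2_1$ distribution. The sketch below, with hypothetical function names, uses the identity $P(\chi^2_1 > w) = \mathrm{erfc}(\sqrt{w/2})$, so that only the Python standard library is required; the constant is the 0.95 quantile of the $\chi^2_1$ distribution.

```python
import math

def chi2_1_pvalue(w_stat):
    """Upper tail probability P(X > w_stat) for X chi-square with 1 df."""
    return math.erfc(math.sqrt(w_stat / 2.0))

def signed_root(w_stat, estimate, value):
    """Signed square root sgn(estimate - value) * sqrt(w_stat)."""
    sign = (estimate > value) - (estimate < value)
    return sign * math.sqrt(w_stat)

CHI2_1_Q95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def in_confidence_region(w_stat, cutoff=CHI2_1_Q95):
    """True when a parameter value with statistic w_stat lies in the region."""
    return w_stat <= cutoff
```

A parameter value belongs to the asymptotic 95% region exactly when its p-value from `chi2_1_pvalue` is at least 0.05, so the two functions give the same answer.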
As the adjustment of $\Psi_{\beta_j}(\beta_j, \hat\lambda_{\beta_j})$ by the factor $\omega(\cdot, \cdot)$ leaves the M-estimator for $\beta_j$ unchanged, the robustness
properties are maintained. According to the results in Cantoni and Ronchetti (2001) and Heritier and
Ronchetti (1994), these properties will carry over to quasi-likelihood-based inferential procedures.
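To make the scale factor concrete, the following sketch evaluates it for a model with only two parameters, so that the blocks of $M$ and $Q$ left after removing row and column $j$ are scalars and each matrix inverse reduces to a division. The function name and the nested-list representation are assumptions of this example, and the formula coded is the ratio of $M_{jj}$ minus its nuisance correction to the corresponding sandwich-type quantity built from $Q$.

```python
def scale_factor(M, Q, j):
    """Scale factor for component j (0 or 1) of a two-parameter model.

    M and Q are 2x2 nested lists. The other index plays the role of the
    nuisance block, so every matrix inverse reduces to a division.
    """
    o = 1 - j                  # index of the nuisance component
    nu = M[o][j]               # j-th column of M without its j-th element
    kappa = Q[o][j]            # j-th column of Q without its j-th element
    m_inv = 1.0 / M[o][o]      # inverse of M with row and column j removed
    numerator = M[j][j] - nu * m_inv * nu
    denominator = (Q[j][j] - 2.0 * nu * m_inv * kappa
                   + nu * m_inv * Q[o][o] * m_inv * nu)
    return numerator / denominator
```

When $Q = M$ the factor equals 1 and no adjustment is made, while $Q = cM$ gives the factor $1/c$; both cases follow directly from the formula and offer a quick check of an implementation.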

3. Application: logistic regression

Consider the U.S. Food Stamp data previously analysed by Stefanski et al. (1986), Künsch et al. (1989),
Carroll and Pederson (1993), in the framework of robust estimation, and Heritier and Ronchetti (1994). The
response ($y$) indicates participation in the Federal Food Stamp program and the covariates considered in
the study include two dichotomous variables, tenancy ($x_1$) and supplemental income ($x_2$), and a logarithmic
transformation of monthly income [log(monthly income + 1)] ($x_3$). In all, 150 observations are available.
Previous analyses highlighted that the data contain at least one leverage point (case 5); some authors also
suggest that case 66 is somewhat outlying. Consider the model
$$\log\left(\frac{\mu_i}{1 - \mu_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}.$$
We have $\mu_i = \mathrm{pr}(y_i = 1 \mid x_i) = e^{\beta^T x_i}/(1 + e^{\beta^T x_i})$, with $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$ and $x_i = (1, x_{1i}, x_{2i}, x_{3i})^T$,
$V(\mu_i) = \mu_i(1 - \mu_i)$ and $\mu_i' = \mu_i(1 - \mu_i) x_i$. The Mallows quasi-likelihood estimator is defined by the estimating
function
$$\Psi(\beta) = \sum_{i=1}^{n} \left\{\psi_k\left(\frac{y_i - \mu_i}{\sqrt{\mu_i(1 - \mu_i)}}\right) w(x_i) \sqrt{\mu_i(1 - \mu_i)}\, x_i - a(\beta)\right\}, \tag{5}$$
where
$$a(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left[\psi_k\left(\sqrt{e^{-\beta^T x_i}}\right) \mu_i - \psi_k\left(\sqrt{e^{\beta^T x_i}}\right)(1 - \mu_i)\right] w(x_i) \sqrt{\mu_i(1 - \mu_i)}\, x_i.$$


Fig. 1. Quasi-profile score (solid line) and adjusted quasi-profile score (dashed line) for $\beta_3$ when $k = 1.55$ and (a) $w(x_i) = \sqrt{1 - h_i}$,
(b) $w(x_i) = \sqrt{1 - h_i}$ and case 5 is removed, (c) $w(x_i) = (1 - h_i)^2$.

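The diagonal elements of the matrices $A$ and $B$ are
$$a_i = \left[\psi_k^2\left(\sqrt{e^{-\beta^T x_i}}\right) \mu_i + \psi_k^2\left(\sqrt{e^{\beta^T x_i}}\right)(1 - \mu_i)\right] w^2(x_i)\, \mu_i(1 - \mu_i)$$
and
$$b_i = \left[\psi_k\left(\sqrt{e^{-\beta^T x_i}}\right) + \psi_k\left(\sqrt{e^{\beta^T x_i}}\right)\right] w(x_i)\, [\mu_i(1 - \mu_i)]^{3/2}.$$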
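The closed-form logistic quantities can be checked numerically. The sketch below, with hypothetical function names, computes for one observation the expectation $E[\psi_k(r_i)]$ that enters $a(\beta)$, together with $a_i$ and $b_i$, taking $a_i$ as $E[\psi_k(r_i)^2]\, w^2(x_i)\, \mu_i(1 - \mu_i)$ in line with the general definition of Section 2. It relies on the fact that the Pearson residual equals $\sqrt{e^{-\eta_i}}$ when $y_i = 1$ and $-\sqrt{e^{\eta_i}}$ when $y_i = 0$, where $\eta_i = \beta^T x_i$.

```python
import math

def huber_psi(u, k):
    """Huber's psi-function: clip u to the interval [-k, k]."""
    return min(k, max(u, -k))

def logistic_pieces(eta, w, k=1.55):
    """Return (E[psi_k(r)], a_i, b_i) for one observation with linear predictor eta."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    v = mu * (1.0 - mu)
    psi_up = huber_psi(math.exp(-eta / 2.0), k)   # psi_k of the residual when y = 1
    psi_down = huber_psi(math.exp(eta / 2.0), k)  # psi_k of minus the residual when y = 0
    expected_psi = psi_up * mu - psi_down * (1.0 - mu)
    a_i = (psi_up ** 2 * mu + psi_down ** 2 * (1.0 - mu)) * w ** 2 * v
    b_i = (psi_up + psi_down) * w * v ** 1.5
    return expected_psi, a_i, b_i
```

Because $\psi_k$ is an odd function, the value at the negative residual is minus `psi_down`, which explains the minus sign in the expectation and the plus signs in $a_i$ and $b_i$.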
The parameter of interest is the fourth component of $\beta$, $\beta_3$, which corresponds to monthly income. Previous
analyses showed that its estimate and the corresponding significance level for $H_0: \beta_3 = 0$ change considerably
if case 5 is deleted. The Mallows estimator defined by (5) with $k = 1.55$ and $w(x_i) = \sqrt{1 - h_i}$ produces results
that are very similar to those obtained from the conditional estimators proposed in Künsch et al. (1989), and
appears to be sufficiently robust. In particular, $\hat\beta_3 = -1.18$ with estimated standard error 0.50. On the other
hand, the classical maximum likelihood estimator yields $-0.33$ with estimated standard error 0.27.
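The estimates and standard errors just quoted translate directly into Wald statistics. The following sketch, in which the function name is hypothetical and a two-sided standard normal reference distribution is assumed, shows the arithmetic.

```python
import math

def wald_test(estimate, std_error):
    """Wald z-statistic and two-sided normal p-value for H0: parameter = 0."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

z_robust, p_robust = wald_test(-1.18, 0.50)  # Mallows estimate
z_ml, p_ml = wald_test(-0.33, 0.27)          # maximum likelihood estimate
```

The robust estimate gives $z = -2.36$ with a p-value of roughly 0.018, while the maximum likelihood estimate gives $z \approx -1.22$ with a p-value of roughly 0.22.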
To illustrate the use of the adjusted quasi-profile loglikelihood (3), we test the hypothesis $H_0: \beta_3 = 0$
against the alternative $H_1: \beta_3 \neq 0$. The Wald-type test based on the Mallows estimator $\hat\beta_3$ with $k = 1.55$ and
$w(x_i) = \sqrt{1 - h_i}$ rejects the null hypothesis at the 5% significance level. The corresponding test based on the
adjusted quasi-likelihood ratio statistic (4) performs similarly to the classical Wald test and gives a p-value
of 0.135. The discrepancy between these results is explained in Fig. 1, which gives the quasi-profile score
$\Psi_{\beta_3}(\beta_3, \hat\lambda_{\beta_3})$ (solid line) and the adjusted quasi-profile score $\omega(\beta_3, \hat\lambda_{\beta_3})\Psi_{\beta_3}(\beta_3, \hat\lambda_{\beta_3})$ (dashed line) for $\beta_3$, when
$k = 1.55$ and

(a) $w(x_i) = \sqrt{1 - h_i}$;
(b) $w(x_i) = \sqrt{1 - h_i}$ and case 5 is removed;
(c) $w(x_i) = (1 - h_i)^2$.

Fig. 2. Adjusted quasi-profile loglikelihood ratio statistic $W_{QP}$ for the parameter $\beta_3$ when $k = 1.55$ and $w(x_i) = \sqrt{1 - h_i}$. The horizontal
line is at the asymptotic 95% significance level.

Fig. 1 shows the influence of observation 5 on the estimate $\hat\beta_3$ and on the shape of the adjusted quasi-profile
score. Note that, when $w(x_i) = \sqrt{1 - h_i}$, the shape of the adjusted quasi-profile score is still largely influenced
by observation 5, unlike the numerical value of the estimate $\hat\beta_3$. Since the adjusted quasi-likelihood ratio
statistic is twice the area under the dashed curve from $\hat\beta_3$ to 0, it is clear why the likelihood ratio type test
fails to reject the null hypothesis. This result suggests that care is needed in choosing the weight function and
confirms some limits of automatic procedures for measuring leverage. In particular, it seems that the function
$w(x_i) = \sqrt{1 - h_i}$ does not downweight observation 5 enough. For comparison, the following table shows the
weights given to cases 5 and 66 by three different functions, namely $\sqrt{1 - h_i}$, $(1 - h_i)^2$ and $1 - \sqrt{h_i}$.

Case    $\sqrt{1 - h_i}$    $(1 - h_i)^2$    $1 - \sqrt{h_i}$
5       0.828               0.471            0.440
66      0.992               0.969            0.875
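The three columns of the table are functions of the same leverage $h_i$, so each row can be checked for internal consistency. The sketch below assumes the columns are $\sqrt{1 - h_i}$, $(1 - h_i)^2$ and $1 - \sqrt{h_i}$; it recovers $h_i$ from the first column and recomputes the other two. The function names are hypothetical.

```python
import math

def leverage_weights(h):
    """Three candidate weights for an observation with leverage h."""
    return (math.sqrt(1.0 - h), (1.0 - h) ** 2, 1.0 - math.sqrt(h))

def leverage_from_first_weight(w1):
    """Invert w1 = sqrt(1 - h) to recover the leverage h."""
    return 1.0 - w1 ** 2

h5 = leverage_from_first_weight(0.828)    # case 5
h66 = leverage_from_first_weight(0.992)   # case 66
weights5 = leverage_weights(h5)
weights66 = leverage_weights(h66)
```

The recomputed weights agree with the tabulated values up to the rounding of the first column, with an implied leverage of about 0.31 for case 5 and about 0.016 for case 66.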

If we use the adjusted quasi-likelihood ratio statistic based on (5) with $k = 1.55$ and $w(x_i) = (1 - h_i)^2$, we
obtain a p-value of 0.022, which leads to the rejection of the null hypothesis.
The adjusted quasi-likelihood approach can also be used to construct confidence intervals. As an example,
Fig. 2 plots the adjusted quasi-profile loglikelihood ratio function $W_{QP}(\beta_3)$ computed from (5) with
$k = 1.55$ and $w(x_i) = \sqrt{1 - h_i}$. The horizontal line is at the asymptotic 95% significance level. Unlike
the classical framework, where one uses the asymptotic distribution of $\hat\beta_3$, confidence
intervals obtained by our technique are not symmetric. A simulation experiment (based on 5000 Monte Carlo
trials) has been run to evaluate the real coverage of the nominal $1 - \alpha$ confidence intervals for $\beta_3$ obtained
by the adjusted quasi-profile loglikelihood. The responses $y_i$ have been generated according to the model
$\mathrm{pr}(y_i = 1 \mid x_i) = e^{\beta_*^T x_i}/(1 + e^{\beta_*^T x_i})$, where $\beta_* = (6, -1.8, 0.7, -1.2)^T$. For nominal 0.90, 0.95 and 0.99 coverage
probabilities we obtained real coverages of 0.898, 0.953 and 0.988, respectively.
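To judge how close the simulated coverages are to their nominal values, one can compare each difference with the Monte Carlo standard error of a proportion estimated from 5000 independent trials. The sketch below is an illustrative check added here under that binomial assumption, with hypothetical names; it is not part of the original simulation.

```python
import math

def coverage_std_error(nominal, trials):
    """Binomial standard error of an estimated coverage probability."""
    return math.sqrt(nominal * (1.0 - nominal) / trials)

# Nominal coverage -> simulated coverage reported in the text
results = {0.90: 0.898, 0.95: 0.953, 0.99: 0.988}
z_scores = {nominal: (observed - nominal) / coverage_std_error(nominal, 5000)
            for nominal, observed in results.items()}
```

All three standardized differences are smaller than 2 in absolute value, so the simulated coverages are compatible with the nominal levels at the accuracy allowed by 5000 trials.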

4. Conclusion

The adjusted quasi-profile loglikelihood obtained from an estimating function that defines Mallows-type robust
estimators for GLM (Cantoni and Ronchetti, 2001) is appealing. It represents a robust version of the profile
loglikelihood and leads to robust inference and model selection. In particular, it allows us to derive robust
likelihood ratio type pivots with approximate standard $\chi^2_1$ distribution. In addition, the example discussed
in Section 3 showed that this tool is also useful in evaluating the resistance properties of the underlying
estimating function.

References

Adimari, G., Ventura, L., 2001. Quasi-profile loglikelihoods for unbiased estimating equations. Ann. Inst. Statist. Math., to appear.
Cantoni, E., Ronchetti, E., 2001. Robust inference for generalized linear models. J. Amer. Statist. Assoc. 96, to appear.
Carroll, R.J., Pederson, S., 1993. On robustness in the logistic regression model. J. Roy. Statist. Soc. B 55, 693–706.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions.
Wiley, New York.
Heritier, S., Ronchetti, E., 1994. Robust bounded-influence tests in general parametric models. J. Amer. Statist. Assoc. 89, 897–904.
Heyde, C.C., 1997. Quasi-Likelihood and its Application. Springer, New York.
Künsch, H.R., Stefanski, L.A., Carroll, R.J., 1989. Conditionally unbiased bounded-influence estimation in general regression models,
with applications to generalized linear models. J. Amer. Statist. Assoc. 84, 460–466.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, 2nd Edition. Chapman & Hall, London.
McCullagh, P., Tibshirani, R., 1990. A simple method for the adjustment of profile likelihoods. J. Roy. Statist. Soc. B 52, 325–344.
Pregibon, D., 1982. Resistant fits for some commonly used logistic models with medical applications. Biometrics 38, 485–498.
Preisser, J.S., Qaqish, B.F., 1999. Robust regression for clustered data with applications to binary regression. Biometrics 55, 574–579.
Staudte, R.G., Sheather, S.J., 1990. Robust Estimation and Testing. Wiley, New York.
Stefanski, L.A., Carroll, R.J., Ruppert, D., 1986. Optimally bounded score functions for generalized linear models with applications
to logistic regression. Biometrika 73, 413–424.
