Professional Documents
Culture Documents
Technometrics
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/utch20
To cite this article: Diane Lambert (1992) Zero-Inflated Poisson Regression, With an Application to Defects in
Manufacturing, Technometrics, 34:1, 1-14
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained
in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no
representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of
the Content. Any opinions and views expressed in this publication are the opinions and views of the authors,
and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied
upon and should be independently verified with primary sources of information. Taylor and Francis shall
not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other
liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or
arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any
form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://
www.tandfonline.com/page/terms-and-conditions
Q 1992 American Statistical Association and
TECHNOMETRICS, FEBRUARY 1992, VOL. 34, NO. 1
the American Society for Quality Control
Zero-inflated Poisson (ZIP) regression is a model for count data with excesszeros. It assumes
that with probability p the only possible observation is 0, and with probability 1 - p, a
Poisson(A) random variable is observed. For example, when manufacturing equipment is
properly aligned, defects may be nearly impossible. But when it is misaligned, defects may
occur according to a Poisson(A) distribution. Both the probability p of the perfect, zero defect
state and the mean number of defects A in the imperfect state may depend on covariates.
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
Standard arguments suggest that, when a reliable The same or different covariates might affect p and
manufacturing process is in control, the number of A, and p and A might or might not be functionally
defects on an item should be Poisson distributed. If related. When p is a decreasing function of A, the
the Poisson mean is A, a large sample of 12items probability of the perfect state and the mean of the
should have about ne-” items with no defects. Some- imperfect state improve or deteriorate together.
times, however, there are many more items without Heilbron (1989)concurrently proposedsimilar zero-
defects than would be predicted from the numbers altered Poisson and negative binomial regression
of defects on imperfect items (an example is given models with different parameterizations of p and ap-
in Sec. l), One interpretation is that slight, unob- plied them to data on high-risk behavior in gay men.
served changesin the environment causethe process (Although the models were developed independ-
to move randomly back and forth between a perfect ently, the acronym ZIP is just an apt modification
state in which defects are extremely rare and an im- of Heilbron’s acronym ZAP for zero-altered Pois-
perfect state in which defects are possible but not son.) He also considered models with an arbitrary
inevitable. The transient perfect state, or existence probability of 0. Arbitrary zeros are introduced by
of items that are unusually resistant to defects, in- mixing point mass at 0 with a positive Poisson that
creasesthe number of zeros in the data. assignsno mass to 0 rather than a standard Poisson.
This article describesa new technique, called zero- Other authors have previously considered mixing
inflated Poisson (ZIP) regression, for handling zero- a distribution degenerateat 0 with distributions other
inflated count data. ZIP models without covariates than the Poisson or negative binomial. For example,
havebeen discussedby others (for example, seeCohen Feuerverger (1979) mixed zeros with a gamma dis-
1963; Johnson and Kotz 1969), but here both the tribution and coupled the probability of 0 with the
probability p of the perfect state and the mean A of mean of the gamma to model rainfall data. Farewell
the imperfect state can depend on covariates. In par- (1986) and Meeker (1987) mixed zeros with right-
ticular, log(A) and logit = log(pl(1 - p)) are censored continuous distributions to model survival
assumed to be linear functions of some covariates. data when some items are indestructible and testing
1
DIANE LAMBERT
0 0
0
OI i 0
Figure 1. The Data From the Printed Wiring Board Experiment. Each boxplot shows the number of defects on one of the 25
boards in the experiment. The boxes are labeled by the board factors. When the box degenerates to a line, as happens for the
leftmost board, which has large openings, thick solder, and mask A, 75% or more of the 27 counts on the board were 0. Counts
beyond 1.5 interquartile ranges of the nearest quartile are plotted individually as circles.
is stopped before all items that can fail have failed. Mask is the most unbalanced factor, C is its most
Regression models that mix zeros and Poissonsare unbalanced level, and no board had large openings
described in detail in Section 2. Section 3 shows how with mask C or small openings with mask E (see
to compute maximum likelihood estimates (MLE’s). Fig. 1).
Section 4 discussesfinite-sample inference in the con- In each area, only the total number of leads im-
text of simulations. Section 5 applies ZIP regression properly soldered was recorded, giving 27 responses
to the manufacturing data introduced in Section 1. between 0 and 48 for each of the 25 boards; that is,
Section 6 gives conclusions. the response vector Y has 675 elements between 0
and 48. Out of the 675 areas, 81% had zero defects,
1. THE MOTIVATING APPLICATION 8% had at least five defects, and 5.2% had at least
nine. Plainly, most areas had no defects, but those
Components are mounted on printed wiring boards that did have defects often had several, as Figure 1
by soldering their leads onto pads on the board. An shows.
experiment at AT&T Bell Laboratories studied five Since each lead has only a small probability of not
influences on solderability: being soldered and there are many leads per area, it
Mask-five types of the surface of the board (A- is reasonable to fit a log-linear Poisson(A) model;
El that is, log(A) = BP for some design matrix B and
Opening-large (L), medium (M), or small (S) coefficients p. Table 1 summarizes the fit of several
clearances in the mask around the pad such models.
Solder-thick or thin
Pad-nine combinations of different lengths and Table 1. Fits of Some Poisson Models for the Number of
widths of the pads (A-I) Defects per Area
Panel-l = first part of the board to be soldered; Residual
2 = second; 3 = last Log- degrees
Highest terms in the model likelihood of freedom
Even the richest log-linear Poisson model, which 144).] In many applications, however, there is little
has a three-way interaction between mask, opening, prior information about how p relates to A. If so, a
and solder, predicts poorly. Although 81% of the natural parameterization is
areashad no defects,the model predicts that only 71%
of the areas will have no defects, since Xf151P(Y = log(A) = BP and logit = - ~Bf3 (4
Olfii)/675 = .71. Large counts are also underpre- for an unknown, real-valuedshapeparameter7, which
dieted. Although 5.2% of the areashad at least nine implies that pi = (1 + A;) - l. In the terminology of
defects, the model predicts only 3.7% will have at generalized linear models, log(A) and logit are the
least nine defects. Poisson models with fewer inter- natural links or transformations that linearize Pois-
action terms are even worse. For example, the richest son means and Bernoulli probabilities of success.If
model for which all coefficients are estimable pre- the term BP is thought of as stress, the natural links
dicts only 68% zeros and 3.4% counts of at least for p and A are both proportional to stress.ZIP model
nine. Modeling the split-plot nature of the experi- (2) with logit link for p, log link for A, and shape
ment in a more complicated, generalized Poisson parameter T will be denoted ZIP(r).
regression still does not give good predictions, as The logit link for p is symmetric around .5. Two
Section 5.2 shows. In short, there are too many zeros popular asymmetric links are the log-log link defined
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
and too many large counts for the data to be Poisson. by log( -log(p)) = ~Bf3 or, equivalently, pi =
exp( - q) and the complementary log-log link de-
2. THE MODEL fined by log( - log(1 - p)) = - TBP or pi = 1 -
In ZIP regression, the responsesY = (Yl, . . . , exp( - A;‘). Heilbron (1989) used an additive log-
Y,)’ are independent and log link defined by pi = exp( - rAi) or log( - log(p))
= BP + log(r). Linear, instead of proportional or
Yi - 0 with probability pi additive, logit links and log-log links could be defined
- Poisson(&) with probability 1 - pi, by logit = log(a) - TBP and log( -log(p)) =
log(o) + TBP, respectively.
so that With any of these links, when r > 0 the perfect
Yi = 0 with probability pi + (1 - pi)ecAc state becomes less likely as the imperfect mean in-
creases, and as r --, ~0the perfect state becomes
= k with probability (1 - pi)e-“‘hf/k!, impossible if Ai stays fixed. As r + 0 under the ad-
k = 1,2, . . . . ditive log-log link and as T ---, -CC under the other
links, the perfect state becomes certain. Negative T
Moreover, the parameters A = (A,, . . . , A,)’ and are not permitted with the additive log-log link, but
P = (PIT * . . , p,)’ satisfy for the other links with T < 0, the Poisson mean
log(A) = BP and increases as excesszeros become more likely. This
could be relevant in manufacturing applications if
logit = log(p/(l - p)) = Gy (1) setting parameters to improve the fraction of perfect
for covariate matrices B and G. items leads to more defects on imperfect items.
The covariates that affect the Poisson mean of the
imperfect state may or may not be the same as the 3. MAXIMUM LIKELIHOOD ESTIMATION
covariates that affect the probability of the perfect The number of parameters that can be estimated
state. When they are the same and A and p are not in a ZIP regression model depends on the richness
functionally related, B = G and ZIP regression re- of the data. If there are only a few positive counts
quires twice as many parameters as Poisson regres- and A and p are not functionally related, then only
sion. At the other extreme, when the probability of simple models should be considered for A. The sim-
the perfect state does not depend on the covariates, ulations in Section 4 suggest that the data are ade-
G is a column of ones, and ZIP regression requires quate to estimate the parameters of a ZIP or ZIP(T)
only one more parameter than Poisson regression. model if the observed information matrix is nonsin-
If the same covariates affect p and A, it is natural gular. This section showshow to compute the MLE’s;
to reduce the number of parameters by thinking of the observed information matrix is given in the
p as a function of A. Assuming that the function is Appendix.
known up to a constant nearly halves the number of
parameters needed for ZIP regression and may ac- 3.1 A and p Unrelated
celerate the computations considerably. [The maxi-
mum likelihood computations are no faster than those When A and p are not functionally related, the log-
for least squares, which are O(k3) in the number of likelihood for ZIP regression with the standard par-
parameters k; for example, see Chambers (1977, p. ameterization (1) is
TECHNOMETRICS, FEBRUARY 1992, VOL. 34, NO. 1
DIANE LAMBERT
L(y) P; y) = yzo log(ec~y + exp( - eB@)) E Step. Estimate Zi by its posterior mean Zik)
under the current estimates Y(“) and Pck).Here
+ ZIo OtiBiP - eBcB) .Zik) = PIperfect stately,, y(*), fi(“)]
= (1 + e- c&+exp(Bl6(*)) 1- 1 ify, = 0
=O ify,= 1,2,.
(3)
M step fir j3. Find P(“+l) by maximizing L,(p;
y, Zck)).Note that @ck+
l) can be found from a weighted,
where Gi and Bi are the ith rows of G and B. The log-linear Poisson regression with weights 1 - Zck),
sum of exponentials in the first term complicates the as described by McCullagh and Nelder (1989), for
maximization of L(y, p; y). But suppose we knew example.
which zeros came from the perfect state and which
came from the Poisson; that is, suppose we could M step for y. Maximize L,(y; y, Z(“)) = 2,,,zo
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
observe .Zi = 1 when Y, is from the perfect, zero Zjk)Giy - 2y,=o Zfk) log(1 + e”“) - X.:=,(1 -
state and Zi = 0 when Yi is from the Poisson state. Zik)) log(1 + eG”) as a function of y. (The equality
Then the log-likelihood with the complete data (y, holds becauseZjk) = 0 whenever yi > 0.) To do this,
z) would be suppose that no of the yls are 0. Say yi,, . . . , yin0
are 0. Define y: = (yl, . . . , y,, yil, . . . , yin”),
G; = (G;,. . . , GL, G,!,, . . . , G&), and P: = (pl,
* . . tPn,Pil,. . ’ >pi,,). Define also a diagonal matrix
W@) with diagonal wck) = (1 - Z(lk), . . . , 1 -
Zik’, zp, . . . , Zi$)‘. In this notation, L,(y; y, Zck))
+ jIl l”df(Yilzi~ PI) = Cyzy Y.&~) G,iy - Zy’_fr”wjk) log(1 + eG*lr)
the gradient dr score function is G:W@)(y, - P,j
n = 0, and the Hessian or negative of the information
= C (ZiGiy - lOg(1 + eG1’)) matrix is - G:W(k)Q,G,, where Q, is the diagonal
i=l
matrix with P,(l - P,) on the diagonal. These func-
tions are identical to those for a weighted logistic
+ ,zl(1- zi)(YiB# - eBiP) regression with response y*, covariate matrix G,,
and prior weights wck); that is, ~(~1can be found by
weighted logistic regression.
The EM algorithm converges in this problem (see
the Appendix). Moreover, despite its reputation, it
converges reasonably quickly because the MLE for
the positive Poissonlog-likelihood is an excellent guess
- i: (1 - Zi)lOg(yi!). for p. The positive Poisson log-likelihood is
I=1
L+(P;s’+) = y~n(~iBiP - eB@>
This log-likelihood is easyto maximize, becauseL,(y;
-.pg(l - e- exp@y - y-” log(y,!),
y, z) and L&3; y, z) can be maximized separately.
With the EM algorithm, the incomplete log-like-
lihood (3) is maximized iteratively by alternating be- its score equations are
tween estimating Z, by its expectation under the cur- eB+P
rent estimates of (y, g) (E step) and then, with the
Zts fixed at their expected values from the E step,
B: Y+ -
1 - exp( -eB+P)
0,
maximizing L,(y, p; y, z) (M-step) (for example,
Dempster, Laird, and Rubin 1977). Once the ex- and its Hessian is B’+DB + , where the subscript +
pected 2,‘s converge, the estimated (y, p) converges, indicates that only the elements or rows correspond-
and iteration stops. The estimates from the final it- ing to positive yl’s are used and D is a diagonal matrix
eration are the MLE’s (9, fi) for the log-likelihood with diagonal
(3). eB+P(l - e-@*p(l f eB+P))
In more detail, iteration (k + 1) of the EM al-
gorithm requires three steps. (1 - e-eB+P)Z .
The score equations can be solved by iteratively re- matrices 1-l and I;‘, respectively. (The formulas for
weighted least squares (see Green 1984). I and I, are given in the Appendix.) Thus, for large
The initial guessfor y has been unimportant in the enough samples, the MLE’s and regular functions of
examples and simulations considered to date. One the MLE’s, such as the probability of a defect and
possibility when y includes an intercept is to set all the mean number of defects, are nearly unbiased.
elements other than the intercept to 0 and estimate Normal-theory, large-sample confidence intervals
the intercept by the log odds of the observed average are easy to construct, but they assumethat the log-
probability of an excess 0. The observed average likelihood is approximately quadratic near the MLE.
probability of an excess0 is When it is not quadratic a few standard errors from
#(yi = 0) - 2 e- -P@‘#“))
the MLE, normal-theory confidence intervals can
h =
mislead. Likelihood ratio confidence intervals are
n more difficult to compute but are often more trust-
(If the fraction of zeros is smaller than the fitted worthy. A two-sided (1 - a)lOO% likelihood ratio
Poisson models predict, there is no reason to fit a confidence interval for /3r in ZIP regression, for ex-
ZIP regression model.) ample, is found by computing the set of p1 for which
There are other algorithms, such as the Newton- %W, b> - max,,p-CKP1~ P-A 7)) < XL where
,& is the upper a quantile of a x2 distribution with
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
were found by the Newton-Raphson algorithm when when y cannot. Moreover, if we require that the
the log-likelihood was first maximized over p for observed information matrix I be nonsingular, then
fixed 7, as described in Section 3. For y1 = 25, the the bias in 9 and fi decreases dramatically, espe-
Newton-Raphson algorithm failed to find ZIP(T) cially for n = 25.
MLE’s when started from the ZIP MLE’s in 180runs, Looking at only runs with nonsingular observed
but it succeededin these runs when 7 was arbitrarily information, the standard deviations of 8, and p, es-
initialized to 10. In short, the ZIP and ZIP(T) regres- timated from observed information are reasonably
sions were not difficult to fit. close to the sample standard deviations of
fil and b2 (within 8% for ZIP regression and n =
4.2 Simulated Properties of the MLE’s
25, within 14% for ZIP(T) regression and n = 25,
Table 2 summarizes the finite-sample properties and less than 4% otherwise). The results for +r are
of the estimated coefficients. For all n, the mean fi slightly worse, and those for F2and + are noticeably
is close to the true p for both ZIP and ZIP(T) regres- worse (about 50% too small for IZ5 50); that is, the
sions. For n = 50, however, the mean 9 is far from precision in the estimated slope of the relationship
y, and for IZ = 25 the mean 9 is off by a factor of between the covariate and the log odds of the prob-
4,500. This is not surprising, becauseeven standard ability of a perfect 0 is hard to estimate well. As
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
logistic regression MLE’s have infinite bias in finite often happens, standard deviations based on ex-
samples. ZIP regressions are, of course, harder to pected information seriously underestimate the var-
fit than standard logistic regressionsbecauseit is not iability in the MLE’s and should not be trusted.
known which of the zeros are perfect. What is en- Standard deviations based on observed information
couraging, however, is that p can be estimated even are also optimistic but to a much lesser extent.
Table 2. Behavior of ZIP and ZIP(r) Coefficients As Estimated From 2,000 Simulated Trials
*For n = 100. observed information was never singular. ZIP observed information was singular in 17 runs for n = 50 and
in 160 runs for n = 25; ZIP(T) observed information was singular in one run for n = 50 and in 62 runs for n = 25.
4.3 Simulated Behavior of Confidence x between 0 and 1. In Figure 2, all 2,000 simulation
Intervals and Tests runs are used, including those with singular observed
information, because predictions from models with
Table 3 shows that 95% normal-theory confidence
poorly determined coefficients can be valid. The top
intervals for y2 are unreliable for IZ as large as 100
left plot shows that the average relative bias is much
and one-sided normal-theory error rates for T are
better for n = 50 or 100 than it is for n = 25, but
obviously asymmetric.Fortunately, even for 12as small
even for IZ = 15 the relative bias lies only between
as 25, ZIP and ZIP(T) 95% likelihood ratio confi-
-3.8% and 2.5%. For all n, E(Y(x) is, on average,
dence intervals perform well when only runs with
overestimated near x = 0 and underestimated near
nonsingularobservedinformation are used (seeTable
x = 1. Loosely speaking &Y/x) shrinks toward
3). For it = 25, 6.3% of the ZIP 95% intervals for n-l qL1 8( Y)x,). Figure 2 also shows that the rel-
& and 5.5% of the 85% intervals for y* did not cover
ative bias in E’(Y Ix) is even smaller for ZIP(T) regres-
the true values; 5.6% of the ZIP(T) 95% intervals
sion than for ZIP regression, except for x near 0.
for p2 and 5.6% of the 95% intervals for 7 did not
The right half of Figure 2 shows properties of the
cover the true values. For IZ = 50, the corresponding
relative bias of the estimated probability that Y =
error rates were 5.3% and 5.5% for ZIP and 5.1%
0; that is, it shows [P(Y = 01x) - P(Y = O(x)]/
and 4.5% for ZIP(r) regression. The errors were not
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
Table 3. Error Rates for 95% Two-Sided ZIP and ZIP(r) Confidence Intervals From 2,000
Simulated Trials
ZIP ZIP(T)
P2 Y2 P2 Y?
I
I I I I
0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.2 0.3 0.4 0.5 0.6 0.7 0.8
True Mean E(Y) True P(Y = 0)
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.2 0.3 0.4 0.5 0.6 0.7 0.8
True Mean E(Y) True P(Y = 0)
Figure 2. Bias in the Estimated Mean of Y and Estimated Probability of a Zero for 2,000 Simulated Trials. The left plots
describe &Y)lE(Y) - 1; the right plots show&Y = 0)IPIY = 0) - 7. The solid line represents n = 25, the dashed line represents
n = 50, and the dotted line represents n = 100.
NOTE: Some effects in the mask l opening interaction are not estimable. All
Although there are many parameters in the ZIP models have all main effects; the three-way interaction model also has all two-
regression model (5), 24 for A and 17 for p, the way interactions.
Table 5. Fits of ZIP Regression Models With Logit(pJ = solder, which is a “bad” combination, but no board
Panel t Pad + Opening + Mask
with mask C has large openings and thick solder,
Residual which is a “good” combination. Under the unbal-
Highest term in the model Log- degrees anced design the marginal mean for mask C is 7.47;
for log(A) likelihood of freedom
under the balanced design it is only 4.48.
No interactions - 563.3 640 Together, Figures 3 and 4 describe the roles of the
Opening c solder - 524.0 636 factors in detail. For example, pad e improves the
Mask t solder - 537.9 637
Mask l opening - 524.3 634
mean number of defects because it improves the
Mask t solder + opening l solder - 511.2 634 probability of the perfect state and the mean in the
NOTE: The model for p excludes solder. because including it increases the log-
imperfect state. Conditions Athin and Bthick have
likelihood by less than 1.0 whenever .4 includes at least one interaction. Observed about the same marginal means, but Athin is more
information is singular
model for A. even if a
when the mask l opening interaction is included in the
full rank design matrix is used. All models for h have all
often in the perfect state and Bthick hasfewer defects
main effects. in the imperfect state. Small openings are less likely
to be in the perfect state than large openings. But
son means range from .02 to 41.8 with a median of
1.42 and quartiles of .38 and 3.14.
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
I_
defects almost as badly as the pure Poisson model Table 6. Fits of ZIP(r) Regression Models
does. Additionally, although ZIP regressiondoes not
Residual
include a parameter for random board variability, Highest term in the model Log- degrees
the ZIP estimates of the mean total number of for log(A) likelihood of freedom
defects on a board, Cj?li (1 - pij)fiij, are as good as No interactions - 643.2 656
the negative binomial mean estimates Xf’i i,. Opening l solder - 609.4 654
The 25 residuals observed - expected for the total Mask l solder -611.2 652
Mask c opening -602.4 650
number of defects per board have mean 7.6 under Mask * solder + opening + solder - 595.0 650
the negative binomial and mean - .3 under the ZIP
NOTE: All models have all main effects. Not all mask opening coefficients are
and quartiles -3.8 and 4.2 under the negative bi- estimable.
l
nomial and -2.9 and 1.9 under the ZIP. Thus the
negative binomial model is not as good as the ZIP
regression model for these data, and simply increas- 5 shows. Indeed, the ZIP(r) regressionis little better
ing the variability in the Poisson distribution does than the Poisson for predicting large counts. It is
not necessarilyaccommodateexcesszeros. Of course, much better than the Poisson at predicting zero, one,
inflating a negative binomial model with “perfect or two defects, however. Second, the Pearson resid-
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
zeros” might provide an even better model for the uals, defined by (Yi - &YJ)l[var(YJ]l’z, are gen-
printed-wiring-board data than ZIP regression does. erally worse for ZIP(T) regression than for ZIP
Such a model was not successfully fit to these data, regression,as Figure 6 shows.For 16 out of 25 boards,
however. most of the ZIP(r) absolute residuals exceedthe ZIP
absolute residuals. For four other boards, the ZIP
5.3 ZIP (7) Regression
and ZIP(r) absolute residuals are nearly the same.
Since the factor levels in the fitted model (5) that Most of the estimated effects on A are similar for
improve A generally tend to improve p as well, it is ZIP(T) and unconstrained ZIP regression. There are
reasonable to fit ZIP(r) regressions to the printed- some differences, however. For example, boards with
wiring-board data. Table 6 summarizes the fits of small openings and thick solder cannot be distin-
several ZIP(r) models. The model with mask * solder guished from boards with large openings and thin
and opening * solder interactions has significantly solder in the ZIP(r) model, but they can be distin-
higher likelihood than the others and is the only one guished in the ZIP model (large openings with thin
considered here. solder are worse). Moreover, the marginal mean for
The best ZIP(T) regression, which has ? = .81, fits Cthick is better than the marginal mean for Ethin
much better than the Poisson regressionsin Table 1 under the ZIP model but not under the ZIP(r) model.
but not as well as the ZIP regression (5). First, the Although ZIP(r) regressiondoesnot fit the printed
ZIP(r) model underestimates the probability of a 0 wiring board data as well, it might be favored over
and the probability of at least nine defects, as Figure the ZIP regression becauseit has fewer parameters,
Figure 6. Difference in Absolute Pearson Residuals Under the ZIP(T) and ZIP Regressions for the 25 Printed Wiring Boards.
Positive values imply that the absolute ZIP(r) residual is larger than the absolute ZIP residual. Thus boxes that are positioned
above the y = 0 line favor the ZIP regression model over the ZIP(r) regression model.
fits better than the Poisson, and can be easier to BP - log(T), or (Y - 7BB should make the com-
interpret, since the same coefficients describe the putations tractable. Mixing zeros with nonexponen-
imperfect mean, the probability of the perfect state, tial family discrete distributions, such as negative
and the mean of Y. But in this application, ZIP binomial with unknown shape, however, may be
regression was preferred for two reasons. First, sep- computationally more difficult.
arating the effects on the perfect state from the ef-
fects on the imperfect mean made more senseto the ACKNOWLEDGMENTS
engineers than coupling the parameters of the two Many thanks are owed Don Rudy, who provided
models. More importantly, ZIP regression gave re- not only the data but also patient explanations about
sults that agreed more closely with the engineers’ printed wiring boards and an attentive audience for
prior experience. When the same manufacturing ex- new ways to model the data, and John Chambers,
periment had been run earlier with older practices Anne Freeny, Jim Landwehr, Scott Van der Weil,
and different equipment at a different factory, few and the referees and associateeditor whose thought-
counts of 0 had been observed. A Poisson regression ful comments led to a better article.
fit the earlier data well. When the samekind of Pois-
son regression model was fit to the later data, how- APPENDIX: SOME DETAILS
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
ever, the effects of some of the factors changed. But A.1 CONVERGENCE OF THE EM ALGORITHM
when the Poisson model was expanded to the ZIP,
the estimated effects on the imperfect mean agreed The EM algorithm is monotone in the sensethat
well with the estimated effects from the earlier ex- the log-likelihood L(8) (3) satisfiesL(E)(k+l))2 L(N‘)),
periment; that is, the two experiments led to the where 0 = (y, B) (for example, Dempster et al.
sameconclusions about large numbers of defects, but 1977).BecauseL(8) is bounded above, monotonicity
the earlier experiment was never in a state that pro- implies that the sequence of log-likelihoods gener-
duced very low levels of defects and the later one ated by the EM iterations converges, although not
was. This simple conclusion was so appealing that necessarily to a stationary point of L. The proof of
ZIP regressionwas preferred over ZIP(r) regression. that depends on the smoothness of Q(e*, 0) =
sw*l Y, 41Y, 01.
6. CONCLUSIONS Conditional on yi, the unobserved indicator Zi of
ZIP regression is a practical way to model count the perfect state is identically 0 if Yi > 0 and Ber-
data with both zeros and large counts. It is straight- noulli(r,) if yi = 0, where ri is the posterior proba-
forward to fit and not difficult to interpret, especially bility of the perfect state:
with the plots introduced here. Expressing the zero- eGt~ +A,
inflation term p as a function of A reducesthe number ri = 1 + eGr~+Ar if yi = 0
of parametersand simplifies interpretation. Both kinds
of parameterization have been fit without difficulty = 0 ify, = 1, 2, . . . .
to simulated and real data in this article. Further- Therefore,
more, ZIP and ZIP(r) MLE’s are asymptotically nor-
mal, twice the differences of log-likelihoods under ece*, 0)
nested hypotheses are asymptotically $, and simu- + y,B#* - eB@* - log(y,!)
lations suggest that log-likelihood ratio confidence 1 + &iy + exp@@*) 9
intervals are better than normal-theory confidence
intervals. which is continuous in 8* and 8. Using theorem 2 of
ZIP regression inflates the number of zeros by Wu (1983), this continuity guarantees that the se-
mixing point mass at 0 with a Poisson distribution. quence of EM log-likelihoods converges to a sta-
Plainly, the number of zeros in other discrete ex- tionary point of L(B). Moreover, since the first de-
ponential family distributions can also be inflated by rivative of Q(e*, 0) with respect to 8* is continuous
mixing with point mass at 0. The natural parame- in 8* and 8, it follows that 9@)converges to a local
terization r](p) of the mean p of the discrete distri- maximizer of L(B) (corollary 1, Wu 1983).
bution would be linear in the coefficients B of some
covariates B, and the logit (or log-log or comple-
A.2 ASYMPTOTIC DISTRIBUTION OF (9, rj,
mentary log-log) link of the probability p of the per-
AND THE ZIP LOG-LIKELIHOOD
fect state would be linear in the coefficients y
of covariates G. When the probability of the per- Define q = 1 - p and s = 1 - r, and let 0, $,
fect state depends on n(p), taking logit( or i, and j: be the analogous quantities with the MLE’s
- log( - log(p)), or log( - log(1 - p)) equal to - rBP, substituted for the true parameter values. Then the
i =
[’ I,
f1 i.*
‘3 ‘4
where D,,, is diagonal with elements 4 - I, D,,b is where i1 = B’D,B and D, is diagonal with elements
diagonal with elements -b, and Db,b is diagonal T*(p(r - p) - r(1 - r)) + A(1 - p) - Ap(1 - r)
with elements A(1 - P)(l - A?). Expected infor- + 2rAp (1 - r), i, = B’D, and Dz is diagonal with
mation is elements 7BBp(R - p) + ABBp(1 - R), and 1, =
W)‘WMr - P).
If t~-‘i~,~ has a positive-definite limit, then
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013
[P-P
(&,, +,J maximizes the log-likelihood (4) under a null
#2 -
1- Y
- asymptotically normal(O, ni;,;). hypothesis H,, of dimension q0 and (B, e) maximizes
the log-likelihood (4) under a nested alterna$ve hy-
pothesis H of dimension q > 49, then 2[L(p, +) -
Observed information can be substituted for ex- m3 %)I- asymptotically x3 _qO.
pected information in the asymptotic distribution. If
(+,, PO)maximizes the log-likelihood (3) under a null [Received April 1990. Revised July 1991.]
hypothesis Z-J,of dimension q9 and (9, fi) maximizes
the log-likelihood (3) under a nested alternative hy-
pothesis H of dimension q > qa, then
REFERENCES
I = 1I
I1 I2
13 14 ’
proach,” ACM Transactions on Mathematical Software, 9,503-
524.
Green, P. .I. (1984), “Iteratively Reweighted Least-Squares for
Maximum Likelihood Estimation and Some Robust and Re-
where I, = B’DIB and Di is diagonal with elements sistant Alternatives” (with discussion), Journal of the Royal
t”(i) - 6) + A(1 - P)(l - A?) + 2&, I2 = B’D,, Statistical Society, Ser. B, 46, 149-192.
Heilbron, D. C. (1989), “Generalized Linear Models for Altered McCullagh, P., and Nelder, J. A. (1989). Generalized Linear
Zero Probabilities and Overdispersion in Count Data,” unpub- Models, New York: Chapman and Hall.
lished technical report, University of California, San Francisco,
Meeker, W. Q. (1987). “Limited Failure Population Life Tests:
Dept. of Epidemiology and Biostatistics.
Application to Integrated Circuit Reliability,” Technomerrics,
Johnson, N. L., and Kotz, S. (1969), Distributions in Statistics:
29, 51-6.5.
Discrete Distributions, Boston: Houghton Mifflin.
McCullagh, P. (1983), “Quasi-likelihood Functions,” The Annals Wu, C. F. J. (1983), “On the Convergence Properties of the FM
of Statistics, 11, 59-67. Algorithm,” The Annals of Statistics, 11, 95-103.
Downloaded by [Moskow State Univ Bibliote] at 01:04 15 November 2013