Communications in Statistics - Theory and Methods


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer
House, 37-41 Mortimer Street, London W1T 3JH, UK

Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/lsta20
Generalized Poisson regression model

P.C. Consul (a) & Felix Famoye (b)
(a) Department of Math & Statistics, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
(b) Department of Mathematics, Central Michigan University, Mt. Pleasant, Michigan, 48859
Version of record first published: 05 Jul 2007.
To cite this article: P.C. Consul & Felix Famoye (1992): Generalized poisson regression model, Communications in
Statistics - Theory and Methods, 21:1, 89-109
To link to this article: http://dx.doi.org/10.1080/03610929208830766
COMMUN. STATIST.-THEORY METH., 21(1), 89-109 (1992)

GENERALIZED POISSON REGRESSION MODEL

P.C. Consul
Department of Math. & Statistics
University of Calgary
Calgary, Alberta
Canada T2N 1N4

Felix Famoye
Department of Mathematics
Central Michigan University
Mt. Pleasant
Michigan 48859

Keywords and Phrases: Count data; over-dispersion; under-dispersion;
generalized Poisson distribution; maximum likelihood; deviance;
hypothesis testing.
ABSTRACT
The generalized Poisson distribution has been found useful in

fitting over-dispersed as well as under-dispersed count data. Since a
number of models and methods have been proposed for the regression
analysis of count data either with under-dispersion or with over-
dispersion, we define and study a generalized Poisson regression (GPR)
model which is useful in predicting a response variable affected by one
or more covariates. This regression model is suitable for both types of
dispersions. The methods of maximum likelihood and moments are
given for the estimation of parameters. Approximate tests for the
adequacy of the model are considered. Asymptotic tests are given for
the significance of regression parameters. The GPR model has been
applied to four observed data sets to which other regression models
were applied earlier.
Copyright © 1992 by Marcel Dekker, Inc.

1. INTRODUCTION
Poisson regression models have been widely used to analyze those
count data where the sample mean and sample variance are almost equal
[see Frome et al. (1973), Haberman (1974), Frome (1983) and Holford
(1983)]. It is also well known that counts often exhibit substantial
variation, where the sample variance is either larger or smaller than
the sample mean; this is classified as over-dispersion or
under-dispersion respectively. Various models have been suggested by
different authors to deal with this under-dispersion or over-dispersion
(extra-Poisson variation) [see Manton et al. (1981), Williams (1975,
1982), Hinde (1982), Cox (1983), and Stein and Juritz (1988)]. Lawless
(1987) has made a detailed study of the negative binomial regression
model for count data with over-dispersion. However, none of the
models discussed by the various researchers can accommodate both
under-dispersion and over-dispersion.

We formulate and study a generalized Poisson regression (GPR)
model for count data, affected by a number of known explanatory
variables. This GPR model is applicable to count data which has over-
dispersion or under-dispersion or no dispersion at all. It is based upon
the generalized Poisson distribution, studied by many researchers and
fully described in the recent book by Consul (1989).
The generalized Poisson distribution (GPD) has been used to
describe data sets which show either over-dispersion or
under-dispersion. The reader is referred to the paper by Janardan et al.
(1979) on biological applications of the generalized (Lagrangian)
Poisson distribution, which gave some biological interpretations to the
dispersion parameter. Many other examples can also be found in
Consul (1989).
The Poisson model is used to describe a process in which
successive events occur at the same rate independently. Many real life
examples are given by Consul (1989, Chapt. 1) where the Poisson
assumptions are not satisfied. McCullagh and Nelder (1989, pp. 199)
remarked that a specific regression model may be used to analyze count
data for which the mechanism producing the over-dispersion or
under-dispersion is known. Furthermore, in the absence of such
knowledge, one may assume that
$$\mathrm{var}(Y \mid x) = \sigma^2 E[Y \mid x].$$
To carry out the analysis, one assumes that $Y_i$ follows a Poisson
distribution. The precision of the regression parameter estimates is
obtained by incorporating the dispersion parameter $\sigma^2$ into the
analysis.
We will see in section two that the variance function suggested by
McCullagh and Nelder corresponds to that of GPR model. In place of
using the Poisson regression model to fit the data and adjusting the
precision of the parameter estimates, we propose the use of GPR
model, defined in section two.
The estimation of the parameters of GPR model is discussed in
sections three and four. Test statistics for examining the
appropriateness of the regression model, testing the significance of the
regression parameters, and its goodness of fit are proposed in sections

five and six. The GPR model has been applied to four observed data
sets in section seven.
2. GENERALIZED POISSON REGRESSION MODEL
A random variable Y is said to have a generalized Poisson
distribution (GPD) if its probability distribution is given by
$$P(Y = y) = \begin{cases} \theta(\theta + \lambda y)^{y-1} e^{-\theta - \lambda y}/y!, & y = 0, 1, 2, \ldots \\ 0, & \text{for } y > m \text{ when } \lambda < 0, \end{cases} \tag{2.1}$$
and zero otherwise, where $\theta > 0$, $\max(-1, -\theta/4) \le \lambda \le 1$ and m is the
largest positive integer for which $\theta + m\lambda > 0$ when $\lambda$ is negative. The
GPD reduces to the Poisson model when $\lambda = 0$ and possesses the
property of over-dispersion for all values of $\lambda > 0$ and the property of
under-dispersion for all values of $\lambda < 0$. The model does get truncated
for negative values of $\lambda$ but it has been shown [see Consul and Shoukri
(1985)] that when $\max(-1, -\theta/4) \le \lambda < 0$, the total error of truncation is
less than 0.5%, which is very small and negligible because all discrete
models do get truncated under the sampling procedures. This GPD
model gets generated under various physical conditions and has been
found to be a very useful model in many fields of study such as
genetics, queueing, insurance, labour absenteeism and marketing
research, etc. [See Consul (1989)].
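As a numerical illustration of the GPD probability function $\theta(\theta + \lambda y)^{y-1} e^{-\theta - \lambda y}/y!$, the following minimal Python sketch evaluates it on the log scale and sums it over a finite range. The function names `gpd_pmf` and `gpd_moments` and the cut-off `ymax` are illustrative choices, not part of the paper; the reference values $\theta/(1-\lambda)$ and $\theta/(1-\lambda)^3$ are the standard GPD mean and variance.

```python
import math

def gpd_pmf(y, theta, lam):
    """Generalized Poisson probability P(Y = y) for theta > 0."""
    base = theta + lam * y
    if base <= 0:
        # Beyond the truncation point m that applies when lam < 0.
        return 0.0
    log_p = (math.log(theta) + (y - 1) * math.log(base)
             - base - math.lgamma(y + 1))
    return math.exp(log_p)

def gpd_moments(theta, lam, ymax=400):
    """Total probability, mean and variance summed over y = 0..ymax-1."""
    probs = [gpd_pmf(y, theta, lam) for y in range(ymax)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# lam > 0: variance exceeds the mean (over-dispersion).
total, mean, var = gpd_moments(theta=3.0, lam=0.3)
print(total, mean, var, var > mean)
```

With $\lambda = 0$ the function returns ordinary Poisson probabilities, and with a mildly negative $\lambda$ the summed probability stays very close to one, in line with the truncation remark above.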
When $\lambda \ge 0$, the support of GPD is independent of unknown
parameters. For this case, Consul and Shoukri (1984) proved that the
maximum likelihood (ML) estimators for parameters $\theta$ and $\lambda$ are
unique. For the case $\lambda < 0$, the probability distribution in (2.1) is valid
only for $y \le m$ where m might depend on the unknown parameter $\lambda$.
For this case, Consul and Famoye (1988) showed that the ML estimators
of $\theta$ and $\lambda$ are also unique and furthermore, the ML estimators are
obtained by using the same likelihood equations as in the case $\lambda \ge 0$.
Since the count data obtained in any experiment are affected by a
number of explanatory regressor variables, one can easily define a
regression model based upon the GPD. Let $x$ be a (k-1)-dimensional
vector of explanatory variables and Y be the response variable, which is
a count, having response values $y_1, y_2, y_3, \ldots, y_n$ corresponding to the
given vector set $x_i = \{x_{i2}, x_{i3}, \ldots, x_{ik}\}$. Like the Poisson regression model,
we stipulate that the distribution of $Y_i$, for any given $x_i$, is that of
generalized Poisson given by (2.1) with the mean
$$E(Y_i \mid x_i) = \mu_i = \mu(x_i) = c_i f(x_i, \beta), \tag{2.2}$$
where $c_i$ denotes some measure of exposure, and $f(x_i, \beta) > 0$ is a known
differentiable function of $x_i$ and a k-dimensional vector $\beta$ of regression
parameters.
Since the mean $\mu$ for the GPD is given by $\mu = \theta(1 - \lambda)^{-1} = \theta\phi$, one
can write the corresponding generalized Poisson regression (GPR) model
in the form
$$P(Y_i = y_i \mid x_i) = \begin{cases} \mu_i[\mu_i + (\phi - 1)y_i]^{y_i - 1}\,\phi^{-y_i}\exp\{-[\mu_i + (\phi - 1)y_i]/\phi\}/y_i!, & y_i = 0, 1, 2, \ldots \\ 0, & \text{for } y_i > m \text{ when } \phi < 1, \end{cases} \tag{2.3}$$
and zero otherwise, where $\mu_i = \mu(x_i) > 0$ and $\phi \ge \max(1/2, 1 - \mu_i/4)$
represents the square root of the index of dispersion and m is the
largest positive integer for which $\mu_i + m(\phi - 1) > 0$ when $\phi < 1$. The
variance of Y in model (2.3) is given by
$$V(Y_i \mid x_i) = \phi^2 \mu(x_i). \tag{2.4}$$
When $\phi = 1$, the GPR model (2.3) reduces to the Poisson regression
model. Obviously, for $\phi > 1$ the above GPR model will represent count
data with over-dispersion and for $1/2 \le \phi < 1$ it will represent count
data with under-dispersion when $\mu > 2$.
Under certain conditions the value of $\phi$ in the GPR model (2.3)
can be reduced to less than 1/2 as well, but we exclude such cases for
the present.
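The reparameterization from $(\theta, \lambda)$ to $(\mu, \phi)$ can be checked numerically. The sketch below assumes the substitutions $\theta = \mu/\phi$ and $\lambda = 1 - 1/\phi$ in the GPD, which give the probability function $\mu[\mu + (\phi - 1)y]^{y-1}\phi^{-y}\exp\{-[\mu + (\phi - 1)y]/\phi\}/y!$. It then verifies that the mean is $\mu$ and the variance is $\phi^2\mu$. The names `gpr_pmf` and `gpr_mean_var` and the cut-off `ymax` are illustrative only.

```python
import math

def gpr_pmf(y, mu, phi):
    """GPR probability P(Y = y) with mean mu and dispersion phi."""
    base = mu + (phi - 1.0) * y
    if base <= 0:
        # Truncation region, reachable only when phi < 1.
        return 0.0
    log_p = (math.log(mu) + (y - 1) * math.log(base)
             - y * math.log(phi) - base / phi - math.lgamma(y + 1))
    return math.exp(log_p)

def gpr_mean_var(mu, phi, ymax=500):
    """Mean and variance obtained by direct summation over y = 0..ymax-1."""
    probs = [gpr_pmf(y, mu, phi) for y in range(ymax)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

for phi in (0.9, 1.0, 1.5):
    mean, var = gpr_mean_var(mu=4.0, phi=phi)
    # The variance-to-mean ratio should be close to phi squared.
    print(phi, round(mean, 4), round(var, 4), round(var / mean, 4))
```

For $\phi = 1$ the ratio is one (Poisson), for $\phi = 1.5$ it is about 2.25 (over-dispersion) and for $\phi = 0.9$ it is about 0.81 (under-dispersion).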
3. MAXIMUM LIKELIHOOD ESTIMATION
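The likelihood function of the GPR model (2.3) is proportional to
$$L(\beta, \phi) = \prod_{i=1}^{n} \mu_i[\mu_i + (\phi - 1)y_i]^{y_i - 1}\,\phi^{-y_i}\exp\{-[\mu_i + (\phi - 1)y_i]/\phi\}. \tag{3.1}$$
The logarithm of the likelihood function $L(\beta, \phi)$ can be written as
$$\ell(\beta, \phi) = \sum_{i=1}^{n}\left\{\log\mu_i + (y_i - 1)\log[\mu_i + (\phi - 1)y_i] - y_i\log\phi - [\mu_i + (\phi - 1)y_i]/\phi\right\}. \tag{3.2}$$
The ML equations for the estimation of $\beta$ and $\phi$ are given by
equating to zero the first partial derivatives of $\ell(\beta, \phi)$ and these are
$$\frac{\partial \ell}{\partial \phi} = \sum_{i=1}^{n}\left\{\frac{y_i(y_i - 1)}{\mu_i + (\phi - 1)y_i} - \frac{y_i}{\phi} + \frac{\mu_i - y_i}{\phi^2}\right\} = 0, \tag{3.3}$$
$$\frac{\partial \ell}{\partial \beta_r} = \sum_{i=1}^{n}\left\{\frac{1}{\mu_i} + \frac{y_i - 1}{\mu_i + (\phi - 1)y_i} - \frac{1}{\phi}\right\}\frac{\partial \mu_i}{\partial \beta_r} = 0, \quad r = 1, 2, \ldots, k. \tag{3.4}$$
Formula (3.4) simplifies to some extent if the usual log-linear
specification is used. For this case,
$$\mu_i = c_i \exp\left(\sum_{r=1}^{k} x_{ir}\beta_r\right)$$
and
$$\frac{\partial \mu_i}{\partial \beta_r} = \mu_i x_{ir},$$
where $x_i^T = (1, x_{i2}, x_{i3}, \ldots, x_{ik})$ and $\beta^T = (\beta_1, \beta_2, \ldots, \beta_k)$. In particular, when
r = 1 the likelihood equation (3.4) reduces to
$$\sum \left\{1 + \frac{\mu_i(y_i - 1)}{\mu_i + (\phi - 1)y_i} - \frac{\mu_i}{\phi}\right\} = 0,$$
where the summation over i is from 1 to n. Multiplying (3.3) by $(\phi - 1)$
and adding it to the above equation, one obtains
$$\sum (y_i - \mu_i) = 0.$$
Thus the last term in (3.3) may be dropped when the log-linear
specification contains a constant term $\beta_1$ without a covariate $x_i$.
The ML equations (3.3) and (3.4) are clearly non-linear in the
parameters and one will have to use one of the methods of successive
approximation to obtain the maximum likelihood estimates (MLE) of
the parameters $\beta$ and $\phi$. The initial estimates $(\beta^0, \phi^0)$ are the solutions
obtained for the parameters by the moments method (to be discussed in
the next section).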
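The log-likelihood and its first derivatives are straightforward to program. The following is a minimal sketch, assuming the log-linear mean $\mu_i = c_i\exp(x_i^T\beta)$ and the GPR probability function $\mu[\mu + (\phi - 1)y]^{y-1}\phi^{-y}\exp\{-[\mu + (\phi - 1)y]/\phi\}/y!$. The derivative expressions in `score` were obtained by differentiating the log of that probability function; the function names and the small data set in the usage lines are hypothetical and serve only to check the analytic derivatives against finite differences.

```python
import math

def gpr_loglik(beta, phi, X, y, c):
    """Log-likelihood of the GPR model with mu_i = c_i * exp(x_i' beta)."""
    total = 0.0
    for xi, yi, ci in zip(X, y, c):
        mu = ci * math.exp(sum(b * x for b, x in zip(beta, xi)))
        base = mu + (phi - 1.0) * yi
        total += (math.log(mu) + (yi - 1) * math.log(base)
                  - yi * math.log(phi) - base / phi - math.lgamma(yi + 1))
    return total

def score(beta, phi, X, y, c):
    """First partial derivatives with respect to beta_r and phi."""
    d_beta = [0.0] * len(beta)
    d_phi = 0.0
    for xi, yi, ci in zip(X, y, c):
        mu = ci * math.exp(sum(b * x for b, x in zip(beta, xi)))
        base = mu + (phi - 1.0) * yi
        d_phi += yi * (yi - 1) / base - yi / phi + (mu - yi) / phi ** 2
        w = 1.0 / mu + (yi - 1) / base - 1.0 / phi
        for r, xir in enumerate(xi):
            # d mu_i / d beta_r = mu_i * x_ir under the log-linear mean.
            d_beta[r] += w * mu * xir
    return d_beta, d_phi

# Hypothetical data: intercept plus one covariate, unit exposures.
X = [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0)]
y = [2, 3, 6, 7, 12]
c = [1.0] * len(y)
print(score([0.8, 0.7], 1.3, X, y, c))
```

Setting both returned components to zero gives the likelihood equations; any successive-approximation scheme (for example Newton-Raphson) can be built on these two functions.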
The generalized linear model software (GLIM) [McCullagh and
Nelder (1983), page 170] may also be used to obtain the ML estimates of
the parameters $(\beta, \phi)$; however, we did not get very satisfactory results
by using this software. The generalized Poisson regression and the
Poisson regression models have the same quasilikelihood function. By
using GLIM to fit the GPR model, only the variance and deviance
functions for the Poisson regression model are changed. Since the
variance function (2.4) corresponds, incidentally, to the variance
function for Poisson quasilikelihood, we obtain exactly the same
estimates as for the Poisson regression model. By examining the
likelihood function (3.1), it is obvious that the kernel of the
log-likelihood function for the GPR model is different from that of the
Poisson regression model and so the parameter estimates should differ.
4. MOMENT ESTIMATION OF $\phi$
Breslow (1984) suggested the estimation of a dispersion parameter
by equating the chi-square statistic to its degrees of freedom. For the
GPR model, this leads to
$$\sum_{i=1}^{n}\frac{(y_i - \mu_i)^2}{\mu_i\phi^2} = n - k, \tag{4.1}$$
which gives the value of $\phi$ in terms of $\mu_i$. However, it further needs
the estimation of the various $\mu_i$'s from the data by successive
approximations, where the initial values of $\mu_i$ are from the Poisson
regression model.
We first fit the Poisson model ($\phi = 1$) to obtain the initial $\mu_i$'s, and
then solve for $\phi$ in (4.1). If $\hat\phi \approx 1$, it implies that the Poisson regression
model is appropriate and no further estimation needs to be done.
However, if $\hat\phi \ne 1$, the estimated value of $\phi$ is used to obtain a new $\hat\beta$
from (3.4). On getting the new $\hat\beta$, we return to equation (4.1) and solve for a
new $\phi$. This process is continued until we obtain a stable solution.
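For fixed fitted means, the moment equation has a closed-form solution: equating $\sum (y_i - \mu_i)^2/(\mu_i\phi^2)$ to $n - k$ gives $\phi^2 = \sum (y_i - \mu_i)^2/\mu_i \,/\, (n - k)$. The sketch below computes this single step; the function name `moment_phi` and the numbers in the usage lines are hypothetical, and the full procedure would alternate this step with re-estimation of the regression parameters.

```python
import math

def moment_phi(y, mu, k):
    """Moment estimate of phi for fixed fitted means mu and k regression parameters."""
    n = len(y)
    if n <= k:
        raise ValueError("need more observations than regression parameters")
    # Pearson chi-square statistic of the Poisson fit.
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return math.sqrt(pearson / (n - k))

# Hypothetical counts and fitted Poisson means from a model with k = 2 parameters.
y = [2, 3, 6, 7, 12]
mu = [2.5, 3.5, 5.0, 7.0, 10.0]
print(round(moment_phi(y, mu, k=2), 4))
```

A value near one points to the Poisson model; values clearly above or below one suggest over-dispersion or under-dispersion respectively.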
5. APPROPRIATENESS OF GPR MODEL
Since the GPR model (2.3) reduces to the Poisson regression model
when $\phi = 1$, one can assess the appropriateness of the GPR model over
the Poisson regression model by testing the hypothesis
$$H_0: \phi = 1 \quad \text{against} \quad H_1: \phi \ne 1. \tag{5.1}$$
Rejection of $H_0$ implies that the Poisson assumption should be rejected.
Thus, the GPR model may be appropriate for such a situation. To carry
out the test in (5.1) one has to consider the parameter vector $\beta$. If $\beta$ are
known, an appropriate test statistic is the log likelihood ratio, written
in terms of the log-likelihood $\ell(\beta, \phi)$ of Section 3,
$$T_1 = 2[\ell(\beta, \hat\phi) - \ell(\beta, 1)],$$
which on substitution reduces to
$$T_1 = 2\sum_{i=1}^{n}\left\{(y_i - 1)\log\left[1 + (\hat\phi - 1)y_i/\mu_i\right] - y_i\log\hat\phi + (\hat\phi - 1)(\mu_i - y_i)/\hat\phi\right\}. \tag{5.2}$$
For large n the test statistic in (5.2) has an approximate chi-square
distribution with 1 degree of freedom (d.f.) under $H_0$. In general, the
parameter vector $\beta$ are unknown and they have to be estimated from
the sample. For this situation, we propose the test statistic
$$T_2 = 2[\ell(\hat\beta, \hat\phi) - \ell(\tilde\beta, 1)], \tag{5.3}$$
where $\hat\beta$ are the MLE's of $\beta$ under the GPR model with no specified
parameter value and $\tilde\beta$ are the MLE's of $\beta$ under the GPR model when $H_0$
is true. In general, $\tilde\beta$ need not equal $\hat\beta$. The test statistic in (5.3) will be
taken as an approximate chi-square random variable with 1 d.f. when
$H_0$ is true. We have not actually looked at the distributional form of
the statistic in (5.3).
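Referring an observed likelihood ratio statistic to the chi-square distribution with 1 d.f. needs only the complementary error function, since $P(\chi^2_1 > t) = \mathrm{erfc}(\sqrt{t/2})$. The sketch below applies this to the three values of the statistic reported in Section 7 (1.27 for the ship damage data, 10.437 for the revertant colonies data and 1263.95 for the fish species data). The function name `chi2_1df_pvalue` is an illustrative choice, and the p-values are computed here, not quoted from the paper.

```python
import math

def chi2_1df_pvalue(t):
    """Upper-tail probability P(X > t) for a chi-square variable with 1 d.f."""
    if t < 0:
        raise ValueError("the statistic must be non-negative")
    return math.erfc(math.sqrt(t / 2.0))

reported = {
    "ship damage (7.1)": 1.27,
    "revertant colonies (7.2)": 10.437,
    "fish species (7.3)": 1263.95,
}
for label, t in reported.items():
    p = chi2_1df_pvalue(t)
    print(f"{label}: statistic = {t}, p = {p:.3g}, reject at 5%: {p < 0.05}")
```

The first value is not significant at the 5% level, while the other two are, which agrees with the conclusions drawn in the applications.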
The test in (5.1) depends on a satisfactory specification of the
regression function $\mu(x)$. However, if there are replicate observations
for some of the covariate values, a separate assessment of
over-dispersion and the specification of the regression function is
possible. A similar situation for the Poisson regression model has been
considered by Frome et al. (1973).
Assuming the regression function
$$E(Y_{ij}) = \mu_i, \quad V(Y_{ij}) = \phi^2\mu_i, \quad j = 1, 2, \ldots, n_i \text{ and } i = 1, 2, \ldots, N,$$
then the statistic
$$Q_1 = \sum_i\sum_j \frac{(y_{ij} - \hat\mu_i)^2}{\bar y_{..}}, \tag{5.4}$$
where $\bar y_{..} = (\sum_i\sum_j y_{ij})/(\sum_i n_i)$, is approximately a chi-square random
variable with $D = \sum n_i - k$ degrees of freedom. Decomposing $Q_1$ into two
independent approximately chi-square random variables $Q_{11}$ and $Q_{12}$
we get
$$Q_1 = \sum_i\sum_j \frac{(y_{ij} - \bar y_{i.})^2}{\bar y_{..}} + \sum_i \frac{n_i(\bar y_{i.} - \hat\mu_i)^2}{\bar y_{..}} = Q_{11} + Q_{12},$$
where $\bar y_{i.} = \sum_j y_{ij}/n_i$.
The degrees of freedom of $Q_{11}$ and $Q_{12}$ are $D_{11} = \sum n_i - N$ and $D_{12} = N - k$
respectively.
When the value of $Q_1$ is significantly large, this may be due to
either over-dispersion/under-dispersion or lack of fit of the regression
model, or both. To test for over-dispersion/under-dispersion, we
compare $Q_{11}$ with the chi-square distribution with $D_{11}$ d.f. Thus, $Q_{11}$ is
used to test the hypothesis in (5.1) with no assumption about the
regression function.
If $Q_{11}$ is significantly large, we have either over-dispersion (when
$\phi > 1$) or under-dispersion (when $\phi < 1$). To test for lack of fit of the
regression model, we may use the F-ratio (5.5), formed from the mean
squares $Q_{11}/D_{11}$ and $Q_{12}/D_{12}$,
which follows an approximate F-distribution. If the ratio F is large, it
implies that the regression function specification should be rejected.
6. GOODNESS OF FIT FOR REGRESSION MODELS
When a number of alternative regression models are available for
a given set of data, one has to decide as to which particular model is the
most suitable one for the given data. To reach this decision, one may
compare the deviances D for the regression models, where D is computed
from the fitted probabilities $p(\hat\mu_i, y_i)$
and $p(\hat\mu_i, y_i)$ denotes the probability for $Y = y_i$ by the given regression
model. As soon as the unknown parameters of the model are
estimated, it is easy to get $p(\hat\mu_i, y_i)$ for any model. It has been shown
that the statistic D has a $\chi^2$-distribution with n - r degrees of freedom,
where r is the total number of estimated parameters. The statistic D is
constructed for assessing the adequacy of the specification of the
regression function
$$\mu_i(x) = E(Y_i \mid x_i).$$
If D is too large, the regression function specification should be rejected.
This test corresponds to the use of the F-ratio in Section 5.
Under a specified regression function, different regression models
like the GPR, the Poisson or the negative binomial regression may be
applied to a given data set. The regression model with the least value
of the deviance D, among all available regression models, can be
considered as the best model for the given data set.
When the regression function $\mu_i$ is the usual log-linear function,
the adequacy of the log-linear part of the GPR model may also be
examined. This is equivalent to testing for the parameter vector $\beta$.
When $\phi$ is known, an appropriate test statistic to test the hypothesis
$$H_0: \beta = (\beta_1, \beta_2, \ldots, \beta_k)'$$
is the log likelihood ratio statistic
$$T_3 = 2[\ell(\hat\beta_{full}, \phi) - \ell(\hat\beta, \phi)], \tag{6.2}$$
where $\hat\beta$ are the MLE's of $\beta$ under $H_0$ and $\hat\beta_{full}$ are the MLE's of $\beta$ under
the full or complete GPR model in which there is one parameter for
each value of i. Under $H_0$, the test statistic in (6.2) has an approximate
chi-square distribution with n - k d.f. However, if $\phi$ is unknown, it has
to be estimated from the sample. For this case, the test statistic
$$T_4 = 2[\ell(\hat\beta_{full}, \hat\phi) - \ell(\hat\beta, \hat\phi)] \tag{6.3}$$
has an approximate chi-square distribution with n-k-1 d.f. Note that $T_4$
corresponds to our previous deviance statistic D.
The previous test can be carried out to see if one or more of the $\beta$'s
are significant. For instance, to test
$$H_0: \beta = \beta_0 = (\beta_1, \beta_2, \ldots, \beta_m)'$$
against
$$H_1: \beta = \beta_1 = (\beta_1, \beta_2, \ldots, \beta_m, \beta_{m+1}, \ldots, \beta_k)',$$
we use the test statistic
$$T_5 = 2[\ell(\hat\beta_1, \hat\phi_1) - \ell(\hat\beta_0, \hat\phi_0)], \tag{6.4}$$
which has an approximate chi-square distribution with k-m d.f. This is
to test for the significance of $(\beta_{m+1}, \ldots, \beta_k)'$.
7. SOME APPLICATIONS
7.1 Ship Damage Incidents: The data on the type of damage caused by
waves to the forward section of certain cargo-carrying vessels was
analyzed by McCullagh and Nelder (1989, pp. 204), hereinafter referred
to as MN. This data was also analyzed by Lawless (1987) by using the
negative binomial regression model.
The responses $Y_i$ are the numbers of damage incidents over
various five year periods, and the exposures $c_i$ are the aggregate
months in service for each ship. There are three qualitative factors:
ship type (A, B, C, D or E), year of construction (1960-64, 1965-69,
1970-74, or 1975-79), and period of operation (1960-74 or 1975-79). Indicator
covariates were used to represent the main effects and a log-linear
specification (as in Lawless (1987))
$$E(Y_i \mid x_i) = \mu_i = c_i f(x_i, \beta) = c_i \exp(x_i^T\beta) \tag{7.1}$$
was used.
MN fitted a log-linear model by using quasilikelihood for the
Poisson distribution with a variance function given in (2.4). Although
the GPR model has a similar variance function, the kernel of its
log-likelihood is different from that of the Poisson distribution.
We apply the GPR model (2.3) to the observed number of damage
incidents by using the regression function (7.1). We note that the GPR
model also fits the data very well. The deviance from the
quasilikelihood approach is 38.70 with 25 d.f. while the deviance from
the GPR model is 30.88 with 24 d.f. The estimates of the regression
coefficients are slightly different (see Table 7.1).
Lawless (1987) found that the MLE of the dispersion parameter is 0
under the negative binomial regression model. This corresponds to
the case of no dispersion (i.e. when $\phi = 1$ under the GPR model). We
report in Table 7.1 the parameter estimates for $\phi = 1$ and for the
quasilikelihood function discussed by MN under (i) and (ii)
respectively. The ML estimates from the GPR model are shown under
(iii).
TABLE 7.1
Parameter Estimates

Parameter               (i) $\phi = 1$
Intercept               -6.41 (.22)
Ship Type
  A                      0
  B                     -.54 (.18)
  C                     -.69 (.33)
  D                     -.08 (.29)
  E                      .33 (.24)
Year of Construction
  60-64                  0
  65-69                  .70 (.15)
  70-74                  .82 (.17)
  75-79                  .45 (.23)
Service Period
  60-74                  0
  75-79                  .38 (.12)
Similar to the earlier results, the GPR fit indicates that the main
effects are significant, and there is some inconclusive evidence for an
interaction of ship type by year of construction.
To test the hypothesis (5.1) the observed value of the test statistic
(5.3) is 1.27 which is not significant. This shows that there is no strong
evidence of over-dispersion under the GPR model. A similar remark
was made by Lawless (1987) for the MN model and the negative
binomial regression model. In view of this, the parameter estimates as
well as their standard errors should be very close to the results under
the Poisson assumption (i.e. for the case $\phi = 1$). It is interesting to note
that this is in fact the case under the GPR model whereas the MN
approach leads to somewhat large standard errors.
7.2 NUMBER OF REVERTANT COLONIES ON A PLATE
Margolin et al. (1981) considered Ames Salmonella assay data

which was subsequently analyzed by Breslow (1984) by considering a log
linear model. Lawless (1987) analyzed the same data by using the
negative binomial regression (NBR) model and showed that the data
set was better represented by the NBR model than the Poisson
regression model. The response variable Y is the number of revertant
colonies observed on a plate and the explanatory variable x is the dose
level of quinoline on the plate. Three observations were taken at each
of six dose levels in the assay.
We use the 'single hit' model of Margolin et al., which is also
considered by Breslow as well as by Lawless where the expected value
of the variable Yi is given by
We consider this with the GPR model (2.3) and apply it to the
given data. The maximum likelihood procedure, given in Section 3,
yields the ML estimates of the parameters with their respective
standard errors (in brackets) as
$\hat\phi = 1.5624$ (0.2686), $\hat\beta_1 = 2.1937$ (0.3370),
$\hat\beta_2 = 0.0009928$ (0.0003792) and $\hat\beta_3 = 0.3142$ (0.08803).
It is clear from the point estimate of $\phi$ that the data is
over-dispersed. The calculated value of the test statistic $T_2$ of (5.3) for the null
hypothesis $H_0: \phi = 1$ against its negation is 10.437 which is highly
significant when compared with the critical $\chi^2$ values with 1 d.f. Thus,
the data set is over-dispersed and a discrete probability model having
over-dispersion must be used for this data.
Although Lawless (1987) did not report the deviance for the NBR
model, this can be found. By using (6.3), we obtain the deviance for
the GPR model. The deviances for the Poisson regression model, the
NBR model and the GPR model are given below in Table 7.2.

TABLE 7.2
Deviances and d.f. for various models

Regression model        Deviance
Poisson regression      43.72
N.B. regression         17.71
GPR regression          17.21

It is clear from the values in the above table that both the NBR
model of Lawless (1987) and our GPR model are far better than the
Poisson regression model and that the GPR model is as good as the
NBR model in describing the data. Also, the actual estimates of $\beta_1$, $\beta_2$,
$\beta_3$ under the GPR model do not differ much from the estimates of $\beta_0$, $\beta_1$,
and $\beta_2$ under the NBR model respectively; however, the value of
$\hat\beta_3/\mathrm{se}(\hat\beta_3) = 3.57$ is slightly smaller than the corresponding value under
NBR.
In this example, there are replicate observations for the covariates
and so we can obtain the values of the statistics $Q_1$, $Q_{11}$, and $Q_{12}$ in (5.4).
The value $Q_1 = 47.8207$ is found to be significant when compared with
the $\chi^2$-distribution on 15 d.f. Thus, there is either over-dispersion or lack
of fit of the regression model, or both. The value $Q_{11} = 37.4885$ is also
found to be significant when compared with the $\chi^2$-distribution on 12 d.f.
This shows that there is over-dispersion which agrees with our earlier
result of testing the hypothesis in (5.1).
To test for lack of fit of the regression model, we obtain the F-ratio
in (5.5) as F = .907. This quantity is not significant and so the regression
function considered is appropriate.
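The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the lack-of-fit component is obtained by subtraction, $Q_{12} = Q_1 - Q_{11}$ with $D_{12} = 15 - 12 = 3$ d.f. (18 plates, 6 dose levels, 3 regression parameters), and then forms the two mean squares. The variable names are illustrative; only the three input numbers come from the text.

```python
# Values reported for the revertant colonies data.
q1, d1 = 47.8207, 15      # total statistic and its d.f.
q11, d11 = 37.4885, 12    # replicate (pure dispersion) component

# Assumption: the lack-of-fit component follows by subtraction.
q12 = q1 - q11
d12 = d1 - d11

ms_replicate = q11 / d11   # mean square within dose levels
ms_lack_of_fit = q12 / d12 # mean square for lack of fit

ratio = ms_replicate / ms_lack_of_fit
print(round(q12, 4), d12)
print(round(ms_replicate, 4), round(ms_lack_of_fit, 4))
print(round(ratio, 3), round(1.0 / ratio, 3))
```

Under this assumption the reported value .907 is reproduced by the ratio $(Q_{11}/D_{11})/(Q_{12}/D_{12})$, while the reciprocal is about 1.10. Either value is close to one, which is consistent with the conclusion that there is no evidence of lack of fit.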
7.3 FISH SPECIES
Barbour and Brown (1974) considered data on fish species
diversity from 70 lakes in different parts of the world. Stein and Juritz
(1988) have applied the Inverse-Gaussian-Poisson regression (IGPR)
model to this data and have shown that the IGPR model is better than
the Poisson regression model.
The response variable Y is the number of fish species and the
explanatory variable x is the log lake area. We apply the GPR model
(2.3) with
$$E[Y_i \mid x_i] = \mu_i = \exp(\beta_1 + \beta_2 x_i)$$
to the same data on fish species and compute the values of the
parameters, their variances and the deviance for the model. From
(6.3), we obtain the deviance for the GPR model. The deviances for the
Poisson regression and IGPR models are from Stein and Juritz (1988).
Table 7.3 contains the deviances for all four models applied to this data.

TABLE 7.3
Deviances and d.f. for four models

Model                   Deviance    d.f.
Poisson regression      1538        68
NBR                     73.3        67
IGPR                    64.0        67
GPR                     33.08       67

It is clear from the above values that the NBR model is better than
the Poisson regression model for this data and the IGPR model of Stein
and Juritz (1988) is better than both of them. However, the GPR model
describes this data on the abundance of fish species in the best manner
in comparison with the other models.
The ML estimates of the parameters and their standard errors
(given in brackets) for the GPR model are $\hat\phi = 5.4982$ (0.6955),
$\hat\beta_1 = 3.0137$ (0.2116) and $\hat\beta_2 = 0.09911$ (0.02396).
The estimated value of $\phi$ is quite large and clearly indicates that
the data is over-dispersed. If the null hypothesis $H_0: \phi = 1$ against its
negation is applied [as given in (5.1)], the value of the chi-square test
statistic in (5.3) is 1263.95 with 1 d.f. which is highly significant. Also,
the log-likelihood for the GPR model is found to be the smallest one
among all the four models and is -316.10. Thus the GPR model seems
to be the most suitable one for the abundance of fish species in the
various lakes.
7.4 MULTI-TARGET SURVIVAL CURVE
Frome et al. (1973) applied the Poisson regression model with an
intrinsically non-linear regression function to cell survival data. In the
example, the observed response is the number of colonies produced in
the spleen of recipient animals.
The non-linear regression function for the effect of radiation
damage on stem cell survival is
$$\mu_i = E(Y_{ij} \mid x_i) = \beta_1 x_{i1}\left[1 - \left(1 - \exp(-\beta_2 x_{i2})\right)^{\beta_3}\right],$$
where $x_{i1}$ is the concentration of injected cells, $x_{i2}$ is the radiation dose,
and $\beta_1$, $\beta_2$, $\beta_3$ are parameters which have biological interpretations.
Details on this example can be found in Frome et al. (1973) and the
references therein.
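The multi-target mean function $\beta_1 x_{i1}[1 - (1 - \exp(-\beta_2 x_{i2}))^{\beta_3}]$ is simple to evaluate. The sketch below uses the GPR estimates reported in this section (7.5254, .00933 and 2.9349) as default parameter values; the function name `multi_target_mean` and the cell concentration and dose values in the usage lines are hypothetical and do not come from the data set.

```python
import math

def multi_target_mean(x1, x2, b1=7.5254, b2=0.00933, b3=2.9349):
    """Expected colony count for cell concentration x1 and radiation dose x2."""
    # Probability that at least one of the targets in a cell is hit.
    hit = 1.0 - math.exp(-b2 * x2)
    # Surviving fraction under the multi-target model.
    surviving = 1.0 - hit ** b3
    return b1 * x1 * surviving

# Hypothetical inputs: unit concentration at increasing doses.
for dose in (0.0, 100.0, 300.0, 600.0):
    print(dose, round(multi_target_mean(1.0, dose), 4))
```

At zero dose the surviving fraction is one, so the mean equals $\beta_1 x_{i1}$, and the mean decreases steadily as the dose increases.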
We applied the GPR model to the data by using the above mean
specification. The estimate of $\phi$ (with the standard error in brackets) is
0.7581 (.0710) which signifies an under-dispersion situation. The other
parameter estimates (with their standard errors) are
$\hat\beta_1 = 7.5254$ (.6956), $\hat\beta_2 = .00933$ (.000312), and $\hat\beta_3 = 2.9349$ (.6018).
To test for the significance
of the dispersion parameter $\phi$, we compute the statistic $Q_1$ in (5.4) to be
31.97. Clearly, $Q_1$ is not significant when compared with the $\chi^2$-distribution
with 52 d.f. This is similar to the result obtained by Frome et al. for the
Poisson regression model.
For comparison, we report here the fit by the GPR model and that
of the Poisson regression model which was reported by Frome et al.

TABLE 7.4
Survival Curve Data

i    $x_{i1}$    $x_{i2}$    $n_i$    $\bar y_i = \sum_j y_{ij}/n_i$    Poisson Fit    GPR Fit

From the above table, we note that the GPR model is as good as
the Poisson regression model. Although this example does not
indicate a better fit by the GPR model, it does show that the GPR
model will do equally well when the dispersion parameter $\phi$ is found
not to be significant.
CONCLUSION
Many methods of dealing with extra-Poisson variation (over-

dispersion) in regression analysis have been suggested in the literature
[see Lawless (1987) and the references therein]. Up till now, it seems as
if none of these methods is capable of accommodating under-
dispersion as well. It is quite interesting to find a regression model, the
generalized Poisson regression (GPR) model that can accommodate
both the over-dispersion and under-dispersion.
Methods of maximum likelihood and moments have been
proposed to estimate the parameters of the generalized Poisson
regression model. These procedures are quite easy to program by using

a standard computer language. The efficiency and robustness
properties of the parameter estimates will be addressed in a separate
paper.
Lawless (1987) applied the negative binomial regression (NBR)
model to describe the over-dispersion case. In his paper, the case $a \to 0$,
rather than $a = 0$, leads to the Poisson regression model. This raises a
question as to whether the test for $a = 0$ is equivalent to testing for the
Poisson assumption in the NBR model. This situation does not arise
with the GPR model proposed in this paper. The case $\phi = 1$ in the GP
regression model corresponds to the Poisson regression model and
hence the appropriateness of the GP regression model can be checked
by testing $H_0: \phi = 1$ against $H_1: \phi \ne 1$.
We note that the parameter estimates from the GPR model are close
to those from the Poisson regression model when the dispersion
parameter $\phi$ is very close to 1 (i.e. when we fail to reject $H_0$ in (5.1)).
However, there is much difference in the parameter estimates when $\phi$
is substantially different from 1. For such a situation, the results
obtained from the Poisson quasilikelihood method may not be as
reliable as those from the GPR model.
ACKNOWLEDGEMENT
The authors acknowledge the financial support from the Natural

Sciences and Engineering Research Council of Canada for this work.
BIBLIOGRAPHY
1. Barbour, C.D. and Brown, J.H. (1974). "Fish species diversity in

lakes." The American Naturalist, 108, 473-489.
2. Breslow, N. (1984). "Extra-Poisson variation in log-linear

models." Appl. Statist., 3 3 , 3 8 4 .
3. Consul, P.C.(1989). Generalized Poisson distributions: Properties

and Applicafions. Marcel Dekker, Inc., New York.
4. Consul, P.C. and Famoye, F. (1988). "Maximum likelihood

estimation for the generalized Poisson distribution when
sample mean is larger than sample variance". Commun.
- -
Statist. Theory Meth., 17(1), 299 309.
5 Consul, P.C. and Shoukri, M.M. (1984). "Maximum likelihood

estimation for the generalized Poisson distribution".
-
Commun. Statist. Theory Meth., 13(12), 1533-1547.
6. Consul, P.C. and Shoukri, M.M. (1985). "The generalized Poisson

distribution when the sample mean is large than the sample
variance." Commun. Statist.-Simulation Comput., 14(3), 667-
681.
7 Cox, D.R. (1983). "Some remarks on over-dispersion."

Biometrika, 70, 269-274.
8. Day, N.E. and Walter, S.D.(1984). "Simplified models of screening

for chronic disease: Estimation procedures from mass
screening programmes." Biometrics, 40, 1-14.
9. Frome, E.L. (1983). "The analysis of rates using Poisson regression

models." Biometrics, 39,665-674.
10. Frome, E.L., Kutner, M.H. and Beauchamp, J.J. (1973). "Regression
analysis of Poisson-distributed data." JASA 68,935-940.
11. Haberman, S.H.(1974). The analysis of frequency dafa. Univ. of

Chicago Press, Chicago.
12. Hinde, J. (1982). "Compound Poisson regression models." GLIM

82: Proc. Internat. Conf. Generalized Linear Models (R.
Gilchrist ed.), Springer, Berlin, 109-121.
13. Holford, T.R. (1983). "The estimation of age, period and cohort
effects of vital rates." Biometrics, 39,311-324
14. Janardan, K.G., Kerster, H.W., and Schaeffer, D.J. (1979).

"Biological applications of the Lagrangian Poisson
distribution." BioScience 29, 599-602.
15. Lawless, J.F. (1987). "Negative binomial and mixed Poisson

regression." The Canadian Journ. of Stat., 15(3), 2098-225.
16. Manton, K.G., Woodbury,JvI.A. and Stallard, E. (1981). "A

variance components approach to categorical data models with
heterogeneous cell populations: Analysis of spatial gradients

in lung cancer mortality rates in North Carolina counties."
Biometrics, 37, 259-269.
17. Margolin, B.H., Kaplan, N. and Zeiger, E. (1981). "Statistical analysis
of the Ames Salmonella/microsome tests." Proc. Nat. Acad.
Sci., U.S.A., 76, 3779-3783.
18. McCullagh, P. and Nelder, J.A. (1989). Generalized linear models,
2nd Ed. Chapman and Hall, London.
19. Rao, C.R. (1973). Linear statistical inference and its applications.
John Wiley and Sons.
20. Stein, G.Z. and Juritz, J.M. (1988). "Linear models with an inverse
Gaussian-Poisson error distribution." Commun. Statist.-Theory
Meth., 17(2), 557-571.
21. Williams, D.A. (1975). "The analysis of binary responses from
toxicological experiments involving reproduction and
teratogenicity." Biometrics, 31, 949-952.
22. Williams, D.A. (1982). "Extra-binomial variation in logistic
linear models." Appl. Statist., 31(2), 144-148.
Received September 1990; Revised June 1991

