Communications in Statistics - Theory and Methods
To cite this article: P.C. Consul & Felix Famoye (1992): Generalized poisson regression model, Communications in
Statistics - Theory and Methods, 21:1, 89-109
COMMUN. STATIST. - THEORY METH., 21(1), 89-109 (1992)
ABSTRACT
1. INTRODUCTION
less than 0.5% which is very small and negligible because all discrete
models do get truncated under the sampling procedures. This GPD
model gets generated under various physical conditions and has been
found to be a very useful model in many fields of study such as
genetics, queueing, insurance, labour absenteeism and marketing
research, etc. [see Consul (1989)].
When $\lambda \ge 0$, the support of the GPD is independent of the unknown
parameters. For this case, Consul and Shoukri (1984) proved that the
maximum likelihood (ML) estimators of the parameters $\theta$ and $\lambda$ are
unique. For the case $\lambda < 0$, the probability distribution in (2.1) is valid
only for $y \le m$, where $m$ might depend on the unknown parameter $\lambda$.
For this case, Consul and Famoye (1988) showed that the ML estimators
of $\theta$ and $\lambda$ are also unique and, furthermore, that the ML estimators are
obtained by using the same likelihood equations as in the case $\lambda \ge 0$.
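The role of the sign of $\lambda$ can be illustrated numerically. The sketch below assumes the usual generalized Poisson form $P(Y = y) = \theta(\theta + \lambda y)^{y-1} e^{-\theta - \lambda y}/y!$ for equation (2.1), which is not reproduced in this excerpt; the function names and the parameter values are illustrative only. It checks that the probabilities sum to one and that the mean is $\theta/(1-\lambda)$ when $0 \le \lambda < 1$.

```python
import math

def gpd_log_pmf(y, theta, lam):
    """Log P(Y = y) under the assumed GPD form; -inf outside the support."""
    base = theta + lam * y
    if base <= 0.0:
        # For lam < 0 the support is truncated where theta + lam*y <= 0.
        return float("-inf")
    return (math.log(theta) + (y - 1) * math.log(base)
            - base - math.lgamma(y + 1))

def gpd_pmf(y, theta, lam):
    return math.exp(gpd_log_pmf(y, theta, lam))

# Illustrative values with lam >= 0, so the support is 0, 1, 2, ...
theta, lam = 2.0, 0.3
probs = [gpd_pmf(y, theta, lam) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
print(total, mean, theta / (1.0 - lam))
```

With a negative $\lambda$ the same function returns zero probability beyond the largest $y$ for which $\theta + \lambda y > 0$, which is the truncation discussed above.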
Since the count data obtained in any experiment are affected by a
Downloaded by [North Carolina State University] at 09:26 21 December 2012
and
$$\frac{\partial \mu_i}{\partial \beta_r} = \mu_i x_{ir},$$
where $x_i^T = (1, x_{i2}, x_{i3}, \ldots, x_{ik})$ and $\beta^T = (\beta_1, \beta_2, \ldots, \beta_k)$. In particular, when
$r = 1$ the likelihood equation (3.4) reduces to
Thus the last term in (3.3) may be dropped when the log-linear
specification contains a constant term $\beta_1$ without a covariate $x_i$.
The ML equations (3.3) and (3.4) are clearly non-linear in the
parameters and one will have to use one of the methods of successive
approximation to obtain the maximum likelihood estimates (MLE) of
the parameters $\beta$ and $\varphi$. The initial estimates $(\beta_0, \varphi_0)$ are the solutions
obtained for the parameters by the moments method (to be discussed in
the next section).
The generalized linear model software (GLIM) [McCullagh and
Nelder (1983), page 170] may also be used to obtain the ML estimates of
the parameters $(\beta, \varphi)$; however, we did not get very satisfactory results
by using this software. The generalized Poisson regression and the
Poisson regression models have the same quasilikelihood function. By
using GLIM to fit the GPR model, only the variance and deviance
functions for the Poisson regression model are changed. Since the
variance function (2.4) corresponds, incidentally, to the variance
function for Poisson quasilikelihood, we obtain exactly the same
estimates as for the Poisson regression model. By examining the
likelihood function (3.1), it is obvious that the kernel of the log-likelihood
function for the GPR model is different from that of the Poisson
regression model and so the parameter estimates should differ.
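The difference between the two log-likelihood kernels can be checked directly. The sketch below assumes that the GPR density (2.3), which is not reproduced in this excerpt, has the form $f(y_i; \mu_i, \varphi) = \mu_i[\mu_i + (\varphi - 1)y_i]^{y_i - 1}\varphi^{-y_i}\exp\{-[\mu_i + (\varphi - 1)y_i]/\varphi\}/y_i!$. The data and the function names are hypothetical.

```python
import math

def gpr_loglik(y, mu, phi):
    """Log-likelihood under the assumed GPR density, for fixed means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        s = mi + (phi - 1.0) * yi  # must stay positive
        total += (math.log(mi) + (yi - 1) * math.log(s)
                  - yi * math.log(phi) - s / phi - math.lgamma(yi + 1))
    return total

def poisson_loglik(y, mu):
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

# Hypothetical counts and means, for illustration only.
y = [0, 3, 5, 12]
mu = [1.5, 2.0, 4.0, 6.0]

print(gpr_loglik(y, mu, 1.0), poisson_loglik(y, mu))  # agree at phi = 1
print(gpr_loglik(y, mu, 1.5))                          # differs for phi != 1
```

Because the dependence on $\mu_i$ changes with $\varphi$, maximizing over $\beta$ at $\varphi \ne 1$ need not give the Poisson estimates.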
4. MOMENT ESTIMATION OF $\varphi$
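[Equation (4.1): only a sum over $i = 1, \ldots, n$ with $\mu_i$ and $\varphi$ in the denominator survives in the source.]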
from (3.4). On getting a new $\beta$, we return to equation (4.1) and solve for a
new $\varphi$. This process is continued until we obtain a stable solution.
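The $\varphi$-step of this alternating scheme can be sketched as follows. Equation (4.1) is not legible in this excerpt, so the sketch assumes the variance function $V(Y_i \mid x_i) = \varphi^2 \mu_i$ and equates the Pearson-type sum $\sum_i (y_i - \mu_i)^2/\mu_i$ to $\varphi^2 (n - k)$; both the form of the equation and the names below are assumptions.

```python
import math

def moment_phi(y, mu, k):
    """Moment-type estimate of phi for given fitted means mu.

    Assumes Var(Y_i) = phi**2 * mu_i and k regression parameters.
    """
    n = len(y)
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return math.sqrt(pearson / (n - k))

# Hypothetical counts and current fitted means.
y = [2, 5, 1, 8]
mu = [3.0, 4.0, 2.0, 6.0]
phi = moment_phi(y, mu, k=2)
print(phi)
```

In the full procedure this value would be passed back to the likelihood equation for $\beta$, and the two steps repeated until the estimates stop changing.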
Since the GPR model (2.3) reduces to the Poisson regression model
when $\varphi = 1$, one can assess the appropriateness of the GPR model over
the Poisson regression model by testing the hypothesis
$$H_0: \varphi = 1 \quad \text{against} \quad H_a: \varphi \ne 1. \qquad (5.1)$$
where $\hat\beta$ are the MLEs of $\beta$ under the GPR model with no specified
parameter value and $\tilde\beta$ are the MLEs of $\beta$ under the GPR model when $H_0$
is true. In general, $\hat\beta$ need not equal $\tilde\beta$. The test statistic in (5.3) will be
taken as an approximate chi-square random variable with 1 d.f. when
$H_0$ is true. We have not actually looked at the distributional form of
the statistic in (5.3).
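Referring such a statistic to the chi-square distribution with 1 d.f. needs only the complementary error function, since $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$. The sketch below assumes that (5.3) is the usual likelihood ratio statistic, twice the difference of the maximized log-likelihoods; the function names are illustrative. The two values used are the statistics reported in Section 7.

```python
import math

def lr_statistic(loglik_full, loglik_null):
    """Assumed form of (5.3): twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)

def chi2_1df_pvalue(x):
    """Upper tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))

# Observed values of the statistic quoted in Section 7.
p_ship = chi2_1df_pvalue(1.27)
p_fish = chi2_1df_pvalue(1263.95)
print(p_ship, p_fish)
```

The first value is well above the conventional 0.05 level and the second is far below it, in line with the conclusions drawn for the two data sets.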
The test in (5.1) depends on a satisfactory specification of
$$Q_1 = \sum_i \sum_j \cdots = Q_{11} + Q_{12}, \qquad (5.4)$$
where $\bar{y}_{i\cdot} = \sum_j y_{ij}/n_i$.
The degrees of freedom of $Q_{11}$ and $Q_{12}$ are $D_{11} = \sum n_i - N$ and $D_{12} = N - k$,
respectively.
When the value of $Q_1$ is significantly large, this may be due to
either over-dispersion/under-dispersion or lack of fit of the regression
model, or both. To test for over-dispersion/under-dispersion, we
compare $Q_{11}$ with the chi-square distribution with $D_{11}$ d.f. Thus, $Q_{11}$ is
used to test the hypothesis in (5.1) with no assumption about the
regression function.
If $Q_{11}$ is significantly large, we have either over-dispersion (when
$\varphi > 1$) or under-dispersion (when $\varphi < 1$). To test for lack of fit of the
regression model, we may use the F-ratio
and $p(\hat\mu_i, y_i)$ denotes the probability for $Y = y_i$ by the given regression
model. As soon as the unknown parameters of the model are
estimated, it is easy to get $p(\hat\mu_i, y_i)$ for any model. It has been shown
that the statistic $D$ has a $\chi^2$-distribution with $n - r$ degrees of freedom,
where $r$ is the total number of estimated parameters. The statistic $D$ is
constructed for assessing the adequacy of the specification of the
regression function
$$\mu_i(x) = E(Y_i \mid x_i).$$
If $D$ is too large, the regression function specification should be rejected.
This test corresponds to the use of the F-ratio in Section 5.
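A deviance of this kind can be sketched in code. The defining equation of $D$ is missing from this excerpt, so the sketch assumes $D = 2\sum_i [\log p(y_i, y_i) - \log p(\hat\mu_i, y_i)]$ with $\varphi$ held fixed, together with the GPR density $\mu[\mu + (\varphi - 1)y]^{y-1}\varphi^{-y}\exp\{-[\mu + (\varphi - 1)y]/\varphi\}/y!$. Data and names are hypothetical. At $\varphi = 1$ the result agrees with the familiar Poisson deviance.

```python
import math

def gpr_log_pmf(y, mu, phi):
    """Log of the assumed GPR density at count y with mean mu > 0."""
    s = mu + (phi - 1.0) * y
    return (math.log(mu) + (y - 1) * math.log(s)
            - y * math.log(phi) - s / phi - math.lgamma(y + 1))

def gpr_deviance(y, mu_hat, phi):
    d = 0.0
    for yi, mi in zip(y, mu_hat):
        # Saturated fit sets the mean to y_i; a zero count then has log p = 0.
        sat = gpr_log_pmf(yi, float(yi), phi) if yi > 0 else 0.0
        d += 2.0 * (sat - gpr_log_pmf(yi, mi, phi))
    return d

def poisson_deviance(y, mu_hat):
    d = 0.0
    for yi, mi in zip(y, mu_hat):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += 2.0 * (term - (yi - mi))
    return d

# Hypothetical counts and fitted means.
y = [0, 3, 5, 12]
mu_hat = [1.5, 2.0, 4.0, 9.0]
print(gpr_deviance(y, mu_hat, 1.0), poisson_deviance(y, mu_hat))
```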
where $\hat\beta$ are the MLEs of $\beta$ under $H_0$ and $\hat\beta_{\text{full}}$ are the MLEs of $\beta$ under
the full or complete GPR model in which there is one parameter for
each value of $i$. Under $H_0$, the test statistic in (6.2) has an approximate
chi-square distribution with $n - k$ d.f. However, if $\varphi$ is unknown, it has
to be estimated from the sample. For this case, the test statistic
7. SOME APPLICATIONS
TABLE 7.1
Parameter Estimates

Parameter                $\varphi = 1$
Intercept                -6.41 (.22)
Ship Type
  A                       0
  B                      -.54 (.18)
  C                      -.69 (.33)
  D                      -.08 (.29)
  E                       .33 (.24)
Year of Construction
  60-64                   0
  65-69                   .70 (.15)
Service Period
  60-64                   0
  75-79                   .38 (.12)
Similar to the earlier results, the GPR fit indicates that the main
effects are significant, and there is some inconclusive evidence for an
interaction of ship type by year of construction.
To test the hypothesis (5.1), the observed value of the test statistic
(5.3) is 1.27, which is not significant. This shows that there is no strong
evidence of over-dispersion under the GPR model. A similar remark
was made by Lawless (1987) for the MN model and the negative
binomial regression model. In view of this, the parameter estimates as
well as their standard errors should be very close to the results under
the Poisson assumption (i.e. for the case $\varphi = 1$). It is interesting to note
that this is in fact the case under the GPR model, whereas the MN
approach leads to somewhat large standard errors.
We consider this with the GPR model (2.3) and apply it to the
given data. The maximum likelihood procedure, given in Section 4,
yields the ML estimates of the parameters with their respective
standard errors (in brackets) as
p, = 2.1937 (0.3370)
TABLE 7.2
Deviances and d.f. for various models
It is clear from the values in the above table that both the NBR
model of Lawless (1987) and our GPR model are far better than the
Poisson regression model and that the GPR model is as good as the
NBR model in describing the data. Also, the actual estimates of $\beta_1$, $\beta_2$,
$\beta_3$ under the GPR model do not differ much from the estimates of $\beta_0$, $\beta_1$,
and $\beta_2$ under the NBR model, respectively; however, the value of
$\hat\beta_3/\mathrm{se}(\hat\beta_3) = 3.57$ is slightly smaller than the corresponding value under
NBR.
In this example, there are replicate observations for the covariates
and so we can obtain the values of the statistics $Q_1$, $Q_{11}$, and $Q_{12}$ in (5.4).
The value $Q_1 = 47.8207$ is found to be significant when compared with the
$\chi^2$-distribution on 15 d.f. Thus, there is either over-dispersion or lack
of fit of the regression model, or both. The value $Q_{11} = 37.4885$ is also
found to be significant when compared with the $\chi^2$-distribution on 12 d.f.
This shows that there is over-dispersion, which agrees with our earlier
result of testing the hypothesis in (5.1).
To test for lack of fit of the regression model, we obtain the F-ratio
in (5.5) as F = .907. This quantity is not significant and so the regression
function considered is appropriate.
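The arithmetic behind these figures can be retraced from the decomposition $Q_1 = Q_{11} + Q_{12}$. The sketch below reads the value 37.4885 on 12 d.f. as $Q_{11}$ and obtains $Q_{12}$ and its degrees of freedom by subtraction. Equation (5.5) is not reproduced in this excerpt, so both possible ratios of the two mean squares are computed; the variable names are illustrative.

```python
# Reported values from the example.
q1, d1 = 47.8207, 15      # total statistic and its d.f.
q11, d11 = 37.4885, 12    # replicate (dispersion) component

# Lack-of-fit component by subtraction.
q12 = q1 - q11
d12 = d1 - d11

ms_dispersion = q11 / d11
ms_lack_of_fit = q12 / d12

ratio_a = ms_dispersion / ms_lack_of_fit
ratio_b = ms_lack_of_fit / ms_dispersion

print(round(q12, 4), d12)
print(round(ratio_a, 3), round(ratio_b, 3))
```

Of the two, the ratio of $Q_{11}/D_{11}$ to $Q_{12}/D_{12}$ is the one that rounds to the reported value .907.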
TABLE 7.3
Deviances and d.f. for four models
explanatory variable $x$ is the log lake area. We apply the GPR model
(2.3) with
$$E[Y_i \mid x_i] = \mu_i = \exp(\beta_1 + \beta_2 x_i)$$
to the same data on fish species and compute the values of the
parameters, their variances and the deviance for the model. From
(6.3), we obtain the deviance for the GPR model. The deviances for the
Poisson regression and IGPR models are from Stein and Juritz (1988).
Table 7.3 contains the deviances for all four models applied to this data.
It is clear from the above values that the NBR model is better than
the Poisson regression model for this data and the IGPR model of Stein
and Juritz (1988) is better than both of them. However, the GPR model
describes this data on the abundance of fish species in the best manner
in comparison with the other models.
The ML estimates of the parameters and their standard errors
(given in brackets) for the GPR model are $\hat\varphi$ = 5.4982 (0.6955), $\hat\beta_1$ = 3.0137
(0.2116) and $\hat\beta_2$ = 0.09911 (0.02396).
The estimated value of $\varphi$ is quite large and clearly indicates that
the data is overdispersed. If the null hypothesis $H_0: \varphi = 1$ against its
negation is applied [as given in (5.1)], the value of the chi-square test
statistic in (5.3) is 1263.95 with 1 d.f., which is highly significant. Also,
the log-likelihood for the GPR model is found to be the smallest one
among all the four models and is -316.10. Thus the GPR model seems
to be the most suitable one for the abundance of fish species in the
various lakes.
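To give a sense of what the fitted model implies, the sketch below evaluates the fitted mean $\exp(\hat\beta_1 + \hat\beta_2 x)$ and the implied variance using the reported estimates. It assumes the variance function $V(Y \mid x) = \varphi^2 \mu$ for (2.4), which is not reproduced in this excerpt, and the covariate value `x` is hypothetical rather than taken from the lake data.

```python
import math

# Reported ML estimates for the fish species example.
phi_hat = 5.4982
beta1_hat = 3.0137
beta2_hat = 0.09911

def fitted_mean(x):
    """Fitted mean count for log lake area x under the log-linear model."""
    return math.exp(beta1_hat + beta2_hat * x)

def fitted_variance(x):
    """Implied variance, assuming Var(Y | x) = phi**2 * mean."""
    return phi_hat ** 2 * fitted_mean(x)

x = 5.0  # hypothetical log lake area
print(fitted_mean(x), fitted_variance(x))
```

Under this assumption the variance is about 30 times the mean at every covariate value, which is the sense in which the estimated $\varphi$ is "quite large".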
Frome et al. (1973) applied the Poisson regression model with an
intrinsically non-linear regression function to cell survival data. In the
example, the observed response is the number of colonies produced in
the spleen of recipient animals.
The non-linear regression function for the effect of radiation
damage on stem cell survival is
$$\mu_i = \beta_1 x_{i1}\left[1 - \left(1 - \exp(-\beta_2 x_{i2})\right)^{\beta_3}\right],$$
where $x_{i1}$ is the concentration of injected cells, $x_{i2}$ is the radiation dose,
and $\beta_1$, $\beta_2$, $\beta_3$ are parameters which have biological interpretations.
Details on this example can be found in Frome et al. (1973) and the
references therein.
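The behaviour of such a survival curve can be explored with a short sketch. The regression function is garbled in the source, so the code assumes the multi-target form $\mu_i = \beta_1 x_{i1}[1 - (1 - \exp(-\beta_2 x_{i2}))^{\beta_3}]$. The parameter values are the GPR estimates reported for this example, while the concentration and the doses are hypothetical.

```python
import math

def survival_mean(conc, dose, b1, b2, b3):
    """Expected colony count under the assumed multi-target survival curve."""
    return b1 * conc * (1.0 - (1.0 - math.exp(-b2 * dose)) ** b3)

# GPR estimates reported for the cell survival data.
b1, b2, b3 = 7.5254, 0.00933, 2.9349

conc = 1.0                          # hypothetical cell concentration
doses = [0.0, 100.0, 300.0, 600.0]  # hypothetical radiation doses
means = [survival_mean(conc, d, b1, b2, b3) for d in doses]
print(means)
```

At zero dose the mean is $\beta_1 x_{i1}$, and it declines as the dose increases.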
We applied the GPR model to the data by using the above mean
specification. The estimate of $\varphi$ (with the standard error in brackets) is
0.7581 (.0710), which signifies an under-dispersion situation. The other
parameter estimates (with their standard errors) are
$\hat\beta_1$ = 7.5254 (.6956), $\hat\beta_2$ = .00933 (.000312), and $\hat\beta_3$ = 2.9349 (.6018).
To test for the significance
of the dispersion parameter $\varphi$, we compute the statistic $Q_1$ in (5.4) to be
31.97. Clearly, $Q_1$ is not significant when compared with the $\chi^2$-distribution
with 52 d.f. This is similar to the result obtained by Frome et al. for the
Poisson regression model.
For comparison, we report here the fit by the GPR model and that
of the Poisson regression model, which was reported by Frome et al.
TABLE 7.4
Survival Curve Data
$i$    $x_{i1}$    $x_{i2}$    $n_i$    $\bar{y}_i = \sum_j y_{ij}/n_i$    Poisson Fit    GPR Fit
From the above table, we note that the GPR model is as good as
the Poisson regression model. Although this example does not
indicate a better fit by the GPR model, it does show that the GPR
model will do equally well when the dispersion parameter $\varphi$ is found
not to be significant.
CONCLUSION
ACKNOWLEDGEMENT
BIBLIOGRAPHY
10. Frome, E.L., Kutner, M.H. and Beauchamp, J.J. (1973). "Regression
analysis of Poisson-distributed data." JASA, 68, 935-940.
13. Holford, T.R. (1983). "The estimation of age, period and cohort
effects of vital rates." Biometrics, 39, 311-324.
19. Rao, C.R. (1973). Linear statistical inference and its applications.
John Wiley and Sons.
20. Stein, G.Z. and Juritz, J.M. (1988). "Linear models with an inverse
Gaussian Poisson error distribution." Comm. Statist. Theor.
Meth., 17(2), 557-571.