
Journal of Statistical Computation and Simulation

ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: https://www.tandfonline.com/loi/gscs20

GEE-based zero-inflated generalized Poisson model for clustered over or under-dispersed count data

Fatemeh Sarvi, Abbas Moghimbeigi & Hossein Mahjub

To cite this article: Fatemeh Sarvi, Abbas Moghimbeigi & Hossein Mahjub (2019) GEE-
based zero-inflated generalized Poisson model for clustered over or under-dispersed
count data, Journal of Statistical Computation and Simulation, 89:14, 2711-2732, DOI:
10.1080/00949655.2019.1632857

To link to this article: https://doi.org/10.1080/00949655.2019.1632857

Published online: 25 Jun 2019.

Fatemeh Sarvi (a), Abbas Moghimbeigi (b) and Hossein Mahjub (c)
(a) Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran; (b) Modeling of Noncommunicable Diseases Research Center, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran; (c) Research Center for Health Sciences, Department of Biostatistics, Faculty of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran

ABSTRACT
Zero-inflated regression models, such as the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB) and zero-inflated generalized Poisson (ZIGP) regression models, can model count data with excess zeros. The ZINB model can handle over-dispersed count data with excess zeros, and the ZIGP model can handle both over- and under-dispersed count data with excess zeros. Moreover, count data may be correlated because of the data collection procedure or a special study design; clustered sampling is one example in which correlation among subjects arises. In such situations, a marginal model based on the generalized estimating equation (GEE) approach can incorporate these correlations and describe relationships at the population level. In this study, a GEE-based zero-inflated generalized Poisson regression model is proposed for fitting over- and under-dispersed clustered count data with excess zeros.

ARTICLE HISTORY
Received 21 December 2018
Accepted 12 June 2019

KEYWORDS
Dispersion; expectation–solution algorithm; generalized estimating equation; generalized Poisson regression; zero-inflation

1. Introduction
Modelling of count data is common in various areas, such as the health, social, biological and economic sciences. The Poisson model is of limited use in these fields because real count data are typically correlated and may exhibit over- or under-dispersion and/or excess zeros.
Count data with excess zeros can arise from a mixture of a zero-degenerate distribution and a standard discrete distribution. Therefore, to account for extra zeros, there has been interest in mixture models, also called zero-inflated (ZI) regression models [1]. For example, Lambert [2] developed the zero-inflated Poisson (ZIP) model, which mixes a Poisson regression model with a zero-degenerate distribution; Ridout et al. [3] introduced the zero-inflated negative binomial (ZINB) model; Vieira et al. [4] and Hall [5] introduced the zero-inflated binomial (ZIB) model; Famoye and Singh [6] considered the zero-inflated generalized Poisson (ZIGP) model, which has been found worthwhile; and the ZIGP regression model introduced by Czado et al. [7] allows for modelling over-dispersed and zero-inflated count data.

CONTACT Abbas Moghimbeigi moghimb@yahoo.com

© 2019 Informa UK Limited, trading as Taylor & Francis Group
However, even though ZI models usually capture the excess zeros in the observations well, they are not sufficient for modelling correlated count data in many applications. Ignoring the correlation in the data could result in invalid parameter estimates [8,9].
Many statistical techniques have recently been developed to analyse clustered (longitudinal) data. Among them, the generalized estimating equations (GEE) technique is the most popular. GEE is an extension of the quasi-likelihood approach [10], and the resulting marginal model describes relationships at the population level [8,9]. GEE estimators are consistent and asymptotically normal, and because of these properties the GEE method has been applied extensively to correlated data [11]. For example, a GEE-type method was applied to generalized Poisson regression by Czado [12], with both the mean and dispersion parameters modelled as functions of covariates. However, when the data contain excess zeros, standard GEE estimators are no longer valid. This issue has been addressed by extending zero-inflated regression models to correlated data. For example, Hall [5] incorporated random effects in the ZIP and ZIB models; Yau and Lee [13] used specific random effects in ZI models for zero-inflated clustered data; and Choo-Wosoba et al. incorporated specific random effects in the zero-inflated Conway–Maxwell–Poisson (CMP) model [14]. Also, Choo-Wosoba et al. applied a Bayesian method to clustered count data with extra zeros by combining CMP distributions with a hurdle distribution for the zeros and random effects for clustering [15]. Including the GEE approach in the fitting algorithm is another way to analyse clustered data with extra zeros [16].
GEE methods based on ZI models have been proposed in several papers. For example, Hall and Zhang [17] developed the GEE approach based on the ZIP, ZIT and ZIB models, and Kong et al. [18] considered a GEE approach based on the ZINB model for clustered data with extra zeros.
Like the ZINB model, the ZIGP regression model can cover data with extra zeros and over-dispersion; in addition, the ZIGP model can cover count data with under-dispersion [19]. Comparing the NB and GP distributions, 'it is revealed that NB distribution has a higher mass at zero, while the GP distribution has a heavier tail' [20]. Moreover, when fitting the ZINB model to count data, there are a number of situations in which the iterative algorithm for the parameter estimates fails to converge, whereas the ZIGP regression model converges in many situations [19]. Another model that can cover a wide range of dispersion is the CMP model, whose marginal model was extended to clustered data with extra zeros by Choo-Wosoba et al. [21].
In this study, we introduce a marginal ZIGP model for clustered count data based on the GEE method and call it GEE.ZIGP. We follow Rosen et al. [22], who considered mixtures of generalized linear models for correlated data and applied the expectation–solution (ES) algorithm to estimate the parameters.
We fit the GEE.ZIGP model to DMFT (decayed (D), missing (M) or filled (F) teeth) index data. For comparison, we also fitted the GEE-based ZICMP, GEE-based ZINB and GEE-based ZIP models to these data, as well as independent ZIGP and ZINB models, which do not account for the correlation. Finally, we present a simulation study to evaluate the properties of the GEE.ZIGP model.

The rest of the paper is organized as follows. In Section 2, we describe the ZI models and the approach of Rosen et al., which employs the expectation–solution (ES) algorithm for fitting mixtures of marginal GLMs, and we explain the details of the GEE.ZIGP model and the estimation of its parameters and their variances. In Section 3, we apply the models to real data and report the results. The results of the simulation study are reported in Section 4, and the paper concludes with a discussion in Section 5.

2. Statistical inference and GEE.ZIGP model


We developed the GEE-based ZIGP (GEE.ZIGP) model to handle clustered count data with excess zeros. This model mixes a zero-degenerate distribution with a generalized Poisson (GP) distribution, and inference is based on the ES algorithm and the GEE method.

2.1. ZIGP model for clustered data


Let $W_{ij}$ be a random variable with a GP distribution for the jth subject within the ith cluster $(i = 1, \ldots, N;\ j = 1, \ldots, n_i)$. The probability mass function of $W_{ij}$ can be written as

$$
\Pr(W_{ij} = w_{ij}) = f_{GP}(w_{ij} \mid \lambda_{ij}, \tau) = \left(\frac{\lambda_{ij}}{1+\tau\lambda_{ij}}\right)^{w_{ij}} \frac{(1+\tau w_{ij})^{w_{ij}-1}}{w_{ij}!} \exp\left\{\frac{-\lambda_{ij}(1+\tau w_{ij})}{1+\tau\lambda_{ij}}\right\}. \tag{1}
$$

The GP distribution in (1) is an extension of the Poisson distribution given in 1973 by Frome et al. [23] and developed by Famoye and Singh [6]. The mean of $W_{ij}$ is $E(W_{ij} \mid \lambda_{ij}, \tau) = \lambda_{ij}$ and its variance is $\mathrm{Var}(W_{ij} \mid \lambda_{ij}, \tau) = \lambda_{ij}(1+\tau\lambda_{ij})^2$. Thus, the generalized Poisson distribution (GPD) multiplies the Poisson variance by the factor $(1+\tau\lambda_{ij})^2$ to represent over- or under-dispersion, and $\tau$ is the dispersion parameter. When $\tau = 0$, the variance equals the mean and the GPD reduces to the Poisson distribution. When $\tau > 0$, the variance exceeds the mean and the GPD exhibits over-dispersion; when $\tau < 0$, it exhibits under-dispersion.
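As a numerical check of these moments, the following Python sketch evaluates the pmf in (1) on the log scale and sums it over a truncated support. It assumes $1+\tau\lambda > 0$ and, for the check, $\tau \ge 0$ (with $\tau < 0$ the support is truncated and the masses need not sum exactly to one); the names `gp_pmf` and `gp_moments` and the truncation point `w_max` are illustrative choices, not part of the paper.

```python
import math

def gp_pmf(w: int, lam: float, tau: float) -> float:
    """Generalized Poisson pmf f_GP(w | lam, tau) as written in Eq. (1)."""
    base = 1.0 + tau * lam          # 1 + tau * lambda
    core = 1.0 + tau * w            # 1 + tau * w
    if lam <= 0 or base <= 0:
        raise ValueError("need lam > 0 and 1 + tau * lam > 0")
    if core <= 0:
        return 0.0                  # outside the support (only possible when tau < 0)
    log_p = (w * math.log(lam / base)
             + (w - 1) * math.log(core)
             - math.lgamma(w + 1)
             - lam * core / base)
    return math.exp(log_p)

def gp_moments(lam: float, tau: float, w_max: int = 400):
    """Total mass, mean and variance of the GP pmf, truncated at w_max."""
    probs = [gp_pmf(w, lam, tau) for w in range(w_max + 1)]
    total = sum(probs)
    mean = sum(w * p for w, p in enumerate(probs))
    var = sum((w - mean) ** 2 * p for w, p in enumerate(probs))
    return total, mean, var

if __name__ == "__main__":
    total, mean, var = gp_moments(lam=2.0, tau=0.1)
    # Expected: total 1, mean lam = 2, variance lam * (1 + tau * lam)^2 = 2.88
    print(round(total, 6), round(mean, 6), round(var, 6))
```

With `tau=0.0` the function reduces to the Poisson pmf, which matches the special case described above.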
Consequently, for clustered data with extra zeros, let $Y_{ij}$ be the response variable for the jth subject within the ith cluster $(i = 1, \ldots, N;\ j = 1, \ldots, n_i)$. The distribution of $Y_{ij}$ is a mixture of a zero-degenerate distribution with mixing probability $p_{ij}$ and a GP distribution with mean $\lambda_{ij}$ and mixing probability $1 - p_{ij}$. In a ZIGP model, we assume

$$
Y_{ij} \sim \begin{cases} 0 & \text{with probability } p_{ij}, \\ \text{Generalized Poisson}(\lambda_{ij}, \tau) & \text{with probability } 1 - p_{ij}. \end{cases}
$$

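We can write the probability mass function of the response variable $Y_{ij}$ as follows:

$$
\begin{aligned}
\Pr(Y_{ij} = y) &= \begin{cases} p_{ij} + (1-p_{ij})\Pr(W_{ij} = 0) & \text{if } y = 0, \\ (1-p_{ij})\Pr(W_{ij} = y) & \text{if } y \ge 1, \end{cases} \\
&= \begin{cases} p_{ij} + (1-p_{ij}) \exp\left(\dfrac{-\lambda_{ij}}{1+\tau\lambda_{ij}}\right) & \text{if } y = 0, \\ (1-p_{ij}) \left(\dfrac{\lambda_{ij}}{1+\tau\lambda_{ij}}\right)^{y} \dfrac{(1+\tau y)^{y-1}}{y!} \exp\left\{\dfrac{-\lambda_{ij}(1+\tau y)}{1+\tau\lambda_{ij}}\right\} & \text{if } y \ge 1. \end{cases}
\end{aligned} \tag{2}
$$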

The mean of $Y_{ij}$ is $E(Y_{ij}) = (1-p_{ij})\lambda_{ij}$ and its variance is $\mathrm{Var}(Y_{ij}) = E(Y_{ij})\{(1+\tau\lambda_{ij})^2 + p_{ij}\lambda_{ij}\}$.
In particular, we assume that $\log(\lambda_{ij}) = x_{ij}^T\beta$ and $\mathrm{logit}(p_{ij}) = z_{ij}^T\gamma$ $(i = 1, \ldots, N;\ j = 1, \ldots, n_i)$, where $\beta$ and $\gamma$ are regression parameters to be estimated and $x_{ij}$ and $z_{ij}$ are the corresponding rows of the design matrices. As a result, the mean of $Y_{ij}$, i.e. $E(Y_{ij}) = (1-p_{ij})\lambda_{ij}$, depends on both $\beta$ and $\gamma$.
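The mixture (2), the two link functions and the moment formulas above can be checked numerically in the same way. In this sketch the covariate vectors, coefficients and τ are made-up illustrative values, not estimates from the paper, and the GP pmf is repeated so that the block runs on its own; function names are hypothetical.

```python
import math

def gp_pmf(w, lam, tau):
    """GP pmf of Eq. (1); mean lam, variance lam * (1 + tau * lam)**2."""
    base, core = 1.0 + tau * lam, 1.0 + tau * w
    if core <= 0:
        return 0.0
    return math.exp(w * math.log(lam / base) + (w - 1) * math.log(core)
                    - math.lgamma(w + 1) - lam * core / base)

def zigp_pmf(y, lam, tau, p):
    """ZIGP pmf of Eq. (2): structural zero with prob. p, GP otherwise."""
    gp = gp_pmf(y, lam, tau)
    return p + (1.0 - p) * gp if y == 0 else (1.0 - p) * gp

def zigp_params(x, beta, z, gamma):
    """Links from the text: log(lambda) = x'beta and logit(p) = z'gamma."""
    eta_count = sum(a * b for a, b in zip(x, beta))
    eta_zero = sum(a * b for a, b in zip(z, gamma))
    return math.exp(eta_count), 1.0 / (1.0 + math.exp(-eta_zero))

# Illustrative covariates and coefficients (not estimates from the paper)
lam, p = zigp_params(x=[1.0, 0.5], beta=[0.7, 0.2], z=[1.0, 0.5], gamma=[-0.5, 0.4])
tau = 0.15
probs = [zigp_pmf(y, lam, tau, p) for y in range(300)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(mean, (1 - p) * lam)                                   # numerical vs. formula
print(var, (1 - p) * lam * ((1 + tau * lam) ** 2 + p * lam))  # numerical vs. formula
```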

2.2. ZIGP regression via the ES algorithm for clustered data and parameter estimates
2.2.1. The estimation of the coefficients of the GEE.ZIGP regression model
The expectation–maximization (EM) algorithm is a standard procedure for obtaining maximum likelihood (ML) estimates of statistical models, especially when the model depends on latent variables. It is therefore typically used for inference in ZI models.
Accordingly, for k independent observations, let $u_i = 1$ when $Y_i$ arises from the zero-degenerate distribution and $u_i = 0$ otherwise. Treating $u = (u_1, \ldots, u_k)^T$ as missing data, the complete-data log-likelihood of the ZIGP model is

$$
\begin{aligned}
\ell^c(\beta,\gamma,\tau; y, u) &= \sum_{i=1}^{k} \left[u_i \log p_i(\gamma) + (1-u_i)\log\{1-p_i(\gamma)\}\right] + \sum_{i=1}^{k} (1-u_i)\log f_2(y_i; \beta, \tau) \\
&\equiv \ell^c(\gamma; y, u) + \ell^c(\beta, \tau; y, u).
\end{aligned} \tag{3}
$$

In (3), the GP probability mass function is denoted by $f_2(y_i; \beta, \tau)$. In the (h + 1)th iteration of the ES algorithm, $\xi(\beta,\gamma,\tau \mid \beta^{(h)},\gamma^{(h)},\tau^{(h)}) = E\{\ell^c(\beta,\gamma,\tau; y, u) \mid y, \beta^{(h)},\gamma^{(h)},\tau^{(h)}\}$ is computed. Because $\ell^c(\beta,\gamma,\tau; y, u)$ is linear in $u$, the expectation step reduces to evaluating $\ell^c(\beta,\gamma,\tau; y, u^{(h)})$, where $u^{(h)} = E(u \mid y, \beta^{(h)},\gamma^{(h)},\tau^{(h)})$ is the conditional mean of $u$ given $y$. The ith element of $u^{(h)}$ is

$$
u_i^{(h)} = \Pr(u_i = 1 \mid y_i, \beta^{(h)},\gamma^{(h)},\tau^{(h)}) = \frac{1_{\{y_i = 0\}}}{1 + \{1-p_i(\gamma^{(h)})\}\exp\left[-\lambda_i(\beta^{(h)})/\{1+\tau^{(h)}\lambda_i(\beta^{(h)})\}\right]/p_i(\gamma^{(h)})}. \tag{4}
$$
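A direct transcription of (4) is shown below as a minimal sketch; `posterior_zero_prob` is a hypothetical helper name and the inputs are arbitrary illustrative values. Positive counts get weight zero, and for an observed zero the weight grows with λ, because a zero from the GP component becomes less likely.

```python
import math

def posterior_zero_prob(y, lam, tau, p):
    """Eq. (4): Pr(u = 1 | y) at the current (h-th) values of lambda, tau and p."""
    if y != 0:
        return 0.0                                  # indicator 1{y = 0}
    gp_zero = math.exp(-lam / (1.0 + tau * lam))    # Pr(W = 0) under the GP
    return 1.0 / (1.0 + (1.0 - p) * gp_zero / p)

# Illustrative values only
ys, lams, ps = [0, 0, 3, 1], [1.5, 4.0, 2.0, 0.8], [0.3, 0.3, 0.3, 0.6]
u_h = [posterior_zero_prob(y, lam, 0.2, p) for y, lam, p in zip(ys, lams, ps)]
print([round(v, 3) for v in u_h])
```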

Substituting this into (3) yields $\xi(\beta,\gamma,\tau \mid \beta^{(h)},\gamma^{(h)},\tau^{(h)}) = \ell^c(\gamma; y, u^{(h)}) + \ell^c(\beta,\tau; y, u^{(h)})$, which is then maximized with respect to $\beta$, $\tau$ and $\gamma$ in the maximization step of the EM algorithm.
The first component has the form of a binomial log-likelihood with response $u^{(h)}$, and the second has the form of a weighted GP log-likelihood with weights $1-u_1^{(h)}, \ldots, 1-u_k^{(h)}$. Thus, in the E step $u^{(h)}$ is updated, in the M step the two regression models are fitted, and these steps are iterated until convergence.
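For clustered count data, these formulas can be written as

$$
\begin{aligned}
\ell^c(\beta,\gamma,\tau; y, u) &= \sum_{i,j} \left[u_{ij}\log p_{ij}(\gamma) + (1-u_{ij})\log\{1-p_{ij}(\gamma)\}\right] + \sum_{i,j}(1-u_{ij})\log f_2(y_{ij}; \beta, \tau) \\
&\equiv \ell^c(\gamma; y, u) + \ell^c(\beta,\tau; y, u)
\end{aligned} \tag{5}
$$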

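and

$$
u_{ij}^{(h)} = \Pr(u_{ij}=1 \mid y_{ij}, \beta^{(h)},\gamma^{(h)},\tau^{(h)}) = \frac{1_{\{y_{ij}=0\}}}{1 + \{1-p_{ij}(\gamma^{(h)})\}\exp\left[-\lambda_{ij}(\beta^{(h)})/\{1+\tau^{(h)}\lambda_{ij}(\beta^{(h)})\}\right]/p_{ij}(\gamma^{(h)})}. \tag{6}
$$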

Here $i = 1, \ldots, N$ and $j = 1, \ldots, n_i$. Maximizing (5) with respect to $\gamma$ leads to the estimating equation

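$$
\sum_{i=1}^{N} \frac{\partial p_i(\gamma)^T}{\partial\gamma}\left[A_i^{1/2}\{p_i(\gamma)\}\, I\, A_i^{1/2}\{p_i(\gamma)\}\right]^{-1}\{u_i^{(h)} - p_i(\gamma)\} = 0, \tag{7}
$$

where $A_i\{p_i(\gamma)\} = \mathrm{diag}\{p_{i1}(\gamma)(1-p_{i1}(\gamma)), \ldots, p_{in_i}(\gamma)(1-p_{in_i}(\gamma))\}$. Maximizing (5) with respect to $\beta$ leads to the estimating equation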

$$
\sum_{i=1}^{N} \frac{\partial\lambda_i(\beta)^T}{\partial\beta}\left[D_i^{1/2}\{\lambda_i(\beta)\}\, I\, D_i^{1/2}\{\lambda_i(\beta)\}\right]^{-1} U_i^{(h)}\{y_i - \lambda_i(\beta)\} = 0, \tag{8}
$$

where $U_i^{(h)} = \mathrm{diag}\{1-u_{i1}^{(h)}, \ldots, 1-u_{in_i}^{(h)}\}$ and $D_i\{\lambda_i(\beta)\} = \mathrm{diag}\{\lambda_{i1}(\beta)(1+\tau\lambda_{i1}(\beta))^2, \ldots, \lambda_{in_i}(\beta)(1+\tau\lambda_{in_i}(\beta))^2\}$. Equations (7) and (8) have the form of GEE with an independence working correlation matrix. We can incorporate the dependence between observations into the model by replacing the independence working correlation with another structure, such as autoregressive or exchangeable. This leads to the equations

$$
\sum_{i=1}^{N} \frac{\partial p_i(\gamma)^T}{\partial\gamma}\left[A_i^{1/2}\{p_i(\gamma)\}R(\alpha_1)A_i^{1/2}\{p_i(\gamma)\}\right]^{-1}\{u_i^{(h)} - p_i(\gamma)\} = 0 \tag{9}
$$

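and

$$
\sum_{i=1}^{N} \frac{\partial\lambda_i(\beta)^T}{\partial\beta}\left[D_i^{1/2}\{\lambda_i(\beta)\}R(\alpha_2)D_i^{1/2}\{\lambda_i(\beta)\}\right]^{-1} U_i^{(h)}\{y_i - \lambda_i(\beta)\} = 0, \tag{10}
$$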

where $R(\alpha_1)$ and $R(\alpha_2)$ denote the working correlation matrices, $\alpha_1$ and $\alpha_2$ are correlation parameters that can be estimated by the GEE method as shown below, and $U_i^{(h)}$ is the diagonal matrix of weights $1-u_{ij}^{(h)}$. The dispersion parameter $\tau$ must also be estimated.
The estimates of $\beta$ and $\gamma$ can be updated with the following Fisher-scoring iterations:
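$$
\gamma^{(h+1)} = \gamma^{(h)} + \left[\sum_{i=1}^{N}\frac{\partial p_i(\gamma)^T}{\partial\gamma}\left[A_i^{1/2}\{p_i(\gamma)\}R(\alpha_1)A_i^{1/2}\{p_i(\gamma)\}\right]^{-1}\frac{\partial p_i(\gamma)}{\partial\gamma^T}\right]^{-1} S_\gamma\Big|_{(\gamma,\alpha_1) = (\gamma^{(h)},\alpha_1^{(h)})}, \tag{11}
$$

where

$$
S_\gamma = \sum_{i=1}^{N}\frac{\partial p_i(\gamma)^T}{\partial\gamma}\left[A_i^{1/2}\{p_i(\gamma)\}R(\alpha_1)A_i^{1/2}\{p_i(\gamma)\}\right]^{-1}\{u_i^{(h)} - p_i(\gamma)\}.
$$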

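and

$$
\beta^{(h+1)} = \beta^{(h)} + \left[\sum_{i=1}^{N}\frac{\partial\lambda_i(\beta)^T}{\partial\beta}\left[D_i^{1/2}\{\lambda_i(\beta)\}R(\alpha_2)D_i^{1/2}\{\lambda_i(\beta)\}\right]^{-1}U_i^{(h)}\frac{\partial\lambda_i(\beta)}{\partial\beta^T}\right]^{-1} S_\beta\Big|_{(\beta,\tau,\alpha_2) = (\beta^{(h)},\tau^{(h)},\alpha_2^{(h)})}, \tag{12}
$$

where

$$
S_\beta = \sum_{i=1}^{N}\frac{\partial\lambda_i(\beta)^T}{\partial\beta}\left[D_i^{1/2}\{\lambda_i(\beta)\}R(\alpha_2)D_i^{1/2}\{\lambda_i(\beta)\}\right]^{-1}U_i^{(h)}\{y_i - \lambda_i(\beta)\}.
$$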
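A compact NumPy sketch of one Fisher-scoring update, (11) for γ or (12) for β, with an exchangeable working correlation is given below. It is an illustration under stated assumptions, not the authors' code: α and τ are held fixed, the weights $1-u_{ij}^{(h)}$ enter only the count update, and `fisher_scoring_step` and `exchangeable` are hypothetical names. For an intercept-only count model with equal cluster sizes and no zero weights, the update converges to the log of the sample mean, which gives a simple check.

```python
import numpy as np

def exchangeable(n, alpha):
    """n x n exchangeable working correlation matrix R(alpha)."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def fisher_scoring_step(coef, clusters, alpha, kind, tau=0.0):
    """One update: Eq. (11) if kind == "zero", Eq. (12) if kind == "count".

    clusters: list of (X_i, r_i, u_i), where X_i is the n_i x q design matrix,
    r_i is the response (y_i for "count", u_i^(h) for "zero") and u_i holds
    the current posterior zero probabilities u_ij^(h).
    """
    q = len(coef)
    info = np.zeros((q, q))   # sum of dmu' V^-1 W dmu
    score = np.zeros(q)       # S_beta or S_gamma
    for X, r, u in clusters:
        eta = X @ coef
        if kind == "count":
            mu = np.exp(eta)                      # lambda_i(beta), log link
            dmu = mu[:, None] * X                 # d lambda_i / d beta^T
            var = mu * (1.0 + tau * mu) ** 2      # diagonal of D_i
            W = np.diag(1.0 - u)                  # diag(1 - u_ij^(h))
        else:
            mu = 1.0 / (1.0 + np.exp(-eta))       # p_i(gamma), logit link
            dmu = (mu * (1.0 - mu))[:, None] * X  # d p_i / d gamma^T
            var = mu * (1.0 - mu)                 # diagonal of A_i
            W = np.eye(len(r))                    # no weights in Eq. (11)
        sd = np.sqrt(var)
        V_inv = np.linalg.inv(sd[:, None] * exchangeable(len(r), alpha) * sd[None, :])
        info += dmu.T @ V_inv @ W @ dmu
        score += dmu.T @ V_inv @ W @ (r - mu)
    return coef + np.linalg.solve(info, score)

# Toy check: intercept-only count model, 10 clusters of size 4, no zero weights
rng = np.random.default_rng(1)
clusters = [(np.ones((4, 1)), rng.poisson(3.0, size=4).astype(float), np.zeros(4))
            for _ in range(10)]
beta = np.array([0.0])
for _ in range(50):
    beta = fisher_scoring_step(beta, clusters, alpha=0.3, kind="count", tau=0.1)
print(beta[0], np.log(np.mean([r for _, r, _ in clusters])))
```

In a full fit, this step would alternate with the E-step (6) and with re-estimation of τ, α1 and α2.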

We could also apply the GEE method directly to the observed counts by considering the equation

$$
\sum_{i=1}^{N}\begin{pmatrix}\partial\mu_i^T/\partial\beta \\ \partial\mu_i^T/\partial\gamma\end{pmatrix} V_i^{-1}(\alpha,\beta,\gamma,\tau)(Y_i - \mu_i) = 0, \tag{13}
$$

where $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$, $\mu_i = (\mu_{i1}, \ldots, \mu_{in_i})^T$ and $V_i(\alpha,\beta,\gamma,\tau) = \mathrm{Var}(Y_i) = A_i^{1/2}R_i(\alpha)A_i^{1/2}$ with $A_i = \mathrm{diag}\{\mathrm{Var}(Y_{ij}) \mid j = 1, \ldots, n_i\}$. However, this approach may cause serious problems, because $\beta$ and $\gamma$ may be confounded in Equation (13) and may not be identifiable [17]. It is therefore preferable to introduce the latent variables $u_{ij}$ $(i = 1, \ldots, N;\ j = 1, \ldots, n_i)$ and to estimate $\beta$ and $\gamma$ with the two separate equations (11) and (12).

2.2.2. The estimation of the dispersion parameter of the GEE.ZIGP model


We estimate the dispersion parameter $\tau$ using the fact that only the variance function depends on it. To this end, we use $\varepsilon_{ij}^2(\beta) = \{Y_{ij} - \lambda_{ij}(\beta)\}^2$ and $\varepsilon_i^2(\beta) = (\varepsilon_{i1}^2(\beta), \ldots, \varepsilon_{in_i}^2(\beta))^T$. For an observation from the GP component ($u_{ij} = 0$), $E\{\varepsilon_{ij}^2(\beta)\} = \nu_{ij}(\tau)$, where $\nu_{ij}(\tau) = \lambda_{ij}(\beta)\{1+\tau\lambda_{ij}(\beta)\}^2$. Therefore, $\tau$ can be estimated by solving the equation

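$$
\sum_{i=1}^{N}\left(\frac{\partial\nu_i(\tau)}{\partial\tau}\right)^T U_i\{\varepsilon_i^2 - \nu_i(\tau)\} = 0. \tag{14}
$$

Here $U_i = \mathrm{diag}(1-u_{i1}, \ldots, 1-u_{in_i})$, $\nu_i = (\nu_{i1}(\tau), \ldots, \nu_{in_i}(\tau))^T$ and

$$
\frac{\partial\nu_i(\tau)}{\partial\tau} = \begin{pmatrix} 2\lambda_{i1}^2(\beta)\{1+\tau\lambda_{i1}(\beta)\} \\ \vdots \\ 2\lambda_{in_i}^2(\beta)\{1+\tau\lambda_{in_i}(\beta)\} \end{pmatrix}.
$$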

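Given $\gamma$ and $\beta$, the estimate of $\tau$ is a root of the cubic equation

$$
\sum_{i=1}^{N}\sum_{j=1}^{n_i}(1-u_{ij})(a_{ij}\tau^3 + b_{ij}\tau^2 + c_{ij}\tau + d_{ij}) = 0, \tag{15}
$$

where $a_{ij} = -2\lambda_{ij}^6(\beta)$, $b_{ij} = -6\lambda_{ij}^5(\beta)$, $c_{ij} = 2\varepsilon_{ij}^2(\beta)\lambda_{ij}^3(\beta) - 6\lambda_{ij}^4(\beta)$ and $d_{ij} = 2\varepsilon_{ij}^2(\beta)\lambda_{ij}^2(\beta) - 2\lambda_{ij}^3(\beta)$. These coefficients follow from expanding each term of (14), $2\lambda_{ij}^2(1+\tau\lambda_{ij})\{\varepsilon_{ij}^2 - \lambda_{ij}(1+\tau\lambda_{ij})^2\}$, in powers of $\tau$.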
We choose the appropriate root according to the sign of the discriminant (Δ) of the cubic and whether the variance of the data is greater than, smaller than or equal to the mean. When the equation has more than one admissible root, we select the root that yields the lower sum of prediction errors.
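The following sketch solves the aggregated cubic (15) with `numpy.roots`. Its coefficients come from expanding each term of (14) directly in powers of τ. The root-selection rule here is a simplified stand-in and an assumption on our part: it keeps real roots with $1+\tau\lambda_{ij} > 0$ and returns the one with the smallest weighted squared error $\sum(1-u_{ij})\{\varepsilon_{ij}^2 - \nu_{ij}(\tau)\}^2$, rather than the discriminant-based rule and prediction-error criterion described above. `estimate_tau` is a hypothetical name.

```python
import numpy as np

def estimate_tau(y, lam, u):
    """Estimate tau from the cubic (15), given lambda_ij(beta) and u_ij.

    Simplified root selection (an assumption, not the paper's exact rule):
    keep real roots with 1 + tau * lambda_ij > 0 for all ij and return the one
    minimising sum (1 - u_ij) * (eps2_ij - nu_ij(tau))**2.
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    w = 1.0 - np.asarray(u, dtype=float)
    eps2 = (y - lam) ** 2
    coeffs = [np.sum(w * -2 * lam**6),                        # tau^3
              np.sum(w * -6 * lam**5),                        # tau^2
              np.sum(w * (2 * eps2 * lam**3 - 6 * lam**4)),   # tau^1
              np.sum(w * (2 * eps2 * lam**2 - 2 * lam**3))]   # tau^0
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    admissible = [t for t in real if np.all(1.0 + t * lam > 0)]
    if not admissible:
        raise ValueError("no admissible root of the cubic")
    sse = [np.sum(w * (eps2 - lam * (1.0 + t * lam) ** 2) ** 2) for t in admissible]
    return float(admissible[int(np.argmin(sse))])

lam = np.array([1.0, 2.0, 3.5])
y = lam + np.sqrt(lam) * (1 + 0.2 * lam)   # squared residuals equal nu(0.2) exactly
u = np.array([0.0, 0.3, 0.1])
print(estimate_tau(y, lam, u))             # recovers tau = 0.2 up to rounding
```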

2.2.3. The estimation of the correlation parameters of the GEE.ZIGP model


The next step is the estimation of the correlation parameters (i.e. $\alpha_1$ and $\alpha_2$). In general, the estimator of the correlation parameters can be written as the estimating equation

$$
\Psi(\alpha) = \sum_{i=1}^{N}\left(\frac{\partial\xi_i}{\partial\alpha}\right)^T H_i(\omega_i - \xi_i) = [0]_{q\times 1}. \tag{16}
$$

Here $r_{ij}$ is the ijth Pearson residual, $\omega_i = (r_{i1}r_{i2}, r_{i1}r_{i3}, \ldots, r_{i(n_i-1)}r_{in_i})^T_{q\times 1}$, $H_i = \mathrm{diag}\{V(\omega_{ij})\}_{q\times q}$, $\xi_i = E(\omega_i)_{q\times 1}$ and $q = \binom{n_i}{2}$.

Following this estimating equation, we treat the latent variable as the response and set

$$
U_{\gamma ist} = \frac{(u_{is} - p_{is})(u_{it} - p_{it})}{\sqrt{p_{is}(1-p_{is})p_{it}(1-p_{it})}}. \tag{17}
$$

The expected value of (17) is $\rho_{\gamma ist}$ (i.e. the correlation coefficient between $u_{is}$ and $u_{it}$). Let $U_{\gamma i} = (U_{\gamma i12}, U_{\gamma i13}, \ldots, U_{\gamma i(n_i-1)n_i})^T$ and $\rho_{\gamma i}(\alpha_1) = E(U_{\gamma i}) = (\rho_{\gamma i12}, \rho_{\gamma i13}, \ldots, \rho_{\gamma i(n_i-1)n_i})^T$.
We can construct a generalized estimating equation for $\alpha_1$ as

$$
\sum_{i=1}^{N}\left(\frac{\partial\rho_{\gamma i}(\alpha_1)}{\partial\alpha_1}\right)^T W_{\gamma i}^{-1}\{U_{\gamma i} - \rho_{\gamma i}(\alpha_1)\} = 0. \tag{18}
$$

Here $W_{\gamma i}$ is set to the identity matrix and $\rho_{\gamma i}(\alpha_1)$ has an exchangeable structure, so the estimate of $\alpha_1$ is
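$$
\hat\alpha_1 = \frac{1}{\frac{1}{N_{total}}\sum_{i=1}^{N}\sum_{j=1}^{n_i}(u_{ij}-p_{ij})^2/\{p_{ij}(1-p_{ij})\}} \times \sum_{i=1}^{N}\frac{\sum_{s<t}(u_{is}-p_{is})(u_{it}-p_{it})/\sqrt{p_{is}(1-p_{is})p_{it}(1-p_{it})}}{n_i(n_i-1)/2}. \tag{19}
$$

Here $N_{total} = \sum_{i=1}^{N} n_i$.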
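To estimate $\alpha_2$ (the correlation coefficient between $y_{is}$ and $y_{it}$ from the generalized Poisson component), we set

$$
U_{\beta ist} = \frac{(y_{is}-\lambda_{is})(y_{it}-\lambda_{it})}{\sqrt{\lambda_{is}(1+\tau\lambda_{is})^2\lambda_{it}(1+\tau\lambda_{it})^2}}. \tag{20}
$$

The expected value of (20) is $\rho_{\beta ist}$ (i.e. the correlation coefficient between $y_{is}$ and $y_{it}$ from the GP). Let $U_{\beta i} = (U_{\beta i12}, U_{\beta i13}, \ldots, U_{\beta i(n_i-1)n_i})^T$ and $\rho_{\beta i}(\alpha_2) = E(U_{\beta i}) = (\rho_{\beta i12}, \rho_{\beta i13}, \ldots, \rho_{\beta i(n_i-1)n_i})^T$.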
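The estimating equation for $\alpha_2$ is

$$
\sum_{i=1}^{N}\left(\frac{\partial\rho_{\beta i}(\alpha_2)}{\partial\alpha_2}\right)^T W_{\beta i}^{-1}H_{\beta i}\{U_{\beta i} - \rho_{\beta i}(\alpha_2)\} = 0, \tag{21}
$$

where $H_{\beta i} = \mathrm{diag}\{(1-u_{i1})(1-u_{i2}), \ldots, (1-u_{i(n_i-1)})(1-u_{in_i})\}$. Here $W_{\beta i}$ is set to the identity matrix and $\rho_{\beta i}(\alpha_2)$ has an exchangeable structure, so the estimate of $\alpha_2$ is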
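$$
\hat\alpha_2 = \frac{1}{N^*}\,\frac{1}{\frac{1}{N_{total}}\sum_{i=1}^{N}\sum_{j=1}^{n_i}(1-u_{ij})^2(y_{ij}-\lambda_{ij})^2/\{\lambda_{ij}(1+\tau\lambda_{ij})^2\}} \times \sum_{i=1}^{N}\sum_{s<t}\frac{(1-u_{is})(1-u_{it})(y_{is}-\lambda_{is})(y_{it}-\lambda_{it})}{\sqrt{\lambda_{is}(1+\tau\lambda_{is})^2\lambda_{it}(1+\tau\lambda_{it})^2}}, \tag{22}
$$

where $N^* = \sum_{i=1}^{N}\sum_{s<t}(1-u_{is})(1-u_{it})$ and $N_{total} = \sum_{i=1}^{N}\sum_{j=1}^{n_i}(1-u_{ij})^2$.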

An iterative procedure alternates between estimating $\beta$ and $\gamma$ given the current estimates of $\tau$, $\alpha_1$ and $\alpha_2$, and estimating $\alpha_1$, $\alpha_2$ and $\tau$ given the current estimates of $\beta$ and $\gamma$, until convergence.
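The moment estimator (22) for the count-component correlation can be sketched as follows. Each pair $(s, t)$ within a cluster is weighted by $(1-u_{is})(1-u_{it})$, and $N_{total}$ is taken as $\sum_i\sum_j(1-u_{ij})^2$, which is our reading of the definition that accompanies (22); `estimate_alpha2` and the toy inputs are hypothetical.

```python
import numpy as np

def estimate_alpha2(clusters, tau):
    """Exchangeable correlation of the count component, following Eq. (22).

    clusters: list of (y_i, lam_i, u_i) arrays, one tuple per cluster.
    """
    num = n_star = ss = n_total = 0.0
    for y, lam, u in clusters:
        y, lam = np.asarray(y, dtype=float), np.asarray(lam, dtype=float)
        w = 1.0 - np.asarray(u, dtype=float)                   # weights 1 - u_ij
        r = (y - lam) / np.sqrt(lam * (1.0 + tau * lam) ** 2)  # Pearson residuals
        ss += np.sum(w**2 * r**2)
        n_total += np.sum(w**2)                                # N_total (our reading)
        for s in range(len(r)):
            for t in range(s + 1, len(r)):
                num += w[s] * w[t] * r[s] * r[t]
                n_star += w[s] * w[t]                          # N*
    phi = ss / n_total                                         # mean squared residual
    return num / (n_star * phi)

# Illustrative toy clusters: (counts, fitted lambda, posterior zero probabilities)
toy = [(np.array([3, 5, 0]), np.array([2.5, 3.0, 2.0]), np.array([0.0, 0.0, 0.7])),
       (np.array([1, 2]), np.array([1.5, 1.8]), np.array([0.1, 0.0]))]
print(estimate_alpha2(toy, tau=0.1))
```

Setting all weights to one (u ≡ 0) gives the usual pooled exchangeable moment estimator; the same pattern with $(u_{ij}-p_{ij})/\sqrt{p_{ij}(1-p_{ij})}$ as residuals applies to the zero component.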
Variance estimation for the parameters is described in Appendix 1, where we present the variances proposed by Hall and Zhang [17] and Kong et al. [18]. Kong's formula is better than the sandwich variance of Hall and Zhang, $(\hat B_1)^{-1}\hat M(\hat B_1)^{-1}$, which ignores the variability of the latent variable. However, the sandwich variance proposed by Kong et al. [18] may also differ from the exact variance, especially when the counts are highly correlated, because it ignores the variability due to replacing the latent variables by their conditional means.
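Both estimators share the generic sandwich form $B^{-1}MB^{-1}$, which can be written in a few lines. This is only a sketch: the exact forms of $\hat B_1$, $\hat B_2$ and $\hat M$ are given in Appendix 1 and are not reproduced here, so the bread matrix and the per-cluster score contributions are treated as inputs, with $\hat M$ assumed to be the sum of outer products of cluster-level scores (the usual GEE choice). `sandwich_variance` is a hypothetical name and the numbers are arbitrary.

```python
import numpy as np

def sandwich_variance(bread, cluster_scores):
    """Sandwich covariance B^{-1} M B^{-1}, with M = sum_i S_i S_i^T.

    bread: q x q matrix (e.g. B_1, or B_1 + B_2 for the Kong et al. version).
    cluster_scores: N x q array whose ith row is the score contribution S_i.
    """
    scores = np.asarray(cluster_scores, dtype=float)
    meat = scores.T @ scores                  # sum of outer products S_i S_i^T
    bread_inv = np.linalg.inv(np.asarray(bread, dtype=float))
    return bread_inv @ meat @ bread_inv.T

cov = sandwich_variance([[2.0, 0.0], [0.0, 4.0]],
                        [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]])
print(np.sqrt(np.diag(cov)))                  # robust standard errors
```

Using a larger bread matrix (such as $\hat B_1 + \hat B_2$ in place of $\hat B_1$, when the added term is positive definite) changes the resulting standard errors, which is how the two formulas differ in practice.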
We also used the exact non-parametric bootstrap as an alternative for obtaining variance estimates [24]. For this purpose, we drew random samples of clusters with replacement from all observed clusters and refitted the GEE.ZIGP model to each sample. This process was repeated 200 times, and the empirical variances of the 200 sets of parameter estimates were calculated.
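A minimal sketch of this cluster bootstrap follows. The `fit` argument stands in for a GEE.ZIGP fitting routine, which is not shown; in the usage example it is replaced by the overall sample mean, and the toy clusters are simulated Poisson counts, purely for illustration. `cluster_bootstrap_se` is a hypothetical name.

```python
import numpy as np

def cluster_bootstrap_se(clusters, fit, n_boot=200, seed=0):
    """Resample whole clusters with replacement, refit, and return the
    bootstrap standard errors together with all replicate estimates."""
    rng = np.random.default_rng(seed)
    n = len(clusters)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # cluster indices
        draws.append(np.atleast_1d(fit([clusters[k] for k in idx])))
    draws = np.vstack(draws)                              # n_boot x q
    return draws.std(axis=0, ddof=1), draws

# Toy example: 15 clusters of simulated counts; the "fit" is the overall mean
rng = np.random.default_rng(42)
toy = [rng.poisson(2.0, size=m) for m in rng.integers(7, 28, size=15)]
se, draws = cluster_bootstrap_se(toy, fit=lambda cl: np.concatenate(cl).mean())
print(se, draws.shape)
```

Resampling whole clusters, rather than individual children, keeps the within-cluster correlation intact in every bootstrap sample.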

3. Application
We illustrate our approach with the DMFT index. This index is used as a measure of oral health in many studies of dental caries and represents caries experience [25–27].
This application uses the DMFT index of children aged 5–14 years in Kurdistan province, in the west of Iran. The data are part of the National Health Survey conducted by the Ministry of Health, Treatment and Medical Education of Iran in 1991.
Identifying the risk factors for caries experience with an appropriate model is important in dentistry.
In this study, 15 clusters in Kurdistan province were selected and the dental health of the children in these clusters was measured. The DMFT index, the number of times the child brushed per day (brushing), sex (male/female) and age were recorded for 213 children aged 5–14 years from the 15 clusters. Of these 213 children, 53.1% were male and 46.9% were female. The mean age was 9.6 years (SD = 2.56); 11.7% of the children did not brush at all, 37.1% brushed once per day, 11.7% twice, 28.2% three times and 11.3% four times per day. The largest cluster was the 5th, with 27 children, and the smallest was the 6th, with 7 children. The DMFT distribution in each cluster, with its mean and standard deviation, is presented in Table 1.
Figure 1 shows the frequency distribution of DMFT. The proportion of zeros (about 42.7%) is clearly larger than expected.
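A rough comparison, ignoring covariates and clustering, is the zero probability of a single Poisson distribution with the overall mean DMFT of 2.05 reported in Table 1:

```python
import math

mean_dmft = 2.05              # overall mean DMFT, Table 1
observed_zero_share = 0.427   # share of zeros, Figure 1
poisson_zero_share = math.exp(-mean_dmft)   # Pr(Y = 0) under Poisson(2.05)
print(f"Poisson: {poisson_zero_share:.3f}, observed: {observed_zero_share:.3f}")
# prints "Poisson: 0.129, observed: 0.427"
```

This crude check only illustrates the size of the gap; the formal evidence for zero inflation comes from the score tests.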
Also, the score test for zero inflation in multilevel Poisson count data introduced by Moghimbeigi et al. [28] supported the presence of extra zeros in the DMFT data (p < 0.001), and the score test introduced by Moghimbeigi [29] indicated extra zeros relative to the negative binomial mixed model (p < 0.001). These results show that zero-inflated models fit the data better than non-zero-inflated models.

Table 1. The decayed, missing, and filled teeth (DMFT) index in different clusters of Kurdistan province.
Cluster   Number of children   Per cent   Mean of DMFT   Standard deviation of DMFT
1          8                     3.8      3.70           3.77
2         10                     4.7      1.40           1.42
3         10                     4.7      2.00           2.00
4         17                     8.0      3.17           3.92
5         27                    12.7      2.96           2.68
6          7                     3.3      1.71           2.22
7         16                     7.5      0.81           1.33
8         19                     8.9      2.47           2.96
9         15                     7.0      1.00           1.56
10        18                     8.5      2.11           2.32
11         8                     3.8      1.51           3.05
12        17                     8.0      1.65           2.69
13        15                     7.0      0.93           2.05
14        17                     8.0      2.35           2.37
15         9                     4.2      2.12           2.98
Total    213                   100.0      2.05           2.63

Figure 1. Frequency distribution of DMFT in children aged 5–14 years.

In the present study, the children in each cluster are treated as the subjects of that cluster, and their DMFT index is measured. Since the DMFT indices of children within a cluster are correlated, it is appropriate to fit a GEE-based model to identify risk factors and their effects on the response (DMFT index). Because the measurements had no time dependence and subjects were nested in clusters, an exchangeable (compound symmetric) correlation structure was assumed for both the count and zero-inflation components [30]. The common correlations of this structure were estimated using (19) and (22). We fitted the GEE-based zero-inflated GP model, the GEE-based zero-inflated CMP model, the GEE-based ZINB model, the GEE-based ZIP model and the independent ZIGP and ZINB models, and compared their estimates. The parameter estimates of the GEE-based models were obtained from the ES algorithm with the Fisher-scoring iterations (11) and (12), and the dispersion parameter of the GEE-based GP model was estimated from (15). The parameter estimates and dispersion parameter estimates are presented in Table 2. To compare the variance estimators, Table 2 also reports the estimator proposed by Kong et al., which accounts for the variability in estimating the latent variable (i.e. $\hat B^{-1}\hat M\hat B^{-1} = (\hat B_1 + \hat B_2)^{-1}\hat M(\hat B_1 + \hat B_2)^{-1}$), the Hall and Zhang formula $(\hat B_1)^{-1}\hat M(\hat B_1)^{-1}$ and the exact non-parametric bootstrap, together with the p-values of the estimates obtained from the Wald test.
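The Wald p-values in Table 2 are two-sided normal tail probabilities of estimate/S.E. The sketch below reproduces one entry (brushing, GEE.ZIGP count component, Hall and Zhang S.E.) as a check; small differences from the table can arise from rounding of the reported estimates and standard errors. `wald_p_value` is a hypothetical helper name.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided Wald p-value: 2 * (1 - Phi(|estimate / se|))."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Brushing, GEE.ZIGP count component, Hall & Zhang S.E. (Table 2 reports 0.44)
print(round(wald_p_value(-0.057, 0.074), 3))
```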
To fit the GEE.ZIP model, the dispersion parameter of the GEE.ZINB model was set to zero, and the ZIGP model was fitted with the NLMIXED procedure in SAS. The initial values for the parameters of the GEE.ZINB and GEE.ZIP models were obtained from the independent ZINB model fitted with the pscl package, and the GEE.ZINB estimates were used as initial values for the GEE.ZIGP model.
The results of the fitted models are reported in Table 2.
As shown in Table 2, the parameter estimates from the six regression models were similar and had the same signs. The dispersion parameter was 0.259 in the GEE.ZINB model, 0.121 in the GEE.ZICMP model and 0.116 in the GEE.ZIGP model. These values show that the variance exceeded the mean and that the GEE.ZIP model underestimated the variance; the lower p-values obtained from the GEE.ZIP model and the dispersion test (z = 4.2455, p < 0.001) support this conclusion.
The correlations among observations (children) nested in clusters are small in both components (count and zero), which may explain the similarity between the GEE.ZIGP and ZIGP results. The results of the GEE.ZINB, GEE.ZICMP and GEE.ZIGP models are also similar, and the ZINB and ZIGP models differ only slightly. In the simulation study, we considered different correlations and compared the results.
In the count component, among the covariates considered, age is statistically significant when Kong's S.E. is used, indicating that DMFT increases with age.
Comparing the standard errors obtained from the Hall and Zhang formula, the Kong formula and the exact non-parametric bootstrap shows that the Hall and Zhang S.E. underestimate the true variance, because they do not account for the variability of the latent variable. Kong's S.E. are better, but they also underestimate the true variance, because they replace the marginal variability with the conditional variability. In particular, the Hall and Zhang estimates could severely underestimate the true values, leading to optimistic p-values and potentially false-positive results. When fitting the GEE.ZIGP model, we also tried using the ZIGP estimates as initial values; this produced only very slight differences in the parameter estimates and their standard errors.
The exact non-parametric bootstrap, which used 200 resamples to calculate the S.E., yielded no significant p-values. In this method the entire space of resamples is used, and therefore there is no additional bias.
Table 2. The results of fitting the GEE.ZIGP, GEE.ZICMP, GEE.ZINB, GEE.ZIP, ZIGP and ZINB models to determine the effects of age, gender and brushing on DMFT in children aged 5–14 years.
S.E. (H&Zh): Hall and Zhang [17]; S.E. (K): Kong et al. [18]; S.E. (BS): exact non-parametric bootstrap. In each row, the count-component columns are followed (after "|") by the zero-component columns.

GEE.ZIGP: count component | zero component
Estimate S.E. (H&Zh) p-value S.E. (K) p-value S.E. (BS) p-value | Estimate S.E. (H&Zh) p-value S.E. (K) p-value S.E. (BS) p-value
Intercept 0.106 0.379 0.780 0.459 0.818 1.59 0.958 | 2.96 0.561 < 0.001 0.766 < 0.001 2.888 0.309
Gender 0.075 0.145 0.608 0.180 0.679 0.816 0.924 | 0.065 0.222 0.769 0.329 0.842 1.815 0.970
Age 0.1 0.023 < 0.001 0.031 0.001 0.221 0.644 | −0.317 0.053 < 0.001 0.082 < 0.001 0.766 0.682
Brushing −0.057 0.074 0.44 0.087 0.51 0.914 0.949 | −0.342 0.118 0.004 0.172 0.048 0.245 0.159
τ 0.116
Correlation 0.009 0.061

GEE.ZICMP: count component | zero component
Estimate S.E. (H&Zh) p-value S.E. (K) p-value | Estimate S.E. (H&Zh) p-value S.E. (K) p-value
Intercept 0.102 0.378 0.787 0.462 0.825 | 2.95 0.56 < 0.001 0.766 < 0.001
Gender 0.075 0.139 0.590 0.179 0.676 | 0.0652 0.224 0.771 0.330 0.84
Age 0.10 0.024 < 0.001 0.032 0.002 | −0.317 0.05 < 0.001 0.082 < 0.001
Brushing −0.0571 0.0749 0.446 0.088 0.508 | −0.346 0.12 0.004 0.169 0.041
τ 0.121
Correlation 0.0083 0.061

GEE.ZINB: count component | zero component
Estimate S.E. (H&Zh) p-value S.E. (K) p-value | Estimate S.E. (H&Zh) p-value S.E. (K) p-value
Intercept 0.083 0.378 0.825 0.462 0.857 | 2.93 0.559 < 0.001 0.768 < 0.001
Gender 0.078 0.145 0.591 0.181 0.666 | 0.068 0.222 0.760 0.330 0.837
Age 0.102 0.032 0.001 0.023 < 0.001 | −0.314 0.053 < 0.001 0.082 < 0.001
Brushing −0.058 0.075 0.435 0.088 0.508 | −0.345 0.118 0.003 0.174 0.047
τ 0.259
Correlation 0.009 0.062

GEE.ZIP: count component | zero component
Estimate S.E. (H&Zh) p-value S.E. (K) p-value | Estimate S.E. (H&Zh) p-value S.E. (K) p-value
Intercept 0.214 0.324 0.509 0.376 0.570 | 3.006 0.589 < 0.001 0.648 < 0.001
Gender 0.093 0.145 0.522 0.166 0.577 | 0.033 0.243 0.891 0.282 0.906
Age 0.096 0.023 < 0.001 0.029 < 0.001 | −0.3 0.124 0.013 0.067 < 0.001
Brushing −0.066 0.075 0.381 0.086 0.442 | −0.307 0.056 < 0.001 0.143 0.032
τ 0
Correlation 0.0001 0.062

ZIGP: count component | zero component
Estimate S.E. p-value | Estimate S.E. p-value
Intercept 0.090 0.4015 0.8223 | 2.822 0.9268 0.0026
Gender 0.079 0.1532 0.6044 | 0.047 0.3942 0.9044
Age 0.099 0.03317 0.0026 | −0.321 0.09242 0.006
Brushing −0.054 0.06473 0.4035 | −0.270 0.1738 0.121
τ 0.123 0.03787 0.0013

ZINB: count component | zero component
Estimate S.E. p-value | Estimate S.E. p-value
Intercept 0.080 0.422 0.8498 | 2.80 0.9271 0.0028
Gender 0.077 0.1602 0.6317 | 0.042 0.4102 0.9185
Age 0.098 0.0332 0.0035 | −0.320 0.1001 0.0016
Brushing −0.057 0.077 0.4599 | −0.273 0.1740 0.118
τ 0.201 0.042 < 0.001

4. Simulation
We evaluated the performance of the GEE.ZIGP estimators in a simulation study based on data with within-cluster correlation and extra zeros.
As in the real DMFT data, we used 15 clusters, with m subjects in each cluster. We varied the number of subjects per cluster and the within-cluster correlation (absent, medium and high) and compared the results.
We generated the GP counts (i.e. $w_{ij}$) using two covariates, one generated from a uniform distribution and the other from the standard normal distribution, with $(\beta_0, \beta_1, \beta_2) = (2.25, 0.05, -0.05)$.
The data of the count component were generated from the following log-linear model:

$$
\log(\lambda_{ij}) = 2.25 + 0.05X_{i1} - 0.05X_{i2}, \quad i = 1, \ldots, 15,\ j = 1, \ldots, m. \tag{23}
$$

Using the same covariates with $(\gamma_0, \gamma_1, \gamma_2) = (0.8, -0.6, -0.06)$, we generated the zero component with probability $p_{ij}$, which was controlled by the logistic regression

$$
\mathrm{logit}(p_{ij}) = 0.8 - 0.6X_{i1} - 0.06X_{i2}, \quad i = 1, \ldots, 15,\ j = 1, \ldots, m. \tag{24}
$$

To generate m correlated counts per cluster (m being the number of subjects in each cluster) from the log-linear model, we started with a sample of m standard normal variables drawn from a multivariate normal distribution with common correlation α. We then applied the quantile-probability transformation to obtain m correlated generalized Poisson random variables whose marginal means satisfy (23).
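The quantile-probability (Gaussian copula) step can be sketched in Python as follows. Several assumptions should be flagged: the generalized Poisson pmf is written in the Famoye–Singh mean parametrization (mean λ, variance λ(1 + τλ)²), which may differ from the parametrization the paper uses for τ; only τ ≥ 0 is handled; and the exchangeable normals are built from one shared factor, Z_ij = √α S_i + √(1 − α) E_ij, which gives pairwise correlation α. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def gp_logpmf(y, lam, tau):
    """Log-pmf of the generalized Poisson, ASSUMING the Famoye-Singh form:
    mean lam, variance lam * (1 + tau * lam) ** 2, valid here for tau >= 0."""
    a = 1.0 + tau * lam
    return (y * math.log(lam / a) + (y - 1) * math.log1p(tau * y)
            - math.lgamma(y + 1) - lam * (1.0 + tau * y) / a)


def gp_quantile(u, lam, tau, max_y=100_000):
    """Inverse c.d.f.: the smallest y with F(y) >= u."""
    u = min(u, 1.0 - 1e-12)  # keep u strictly below 1 so the search terminates
    cdf = 0.0
    for y in range(max_y + 1):
        cdf += math.exp(gp_logpmf(y, lam, tau))
        if cdf >= u:
            return y
    return max_y


def exchangeable_normals(m, alpha, rng):
    """m standard normals with pairwise correlation alpha (0 <= alpha <= 1),
    built from one shared factor plus independent noise."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(alpha) * shared + math.sqrt(1.0 - alpha) * rng.gauss(0.0, 1.0)
            for _ in range(m)]


def correlated_gp_cluster(lams, tau, alpha, rng):
    """One cluster of correlated GP counts w_ij with marginal means lams."""
    phi = NormalDist()
    z = exchangeable_normals(len(lams), alpha, rng)
    return [gp_quantile(phi.cdf(zj), lam, tau) for zj, lam in zip(z, lams)]


rng = random.Random(2019)
w = correlated_gp_cluster([9.49] * 5, tau=0.3, alpha=0.5, rng=rng)
```

Because Φ(Z_ij) is uniform and the quantile map is monotone, each w_ij has exactly the intended GP marginal whatever α is; α only controls the dependence, and the induced correlation between counts is at most α.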
To generate the m correlated indicators u_ij in each cluster, we again started with a sample of m standard normal variables with common correlation α. We then applied the quantile-probability transformation (the inverse c.d.f. of the Bernoulli distribution) to convert them into m correlated binary random variables whose marginal probabilities satisfy (24).
For simplicity, we used an exchangeable working correlation matrix with the same common correlation in both components of the model.
Given the indicator values, the response for a subject was set to zero if u_ij = 1, and to the corresponding generalized Poisson random variable w_ij if u_ij = 0.
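The indicator step and the final zero-inflated response can be sketched in Python in the same way. This is a hypothetical illustration rather than the authors' code: it reuses the shared-factor construction for exchangeable normals (an assumption) and takes already generated GP counts w as input.

```python
import math
import random
from statistics import NormalDist


def exchangeable_normals(m, alpha, rng):
    """m standard normals with pairwise correlation alpha (0 <= alpha <= 1)."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(alpha) * shared + math.sqrt(1.0 - alpha) * rng.gauss(0.0, 1.0)
            for _ in range(m)]


def correlated_indicators(ps, alpha, rng):
    """Correlated structural-zero indicators u_ij with P(u_ij = 1) = p_ij.

    The Bernoulli inverse c.d.f. maps U = Phi(Z) to 1 when U > 1 - p,
    an event of probability p, so the marginals in (24) are preserved."""
    phi = NormalDist()
    z = exchangeable_normals(len(ps), alpha, rng)
    return [1 if phi.cdf(zj) > 1.0 - p else 0 for zj, p in zip(z, ps)]


def zero_inflate(w, u):
    """Observed response: 0 where u_ij = 1, otherwise the GP count w_ij."""
    return [0 if uj == 1 else wj for wj, uj in zip(w, u)]


print(zero_inflate([3, 0, 5, 2], [1, 0, 0, 1]))  # [0, 0, 5, 0]
```

Note that a zero response can arise from either source: a structural zero (u_ij = 1) or a GP count that happens to equal zero.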
We considered the following simulation scenarios:

(1) We generated 1000 samples each of size 75 (15 clusters of 5 subjects), 225 (15 clusters of 15 subjects) and 450 (15 clusters of 30 subjects).
(2) We generated 2000 samples of size 450 under three within-cluster correlation settings: absent (α = 0), medium (α = 0.5) and high (α = 0.9).
(3) We generated 1000 samples of size 450 under under-dispersion (τ = −1).

For every dataset, we computed the parameter estimates and their standard errors from the ZIGP and GEE.ZIGP models. The means of these estimates over the replicates and their empirical variances were reported as the final parameter estimates and variances. From these, the bias, bias(θ̂) = E_θ(θ̂) − θ, and the mean squared error, MSE(θ̂) = var_θ(θ̂) + bias²(θ̂), were obtained and are presented in Table 3. The results for the zero component are given in Appendix 2 (Table A1).
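The Monte Carlo summaries can be computed for each parameter as in the following Python sketch (the function name is hypothetical). Using the divisor R, the number of replicates, for the variance makes MSE = variance + bias² hold exactly, so the MSE also equals the average squared error of the estimates.

```python
from statistics import fmean, pvariance


def mc_summary(estimates, truth):
    """Monte Carlo mean, bias, variance and MSE of one parameter's estimates.

    bias = mean(theta_hat) - theta, as defined in the text; the variance uses
    divisor R (number of replicates), so MSE = variance + bias**2 exactly."""
    mean = fmean(estimates)
    bias = mean - truth
    variance = pvariance(estimates, mu=mean)
    return {"mean": mean, "bias": bias, "variance": variance,
            "mse": variance + bias ** 2}


summary = mc_summary([1.9, 2.1, 2.0], truth=2.25)
```

As a check against Table 3, the GEE.ZIGP intercept with m = 5 has variance 0.01 and |bias| 0.25, giving 0.01 + 0.0625 ≈ 0.072, the reported MSE. The bias columns in Tables 3 and A1 appear to be reported as θ minus the mean (for example, truth 2.25, mean 2, bias 0.25), the opposite sign to the definition above; the MSE does not depend on the sign.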

4.1. Comparison of estimators from GEE.ZIGP and ZIGP models (cluster sizes 5, 15 and 30; correlation 0.5)
Based on the simulation results, the bias and MSE of the dispersion estimator were smaller in the GEE.ZIGP model than in the ZIGP model. For the regression coefficient estimators in both components, the biases were similar in the two models, but the MSEs were generally smaller in the GEE.ZIGP model.
The Q–Q plots of the estimated coefficients indicate that the GEE.ZIGP estimators are approximately normally distributed (Figure 2).
In addition, as the number of subjects per cluster increased, the MSEs of the estimators decreased in both models and in both components.
Comparison of estimators from GEE.ZIGP and ZIGP models (correlation absent (α = 0), medium (α = 0.5) and high (α = 0.9), with cluster size m = 30)
In this part, we compared the two models under different within-cluster correlations (α = 0, 0.5 and 0.9). The biases of the estimators were almost the same in both models and both components across the correlation levels. As the within-cluster correlation increased, the variances, and consequently the MSEs, of the ZIGP estimators became somewhat larger than those of the GEE.ZIGP estimators. Moreover, in the ZIGP model the MSEs of the estimators increased somewhat with the correlation. These results show that, as the within-cluster correlation increases, it becomes important to use a model that accounts for this correlation.

4.2. Comparison of estimators from GEE.ZIGP and ZIGP models in the presence of
under-dispersion τ = −1
With under-dispersion, a correlation of 0.5 and 30 subjects per cluster, the biases of the GEE.ZIGP estimators in both components were close to those of the ZIGP model, while the variances and MSEs of the GEE.ZIGP estimators were smaller. These results show that, in the presence of correlation and under-dispersion, the GEE.ZIGP model yields better estimators. In this setting, the ZINB model failed to converge in 995 of the 1000 simulated datasets (99.5%), so it is no longer an adequate alternative.

5. Discussion
In this paper, we developed a GEE-based model, named GEE.ZIGP, for analysing correlated count data with excess zeros in the presence of over- or under-dispersion, and presented its inference procedures.
Table 3. Simulation results: bias, variance and mean squared error (MSE) of the estimators in the GEE.ZIGP and ZIGP models (count component).

GEE.ZIGP model, count component
M=5 M = 15 M = 30
Truth mean bias variance MSE mean bias variance MSE mean bias variance MSE
intercept 2.25 2 0.25 0.01 0.072 2.151 0.099 0.009 0.019 2.20 0.05 0.008 0.011
X1 0.05 0.03 0.02 0.0004 0.0008 0.033 0.017 0.0004 0.0007 0.042 0.008 0.0003 0.0003
X2 −0.05 −0.03 −0.02 0.002 0.0024 −0.039 −0.011 0.0019 0.0021 −0.044 −0.006 0.002 0.002
τ 0.3 0.1 0.2 1.2 1.24 0.18 0.12 0.81 0.8244 0.22 0.08 0.72 0.73
ZIGP model, count component
intercept 2.25 1.9 0.35 0.013 0.135 2.155 0.094 0.006 0.015 2.201 0.049 0.0082 0.011
X1 0.05 0.03 0.02 0.005 0.0054 0.032 0.018 0.0007 0.0012 0.04 0.01 0.0007 0.0008
X2 −0.05 −0.031 −0.019 0.009 0.0093 −0.038 −0.012 0.0028 0.003 −0.043 −0.007 0.0024 0.0025
τ 0.3 2.4 −2.1 2.80 7.21 2.32 −2.02 2.54 6.62 1.8 −1.5 2.23 4.48
GEE.ZIGP model, count component
α=0 α = 0.5 α = 0.9
intercept 2.25 2.29 −0.04 0.01 0.0116 2.200 0.05 0.008 0.0105 2.3 0.05 0.01 0.0125
X1 0.05 0.033 0.017 0.0004 0.0007 0.0420 0.008 0.0003 0.0004 0.045 0.005 0.0008 0.0008
X2 −0.05 −0.039 −0.011 0.0019 0.0021 −0.044 −0.006 0.002 0.002 −0.04 −0.01 0.002 0.0021
τ 0.3 0.24 0.06 0.8 0.804 0.22 0.08 0.72 0.726 0.27 0.03 0.701 0.70
ZIGP model, count component
intercept 2.25 2.20 0.05 0.011 0.013 2.21 0.04 0.014 0.016 2.211 0.039 0.0142 0.016
X1 0.05 0.032 0.018 0.0004 0.0007 0.04 0.01 0.001 0.0011 0.0453 0.005 0.0016 0.0016
X2 −0.05 −0.039 −0.011 0.0019 0.002 −0.043 −0.007 0.0024 0.0024 −0.041 −0.009 0.0029 0.003
τ 0.3 0.41 −0.11 1.012 1.024 1.03 −0.73 1.2 1.73 1.14 −0.84 1.64 2.34
GEE.ZIGP model, count component ZIGP model, count component
τ = −1 τ = −1
intercept 2.25 2.251 −0.001 0.00023 0.0002 2.242 0.008 0.002 0.0021
X1 0.05 0.0481 0.0019 0.00013 0.00013 0.0472 0.0028 0.003 0.003
X2 −0.05 −0.0491 −0.0009 0.0005 0.0005 −0.0475 −0.0025 0.0008 0.0008
τ −1 −0.92 −0.08 0.12 0.126 −1.16 0.16 0.324 0.3496

Figure 2. Q–Q plots of the parameter estimates for the count component (Panels A1–A3) and the zero-inflated component (Panels B1–B3) of the GEE.ZIGP model; each plot is based on the estimates from 500 simulated datasets. The x-axis shows the quantiles of the standard normal distribution and the y-axis the quantiles of the 500 estimates.
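The Q–Q plots in Figure 2 pair the sorted estimates with standard normal quantiles. A minimal Python sketch of that pairing is below; the plotting positions (k − 0.5)/n are an assumption, since the caption does not state which convention was used, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(estimates):
    """(theoretical, sample) quantile pairs for a normal Q-Q plot.

    Plotting positions (k - 0.5) / n are an assumed convention."""
    n = len(estimates)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(theoretical, sorted(estimates)))
```

If the points lie close to a straight line, the estimates are approximately normal; the slope of that line estimates their standard deviation.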

We applied the approach of Satten and Datta [31] and incorporated the GEE method into the ES algorithm to estimate the parameters, because the resulting estimates are closer to the maximum likelihood estimates. Rosen et al. [22] applied their methodology to the GLM components of mixture models; we showed that this methodology can also be applied to models outside the GLM family, such as the ZIGP model.
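In the ES algorithm, the E step replaces each latent indicator u_ij by its conditional mean given y_ij (formula (6) of the main text, not reproduced in this section). Assuming it takes the standard zero-inflated form, u_ij^(h) = 0 when y_ij > 0 and u_ij^(h) = p_ij / {p_ij + (1 − p_ij) P(W_ij = 0)} when y_ij = 0, it can be sketched as below. The expression P(W = 0) = exp{−λ/(1 + τλ)} assumes the Famoye–Singh GP parametrization, and the function names are hypothetical.

```python
import math


def gp_prob_zero(lam, tau):
    """P(W = 0) for the GP count, ASSUMING the Famoye-Singh parametrization."""
    return math.exp(-lam / (1.0 + tau * lam))


def e_step(y, lam, p, tau):
    """Conditional mean u_ij^(h) = E(u_ij | y_ij) of the structural-zero indicator."""
    if y > 0:
        return 0.0  # a positive count cannot come from the zero state
    return p / (p + (1.0 - p) * gp_prob_zero(lam, tau))


print(round(e_step(0, lam=1.0, p=0.5, tau=0.0), 4))  # 0.7311
```

The S step then solves the two sets of estimating equations with these conditional means plugged in, and the two steps alternate until convergence.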
The usefulness of the GEE.ZIGP model relative to other models was illustrated by applying it to the DMFT indices of children, with age, brushing and sex as covariates. The dispersion parameter was significant, so the ZIP model is not adequate for these data. Likewise, zero inflation in the dataset was significant, so using a GEE model alone is not advisable.
In the application, the results of the GEE.ZIGP and ZIGP models were similar because the within-cluster correlation was small. In the simulation, however, as the within-cluster correlation increased, the GEE.ZIGP model continued to yield valid estimates with smaller MSEs than the ZIGP model.
These results are similar to those of Almasi et al. [32], who considered the multilevel ZIGP model and obtained accurate and valid results. They are also similar to the results of Mahmoodi et al. [33], who considered semiparametric models for multilevel over-dispersed count data with extra zeros and found that the semiparametric multilevel ZIGP model gives valid results for over-dispersed count data with excess zeros.
When the GEE.ZIGP and GEE.ZINB models were fitted to the DMFT data (which are over-dispersed), the results were similar, but the GEE.ZIGP model converged about 1.5 times faster than the GEE.ZINB model. This is close to the finding of Almasi et al. [32], in which the multilevel ZIGP model converged two times faster than the multilevel ZINB model. It also agrees with Famoye and Singh [19], who compared the ZIGP and ZINB models and found that there are situations in which the iterative estimation procedure for the ZINB model may fail to converge. In addition, three types of variance estimators were discussed in the application part of this paper. We found that, although the variance proposed by Kong et al. was better than that proposed by Hall et al. at capturing the latent sources of variability, it had pitfalls. Therefore, we followed Kisielinska [24] and applied the exact non-parametric bootstrap method, whose results are closer to reality. In the simulation part, however, the results of the two methods were very close, and the variance proposed by Kong et al. is reported.
In the simulation study with under-dispersed correlated count data, the GEE.ZIGP model exhibited a better fit than the independent ZIGP model. We also fitted the GEE.ZINB model to the under-dispersed data, but it failed to converge in 75% of the simulated datasets. This is similar to the study of Almasi et al. [32], in which, for under-dispersed generated data, the multilevel ZIGP model exhibited a better fit than the multilevel ZINB model.
However, the coefficients of multilevel zero-inflated models, such as the multilevel zero-inflated negative binomial and multilevel zero-inflated generalized Poisson models, are interpreted at the individual level; for interpretation at the population level, it is important to use an appropriate marginal model such as a GEE model [9,28,32].
A limitation of our study is that the cubic equation used to estimate the dispersion parameter can have complex roots. Such cases occurred rarely in the simulation study and were discarded; extending the GEE.ZIGP model to handle them is left for future studies. A multilevel extension of the model, applicable to datasets with several levels, is also proposed for future work. Finally, we used an exchangeable correlation matrix for clustered data; extending the model to longitudinal data with other correlation structures is also proposed.
We treated the dispersion parameter as constant in this study; an alternative approach would model the dispersion parameter through a regression in order to study covariate effects on the level of dispersion. Applying this approach and comparing its results with those of the present study is recommended.

Acknowledgments
This study was adapted from a PhD thesis at Hamadan University of Medical Sciences.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The study was funded by Vice-chancellor for Research and Technology, Hamadan University of
Medical Sciences (Grant No.9609286079).

ORCID
Abbas Moghimbeigi http://orcid.org/0000-0002-3803-3663

References
[1] Cheung YB. Zero-inflated models for regression analysis of count data: a study of growth and
development. Stat Med. 2002;21(10):1461–1469.
[2] Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing.
Technometrics. 1992;34(1):1–14.
[3] Ridout M, Hinde J, Demétrio CG. A score test for testing a zero-inflated Poisson regression
model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219–223.
[4] Vieira A, Hinde JP, Demétrio CG. Zero-inflated proportion data models applied to a biological
control assay. J Appl Stat. 2000;27(3):373–389.
[5] Hall DB. Zero-inflated Poisson and binomial regression with random effects: a case study.
Biometrics. 2000;56(4):1030–1039.
[6] Famoye F, Singh KP. On inflated generalized Poisson regression models. Adv Appl Stat.
2003;3(2):145–158.
[7] Czado C, Erhardt V, Min A, et al. Zero-inflated generalized Poisson models with regression
effects on the mean, dispersion and zero-inflation level applied to patent outsourcing rates.
Stat Model. 2007 Jul;7(2):125–153. doi:10.1177/1471082x0700700202
[8] Hedeker D, Gibbons RD. Longitudinal data analysis. Vol. 451. Chicago: John Wiley & Sons;
2006.
[9] Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. Vol. 998. Boston: John
Wiley & Sons; 2012.
[10] Hardin JW. Generalized estimating equations (GEE). New York: Wiley Online Library; 2005.
[11] Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Softw.
2008;27(8):1–25.
[12] Erhardt V, Czado C. Generalized estimating equations for longitudinal generalized Poisson
count data with regression effects on the mean and dispersion level; 2009. Preprint.
[13] Yau KK, Lee AH. Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat Med. 2001;20(19):2907–2920.
[14] Choo-Wosoba H, Datta S. Analyzing clustered count data with a cluster-specific random effect
zero-inflated Conway–Maxwell–Poisson distribution. J Appl Stat. 2018;45(5):799–814.
[15] Choo-Wosoba H, Gaskins J, Levy S, et al. A Bayesian approach for analyzing zero-inflated
clustered count data with dispersion. Stat Med. 2018;37(5):801–812.
[16] Dobbie MJ, Welsh AH. Theory & methods: modelling correlated zero-inflated count data. Aust
N Z J Stat. 2001;43(4):431–444.
[17] Hall DB, Zhang Z. Marginal models for zero inflated clustered data. Stat Model.
2004;4(3):161–180.
[18] Kong M, Xu S, Levy SM, et al. GEE type inference for clustered zero-inflated negative binomial
regression with application to dental caries. Comput Stat Data Anal. 2015;85:54–66.
[19] Famoye F, Singh KP. Zero-inflated generalized Poisson regression model with an application
to domestic violence data. J Data Sci. 2006;4(1):117–130.
[20] Joe H, Zhu R. Generalized Poisson distribution: the property of mixture of Poisson and
comparison with negative binomial distribution. Biom J. 2005;47(2):219–229.
[21] Choo-Wosoba H, Levy SM, Datta S. Marginal regression models for clustered count data
based on zero-inflated Conway–Maxwell–Poisson distribution with applications. Biometrics.
2016;72(2):606–618.

[22] Rosen O, Jiang W, Tanner MA. Mixtures of marginal models. Biometrika. 2000;87(2):391–404.
[23] Frome EL, Kutner MH, Beauchamp JJ. Regression analysis of Poisson-distributed data. J Amer
Statist Assoc. 1973;68(344):935–940.
[24] Kisielinska J. The exact bootstrap method shown on the example of the mean and variance
estimation. Comput Stat. 2013;28(3):1061–1077.
[25] Becker T, Levin L, Shochat T, et al. How much does the DMFT index underestimate the need
for restorative care? J Dent Educ. 2007;71(5):677–681.
[26] Eslamipour F, Borzabadi-Farahani A, Asgari I. The relationship between aging and oral health
inequalities assessed by the DMFT index. Eur J Paediatr Dent. 2010;11(4):193.
[27] Schiffner U, Hoffmann T, Kerschbaum T, et al. Oral health in German children, adolescents,
adults and senior citizens in 2005. Community Dent Health. 2009;26(1):18–22.
[28] Moghimbeigi A, Eshraghian MR, Mohammad K, et al. Multilevel zero-inflated negative
binomial regression modeling for over-dispersed count data with extra zeros. J Appl Stat.
2008;35(10):1193–1202.
[29] Moghimbeigi A. A score test for extra zeros in negative binomial mixed models. J Stat Comput
Simul. 2011;81(5):635–644.
[30] Hardin JW. Generalized estimating equations (GEE). Encyclopedia of Statistics in Behavioral
Science; 2005.
[31] Satten GA, Datta S. The SU algorithm for missing data problems. Comput Stat.
2000;15(2):243–277.
[32] Almasi A, Rahimiforoushani A, Eshraghian MR, et al. Effect of nutritional habits on dental caries in permanent dentition among schoolchildren aged 10-12 years: a zero-inflated generalized Poisson regression model approach. Iran J Public Health. 2016 Mar;45(3):353–361.
[33] Mahmoodi M, Moghimbeigi A, Mohammad K, et al. Semiparametric models for multilevel overdispersed count data with extra zeros. Stat Methods Med Res. 2018;27(4):1187–1201.

Appendices
Appendix 1. Variance estimation of the parameters
We follow the work of Kong et al. [18] to estimate the variances of the parameters. They introduced a sandwich variance estimator that accounts for the variability of the latent variables. This estimator is a good alternative to the variance estimator proposed by Hall, which ignores the variability due to the latent variables. Because the model contains latent variables, the latent indicators $u_{ij}$ in the ES algorithm must be replaced by their conditional means $u_{ij}^{(h)}$, introduced in formula (6) of the main text.

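To estimate the variance, in the first step we set $\sum_{i=1}^{N} S_i(u_i, y_i \mid \theta) = 0$, where $\theta = (\beta, \gamma, \tau)$ and

$$
S_i(u_i, y_i \mid \theta) =
\begin{pmatrix}
\dfrac{\partial p_i^{T}}{\partial \gamma}\,\{V_{\gamma i}\}^{-1}(u_i - p_i) \\[8pt]
\dfrac{\partial \lambda_i^{T}}{\partial \beta}\,\{V_{\beta i}\}^{-1}\,\mathrm{Diag}(1 - u_i)(y_i - \lambda_i)
\end{pmatrix}.
\tag{A1}
$$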

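In the second step, we replace the latent variables by their conditional means:

$$
S_i(y_i \mid \theta) =
\begin{pmatrix}
\dfrac{\partial p_i^{T}}{\partial \gamma}\,\{V_{\gamma i}\}^{-1}(u_i^{(h)} - p_i) \\[8pt]
\dfrac{\partial \lambda_i^{T}}{\partial \beta}\,\{V_{\beta i}\}^{-1}\,\mathrm{Diag}(1 - u_i^{(h)})(y_i - \lambda_i)
\end{pmatrix}.
\tag{A2}
$$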
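To make (A2) concrete, the following Python/NumPy sketch evaluates $S_i(y_i \mid \theta)$ for a single cluster. It rests on assumptions that are not stated in this appendix: a log link for $\lambda_{ij}$ and a logit link for $p_{ij}$, working covariances $V = A^{1/2} R A^{1/2}$ with the same exchangeable $R$ in both parts (matching the common working correlation used in the simulation), and the GP variance function $\lambda(1 + \tau\lambda)^2$ from the Famoye–Singh parametrization. The function names are hypothetical, and the estimating equation for $\tau$ is omitted.

```python
import numpy as np


def exchangeable(m, rho):
    """m x m exchangeable working correlation matrix."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))


def cluster_score(y, X, Z, beta, gamma, u_h, tau, rho):
    """S_i(y_i | theta) of (A2) for one cluster, stacked as (gamma part, beta part).

    ASSUMED: log link for lambda, logit link for p, V = A^{1/2} R A^{1/2} with a
    common exchangeable R, and GP variance lambda * (1 + tau * lambda) ** 2."""
    lam = np.exp(X @ beta)
    p = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
    R = exchangeable(len(y), rho)

    # Zero component: dp^T/dgamma = Z^T Diag(p(1-p)); Bernoulli working variances.
    a_g = np.sqrt(p * (1.0 - p))
    V_g = a_g[:, None] * R * a_g[None, :]
    s_gamma = (Z.T * (p * (1.0 - p))) @ np.linalg.solve(V_g, u_h - p)

    # Count component: dlambda^T/dbeta = X^T Diag(lambda); residuals weighted by 1 - u^(h).
    a_b = np.sqrt(lam) * (1.0 + tau * lam)
    V_b = a_b[:, None] * R * a_b[None, :]
    s_beta = (X.T * lam) @ np.linalg.solve(V_b, (1.0 - u_h) * (y - lam))

    return np.concatenate([s_gamma, s_beta])
```

With ρ = 0 and τ = 0 the working covariances cancel against the derivative terms, and the two blocks reduce to $Z_i^{T}(u_i^{(h)} - p_i)$ and $X_i^{T}\,\mathrm{Diag}(1 - u_i^{(h)})(y_i - \lambda_i)$, the familiar independence score equations.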

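In the third step, we estimate the covariance of $\hat{\gamma}$ and $\hat{\beta}$ by the sandwich form

$$
\widehat{\mathrm{Var}}\begin{pmatrix}\hat{\gamma} \\ \hat{\beta}\end{pmatrix} = \hat{B}^{-1}\hat{M}\hat{B}^{-1},
$$

where

$$
\hat{M} = \sum_{i=1}^{N}
\begin{pmatrix}
\dfrac{\partial \hat{p}_i^{T}}{\partial \hat{\gamma}}\,\{\hat{V}_{\gamma i}\}^{-1}(\hat{u}_i - \hat{p}_i) \\[8pt]
\dfrac{\partial \hat{\lambda}_i^{T}}{\partial \hat{\beta}}\,\{\hat{V}_{\beta i}\}^{-1}\,\mathrm{Diag}(1 - \hat{u}_i)(y_i - \hat{\lambda}_i)
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial \hat{p}_i^{T}}{\partial \hat{\gamma}}\,\{\hat{V}_{\gamma i}\}^{-1}(\hat{u}_i - \hat{p}_i) \\[8pt]
\dfrac{\partial \hat{\lambda}_i^{T}}{\partial \hat{\beta}}\,\{\hat{V}_{\beta i}\}^{-1}\,\mathrm{Diag}(1 - \hat{u}_i)(y_i - \hat{\lambda}_i)
\end{pmatrix}^{T}.
\tag{A3}
$$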

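Here $\hat{B} = \sum_{i=1}^{N} \partial S_i(y_i \mid \theta)/\partial \theta$, and, following Satten and Datta [31], the marginal Hessian matrix is

$$
\frac{\partial S_i(y_i \mid \theta)}{\partial \theta} = \int \left\{ \frac{\partial S_i(u_i, y_i \mid \theta)}{\partial \theta^{T}} + [S_i(u_i, y_i \mid \theta) - S_i(y_i \mid \theta)][S_i(u_i, y_i \mid \theta) - S_i(y_i \mid \theta)]^{T} \right\} dF_{\theta}(u_i \mid y_i).
\tag{A4}
$$

Here $F_{\theta}(u_i \mid y_i)$ is the conditional cumulative distribution function of $u_i$ given $y_i$.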
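The first component is

$$
\int \frac{\partial S_i(u_i, y_i \mid \theta)}{\partial \theta^{T}}\, dF_{\theta}(u_i \mid y_i) =
\begin{pmatrix}
-\dfrac{\partial p_i^{T}}{\partial \gamma}\{V_{\gamma i}\}^{-1}\dfrac{\partial p_i}{\partial \gamma^{T}} & 0 \\[8pt]
0 & -\dfrac{\partial \lambda_i^{T}}{\partial \beta}\{V_{\beta i}\}^{-1}\,\mathrm{Diag}(1 - u_i)\dfrac{\partial \lambda_i}{\partial \beta^{T}}
\end{pmatrix} = B_{1i}.
\tag{A5}
$$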
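The second component is

$$
\int [S_i(u_i, y_i \mid \theta) - S_i(y_i \mid \theta)][S_i(u_i, y_i \mid \theta) - S_i(y_i \mid \theta)]^{T}\, dF_{\theta}(u_i \mid y_i) =
\begin{pmatrix}
\dfrac{\partial p_i^{T}}{\partial \gamma}\{V_{\gamma i}\}^{-1} \\[8pt]
-\dfrac{\partial \lambda_i^{T}}{\partial \beta}\{V_{\beta i}\}^{-1}\,\mathrm{Diag}(y_i - \lambda_i)
\end{pmatrix}
\mathrm{Var}(u_i \mid y_i)
\begin{pmatrix}
\dfrac{\partial p_i^{T}}{\partial \gamma}\{V_{\gamma i}\}^{-1} \\[8pt]
-\dfrac{\partial \lambda_i^{T}}{\partial \beta}\{V_{\beta i}\}^{-1}\,\mathrm{Diag}(y_i - \lambda_i)
\end{pmatrix}^{T} = B_{2i},
\tag{A6}
$$

where

$$
\widehat{\mathrm{Var}}(u_i \mid y_i) = \mathrm{Diag}\{u_{ij}^{(h)}(1 - u_{ij}^{(h)})\}^{1/2}\, R_{\hat{\gamma} i}\, \mathrm{Diag}\{u_{ij}^{(h)}(1 - u_{ij}^{(h)})\}^{1/2}.
$$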
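Thus, the sandwich estimator for the variance of $\hat{\theta}$ is $\hat{B}^{-1}\hat{M}\hat{B}^{-1} = (\hat{B}_1 + \hat{B}_2)^{-1}\hat{M}(\hat{B}_1 + \hat{B}_2)^{-1}$, where $B_1 = \sum_{i=1}^{N} B_{1i}$ and $B_2 = \sum_{i=1}^{N} B_{2i}$. As seen, the variability of the latent variables is captured by $\hat{B}_2$. The estimates are obtained by repeatedly substituting the current estimates into these expressions until convergence.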
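The final assembly can be sketched with NumPy as follows, assuming the per-cluster vectors $s_i$ (the summands of $\hat{M}$ in (A3)) and the matrices $B_{1i}$ and $B_{2i}$ of (A5) and (A6) have already been evaluated at the estimates. The function names are illustrative; `var_u_given_y` implements the expression for $\widehat{\mathrm{Var}}(u_i \mid y_i)$ with the working correlation matrix passed in.

```python
import numpy as np


def var_u_given_y(u_h, R):
    """Var(u_i | y_i) = Diag{u(1-u)}^{1/2} R Diag{u(1-u)}^{1/2}."""
    d = np.sqrt(u_h * (1.0 - u_h))
    return d[:, None] * R * d[None, :]


def sandwich(scores, B1_blocks, B2_blocks):
    """Sandwich covariance B^{-1} M B^{-T} with B = B_1 + B_2.

    scores:    per-cluster vectors s_i (the summands of M in (A3)).
    B1_blocks: per-cluster matrices B_1i from (A5).
    B2_blocks: per-cluster matrices B_2i from (A6)."""
    M = sum(np.outer(s, s) for s in scores)
    B = sum(B1_blocks) + sum(B2_blocks)
    B_inv = np.linalg.inv(B)
    # B is symmetric here, so this equals B^{-1} M B^{-1} as written in the text.
    return B_inv @ M @ B_inv.T
```

Because $B_{1i}$ is negative definite and $B_{2i}$ is positive semi-definite, adding $B_2$ shrinks the magnitude of $B$, which inflates the variance relative to an estimator that ignores the latent variability.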
Appendix 2. The simulation results of zero component

Table A1. Simulation results: bias, variance and mean squared error (MSE) of the estimators in the GEE.ZIGP and ZIGP models (zero component).
GEE.ZIGP model, zero component
M=5 M = 15 M = 30
Truth mean bias variance MSE mean bias variance MSE mean bias variance MSE
intercept 0.8 0.84 −0.04 0.154 0.1556 0.82 −0.02 0.15 0.1504 0.81 −0.01 0.10 0.10
X1 −0.6 −0.64 0.04 0.013 0.0146 −0.63 0.03 0.013 0.0139 −0.608 0.008 0.01 0.010
X2 −0.06 −0.050 −0.01 0.032 0.032 −0.057 −0.003 0.02 0.020 −0.058 −0.002 0.018 0.018
ZIGP model, zero component
intercept 0.8 0.84 −0.04 0.150 0.152 0.83 −0.03 0.154 0.155 0.821 −0.021 0.124 0.124
X1 −0.6 −0.631 0.031 0.0147 0.0156 −0.631 0.031 0.014 0.015 −0.611 0.011 0.01 0.0101
X2 −0.06 −0.051 −0.009 0.028 0.028 −0.053 −0.007 0.027 0.027 −0.055 −0.005 0.02 0.0201
GEE.ZIGP model, zero component
α=0 α = 0.5 α = 0.9
intercept 0.8 0.84 −0.04 0.18 0.18 0.81 −0.01 0.11 0.11 0.81 −0.01 0.09 0.10
X1 −0.6 −0.62 0.02 0.013 0.013 −0.608 0.008 0.01 0.011 −0.61 0.01 0.011 0.011
X2 −0.06 −0.053 −0.007 0.04 0.04 −0.058 −0.002 0.0181 0.018 −0.058 −0.002 0.018 0.018
ZIGP model, zero component
intercept 0.8 0.84 −0.04 0.181 0.182 0.841 −0.041 0.181 0.182 0.799 −0.001 0.195 0.195
X1 −0.6 −0.62 0.02 0.0135 0.0139 −0.611 0.011 0.014 0.014 −0.62 0.02 0.014 0.0145
X2 −0.06 −0.053 −0.007 0.041 0.042 −0.053 −0.007 0.044 0.044 −0.056 −0.004 0.049 0.05
GEE.ZIGP model, zero component ZIGP model, zero component
τ = −1 τ = −1
intercept 0.8 0.78 0.02 0.105 0.1054 0.738 0.062 0.11 0.113
X1 −0.6 −0.537 −0.063 0.0072 0.011 −0.529 0.071 0.009 0.014
X2 −0.06 −0.061 −0.001 0.03 0.03 −0.061 −0.001 0.038 0.038
