Robust Inference in The Multilevel Zero-Inflated Negative Binomial Model

Journal of Applied Statistics
ISSN: 0266-4763 (Print) 1360-0532 (Online) Journal homepage: https://www.tandfonline.com/loi/cjas20
Eghbal Zandkarimi, Abbas Moghimbeigi, Hossein Mahjub & Reza Majdzadeh
To cite this article: Eghbal Zandkarimi, Abbas Moghimbeigi, Hossein Mahjub & Reza Majdzadeh
(2019): Robust inference in the multilevel zero-inflated negative binomial model, Journal of Applied
Statistics, DOI: 10.1080/02664763.2019.1636942
To link to this article: https://doi.org/10.1080/02664763.2019.1636942
Published online: 02 Jul 2019.
Eghbal Zandkarimi (a), Abbas Moghimbeigi (b), Hossein Mahjub (c) and Reza Majdzadeh (d)
(a) Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
(b) Modeling of Noncommunicable Diseases Research Center, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
(c) Research Center for Health Sciences, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
(d) Iranian Institute for Health Sciences Research, Tehran University of Medical Sciences, Tehran, Iran
ABSTRACT
A popular way to model correlated count data with excess zeros and over-dispersion
simultaneously is by means of the multilevel zero-inflated negative binomial (MZINB)
distribution. Due to the complexity of the likelihood of these models, numerical methods such
as the EM algorithm are used to estimate parameters. However, in the presence of outliers or
when mixture components are poorly separated, likelihood-based methods can become unstable.
To overcome this challenge, we extend the robust expectation-solution (RES) approach for
building a robust estimator of the regression parameters in the MZINB model. This approach
achieves robustness by applying robust estimating equations in the S-step instead of the
estimating equations in the M-step of the EM algorithm. The robust estimating equation in the
logistic component only weights the design matrix (X) and reduces the effect of leverage
points, whereas in the negative binomial component the influence of deviations on the
response (Y) and on the design matrix (X) is bounded separately. Simulation studies under
various settings show that the RES algorithm gives consistent estimates with smaller biases
than the EM algorithm under contamination. The RES algorithm is applied to DMFT index data
and to fertility rate data.

ARTICLE HISTORY
Received 3 October 2018
Accepted 20 June 2019

KEYWORDS
Expectation-maximization (EM) algorithm; robust expectation-solution (RES); decayed, missing and filled teeth (DMFT); mean square error (MSE); multilevel zero-inflated negative binomial (MZINB)
1. Introduction
Zero-inflated (ZI) regression models have been proposed to model count data with excess
zeros. These models are a mixture of two components: the first is a logistic regression
(for the structural zeros) and the second is a count regression such as Poisson or
negative binomial regression. Various ZI models have been introduced, including
the zero-inflated Poisson (ZIP) model by Lambert [27], the zero-inflated binomial (ZIB)
model by Hall [17], the zero-inflated negative binomial (ZINB) model by Greene [16], the
zero-inflated generalized Poisson (ZIGP) model by Famoye [13] and the hurdle model
extended by Mullahy [36].

CONTACT Abbas Moghimbeigi moghimb@yahoo.com Modeling of Noncommunicable Diseases Research
Center, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
© 2019 Informa UK Limited, trading as Taylor & Francis Group

In some cases, because the study design is hierarchical or longitudinal, a kind of
correlation is observed in the data. Hall [17], Yau [47], and
Hur [23] considered correlated ZI models with cluster-specific random effects, whereas
Lee [28], Moghimbeigi [34], Zhu [48] and Almasi [2] considered correlated ZI models
with random effects in both components.

In practice, count data (correlated or non-correlated) with excess zeros are often
over-dispersed. If the ZIP model (correlated or non-correlated) is used to fit such data,
the observed frequency of zeros exceeds the expected frequency, and therefore estimates of
the parameters in the Poisson component can be severely biased. In the ZINB model, by
contrast, the over-dispersion is modeled via the negative binomial component [47].
Therefore, to handle over-dispersion in ZI data, it is suggested to use the ZINB model
instead of the ZIP model [12].

Because multilevel ZI models have a complex likelihood function, numerical methods such as
the EM algorithm have been proposed for estimating the parameters. To obtain stable
estimates in multilevel ZI models, the EM algorithm is applied in conjunction with a
penalized likelihood, and restricted maximum likelihood (REML) methods are used to estimate
the variance components [28,34,47]. Likelihood-based methods are consistent and efficient
under the model, but they are sensitive to outliers and to poor separation of the components
in ZI models; in those situations they can give unstable estimates and may be neither
consistent nor efficient [18]. To address this problem, Hall and Shen [18] introduced the
RES approach for a non-correlated ZIP model. The RES approach belongs to the class of
M-estimators [22]. It achieves robustness in parameter estimation by applying robust
estimating equations in place of the estimating equations in the M-step of the EM algorithm.
We therefore extend the RES approach to build a robust estimator of the regression
parameters in the MZINB model.

The paper is organized as follows. After a short review of the three-level ZINB model in
section 2.1, sections 2.2 and 2.3 describe the EM algorithm and the extension of the RES
algorithm for the three-level ZINB model. Section 3 presents simulation studies, section 4
applies the robust estimation method to DMFT data for elementary school students and to
factors affecting the number of births, and section 5 contains the discussion.
2. Methods
2.1. Three-level zero-inflated negative binomial (ZINB) model
Let $Y_{ijk}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n_i$; $k = 1, 2, \ldots, n_{ij}$) represent the $k$th subject of
the $j$th second-level cluster within the $i$th third-level cluster. Let $n = \sum_{i=1}^{m} n_i$
be the total number of second-level clusters and $N = \sum_{i=1}^{m} \sum_{j=1}^{n_i} n_{ij}$ the total number of
subjects. The probability distribution function of the three-level ZINB model can then be
written as
$$
P(Y_{ijk} = y_{ijk}) =
\begin{cases}
\pi_{ijk} + (1 - \pi_{ijk}) \left( \dfrac{r}{\mu_{ijk} + r} \right)^{r}, & y_{ijk} = 0 \\[2ex]
(1 - \pi_{ijk}) \dfrac{\Gamma(y_{ijk} + r)}{\Gamma(y_{ijk} + 1)\,\Gamma(r)} \left( \dfrac{r}{\mu_{ijk} + r} \right)^{r} \left( \dfrac{\mu_{ijk}}{\mu_{ijk} + r} \right)^{y_{ijk}}, & y_{ijk} > 0
\end{cases}
\tag{1}
$$
where $r^{-1}$ is the over-dispersion parameter of the underlying NB distribution, $\pi_{ijk}$ is
the probability of an extra zero response ($0 \le \pi_{ijk} \le 1$) and $\mu_{ijk}$ is the mean of the negative
binomial component. As $r^{-1} \to 0$, the three-level ZINB distribution converges to a
three-level ZIP distribution. The responses of subjects are nested in the second-level
clusters, and the second-level clusters are in turn nested in the third-level clusters. To
account for correlation among observations within the same cluster, random cluster effects
are introduced into the linear predictors [47]:
$$
\operatorname{logit}(\pi_{ijk}) = \xi_{ijk} = X_{ijk}^{T} \alpha + u_i + \tau_{ij} \tag{2}
$$
$$
\log(\mu_{ijk}) = \eta_{ijk} = X_{ijk}^{T} \beta + \nu_i + \varepsilon_{ij} \tag{3}
$$
where $X_{N \times (p+1)}$ and $X_{N \times (q+1)}$ have full rank p and q for the logistic and the NB
components, respectively, and $\alpha_{(p+1) \times 1}$ and $\beta_{(q+1) \times 1}$ are the corresponding vectors of
regression coefficients. As seen in relations (2) and (3), the mixing probability and the
mean of the negative binomial component are linked to the independent variables
through the logit and logarithmic link functions. The vectors $u = (u_1, u_2, \ldots, u_m)$ and
$\nu = (\nu_1, \nu_2, \ldots, \nu_m)$ denote the third-level random effects in the logistic
and negative binomial components, respectively, whereas $\tau = (\tau_{11}, \ldots, \tau_{1n_1}, \ldots, \tau_{m1}, \ldots, \tau_{mn_m})$ and
$\varepsilon = (\varepsilon_{11}, \ldots, \varepsilon_{1n_1}, \ldots, \varepsilon_{m1}, \ldots, \varepsilon_{mn_m})$ are the second-level random effects. For
simplicity of interpretation and of the mathematical calculations, the random effects $u$, $\nu$, $\tau$ and
$\varepsilon$ are assumed to be independent and normally distributed with mean zero and variances
$\sigma_u^2$, $\sigma_\nu^2$, $\sigma_\tau^2$ and $\sigma_\varepsilon^2$, respectively [28,47].
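As a minimal sketch of Equation (1), the function below evaluates the ZINB probability for a single observation, given the extra-zero probability, the NB mean and the size parameter r. The function names and the numerical values are illustrative assumptions, not part of the paper, and the linear predictors (2) and (3) are assumed to have been evaluated already.

```python
import math

def zinb_pmf(y, pi, mu, r):
    """P(Y = y) under Equation (1): extra-zero probability pi, NB mean mu, size r."""
    # Log of the negative binomial probability, computed with lgamma for stability.
    log_nb = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
              + r * math.log(r / (mu + r)) + y * math.log(mu / (mu + r)))
    nb = math.exp(log_nb)
    # A zero can come from either mixture component; a positive count only from the NB part.
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb

def inv_logit(x):
    """Inverse of the logit link used in Equation (2)."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values only (assumed): pi = 0.3, mu = 4 and r = 0.5, i.e. r^{-1} = 2.
p_zero = zinb_pmf(0, 0.3, 4.0, 0.5)
p_three = zinb_pmf(3, 0.3, 4.0, 0.5)
```

With these assumed values the NB zero probability is (0.5/4.5)^0.5 = 1/3, so the ZINB zero probability is 0.3 + 0.7/3.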
2.2. Expectation-maximization (EM) algorithm

Because the likelihood function of the ZINB model is non-linear, numerical procedures
such as the EM algorithm have been proposed for estimating the parameters [32]. In
this algorithm, a penalized log-likelihood is used to ensure convergence of the algorithm
and stability of the estimated parameters [28,34,47]: the log-likelihood of the
fixed effects is penalized by the log-likelihood of the random effects.
$$
l = l_1 + l_2
$$
$$
l_1 = \sum_{y_{ijk} = 0} \log\left( \frac{e^{\xi_{ijk}} + t_{ijk}^{\,r}}{1 + e^{\xi_{ijk}}} \right)
+ \sum_{y_{ijk} > 0} \left\{ \log\left( \frac{\Gamma(y_{ijk} + r)}{\Gamma(y_{ijk} + 1)\,\Gamma(r)} \right) + r \log(t_{ijk}) + y_{ijk} \log(1 - t_{ijk}) - \log(1 + e^{\xi_{ijk}}) \right\}
\tag{4}
$$
$$
l_2 = -\frac{1}{2} \left\{ m \log(2\pi\sigma_\nu^2) + \sigma_\nu^{-2} \nu^{T}\nu + n \log(2\pi\sigma_\varepsilon^2) + \sigma_\varepsilon^{-2} \varepsilon^{T}\varepsilon \right\}
- \frac{1}{2} \left\{ m \log(2\pi\sigma_u^2) + \sigma_u^{-2} u^{T}u + n \log(2\pi\sigma_\tau^2) + \sigma_\tau^{-2} \tau^{T}\tau \right\}
\tag{5}
$$
where
$$
t_{ijk} = \frac{r}{r + \exp(X_{ijk}^{T} \beta + \nu_i + \varepsilon_{ij})}
$$
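The form of the two sums in (4) can be checked against Equation (1). The short derivation below is a sketch that uses only the logit link (2) and the definition of $t_{ijk}$; no new quantities are introduced.

```latex
% With \pi = e^{\xi}/(1+e^{\xi}), 1-\pi = 1/(1+e^{\xi}), and t = r/(r+\mu) so that 1-t = \mu/(r+\mu):
\begin{align*}
P(Y_{ijk} = 0) &= \pi_{ijk} + (1-\pi_{ijk})\, t_{ijk}^{\,r}
  = \frac{e^{\xi_{ijk}} + t_{ijk}^{\,r}}{1 + e^{\xi_{ijk}}}, \\
\log P(Y_{ijk} = y_{ijk}) &= -\log(1 + e^{\xi_{ijk}})
  + \log\frac{\Gamma(y_{ijk}+r)}{\Gamma(y_{ijk}+1)\,\Gamma(r)}
  + r\log t_{ijk} + y_{ijk}\log(1 - t_{ijk}), \qquad y_{ijk} > 0 .
\end{align*}
```

Summing the first line over the zero responses and the second over the positive responses gives $l_1$.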
Estimation proceeds by maximizing $l_1$ with the variance components fixed at their current
values and then updating the variance components using REML estimates obtained by
considering $l_2$ [34]. If $l_c$ is the log-likelihood of the complete data, then
$l_c = l_\xi + l_\eta$, where $l_\xi$ is the log-likelihood of the logistic component and
$l_\eta$ is the log-likelihood of the negative binomial component:
$$
l_\xi = \sum_{ijk} \left\{ z_{ijk}\,\xi_{ijk} - \log(1 + e^{\xi_{ijk}}) \right\}
- \frac{1}{2} \left\{ m \log(2\pi\sigma_u^2) + \sigma_u^{-2} u^{T}u + n \log(2\pi\sigma_\tau^2) + \sigma_\tau^{-2} \tau^{T}\tau \right\}
\tag{6}
$$
$$
l_\eta = \sum_{ijk} (1 - z_{ijk}) \left\{ \log\left( \frac{\Gamma(y_{ijk} + r)}{\Gamma(y_{ijk} + 1)\,\Gamma(r)} \right) + r \log(t_{ijk}) + y_{ijk} \log(1 - t_{ijk}) \right\}
- \frac{1}{2} \left\{ m \log(2\pi\sigma_\nu^2) + \sigma_\nu^{-2} \nu^{T}\nu + n \log(2\pi\sigma_\varepsilon^2) + \sigma_\varepsilon^{-2} \varepsilon^{T}\varepsilon \right\}
\tag{7}
$$
where $Z_{ijk}$ is an unobserved binary variable indicating whether $y_{ijk}$ comes from the latent
zero class ($Z_{ijk} = 1$) or from the NB component ($Z_{ijk} = 0$). This decomposition of the complete-data
log-likelihood gives a convenient method for parameter estimation through separate
maximization of the two log-likelihood components [34]. The EM algorithm starts with the
initial values $\theta^{(0)} = (\alpha^{(0)T}, u^{(0)T}, \tau^{(0)T}, \beta^{(0)T}, \nu^{(0)T}, \varepsilon^{(0)T})$ and repeats the steps below
until convergence [47].
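2.2.1. E-step
In the E-step of the EM algorithm, $Z_{ijk}$ is estimated by its conditional expectation $z_{ijk}^{(p)}$
under the current estimates $\alpha^{(p)}$, $u^{(p)}$, $\tau^{(p)}$, $\beta^{(p)}$, $\nu^{(p)}$ and $\varepsilon^{(p)}$, where $p$ denotes the $p$th
iteration.
$$
\begin{aligned}
z_{ijk}^{(p)} &= P(\text{zero state} \mid y_{ijk}, \alpha^{(p)}, u^{(p)}, \tau^{(p)}, \beta^{(p)}, \nu^{(p)}, \varepsilon^{(p)}) \\
&= \frac{p(y_{ijk} \mid \text{zero state})\, p(\text{zero state})}{p(y_{ijk} \mid \text{zero state})\, p(\text{zero state}) + p(y_{ijk} \mid \text{NB state})\, p(\text{NB state})} \\
&= \begin{cases}
\dfrac{1}{1 + \exp\!\left(-(X_{ijk}^{T} \hat\alpha^{(p)} + \hat u_i^{(p)} + \hat\tau_{ij}^{(p)})\right) t_{ijk}^{\,\hat r^{(p)}}}, & y_{ijk} = 0 \\[2ex]
0, & y_{ijk} > 0
\end{cases}
\end{aligned}
\tag{8}
$$
and
$$
t_{ijk} = \frac{r}{r + \exp(X_{ijk}^{T} \hat\beta^{(p)} + \hat\nu_i^{(p)} + \hat\varepsilon_{ij}^{(p)})}, \qquad r = \hat r^{(p)}
\tag{9}
$$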
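The E-step in Equations (8) and (9) can be sketched for a single observation as follows. The function name and arguments are assumptions made for illustration: `xi` and `eta` stand for the current values of the linear predictors (2) and (3), including the random effects.

```python
import math

def e_step_z(y, xi, eta, r):
    """Posterior probability that y is a structural zero, following Equations (8)-(9)."""
    if y > 0:
        # A positive count can only come from the negative binomial component.
        return 0.0
    t = r / (r + math.exp(eta))                       # Equation (9)
    return 1.0 / (1.0 + math.exp(-xi) * t ** r)       # Equation (8), case y = 0

# Illustrative values (assumed): pi = 0.5 (xi = 0), mu = 4 and r = 0.5.
z0 = e_step_z(0, 0.0, math.log(4.0), 0.5)
```

With these values the NB zero probability is 1/3, so Bayes' rule gives 0.5 / (0.5 + 0.5/3) = 0.75, which is what the function returns up to rounding.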
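2.2.2. M-step
With $Z_{ijk}$ fixed at $z_{ijk}^{(p)}$, $\{\hat\alpha^{(p)}, \hat u^{(p)}, \hat\tau^{(p)}\}$ and $\{\hat\beta^{(p)}, \hat\nu^{(p)}, \hat\varepsilon^{(p)}, \hat r^{(p)}\}$ can be obtained by
maximizing $l_\xi$ and $l_\eta$ separately, in view of the orthogonal partition $l_c = l_\xi + l_\eta$ [47].

M-step for α: Find $\alpha^{(p+1)}$ by maximizing $l_\xi^{c}(\alpha; y, u^{(p)}, \tau^{(p)}, z^{(p)})$. Maximizing over α is
equivalent to solving the estimating equation
$$
\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} \left( z_{ijk}^{(p)} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}} \right) \times X_{ijk} = 0
\tag{10}
$$
M-step for β: Find $\beta^{(p+1)}$ by maximizing $l_\eta^{c}(\beta; y, \nu^{(p)}, \varepsilon^{(p)}, z^{(p)}, r^{(p)})$. Maximizing over
β is equivalent to solving the estimating equation
$$
\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} (1 - z_{ijk}^{(p)}) \times \frac{y_{ijk} - e^{\eta_{ijk}}}{1 + \dfrac{e^{\eta_{ijk}}}{r}} \times X_{ijk} = 0
\tag{11}
$$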
In the M-step of the EM algorithm, the dispersion parameter ($r$) and the
variance components ($\sigma_u^2$, $\sigma_\nu^2$, $\sigma_\tau^2$ and $\sigma_\varepsilon^2$) are assumed to be given. In practice they are
unknown and must be estimated. Yau [47] suggested an updated estimate of the dispersion
parameter, $\hat r^{(p)}$, obtained by maximizing $l_\eta^{c}$. The asymptotic standard error
of the scale parameter in the negative binomial component is estimated following the method
proposed by Lee [28]. Newton–Raphson iteration is used to estimate the parameters. The
square roots of the diagonal elements of the inverse information matrix give the
standard errors of the regression coefficients α and β [28,34]. After the linear
predictors have been estimated, the variance components are estimated by the REML
approach [28,34,47], and this process continues until the parameter estimates converge.
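To illustrate the estimating Equation (11), the sketch below treats the simplest case of an intercept-only NB component with no random effects and a known r. These simplifications, the function names and the toy data are assumptions for illustration. Bisection is swapped in for the Newton–Raphson iteration used in the paper, because the sign of the score depends only on the weighted sum of y minus μ.

```python
import math

def nb_score(beta0, y, z, r):
    """Left-hand side of Equation (11) for an intercept-only model (X_ijk = 1)."""
    mu = math.exp(beta0)
    return sum((1.0 - zi) * (yi - mu) / (1.0 + mu / r) for yi, zi in zip(y, z)) / len(y)

def solve_intercept(y, z, r, lo=-10.0, hi=10.0, tol=1e-12):
    """Root of nb_score in beta0 by bisection (the score changes sign exactly once)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nb_score(mid, y, z, r) > 0.0:
            lo = mid      # fitted mean still too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data (assumed): counts and E-step weights z for the zero state.
y = [0, 0, 3, 5, 0, 8]
z = [0.9, 0.8, 0.0, 0.0, 0.5, 0.0]
beta0_hat = solve_intercept(y, z, r=0.5)
mu_hat = math.exp(beta0_hat)
```

In this special case the root has a closed form: the fitted mean equals the weighted mean of y with weights 1 − z, here 16/3.8, which gives a quick check on the solver.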
2.3. Robust expectation-solution (RES) algorithm

In this section, to obtain robust estimates, we extend the RES algorithm [18] to the
three-level ZINB model. The RES approach is an extension of the M-estimator method [19]. The
M-estimator in the generalized linear model (GLM) is obtained as the solution of the estimating
equation
$$
\sum_{i=1}^{n} \Psi(y_i, x_i, \theta) = \sum_{i=1}^{n} (y_i - \mu_i)\, v^{-1}(\mu_i)\, \frac{\partial \mu_i}{\partial \eta_i}\, x_i = 0
\tag{12}
$$
Here $\Psi(\cdot)$ denotes a score function, that is, the derivative of the log-likelihood with
respect to the parameter vector. For GLMs, various robust estimators have been proposed
for different distributions, which we note in the following.
2.3.1. Logistic component

For robust estimation in logistic regression, estimators of Mallows' class have been
proposed [8,9,15,21,25], and the estimating Equation (12) is replaced by the robust estimating
equation
$$
\sum_{i=1}^{n} \left\{ W(x_i) \times \Psi(y_i, x_i, \theta) \right\} = 0
\tag{13}
$$
Here, $W(\cdot)$ is a weight on the design matrix that reduces the impact of possible leverage
points. In fact, the Mallows class of estimators acts only on the parameters
and the design matrix [31]. Therefore, to obtain $\alpha^{(p+1)}$ in the RES algorithm, the
estimating Equation (10) in the M-step of the EM algorithm is replaced by the robust
estimating equation
$$
\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} W(X_{ijk}) \times \left( z_{ijk}^{(p)} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}} \right) \times X_{ijk} = 0
\tag{14}
$$
where $W(X) = \sqrt{1 - h}$ and $h$ (an $N \times 1$ vector, with $N$ the total number of observations) is
the vector of leverages, i.e. the diagonal elements of the hat matrix $H = X(X^{T}X)^{-1}X^{T}$ (an
$N \times N$ matrix). Here, $W(X)$ is a simple function that down-weights outlying observations and
high leverage points [1,7,8,18,22].
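The weights W(X) = sqrt(1 − h) can be sketched without any matrix library in the special case of an intercept plus a single covariate, where the diagonal of the hat matrix has the closed form h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)². Restricting to one covariate is an assumption made to keep the example short; the general case needs the full hat matrix. The function name and data are hypothetical.

```python
import math

def mallows_weights(x):
    """Leverages h_i and weights sqrt(1 - h_i) for the design matrix [1, x]."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Closed-form diagonal of H = X (X^T X)^{-1} X^T for an intercept and one covariate.
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # max(0, .) guards against tiny negative values caused by rounding.
    w = [math.sqrt(max(0.0, 1.0 - hi)) for hi in h]
    return w, h

# Toy covariate (assumed) with one clear leverage point at x = 5.
weights, leverages = mallows_weights([0.1, 0.2, 0.3, 0.4, 5.0])
```

The leverages sum to the number of columns of the design matrix (here 2), and the isolated point at x = 5 receives by far the smallest weight.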
2.3.2. Negative binomial component

Cantoni et al. [1] considered a general class of M-estimators of Mallows' type for
GLMs, in which the influence of deviations on y and on x is bounded separately. These
M-estimators take the form [1,7,8,21]
$$
\sum_{i} \left( \psi(\cdot) \times v^{-1}(\mu_i) \times \frac{\partial \mu_i}{\partial \eta_i} \times W(x_i) \times x_i \right) - a_i(\theta) = 0
\tag{15}
$$
where $\psi(\cdot)$ is a continuous and bounded function that depends on a few tuning constants
and $a_i(\theta)$ is a correction term that ensures Fisher consistency at the model [1]; it is defined
as
$$
a_i(\theta) = E(\psi(\cdot)) \times v^{-1}(\mu_i) \times \frac{\partial \mu_i}{\partial \eta_i} \times W(x_i) \times x_i
\tag{16}
$$
In fact, $\psi(\cdot)$ is a weight on the response vector that reduces the effect of outliers.
ψ-BY was introduced by Bianco and Yohai [4], ψ-Tukey and ψ-Hampel by Cantoni
et al. [1] and ψ-Huber by Huber [22]. Huber's function has been used [11]
to build robust estimators in the negative binomial model. This function is defined as
$$
\psi_{\text{Huber}}(r_p, C) = \max(-C, \min(C, r_p))
\tag{17}
$$
where $r_p$ is the Pearson residual and $C$ is a tuning constant.
Therefore, to obtain $\beta^{(p+1)}$ in the RES algorithm, we suggest replacing the estimating
Equation (11) with the robust estimating equation
$$
\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} (1 - z_{ijk}^{(p)}) \times W(X_{ijk}) \times \frac{\psi(y_{ijk}) - E_{ijk}(\beta, c)}{1 + \dfrac{E_{ijk}(\beta, c)}{r}} \times X_{ijk} = 0
\tag{18}
$$
For the weight on the response, we choose the ψ-Huber function considered by Hall and
Shen [18]. Therefore
$$
\psi(y_{ijk}) =
\begin{cases}
q_1, & y_{ijk} < q_1 \\
y_{ijk}, & y_{ijk} \in [q_1, q_2] \\
q_2, & y_{ijk} > q_2
\end{cases}
\tag{19}
$$
where $q_1$ and $q_2$ are the quantiles of order $C$ and $1 - C$ of the negative binomial
component, respectively. For the Fisher consistency correction term we use the function
suggested by Hall and Shen [18] and Shen [41]:
$$
E_{ijk}(\beta, c) = E(\psi(Y_{ijk})) = q_1 \times p(y_{ijk} < q_1) + \mu_{ijk} \times p(q_1 - 1 \le y_{ijk} < q_2) + q_2 \times p(y_{ijk} > q_2)
\tag{20}
$$
where the probabilities in relation (20) are computed from the negative binomial
component density. $W(X_{ijk})$ has the same definition as $W(X)$ in Equation (14). In $\psi(y_{ijk})$, the
quantile is chosen to maintain a trade-off between efficiency at the model and robustness
under outliers [1,8,18,41]. In the simulation studies and the real data analysis, we take C = 0.01.
In the S-step of the RES algorithm, we use the Newton–Raphson algorithm. To increase the
convergence speed of the algorithm, the initial values proposed in [18,41] are used. P-values
are calculated by the Wald method and are reported throughout the paper.
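The quantile-based ψ function in Equation (19) and its expectation can be sketched as below. Note the assumptions: the quantiles are taken as the smallest integers whose NB cumulative probability reaches C and 1 − C, and E[ψ(Y)] is evaluated by direct summation over the NB probabilities rather than through the closed form in Equation (20). All function names are hypothetical.

```python
import math

def nb_pmf(y, mu, r):
    """Negative binomial probability with mean mu and size r."""
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log(r / (mu + r)) + y * math.log(mu / (mu + r)))

def nb_quantile(p, mu, r):
    """Smallest integer q with P(Y <= q) >= p (assumed quantile convention)."""
    q, cdf = 0, nb_pmf(0, mu, r)
    while cdf < p:
        q += 1
        cdf += nb_pmf(q, mu, r)
    return q

def psi(y, q1, q2):
    """Equation (19): responses outside [q1, q2] are pulled back to the nearest bound."""
    return min(max(y, q1), q2)

def expected_psi(mu, r, c=0.01):
    """E[psi(Y)] by direct summation; returns the expectation and both quantiles."""
    q1, q2 = nb_quantile(c, mu, r), nb_quantile(1.0 - c, mu, r)
    body = sum(psi(y, q1, q2) * nb_pmf(y, mu, r) for y in range(q2))
    upper_tail = 1.0 - sum(nb_pmf(y, mu, r) for y in range(q2))   # P(Y >= q2)
    return body + q2 * upper_tail, q1, q2

# Illustrative values (assumed): mu = 4, r = 0.5 and C = 0.01 as in the text.
e_psi, q1, q2 = expected_psi(4.0, 0.5, c=0.01)
```

Because the NB zero probability here is 1/3, the lower quantile is 0 and only the upper tail is truncated, so the expectation of ψ(Y) falls below the untruncated mean of 4.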
2.3.3. Asymptotic properties
In this section we show that, if the RES algorithm converges, the robust estimating equations are
unbiased and the resulting estimator is consistent and asymptotically normal (under mild
regularity conditions). Since the parameter estimates are obtained by solving the robust
estimating Equations (14) and (18), for simplicity we combine them into the following equation:
$$
S(\theta, y) = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} E_\theta\!\left( S_{ijk}(y_{ijk}, z_{ijk}, \theta) \mid y_{ijk} \right) = 0
\tag{21}
$$
for all $\theta = (\alpha^{T}, \beta^{T})^{T} \in \Theta$ and all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n_i$, $k = 1, 2, \ldots, n_{ij}$.

Here,
$$
S_{ijk}(y_{ijk}, z_{ijk}, \theta) = \left( S_{ijk,\text{logistic}}(y_{ijk}, z_{ijk}, \theta)^{T},\; S_{ijk,\text{NB}}(y_{ijk}, z_{ijk}, \theta)^{T} \right)^{T}
$$
with
$$
S_{ijk,\text{logistic}}(y_{ijk}, z_{ijk}, \theta) = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} W(X_{ijk}) \times \left( z_{ijk} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}} \right) \times X_{ijk}
$$
and
$$
S_{ijk,\text{NB}}(y_{ijk}, z_{ijk}, \theta) = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} (1 - z_{ijk}) \times W(X_{ijk}) \times \frac{\psi(y_{ijk}) - E_{ijk}(\beta, c)}{1 + \dfrac{E_{ijk}(\beta, c)}{r}} \times X_{ijk}
$$
Rosen et al. [39] showed that, under certain regularity conditions, if the expectation-solution
algorithm converges and there exists a point $\hat\theta \in \Theta$ such that $\lim_{\delta \to \infty} \theta^{(\delta)} = \hat\theta$, where $\theta^{(\delta)}$,
for $\delta = 0, 1, 2, \ldots$, is a sequence generated by the expectation-solution algorithm, then
1) $\hat\theta$ satisfies $S(\hat\theta, y) = 0$;
2) $S(\theta, y) = 0$ is an unbiased estimating equation, satisfying $E_\theta(S(\theta, y)) = 0$ for $\theta \in \Theta$.
Hall and Shen [18] showed that the conditions of this proposition are easily verifiable for the
RES algorithm applied to the ZIP regression model. Therefore, if the RES algorithm
converges, it converges to a solution $\hat\theta$ of an unbiased estimating equation. Moreover, under
regularity conditions [10], the RES estimator is consistent ($\hat\theta \to \theta$) and asymptotically
normal ($\sqrt{n}(\hat\theta - \theta) \xrightarrow{D} N(0, V)$). Here, $V = U^{-1} I U^{-T}$, where
$$
U^{-T} = (U^{-1})^{T}, \qquad U = -E\!\left( \frac{\partial}{\partial \theta^{T}} S(\theta, y) \right) \qquad \text{and} \qquad I = E\!\left( N \times S(\theta, y) \times S(\theta, y)^{T} \right).
$$
The asymptotic variance of $\hat\theta$ can be estimated by $V_n = U_n^{-1} I_n U_n^{-T}$ evaluated at $\hat\theta$, where
$$
U_n = -\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} E_\theta\!\left( \frac{\partial}{\partial \theta^{T}} S_{ijk}(y_{ijk}, z_{ijk}, \theta) \,\Big|\, y_{ijk} \right)
$$
and
$$
I_n = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{n_{ij}} E_\theta\!\left( S_{ijk}(y_{ijk}, z_{ijk}, \theta) \mid y_{ijk} \right) \times E_\theta\!\left( S_{ijk}(y_{ijk}, z_{ijk}, \theta) \mid y_{ijk} \right)^{T}.
$$
The standard errors of the coefficients α and β are the square roots of the diagonal elements
of the variance-covariance matrix $V_n$.
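The sandwich form V_n = U_n^{-1} I_n U_n^{-T} can be illustrated for a two-parameter problem with plain nested lists. The matrices below are made-up numbers chosen only to show the mechanics; in practice U_n and I_n would come from the sums above. All helper names are hypothetical.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Plain matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def sandwich(U, I):
    """V = U^{-1} I U^{-T}."""
    U_inv = inv2(U)
    return matmul(matmul(U_inv, I), transpose(U_inv))

U_n = [[2.0, 0.5], [0.5, 1.0]]   # made-up sensitivity matrix
I_n = [[2.5, 0.4], [0.4, 1.5]]   # made-up variability matrix
V_n = sandwich(U_n, I_n)
std_errors = [math.sqrt(V_n[k][k]) for k in range(2)]
```

When the two matrices coincide and are symmetric, as they would under a correctly specified likelihood, the sandwich collapses to the inverse of U_n, which is a convenient sanity check.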
2.3.4. The tuning quantile (C)

In this section, we evaluate the bias and MSE of the RES algorithm for different values
of the tuning quantile. To do this, we generate data from three-level ZINB models
with various mixing probabilities (π = 0.3, π = 0.5, π = 0.65 and π = 0.8),
a mean parameter close to that of the real data (μ = 4) and different values of the tuning
quantile (C = 0.001, C = 0.005, C = 0.01, C = 0.015 and C = 0.02). The true values of α, β and
$r^{-1}$, with the corresponding bias and MSE, are given in Table 1. In some cases, the bias and
MSE decrease as C increases (e.g. for α0 and β0 in the models with mixing probabilities of 0.3,
0.5, 0.65 and 0.8). In some cases, the bias and MSE increase as C increases (e.g. for β1
in the models with mixing probabilities of 0.3, 0.5 and 0.65, and for $r^{-1}$ in the model with a
mixing probability of 0.3). In other cases, increasing C has little effect on the bias
and MSE (e.g. for β2 and α1 in the models with mixing probabilities of 0.3, 0.5, 0.65 and 0.8).
Therefore, we use C = 0.01 in the simulation studies and data analysis of sections 3
and 4.
Table 1. Bias and MSE for the three-level ZINB model for different values of the tuning quantile C.

               C = 0.001       C = 0.005       C = 0.01        C = 0.015       C = 0.02
Parameter      Bias    MSE     Bias    MSE     Bias    MSE     Bias    MSE     Bias    MSE
MP: π = 0.3
α0 = −0.7      0.524   0.275   0.508   0.258   0.495   0.245   0.486   0.236   0.476   0.227
α1 = −1        0.521   0.272   0.521   0.272   0.519   0.271   0.519   0.269   0.517   0.268
α2 = 0.6      −0.256   0.066  −0.253   0.065  −0.252   0.064  −0.249   0.063  −0.247   0.062
β0 = 1.8       0.454   0.207   0.393   0.154   0.338   0.114   0.294   0.087   0.255   0.065
β1 = −0.4      0.102   0.011   0.127   0.017   0.144   0.021   0.155   0.024   0.167   0.028
β2 = 0.05     −0.026   0.001  −0.024   0.001  −0.025   0.001  −0.026   0.001  −0.025   0.001
r^{-1} = 2     0.789   0.624   0.842   0.709   0.836   0.699   0.818   0.669   0.804   0.647
MP: π = 0.5
α0 = −0.5      0.397   0.158   0.391   0.153   0.391   0.153   0.392   0.154   0.391   0.153
α1 = 0.55     −0.048   0.003  −0.043   0.003  −0.042   0.002  −0.041   0.002  −0.042   0.002
α2 = 0.55     −0.138   0.019  −0.138   0.019  −0.143   0.021  −0.145   0.022  −0.148   0.022
β0 = 1.8       0.333   0.112   0.277   0.077   0.222   0.049   0.173   0.030   0.134   0.018
β1 = −0.4      0.126   0.016   0.153   0.024   0.173   0.030   0.192   0.037   0.205   0.042
β2 = 0.05      0.027   0.001   0.023   0.001   0.018   0.001   0.014   0.001   0.011   0.001
r^{-1} = 2     0.968   0.937   0.977   0.954   0.937   0.880   0.894   0.800   0.865   0.749
MP: π = 0.65
α0 = −0.5      0.488   0.238   0.472   0.223   0.457   0.209   0.444   0.198   0.435   0.189
α1 = 0.9      −0.144   0.021  −0.141   0.020  −0.134   0.019  −0.133   0.018  −0.129   0.017
α2 = 1        −0.291   0.085  −0.290   0.085  −0.286   0.083  −0.284   0.081  −0.282   0.079
β0 = 1.8       0.390   0.153   0.317   0.101   0.246   0.061   0.193   0.037   0.149   0.022
β1 = −0.4      0.149   0.023   0.175   0.031   0.197   0.039   0.206   0.043   0.215   0.047
β2 = 0.05     −0.003   0.001  −0.007   0.001  −0.007   0.001  −0.007   0.001  −0.009   0.001
r^{-1} = 2     0.785   0.616   0.805   0.648   0.787   0.619   0.767   0.588   0.742   0.550
MP: π = 0.8
α0 = 1.3       0.142   0.021   0.127   0.017   0.113   0.013   0.102   0.011   0.090   0.008
α1 = 0.6       0.009   0.001   0.014   0.001   0.019   0.001   0.019   0.001   0.020   0.001
α2 = −0.3     −0.013   0.001  −0.015   0.001  −0.017   0.001  −0.019   0.001  −0.020   0.001
β0 = 1.8       0.282   0.081   0.189   0.037   0.100   0.011   0.032   0.002  −0.025   0.001
β1 = −0.4      0.199   0.041   0.229   0.054   0.250   0.064   0.256   0.067   0.261   0.069
β2 = 0.05     −0.010   0.002  −0.019   0.002  −0.028   0.002  −0.037   0.002  −0.042   0.003
r^{-1} = 2     0.769   0.592   0.759   0.577   0.711   0.506   0.669   0.448   0.633   0.401
Note: MP: Mixing probability; MSE: Mean Square Error.

3. Simulation studies
The purpose of the simulation is to evaluate the performance of the EM algorithm and
the RES algorithm in the presence of outliers and under different mixing probabilities.
In this section, we consider various mixing probabilities with a mean parameter close to
that of the real data (μ = 4), and to ensure over-dispersion we set $r^{-1} = 2$.
To evaluate the consistency of the estimation methods, we conduct simulation studies with two
sample sizes (n = 250 and n = 500). For sample size n = 250, there are 5 clusters at
the third level (np = 5), each with 50 subjects (m × nc = 10 × 5 = 50), and
25 clusters at the second level (np × nc = 5 × 5 = 25), each with 10 subjects
(m = 10). For sample size n = 500, there are 5 clusters at the third level (np = 5), each
with 100 subjects (m × nc = 10 × 10 = 100), and 50 clusters at the second level
(np × nc = 5 × 10 = 50), each with 10 subjects (m = 10). In each table, the results for
sample size 250 are listed first and the results for sample size 500 below them. We use the
Monte Carlo simulation method. In all simulation studies, we generate data from the
three-level ZINB model with the distribution below

$$
y_{ijk} \sim
\begin{cases}
0 & \text{with probability } \pi_{ijk} \\
NB(\mu_{ijk}, r) & \text{with probability } 1 - \pi_{ijk}
\end{cases}
\tag{22}
$$
where r = 0.5 is the dispersion parameter,
$$
\mu_{ijk} = \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \nu_i + \varepsilon_{ij})
\tag{23}
$$
$$
\pi_{ijk} = \operatorname{logit}^{-1}(\alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + u_i + \tau_{ij})
\tag{24}
$$
and $u_i \sim N(0, \tfrac{1}{4})$, $\nu_i \sim N(0, \tfrac{1}{4})$, $\tau_{ij} \sim N(0, \tfrac{1}{4})$, $\varepsilon_{ij} \sim N(0, \tfrac{1}{4})$. To ensure the existence
of leverage points, we generate the covariates from the uniform distribution
($X_{1i} \sim U(0, 1)$ and $X_{2i} \sim U(0, 1)$). To ensure that outliers are really present, we randomly
select 5% of the generated responses y and replace them with y + 15. In this section, the
tuning quantile (C) is 0.01, and a regression structure is used for both the mixing
probability and the mean of the negative binomial component.

Table 2. Simulation results for the three-level ZINB model with π = 0.3.

                              RES                 EM
Sample size   Parameter       Bias      MSE       Bias      MSE
n = 250       α0 = −0.7       0.5224    0.2734    0.5719    0.3275
m = 10        α1 = −1         0.5162    0.2671    0.5089    0.2595
np = 5        α2 = 0.6       −0.2529    0.0645   −0.2890    0.0841
nc = 5        β0 = 1.8        0.3989    0.1594    0.5091    0.2596
              β1 = −0.4       0.1218    0.0151    0.0871    0.0080
              β2 = 0.05       0.0231    0.0009   −0.0153    0.0007
              r^{-1} = 2      0.5661    0.4077   −1.9121    3.6565
n = 500       α0 = −0.7       0.5206    0.2713    0.5916    0.3502
m = 10        α1 = −1         0.4847    0.2353    0.4917    0.2420
np = 5        α2 = 0.6       −0.2281    0.0524   −0.2823    0.0799
nc = 10       β0 = 1.8        0.3886    0.1512    0.5306    0.2818
              β1 = −0.4       0.1403    0.0199    0.0881    0.0080
              β2 = 0.05      −0.007     0.0002   −0.0148    0.0005
              r^{-1} = 2      0.2380    0.0567   −1.9498    3.8023
Note: RES: Robust Expectation-Solution; EM: Expectation–Maximization; MSE: Mean Square Error.
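The data-generating process in Equations (22) to (24), followed by the 5% contamination step, can be sketched with the standard library alone. Several choices below are assumptions rather than details taken from the paper: the covariates are drawn once per third-level cluster (following the subscript i in (23) and (24)), the random effects have standard deviation 0.5, the NB draw uses a gamma-Poisson mixture, and the coefficient values are those of the π = 0.3 setting in Table 2. Function names are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's method; large means are split into chunks of at most 30."""
    total = 0
    while lam > 0.0:
        chunk = min(lam, 30.0)
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total

def rnb(mu, r, rng):
    """NB(mu, r) draw as a Poisson with gamma-distributed mean (shape r, scale mu / r)."""
    return rpois(rng.gammavariate(r, mu / r), rng)

def simulate(alpha=(-0.7, -1.0, 0.6), beta=(1.8, -0.4, 0.05), n_top=5, n_mid=5, m=10,
             r=0.5, sd=0.5, contam=0.05, shift=15, seed=1):
    """Return a list of (i, j, k, x1, x2, y) rows from the three-level ZINB model."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_top):
        u_i, nu_i = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        x1, x2 = rng.random(), rng.random()          # covariates ~ U(0, 1)
        for j in range(n_mid):
            tau_ij, eps_ij = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
            xi = alpha[0] + alpha[1] * x1 + alpha[2] * x2 + u_i + tau_ij
            eta = beta[0] + beta[1] * x1 + beta[2] * x2 + nu_i + eps_ij
            pi, mu = 1.0 / (1.0 + math.exp(-xi)), math.exp(eta)
            for k in range(m):
                y = 0 if rng.random() < pi else rnb(mu, r, rng)
                rows.append([i, j, k, x1, x2, y])
    # Contamination: add `shift` to a random 5% of the responses.
    for idx in rng.sample(range(len(rows)), int(contam * len(rows))):
        rows[idx][5] += shift
    return [tuple(row) for row in rows]

data_250 = simulate(n_mid=5)    # 5 x 5 x 10 = 250 subjects
data_500 = simulate(n_mid=10)   # 5 x 10 x 10 = 500 subjects
```

Fixing the seed makes each simulated data set reproducible, which helps when the EM and RES fits are compared on exactly the same replicates.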
Study 1: In this study, we consider a three-level ZINB model with a mixing probability of
0.3. To ensure the existence of outliers, 5% of the negative binomial responses
and 3.5% of the logistic responses are randomly contaminated. According to Table 2, the bias
and MSE of most of the estimated parameters are smaller under the RES algorithm than under
the EM algorithm, and as the sample size increases, the bias and MSE of almost all the
RES parameter estimates decrease (with the exception of one parameter, β1). Under the EM
algorithm, by contrast, the bias and MSE of most of the estimated parameters do not decrease
with increasing sample size.
Study 2: In this study, we consider a three-level ZINB model with a 50:50 mixing
probability, so that the components of the mixture are well separated. We
generate data as in the previous study and, to ensure the existence of outliers, we
randomly add outliers to 5% of the negative binomial responses and 2.5% of the logistic responses.
The true values of α, β and $r^{-1}$ with their estimates are given in Table 3. The vector β is chosen
as in the previous study, and the values of the vector α are chosen so that the amount of
zero inflation is moderate (π = 0.5). According to Table 3, for the small sample size the bias of
the EM algorithm is smaller than that of the RES algorithm. When the sample size is increased to
500, however, in most situations the bias of the estimated parameters is smaller under the RES
algorithm than under the EM algorithm. Under the EM algorithm, the MSE values do not
decrease with increasing sample size, which indicates a lack of consistency.
Table 3. Simulation results for the three-level ZINB model with π = 0.5.

                              RES                 EM
Sample size   Parameter       Bias      MSE       Bias      MSE
n = 250       α0 = −0.5       0.3329    0.1114    0.2869    0.0829
m = 10        α1 = 0.55      −0.0276    0.0014   −0.0252    0.0014
np = 5        α2 = 0.55      −0.1420    0.0208   −0.1192    0.0149
nc = 5        β0 = 1.8        0.4194    0.1762    0.2738    0.0754
              β1 = −0.4       0.2319    0.0541    0.0494    0.0096
              β2 = 0.05       0.0006    0.0003    0.0089    0.0007
              r^{-1} = 2      0.1931    0.0374   −1.8884    3.5667
n = 500       α0 = −0.5      −0.2235    0.0509    0.2915    0.0853
m = 10        α1 = 0.55       0.0548    0.0036   −0.0469    0.0025
np = 5        α2 = 0.55       0.0052    0.0007   −0.1261    0.0163
nc = 10       β0 = 1.8        0.0592    0.0037    0.2993    0.0898
              β1 = −0.4       0.1286    0.0167    0.1038    0.0111
              β2 = 0.05      −0.0172    0.0005   −0.0202    0.0007
              r^{-1} = 2     −0.0597    0.0036   −1.9396    3.7630
Note: RES, Robust Expectation-Solution; EM, Expectation–Maximization; MSE, Mean Square Error.

Study 3: In this study, we consider a three-level ZINB model with a poor mixing
probability (π = 0.65). To ensure the existence of outliers, 5% of the negative
binomial responses and 2% of the logistic responses are randomly replaced with
'y + 15'. According to Table 4, the bias and MSE of most of the estimated parameters are
smaller under the RES algorithm than under the EM algorithm, and under the RES algorithm the
bias and MSE of almost all the parameters are considerably reduced as the sample size
increases (with the exception of $r^{-1}$). Under the EM algorithm, by contrast, inconsistency
is present in most cases and the MSE values do not decrease with increasing sample size.

Table 4. Simulation results for the three-level ZINB model with π = 0.65.

                              RES                 EM
Sample size   Parameter       Bias      MSE       Bias      MSE
n = 250       α0 = −0.5       0.4273    0.1831    0.4971    0.2475
m = 10        α1 = 0.9       −0.1656    0.0281   −0.1639    0.0275
np = 5        α2 = 1         −0.2487    0.0624   −0.2429    0.0596
nc = 5        β0 = 1.8        0.2733    0.0751    0.4885    0.2391
              β1 = −0.4       0.2009    0.0408    0.0927    0.0093
              β2 = 0.05      −0.0425    0.0023    0.0151    0.0010
              r^{-1} = 2      0.2679    0.0718   −1.8817    3.5412
n = 500       α0 = −0.5      −0.1006    0.0111    0.4834    0.2339
m = 10        α1 = 0.9        0.0150    0.0007   −0.1493    0.0226
np = 5        α2 = 1         −0.0737    0.0059   −0.2532    0.0644
nc = 10       β0 = 1.8        0.0636    0.0043    0.5014    0.2517
              β1 = −0.4       0.1818    0.0333    0.1302    0.0173
              β2 = 0.05      −0.0026    0.0002    0.0013    0.0004
              r^{-1} = 2      0.3385    0.1161   −1.9239    3.7020

Study 4: In this study, we consider a three-level ZINB model with a poor mixing probability
(π = 0.8). To generate outliers, we replace 5% of the negative binomial
responses and 1% of the logistic responses with 'y + 15'. Given that outliers have been added
to both components of the ZINB model, the reported bias is relatively large in some
cases. According to Table 5, in most situations the bias of the estimated parameters is
smaller under the RES method than under the EM algorithm. Under the EM algorithm, the
bias and MSE of the scale parameter are relatively large. Comparing the MSE values for the
two sample sizes shows that, with the exception of two parameters, the MSE under the RES
approach decreases with increasing sample size, as expected of a consistent estimator. Under
the EM algorithm, by contrast, inconsistency is present in most cases and the MSE
values do not decrease with increasing sample size.

Table 5. Simulation results for the three-level ZINB model with π = 0.8.

                              RES                 EM
Sample size   Parameter       Bias      MSE       Bias      MSE
n = 250       α0 = 1.3       −0.0241    0.0015    0.0829    0.0076
m = 10        α1 = 0.6        0.0471    0.0033   −0.0276    0.0019
np = 5        α2 = −0.3      −0.0283    0.0018    0.0181    0.0014
nc = 5        β0 = 1.8        0.0127    0.0011    0.2623    0.0699
              β1 = −0.4       0.2548    0.0661    0.0945    0.0110
              β2 = 0.05      −0.0276    0.0019   −0.0327    0.0030
              r^{-1} = 2      0.2499    0.0626   −1.7835    3.1825
n = 500       α0 = 1.3       −0.2018    0.0413    0.0248    0.0010
m = 10        α1 = 0.6       −0.0173    0.0008    0.0201    0.0009
np = 5        α2 = −0.3      −0.0002    0.0005    0.0318    0.0014
nc = 10       β0 = 1.8       −0.0092    0.0005    0.2730    0.0750
              β1 = −0.4       0.2046    0.0422    0.1633    0.0275
              β2 = 0.05      −0.0447    0.0024   −0.0393    0.0024
              r^{-1} = 2      0.0536    0.0030   −1.8949    3.5919
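The bias and MSE columns reported for the simulation studies follow the usual Monte Carlo definitions, which can be sketched as below. The function name and the replicate values are hypothetical; the paper does not state the number of replicates in this section.

```python
def bias_mse(estimates, truth):
    """Monte Carlo bias and mean square error of a list of replicate estimates."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return bias, mse

# Hypothetical replicate estimates of a parameter whose true value is 1.8.
replicates = [2.1, 2.3, 2.0, 2.2]
bias, mse = bias_mse(replicates, 1.8)
```

Since the MSE equals the squared bias plus the variance of the estimates across replicates, an MSE close to the squared bias indicates that bias rather than variability dominates the error.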
4. Data analysis
We apply this robust approach to the DMFT index of primary school students and to factors
affecting fertility.
4.1. DMFT
The DMFT index represents the oral and dental status of individuals, and this indicator
counts the number of the decayed, missing and filled surface of the permanent and primary
teeth. The data used in this article is a part of the 1991 national Iranian health study [34].
In this study, clusters with unequal size are considered within the provinces, and then the
health status of the residents (included the dental health status) has been investigated. In
this article, due to the importance of oral and dental health in primary school children, the
DMFT index of 1045 students residing in 17 provinces of Iran has been studied. The child’s
DMFT index is the response. yijk (i = 1, 2, . . . , m; j = 1, 2, . . . , ni ; k = 1, 2, . . . , nij ) be
the DMFT index of the kth child in the jth cluster and ith province, and the covariates used
in the study include the residential region (rural 1; urban 0); gender (female 1; male 0); and
brushing (no brushing 0; seldom 1; once a day 2; more than once a day 3). In the sample,
43.7% reside in the rural district, 46.9% are girls and 67% of students use the toothbrush
at least once a day. The frequency, percent, and percentage of cumulative frequency of the
DMFT index are given in Table 6 so that 62.2% of the students have a DMFT index zero.
On the other hand, the results of the score test [33] for extra zeros against the NB-mixed
regression and the score test [35] for assessing ZIP regression against Poisson regression
in the multilevel count data are significant (P < 0.0001). The box-plot shows the presence
of outliers in the observations (Figure 1). In this paper, we use the RES approach with the
quantile constant equal to 0.01. The coefficient estimates and their standard errors (SE)
are given in Table 7.

Table 6. Frequency distribution of DMFT index.

Value   Frequency   Percent (%)   Cumulative percent
0       652         62.24         62.24
1       55          5.26          67.66
2       69          6.6           74.26
3       56          5.36          79.62
4       75          7.18          86.79
5       36          3.44          90.24
6       29          2.78          93.01
7       22          2.1           95.12
8       22          2.1           97.23
9       11          1.05          98.28
10      6           0.57          98.85
11      2           0.19          99.04
12      2           0.19          99.23
13      1           0.09          99.33
14      4           0.38          99.71
15      1           0.09          99.81
16      0           0             99.81
17      1           0.09          99.90
18      0           0             99.90
19      1           0.09          100

Figure 1. Box-plot of the primary school children DMFT index.
In DMFT studies that are modeled using the ZI model, π represents the proportion of
subjects in the sample who are not susceptible to dental caries, and 1 − π the proportion
of the sample who are susceptible. Therefore, the probability of ZI in the sample is equal to

$$
\pi_{ijk} = \frac{\exp(0.488 - 0.782 \times \mathrm{area}_{ijk} - 0.049 \times \mathrm{gender}_{ijk} + 0.18 \times \mathrm{brushing}_{ijk} + u_i + \tau_{ij})}{1 + \exp(0.488 - 0.782 \times \mathrm{area}_{ijk} - 0.049 \times \mathrm{gender}_{ijk} + 0.18 \times \mathrm{brushing}_{ijk} + u_i + \tau_{ij})}
$$
Table 7. Estimation of parameters and standard errors (in parentheses) of the DMFT data.

                                      RES                                      EM
Component        Parameter            Logistic            NB                   Logistic          NB
Fixed effects    Intercept            0.488 (0.354)       1.39 (0.103)***      0.329 (0.405)     1.411 (0.102)*
                 Residential areas    −0.782 (0.177)***   −0.144 (0.022)***    0.080 (0.204)     −0.127 (0.065)*
                 Gender               −0.049 (0.161)      0.005 (0.066)        −0.227 (0.171)    0.112 (0.057)*
                 Brushing             0.18 (0.088)*       −0.004 (0.037)       −0.031 (0.095)    −0.019 (0.031)
                 1/r                  –                   0.007 (0.001)***     –                 0.103 (0.002)***
Random effects   σ² (Province)        1.016 (0.415)*      0.020 (0.006)**      1.396 (0.57)*     0.033 (0.018)**
                 σ² (Cluster)         0.042 (0.214)       0.001 (0.03)         0.989 (0.265)*    0.030 (0.021)
Note: NB, negative binomial; *P < 0.05, **P < 0.01, ***P < 0.001.
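The mean DMFT count of children at risk is equal to

$$
\mu_{ijk} = \exp\bigl(1.39 - 0.144 \times \mathrm{area}_{ijk} + 0.005 \times \mathrm{gender}_{ijk} - 0.004 \times \mathrm{brushing}_{ijk} + \nu_i + (\text{cluster-level random effect})_{ij}\bigr)
$$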
Therefore, after adjusting for the random effects of clusters and provinces, the probability
of ZI for a male student living in an urban district who does not brush his teeth is 0.62.
According to Table 7, the parameter estimates differ between the RES and EM algorithms. In
the logistic part, the covariates region (p-value < 0.0001) and brushing (p-value = 0.033)
are significant under the RES approach, whereas none of the variables is significant under
the EM algorithm. In other words, the DMFT index of students residing in rural districts is
higher than that of students in urban districts, and brushing has a positive effect on
reducing the DMFT index. In the NB part, only region (p-value < 0.0001) is significant under
the RES approach, whereas under the EM algorithm region (p-value = 0.043) and gender
(p-value = 0.043) are significant. According to the RES results for the negative binomial
part, students in rural districts have a lower DMFT index than students in urban districts.
Therefore, although the number of students with a zero DMFT index is higher in urban regions
than in rural regions, among students at risk the DMFT index in rural regions is lower than
in urban regions.
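The probability of 0.62 quoted for an urban boy who does not brush can be reproduced from the RES estimates in Table 7. The following is a minimal sketch, assuming that "adjusting for the random effects" means setting the province and cluster random effects to zero; the function and dictionary names are illustrative and not part of the paper.

```python
import math

# RES fixed-effect estimates from Table 7 (logistic and NB parts).
LOGISTIC = {"intercept": 0.488, "area": -0.782, "gender": -0.049, "brushing": 0.18}
NB = {"intercept": 1.39, "area": -0.144, "gender": 0.005, "brushing": -0.004}


def linear_predictor(coef, area, gender, brushing, random_effects=0.0):
    """Fixed-effect linear predictor plus an optional sum of random effects."""
    return (coef["intercept"] + coef["area"] * area + coef["gender"] * gender
            + coef["brushing"] * brushing + random_effects)


def zi_probability(area, gender, brushing, random_effects=0.0):
    """Inverse logit of the logistic linear predictor (probability of ZI)."""
    eta = linear_predictor(LOGISTIC, area, gender, brushing, random_effects)
    return 1.0 / (1.0 + math.exp(-eta))


def nb_mean(area, gender, brushing, random_effects=0.0):
    """Exponential of the NB linear predictor (mean count among children at risk)."""
    return math.exp(linear_predictor(NB, area, gender, brushing, random_effects))


# Urban (area = 0) boy (gender = 0) who does not brush (brushing = 0);
# random effects are set to zero by assumption.
pi = zi_probability(0, 0, 0)
mu = nb_mean(0, 0, 0)
print(f"ZI probability: {pi:.2f}")  # 0.62, as quoted in the text
print(f"Mean DMFT among children at risk: {mu:.2f}")
```

Changing `area` to 1 in the same call gives the corresponding values for a rural student, which shows how the negative region coefficients act in both parts of the model.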
4.2. The factors affecting the number of births

In this section, we evaluate the factors affecting the number of births using the data of
the health sciences research of Iran in 2015. In this study, clusters were selected from
all the cities of Sistan-va-Baluchistan province (located in eastern Iran) and the health
status of households was studied. The households were asked questions in a variety of areas,
including the social and economic status of the household, mortality, fertility, coverage of
children's and women's health services, and people's knowledge of important diseases. In
this article, due to the importance of the fertility rate, the factors affecting the number
of children of 3043 women living in 10 cities of Sistan-va-Baluchistan province have been
studied. The number of a woman's children is the response: $y_{ijk}$ ($i = 1, 2, \ldots, m$;
$j = 1, 2, \ldots, n_i$; $k = 1, 2, \ldots, n_{ij}$) is the number of children of the kth
woman in the jth cluster and ith city. The covariates used in the study are the use of
contraceptive methods (the pill, IUD, etc.) (no 0; yes 1), monthly household income (low:
less than 5 million Rials; medium: between 5 and 10 million Rials; high: more than
10 million Rials) and marriage age. In the sample, 51.2% use contraceptive methods, 7.9%
have medium to high incomes, and the mean marriage age is 17.05 ± 3.74 years. The frequency,
percent and cumulative percent of the number of women's children are given in Table 8;
24.8% of women did not have children. The score test [33] for extra zeros against the
NB-mixed regression and the score test [35] for assessing ZIP regression against Poisson
regression in multilevel count data are both significant (P < 0.0001). In addition, the
box-plot shows the presence of outliers in the observations (Figure 2). In this section, we
use the RES algorithm with C = 0.01.

Table 8. Frequency distribution of the number of births in women of Sistan-Baluchistan province.

Value   Frequency   Percent (%)   Cumulative percent
0       754         24.8          24.8
1       646         21.2          46
2       512         16.8          62.8
3       371         12.2          75
4       270         8.9           83.9
5       184         6.0           89.9
6       125         4.1           94.1
7       72          2.4           96.4
8       46          1.5           97.9
9       29          1.0           98.9
10      18          0.6           99.5
11      8           0.3           99.7
12      3           0.1           99.8
13      1           0.0           99.9
14      1           0.0           99.9
15      1           0.0           99.9
16      1           0.0           100
17      1           0.0           100

Figure 2. Box-plot of the number of births.

Results from fitting the three-level ZINB model are given in Table 9, which presents
parameter estimates and standard errors for all parameters related to the predictors. In the
RES algorithm,
the covariate high-income level (p-value < 0.0001) is significant in the logistic part,
whereas in the EM algorithm the use of contraceptives (p-value = 0.043) is significant. In
the NB part, under the RES approach, marriage age (p-value < 0.01) and the use of
contraceptives (p-value = 0.019) are significant. Under the EM algorithm, as under the RES
algorithm, marriage age (p-value < 0.0001) and the use of contraceptives (p-value = 0.012)
are significant. Therefore, according to the results of the RES algorithm, the birth rate
decreases with increasing marriage age and with the use of contraceptive methods.

Table 9. Estimation of parameters and standard errors (in parentheses) of the number of births data.

                                                RES                                   EM
Parameters                      n (%)           Logistic           NB                 Logistic          NB
Fixed effects
  Intercept                     –               −3.29 (0.54)***    1.55 (0.10)***     −3.35 (0.58)***   1.58 (0.11)***
  Marriage age                  –               0.05 (0.03)        −0.04 (0.01)***    0.05 (0.03)       −0.04 (0.01)***
  Contraceptive
    No                          1485 (48.8)     –                  –                  –                 –
    Yes                         1558 (51.2)     0.43 (0.23)        −0.08 (0.03)*      0.50 (0.25)*      −0.09 (0.04)*
  Household income
    Low                         2803 (92.1)     –                  –                  –                 –
    Medium                      207 (6.8)       −0.07 (0.46)       −0.005 (0.07)      −0.12 (0.51)      −0.007 (0.07)
    High                        33 (1.1)        −2.26 (0.003)***   −0.17 (0.19)       −1.74 (4.78)      −0.14 (0.22)
  1/r                           –               –                  0.25 (0.003)***    –                 0.26 (0.003)***
Random effects
  σ² (City)                     –               0 (0.05)           0.02 (0.01)*       0 (0.06)          0.02 (0.01)*
  σ² (Cluster)                  –               0 (0.11)           0.05 (0.01)***     0.01 (0.12)       0.06 (0.01)***
Note: NB, negative binomial; *P < 0.05, **P < 0.01, ***P < 0.001.
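The direction and size of the NB-part effects for the birth data can be made concrete by exponentiating the RES coefficients from Table 9. Because the NB mean is modeled on the log scale, exp(β) is the multiplicative change in the mean count for a one-unit increase in the covariate. The snippet below is only a sketch of that arithmetic; the variable and function names are illustrative.

```python
import math

# RES estimates for the NB part of Table 9 (log scale).
nb_coefficients = {"marriage_age_per_year": -0.04, "contraceptive_use": -0.08}


def rate_ratio(coefficient):
    """Multiplicative change in the NB mean for a one-unit increase in the covariate."""
    return math.exp(coefficient)


for name, beta in nb_coefficients.items():
    rr = rate_ratio(beta)
    print(f"{name}: rate ratio = {rr:.3f} ({(1 - rr) * 100:.1f}% lower mean count)")
```

Under this reading, each additional year of marriage age corresponds to a mean number of children that is roughly 4% lower, and contraceptive use to a mean that is roughly 8% lower, holding the other covariates and the random effects fixed.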
5. Discussion
A popular way to model correlated count data with excess zeros and over-dispersion
simultaneously is the MZINB model. Numerical methods such as the EM algorithm are used to
estimate the parameters, but in the presence of outliers and other types of contamination,
likelihood-based approaches are unstable. To overcome this challenge, we have extended the
RES approach [18] to build robust estimates of the regression parameters in the multilevel
ZINB distribution. This approach achieves robustness in the estimation of the parameters by
means of robust estimating equations. The robust estimating equations in the RES algorithm
belong to the Mallows class. In the logistic component, only the design matrix (the
covariates) is weighted, which reduces the impact of possible leverage points. In the
negative binomial component, a general class of Mallows-type M-estimators is considered, in
which the influence of deviations in y and in x is bounded separately [8]. Since reducing
the effect of outliers, rather than omitting them, is the only logical way to deal with
outliers, the RES algorithm uses robust estimating equations that down-weight the outliers.
The simulation results showed that the RES algorithm was robust compared to the EM algorithm
in the presence of outliers and under different modes of separation of the MZINB model
components. In the EM algorithm, by contrast, inconsistency existed in most cases, and the
bias and MSE of some parameters (e.g. the scale parameter) were relatively large. On the
other hand, although the variance/MSE and bias of most of the parameters in the RES
algorithm decrease with increasing sample size, the variance/MSE of most of the RES
parameters is greater than under the EM algorithm, because the efficiency of this method is
lower than that of the EM algorithm [20]. The simulation studies confirm this (Tables 2–5).
To balance efficiency and robustness, C should be selected appropriately [45]. For this
purpose, different values of C in the interval [0.001, 0.02] were investigated, and
according to the simulation studies in Section 2.3.4, the value C = 0.01 gave good results.
In the data analysis section, the DMFT and birth data were used to examine the usefulness of
the RES algorithm in comparison with the EM algorithm. The RES algorithm gave safe and
logical results that were consistent with the DMFT and fertility literature. Many studies
[6,14,30,42,43,46] have confirmed the positive impact of brushing on the decline of the DMFT
index. Several studies [5,37,38,40] confirm that the DMFT index score in rural areas is
higher than in urban areas. Likewise, many studies [24,26,44] have confirmed that a higher
marriage age reduces the number of births, and some papers [3,29] have confirmed that the
use of contraceptive methods reduces the number of births.
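The idea of bounding the influence of deviations in y and in x separately can be illustrated with a generic Mallows-type weighting. The sketch below is not the estimating equations of the RES algorithm: it swaps in Huber's ψ function with a constant c = 1.345 in place of the paper's quantile constant C, uses a simple median/MAD distance for the covariate weights, and assumes an NB variance of μ + r⁻¹μ². All data values, function names and cutoffs are hypothetical.

```python
import math
import statistics


def huber_psi(r, c=1.345):
    """Huber's psi: identity on [-c, c], clipped outside, so a residual's influence is bounded."""
    return max(-c, min(c, r))


def mallows_weights(x, b=2.0):
    """Weight 1 for typical covariate values, b / distance for values far from the median."""
    med = statistics.median(x)
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    if mad == 0:
        return [1.0] * len(x)
    return [min(1.0, b / (abs(v - med) / mad)) if v != med else 1.0 for v in x]


def pearson_residual(y, mu, inv_r):
    """Pearson residual under an NB variance of mu + inv_r * mu**2 (assumed parameterization)."""
    return (y - mu) / math.sqrt(mu + inv_r * mu ** 2)


# Hypothetical data: the last observation is an outlier in both y and x.
y = [0, 2, 3, 4, 40]
x = [16, 17, 18, 19, 45]
mu, inv_r = 4.0, 0.25

weights = mallows_weights(x)
for yi, xi, wi in zip(y, x, weights):
    r = pearson_residual(yi, mu, inv_r)
    print(f"y={yi:2d} x={xi:2d} residual={r:6.2f} psi={huber_psi(r):6.2f} weight={wi:.2f}")
```

In this toy example the outlying observation keeps a bounded ψ value and receives a small covariate weight, so it is down-weighted instead of being removed, which is the behaviour described above for the Mallows class.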
Finally, the RES approach is easy to extend to other multilevel models with excess zeros,
such as ZIGP, ZIP and ZIB. We suggest this algorithm instead of the EM algorithm for
correlated data with excess zeros in the presence of outliers.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This study has been adapted from a PhD thesis at Hamadan University of Medical Sciences. The
study was funded by Vice-chancellor for Research and Technology, Hamadan University of Medical
Sciences (No. 970204575).
ORCID
Eghbal Zandkarimi http://orcid.org/0000-0001-7583-6628
Abbas Moghimbeigi http://orcid.org/0000-0002-3803-3663
Hossein Mahjub http://orcid.org/0000-0002-9375-3807
Reza Majdzadeh http://orcid.org/0000-0003-3989-4470
References
[1] W.H. Aeberhard, E. Cantoni, and S. Heritier, Robust inference in the negative binomial regression
model with an application to falls data. Biometrics 70 (2014), pp. 920–931.
[2] A. Almasi, M.R. Eshraghian, A. Moghimbeigi, A. Rahimi, K. Mohammad, and S. Fallahigilan,
Multilevel zero-inflated generalized Poisson regression modeling for dispersed correlated count
data. Stat. Methodol. 30 (2016), pp. 1–14.
[3] A. Beatty, Recent Fertility Trends in Sub-Saharan Africa: Workshop Summary, National
Academies Press, Washington, DC, 2016.
[4] A.M. Bianco and V.J. Yohai, Robust estimation in the logistic regression model, in Robust
Statistics, Data Analysis, and Computer Intensive Methods, Springer, New York, NY, 1996,
pp. 17–34.
[5] L. Bilder, E. Stepco, D. Uncuta, D. Aizenbud, E. Machtei, A. Bilder, and H.D. Sgan-Cohen, The
pathfinder study among schoolchildren in the Republic of Moldova: Dental caries experience. Int.
Dent. J. 68 (2018), pp. 344–347.
[6] T. Cakar, L. Harrison-Barry, M. Pukallus, S. Kazoullis, and W. Seow, Caries experience of
children in primary schools with long-term tooth brushing programs: A pilot Australian study.
Int. J. Dent. Hyg. 16 (2018), pp. 233–240.
[7] E. Cantoni, A robust approach to longitudinal data analysis. Canad. J. Statist. 32 (2004),
pp. 169–180.
[8] E. Cantoni and E. Ronchetti, Robust inference for generalized linear models. J. Am. Stat. Assoc.
96 (2001), pp. 1022–1030.
[9] R.J. Carroll and S. Pederson, On robustness in the logistic regression model. J. R. Stat. Soc. Ser.
B. Stat. Methodol. 55 (1993), pp. 693–706.
[10] R. Carroll, D. Ruppert, and L. Stefanski, Nonlinear Measurement Error Models, Monographs on
Statistics and Applied Probability. Vol. 63, Chapman and Hall, New York, 1995.
[11] R. Chambers, E. Dreassi, and N. Salvati, Disease mapping via negative binomial M-quantile
regression, (2013).
[12] C.D. Desjardins, Modeling zero-inflated and overdispersed count data: An empirical study of
school suspensions. J. Exp. Educ. 84 (2016), pp. 449–472. doi:10.1080/00220973.2015.1054334.
[13] F. Famoye and K.P. Singh, Zero-inflated generalized Poisson regression model with an application
to domestic violence data. J. Data. Sci. 4 (2006), pp. 117–130.
[14] F.A. Farooqi, A. Khabeer, I.A. Moheet, S.Q. Khan, and I. Farooq, Prevalence of dental caries in
primary and permanent teeth and its relation with tooth brushing habits among schoolchildren
in eastern Saudi Arabia. Saudi Med. J. 36 (2015), pp. 737–742.
[15] J. Feng, H. Xu, S. Mannor, and S. Yan, Robust logistic regression and classification, in Advances
in neural information processing systems, (2014), pp. 253–261.
[16] W.H. Greene, Accounting for excess zeros and sample selection in Poisson and negative
binomial regression models, (1994).
[17] D.B. Hall, Zero-inflated Poisson and binomial regression with random effects: A case study.
Biometrics 56 (2000), pp. 1030–1039.
[18] D.B. Hall and J. Shen, Robust estimation for zero-inflated Poisson regression. Scand. J. Stat.
37 (2010), pp. 237–252.
[19] F. Hampel, E. Ronchetti, P. Rousseeuw, and W. Stahel, Robust Statistics, John Wiley & Sons,
New York, 1986.
[20] S. Heritier, E. Cantoni, S. Copt, and M.-P. Victoria-Feser, Robust Methods in Biostatistics,
Vol. 825, John Wiley & Sons, Chichester, 2009.
[21] S. Hosseinian, Robust Inference for Generalized Linear Models: Binary and Poisson Regression,
Verlag nicht ermittelbar (Lausanne), 2009.
[22] P.J. Huber, Robust estimation of a location parameter. Ann. Math. Stat. 35 (1964),
pp. 73–101.
[23] K. Hur, D. Hedeker, W. Henderson, S. Khuri, and J. Daley, Modeling clustered count data with
excess zeros in health care outcomes research. Health Serv. Outcomes Res. Methodol. 3 (2002),
pp. 5–20.
[24] A. Kabir, G. Jahan, and R. Jahan, Female age at marriage as a determinant of fertility. Sciences.
1 (2001), pp. 372–376.
[25] N. Kordzakhia, G. Mishra, and L. Reiersølmoen, Robust estimation in the logistic regression
model. J. Stat. Plan. Inference. 98 (2001), pp. 211–223.
[26] A. Kumar Acharya, The influence of female age at marriage on fertility and child loss in India.
Trayectorias 12 (2010), pp. 61–80.
[27] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing.
Technometrics. 34 (1992), pp. 1–14.
[28] A.H. Lee, K. Wang, J.A. Scott, K.K. Yau, and G.J. McLachlan, Multi-level zero-inflated
Poisson regression modelling of correlated count data with excess zeros. Stat. Methods Med. Res.
15 (2006), pp. 47–61.
[29] H. Leridon, Demographic effects of the introduction of steroid contraception in developed
countries. Hum. Reprod. Update 12 (2006), pp. 603–616.
[30] C.-J. Liu, W. Zhou, and X.-S. Feng, Dental caries status of students from migrant primary schools
in Shanghai Pudong New Area. BMC. Oral. Health. 16 (2016), p. 28.
[31] C.L. Mallows, On Some Topics in Robustness, Unpublished Memorandum, Bell Telephone
Laboratories, Murray Hill, NJ, 1975.
[32] G. McLachlan, On the EM algorithm for overdispersed count data. Stat. Methods Med. Res.
6 (1997), pp. 76–98.
[33] A. Moghimbeigi, A score test for extra zeros in negative binomial mixed models. J. Stat. Comput.
Simul. 81 (2011), pp. 635–644.
[34] A. Moghimbeigi, M.R. Eshraghian, K. Mohammad, and B. McArdle, Multilevel zero-inflated
negative binomial regression modeling for over-dispersed count data with extra zeros. J. Appl.
Stat. 35 (2008), pp. 1193–1202.
[35] A. Moghimbeigi, M.R. Eshraghian, K. Mohammad, and B. McArdle, A score test for zero-
inflation in multilevel count data. Comput. Stat. Data. Anal. 53 (2009), pp. 1239–1248.
[36] J. Mullahy, Specification and testing of some modified count data models. J. Econom. 33 (1986),
pp. 341–365.
[37] S. Ngedup, M.A. Lee, D. Phurpa, and N. Wangmo, Maternal oral health: An examination survey
conducted in three referral hospitals in Bhutan. Bhutan Health J. 4 (2018), pp. 23–32.
[38] M. Păcurar, E. Bud, H. Alexandra, M. Chibelean, and M.C. Figueiredo, Clinical-statistical
analysis of correlations between caries risk indicators and the prevalence of maxillary dental
anomalies in a group of children from Tirgu Mures.
[39] O. Rosen, W. Jiang, and M.A. Tanner, Mixtures of marginal models. Biometrika 87 (2000),
pp. 391–404.
[40] S. Sanadhya, P. Aapaliya, S. Jain, N. Sharma, G. Choudhary, and N. Dobaria, Assessment and
comparison of clinical dental status and its impact on oral health-related quality of life among
rural and urban adults of Udaipur, India: A cross-sectional study. J. Basic. Clin. Pharm. 6 (2015),
p. 50.
[41] J. Shen, Robust Estimation and Inference in Finite Mixtures of Generalized Linear Models,
University of Georgia, uga, 2006.
[42] A. Sujlana and P.K. Pannu, Family related factors associated with caries prevalence in the primary
dentition of five-year-old children. J. Indian Soc. Pedod. Prev. Dent. 33 (2015), p. 83.
[43] K.M. Thwin, W.T. Lin, and A. Than, A pilot study of oral health situation of chin population
in West Hilly Regions of Myanmar.
[44] E. Van De Walle, Age at marriage and fertility (Implications for family planning). IPPF. Med.
Bull. 7 (1973), p. 1.
[45] Y.-G. Wang, X. Lin, M. Zhu, and Z. Bai, Robust estimation using the Huber function with a
data-dependent tuning constant. J. Comput. Graph. Stat. 16 (2007), pp. 468–481.
[46] J. Winter, A. Jablonski-Momeni, A. Ladda, and K. Pieper, Effect of supervised brushing with
fluoride gel during primary school, taking into account the group prevention schedule in kindergarten.
Clin. Oral Investig. 21 (2017), pp. 2101–2107.
[47] K.K. Yau, K. Wang, and A.H. Lee, Zero-inflated negative binomial mixed regression modeling of
over-dispersed count data with extra zeros. Biom. J. 45 (2003), pp. 437–452.
[48] H. Zhu, S. Luo, and S.M. DeSantis, Zero-inflated count models for longitudinal measurements
with heterogeneous random effects. Stat. Methods Med. Res. 26 (2017), pp. 1774–1786.
