To cite this article: Eghbal Zandkarimi, Abbas Moghimbeigi, Hossein Mahjub & Reza Majdzadeh
(2019): Robust inference in the multilevel zero-inflated negative binomial model, Journal of Applied
Statistics, DOI: 10.1080/02664763.2019.1636942
Iran; b Modeling of Noncommunicable Diseases Research Center, Department of Biostatistics, School of Public
Health, Hamadan University of Medical Sciences, Hamadan, Iran; c Research Center for Health Sciences,
Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran;
d Iranian Institute for Health Sciences Research, Tehran University of Medical Sciences, Tehran, Iran
1. Introduction
Zero-inflated (ZI) regression models are used to model count data with excess zeros.
These models are a mixture of two components: the first is a logistic regression for the
structural zeros, and the second is a count regression such as the Poisson or negative
binomial regression. Various ZI models have been introduced, including the zero-inflated
Poisson (ZIP) model by Lambert [27], the zero-inflated binomial (ZIB) model by Hall [17],
the zero-inflated negative binomial (ZINB) model by Greene [16], the zero-inflated
generalized Poisson (ZIGP) model by Famoye [13] and the hurdle model extended by
Mullahy [36]. When the study design is hierarchical or longitudinal, the observations are
correlated. Hall [17], Yau [47], and Hur [23] considered correlated ZI models with
cluster-specific random effects, whereas Lee [28], Moghimbeigi [34], Zhu [48] and Almasi [2]
considered correlated ZI models with random effects in both components. In practice, count
data with excess zeros (correlated or non-correlated) are often over-dispersed. If the ZIP
model is fitted to such data, the observed frequency of zeros exceeds the expected frequency,
and the estimates of the parameters in the Poisson component can be severely biased. In the
ZINB model, by contrast, the over-dispersion is modeled through the negative binomial
component [47]. Therefore, the ZINB model is recommended over the ZIP model for
over-dispersed ZI data [12]. Because multilevel ZI models have a complex likelihood
function, numerical methods such as the EM algorithm have been proposed for parameter
estimation. To obtain stable estimates in multilevel ZI models, the EM algorithm is applied
in conjunction with penalized likelihood, and the restricted maximum likelihood (REML)
method is used to estimate the variance components [28,34,47]. Likelihood-based methods are
consistent and efficient under the model, but they are sensitive to outliers and to poor
separation of the components in ZI models; in these situations they yield unstable estimates
and may lose consistency and efficiency [18]. To solve this problem, Hall and Shen [18]
introduced the robust expectation-solution (RES) approach for the non-correlated ZIP model.
The RES estimator belongs to the class of M-estimators [22]. It achieves robustness by
replacing the estimating equations in the M-step of the EM algorithm with robust estimating
equations. In this paper, we extend the RES approach to build robust estimators of the
regression parameters in the multilevel ZINB (MZINB) model. The paper is organized as
follows. Section 2.1 briefly reviews the three-level ZINB model, and sections 2.2 and 2.3
describe the EM algorithm and the extension of the RES algorithm to the three-level ZINB
model. Section 3 presents simulation studies, section 4 applies the robust estimation method
to DMFT data for elementary school students and to data on factors affecting the number of
births, and section 5 contains the discussion.
2. Methods
2.1. Three-level zero-inflated negative binomial (ZINB) model
Let $Y_{ijk}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n_i$; $k = 1, 2, \ldots, n_{ij}$) represent the $k$th subject of
the $j$th cluster of the second level within the $i$th cluster of the third level. Let
$n = \sum_{i=1}^{m} n_i$ be the total number of clusters and $N = \sum_{i=1}^{m}\sum_{j=1}^{n_i} n_{ij}$ the total number of
subjects. The probability distribution function of the three-level ZINB model can then be
written as
$$
p(Y_{ijk} = y_{ijk}) =
\begin{cases}
\pi_{ijk} + (1 - \pi_{ijk})\left(\dfrac{r}{\mu_{ijk} + r}\right)^{r} & y_{ijk} = 0 \\[2ex]
(1 - \pi_{ijk})\,\dfrac{\Gamma(y_{ijk} + r)}{\Gamma(y_{ijk} + 1)\,\Gamma(r)}\left(\dfrac{r}{\mu_{ijk} + r}\right)^{r}\left(\dfrac{\mu_{ijk}}{\mu_{ijk} + r}\right)^{y_{ijk}} & y_{ijk} > 0
\end{cases}
\tag{1}
$$
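As a minimal sketch of Equation (1), the probability mass function can be evaluated for a single observation as below. The function name `zinb_pmf` and the example values (π = 0.3, μ = 4, r = 0.5, chosen to match the simulation settings used later in the paper) are illustrative assumptions; the random-effects structure of the full model is not represented here.

```python
import math

def zinb_pmf(y, pi, mu, r):
    """Evaluate P(Y = y) for one observation under Equation (1).

    pi : mixing (structural zero) probability
    mu : mean of the negative binomial component
    r  : dispersion parameter of the negative binomial component
    """
    t = r / (mu + r)  # the ratio r / (mu + r) that appears in both cases
    if y == 0:
        return pi + (1.0 - pi) * t ** r
    # log of Gamma(y + r) / (Gamma(y + 1) * Gamma(r)), computed stably
    log_coef = math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
    log_nb = log_coef + r * math.log(t) + y * math.log(1.0 - t)
    return (1.0 - pi) * math.exp(log_nb)

# Illustrative values only (hypothetical single-observation setting).
probs = [zinb_pmf(y, pi=0.3, mu=4.0, r=0.5) for y in range(2000)]
total = sum(probs)  # should be numerically close to 1
```

With these values the ratio r / (μ + r) equals 1/9, so the probability of a zero is 0.3 + 0.7 × (1/9)^0.5 = 0.3 + 0.7/3.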
JOURNAL OF APPLIED STATISTICS 3
where $X_{N\times(p+1)}$ and $X_{N\times(q+1)}$ have full rank p and q for the logistic and the NB
components, respectively, and $\alpha_{(p+1)\times 1}$ and $\beta_{(q+1)\times 1}$ are the corresponding vectors of
regression coefficients. As seen in relations (2) and (3), the mixing probability and the
mean of the negative binomial component are linked to the independent variables through
logit and logarithmic link functions, respectively. The vectors $u = (u_1, u_2, \ldots, u_m)$ and
$\nu = (\nu_1, \nu_2, \ldots, \nu_m)$ denote the random effects of the third level in the logistic and
negative binomial components, respectively, whereas
$\tau = (\tau_{11}, \ldots, \tau_{1n_1}, \ldots, \tau_{m1}, \ldots, \tau_{mn_m})$ and
$\omega = (\omega_{11}, \ldots, \omega_{1n_1}, \ldots, \omega_{m1}, \ldots, \omega_{mn_m})$ are the random effects of the second level. For
simplicity of interpretation and computation, the random effects $u$, $\nu$, $\tau$ and $\omega$
are assumed to be independent and normally distributed with mean zero and variances
$\sigma_u^2$, $\sigma_\nu^2$, $\sigma_\tau^2$ and $\sigma_\omega^2$, respectively [28,47].
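$$ l = l_1 + l_2 $$
$$
l_1 = \sum_{y_{ijk} = 0} \log\left(\frac{e^{\xi_{ijk}} + t_{ijk}^{\,r}}{1 + e^{\xi_{ijk}}}\right)
+ \sum_{y_{ijk} > 0} \left[\log\left(\frac{\Gamma(y_{ijk} + r)}{\Gamma(y_{ijk} + 1)\,\Gamma(r)}\right) + r \log(t_{ijk}) + y_{ijk} \log(1 - t_{ijk}) - \log(1 + e^{\xi_{ijk}})\right]
\tag{4}
$$
$$
l_2 = -\frac{1}{2}\left\{m \log(2\pi\sigma_\nu^2) + \sigma_\nu^{-2}\,\nu^{T}\nu + n \log(2\pi\sigma_\omega^2) + \sigma_\omega^{-2}\,\omega^{T}\omega\right\}
- \frac{1}{2}\left\{m \log(2\pi\sigma_u^2) + \sigma_u^{-2}\,u^{T}u + n \log(2\pi\sigma_\tau^2) + \sigma_\tau^{-2}\,\tau^{T}\tau\right\}
\tag{5}
$$
where
$$
t_{ijk} = \frac{r}{r + \exp(X_{ijk}^{T}\beta + \nu_i + \omega_{ij})}
$$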
4 E. ZANDKARIMI ET AL.
where $Z_{ijk}$ is an unobserved binary variable indicating whether $y_{ijk}$ comes from the zero
state ($Z_{ijk} = 1$) or from the negative binomial state ($Z_{ijk} = 0$). This decomposition of the
complete-data log-likelihood allows the parameters to be estimated by maximizing the two
log-likelihood components separately [34]. The EM algorithm starts with the initial values
$\theta^{(0)} = (\alpha^{(0)T}, u^{(0)T}, \tau^{(0)T}, \beta^{(0)T}, \nu^{(0)T}, \omega^{(0)T})$ and repeats the steps below
until convergence [47].
2.2.1. E-step
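In the E-step of the EM algorithm, $Z_{ijk}$ is estimated by its conditional expectation $z_{ijk}^{(p)}$
under the current estimates $\alpha^{(p)T}, u^{(p)T}, \tau^{(p)T}, \beta^{(p)T}, \nu^{(p)T}$ and $\omega^{(p)T}$, where $p$ denotes the $p$th
iteration:
$$
\begin{aligned}
z_{ijk}^{(p)} &= p(\text{zero state} \mid y_{ijk}, \alpha^{(p)T}, u^{(p)T}, \tau^{(p)T}, \beta^{(p)T}, \nu^{(p)T}, \omega^{(p)T}) \\
&= \frac{p(y_{ijk} \mid \text{zero state}) \times p(\text{zero state})}{p(y_{ijk} \mid \text{zero state}) \times p(\text{zero state}) + p(y_{ijk} \mid \text{Negative Binomial state}) \times p(\text{Negative Binomial state})} \\
&= \begin{cases}
\dfrac{1}{1 + \exp\!\left(-(X_{ijk}^{T}\hat{\alpha}^{(p)} + \hat{u}_i^{(p)} + \hat{\tau}_{ij}^{(p)})\right) t_{ijk}^{\,\hat{r}^{(p)}}} & y_{ijk} = 0 \\[2ex]
0 & y_{ijk} > 0
\end{cases}
\end{aligned}
\tag{8}
$$
and
$$
t_{ijk} = \frac{\hat{r}^{(p)}}{\hat{r}^{(p)} + \exp(X_{ijk}^{T}\hat{\beta}^{(p)} + \hat{\nu}_i^{(p)} + \hat{\omega}_{ij}^{(p)})}
\tag{9}
$$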
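The E-step can be sketched for a single observation as follows. This is a minimal illustration rather than the authors' implementation: the function name `e_step_weight` is hypothetical, and the arguments `xi` and `eta` are assumed to be the already-computed linear predictors of the logistic and negative binomial components (fixed effects plus random effects) at the current iteration.

```python
import math

def e_step_weight(y, xi, eta, r):
    """Posterior probability that an observation belongs to the zero state.

    y   : observed count
    xi  : current linear predictor of the logistic component
    eta : current linear predictor (log-mean) of the NB component
    r   : current estimate of the NB dispersion parameter
    """
    if y > 0:
        return 0.0  # a positive count cannot come from the zero state
    t = r / (r + math.exp(eta))  # r / (r + mu), with mu = exp(eta)
    return 1.0 / (1.0 + math.exp(-xi) * t ** r)

# Illustrative check: xi = 0 gives pi = 0.5; mu = 4 and r = 0.5 give t^r = 1/3,
# so the weight is 0.5 / (0.5 + 0.5 / 3) = 0.75.
z_zero = e_step_weight(0, xi=0.0, eta=math.log(4.0), r=0.5)
z_pos = e_step_weight(3, xi=0.0, eta=math.log(4.0), r=0.5)
```

The returned value for a zero count equals π / (π + (1 − π) t^r), which is the Bayes-rule ratio written out in the E-step.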
2.2.2. M-step
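With $Z_{ijk}$ fixed at $z_{ijk}^{(p)}$, the estimates $\{\hat{\alpha}^{(p)}, \hat{u}^{(p)}, \hat{\tau}^{(p)}\}$ and $\{\hat{\beta}^{(p)}, \hat{\nu}^{(p)}, \hat{\omega}^{(p)}, \hat{r}^{(p)}\}$ can be
obtained by the following steps.
M-step for α: Find $\alpha^{(p+1)}$ by maximizing $l_{\xi}^{c}(\alpha; y, u^{(p)}, \tau^{(p)}, z^{(p)})$. Maximizing with respect to α
is equivalent to solving the estimating equation
$$
\frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} \left(z_{ijk}^{(p)} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}}\right) \times X_{ijk} = 0
\tag{10}
$$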
In the M-step of the EM algorithm, the dispersion parameter ($r$) and the variance
components ($\sigma_u^2$, $\sigma_\nu^2$, $\sigma_\tau^2$ and $\sigma_\omega^2$) are assumed to be given. In practice they are unknown and
must be estimated. Yau [47] suggested an updated estimate of the dispersion parameter
($\hat{r}^{(p)}$), obtained by maximizing $l_{\eta}^{c}$. The asymptotic standard error of the scale parameter in
the negative binomial component is estimated by the method proposed in [28]. Newton–Raphson
iterations are used to estimate the parameters. The square roots of the diagonal elements of
the inverse of the information matrix provide the standard errors of the regression
coefficients α and β [28,34]. After the linear predictors have been estimated, the variance
components are estimated by the REML approach [28,34,47], and this process continues until
the parameter estimates converge.
Here $\Psi(\cdot)$ denotes a score function, i.e. the derivative of the log-likelihood function with
respect to the vector of parameters. For GLMs, various robust estimators have been proposed
for different distributions, which we note in the following.
and the design matrices [31]. Therefore, in the RES algorithm, to obtain $\alpha^{(p+1)}$ the
estimating Equation (10) in the M-step of the EM algorithm is replaced by the robust
estimating equation
$$
\frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} W(X_{ijk}) \times \left(z_{ijk}^{(p)} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}}\right) \times X_{ijk} = 0
\tag{14}
$$
where $W(X) = \sqrt{1 - h}$, and $h$ (an $N \times 1$ vector, with $N$ the total number of observations) is
the vector of leverages, i.e. the diagonal elements of the $N \times N$ hat matrix $H = X(X^{T}X)^{-1}X^{T}$.
Here, $W(X)$ is a simple function that down-weights outlying observations and high-leverage
points [1,7,8,18,22].
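To make the leverage-based weights concrete, the sketch below computes W(x_i) = sqrt(1 − h_i) for a small design matrix in plain Python. The helper names `solve` and `leverage_weights` and the toy design matrix are assumptions made for illustration; a production implementation would normally rely on a linear algebra library instead of the hand-written elimination used here.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [u - f * v for u, v in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverage_weights(X):
    """Return sqrt(1 - h_i) for each row, h_i being the i-th hat-matrix diagonal."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    weights = []
    for row in X:
        v = solve(xtx, row)                          # (X^T X)^{-1} x_i
        h = sum(ri * vi for ri, vi in zip(row, v))   # h_i = x_i^T (X^T X)^{-1} x_i
        weights.append(math.sqrt(max(0.0, 1.0 - h)))
    return weights

# Hypothetical design: intercept plus one covariate, last row is a leverage point.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.3], [1.0, 0.4], [1.0, 5.0]]
w = leverage_weights(X)
```

In this toy example the last row lies far from the other covariate values, so it receives by far the smallest weight, which is the intended down-weighting of high-leverage points.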
where $\psi(\cdot)$ is a continuous and bounded function that depends on a few tuning constants,
and $a_i(\cdot)$ is a correction term ensuring Fisher consistency at the model [1], defined as
$$
a_i(\cdot) = E(\psi(\cdot)) \times v^{-1}(\mu_i) \times \frac{\partial \mu_i}{\partial \eta_i} \times W(x_i) \times x_i
\tag{16}
$$
In effect, $\psi(\cdot)$ acts as a weight on the response vector that reduces the effect of outliers.
Bianco and Yohai introduced ψ-BY [4], ψ-Tukey and ψ-Hampel were introduced by Cantoni
et al. [1], and ψ-Huber was introduced by Huber [22]. Huber’s function has been used [11]
to build robust estimators in the negative binomial model. This function is defined as
$$
\psi_{\text{Huber}}(r_p, C) = \max(-C, \min(C, r_p))
\tag{17}
$$
where $r_p$ is the Pearson residual and $C$ is a tuning constant.
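Equation (17) simply clips the Pearson residual to the interval [−C, C]. A minimal sketch is given below; the function name `psi_huber` is hypothetical, and the tuning constant 1.345 in the example is a conventional illustrative choice that does not come from this paper.

```python
def psi_huber(r_p, c):
    """Huber's psi function: clip the Pearson residual r_p to [-c, c]."""
    return max(-c, min(c, r_p))

# Illustrative residuals; c = 1.345 is an assumed value used only for the example.
clipped = [psi_huber(r, 1.345) for r in (-10.0, -0.5, 0.0, 0.5, 10.0)]
```

Residuals inside the interval pass through unchanged, while large residuals are bounded, which is what limits the influence of outlying responses.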
Therefore, in the RES algorithm we suggest obtaining $\beta^{(p+1)}$ by replacing the estimating
Equation (11) with the robust estimating equation below
$$
\frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} (1 - z_{ijk}^{(p)}) \times W(X_{ijk}) \times \frac{\psi(y_{ijk}) - E_{ijk}(\beta, c)}{1 + \dfrac{E_{ijk}(\beta, c)}{r}} \times X_{ijk} = 0
\tag{18}
$$
As the weight on the response, we choose the ψ-Huber function considered by Hall and
Shen [18]. Therefore
$$
\psi(y_{ijk}) =
\begin{cases}
q_1 & y_{ijk} < q_1 \\
y_{ijk} & y_{ijk} \in [q_1, q_2] \\
q_2 & y_{ijk} > q_2
\end{cases}
\tag{19}
$$
where $q_1$ and $q_2$ are the quantiles of order $C$ and $1 - C$ of the negative binomial
component, respectively, and for the Fisher consistency correction term we use the function
where the probabilities in relation (20) are computed from the negative binomial component
density. $W(X_{ijk})$ is defined as in the logistic component. The quantile used in $\psi(y_{ijk})$ is
chosen to maintain a trade-off between efficiency at the model and robustness to
outliers [1,8,18,41]. In the simulation studies and the real data analyses, we take C = 0.01.
In the S-step of the RES algorithm, we use the Newton–Raphson algorithm. To speed up the
convergence of the algorithm, the initial values proposed in [18,41] are used. P-values are
calculated by the Wald method and are reported in this form throughout the paper.
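The truncation of the response at the negative binomial quantiles in Equation (19) can be sketched as follows. This is an illustration under stated assumptions: the mean `mu` and dispersion `r` of the negative binomial component are taken as known for one observation, a quantile of order p is taken to be the smallest count whose cumulative probability is at least p, and the function names are hypothetical.

```python
import math

def nb_pmf(y, mu, r):
    """Negative binomial probability P(Y = y) with mean mu and dispersion r."""
    t = r / (mu + r)
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log(t) + y * math.log(1.0 - t))

def nb_quantile(p, mu, r):
    """Smallest count y such that P(Y <= y) >= p (assumed quantile convention)."""
    y, cdf = 0, nb_pmf(0, mu, r)
    while cdf < p:
        y += 1
        cdf += nb_pmf(y, mu, r)
    return y

def psi_y(y, mu, r, c=0.01):
    """Truncate y to [q1, q2], the NB quantiles of order c and 1 - c."""
    q1 = nb_quantile(c, mu, r)
    q2 = nb_quantile(1.0 - c, mu, r)
    return min(max(y, q1), q2)

# Illustrative setting matching the simulation values mu = 4 and r = 0.5.
q2 = nb_quantile(0.99, 4.0, 0.5)
kept = psi_y(3, 4.0, 0.5)            # inside [q1, q2], left unchanged
capped = psi_y(q2 + 15, 4.0, 0.5)    # a gross outlier is pulled back to q2
```

With C = 0.01 only responses in the extreme 1% tails of the fitted component are modified, so the bulk of the data enters the estimating equation unchanged.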
2.3.3. Asymptotic
In this section we show that, if the RES algorithm converges, the resulting estimator solves
an unbiased estimating equation and is consistent and asymptotically normal (under mild
regularity conditions). Because the parameter estimates are obtained by solving the robust
estimating Equations (14) and (18), we combine them, for simplicity, into the following
equation:
$$
S(\theta, y) = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} E_{\theta}\big(S_{ijk}(y_{ijk}, z_{ijk}, \theta) \mid y_{ijk}\big) = 0
\tag{21}
$$
with
$$
S_{ijk,\text{logistic}}(y_{ijk}, z_{ijk}, \theta) = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} W(X_{ijk}) \times \left(z_{ijk} - \frac{e^{\xi_{ijk}}}{1 + e^{\xi_{ijk}}}\right) \times X_{ijk}
$$
and
$$
S_{ijk,\text{NB}}(y_{ijk}, z_{ijk}, \theta) = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} (1 - z_{ijk}) \times W(X_{ijk}) \times \frac{\psi(y_{ijk}) - E_{ijk}(\beta, c)}{1 + \dfrac{E_{ijk}(\beta, c)}{r}} \times X_{ijk}
$$
Rosen et al. [39] showed that, under certain regularity conditions, if the
expectation-solution algorithm converges and there exists a point $\hat{\theta} \in \Theta$ such that
$\lim_{\delta \to \infty} \theta^{(\delta)} = \hat{\theta}$, where $\theta^{(\delta)}$, for $\delta = 0, 1, 2, \ldots$, is a sequence generated by the
expectation-solution algorithm, then
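Hall and Shen [18] showed that the conditions of this proposition are easily verified for the
RES algorithm applied to the ZIP regression model. Therefore, if the RES algorithm
converges, it converges to a solution $\hat{\theta}$ of an unbiased estimating equation. Moreover, under
regularity conditions [10], the RES estimator is consistent ($\hat{\theta} \to \theta$) and asymptotically
normal ($\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} N(0, V)$). Here, $V = U^{-1} I U^{-T}$, where
$$
U^{-T} = (U^{-1})^{T}, \quad U = -E\left(\frac{\partial}{\partial \theta^{T}} S(\theta, y)\right) \quad \text{and} \quad I = E\big(N \times S(\theta, y) \times S(\theta, y)^{T}\big).
$$
The asymptotic variance of $\hat{\theta}$ can be estimated by $V_n = U_n^{-1} I_n U_n^{-T}$ evaluated at $\hat{\theta}$, where
$$
U_n = -\frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{n_{ij}} E_{\theta}\left(\frac{\partial}{\partial \theta^{T}} \big(S_{ijk}(y_{ijk}, z_{ijk}, \theta) \mid y_{ijk}\big)\right)
$$
and $I_n$ is the corresponding sample-based estimate of $I$.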
The standard errors of the coefficients α and β are the square roots of the diagonal elements
of the variance-covariance matrix Vn .
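As an illustration of how such standard errors feed into the Wald P-values reported in the paper, the sketch below assumes a two-sided test against a standard normal reference distribution; the source only states that the Wald method is used, so this reference distribution is an assumption, and the function name `wald_p_value` is hypothetical. The example values are the RES logistic estimate and standard error for residential area given in Table 7.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided P-value of the Wald statistic z = estimate / se (normal reference)."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))

# RES logistic coefficient of residential area and its standard error (Table 7).
p_region = wald_p_value(-0.782, 0.177)
```

The resulting P-value is well below 0.0001, in line with the significance level reported for this covariate in the data analysis.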
3. Simulation studies
The purpose of the simulation is to evaluate the performance of the EM algorithm and
the RES algorithm in the presence of outliers and under different mixing probabilities.
We consider several values of the mixing probability, with a mean parameter close to that of
the real data (μ = 4), and set $r^{-1} = 2$ to ensure over-dispersion.
To evaluate the consistency of the estimation methods, we conduct the simulation studies with
two sample sizes (n = 250 and n = 500). For n = 250, there are 5 third-level clusters
(np = 5) with 50 (m × nc = 10 × 5 = 50) subjects in each, and 25 second-level clusters
(np × nc = 5 × 5 = 25) with 10
Table 1. Bias and MSE for three level ZINB model for different values of the tuning quantile C.
C = 0.001 C = 0.005 C = 0.01 C = 0.015 C = 0.02
MP Parameters Bias MSE Bias MSE Bias MSE Bias MSE Bias MSE
π = 0.3
α0 = −0.7 0.524 0.275 0.508 0.258 0.495 0.245 0.486 0.236 0.476 0.227
α1 = −1 0.521 0.272 0.521 0.272 0.519 0.271 0.519 0.269 0.517 0.268
α2 = 0.6 −0.256 0.066 −0.253 0.065 −0.252 0.064 −0.249 0.063 −0.247 0.062
β0 = 1.8 0.454 0.207 0.393 0.154 0.338 0.114 0.294 0.087 0.255 0.065
β1 = −0.4 0.102 0.011 0.127 0.017 0.144 0.021 0.155 0.024 0.167 0.028
β2 = 0.05 −0.026 0.001 −0.024 0.001 −0.025 0.001 −0.026 0.001 −0.025 0.001
r−1 = 2 0.789 0.624 0.842 0.709 0.836 0.699 0.818 0.669 0.804 0.647
π = 0.5
α0 = −0.5 0.397 0.158 0.391 0.153 0.391 0.153 0.392 0.154 0.391 0.153
α1 = 0.55 −0.048 0.003 −0.043 0.003 −0.042 0.002 −0.041 0.002 −0.042 0.002
α2 = 0.55 −0.138 0.019 −0.138 0.019 −0.143 0.021 −0.145 0.022 −0.148 0.022
β0 = 1.8 0.333 0.112 0.277 0.077 0.222 0.049 0.173 0.030 0.134 0.018
β1 = −0.4 0.126 0.016 0.153 0.024 0.173 0.030 0.192 0.037 0.205 0.042
β2 = 0.05 0.027 0.001 0.023 0.001 0.018 0.001 0.014 0.001 0.011 0.001
r−1 = 2 0.968 0.937 0.977 0.954 0.937 0.880 0.894 0.800 0.865 0.749
π = 0.6
α0 = −0.5 0.488 0.238 0.472 0.223 0.457 0.209 0.444 0.198 0.435 0.189
α1 = 0.9 −0.144 0.021 −0.141 0.020 −0.134 0.019 −0.133 0.018 −0.129 0.017
α2 = 1 −0.291 0.085 −0.290 0.085 −0.286 0.083 −0.284 0.081 −0.282 0.079
β0 = 1.8 0.390 0.153 0.317 0.101 0.246 0.061 0.193 0.037 0.149 0.022
β1 = −0.4 0.149 0.023 0.175 0.031 0.197 0.039 0.206 0.043 0.215 0.047
β2 = 0.05 −0.003 0.001 −0.007 0.001 −0.007 0.001 −0.007 0.001 −0.009 0.001
r−1 = 2 0.785 0.616 0.805 0.648 0.787 0.619 0.767 0.588 0.742 0.550
π = 0.8
α0 = 1.3 0.142 0.021 0.127 0.017 0.113 0.013 0.102 0.011 0.090 0.008
α1 = 0.6 0.009 0.001 0.014 0.001 0.019 0.001 0.019 0.001 0.020 0.001
α2 = −0.3 −0.013 0.001 −0.015 0.001 −0.017 0.001 −0.019 0.001 −0.020 0.001
β0 = 1.8 0.282 0.081 0.189 0.037 0.100 0.011 0.032 0.002 −0.025 0.001
β1 = −0.4 0.199 0.041 0.229 0.054 0.250 0.064 0.256 0.067 0.261 0.069
β2 = 0.05 −0.010 0.002 −0.019 0.002 −0.028 0.002 −0.037 0.002 −0.042 0.003
r−1 = 2 0.769 0.592 0.759 0.577 0.711 0.506 0.669 0.448 0.633 0.401
Note: MP: Mixing probability; MSE: Mean Square Error.
(m = 10) subjects per cluster. For n = 500, there are 5 third-level clusters (np = 5) with
100 (m × nc = 10 × 10 = 100) subjects in each, and 50 second-level clusters
(np × nc = 5 × 10 = 50) with 10 (m = 10) subjects per cluster. In each table, the results for
the sample size of 250 are listed at the top and those for the sample size of 500 below.
We use the Monte Carlo simulation method. In all simulation studies, we generate data from
the three-level ZINB model with the distribution below
$$
y_{ijk} \sim
\begin{cases}
0 & \text{with probability } \pi_{ijk} \\
NB(\mu_{ijk}, r) & \text{with probability } 1 - \pi_{ijk}
\end{cases}
\tag{22}
$$
where r = 0.5 is the dispersion parameter.
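Draws from the mixture in Equation (22) can be sketched as below. The sketch ignores the covariates and random effects of the full three-level design and uses a constant mixing probability and mean; it generates the negative binomial part as a gamma-Poisson mixture, which is a standard construction but is an assumption about how the data might be generated, since the paper does not state its sampling mechanism. All function names and the seed are hypothetical.

```python
import math
import random

def draw_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_zinb(pi, mu, r, rng):
    """One draw: structural zero with probability pi, otherwise NB(mu, r)."""
    if rng.random() < pi:
        return 0
    lam = rng.gammavariate(r, mu / r)  # gamma with shape r and mean mu
    return draw_poisson(lam, rng)

rng = random.Random(2019)  # arbitrary seed for reproducibility
sample = [draw_zinb(0.3, 4.0, 0.5, rng) for _ in range(20000)]
zero_share = sum(v == 0 for v in sample) / len(sample)
sample_mean = sum(sample) / len(sample)
```

For π = 0.3, μ = 4 and r = 0.5 the expected share of zeros is 0.3 + 0.7/3 ≈ 0.533 and the expected mean is 0.7 × 4 = 2.8, which the simulated values should approximate.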
Table 2. Simulation results for the three level ZINB model with π = 0.3.
RES EM
Sample size Parameters Bias MSE Bias MSE
n = 250
α0 = −0.7 0.5224 0.2734 0.5719 0.3275
α1 = −1 0.5162 0.2671 0.5089 0.2595
m = 10 α2 = 0.6 −0.2529 0.0645 −0.2890 0.0841
np = 5 β0 = 1.8 0.3989 0.1594 0.5091 0.2596
nc = 5 β1 = −0.4 0.1218 0.0151 0.0871 0.0080
β2 = 0.05 0.0231 0.0009 −0.0153 0.0007
r−1 = 2 0.5661 0.4077 −1.9121 3.6565
n = 500
α0 = −0.7 0.5206 0.2713 0.5916 0.3502
α1 = −1 0.4847 0.2353 0.4917 0.2420
m = 10 α2 = 0.6 −0.2281 0.0524 −0.2823 0.0799
np = 5 β0 = 1.8 0.3886 0.1512 0.5306 0.2818
nc = 10 β1 = −0.4 0.1403 0.0199 0.0881 0.0080
β2 = 0.05 −0.007 0.0002 −0.0148 0.0005
r−1 = 2 0.2380 0.0567 −1.9498 3.8023
Note: RES: Robust Expectation-Solution; EM: Expectation–Maximization; MSE: Mean Square Error.
U(0, 1) and X2i ∼ U(0, 1)). To ensure that outliers are present, we randomly
select 5% of the generated responses ‘y’ and replace them with ‘y + 15’. In this section, the
tuning quantile is C = 0.01, and a regression structure is used for both the mixing
probability and the mean of the negative binomial component.
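The contamination scheme described above (a random 5% of the responses shifted by 15) can be sketched as follows. The function name `contaminate` and its arguments are hypothetical, and the sketch applies the shift to a single vector of responses without distinguishing the two model components.

```python
import random

def contaminate(y, share=0.05, shift=15, rng=None):
    """Return a copy of y with a random `share` of entries replaced by y + shift."""
    rng = rng or random.Random()
    k = round(share * len(y))                  # number of contaminated responses
    idx = set(rng.sample(range(len(y)), k))    # positions chosen without replacement
    out = [v + shift if i in idx else v for i, v in enumerate(y)]
    return out, idx

# Illustrative use with n = 500 responses that are all zero before contamination.
clean = [0] * 500
dirty, outlier_idx = contaminate(clean, rng=random.Random(1))
```

With n = 500 this shifts exactly 25 responses, and the original vector is left untouched so that clean and contaminated fits can be compared.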
Study 1: In this study, we consider a three-level ZINB model with a mixing probability of
0.3. To ensure the presence of outliers, 5% of the negative binomial responses and 3.5% of
the logistic responses are randomly contaminated. According to Table 2, the bias and MSE of
most of the estimated parameters are smaller for the RES algorithm than for the EM
algorithm, and with an increase in the sample size the bias and MSE of almost all
parameters are reduced (with the exception of β1). For the EM algorithm, in contrast, the
bias and MSE of most of the estimated parameters do not decrease with increasing sample
size.
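The bias and MSE reported in the simulation tables can be computed from Monte Carlo replicates as sketched below. The paper does not spell out its formulas, so the usual definitions are assumed here (bias as the mean estimate minus the true value, MSE as the mean squared deviation from the true value); the function name and the replicate values are hypothetical and are not taken from the paper.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and mean square error of a list of replicate estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse

# Hypothetical replicate estimates of beta_0, whose true value is 1.8.
bias, mse = bias_and_mse([2.1, 2.3, 2.2, 2.0], 1.8)
```

Under these definitions the MSE equals the squared bias plus the variance of the replicate estimates, so an MSE close to the squared bias indicates that most of the error is systematic.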
Study 2: In this study, we consider a three-level ZINB model with a 50:50 mixing
probability. Therefore, the components of the mixture are well separated. We generate data
as in the previous study and, to ensure the presence of outliers, randomly add outliers to
5% of the negative binomial responses and 2.5% of the logistic responses. The true values
of α, β, and $r^{-1}$ and their estimates are given in Table 3. The vector β is chosen as in the
previous study, and the values of the vector α are chosen so that the degree of zero
inflation is moderate (π = 0.5). According to Table 3, for the small sample size the bias of
the EM algorithm is smaller than that of the RES algorithm. When the sample size is
increased to 500, however, in most situations the bias of the estimated parameters is
smaller for the RES algorithm than for the EM algorithm. For the EM algorithm, the MSE
values do not decrease with increasing sample size, which indicates a lack of consistency.
Study 3: In this study, we consider a three-level ZINB model with a poor mixing
probability (π = 0.65). To ensure the presence of outliers, 5% of the negative binomial
responses and 2% of the logistic responses are randomly replaced with ‘y + 15’.
According to Table 4, the bias and MSE of most of the estimated parameters are smaller for
the RES algorithm than for the EM algorithm, and for the RES algorithm, with an
Table 3. Simulation results for the three level ZINB model with π = 0.5.
RES EM
Sample size Parameters Bias MSE Bias MSE
n = 250
α0 = −0.5 0.3329 0.1114 0.2869 0.0829
α1 = 0.55 −0.0276 0.0014 −0.0252 0.0014
m = 10 α2 = 0.55 −0.1420 0.0208 −0.1192 0.0149
np = 5 β0 = 1.8 0.4194 0.1762 0.2738 0.0754
nc = 5 β1 = −0.4 0.2319 0.0541 0.0494 0.0096
β2 = 0.05 0.0006 0.0003 0.0089 0.0007
r−1 = 2 0.1931 0.0374 −1.8884 3.5667
n = 500
α0 = −0.5 −0.2235 0.0509 0.2915 0.0853
α1 = 0.55 0.0548 0.0036 −0.0469 0.0025
m = 10 α2 = 0.55 0.0052 0.0007 −0.1261 0.0163
np = 5 β0 = 1.8 0.0592 0.0037 0.2993 0.0898
nc = 10 β1 = −0.4 0.1286 0.0167 0.1038 0.0111
β2 = 0.05 −0.0172 0.0005 −0.0202 0.0007
r−1 = 2 −0.0597 0.0036 −1.9396 3.7630
Note: RES, Robust Expectation-Solution; EM, Expectation–Maximization; MSE, Mean Square Error.
Table 4. Simulation results for the three level ZINB model with π = 0.65.
RES EM
Sample size Parameters Bias MSE Bias MSE
n = 250
α0 = −0.5 0.4273 0.1831 0.4971 0.2475
α1 = 0.9 −0.1656 0.0281 −0.1639 0.0275
m = 10 α2 = 1 −0.2487 0.0624 −0.2429 0.0596
np = 5 β0 = 1.8 0.2733 0.0751 0.4885 0.2391
nc = 5 β1 = −0.4 0.2009 0.0408 0.0927 0.0093
β2 = 0.05 −0.0425 0.0023 0.0151 0.0010
r−1 = 2 0.2679 0.0718 −1.8817 3.5412
n = 500
α0 = −0.5 −0.1006 0.0111 0.4834 0.2339
α1 = 0.9 0.0150 0.0007 −0.1493 0.0226
m = 10 α2 = 1 −0.0737 0.0059 −0.2532 0.0644
np = 5 β0 = 1.8 0.0636 0.0043 0.5014 0.2517
nc = 10 β1 = −0.4 0.1818 0.0333 0.1302 0.0173
β2 = 0.05 −0.0026 0.0002 0.0013 0.0004
r−1 = 2 0.3385 0.1161 −1.9239 3.7020
Note: RES: Robust Expectation-Solution; EM: Expectation–Maximization; MSE: Mean Square Error.
increase in the sample size, the bias and MSE of almost all parameters are substantially
reduced (with the exception of $r^{-1}$). For the EM algorithm, on the other hand, inconsistency
is present in most cases, and the MSE values do not decrease with increasing sample size.
Study 4: In this study, we consider a three-level ZINB model with a poor mixing probability
(π = 0.8). To generate outliers, we replace 5% of the negative binomial
responses and 1% of the logistic responses with ‘y + 15’. Because outliers have been added
to both components of the ZINB model, the reported bias is relatively large in some
cases. According to Table 5, in most situations the bias of the estimated parameters is
smaller for the RES method than for the EM algorithm. For the EM algorithm, the
bias and MSE of the scale parameter are relatively large. Comparing the MSE values for the
two sample sizes shows that, with the exception of two parameters, in the RES approach there
is complete
Table 5. Simulation results for the three level ZINB model with π = 0.8.
RES EM
Sample size Parameters Bias MSE Bias MSE
n = 250
α0 = 1.3 −0.0241 0.0015 0.0829 0.0076
α1 = 0.6 0.0471 0.0033 −0.0276 0.0019
m = 10 α2 = −0.3 −0.0283 0.0018 0.0181 0.0014
np = 5 β0 = 1.8 0.0127 0.0011 0.2623 0.0699
nc = 5 β1 = −0.4 0.2548 0.0661 0.0945 0.0110
β2 = 0.05 −0.0276 0.0019 −0.0327 0.0030
r−1 = 2 0.2499 0.0626 −1.7835 3.1825
n = 500
α0 = 1.3 −0.2018 0.0413 0.0248 0.0010
α1 = 0.6 −0.0173 0.0008 0.0201 0.0009
m = 10 α2 = −0.3 −0.0002 0.0005 0.0318 0.0014
np = 5 β0 = 1.8 −0.0092 0.0005 0.2730 0.0750
nc = 10 β1 = −0.4 0.2046 0.0422 0.1633 0.0275
β2 = 0.05 −0.0447 0.0024 −0.0393 0.0024
r−1 = 2 0.0536 0.0030 −1.8949 3.5919
Note: RES: Robust Expectation-Solution; EM: Expectation–Maximization; MSE: Mean Square Error.
consistency and the MSE values decrease with increasing sample size. For the EM
algorithm, in most cases, inconsistency is present and the MSE values do not decrease with
increasing sample size.
4. Data analysis
We apply this robust approach to the DMFT index of primary school students and to factors
affecting fertility.
4.1. DMFT
The DMFT index represents the oral and dental status of individuals; it counts the number
of decayed, missing and filled surfaces of the permanent and primary teeth. The data used
in this article are part of the 1991 national Iranian health study [34]. In that study,
clusters of unequal size were selected within the provinces, and the health status of the
residents (including dental health status) was investigated. Given the importance of oral
and dental health in primary school children, we study the DMFT index of 1045 students
residing in 17 provinces of Iran. The child’s DMFT index is the response. Let $y_{ijk}$
($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n_i$; $k = 1, 2, \ldots, n_{ij}$) be the DMFT index of the $k$th child in the
$j$th cluster of the $i$th province. The covariates used in the study are the residential region
(rural 1; urban 0), gender (female 1; male 0) and brushing (no brushing 0; seldom 1; once a
day 2; more than once a day 3). In the sample, 43.7% of the students reside in rural
districts, 46.9% are girls and 67% use a toothbrush at least once a day. The frequency,
percentage and cumulative percentage of the DMFT index are given in Table 6; 62.2% of the
students have a DMFT index of zero. The score test [33] for extra zeros against the NB-mixed
regression and the score test [35] for assessing ZIP regression against Poisson regression
in multilevel count data are both significant (P < 0.0001). The box-plot shows the presence
of outliers in the observations (Figure 1). In this paper, we use the RES approach with a
quantile constant equal to 0.01. The coefficient estimates and their SEs are given in Table 7.
In DMFT studies modeled with a ZI model, π represents the proportion of subjects in the
sample who are not susceptible to dental caries, and 1 − π the proportion who are
susceptible. Therefore, the probability of ZI in the sample equal to
Table 7. Estimation of Parameters and standard error (in parentheses) of the DMFT data.
Estimate method RES EM
Component Logistic NB Logistic NB
Fixed effects Intercept 0.488 (0.354) 1.39 (0.103)*** 0.329 (0.405) 1.411 (0.102)*
Residential areas −0.782 (0.177)*** −0.144 (0.022)*** 0.080 (0.204) −0.127 (0.065)*
Gender −0.049 (0.161) 0.005 (0.066) −0.227 (0.171) 0.112 (0.057)*
Brushing 0.18 (0.088)* −0.004 (0.037) −0.031 (0.095) −0.019 (0.031)
1/r 0.007 (0.001)*** 0.103(0.002)***
Random effect σ 2 (Province) 1.016 (0.415)* 0.020 (0.006)** 1.396 (0.57)* 0.033 (0.018)**
σ 2 Cluster 0.042 (0.214) 0.001 (0.03) 0.989 (0.265)* 0.030 (0.021)
Note: NB, Negative Binomial; *P < 0.05, **P < 0.01, ***P < 0.001.
Therefore, after adjusting for the random effects of clusters and provinces, the probability
of ZI for a male student residing in an urban district who does not use a toothbrush is
0.62. According to Table 7, the parameter estimates differ between the RES and EM
algorithms. In the logistic part, the covariates region (p-value < 0.0001) and brushing
(p-value = 0.033) are significant under the RES approach, whereas none of the variables is
significant under the EM algorithm. In other words, the DMFT index of students residing in
rural districts is higher than that of students in urban districts, and brushing has a
positive effect on reducing the DMFT index. In the NB part, only region (p-value < 0.0001)
is significant under the RES approach, whereas region (p-value = 0.043) and gender
(p-value = 0.043) are significant under the EM algorithm. According to the RES results for
the negative binomial part, students in rural districts have a lower DMFT index than
students in urban districts. Therefore, although the number of students with a zero DMFT
index is higher in urban regions than in rural regions, the DMFT index of the students in
rural regions is lower than in urban regions.
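The zero-inflation probability of 0.62 quoted above can be reproduced from the RES logistic estimates in Table 7 with the inverse logit, as in the sketch below. The calculation assumes that the random effects are set to zero and that the reference categories (urban, male, no brushing) are all coded 0, as described in the covariate definitions; the function and variable names are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

intercept = 0.488   # RES logistic intercept (Table 7)
rural = -0.782      # RES logistic coefficient of residential area (Table 7)

p_urban = inv_logit(intercept)          # urban boy, no brushing
p_rural = inv_logit(intercept + rural)  # rural boy, no brushing
```

The first value rounds to 0.62, matching the text, while the same profile in a rural district gives a zero-inflation probability of about 0.43 under these assumptions.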
have medium to high incomes and the mean marriage age is 17.05 ± 3.74. The frequency,
percentage and cumulative percentage of the number of children per woman are
given in Table 8; 24.8% of the women did not have children. The score
test [33] for extra zeros against the NB-mixed regression and the score test [35] for
assessing ZIP regression against Poisson regression in multilevel count data are both
significant (P < 0.0001). In addition, the box-plot shows the presence of outliers in the
observations (Figure 2). In this section, we use the RES algorithm with C = 0.01. The results
of fitting the three-level ZINB model are given in Table 9, which presents the parameter
estimates and standard errors for all parameters related to the predictors. In the RES algorithm,
Table 9. Estimation of parameters and standard error (in parentheses) of the number of birth data.
RES EM
Parameters n (%) Logistic NB Logistic NB
Fixed Effects Intercept − −3.29 (0.54)*** 1.55 (0.10)*** −3.35 (0.58)*** 1.58 (0.11)***
Marriage age − 0.05 (0.03) −0.04 (0.01)*** 0.05 (0.03) −0.04 (0.01)***
Contraceptive
No 1485 (48.8) − − − −
Yes 1558 (51.2) 0.43 (0.23) −0.08 (0.03)* 0.50 (0.25)* −0.09 (0.04)*
Household income
Low 2803 (92.1) – – – –
Medium 207 (6.8) −0.07 (0.46) −0.005 (0.07) −0.12 (0.51) −0.007 (0.07)
High 33 (1.1) −2.26 (0.003)*** −0.17 (0.19) −1.74 (4.78) −0.14 (0.22)
1/r – 0.25 (0.003)*** 0.26 (0.003)***
Random Effect σ 2 (City) – 0 (0.05) 0.02 (0.01)* 0 (0.06) 0.02 (0.01)*
σ 2 (Cluster) – 0 (0.11) 0.05 (0.01)*** 0.01 (0.12) 0.06 (0.01)***
Note: NB: Negative Binomial; *P < 0.05, **P < 0.01, ***P < 0.001.
the high-income level (p-value < 0.0001) is significant in the logistic part, whereas in the
EM algorithm the use of contraceptives (p-value = 0.043) is significant. In the NB part,
under the RES approach, the marriage age (p-value < 0.01) and the use of contraceptives
(p-value = 0.019) are significant. In the EM algorithm, as in the RES algorithm, the
marriage age (p-value < 0.0001) and the use of contraceptives (p-value = 0.012) are
significant. Therefore, according to the results of the RES algorithm, the birth rate
decreases with increasing marriage age and with the use of contraceptive methods.
5. Discussion
A popular way to model correlated count data with excess zeros and over-dispersion
simultaneously is by means of the MZINB model. The numerical methods such as the
EM algorithm are used for estimating the parameters, but in the presence of outliers and
other types of contaminations the likelihood-based the approaches are unstable. To over-
come this challenge we have extended RES approach [18] for building robust estimates of
the regression parameters in the multilevel ZINB distribution. This approach achieves to
robustness in the estimation of the parameters by means of the robust estimating equa-
tions. Robust estimating equations in the RES algorithm belong to the Mallows class and
in the logistic component only design matrix or covariates are weighted and reduces the
impact of possible leverage points, but in the negative binomial component is considered a
general class of M-estimators of type Mallows, where the influence of deviations on y and
on x are bound separately [8]. Given the fact that reducing the effect of outliers instead
of omitting the effect of theirs is the only logical way to deal with the outliers, therefore,
robust estimating equations are proposed in the RES algorithm that causes down-weight
of the outliers. The simulation results showed that the RES algorithm was robust com-
pared to the EM algorithm in the presence of outliers and different modes of the MZINB
model component separation. But in the EM algorithm, in most cases, the inconsistency
existed, also the bias and MSE of some parameters (e.g. scale parameter) were relatively
large. On the other hand, although the variance/MSE and bias of most of the parameters
in the RES algorithm decrease with increasing sample size, the variance/MSE of most
of the parameters of the RES algorithm is greater than that of the EM algorithm, because
the efficiency of this method is lower than that of the EM algorithm [20]. The simulation
studies confirm this (Tables 2–5). To balance efficiency and robustness, C should be
selected appropriately [45]. For this purpose, different values of C in the interval
[0.001, 0.02] were investigated, and according to the simulation studies in Section 2.3.4,
the value C = 0.01 gave good results. In the data analysis section, DMFT and birth data
were used to examine the usefulness of the RES algorithm in comparison with the EM
algorithm. The RES
algorithm gave reliable and plausible results that were consistent with the DMFT and
fertility literature. Many studies [6,14,30,42,43,46] have confirmed that tooth brushing
lowers the DMFT index. Several studies [5,37,38,40] confirm that the DMFT
index score in rural areas is higher than in urban areas. Likewise, many studies
[24,26,44] have confirmed the negative effect of age at marriage on the number of
births, and some papers [3,29] have confirmed the negative effect of the use of
contraceptive methods on the number of births.
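The Mallows-type down-weighting discussed above, and the trade-off between efficiency and robustness governed by a tuning constant, can be illustrated with a minimal sketch. It is not the RES implementation. The function names are hypothetical, the covariate weights w(x_i) = sqrt(1 - h_ii) are one common choice rather than the weights used in this paper, and the negative binomial variance is assumed to have the form mu + alpha * mu^2. The constant c = 1.345 is the conventional Huber value on the scale of standardized residuals and is not on the same scale as the constant C studied in the simulations.

```python
import math
import numpy as np


def huber_psi(r, c):
    """Huber's psi function: identity on [-c, c], clipped to +/-c outside."""
    return np.clip(r, -c, c)


def leverage_weights(X):
    """Mallows-type covariate weights w(x_i) = sqrt(1 - h_ii).

    h_ii is the i-th diagonal element of the hat matrix X (X'X)^{-1} X',
    obtained from a reduced QR decomposition (X assumed full column rank).
    """
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    return np.sqrt(np.clip(1.0 - h, 0.0, 1.0))


def nb_pearson_residuals(y, mu, alpha):
    """Pearson residuals under the assumed variance mu + alpha * mu**2."""
    return (y - mu) / np.sqrt(mu + alpha * mu ** 2)


def huber_efficiency(c):
    """Asymptotic efficiency of the Huber location M-estimator at the normal model."""
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))         # normal cdf at c
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)  # normal pdf at c
    e_dpsi = 2.0 * Phi - 1.0                                  # E[psi'(Z)]
    e_psi2 = e_dpsi - 2.0 * c * phi + 2.0 * c * c * (1.0 - Phi)  # E[psi(Z)^2]
    return e_dpsi ** 2 / e_psi2


# Toy data: the last observation is a leverage point in x and an outlier in y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 1.0, 3.0, 2.0, 60.0])
mu = np.full_like(y, 2.0)            # hypothetical fitted means
r = nb_pearson_residuals(y, mu, alpha=0.5)

w_x = leverage_weights(X)            # bounds the influence of deviations in x
psi_r = huber_psi(r, c=1.345)        # bounds the influence of deviations in y

print("covariate weights:", np.round(w_x, 3))
print("bounded residuals:", np.round(psi_r, 3))
for c in (0.5, 1.345, 3.0):
    print(f"c = {c}: efficiency = {huber_efficiency(c):.3f}")
```

In this toy example the last observation receives the smallest covariate weight and its residual is truncated at c, while the remaining residuals pass through unchanged. Smaller values of c bound the influence of outliers more tightly but lower the efficiency at the uncontaminated model, which is the trade-off that motivates tuning the constant.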
Finally, the RES approach is easy to extend to other multilevel models with excess
zeros, such as ZIGP, ZIP and ZIB. We suggest using this algorithm instead of the EM
algorithm for correlated data with excess zeros in the presence of outliers.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This study has been adapted from a PhD thesis at Hamadan University of Medical Sciences. The
study was funded by Vice-chancellor for Research and Technology, Hamadan University of Medical
Sciences (No. 970204575).
ORCID
Eghbal Zandkarimi http://orcid.org/0000-0001-7583-6628
Abbas Moghimbeigi http://orcid.org/0000-0002-3803-3663
Hossein Mahjub http://orcid.org/0000-0002-9375-3807
Reza Majdzadeh http://orcid.org/0000-0003-3989-4470
References
[1] W.H. Aeberhard, E. Cantoni, and S. Heritier, Robust inference in the negative binomial regression
model with an application to falls data. Biometrics 70 (2014), pp. 920–931.
[2] A. Almasi, M.R. Eshraghian, A. Moghimbeigi, A. Rahimi, K. Mohammad, and S. Fallahigilan,
Multilevel zero-inflated generalized Poisson regression modeling for dispersed correlated count
data. Stat. Methodol. 30 (2016), pp. 1–14.
[3] A. Beatty, Recent Fertility Trends in Sub-Saharan Africa: Workshop Summary, National
Academies Press, Washington, DC, 2016.
[4] A.M. Bianco and V.J. Yohai, Robust estimation in the logistic regression model, in Robust
Statistics, Data Analysis, and Computer Intensive Methods, Springer, New York, NY, 1996,
pp. 17–34.
[5] L. Bilder, E. Stepco, D. Uncuta, D. Aizenbud, E. Machtei, A. Bilder, and H.D. Sgan-Cohen, The
pathfinder study among schoolchildren in the Republic of Moldova: Dental caries experience. Int.
Dent. J. 68 (2018), pp. 344–347.
[6] T. Cakar, L. Harrison-Barry, M. Pukallus, S. Kazoullis, and W. Seow, Caries experience of
children in primary schools with long-term tooth brushing programs: A pilot Australian study.
Int. J. Dent. Hyg. 16 (2018), pp. 233–240.
[7] E. Cantoni, A robust approach to longitudinal data analysis. Canad. J. Statist. 32 (2004),
pp. 169–180.
[8] E. Cantoni and E. Ronchetti, Robust inference for generalized linear models. J. Am. Stat. Assoc.
96 (2001), pp. 1022–1030.
[9] R.J. Carroll and S. Pederson, On robustness in the logistic regression model. J. R. Stat. Soc. Ser.
B. Stat. Methodol. 55 (1993), pp. 693–706.
[10] R. Carroll, D. Ruppert, and L. Stefanski, Nonlinear Measurement Error Models, Monographs on
Statistics and Applied Probability. Vol. 63, Chapman and Hall, New York, 1995.
[11] R. Chambers, E. Dreassi, and N. Salvati, Disease mapping via negative binomial M-quantile
regression, (2013).
[12] C.D. Desjardins, Modeling zero-inflated and overdispersed count data: An empirical study of
school suspensions. J. Exp. Educ. 84 (2016), pp. 449–472. doi:10.1080/00220973.2015.1054334.
[13] F. Famoye and K.P. Singh, Zero-inflated generalized Poisson regression model with an application
to domestic violence data. J. Data. Sci. 4 (2006), pp. 117–130.
[14] F.A. Farooqi, A. Khabeer, I.A. Moheet, S.Q. Khan, and I. Farooq, Prevalence of dental caries in
primary and permanent teeth and its relation with tooth brushing habits among schoolchildren
in eastern Saudi Arabia. Saudi Med. J. 36 (2015), pp. 737–742.
[15] J. Feng, H. Xu, S. Mannor, and S. Yan, Robust logistic regression and classification, in Advances
in Neural Information Processing Systems, (2014), pp. 253–261.
[16] W.H. Greene, Accounting for excess zeros and sample selection in Poisson and negative
binomial regression models, (1994).
[17] D.B. Hall, Zero-inflated Poisson and binomial regression with random effects: A case study.
Biometrics 56 (2000), pp. 1030–1039.
[18] D.B. Hall and J. Shen, Robust estimation for zero-inflated Poisson regression. Scand. J. Stat.
37 (2010), pp. 237–252.
[19] F. Hampel, E. Ronchetti, P. Rousseeuw, and W. Stahel, Robust Statistics, John Wiley & Sons,
New York, 1986.
[20] S. Heritier, E. Cantoni, S. Copt, and M.-P. Victoria-Feser, Robust Methods in Biostatistics,
Vol. 825, John Wiley & Sons, Chichester, 2009.
[21] S. Hosseinian, Robust Inference for Generalized Linear Models: Binary and Poisson Regression,
publisher not identified (Lausanne), 2009.
[22] P.J. Huber, Robust estimation of a location parameter. Ann. Math. Stat. 35 (1964),
pp. 73–101.
[23] K. Hur, D. Hedeker, W. Henderson, S. Khuri, and J. Daley, Modeling clustered count data with
excess zeros in health care outcomes research. Health Serv. Outcomes Res. Methodol. 3 (2002),
pp. 5–20.
[24] A. Kabir, G. Jahan, and R. Jahan, Female age at marriage as a determinant of fertility. Sciences.
1 (2001), pp. 372–376.
[25] N. Kordzakhia, G. Mishra, and L. Reiersølmoen, Robust estimation in the logistic regression
model. J. Stat. Plan. Inference. 98 (2001), pp. 211–223.
[26] A. Kumar Acharya, The influence of female age at marriage on fertility and child loss in India.
Trayectorias 12 (2010), pp. 61–80.
[27] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing.
Technometrics. 34 (1992), pp. 1–14.
[28] A.H. Lee, K. Wang, J.A. Scott, K.K. Yau, and G.J. McLachlan, Multi-level zero-inflated
Poisson regression modelling of correlated count data with excess zeros. Stat. Methods Med.
Res. 15 (2006), pp. 47–61.
[29] H. Leridon, Demographic effects of the introduction of steroid contraception in developed
countries. Hum. Reprod. Update 12 (2006), pp. 603–616.
[30] C.-J. Liu, W. Zhou, and X.-S. Feng, Dental caries status of students from migrant primary schools
in Shanghai Pudong New Area. BMC. Oral. Health. 16 (2016), p. 28.
[31] C.L. Mallows, On Some Topics in Robustness, Unpublished Memorandum, Bell Telephone
Laboratories, Murray Hill, NJ, 1975.
[32] G. McLachlan, On the EM algorithm for overdispersed count data. Stat. Methods Med. Res.
6 (1997), pp. 76–98.
[33] A. Moghimbeigi, A score test for extra zeros in negative binomial mixed models. J. Stat. Comput.
Simul. 81 (2011), pp. 635–644.
[34] A. Moghimbeigi, M.R. Eshraghian, K. Mohammad, and B. McArdle, Multilevel zero-inflated
negative binomial regression modeling for over-dispersed count data with extra zeros. J. Appl.
Stat. 35 (2008), pp. 1193–1202.
[35] A. Moghimbeigi, M.R. Eshraghian, K. Mohammad, and B. McArdle, A score test for zero-
inflation in multilevel count data. Comput. Stat. Data. Anal. 53 (2009), pp. 1239–1248.
[36] J. Mullahy, Specification and testing of some modified count data models. J. Econom. 33 (1986),
pp. 341–365.
[37] S. Ngedup, M.A. Lee, D. Phurpa, and N. Wangmo, Maternal oral health: An examination survey
conducted in three referral hospitals in Bhutan. Bhutan Health J. 4 (2018), pp. 23–32.
[38] M. Păcurar, E. Bud, H. Alexandra, M. Chibelean, and M.C. Figueiredo, Clinical-statistical
analysis of correlations between caries risk indicators and the prevalence of maxillary dental
anomalies in a group of children from Tirgu Mures.
[39] O. Rosen, W. Jiang, and M.A. Tanner, Mixtures of marginal models. Biometrika 87 (2000),
pp. 391–404.
[40] S. Sanadhya, P. Aapaliya, S. Jain, N. Sharma, G. Choudhary, and N. Dobaria, Assessment and
comparison of clinical dental status and its impact on oral health-related quality of life among
rural and urban adults of Udaipur, India: A cross-sectional study. J. Basic. Clin. Pharm. 6 (2015),
p. 50.
[41] J. Shen, Robust Estimation and Inference in Finite Mixtures of Generalized Linear Models,
University of Georgia, 2006.
[42] A. Sujlana and P.K. Pannu, Family related factors associated with caries prevalence in the primary
dentition of five-year-old children. J. Indian Soc. Pedod. Prev. Dent. 33 (2015), p. 83.
[43] K.M. Thwin, W.T. Lin, and A. Than, A pilot study of oral health situation of chin population
in West Hilly Regions of Myanmar.
[44] E. Van De Walle, Age at marriage and fertility (Implications for family planning). IPPF. Med.
Bull. 7 (1973), p. 1.
[45] Y.-G. Wang, X. Lin, M. Zhu, and Z. Bai, Robust estimation using the Huber function with a
data-dependent tuning constant. J. Comput. Graph. Stat. 16 (2007), pp. 468–481.
[46] J. Winter, A. Jablonski-Momeni, A. Ladda, and K. Pieper, Effect of supervised brushing with
fluoride gel during primary school, taking into account the group prevention schedule in
kindergarten. Clin. Oral Investig. 21 (2017), pp. 2101–2107.
[47] K.K. Yau, K. Wang, and A.H. Lee, Zero-inflated negative binomial mixed regression modeling of
over-dispersed count data with extra zeros. Biom. J. 45 (2003), pp. 437–452.
[48] H. Zhu, S. Luo, and S.M. DeSantis, Zero-inflated count models for longitudinal measurements
with heterogeneous random effects. Stat. Methods Med. Res. 26 (2017), pp. 1774–1786.