Publisher: Taylor & Francis

Journal of Statistical Computation and Simulation
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/gscs20

Mixture of linear mixed models using multivariate t distribution
Xiuqin Bai (a), Kun Chen (b) & Weixin Yao (c)
(a) Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
(b) Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
(c) Department of Statistics, University of California, Riverside, CA 92521, USA
Published online: 22 Apr 2015.

To cite this article: Xiuqin Bai, Kun Chen & Weixin Yao (2015): Mixture of linear mixed models
using multivariate t distribution, Journal of Statistical Computation and Simulation, DOI:
10.1080/00949655.2015.1036431

To link to this article: http://dx.doi.org/10.1080/00949655.2015.1036431

Journal of Statistical Computation and Simulation, 2015
http://dx.doi.org/10.1080/00949655.2015.1036431

Mixture of linear mixed models using multivariate t distribution


Xiuqin Bai (a), Kun Chen (b) and Weixin Yao (c)*
(a) Department of Statistics, Kansas State University, Manhattan, KS 66506, USA; (b) Department of Statistics, University of Connecticut, Storrs, CT 06269, USA; (c) Department of Statistics, University of California, Riverside, CA 92521, USA

(Received 18 December 2014; accepted 28 March 2015)

Linear mixed models are widely used when multiple correlated measurements are made on each unit of
interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be
more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in
which both the random effects and the error parts are assumed to follow normal distribution, is sensitive
to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and infer-
ence. We propose a new mixture linear mixed model using multivariate t distribution. For each mixture
component, we assume the response and the random effects jointly follow a multivariate t distribution,
to conveniently robustify the estimation procedure. An efficient expectation conditional maximization
algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parame-
ters of the t distributions are chosen data adaptively, for achieving flexible trade-off between estimation
robustness and efficiency. Simulation studies and an application on analysing lung growth longitudinal
data showcase the efficacy of the proposed approach.

Keywords: ECM algorithm; linear mixed models; longitudinal data; mixture models; multivariate t
distribution

1. Introduction

The classical linear mixed model can be expressed as

y = Xβ + Ub + ε,   (1)

where y is an N × 1 response vector, X is an N × p design matrix for the fixed effects, β is a p × 1 vector of fixed-effect coefficients, U is an N × q design matrix for the random effects, b ∼ N_q(0, Σ) is a q × 1 vector of random-effect coefficients, and ε is an N × 1 vector of observation errors, assumed to follow a multivariate normal distribution with mean zero and covariance matrix Λ. Based on the above model set-up, Xβ models the fixed effects and Ub models the random effects. It follows that y has a multivariate normal distribution with mean E(y) = Xβ and covariance matrix V = cov(y) = UΣUᵀ + Λ. For clarity and without loss of generality, in the sequel we shall mainly refer to the repeated-measurement set-up when presenting our proposed methodology, similar to Celeux et al.[1]

*Corresponding author. Email: weixin.yao@ucr.edu

© 2015 Taylor & Francis


In many applications, however, the underlying assumption that the regression relationship is
homogeneous across all the subjects could be violated. Of particular interest here is the situation
that the subjects may form several distinct clusters, indicating mixed regression relationships.
Such heterogeneity can be modelled by a finite mixture of linear mixed regression models, consisting of, say, m homogeneous groups/components.[1–3] Suppose there are I subjects under study, and ni repeated measurements are gathered on the ith subject, for i = 1, . . . , I. We consider a finite mixture of linear mixed regression models as follows. For each i = 1, . . . , I, let Zi be a latent variable with P(Zi = j) = πj, j = 1, . . . , m. Given Zi = j, we assume that the response yi ∈ R^{ni} follows a linear mixed model, i.e.

yi = Xi β j + Ui bij + eij , (2)

where Xi ∈ R^{ni×p} is the fixed-effect covariate matrix, βj ∈ R^p a fixed-effect coefficient vector, Ui ∈ R^{ni×q} the random-effect covariate matrix, bij the random-effect coefficient vector, which is treated as random, and eij the random error vector. Following the conventional formulations of the normal mixture model and the mixed model, it is natural to assume that

bij ∼ N_q(0, Σj),   eij ∼ N_{ni}(0, Λij),

and all bij's and eij's, for i = 1, . . . , I and j = 1, . . . , m, are independent. Usually each error covariance matrix Λij is assumed to depend on i only through its dimension, e.g. an AR(1) correlation structure with some correlation parameter ρ, so that Λij = Λ(ρ, i). The correlation structure among the ni observations on subject i is induced and modelled by the random component Ui bij. Conditional on Zi = j, the joint distribution of (yi, bij) is

[yi; bij] | Zi = j ∼ N_{ni+q}( [Xi βj; 0], [Ui Σj Uiᵀ + Λij, Ui Σj; Σj Uiᵀ, Σj] ),   (3)

and the mixture distribution of yi itself, without observing Zi, is

yi ∼ ∑_{j=1}^m πj N_{ni}(Xi βj, Ui Σj Uiᵀ + Λij).   (4)

Although the above normal mixture linear mixed model is quite appealing in modelling the
regression relationship with the aforementioned hierarchically clustered data, one drawback of
the model is that it can be very sensitive to outliers, an undesirable property inherited from the
normal mixture model.
Motivated by Lange et al.,[4] Welsh and Richardson,[5] and Pinheiro et al.,[6] we propose
a new mixture linear mixed model by replacing the normal distribution with the multivariate t
distribution. For each mixture component, we assume the response and the random effects jointly
follow a multivariate t distribution, in a similar fashion as Equation (3), to conveniently robustify
the estimation procedure. An efficient expectation conditional maximization (ECM) algorithm is
developed for conducting maximum likelihood estimation. The degrees of freedom parameters of
the t distributions are chosen data adaptively for achieving flexible trade-off between estimation
robustness and efficiency. We demonstrate via simulation study that the proposed approach is
indeed robust and can be much more efficient than the traditional normal mixture model when
outliers are present in the data, and in the absence of outliers the proposed approach leads to
comparable performance to that of the normal mixture model. An application on lung growth of
children further showcases the efficacy of the proposed approach.
The rest of this paper is organized as follows. In Section 2, we introduce our new method
of using the multivariate t distribution in mixture of linear mixed-effects models. In Section 3,
we provide a simulation study to compare our new method with the traditional normal-based
estimation method. In Section 4, an application of the new method to a real data set is provided.
We conclude this paper with some discussion in Section 5.

2. Robust t-mixture linear mixed models

2.1. The t-mixture of linear mixed models

In practice, outliers and anomalies are bound to occur, and failure to accommodate them may put both the model estimation and inference in jeopardy. This motivates us to construct a robust t-mixture of linear mixed models. Given Zi = j, we start by assuming that the joint distribution of
(yi, bij) is

[yi; bij] | Zi = j ∼ t_{ni+q}( [Xi βj; 0], [Ui Σj Uiᵀ + Λij, Ui Σj; Σj Uiᵀ, Σj], νj ),   (5)
where we use t_n(μ, Σ, ν) to denote an n-dimensional multivariate t distribution with mean vector μ, scale matrix Σ and degrees of freedom ν; in the sequel we also use t_n(·; μ, Σ, ν) to denote its probability density function. Throughout, the error covariance matrices are assumed to take the form Λij = σj² Ri, for i = 1, . . . , I, j = 1, . . . , m, where the Ri are known matrices, taken to be the identity matrix unless otherwise noted.
The proposed approach essentially assumes that yi follows a mixture distribution,

yi ∼ ∑_{j=1}^m πj t_{ni}(Xi βj, Ui Σj Uiᵀ + Λij, νj),   (6)

and given the observed data for i = 1, . . . , I, the log-likelihood function is

∑_{i=1}^I ln{ ∑_{j=1}^m πj t_{ni}(yi; Xi βj, Ui Σj Uiᵀ + Λij, νj) }.   (7)

Compared to model (4), we have used the multivariate t distribution to replace the multivariate normal distribution, following a similar idea in [4]. This extension allows us to carry out the mixture mixed-effect model analysis for data involving errors with longer-than-normal tails. The degrees of freedom parameters of the t-distributed components are unknown and estimated from the data, and this provides a convenient way of achieving a flexible trade-off between robustness and efficiency: in the special case ν = 1, the distribution becomes a multivariate Cauchy distribution, and as ν → ∞, the distribution reduces to the multivariate normal. Also note that in the above model, we have directly specified the distribution of yi as multivariate t, instead of separately specifying the distributions of the random effects and the error terms, as the latter is unnecessary and may lead to an intractable or inconvenient marginal distribution of yi.
To better understand model (6), we shall discuss several of its alternative representations, which ultimately facilitate the maximum likelihood estimation elaborated in the next section. It is known that the multivariate t distribution can be written as a normal scale mixture, i.e. its probability density function t(x; μ, Σ, ν) can be expressed as

t(x; μ, Σ, ν) = ∫_0^∞ f(x; μ, Σ/u) g(u; ν/2, ν/2) du,
where f denotes the normal density and g the Gamma density. In light of the above characterization, it is convenient to express model (6) as a hierarchical model,

yi | bij, τij, j = 1, . . . , m ∼ ∑_{j=1}^m πj N( Xi βj + Ui bij, (1/τij) Λij ),
bij | τij ∼ N( 0, (1/τij) Σj )   for j = 1, . . . , m,
τij ∼ Gamma( νj/2, νj/2 )   for j = 1, . . . , m.   (8)
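The normal scale-mixture representation underlying this hierarchy can be checked numerically. The sketch below is ours (not the authors' code) and assumes Python with NumPy/SciPy; it compares the univariate t density with the quadrature of the conditional normal density against the Gamma(ν/2, ν/2) mixing density.

```python
# Numerical check of the normal scale-mixture representation of the t density:
# t(x; mu, sigma^2, nu) = integral over u > 0 of
#   N(x; mu, sigma^2 / u) * Gamma(u; shape = nu/2, rate = nu/2) du.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def t_pdf_via_mixture(x, mu, sigma2, nu):
    """Integrate the conditional normal density against the Gamma(nu/2, nu/2)
    mixing law (scipy's gamma uses scale = 1/rate, hence scale = 2/nu)."""
    integrand = lambda u: (stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2 / u))
                           * stats.gamma.pdf(u, a=nu / 2, scale=2 / nu))
    value, _ = quad(integrand, 0, np.inf)
    return value

x, mu, sigma2, nu = 1.7, 0.5, 2.0, 3.0
direct = stats.t.pdf(x, df=nu, loc=mu, scale=np.sqrt(sigma2))
assert abs(direct - t_pdf_via_mixture(x, mu, sigma2, nu)) < 1e-8
```

The same identity, applied componentwise, is what makes the conditional updates of the ECM algorithm in the next section tractable.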
Model (6) could also be written in a conventional linear mixed model form. Given Zi = j,
yi = Xi β j + Ui bij + eij , i = 1, . . . , I,

where bij ∼ t_q(0, Σj, νj) and eij ∼ t_{ni}(0, Λij, νj). Conditional on τij, bij is independent of eij, which means that in general bij and eij are uncorrelated but not independent, for any νj < ∞.[6] It is now clear that in our proposed method both bij and eij follow multivariate t distributions, and thus the method is robust against potential outliers in both the random effects and the within-subject random errors.
By integrating out bij, the hierarchical model can be equivalently expressed as

yi | τij, j = 1, . . . , m ∼ ∑_{j=1}^m πj N( Xi βj, (1/τij)(Ui Σj Uiᵀ + Λij) ),
τij ∼ Gamma( νj/2, νj/2 )   for j = 1, . . . , m.
The conditional distribution of τij can then be readily derived,

τij | yi, Zi = j ∼ Gamma( (νj + ni)/2, (νj + δij²(βj, Σj, σj²))/2 ),

where

δij²(βj, Σj, σj²) = (yi − Xi βj)ᵀ (Ui Σj Uiᵀ + Λij)⁻¹ (yi − Xi βj).   (9)

Therefore,

E(τij | yi, Zi = j) = (νj + ni) / (νj + δij²(βj, Σj, σj²)).   (10)
The above results will be useful in the proposed ECM algorithm in the next section.
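Equation (10) is the source of the method's robustness: a subject whose Mahalanobis distance δij² is large receives a small expected weight E(τij | yi, Zi = j) and is therefore downweighted. A small sketch (our own naming and illustrative values, not from the paper; Python with NumPy assumed):

```python
# Sketch of the downweighting implied by Equations (9)-(10): an outlying
# subject receives a smaller expected weight E(tau_ij | y_i, Z_i = j).
import numpy as np

def tau_weight(y, X, beta, U, Sigma, Lam, nu):
    """E(tau | y, Z = j) = (nu + n_i) / (nu + delta^2), Equation (10)."""
    V = U @ Sigma @ U.T + Lam                 # marginal covariance of y_i
    r = y - X @ beta                          # residual from the component mean
    delta2 = r @ np.linalg.solve(V, r)        # Mahalanobis distance, Eq. (9)
    return (nu + len(y)) / (nu + delta2)

rng = np.random.default_rng(0)
ni, p, q, nu = 5, 2, 1, 3.0
X = rng.normal(size=(ni, p)); U = rng.normal(size=(ni, q))
beta = np.array([1.0, -1.0]); Sigma = np.eye(q); Lam = np.eye(ni)
y_typical = X @ beta + rng.normal(size=ni)
y_outlier = X @ beta + 20.0                   # grossly shifted subject
w_typ = tau_weight(y_typical, X, beta, U, Sigma, Lam, nu)
w_out = tau_weight(y_outlier, X, beta, U, Sigma, Lam, nu)
assert w_out < w_typ                          # the outlier is downweighted
```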

2.2. An efficient ECM algorithm for maximum likelihood estimation

We propose to conduct maximum likelihood estimation and inference for the proposed robust t-mixture linear mixed model. Direct maximization of the log-likelihood function (7), constructed from mixtures of multivariate t distributions, is quite difficult. In this section, we derive an efficient ECM algorithm to solve the problem, extending the works of Lange et al. [4] and Pinheiro et al. [6] to the more general context of mixture models. The EM algorithm is commonly applied in problems with missing or incomplete data, which is particularly suitable here, in view of the alternative hierarchical representation of the t-mixture model in Equation (8).
Let y = {y1, . . . , yI}, b = {bij; i = 1, . . . , I, j = 1, . . . , m}, and τ = {τij; i = 1, . . . , I, j = 1, . . . , m}. Let

Zij = 1 if the ith subject is from the jth mixture component, and Zij = 0 otherwise,

and Z = {Zij; i = 1, . . . , I, j = 1, . . . , m}. Similarly, let π = {πj; j = 1, . . . , m}, β = {βj; j = 1, . . . , m}, Σ = {Σj; j = 1, . . . , m}, σ² = {σj²; j = 1, . . . , m}, and ν = {νj; j = 1, . . . , m}.
In our problem, y is the observed response vector, while (b, τ , Z) can be viewed as the missing
data. Based on the hierarchical model formulation in Equation (8), the likelihood of the complete
data (y, b, τ , Z) given the covariates (Xi , Ui ) is,
∏_{i=1}^I ∏_{j=1}^m [ πj f( yi; Xi βj + Ui bij, (1/τij) Λij ) f( bij; 0, (1/τij) Σj ) g( τij; νj/2, νj/2 ) ]^{Zij}.

It follows that the complete log-likelihood function is

ℓ(π, β, Σ, σ², ν | y, b, τ, Z)
  = ∑_{i=1}^I ∑_{j=1}^m Zij ln(πj)
  + ∑_{i=1}^I ∑_{j=1}^m Zij { −(1/2) ln|(1/τij) σj² Ri| − (1/2) Eijᵀ ((1/τij) σj² Ri)⁻¹ Eij } + const
  + ∑_{i=1}^I ∑_{j=1}^m Zij { −(1/2) ln|(1/τij) Σj| − (1/2) bijᵀ ((1/τij) Σj)⁻¹ bij } + const
  + ∑_{i=1}^I ∑_{j=1}^m Zij { (νj/2 − 1) ln(τij) − (νj/2) τij − ln Γ(νj/2) + (νj/2) ln(νj/2) },

where Eij = yi − Xi βj − Ui bij, and we have adopted the setting that Λij = σj² Ri. Based on the idea of the ECM algorithm, we shall separate the above log-likelihood function into four parts, according to the parameters involved, i.e. let

ℓ(π, β, Σ, σ², ν | y, b, τ, Z) = ℓ0(π | y, b, τ, Z) + ℓ1(β, σ² | y, b, τ, Z) + ℓ2(Σ | y, b, τ, Z) + ℓ3(ν | y, b, τ, Z),

where

ℓ0(π | y, b, τ, Z) = ∑_{i=1}^I ∑_{j=1}^m Zij ln(πj),

ℓ1(β, σ² | y, b, τ, Z) = ∑_{i=1}^I ∑_{j=1}^m Zij { −(ni/2) ln σj² − (τij/(2σj²)) Eijᵀ Ri⁻¹ Eij }
  = −∑_{i=1}^I ∑_{j=1}^m Zij (ni/2) ln σj²
  − ∑_{i=1}^I ∑_{j=1}^m Zij (τij/(2σj²)) tr{ Ri⁻¹ (yi − Ui bij)(yi − Ui bij)ᵀ }
  + ∑_{i=1}^I ∑_{j=1}^m Zij (τij/σj²) βjᵀ Xiᵀ Ri⁻¹ (yi − Ui bij)
  − ∑_{i=1}^I ∑_{j=1}^m Zij (τij/(2σj²)) βjᵀ Xiᵀ Ri⁻¹ Xi βj,

ℓ2(Σ | y, b, τ, Z) = ∑_{i=1}^I ∑_{j=1}^m Zij { −(1/2) ln|Σj| } − (1/2) ∑_{i=1}^I ∑_{j=1}^m Zij τij bijᵀ Σj⁻¹ bij
  = −(1/2) ∑_{i=1}^I ∑_{j=1}^m Zij ln|Σj| − (1/2) ∑_{j=1}^m tr( Σj⁻¹ ∑_{i=1}^I Zij τij bij bijᵀ ),

and

ℓ3(ν | y, b, τ, Z) = ∑_{i=1}^I ∑_{j=1}^m Zij { (νj/2) ln(νj/2) + (νj/2) ln(τij) − (νj/2) τij − ln(τij) − ln Γ(νj/2) }.

Let θ = (π, β, Σ, σ², ν) collect all the unknown parameters. Given θ = θ̂, we now derive the expected complete-data log-likelihood, E{ℓ(θ | y, b, τ, Z) | y, θ̂}, with respect to the missing data (b, τ, Z) and conditional on the observed data y. This reduces to the calculation of the following quantities:

pij = P(Zij = 1 | θ = θ̂, y),
τ̂ij = E(τij | θ = θ̂, y, Zij = 1),
b̂ij = E(bij | θ = θ̂, y, Zij = 1, τij),
Ω̂ij = τij cov(bij | θ = θ̂, y, Zij = 1, τij).

From Equation (6), it is easy to show that

pij = π̂j t_{ni}( yi; Xi β̂j, Ui Σ̂j Uiᵀ + Λ̂ij, ν̂j ) / ∑_{l=1}^m π̂l t_{ni}( yi; Xi β̂l, Ui Σ̂l Uiᵀ + Λ̂il, ν̂l ).   (11)

By Equation (10), we have

τ̂ij = (ν̂j + ni) / (ν̂j + δij²(β̂j, Σ̂j, σ̂j²)),   (12)

where δij²(β̂j, Σ̂j, σ̂j²) is defined as in Equation (9). Next, based on the assumed multivariate t model (5) and the normal scale mixture representation,

bij | yi, Zij = 1, τij ∼ N_q( A(yi − Xi βj), (1/τij)(Σj − A Ui Σj) ),
where A = Σj Uiᵀ (Ui Σj Uiᵀ + σj² Ri)⁻¹. It follows that

b̂ij = Σ̂j Uiᵀ (Ui Σ̂j Uiᵀ + σ̂j² Ri)⁻¹ (yi − Xi β̂j)
     = ( Σ̂j⁻¹ + (1/σ̂j²) Uiᵀ Ri⁻¹ Ui )⁻¹ (1/σ̂j²) Uiᵀ Ri⁻¹ (yi − Xi β̂j),   (13)

and

Ω̂ij = Σ̂j − Σ̂j Uiᵀ (Ui Σ̂j Uiᵀ + σ̂j² Ri)⁻¹ Ui Σ̂j = ( Σ̂j⁻¹ + (1/σ̂j²) Uiᵀ Ri⁻¹ Ui )⁻¹.   (14)
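Equations (11)–(14) involve only standard linear algebra and can be computed subject by subject. The sketch below is ours (not the authors' implementation), with Ri = I so that Λij = σj² I, and with scipy's multivariate t density used for the component weights.

```python
# One E-step pass for a single subject: p_ij (Eq. 11), tau_hat (Eq. 12),
# b_hat (Eq. 13) and Omega_hat (Eq. 14); a sketch with our own naming,
# assuming R_i = I so that Lambda_ij = sigma_j^2 I.
import numpy as np
from scipy.stats import multivariate_t

def e_step_subject(y, X, U, pi, beta, Sigma, sigma2, nu):
    m, ni = len(pi), len(y)
    dens = np.empty(m)
    tau_hat = np.empty(m)
    b_hat, Omega_hat = [], []
    for j in range(m):
        V = U @ Sigma[j] @ U.T + sigma2[j] * np.eye(ni)   # marginal covariance
        mu = X @ beta[j]
        dens[j] = multivariate_t(loc=mu, shape=V, df=nu[j]).pdf(y)
        r = y - mu
        delta2 = r @ np.linalg.solve(V, r)                # Eq. (9)
        tau_hat[j] = (nu[j] + ni) / (nu[j] + delta2)      # Eq. (12)
        A = Sigma[j] @ U.T @ np.linalg.inv(V)
        b_hat.append(A @ r)                               # Eq. (13)
        Omega_hat.append(Sigma[j] - A @ U @ Sigma[j])     # Eq. (14)
    p = pi * dens
    p /= p.sum()                                          # Eq. (11)
    return p, tau_hat, b_hat, Omega_hat

rng = np.random.default_rng(1)
ni, q = 4, 1
X = rng.normal(size=(ni, 2)); U = rng.normal(size=(ni, q))
pi = np.array([0.4, 0.6])
beta = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
Sigma = [np.eye(q), np.eye(q)]; sigma2 = [1.0, 1.0]; nu = [3.0, 3.0]
y = X @ beta[0] + rng.normal(size=ni)
p, tau, b, Om = e_step_subject(y, X, U, pi, beta, Sigma, sigma2, nu)
assert abs(p.sum() - 1.0) < 1e-12 and all(t > 0 for t in tau)
```

In a full implementation these quantities would be stored for all i and j and fed into the expected log-likelihood components (15)–(18) below.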
With all the above results, we have

E(ℓ0(π | y, b, τ, Z) | y, θ̂) = ∑_{i=1}^I ∑_{j=1}^m pij ln(πj),   (15)

E(ℓ1(β, σ² | y, b, τ, Z) | y, θ̂)
  = −∑_{i=1}^I ∑_{j=1}^m pij (ni/2) ln σj²
  − ∑_{i=1}^I ∑_{j=1}^m pij (1/(2σj²)) tr[ Ri⁻¹ { Ui Ω̂ij Uiᵀ + τ̂ij (yi − Ui b̂ij)(yi − Ui b̂ij)ᵀ } ]
  + ∑_{i=1}^I ∑_{j=1}^m pij (τ̂ij/σj²) βjᵀ Xiᵀ Ri⁻¹ (yi − Ui b̂ij)
  − ∑_{i=1}^I ∑_{j=1}^m pij (τ̂ij/(2σj²)) βjᵀ Xiᵀ Ri⁻¹ Xi βj,   (16)

E(ℓ2(Σ | y, b, τ, Z) | y, θ̂)
  = −(1/2) ∑_{i=1}^I ∑_{j=1}^m pij ln|Σj| − (1/2) ∑_{j=1}^m tr{ Σj⁻¹ ∑_{i=1}^I pij ( τ̂ij b̂ij b̂ijᵀ + Ω̂ij ) },   (17)

and

E(ℓ3(ν | y, b, τ, Z) | y, θ̂)
  = ∑_{i=1}^I ∑_{j=1}^m pij { (νj/2) ln(νj/2) + (νj/2) ( E[ln(τij) | y, θ̂, Zij = 1] − τ̂ij )
    − E[ln(τij) | y, θ̂, Zij = 1] − ln Γ(νj/2) }.   (18)

Following Khodabina and Alireza [7] and based on properties of the generalized Gamma distribution,

E(ln(τij) | y, θ̂, Zij = 1) = ln(τ̂ij) + ψ( (νj + ni)/2 ) − ln( (νj + ni)/2 ),

where ψ(x) = Γ′(x)/Γ(x) denotes the digamma function.
Now, we are ready to fully describe our proposed ECM algorithm [8] for conducting maximum
likelihood estimation.
Initialization: Set k = 0; obtain some initial estimates of the parameters θ(0), including πj(0), βj(0), Σj(0), νj(0), and σj²(0), for j = 1, . . . , m.
E-step: At the (k + 1)th iteration, given θ = θ(k), compute pij(k+1), bij(k+1), τij(k+1) and Ωij(k+1) based on Equations (11)–(14), respectively, for i = 1, . . . , I and j = 1, . . . , m. Subsequently, the four components of the expected complete log-likelihood can be constructed from Equations (15)–(18), respectively.
CM-step:
M-0: Obtain πj(k+1), j = 1, . . . , m, by maximizing E(ℓ0(π | y, b, τ, Z) | y, θ(k)) with respect to π:

πj(k+1) = (1/I) ∑_{i=1}^I pij(k+1).

M-1: Given σj² = σj²(k), j = 1, . . . , m, obtain βj(k+1), j = 1, . . . , m, by maximizing E(ℓ1(β, σ²(k) | y, b, τ, Z) | y, θ(k)) with respect to β:

βj(k+1) = ( ∑_{i=1}^I pij(k+1) (τij(k+1)/σj²(k)) Xiᵀ Ri⁻¹ Xi )⁻¹ ( ∑_{i=1}^I pij(k+1) (τij(k+1)/σj²(k)) Xiᵀ Ri⁻¹ (yi − Ui bij(k+1)) ).

M-2: Given βj = βj(k+1), j = 1, . . . , m, obtain σj²(k+1), j = 1, . . . , m, by maximizing E(ℓ1(β(k+1), σ² | y, b, τ, Z) | y, θ(k)) with respect to σ²:

σj²(k+1) = [ ∑_{i=1}^I pij(k+1) { τij(k+1) Eijᵀ Ri⁻¹ Eij + tr( Ωij(k+1) Uiᵀ Ri⁻¹ Ui ) } ] / [ ∑_{i=1}^I pij(k+1) ni ],

where now Eij = yi − Xi βj(k+1) − Ui bij(k+1).

M-3: Obtain Σj(k+1), j = 1, . . . , m, by maximizing E(ℓ2(Σ | y, b, τ, Z) | y, θ(k)) with respect to Σ:

Σj(k+1) = [ ∑_{i=1}^I pij(k+1) ( τij(k+1) bij(k+1) (bij(k+1))ᵀ + Ωij(k+1) ) ] / [ ∑_{i=1}^I pij(k+1) ].

M-4: Obtain νj(k+1), j = 1, . . . , m, by maximizing E(ℓ3(ν | y, b, τ, Z) | y, θ(k)) with respect to ν:

νj(k+1) = arg max_{νj} ∑_{i=1}^I pij(k+1) { (νj/2) ln(νj/2) + (νj/2) ( E[ln(τij) | y, θ(k), Zij = 1] − τij(k+1) ) − ln Γ(νj/2) }.

The problem is separable in each νj. Although these one-dimensional problems do not admit explicit solutions, they can be solved by numerical optimization methods, e.g. the Newton–Raphson algorithm or the secant method. However, we find that this approach may not always be stable, partly due to the high nonlinearity of the objective function. Alternatively, we can replace M-4 by carrying out constrained estimation of the actual log-likelihood (7) with respect to the unknown degrees-of-freedom parameters, with all the other parameters held fixed at their currently updated values.[6]
M-4*: Obtain νj(k+1), j = 1, . . . , m, by maximizing Equation (7) with respect to ν, with π = π(k+1), β = β(k+1), Σ = Σ(k+1), and σ² = σ²(k+1).
In the case that νj = ν, j = 1, . . . , m, it is convenient to use a profile likelihood approach to
avoid either M-4 or M-4* step entirely in the ECM algorithm, i.e. conduct maximum likelihood
estimation with ν held fixed, for a grid of ν values, say, ν = 1, . . . , 20, and then the final estimate
of the degrees of freedom is selected as the one that gives the largest log-likelihood.
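The profile likelihood strategy amounts to a one-dimensional grid search. A minimal sketch, where `fit_fixed_nu` is a hypothetical callable standing in for running the ECM iterations with ν held fixed and returning the maximized log-likelihood (7); a mock objective is passed for illustration:

```python
# Profile-likelihood selection of a common degrees-of-freedom parameter nu:
# fit the model with nu fixed at each grid value and keep the nu attaining
# the largest maximized log-likelihood. `fit_fixed_nu` is a placeholder.
def profile_nu(fit_fixed_nu, grid=range(1, 21)):
    """Return the grid value of nu whose fixed-nu fit has the largest log-likelihood."""
    return max(grid, key=fit_fixed_nu)

# Mock log-likelihood profile peaking at nu = 5, standing in for the ECM fit.
best = profile_nu(lambda nu: -(nu - 5) ** 2)
assert best == 5
```

The grid search is embarrassingly parallel across ν values, which partly offsets the cost of repeating the ECM iterations.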
In the M-step, we do not aim to fully maximize the expected log-likelihood, as it requires iter-
atively solving M-1 and M-2, which may be computationally inefficient. Nevertheless, solving
each of the five subproblems once in the M-step monotonically increases the expected log-
likelihood, which implies that the stable monotone convergence property of the ECM algorithm
is preserved. The E-step and M-step are carried out alternatingly, until convergence is reached,
i.e. the log-likelihood function (7) stops increasing up to a small tolerance value. Based on our
limited experience, the proposed algorithm works well in terms of both computational stability
and efficiency.
There are many ways to find the initial values for {πj(0), βj(0), σj²(0), j = 1, . . . , m}. One method is to use trimmed likelihood estimates (TLE).[9] Note that the TLE is robust to both low-leverage and high-leverage outliers under certain general conditions.[9] Another possible method is to first randomly partition the data, or a subset of the data, into m groups. For each group, we use some robust regression method, such as the MM-estimate,[10] to estimate the component regression parameters. Similar partition ideas have been used to find initial values for finite mixture models.[11] In addition, we can also apply robust linear clustering methods to find the initial regression parameter values; see, e.g. Hennig [12,13] and García-Escudero et al.[14] Although, technically, the robust linear clustering methods do not produce consistent regression component estimators, in many cases they are close enough to provide good initial values, since the proposed algorithm does not require the initial values to be consistent.

3. Simulation study

We generate data from the following model:



yi = Xi β1 + Ui bi1 + ei1   if Zi = 1;
yi = Xi β2 + Ui bi2 + ei2   if Zi = 2,

where i = 1, . . . , I, β 1 = (1, 1, 0, 0)T , β 2 = (0, 0, 1, 1)T , and π1 = P(Zi = 1) = 0.4. The rows of
the covariates Xi ∈ Rni ×4 are independently generated from N4 (0, I). The rows of Ui ∈ Rni ×2 are
independently generated from N2 (0, I).
We consider the following three types of random effects and error distributions.
(1) t distribution: eij ∼ t_{ni}(0, Λij, ν), bij ∼ t_q(0, Σj, ν), and given τij, bij and eij are conditionally independent; that is, bij | τij ∼ N(0, (1/τij)Σj) and eij | τij ∼ N(0, (1/τij)Λij). We set Λij as an identity matrix and Σj as a matrix with diagonal elements 1 and off-diagonal elements 0.5. We consider three degrees-of-freedom values, i.e. ν ∈ {1, 3, 5}.
(2) Normal distribution: eij ∼ N_{ni}(0, Λij) and bij ∼ N_q(0, Σj), where we set Λij as an identity matrix and Σj as a matrix with diagonal elements 1 and off-diagonal elements 0.5.
(3) Contaminated normal distribution: eij ∼ 0.95 N_{ni}(0, I) + 0.05 N_{ni}(0, 25I) and bij ∼ 0.95 N_q(0, I) + 0.05 N_q(0, 25I).
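For concreteness, one way to generate a dataset from this design is sketched below (our own naming, shown for the contaminated-normal setting (3); each bij and eij vector is drawn from the 25-fold-variance component with probability 0.05):

```python
# Generate one dataset from the two-component simulation design above,
# using the contaminated-normal random effects and errors; a sketch only.
import numpy as np

def simulate(I=100, ni=8, p=4, q=2, pi1=0.4, seed=0):
    rng = np.random.default_rng(seed)
    beta = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
    data = []
    for _ in range(I):
        X = rng.normal(size=(ni, p))          # rows iid N_4(0, I)
        U = rng.normal(size=(ni, q))          # rows iid N_2(0, I)
        z = 0 if rng.random() < pi1 else 1    # latent component label
        # contaminated normal: N(0, I) w.p. 0.95, N(0, 25 I) w.p. 0.05
        s_b = 5.0 if rng.random() < 0.05 else 1.0
        s_e = 5.0 if rng.random() < 0.05 else 1.0
        b = s_b * rng.normal(size=q)
        e = s_e * rng.normal(size=ni)
        y = X @ beta[z] + U @ b + e
        data.append((y, X, U, z))
    return data

data = simulate()
assert len(data) == 100 and data[0][0].shape == (8,)
```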
We have experimented with various sample sizes and numbers of replicates per sample. In
particular, the following four cases are considered herein,
Case 1 ni = 8, I = 100.
Case 2 ni = 8, I = 200.
Case 3 ni = 4, I = 200.
Case 4 ni = 4, I = 400.
The simulation is replicated 500 times under each setting.
We mainly compare our proposed robust t-mixture method to the normal mixture mixed-effect approach. Our implemented EM algorithm can be readily modified to fit the normal mixture model; a more straightforward approach is to use our t-mixture EM algorithm with the degrees of freedom parameters held fixed at a large value, say 1000, so that the t distribution becomes very close to normal. Similar to Bordes et al. [15] and Hunter and Young,[16] we use the true parameter values as the initial values to start the EM algorithm, in order to avoid the possible bias introduced by different starting values among replications or label-switching issues,[17–20] so as to compare the 'best-case' results of the various estimation methods. The degrees of freedom estimates in the t-mixture model are determined based on the aforementioned profile likelihood approach.
In Table 1, we report the average degrees of freedom estimates using the t-mixture model
under the aforementioned five mixed effect and error structures. As the tail of the assumed mixed
effect and error distribution becomes heavier, the estimated degrees of freedom becomes smaller
Table 1. Degrees of freedom estimation results, based on 500 simulation runs.

                                  Estimated degrees of freedom
#replicates   #subjects     t1       t3       t5       Normal    Contaminated normal
ni = 8        I = 100      1.605    5.665    6.716    12.46     4.839
              I = 200      1.605    3.832    9.868    12.46     4.133
ni = 4        I = 200      2.575    6.149    7.199    10.65     5.665
              I = 400      1.488    5.252    7.766    12.46     3.015

Table 2. Simulation results for Case 1: ni = 8, I = 100.

                          Random effects and error distribution
Estimator                 t1        t3        t5        Normal    Contaminated N.
π̂1   Abias(NMM)         0.453     0.166     0.082     0.101     0.158
     Abias(tMM)         0.030     0.086     0.082     0.102     0.098
     Efficiency         3.063     1.400     1.000     1.000     1.556
β̂11  Abias(NMM)         0.495     0.469     0.248     0.001     0.192
     Abias(tMM)         0.325     0.022     0.001     0.000     0.032
     Efficiency         1.345     17.667    1.667     1.000     2.500
β̂21  Abias(NMM)         0.495     0.478     0.255     0.006     0.187
     Abias(tMM)         0.317     0.032     0.000     0.007     0.033
     Efficiency         1.292     15.714    1.250     1.000     3.000
β̂31  Abias(NMM)         0.485     0.481     0.248     0.002     0.189
     Abias(tMM)         0.319     0.033     0.001     0.003     0.032
     Efficiency         1.165     11.750    1.200     1.000     2.500
β̂41  Abias(NMM)         0.470     0.483     0.248     0.004     0.195
     Abias(tMM)         0.329     0.039     0.004     0.006     0.039
     Efficiency         1.380     14.750    1.250     1.000     3.000
β̂12  Abias(NMM)         4.652     0.609     0.254     0.004     0.185
     Abias(tMM)         0.304     0.034     0.006     0.003     0.034
     Efficiency         92.600    1.280     1.250     1.000     2.750
β̂22  Abias(NMM)         12.377    0.508     0.248     0.000     0.186
     Abias(tMM)         0.315     0.044     0.001     0.000     0.032
     Efficiency         92.600    3.750     1.500     1.000     3.000
β̂32  Abias(NMM)         3.722     0.387     0.250     0.001     0.207
     Abias(tMM)         0.289     0.038     0.001     0.001     0.037
     Efficiency         89.321    3.400     2.000     1.000     3.000
β̂42  Abias(NMM)         5.316     0.567     0.258     0.005     0.186
     Abias(tMM)         0.318     0.041     0.005     0.005     0.033
     Efficiency         72.457    3.400     1.200     1.000     3.000
Table 3. Simulation results for Case 2: ni = 8, I = 200.

                          Random effects and error distribution
Estimator                 t1        t3        t5        Normal    Contaminated N.
π̂1   Abias(NMM)         0.484     0.170     0.090     0.099     0.157
     Abias(tMM)         0.094     0.101     0.090     0.099     0.105
     Efficiency         3.907     1.000     1.000     1.000     1.273
β̂11  Abias(NMM)         0.514     0.440     0.145     0.002     0.167
     Abias(tMM)         0.015     0.001     0.002     0.003     0.003
     Efficiency         1.914     20.500    2.000     1.000     2.000
β̂21  Abias(NMM)         0.498     0.444     0.148     0.003     0.163
     Abias(tMM)         0.014     0.000     0.000     0.002     0.000
     Efficiency         1.942     20        1.500     0.500     2.500
β̂31  Abias(NMM)         0.524     0.442     0.149     0.000     0.168
     Abias(tMM)         0.012     0.003     0.003     0.001     0.007
     Efficiency         1.652     11.333    1.500     1.000     2.500
β̂41  Abias(NMM)         0.498     0.440     0.147     0.004     0.165
     Abias(tMM)         0.017     0.005     0.002     0.003     0.003
     Efficiency         1.748     20.000    2.000     1.000     2.000
β̂12  Abias(NMM)         32.976    0.429     0.141     0.002     0.194
     Abias(tMM)         0.013     0.001     0.000     0.002     0.004
     Efficiency         250.097   4.000     2.00      1.000     3.500
β̂22  Abias(NMM)         12.544    0.456     0.143     0.003     0.180
     Abias(tMM)         0.019     0.001     0.004     0.004     0.001
     Efficiency         207.036   4.000     2.000     1.000     4.000
β̂32  Abias(NMM)         4.837     0.440     0.143     0.000     0.200
     Abias(tMM)         0.017     0.003     0.002     0.001     0.005
     Efficiency         175.029   4.000     2.000     0.500     4.500
β̂42  Abias(NMM)         25.269    0.428     0.144     0.003     0.176
     Abias(tMM)         0.009     0.000     0.000     0.004     0.001
     Efficiency         204.545   4.500     1.500     1.000     4.000

on average as expected. Therefore, the proposed approach captures the tail behaviour of the
mixed effect and error distributions quite well.
In Tables 2–5, we report the absolute bias of the parameter estimates and the relative efficiencies of our proposed t-mixture method as compared to the conventional normal mixture model, based on median squared errors (MedSE). In Figures 1–2, we also show the MedSE for some of the parameter estimates for Cases 1 and 2. Our t-mixture approach works very well and consistently outperforms the normal mixture model when the random effects and error distributions have heavy tails or are contaminated by outliers. Even when the random effects and the error terms follow a normal distribution, the performance of the t-mixture model is comparable to that of the normal mixture model. This is essentially because the latter method can be treated as a special case of our proposed robust t-mixture model, and thus the efficiency loss is minimal when no outliers are present in the data. When the true model has t-distributed random effects and errors, the relative efficiency estimates may be very high. This is because the normal mixture model may fail miserably when applied to heavy-tailed Cauchy or close-to-Cauchy distributions.

4. Lung growth data analysis

We consider a longitudinal data set on the lung growth of girls, from a study of air pollution and health in six cities across the USA; see [21] for details of the study. Here, we focus on
Table 4. Simulation results for Case 3: ni = 4, I = 200.

                          Random effects and error distribution
Estimator                 t1        t3        t5        Normal    Contaminated N.
π̂1   Abias(NMM)         0.441     0.299     0.110     0.102     0.297
     Abias(tMM)         0.045     0.086     0.110     0.102     0.091
     Efficiency         2.633     3.000     1.000     1.000     8.500
β̂11  Abias(NMM)         2.195     0.465     0.215     0.006     0.302
     Abias(tMM)         0.123     0.000     0.012     0.004     0.020
     Efficiency         1.000     2.130     1.167     2.000     19.000
β̂21  Abias(NMM)         10.237    0.462     0.217     0.003     0.300
     Abias(tMM)         0.119     0.003     0.008     0.005     0.022
     Efficiency         0.927     20.100    1.200     1.500     5.567
β̂31  Abias(NMM)         27.226    0.454     0.209     0.000     0.300
     Abias(tMM)         0.122     0.003     0.007     0.002     0.019
     Efficiency         1.008     21.556    1.400     1.500     21.200
β̂41  Abias(NMM)         0.269     0.462     0.221     0.004     0.296
     Abias(tMM)         0.124     0.004     0.016     0.001     0.018
     Efficiency         1.029     25.375    1.500     2.000     18.833
β̂12  Abias(NMM)         13.042    0.647     0.207     0.006     0.302
     Abias(tMM)         0.135     0.018     0.003     0.005     0.018
     Efficiency         29.249    30.455    1.400     2.000     10.750
β̂22  Abias(NMM)         13.553    0.778     0.219     0.001     0.281
     Abias(tMM)         0.136     0.011     0.005     0.002     0.021
     Efficiency         37.756    23.222    1.167     2.000     12.000
β̂32  Abias(NMM)         13.130    0.068     0.209     0.003     0.303
     Abias(tMM)         0.153     0.010     0.003     0.003     0.021
     Efficiency         46.691    30.000    1.400     2.000     11.250
β̂42  Abias(NMM)         24.677    0.674     0.202     0.002     0.239
     Abias(tMM)         0.150     0.016     0.006     0.001     0.003
     Efficiency         40.456    30.556    1.333     1.500     8.750

the records gathered from Topeka, Kansas, where the lung growth of 300 girls was
tracked. Most of the girls were enrolled in the first or second grade, between the ages of six
and seven, and measurements were obtained annually until graduation from high
school or loss to follow-up.[21] After omitting subjects with only one record, the
number of observations per subject for the remaining 252 subjects ranges from 2
to 12.
We use the logarithmic forced expiratory volume in one second (fev1) as the response variable.
Specifically, fev1 measures the volume of air that can be forcibly exhaled from the
lungs in the first second of a forced expiratory manoeuvre; it is critical in the
diagnosis of obstructive and restrictive diseases and is a commonly used measure of lung function
in pulmonary function tests. The response ranges from −0.041 to 1.595, with a sample
standard deviation of 0.325. We are interested in modelling the
lung growth pattern over time, so the age variable is used as both the fixed-effect covariate
and the random-effect covariate. It is also of great interest to investigate whether the subjects
form distinct clusters or groups that exhibit different lung growth behaviours. We thus
fit the data with both the traditional normal mixture of linear mixed models and the proposed
robust t-mixture of linear mixed models. Following Heinzl and Tutz,[22] a three-component
mixture model is used.
Table 6 shows the estimated parameters: the β̂0i 's and β̂1i 's are the intercepts and slopes in
the mixture of linear mixed models with response 'fev1' and predictor 'age'. Following the three

Table 5. Simulation results for Case 4: ni = 4, I = 400.

                              Random effects and error distribution

Estimator                t1        t3        t5        Normal    Contaminated N.

π̂1    Abias(NMM)       0.492     0.327     0.100     0.102     0.334
       Abias(tMM)       0.079     0.090     0.105     0.101     0.101
       Efficiency       5.045     3.907     1.000     1.000     22.400
β̂11   Abias(NMM)       0.819     0.492     0.171     0.001     0.326
       Abias(tMM)       0.017     0.000     0.001     0.001     0.004
       Efficiency       3.313     1.914     1.167     1.000     88.500
β̂21   Abias(NMM)       0.272     0.493     0.173     0.003     0.325
       Abias(tMM)       0.022     0.003     0.001     0.006     0.002
       Efficiency       3.333     1.942     1.200     1.000     87.000
β̂31   Abias(NMM)       0.723     0.498     0.174     0.004     0.327
       Abias(tMM)       0.025     0.003     0.000     0.002     0.006
       Efficiency       3.532     1.652     1.400     1.000     90.500
β̂41   Abias(NMM)       0.212     0.492     0.170     0.003     0.327
       Abias(tMM)       0.023     0.007     0.001     0.004     0.006
       Efficiency       3.680     1.748     1.600     1.000     90.000
β̂12   Abias(NMM)       1.197     0.471     0.177     0.001     0.305
       Abias(tMM)       0.022     0.003     0.000     0.004     0.003
       Efficiency     619.000   250.097     1.400     1.000     21.000
β̂22   Abias(NMM)       2.438     0.503     0.184     0.001     0.336
       Abias(tMM)       0.029     0.004     0.001     0.002     0.003
       Efficiency     711.120   207.036     1.167     1.000     29.500
β̂32   Abias(NMM)       3.178     0.492     0.183     0.002     0.312
       Abias(tMM)       0.024     0.003     0.001     0.004     0.003
       Efficiency     427.900   175.029     1.400     1.000     21.500
β̂42   Abias(NMM)       0.344     0.431     0.175     0.002     0.303
       Abias(tMM)       0.024     0.000     0.004     0.001     0.001
       Efficiency     621.8     205.545     1.333     2.000     20.500

components in the original paper, we obtain β̂0i and β̂1i for i = 1, 2, 3, and two mixing probabilities
π̂1 and π̂2 , with π̂3 = 1 − π̂1 − π̂2 . Based on the profile likelihood approach, the degrees of freedom of
the t-mixture model are estimated as ν̂ = 28, which is quite large. This result suggests that the
random effects and the errors may be approximately normally distributed in this application. To
test our robust estimation approach, however, we add artificial outliers to some arbitrarily
selected subjects in the data set and refit the t-mixture model. With outliers in one subject
(adding 10 to the response (fev1) values of the first subject), the estimated
degrees of freedom drop to ν̂ = 9; with outliers in two subjects
(adding 10 to the response (fev1) values of the first and second subjects), the estimate becomes
ν̂ = 6. This decrease in the estimated degrees of freedom as the number of outliers increases clearly
demonstrates the robustness of the proposed approach. In addition, compared with the estimates
from the traditional normal mixture of linear mixed models, the parameter estimates from the new
method change little when the outliers are added into the data set.
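The profile likelihood approach used here can be illustrated with a minimal univariate sketch (not the paper's full mixture of linear mixed models): for each candidate ν on a grid, fit the remaining parameters by maximum likelihood with ν held fixed, and choose the ν with the largest maximized log-likelihood. The function name and grid below are illustrative choices, not from the paper.

```python
import numpy as np
from scipy import stats

def profile_df(y, df_grid=(1, 2, 3, 5, 10, 20, 30, 50)):
    """Profile-likelihood estimate of the t degrees of freedom (sketch).

    For each fixed df on the grid, fit location and scale by maximum
    likelihood, record the maximized log-likelihood, and return the df
    achieving the largest value.
    """
    best_df, best_ll = None, -np.inf
    for df in df_grid:
        _, loc, scale = stats.t.fit(y, fdf=df)  # df held fixed via fdf
        ll = stats.t.logpdf(y, df, loc=loc, scale=scale).sum()
        if ll > best_ll:
            best_df, best_ll = df, ll
    return best_df
```

With heavy-tailed or contaminated data the profile tends to select a small ν, mirroring the drop from ν̂ = 28 to ν̂ = 9 and ν̂ = 6 observed above after adding outliers.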
Our analysis of the original data reveals an interesting cluster structure. In Figure 3,
three distinct groups can be clearly distinguished by the intercept and slope estimates based
on the mixed effects. Girls assigned to different clusters are marked with different colours and
symbols. Cluster 1 (blue, triangle) appears to consist of the girls who had low initial
lung function and then experienced relatively fast lung growth into adulthood. In contrast,
cluster 2 (red, circle) consists of the girls who had relatively high initial lung development
and then experienced relatively slow lung growth into adulthood. Cluster 3 (black, cross) is

[Figure 1: MedSEs of beta11, beta21, beta31, and beta41 for Case 1 (r = 8, I = 100), each panel plotted against the five conditions t1, t3, t5, N, and CN.]
Figure 1. The median squared errors of Case 1. Solid line is for the t-mixture method and dashed line is for the normal
mixture method. The five conditions refer to five scenarios of the random effects and error distributions, i.e. t1 , t3 , t5 ,
normal, and contaminated normal, respectively.

[Figure 2: MedSEs of beta11, beta21, beta31, and beta41 for Case 2 (r = 8, I = 200), each panel plotted against the five conditions t1, t3, t5, N, and CN.]
Figure 2. The median squared errors of Case 2. All the settings are the same as in Figure 1.

Table 6. Estimation results, with bootstrapped standard errors in parentheses, for the Topeka girls lung
function data analysis.

             Original                      With 1 outlier                With 2 outliers

          t28            Normal         t9             Normal         t6             Normal

π̂1     0.248 (0.081)   0.235 (0.045)   0.281 (0.015)   0.569 (0.178)   0.297 (0.178)   0.196 (0.148)
π̂2     0.688 (0.077)   0.704 (0.040)   0.652 (0.195)   0.402 (0.136)   0.630 (0.249)   0.765 (0.194)
β̂01   −0.010 (0.328)  −0.010 (0.129)  −0.041 (0.673)  −0.126 (0.566)  −0.056 (0.553)  −0.175 (0.305)
β̂11    0.074 (0.028)   0.074 (0.010)   0.074 (0.062)   0.083 (0.030)   0.075 (0.048)   0.083 (0.011)
β̂02   −0.350 (0.090)  −0.341 (0.080)  −0.361 (0.369)  −0.336 (0.228)  −0.368 (0.389)  −0.274 (0.288)
β̂12    0.092 (0.010)   0.091 (0.008)   0.093 (0.042)   0.090 (0.028)   0.093 (0.044)   0.087 (0.032)
β̂03   −0.307 (0.093)  −0.296 (0.074)  −0.293 (0.477)  −0.418 (0.229)  −0.279 (0.540)  −0.335 (0.362)
β̂13    0.074 (0.010)   0.073 (0.007)   0.074 (0.050)   0.090 (0.025)   0.075 (0.055)   0.088 (0.045)

[Figure 3: scatter plot of the estimated slopes (approximately 0.06 to 0.11) against the estimated intercepts (approximately −0.6 to 0.2).]
Figure 3. Cluster pattern revealed by the t-mixture model based on the estimated intercept and slope parameters of the
mixed effects.

the smallest cluster of the three, and appears to consist of the girls who had relatively low initial
lung development and also experienced relatively slow lung growth over time.

5. Discussion

In this article, we have proposed a robust mixture of linear mixed models, using the multivariate t
distribution to robustify model estimation and inference. An ECM algorithm is proposed to
maximize the mixture likelihood. The simulation study and the real data application demonstrate
that the proposed method performs comparably to the normal-based method when there are
no outliers, but performs much better when the errors have heavy tails or outliers are present.
However, based on our limited empirical experience, the estimates of the degrees of freedom are
not very accurate; their consistency might require a large sample size. Although Hennig [23] and
Yao et al. [24] pointed out that the mixture of t distributions has a very small breakdown point,
the breakdown occurs only when the outliers are very extreme. Therefore, the t distribution has
been widely used to provide robust estimation for mixture models.[25]
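The source of this robustness is the gamma weight computed in the E-step of the ECM algorithm: writing δ for the squared Mahalanobis distance of an observation from its component centre, the weight (ν + p)/(ν + δ) shrinks towards zero for extreme observations, so outliers are automatically downweighted. A minimal univariate sketch (the function and argument names are illustrative):

```python
def t_weight(resid, sigma, nu, p=1):
    """E-step downweighting factor of a t model (illustrative sketch).

    delta is the squared standardized residual (the one-dimensional
    Mahalanobis distance); a large delta gives a weight near zero.
    """
    delta = (resid / sigma) ** 2
    return (nu + p) / (nu + delta)
```

For example, with ν = 3 an observation 10 standard deviations from the centre receives weight (3 + 1)/(3 + 100) ≈ 0.039, while a point at the centre receives (3 + 1)/3; as ν → ∞ all weights tend to 1 and the normal model is recovered.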
It is of interest to extend the proposed model to other distributions that possess certain robustness
properties, for example, mixtures with Laplace-distributed mixed effects and random errors.[26]
Note that the proposed method can handle only moderate outliers in the y direction and is not
robust to outliers in the x direction. It would be worthwhile to apply the trimmed likelihood idea to
the mixture of linear mixed models set-up. Recently developed penalized estimation approaches
may also be adopted to directly capture and accommodate potential outliers.
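The trimmed likelihood idea can be sketched for a simple normal model as follows (the function, trimming fraction, and iteration count are illustrative choices, not the paper's proposal): iteratively refit on the fraction of observations with the highest likelihood under the current fit, so gross outliers are excluded from subsequent fits.

```python
import numpy as np
from scipy import stats

def trimmed_normal_fit(y, trim=0.1, iters=10):
    """Trimmed-likelihood fit of a normal mean and sd (sketch).

    At each iteration, keep the (1 - trim) fraction of points with the
    largest log-likelihood under the current estimates, then refit.
    """
    y = np.asarray(y, float)
    k = int(np.ceil((1 - trim) * len(y)))
    keep = np.ones(len(y), bool)
    for _ in range(iters):
        mu, sd = y[keep].mean(), y[keep].std()
        ll = stats.norm.logpdf(y, mu, sd)
        keep = np.zeros(len(y), bool)
        keep[np.argsort(ll)[-k:]] = True  # retain the best-fitting points
    return y[keep].mean(), y[keep].std()
```

Starting from the full sample, the first fit is distorted by outliers, but those outliers receive the lowest likelihoods and are trimmed away, after which the estimates quickly stabilize on the clean portion of the data.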

Acknowledgments
The authors thank the editor, the associate editor, and the reviewers for their constructive comments, which have led to a
dramatic improvement over the earlier version of this article.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
Yao’s research is supported by NSF grant DMS-1461677.

References

[1] Celeux G, Martin O, Lavergne Ch. Mixture of linear mixed models for clustering gene expression profiles from
repeated microarray experiments. Statist Model. 2005;5:1–25.
[2] Ng SK, McLachlan GJ, Wang K, Jones LB-T, Ng S-W. A mixture model with random-effects components for
clustering correlated gene-expression profiles. Bioinformatics. 2006;22:1745–1752.
[3] Yau KKW, Lee AH, Ng SK. Finite mixture regression model with random effects: application to neonatal hospital
length of stay. Comput Stat Data Anal. 2002;41:359–366.
[4] Lange KL, Little RJA, Taylor JMG. Robust statistical modeling using the t distribution. J Amer Statist Assoc.
1989;84:881–896.
[5] Welsh AH, Richardson AM. Chapter 13, Approaches to the robust estimation of mixed models. In: Maddala, GS,
Rao, CR, editors. Handbook of statistics, Vol. 15. Amsterdam: Elsevier Science; 1997. p. 343–384.
[6] Pinheiro JC, Liu CHH, Wu YN. Efficient algorithms for robust estimation in linear mixed-effects models using the
multivariate t-distribution. J Comput Graph Stat. 2001;10(2):249–276.
[7] Khodabin M, Ahmadabadi A. Some properties of generalized gamma distribution. Math Sci. 2010;4:9–28.
[8] Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika.
1993;80:267–278.
[9] Neykov N, Filzmoser P, Dimova R, Neytchev P. Robust fitting of mixtures using the trimmed likelihood estimator.
Comput Stat Data Anal. 2007;52:299–308.
[10] Yohai VJ. High breakdown-point and high efficiency robust estimates for regression. Ann Statist. 1987;15:642–656.
[11] McLachlan GJ, Peel D. Finite mixture models. New York: Wiley; 2000.
[12] Hennig C. Fixed point clusters for linear regression: computation and comparison. J Classif. 2002;19:249–276.
[13] Hennig C. Clusters, outliers, and regression: fixed point clusters. J Multivariate Anal. 2003;86:183–212.
[14] Garcia-Escudero LA, Gordaliza A, San Martin R, Van Aelst S, Zamar R. Robust linear clustering. J Roy Stat Soc
B. 2009;71:301–318.
[15] Bordes L, Chauveau D, Vandekerkhove P. A stochastic EM algorithm for a semiparametric mixture model. Comput
Stat Data Anal. 2007;51:5429–5443.
[16] Hunter DR, Young DS. Semiparametric mixtures of regressions. J Nonparametric Stat. 2012;24(1):19–38.
[17] Celeux G, Hurn M, Robert CP. Computational and inferential difficulties with mixture posterior distributions. J
Amer Statist Assoc. 2000;95:957–970.
[18] Stephens M. Dealing with label switching in mixture models. J Roy Stat Soc B. 2000;62:795–809.
[19] Yao W. Bayesian mixture labeling and clustering. Commun Stat – Theory Methods. 2012;41:403–421.
[20] Yao W, Lindsay BG. Bayesian mixture labeling by highest posterior density. J Amer Statist Assoc. 2009;104:758–
767.
[21] Dockery DW, Berkey CS, Ware JH, Speizer FE, Ferris BG. Distribution of fvc and fev1 in children 6 to 11 years
old. Amer Rev Respir Dis. 1983;128:405–412.
Journal of Statistical Computation and Simulation 17

[22] Heinzl F, Tutz G. Clustering in linear mixed models with approximate Dirichlet process mixtures using EM
algorithm. Statist Model. 2013;13:41–67.
[23] Hennig C. Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Statist.
2004;32:1313–1340.
[24] Yao W, Wei Y, Yu C. Robust mixture regression using the t-distribution. Comput Stat Data Anal. 2014;71:116–127.
[25] Peel D, McLachlan GJ. Robust mixture modelling using the t distribution. Stat Comput. 2000;10:339–348.
[26] Song W, Yao W, Xing Y. Robust mixture regression model fitting by Laplace distribution. Comput Stat Data Anal.
2014;71:128–137.