
© Board of the Foundation of the Scandinavian Journal of Statistics 2000. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA. Vol 27: 305–320, 2000.

Multivariate Dispersion Models Generated From Gaussian Copula

PETER XUE-KUN SONG
York University

ABSTRACT. In this paper a class of multivariate dispersion models generated from the multivariate Gaussian copula is presented. Being a multivariate extension of Jørgensen's (1987a) dispersion models, this class of multivariate models is parametrized by marginal position, dispersion and dependence parameters, producing a large variety of multivariate discrete and continuous models, including the multivariate normal as a special case. Properties of these multivariate distributions are investigated, some of which mirror those of the multivariate normal distribution, making the models potentially useful for the analysis of correlated non-normal data in a way analogous to the analysis of multivariate normal data. As an example, we illustrate an application of the models to the regression analysis of longitudinal data, and establish an asymptotic relationship between the likelihood equation and the generalized estimating equation of Liang & Zeger (1986).

Key words: copula, dependence, dispersion model, generalized estimating equation, generalized linear model, longitudinal data, regression, small-dispersion asymptotics

1. Introduction
Dispersion models, first introduced by Jørgensen (1987a) as the class of error distributions for generalized linear models, have drawn a lot of attention in the literature. The dispersion models contain many commonly used distributions, such as the normal, Poisson, gamma, binomial, negative binomial, inverse Gaussian, compound Poisson, von Mises and simplex distributions (Barndorff-Nielsen & Jørgensen, 1991). Dispersion models have properties similar to those of the normal distribution, which makes the class useful in many statistical areas; see Jørgensen (1997) for details.
The attempt to identify a multivariate extension of the dispersion models has long been of interest in the literature. An early version of such an extension was proposed by Jørgensen (1987a), whose class of models has only a single parameter available for modelling the correlation structure, and hence very limited flexibility for use in multivariate analysis. A more recent multivariate extension was discussed by Jørgensen & Lauritzen (1998), in which the density of the multivariate dispersion model is defined in a form similar to that of the multivariate normal. However, their models are not marginally closed, in the sense that the marginal distributions may not, in general, lie in the given distribution class. This drawback limits the use of these distributions in, for instance, regression analysis.
For the regression analysis of multivariate data, the pattern of marginal means is often of interest and is expected to be modelled explicitly as a function of covariates (explanatory variables). This naturally gives rise to the development of multivariate distributions with given margins. The focus of this paper is the application of Sklar's (1959) copula approach to a multivariate construction of the dispersion models, which leads to a class of multivariate models that are marginally closed.
Constructing multivariate distributions by means of copulas has proved popular in recent years; see for example Joe (1993, 1997), Hutchinson & Lai (1990), and Marshall & Olkin (1988). The motivation for the copula approach is probably rooted in the aim of forming multivariate non-normal distributions by combining given non-normal marginal models, in a certain way, with dependence patterns. Being a multivariate joint distribution that contains only information regarding dependence, a copula produces new multivariate distributions whenever suitable new margins are fed into it. Amongst the several types of copulas available in the literature, we are particularly interested in the multivariate Gaussian copula, which is "extracted" from the multivariate normal distribution, say N_m(μ, Γ), where Γ = (γ_ij) is a Pearson correlation matrix. In the Gaussian copula the correlation matrix Γ is responsible for the dependence; its entries may vary between −1 and 1, accommodating both negative and positive dependence. Combining the copula with dispersion model margins produces a large variety of multivariate distributions in a unified fashion, including both continuous and discrete distributions, such as the multivariate gamma, multivariate binomial and multivariate Poisson.
The copula approach to generating multivariate distributions raises the question of how to interpret the dependence matrix Γ in the new multivariate distributions. It turns out that in continuous models its (i, j)th entry, γ_ij, gives a non-linear dependence measurement for the (i, j)th pair of components in the sense of the normal scoring ν introduced in the present paper. This dependence measure can also be extended for use in discrete models, as shown for the binomial and Poisson models discussed in detail in this paper.
The Gaussian copula approach gives the constructed multivariate models many properties similar to those of the multivariate normal distribution. For instance, the model is "reproducible" in the sense that any subvector has a distribution of the same form as that of the full vector. This property is satisfied neither by the log-linear model of Bishop et al. (1975) nor by the quadratic exponential model of Zhao & Prentice (1990) in the binary variable case. Also, because the parametrization of our model is similar to that of the multivariate normal, the constraint between the correlations and the marginal means is rather simple, circumventing the drawback of Bahadur's representation (see Bahadur, 1961) for the binary case (see Diggle et al., 1994, p. 149).
It is shown that Jørgensen & Lauritzen's (1998) multivariate dispersion models effectively turn out to be the limiting distributions of our multivariate dispersion models for small dispersion parameters. This result links the two classes of multivariate extensions of dispersion models, and helps us to understand the multivariate generalization of dispersion models from a different point of view.
The class of models is finally applied to the longitudinal regression model, where response variables are assumed to follow multivariate exponential dispersion models. We study the maximum likelihood estimation of the regression parameters under the assumed joint density, and show that Liang & Zeger's generalized estimating equation turns out to be an approximate version of our likelihood equation. This result may be regarded as a complement to the result of Fitzmaurice et al. (1993), who proved that the likelihood equations for binary regression parameters are of exactly the same form as the generalized estimating equation (GEE) for binary longitudinal responses. Theoretically, our result gives an asymptotic justification for the GEE approach in a wider range of distributions than the binary case.
This paper is organized as follows. Section 2 discusses the construction of multivariate
dispersion models, and then gives three examples: binomial, Poisson and gamma. Some
properties are studied in section 3, and section 4 illustrates an application of the multivariate
models to a longitudinal regression analysis.


2. Multivariate dispersion models

After a brief review of copulas, we give a construction of the multivariate dispersion models, together with some examples.

2.1. Copula
Let u_{−S} be a subvector of u = (u₁, ..., u_m)ᵀ with the components indexed by the set S omitted, where S is a subset of the indices {1, ..., m}. A mapping C: (0, 1)^m → (0, 1) is called a copula if (1) it is a continuous distribution function; and (2) each margin is a univariate uniform distribution, namely

lim C(u) = u_i,  u_i ∈ (0, 1),

where the limit is taken as u_j → 1 for all j ≠ i. Clearly, lim_{u_j→0} C(u) = 0 for any j = 1, ..., m. It is easy to prove that for any subset S, the margin obtained by lim_{u_{−S}→1} C(u) is also a copula. Copulas are easy to construct from a given multivariate distribution.
If X = (X₁, ..., X_m)ᵀ ~ H, where H is an m-dimensional distribution function with margins H₁, ..., H_m, then the copula is of the form

C_H(u₁, ..., u_m) = H{H₁⁻¹(u₁), ..., H_m⁻¹(u_m)},  u_i ∈ (0, 1),  i = 1, ..., m,

provided that the marginal inverse distribution functions H_i⁻¹ of the H_i exist. An important special case (which is the focus of this paper) is obtained when X ~ N_m(0, Γ) with standardized margins and H_i ≡ Φ. The m-dimensional Gaussian copula is denoted by C_Φ(u | Γ), and its density is given by
   
c_Φ(u | Γ) = |Γ|^{−1/2} exp{−½ qᵀΓ⁻¹q + ½ qᵀq} = |Γ|^{−1/2} exp{½ qᵀ(I_m − Γ⁻¹)q}   (1)

where q = (q₁, ..., q_m)ᵀ with normal scores q_i = Φ⁻¹(u_i), i = 1, ..., m.
Figure 1 shows four contour plots of density functions of bivariate Gaussian copulas with different values of the correlation parameter γ, namely −0.9, −0.5, 0.5 and 0.9. The Gaussian copulas with negative values of γ are concentrated in the opposite direction to those with positive values of γ, reflecting, respectively, the negative and positive correlation between the variables u₁ and u₂, as desired.
It is shown in Joe (1997, sect. 5.1) that the bivariate Gaussian copula attains the lower Fréchet bound max{0, u₁ + u₂ − 1}, independence, or the upper Fréchet bound min{u₁, u₂}, according to whether the corresponding correlation parameter equals −1, 0, or 1.
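As a small illustration of the density (1) in the bivariate case, the following sketch evaluates c_Φ(u₁, u₂ | γ) using only the Python standard library (statistics.NormalDist supplies Φ⁻¹); the function name is ours, not from the paper, and Γ is the 2 × 2 correlation matrix with off-diagonal γ.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf gives Phi, inv_cdf gives Phi^{-1}

def gaussian_copula_density(u1, u2, gamma):
    """Bivariate case of density (1):
    c(u | Gamma) = |Gamma|^{-1/2} exp{ q'(I - Gamma^{-1})q / 2 },
    with normal scores q_i = Phi^{-1}(u_i) and Gamma = [[1, g], [g, 1]]."""
    q1, q2 = _N.inv_cdf(u1), _N.inv_cdf(u2)
    det = 1.0 - gamma * gamma  # |Gamma| for the 2x2 correlation matrix
    expo = (2.0 * gamma * q1 * q2 - gamma * gamma * (q1 * q1 + q2 * q2)) / (2.0 * det)
    return exp(expo) / sqrt(det)
```

For γ = 0 the exponent vanishes and the density is identically 1, the independence copula, consistent with the limiting cases quoted above from Joe (1997).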

2.2. Construction
By complementing the copula C_H with given margins, say F₁, ..., F_m, a new multivariate distribution can be obtained by

G(y) = C_H{F₁(y₁), ..., F_m(y_m)}.   (2)

One of its properties is that the ith margin of G gives the original F_i; that is, the distribution is marginally closed.
A class of m-variate multivariate dispersion models is obtained from (2) when the copula C_H ≡ C_Φ(· | Γ) and the margins F_i are dispersion models. The multivariate dispersion models, denoted by MDM_m(μ, σ², Γ), are parametrized by three sets of parameters: μ = (μ₁, ..., μ_m)ᵀ,
Fig. 1. Four contour plots of bivariate Gaussian copula densities with different values of γ.

the vector of position parameters, σ² = (σ₁², ..., σ_m²)ᵀ, the vector of dispersion parameters, and Γ, the correlation matrix. Note that the ith dispersion model (DM) margin is parametrized by (μ_i, σ_i²), and its density function is given by

f(y_i; μ_i, σ_i²) = a(y_i; σ_i²) exp{−d(y_i; μ_i)/(2σ_i²)}   (3)

where d is the regular unit deviance. Exponential dispersion (ED) models and proper dispersion (PD) models are two important special classes of dispersion models, given respectively as follows. When d in (3) takes the form

d(y_i; μ_i) = y_i ζ₁(μ_i) + ζ₂(μ_i) + ζ₃(y_i)

for suitable functions ζ₁, ζ₂ and ζ₃, the dispersion model is an exponential dispersion model with mean μ_i and dispersion parameter σ_i². If a in (3) factorizes as b(σ_i²)c(y_i), the dispersion model becomes a proper dispersion model. See Jørgensen (1997, sect. 1.2) for more details.
In parallel, we obtain the multivariate exponential dispersion models and the multivariate proper dispersion models, denoted by MED_m(μ, σ², Γ) and MPD_m(μ, σ², Γ), respectively, when the corresponding margins are used in the construction.
When the marginal models are continuous, a multivariate dispersion model can be equivalently defined by a density of the following form:

g(y; μ, σ², Γ) = c_Φ{F₁(y₁), ..., F_m(y_m) | Γ} f(y₁; μ₁, σ₁²) ··· f(y_m; μ_m, σ_m²).   (4)

Consequently, we obtain a large class of continuous multivariate models, including the multivariate gamma, multivariate inverse Gaussian, multivariate von Mises, and multivariate simplex distributions.
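To make the construction (2) concrete, here is a small sampling sketch under our own choice of margins: exponential distributions (a member of the ED class), whose quantile function F⁻¹(u) = −μ log(1 − u) is available in closed form. The function name and defaults are hypothetical, not from the paper.

```python
import random
from math import log, sqrt
from statistics import NormalDist

_N = NormalDist()

def sample_bivariate_exponential_mdm(mu1, mu2, gamma, n, seed=0):
    """Draw n pairs whose margins are exponential with means mu1, mu2
    and whose dependence comes from a bivariate Gaussian copula with
    parameter gamma: X ~ N2(0, Gamma), U = Phi(X), Y = F^{-1}(U)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = gamma * z1 + sqrt(1.0 - gamma * gamma) * rng.gauss(0.0, 1.0)
        u1, u2 = _N.cdf(z1), _N.cdf(z2)
        # exponential quantile function F^{-1}(u) = -mu * log(1 - u)
        out.append((-mu1 * log(1.0 - u1), -mu2 * log(1.0 - u2)))
    return out
```

A sample drawn this way exhibits the marginal closure property directly: each coordinate has the prescribed exponential mean, whatever the value of γ.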
When the marginal models are discrete, a multivariate probability function is obtained by taking the Radon–Nikodym derivative of G(y) in (2) with respect to the counting measure:

g(y) = P(Y₁ = y₁, ..., Y_m = y_m) = Σ_{j₁=1}^{2} ··· Σ_{j_m=1}^{2} (−1)^{j₁+···+j_m} C_Φ(u_{1j₁}, ..., u_{mj_m} | Γ)   (5)

where u_{i1} = F_i(y_i) and u_{i2} = F_i(y_i−). Here F_i(y_i−) is the left-hand limit of F_i at y_i, which is equal to F_i(y_i − 1) when the support of F_i is a set of integers, as in the Poisson and binomial cases. Some examples are exhibited in section 2.3.
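The inclusion-exclusion sum (5) can be sketched numerically for Poisson margins. The helper `_bvn_cdf` below is our own utility, a Simpson-rule evaluation of the bivariate normal cdf Φ₂ (it assumes |γ| < 1 and truncates the integral at ±8); nothing in it comes from the paper itself.

```python
from math import exp, sqrt, pi, factorial
from statistics import NormalDist

_N = NormalDist()

def _phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def _bvn_cdf(a, b, gamma, steps=400):
    """Phi_2(a, b | gamma) by Simpson's rule on the 1-D integral
    int_{-8}^{a} phi(x) Phi((b - gamma*x)/sqrt(1-gamma^2)) dx."""
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    s = sqrt(1.0 - gamma * gamma)
    h = (hi - lo) / steps
    f = lambda x: _phi(x) * _N.cdf((b - gamma * x) / s)
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += f(lo + k * h) * (4 if k % 2 else 2)
    return total * h / 3.0

def _copula(u1, u2, gamma):
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    return _bvn_cdf(_N.inv_cdf(min(u1, 1 - 1e-15)),
                    _N.inv_cdf(min(u2, 1 - 1e-15)), gamma)

def poisson_cdf(y, mu):
    if y < 0:
        return 0.0
    return sum(exp(-mu) * mu ** k / factorial(k) for k in range(int(y) + 1))

def bivariate_poisson_pmf(y1, y2, mu1, mu2, gamma):
    """Rectangle sum (5) for m = 2: inclusion-exclusion of the copula
    over the unit cell [F(y-1), F(y)] in each coordinate."""
    C = lambda a, b: _copula(a, b, gamma)
    u1, v1 = poisson_cdf(y1, mu1), poisson_cdf(y1 - 1, mu1)
    u2, v2 = poisson_cdf(y2, mu2), poisson_cdf(y2 - 1, mu2)
    return C(u1, u2) - C(u1, v2) - C(v1, u2) + C(v1, v2)
```

At γ = 0 the copula factorizes, so the pmf reduces to the product of the two Poisson pmfs, and for any γ the probabilities sum to one over the support.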
It follows immediately from the construction that the components Y₁, ..., Y_m are mutually independent if the matrix Γ is the identity matrix I_m. It is also easy to see that if the marginal distributions F_i are continuous, then

[Φ⁻¹{F₁(Y₁)}, ..., Φ⁻¹{F_m(Y_m)}] ~ N_m(0, Γ).

It follows that

γ_ij = corr[Φ⁻¹{F_i(Y_i)}, Φ⁻¹{F_j(Y_j)}] =: ν(Y_i, Y_j),

where γ_ij is the Pearson correlation between the two normal scores, measuring the dependence between Y_i and Y_j based on a monotone non-linear transformation. We shall refer to this non-linear dependence measure as the normal scoring ν. It is similar to Spearman's ρ, the dependence measure determined via the Pearson correlation of transformed variables that follow the uniform distribution on (0, 1); precisely, ρ_ij = 12 E G_i(Y_i)G_j(Y_j) − 3, where the expectation is taken under the joint distribution G(y_i, y_j) of (Y_i, Y_j). Also, Kendall's τ dependence of the pair (Y_i, Y_j) equals τ_ij = 4 E G(Y_i, Y_j) − 1, where the expectation is again taken under G(y_i, y_j). See, for example, Kendall & Gibbons (1990) for ρ and τ. Clearly, both the ρ and τ dependence measurements are non-linear functions of γ_ij. For each fixed γ_ij, the Monte Carlo method may be employed to obtain ρ_ij and τ_ij numerically. It is found that in the copula setting (1), ν and ρ are effectively very close to each other, and ν and τ are positively "correlated" in the sense that increasing the value of the ν dependence results in a similar increase in the value of the τ dependence, and vice versa.
Note that we also use the measure ν for the dependence of discrete random variables, although the above interpretation is not applicable in the discrete case. Some supportive evidence for this extension can be drawn from examples 1 and 2, where the bivariate binomial and Poisson models are studied in detail. Hence, in the following, the measure ν is assumed well-defined in both the continuous and discrete cases.
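The Monte Carlo computation of ρ_ij and τ_ij mentioned above can be sketched as follows (our own construction; the function name and sample sizes are arbitrary). For the Gaussian copula the classical closed forms ρ = (6/π) arcsin(γ/2) and τ = (2/π) arcsin(γ) are available, so the estimates can be checked against them.

```python
import random
from math import asin, pi, sqrt
from statistics import NormalDist

_N = NormalDist()

def mc_spearman_kendall(gamma, n=4000, seed=1):
    """Monte Carlo estimates of Spearman's rho = 12 E[U1 U2] - 3 and
    Kendall's tau under a bivariate Gaussian copula with parameter gamma."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = gamma * z1 + sqrt(1.0 - gamma * gamma) * rng.gauss(0.0, 1.0)
        pairs.append((_N.cdf(z1), _N.cdf(z2)))
    rho = 12.0 * sum(u1 * u2 for u1, u2 in pairs) / n - 3.0
    sub = pairs[:500]  # tau is O(n^2) in pairs, so use a subsample
    conc = sum(1 if (a1 - b1) * (a2 - b2) > 0 else -1
               for i, (a1, a2) in enumerate(sub) for (b1, b2) in sub[i + 1:])
    tau = conc / (len(sub) * (len(sub) - 1) / 2)
    return rho, tau
```

Running this over a grid of γ values reproduces numerically the qualitative statement above: ν (that is, γ itself) and ρ are very close, while τ increases monotonically with ν.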
Let S = {r₁, ..., r_s} be a subset of the indices {1, ..., m}. The marginal distribution function of Y_{r₁}, ..., Y_{r_s} is obtained by letting the components y_i → ∞, i ∈ S̄, in the joint distribution G(y; μ, σ², Γ), where S̄ is the complement of S. Clearly, the marginal distribution is C_S(F_{r₁}(y_{r₁}), ..., F_{r_s}(y_{r_s}) | Γ_S), where C_S(u_S) is the marginal distribution of C_Φ(u) and the dependence matrix Γ_S is the submatrix of Γ with entries corresponding to the set S. In particular, in both the continuous and discrete cases, the marginal density of a single component Y_i is equal to the density of the univariate DM(μ_i, σ_i²), as in the normal case, not depending on the matrix Γ at all.

2.3. Some examples

We now give three examples, two of which are discrete models.

Example 1 (Multivariate binary model). Let Y_i, i = 1, ..., m be m binary random variables with probability of success p_i. The distribution function of Y_i is

F_i(y_i) =
  0,        y_i < 0
  1 − p_i,  0 ≤ y_i < 1
  1,        y_i ≥ 1.

The m-variate probability function is given by (5). In particular, when m = 2 the bivariate probability function is of the form

P(Y₁ = y₁, Y₂ = y₂) = C_γ(u₁, u₂) − C_γ(u₁, v₂) − C_γ(v₁, u₂) + C_γ(v₁, v₂)   (6)

where u_i = F_i(y_i) and v_i = F_i(y_i − 1). Here C_γ is the bivariate Gaussian copula, parametrized by a single dependence parameter γ ∈ (−1, 1). It follows that the four point probabilities are given by

P(Y₁ = y₁, Y₂ = y₂) =
  C_γ(1 − p₁, 1 − p₂),                if y₁ = 0, y₂ = 0
  1 − p₁ − C_γ(1 − p₁, 1 − p₂),       if y₁ = 0, y₂ = 1
  1 − p₂ − C_γ(1 − p₁, 1 − p₂),       if y₁ = 1, y₂ = 0
  p₁ + p₂ + C_γ(1 − p₁, 1 − p₂) − 1,  if y₁ = 1, y₂ = 1.   (7)

To make use of this model for the regression analysis of binary data, we may assume a generalized linear model structure for the marginal expectations, namely logit(p_i) = η_i or Φ⁻¹(p_i) = η_i, where η_i = x_iᵀβ and x_i is a vector of covariates. This leads to a multivariate logistic model or a multivariate probit model, respectively. In particular, the probit link gives 1 − p_i = Φ(−η_i), and therefore C_γ(1 − p₁, 1 − p₂) = Φ₂(−η₁, −η₂ | γ).
As a matter of fact, the multivariate probit model can be interpreted as a probit model with a latent variable representation. This may be seen through the bivariate case. Let (Z₁, Z₂) be the latent normal vector satisfying Z_i = x_iᵀβ + ε_i, i = 1, 2, where (ε₁, ε₂) ~ N(0, 0, 1, 1, γ), and define Y_i = 0 if Z_i ≤ 0, and 1 otherwise. Then the point probability P(Y₁ = 0, Y₂ = 0) = Φ₂(−x₁ᵀβ, −x₂ᵀβ | γ), identical to the first expression of (7). It is easy to prove that the other three point probabilities are the same as the rest of (7). In this case, the correlation parameter γ in (6) is identical to that of the latent normal distribution.
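The four point probabilities (7) can be checked numerically. In the sketch below, `_bvn_cdf` is our own Simpson-rule evaluation of Φ₂ (an assumption, not part of the paper; it requires |γ| < 1), and `binary_point_probs` simply transcribes (7) with C_γ(1 − p₁, 1 − p₂) = Φ₂(Φ⁻¹(1 − p₁), Φ⁻¹(1 − p₂) | γ).

```python
from math import exp, sqrt, pi
from statistics import NormalDist

_N = NormalDist()

def _phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def _bvn_cdf(a, b, gamma, steps=400):
    """Phi_2(a, b | gamma) via Simpson's rule on
    int_{-8}^{a} phi(x) Phi((b - gamma*x)/sqrt(1-gamma^2)) dx."""
    lo, hi = -8.0, min(a, 8.0)
    if hi <= lo:
        return 0.0
    s = sqrt(1.0 - gamma * gamma)
    h = (hi - lo) / steps
    f = lambda x: _phi(x) * _N.cdf((b - gamma * x) / s)
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += f(lo + k * h) * (4 if k % 2 else 2)
    return total * h / 3.0

def binary_point_probs(p1, p2, gamma):
    """The four point probabilities (7); under the probit link
    1 - p_i = Phi(-eta_i), so C_gamma(1-p1, 1-p2) = Phi_2(-eta1, -eta2 | gamma)."""
    c = _bvn_cdf(_N.inv_cdf(1.0 - p1), _N.inv_cdf(1.0 - p2), gamma)
    return {(0, 0): c,
            (0, 1): 1.0 - p1 - c,
            (1, 0): 1.0 - p2 - c,
            (1, 1): p1 + p2 + c - 1.0}
```

By construction the four probabilities sum to one and reproduce the margins exactly, and at γ = 0 the (1, 1) cell reduces to p₁p₂, which is a convenient sanity check on the Φ₂ quadrature.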
For the bivariate binary model, the lower and upper Fréchet bounds are given, respectively, in the first and second lines of each cell of the following two-way table,

               0 ≤ y₁ < 1            y₁ ≥ 1
 0 ≤ y₂ < 1    max{0, 1 − p₁ − p₂}   1 − p₂
               1 − max{p₁, p₂}       1 − p₂
 y₂ ≥ 1        1 − p₁                1
               1 − p₁                1

and the bounds otherwise equal zero. It is easy to show that the bivariate binary model attains these two bounds when γ equals −1 and 1, respectively.

Example 2 (Multivariate Poisson model). Let Y_i, i = 1, ..., m be m Poisson random variables with mean parameters μ_i. As before, the joint probability function of Y is given by (5). To link this model to a set of covariates x_i in the context of regression analysis, we assume a log-linear model for each of the marginal expectations μ_i, i.e. log(μ_i) = x_iᵀβ.
The stochastic representation is another approach in the literature (see for example Joe, 1997, sect. 7.2) to constructing a multivariate Poisson distribution. For instance, this method constructs a bivariate Poisson as (Y₁, Y₂) = (Z₁ + Z₁₂, Z₂ + Z₁₂), where Z₁, Z₂, Z₁₂ are independent Poisson with parameters λ₁, λ₂, λ₁₂. Although this construction seems much simpler, it allows only positive dependence, whereas the copula-based distribution (5) can accommodate both positive and negative dependence.
A simple comparison of the two constructions can be made on the basis of the conditional expectation, as follows. It is easy to prove that

E(Z₁ + Z₁₂ | Z₂ + Z₁₂ = y₂) = λ₁ + λ₁₂ y₂/(λ₂ + λ₁₂) = μ₁ + γ √(μ₁/μ₂) (y₂ − μ₂),   (8)

a linear function in y₂, where γ is the Pearson correlation coefficient of (Y₁, Y₂), equal to λ₁₂/√{(λ₁ + λ₁₂)(λ₂ + λ₁₂)}, and μ_i = λ_i + λ₁₂ are the given marginal means. For the copula-based construction, the conditional mean is

E(Y₁ | Y₂ = y₂) = Σ_{y₁=0}^{∞} y₁ P(Y₁ = y₁, Y₂ = y₂)/P(Y₂ = y₂),   (9)

where the joint point probability P(Y₁ = y₁, Y₂ = y₂) is the same as in (6). It is relatively easy to compute this function numerically, although a closed form expression is unavailable. A comparison between the two conditional means is illustrated in Fig. 2, where the two margins are set to be the same.
In Fig. 2, a linear approximation to the conditional mean (9) is also shown. This approximation takes a form similar to (8), given by

E(Y₁ | Y₂ = y₂) ≈ μ₁ + γ K(μ₁) ψ(y₂, μ₂),   (10)

where K(μ₁) = Σ_{y₁=0}^{∞} φ{q₁(y₁)} and

ψ(y₂, μ₂) = [φ{q₂(y₂ − 1)} − φ{q₂(y₂)}]/[F₂(y₂) − F₂(y₂ − 1)],

with φ the standard normal density. The approximation (10) is obtained simply by the Taylor expansion of (9) around γ = 0, given, for u_i = F_i(y_i), i = 1, 2, by

C_γ(u₁, u₂) = F₁(y₁)F₂(y₂) + φ(q₁)φ(q₂)γ + O(γ²).

The difference √μ − K(μ) is found to be positive at μ = 1, 2, ..., decreasing monotonically to zero as μ goes to infinity. For example, it equals 0.0225, 0.0127 and 0.0099 when μ = 10, 30 and 50, respectively.
Figure 2 contains nine plots covering all combinations of (γ, μ) for γ = 0.3, 0.6, 0.9 and μ = 5, 20, 40. Each graph consists of three lines corresponding to the linear conditional mean (8), the conditional mean (9) and the approximation (10), represented respectively by a solid line (——), a dashed line (- - -) and a dotted line (····).

Fig. 2. Two exact conditional means and a linear approximation represented, respectively, by solid line (——), dashed line (- - -) and dotted line (····).
Clearly, when the marginal means are not small, say 20 or larger as in the figure, the two exact conditional means are almost identical within reasonably large ranges around the means, and the approximation is also fairly close to the other two, although it shows some small departures near the tails.
For small marginal means (equal to 5 in the figure), the two exact conditional means are still close to each other, and the approximation almost overlaps with the other two at low y values but begins to depart from them when y is far from the mean μ (in this figure the departure begins at approximately 2μ).
This comparison also sheds light on the interpretation of the correlation parameter γ in the copula-based construction. At least numerically, we see the closeness between this parameter and the one appearing in the stochastic representation method, which has the traditional interpretation.
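As a sketch of how the conditional mean (9) can be evaluated without a closed form, the following Monte Carlo version (our own construction; names, sample sizes and the 500-term safety cap are arbitrary) samples from the copula model by quantile inversion and averages Y₁ over the slice {Y₂ = y₂}. It assumes y₂ has non-negligible marginal probability, otherwise the slice may be empty.

```python
import random
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()

def poisson_quantile(u, mu):
    """Smallest y with F(y) >= u, by direct inversion of the Poisson cdf."""
    y, p = 0, exp(-mu)
    cdf = p
    while cdf < u and y < 500:  # cap guards against round-off near u = 1
        y += 1
        p *= mu / y
        cdf += p
    return y

def mc_conditional_mean(y2, mu1, mu2, gamma, n=100000, seed=3):
    """Monte Carlo version of (9): draw (Z1, Z2) ~ N2(0, Gamma), map to
    Poisson variates through Phi and the quantile function, then average
    Y1 over the draws with Y2 = y2."""
    rng = random.Random(seed)
    tot = cnt = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = gamma * z1 + sqrt(1.0 - gamma * gamma) * rng.gauss(0.0, 1.0)
        if poisson_quantile(_N.cdf(z2), mu2) == y2:
            tot += poisson_quantile(_N.cdf(z1), mu1)
            cnt += 1
    return tot / cnt
```

For γ = 0 this returns approximately μ₁ for any y₂, and for positive γ the conditional mean increases with y₂, in line with the behaviour seen in Fig. 2.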

Example 3 (Multivariate gamma model). Let Y_i, i = 1, ..., m be m gamma random variables, with Y_i ~ Ga(μ_i, σ_i²), where μ_i and σ_i² are the mean and dispersion parameters, respectively. Clearly, the m-variate joint density of Y is given by (4). In connection with the regression model, we assume a generalized linear model for the marginal expectations, h(μ_i) = x_iᵀβ, and a constant dispersion σ_i² = σ², i = 1, ..., m, where h is the link function, which may be chosen to be the reciprocal link or the log link in the context of gamma regression. Some numerical results can be found in section 4.3, where this model is used for a simulation study.

3. Properties

3.1. Moments
It is known from the previous section that the marginal distributions of MDM_m(μ, σ², Γ) are just DM(μ_i, σ_i²), and hence the marginal moments are straightforwardly obtained. For the special case of the MED model, E(Y_i) = μ_i and var(Y_i) = σ_i² v(μ_i), where v(·) is the corresponding variance function.
Unlike the marginal moments, the joint moments are in general unavailable in closed form. However, in some cases we may use the Monte Carlo approach to obtain them numerically. Suppose Y = (Y₁, ..., Y_m)ᵀ follows a continuous MDM_m(μ, σ², Γ). Then for a function h such that h(Y) satisfies the conditions of the law of large numbers, when M is large,

E_G h(Y₁, ..., Y_m) ≈ (1/M) Σ_{i=1}^{M} h[F₁⁻¹{Φ(X₁^{(i)})}, ..., F_m⁻¹{Φ(X_m^{(i)})}]   (11)

where (X₁^{(i)}, ..., X_m^{(i)}), i = 1, ..., M, are M i.i.d. variates from N_m(0, Γ).
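The Monte Carlo rule (11) translates directly into code. The sketch below is bivariate only, and the exponential margins at the end are our own illustrative choice (their quantile function is F⁻¹(u) = −μ log(1 − u)); the function name and defaults are hypothetical.

```python
import random
from math import log, sqrt
from statistics import NormalDist

_N = NormalDist()

def mc_joint_moment(h, inv_cdfs, gamma, M=40000, seed=7):
    """Monte Carlo approximation (11) of E_G h(Y1, Y2) for a bivariate MDM:
    draw X ~ N2(0, Gamma) and push Phi(X) through the marginal quantiles."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        x1 = rng.gauss(0.0, 1.0)
        x2 = gamma * x1 + sqrt(1.0 - gamma * gamma) * rng.gauss(0.0, 1.0)
        total += h(inv_cdfs[0](_N.cdf(x1)), inv_cdfs[1](_N.cdf(x2)))
    return total / M

# illustrative margins (our choice): exponentials with means 2 and 3
quantiles = (lambda u: -2.0 * log(1.0 - u), lambda u: -3.0 * log(1.0 - u))
```

With h(y₁, y₂) = y₁y₂ and γ = 0 the estimate approaches μ₁μ₂, and a marginal moment such as E(Y₁) is unaffected by γ, reflecting the marginal closure of the construction.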
In the discrete case, if the Y_i take a finite number of values, the joint moments may also be obtained numerically; otherwise, an approximation such as the small-dispersion asymptotics (Jørgensen, 1987b) may be invoked. For instance, when σ_i² and σ_j² are small, we have

F_k⁻¹{Φ(x_k)} = μ_k + σ_k v^{1/2}(μ_k) x_k + o(σ_k),

and hence the covariance of (Y_i, Y_j) may be approximated by

cov(Y_i, Y_j) ≈ σ_i v^{1/2}(μ_i) σ_j v^{1/2}(μ_j) γ_ij.   (12)

Furthermore, for ED model margins, which satisfy var(Y_k) = σ_k² v(μ_k), k = 1, ..., m, we obtain

corr(Y_i, Y_j) ≈ γ_ij.   (13)

It is noteworthy that (12) and (13) are also valid for continuous models. In the continuous case, a better approximation than (12), based on a higher-order asymptotic expansion of h_k(x) = F_k⁻¹{Φ(x)}, k = i, j, leads to

cov(Y_i, Y_j) = (σ_i²/2) h_i″ h_j + σ_ij h_i′ h_j′ + (σ_j²/2) h_i h_j″ + O(σ_i² σ_j) + O(σ_i σ_j²) + O(σ_i³) + O(σ_j³)

where σ_ij = σ_i v^{1/2}(μ_i) σ_j v^{1/2}(μ_j) γ_ij and h_k = h_k(μ_k), h_k′ = h_k′(μ_k), h_k″ = h_k″(μ_k), k = i, j.

3.2. Moment generating function

This section concerns only continuous MED_m(μ, σ², Γ) models, assuming that the moment generating functions exist. Although the results presented in this section are in fact valid in other cases where the joint density function is of a product form like (4), in the following we base our discussion on the proposed models.
For t ∈ R^m, let q = (q₁, ..., q_m)ᵀ, and let q* = (q₁*, ..., q_m*)ᵀ with ith component defined as

q_i* = q_i(y_i, θ_i + t_i/λ_i, λ_i) = Φ⁻¹{G(y_i, θ_i + t_i/λ_i, λ_i)}.

Since the density function (4) is a product of the copula density and the marginal densities, the moment generating function is easily obtained as ϕ(t) = Π_{i=1}^{m} ϕ(t_i) ϕ*(t), where the ith marginal moment generating function of ED(μ_i, σ_i²) is given by

ϕ(t_i) = exp[λ_i{k(θ_i + t_i/λ_i) − k(θ_i)}], with λ_i = 1/σ_i²,

and

ϕ*(t) = E exp{½ qᵀ(I_m − Γ⁻¹)q − ½ q*ᵀ(I_m − Γ⁻¹)q*}.   (14)

The expectation in (14) is taken under the multivariate model MED_m(μ*, σ², Γ), where the ith element of μ* corresponds to θ_i + t_i/λ_i under the mean mapping k′(·). Obviously, ϕ*(0) = 1.
Clearly, the moment generating function factorizes into two parts: the first factor is the product of the m marginal moment generating functions of the ED random variables, and the second factor ϕ*(t) is in general too complicated to have a closed form expression. But the following theorem describes a profile of this function; that is, the function ϕ*(t) contains the covariance information of the multivariate distribution.


Theorem 1
If ϕ*(t) exists in a neighbourhood of 0, then

∂ϕ*(0)/∂t_i = 0,  ∂²ϕ*(0)/∂t_i² = 0, for i = 1, ..., m,

and

∂²ϕ*(0)/∂t_i ∂t_j = cov(Y_i, Y_j), for i ≠ j.

Proof. Since the marginal distributions of MED_m(μ, σ², Γ) are univariate exponential dispersion models ED(μ_i, σ_i²), we have

μ_i = EY_i = ∂ϕ(t)/∂t_i |_{t=0} = ϕ*(0) k′(θ_i) + ∂ϕ*(0)/∂t_i = μ_i + ∂ϕ*(0)/∂t_i.

Hence ∂ϕ*(0)/∂t_i = 0. A similar argument gives ∂²ϕ*(0)/∂t_i² = 0. A simple calculation gives, for i ≠ j,

E(Y_i Y_j) = μ_i μ_j + ∂²ϕ*(t)/∂t_i ∂t_j |_{t=0},

which leads to

∂²ϕ*(0)/∂t_i ∂t_j = cov(Y_i, Y_j).

This theorem implies that ϕ*(t) is the contributor of the covariances, and for this reason ϕ*(t) may be called the dependence generating function.

3.3. Asymptotic normality

We now develop the asymptotic multivariate normality in the sense of "small dispersion", similar to Jørgensen (1987b) for the univariate dispersion models.
The Euclidean norm of a matrix A = (a_ij), defined by (Σ_i Σ_j a_ij²)^{1/2}, may be expressed as ‖A‖ = {tr(AᵀA)}^{1/2}, and the norm of a vector t is ‖t‖ = (tᵀt)^{1/2} = (Σ_{i=1}^{m} t_i²)^{1/2}. Let Σ = var(Y).

Theorem 2
Suppose ϕ(t) exists for t ∈ R^m. If the variance–covariance matrix Σ is positive definite, then

Σ^{−1/2}(Y − μ) →_d N_m(0, I_m), as ‖Σ‖ → 0 or ‖Σ⁻¹‖ → 0.

Proof. We first give the proof for the case ‖Σ‖ → 0. By the Cauchy–Schwarz inequality, it is easy to see that σ²_max = max(σ₁², ..., σ_m²) → 0 is equivalent to ‖Σ‖ → 0. According to Jørgensen (1997, sect. 3.6), the marginal asymptotic normality takes the following form:

F(y; μ_i, σ_i²) = Φ(ξ_i/σ_i){1 + O(σ_i²)}, i = 1, ..., m,   (15)

where ξ_i is the Pearson residual given by

ξ_i = ξ_i(y) = (y − μ_i)/v^{1/2}(μ_i), i = 1, ..., m,

a one-to-one linear function of y for fixed μ_i. For fixed y = (y₁, ..., y_m), let u_i^F = F_i(y_i) and u_i^Φ = Φ(ξ_i/σ_i), i = 1, ..., m. Then from (15),

C_Φ(u₁^F, ..., u_m^F) = C_Φ(u₁^Φ, ..., u_m^Φ) + o(‖u^F − u^Φ‖) = C_Φ(u₁^Φ, ..., u_m^Φ) + o(σ²_max).

That is, Y is asymptotically multivariate normal with mean μ and covariance matrix

diag[σ₁ v^{1/2}(μ₁), ..., σ_m v^{1/2}(μ_m)] Γ diag[σ₁ v^{1/2}(μ₁), ..., σ_m v^{1/2}(μ_m)]

as σ²_max → 0. As a matter of fact, this covariance matrix is equal to Σ because of (13).
We now prove the other case, ‖Σ⁻¹‖ → 0. Let Ω(t) = Σ_{i=1}^{m} λ_i{k(θ_i + t_i/λ_i) − k(θ_i)}. An application of the multivariate Taylor expansion leads to

Ω(t) = μᵀt + ½ tᵀ diag[σ₁² v(μ₁), ..., σ_m² v(μ_m)] t + o(‖t‖²).

On the other hand, from theorem 1 and the Taylor expansion of the term ϕ*(t), we obtain

ϕ*(t) = 1 + ½ tᵀ Σ₀ t + o(‖t‖²)

where Σ₀ is the matrix with zero diagonal entries and off-diagonal entries equal to the covariances. Combining the two expansions above, we have

ϕ(t) = exp{μᵀt + ½ tᵀΣt + o(‖t‖²)}.   (16)

Furthermore, letting Z = Σ^{−1/2}(Y − μ), we obtain the moment generating function of Z as follows:

ϕ_Z(t) = exp{−tᵀΣ^{−1/2}μ} ϕ_Y(Σ^{−1/2}t)
       = exp{−tᵀΣ^{−1/2}μ} exp{μᵀΣ^{−1/2}t + ½ tᵀΣ^{−1/2}ΣΣ^{−1/2}t + o(‖Σ^{−1/2}t‖²)}
       = exp{½ tᵀt + o(‖Σ^{−1/2}t‖²)}.

Note that

‖Σ^{−1/2}t‖² ≤ ‖Σ^{−1/2}‖² ‖t‖² = tr(Σ⁻¹) ‖t‖².

Therefore, for any t ∈ R^m,

ϕ_Z(t) → exp{½ tᵀt}, as tr(Σ⁻¹) → 0.

According to the uniqueness theorem,

Z = Σ^{−1/2}(Y − μ) →_d N_m(0, I_m).

Let 0 < ρ₁ ≤ ··· ≤ ρ_m be the eigenvalues of Σ. Clearly ‖Σ⁻¹‖ = (Σ_{i=1}^{m} ρ_i^{−2})^{1/2} → 0 is equivalent to tr(Σ⁻¹) = Σ_{i=1}^{m} ρ_i^{−1} → 0.

3.4. Approximation based on deviance

We now establish an asymptotic connection between our MDM density and the multivariate dispersion density of Jørgensen & Lauritzen (1998). This relationship is effectively obtained by using a third-order approximation to the marginal normal scores on the basis of the deviance residual (see for example McCullagh & Nelder, 1989, sect. 2.4), defined by

r = r(y) = ±d^{1/2}(y; μ),

with the sign taken as that of y − μ. The theory of tail area approximation (Jørgensen, 1997, sect. 3.6) gives

F(y; μ, σ²) = Φ(r/σ){1 + O(σ³)},

and hence q(y) ≈ r/σ.
For the multivariate setting, let r(y; μ) = (r₁, ..., r_m)ᵀ. We may approximate the logarithm of the density of a continuous model as

log g(y) ≈ −½ log|Γ| + Σ_{i=1}^{m} log a(y_i; σ_i²) − ½ rᵀ(y; μ) Σ⁻¹ r(y; μ),

and the density may be rewritten approximately as

g(y) ≈ |Γ|^{−1/2} {Π_{i=1}^{m} a(y_i; σ_i²)} exp{−½ rᵀ(y; μ) Σ⁻¹ r(y; μ)}   (17)

where Σ = diag(σ_i) Γ diag(σ_i). It is noted that the density function obtained by normalizing the right-hand side of the approximation (17) coincides with the definition of the multivariate dispersion density of Jørgensen & Lauritzen (1998).
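For normal margins the unit deviance is d(y; μ) = (y − μ)², so r_i = y_i − μ_i and q_i = r_i/σ_i exactly; in that special case (17) is not merely an approximation but equals the copula density (4), both recovering the bivariate normal density. The sketch below is our own numerical check of this observation (bivariate case only; names are ours).

```python
from math import exp, sqrt, pi
from statistics import NormalDist

_N = NormalDist()

def _copula_density(u1, u2, g):
    """Bivariate Gaussian copula density (1)."""
    q1, q2 = _N.inv_cdf(u1), _N.inv_cdf(u2)
    det = 1.0 - g * g
    return exp((2*g*q1*q2 - g*g*(q1*q1 + q2*q2)) / (2.0*det)) / sqrt(det)

def density_via_4(y1, y2, mu1, mu2, s1, s2, g):
    """Copula form (4) with normal margins N(mu_i, s_i^2)."""
    f1 = exp(-((y1 - mu1) / s1) ** 2 / 2) / (s1 * sqrt(2 * pi))
    f2 = exp(-((y2 - mu2) / s2) ** 2 / 2) / (s2 * sqrt(2 * pi))
    u1 = _N.cdf((y1 - mu1) / s1)
    u2 = _N.cdf((y2 - mu2) / s2)
    return _copula_density(u1, u2, g) * f1 * f2

def density_via_17(y1, y2, mu1, mu2, s1, s2, g):
    """Deviance form (17): r_i = y_i - mu_i and Sigma = diag(s) Gamma diag(s),
    which for normal margins is the exact bivariate normal density."""
    r1, r2 = y1 - mu1, y2 - mu2
    det = 1.0 - g * g
    quad = (r1*r1/(s1*s1) - 2*g*r1*r2/(s1*s2) + r2*r2/(s2*s2)) / det
    return exp(-quad / 2) / (2 * pi * s1 * s2 * sqrt(det))
```

Agreement of the two functions at arbitrary arguments confirms that the normalized right-hand side of (17) reproduces the Jørgensen and Lauritzen form in the normal case; for non-normal margins the two differ by the O(σ³) error of the tail area approximation.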

4. Modelling longitudinal data


We now formulate multivariate generalized linear models based on the multivariate ex-
ponential dispersion models proposed in the previous sections.

4.1. Marginal model
Consider a longitudinal data set consisting of $n$ independent response vectors $\mathbf{y}_1, \ldots, \mathbf{y}_n$. Assume each $\mathbf{y}_i = (y_{i1}, \ldots, y_{im})^{\mathrm{T}} \sim \mathrm{MED}_m(\boldsymbol\mu_i, \boldsymbol\sigma_i^2, \Gamma)$, $i = 1, \ldots, n$, where $\boldsymbol\mu_i = (\mu_{i1}, \ldots, \mu_{im})^{\mathrm{T}}$ and $\boldsymbol\sigma_i^2 = (\sigma_{i1}^2, \ldots, \sigma_{im}^2)^{\mathrm{T}}$ are, respectively, the mean vector and the dispersion vector for subject $i$. Further, we assume $\sigma_{it}^2 = \sigma^2/w_{it}^2$, where the $w_{it}$, $t = 1, \ldots, m$, are known positive weights and $\sigma^2$ is an unknown dispersion parameter common to all responses.

Let $\mathbf{x}_{it}$ be a $p$-element vector of covariates associated with subject $i$ at time $t$, so that $X_i = (\mathbf{x}_{i1}, \ldots, \mathbf{x}_{im})^{\mathrm{T}}$ forms an $m \times p$ matrix of covariates. The marginal expectations are further assumed to follow generalized linear models, $h(\mu_{it}) = \mathbf{x}_{it}^{\mathrm{T}}\boldsymbol\beta$, where $\boldsymbol\beta$ is a $p$-vector of regression parameters of interest and $h$ is a known link function. The main objective is to estimate the parameter $\theta = (\boldsymbol\beta, \sigma^2, \Gamma)$. Note that the dependence among the components of $\mathbf{y}_i$ is characterized by the matrix $\Gamma$, which may be further parametrized by a $q \times 1$ parameter vector $\alpha$, denoted $\Gamma(\alpha)$ if present.

Throughout the rest of the section we concentrate on the continuous models, and set all weights $w_{it}^2$ to 1 for convenience.
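A minimal sketch (ours; the paper does not prescribe an algorithm here) of how such responses can be generated: draw latent normals with correlation matrix $\Gamma$, transform to uniforms, and apply the marginal gamma quantile with mean $\exp(\mathbf{x}_{it}^{\mathrm{T}}\boldsymbol\beta)$ and dispersion $\sigma^2$:

```python
import numpy as np
from scipy.stats import gamma, norm

def simulate_med_gamma(n, m, beta, sigma2, rho, rng):
    """Simulate n subjects with m repeated gamma responses joined by a Gaussian
    copula: exchangeable latent correlation rho, log link for the means."""
    Gamma_mat = (1 - rho) * np.eye(m) + rho * np.ones((m, m))   # exchangeable
    L = np.linalg.cholesky(Gamma_mat)
    t = np.arange(1, m + 1) / 10.0
    X = np.column_stack([np.ones(m), t])                        # intercept + time/10
    mu = np.exp(X @ beta)                                       # log link
    Z = rng.standard_normal((n, m)) @ L.T                       # latent N(0, Gamma)
    U = norm.cdf(Z)
    Y = gamma.ppf(U, a=1.0 / sigma2, scale=mu * sigma2)         # marginal quantile transform
    return Y, mu

rng = np.random.default_rng(42)
Y, mu = simulate_med_gamma(n=4000, m=10, beta=np.array([1.0, 1.0]),
                           sigma2=0.5, rho=0.3, rng=rng)
assert np.max(np.abs(Y.mean(axis=0) / mu - 1.0)) < 0.05    # empirical means match mu
```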
It follows from (4) that the full log-likelihood function for the parameter $\theta$ is of the form
$$l(\theta; \mathbf{y}) = -\frac{n}{2}\log|\Gamma| + \sum_{i=1}^n\sum_{t=1}^m \log g(y_{it}; \boldsymbol\beta, \sigma^2) + \frac{1}{2}\sum_{i=1}^n \mathbf{q}_i^{\mathrm{T}}(\mathbf{y}_i; \boldsymbol\beta, \sigma^2)\,(I_m - \Gamma^{-1})\,\mathbf{q}_i(\mathbf{y}_i; \boldsymbol\beta, \sigma^2), \qquad (18)$$
where $\mathbf{q}_i(\mathbf{y}_i; \boldsymbol\beta, \sigma^2) = (q_{i1}, \ldots, q_{im})^{\mathrm{T}}$ is the vector of normal scores for subject $i$.
Consequently, applying the chain rule of differentiation with respect to $\boldsymbol\beta$, the likelihood equation for $\boldsymbol\beta$ is given by
$$\Psi(\boldsymbol\beta; \mathbf{y}, \sigma^2, \Gamma) = \frac{1}{\sigma^2}\left\{\sum_{i=1}^n D_i^{\mathrm{T}}V_i^{-1}(\mathbf{y}_i - \boldsymbol\mu_i) + \sum_{i=1}^n Q_i^{\mathrm{T}}(I_m - \Gamma^{-1})\,\mathbf{q}_i\right\} = \mathbf{0}, \qquad (19)$$
where $D_i^{\mathrm{T}} = \partial\boldsymbol\mu_i^{\mathrm{T}}/\partial\boldsymbol\beta$, $V_i = \operatorname{diag}\{v(\mu_{it})\}$ and $Q_i^{\mathrm{T}} = \sigma^2\,\partial\mathbf{q}_i^{\mathrm{T}}/\partial\boldsymbol\beta$. The maximum likelihood estimate $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ of $\boldsymbol\beta$ is obtained as the solution to (19).

Note that $Q_i$ is an $m \times p$ matrix whose $t$th row (the $t$th column of $Q_i^{\mathrm{T}}$) is given by
$$
\begin{aligned}
\sigma^2\frac{\partial q_{it}}{\partial\boldsymbol\beta} &= \mathbf{x}_{it}\{v(\mu_{it})\,h'(\mu_{it})\,\phi(q_{it})\}^{-1}\left\{\int_{-\infty}^{y_{it}} u\,f_{it}(u)\,\mathrm{d}u - \mu_{it}F_{it}\right\}\\
&= \mathbf{x}_{it}\{v(\mu_{it})\,h'(\mu_{it})\,\phi(q_{it})\}^{-1}\left\{(y_{it} - \mu_{it})F_{it} - \int_{-\infty}^{y_{it}} F_{it}(u)\,\mathrm{d}u\right\},
\end{aligned}
$$
where $F_{it} = F_{it}(y_{it})$.
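The closed form above can be checked numerically for a gamma margin (a sketch of ours; the covariate and link factors $\mathbf{x}_{it}$ and $1/h'(\mu_{it})$ are omitted, leaving $\sigma^2\,\partial q_{it}/\partial\mu_{it}$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma, norm

mu, sigma2, y = 2.0, 0.5, 2.6          # gamma margin: mean mu, variance sigma2 * mu^2

def q_of_mu(m):
    """Normal score as a function of the marginal mean."""
    return norm.ppf(gamma.cdf(y, a=1.0 / sigma2, scale=m * sigma2))

F_y = gamma.cdf(y, a=1.0 / sigma2, scale=mu * sigma2)
int_F = quad(lambda u: gamma.cdf(u, a=1.0 / sigma2, scale=mu * sigma2), 0.0, y)[0]
v = mu ** 2                             # gamma variance function
q = q_of_mu(mu)

# closed form for sigma^2 * dq/dmu, versus a central finite difference
closed = ((y - mu) * F_y - int_F) / (v * norm.pdf(q))
numeric = sigma2 * (q_of_mu(mu + 1e-5) - q_of_mu(mu - 1e-5)) / 2e-5
assert np.isclose(closed, numeric, rtol=1e-3)
```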
If the repeated observations on a subject are independent of one another, i.e. $\Gamma = I_m$, then (19) simplifies to the familiar score equation of generalized linear models. This likelihood equation also indicates a different way of incorporating dependence into the estimation procedure than the GEE approach. Whereas the GEE approach of Liang & Zeger (1986) generalizes the score equation (the first term inside the curly brackets in (19)) by substituting a matrix $\tilde{V}_i$ with non-zero off-diagonal elements for the diagonal covariance matrix $V_i$, this likelihood equation introduces the dependence matrix $\Gamma$ in a separate term (the second term inside the curly brackets in (19)) that may be regarded as a penalty term for correlation.
Under mild regularity conditions, standard large-sample theory for likelihood estimates yields both consistency and asymptotic normality; in particular, the observed Fisher information matrix may be estimated by $\hat\Psi\hat\Psi^{\mathrm{T}}$, with all parameters in (19) replaced by their corresponding estimates.

In general, solving (19) is not straightforward because the normal scores $q_{it}$ and their derivatives are non-linear functions of both $\boldsymbol\beta$ and $\sigma^2$, so software for the numerical implementation is required; further study of this aspect is needed.
4.2. GEE justification
When $\|\Sigma\| \to 0$ or $\|\Sigma^{-1}\| \to 0$, it follows from theorem 2 that the log-likelihood can be written approximately as
$$l_a(\theta) \simeq -\frac{n}{2}\log|\Gamma| + \sum_{i=1}^n\sum_{t=1}^m \log a(y_{it}; \sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(\mathbf{y}_i - \boldsymbol\mu_i)^{\mathrm{T}}\,V_i^{-1/2}\,\Gamma^{-1}\,V_i^{-1/2}\,(\mathbf{y}_i - \boldsymbol\mu_i), \qquad (20)$$
where $V_i = \operatorname{diag}\{v(\mu_{it})\}$. Consequently, the estimate of $\boldsymbol\beta$ that maximizes $l_a$, denoted $\hat{\boldsymbol\beta}_a$, is obtained as the solution to the following estimating equation:
$$U(\boldsymbol\beta) = \sigma^{-2}\sum_{i=1}^n\frac{\partial\boldsymbol\mu_i^{\mathrm{T}}}{\partial\boldsymbol\beta}\,V_i^{-1/2}\,\Gamma^{-1}\,V_i^{-1/2}\,(\mathbf{y}_i - \boldsymbol\mu_i) = \sum_{i=1}^n D_i^{\mathrm{T}}\Sigma_i^{-1}(\mathbf{y}_i - \boldsymbol\mu_i) = \mathbf{0}, \qquad (21)$$
where $\Sigma_i = \sigma^2\,\operatorname{diag}\{v^{1/2}(\mu_{it})\}\,\Gamma\,\operatorname{diag}\{v^{1/2}(\mu_{it})\}$.
Note that (21) coincides with the form of Liang & Zeger's (1986) generalized estimating equation, and it yields the same estimate of $\boldsymbol\beta$ as $\hat{\boldsymbol\beta}_a$ when the working covariance matrix is taken to be of the form $\Sigma_i$. Also note that $\Gamma$, treated as a nuisance parameter in the GEE approach, may be interpreted in our setting as the matrix of normal-scoring dependence measures between the components of $\mathbf{y}_i$. Moreover, by (13), the matrix $\Sigma_i$ is indeed the approximate covariance matrix of $\mathbf{y}_i$. Solving (21) to obtain $\hat{\boldsymbol\beta}_a$ may be done with the GEE routine available in S-Plus.
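A bare-bones sketch of such a GEE fit (ours, not the S-Plus routine): Fisher scoring for equation (21) with a log link, gamma variance $v(\mu) = \mu^2$, a fixed exchangeable working correlation, and a design shared by all subjects, so that $\sigma^2$ cancels from the update:

```python
import numpy as np

def gee_solve(Y, X, Gamma_mat, n_iter=20):
    """Fisher scoring for sum_i D_i' Sigma_i^{-1} (y_i - mu_i) = 0 under a log
    link and gamma variance v(mu) = mu^2; every subject shares the design X."""
    n, m = Y.shape
    G_inv = np.linalg.inv(Gamma_mat)
    # start from a least-squares fit to log column means, already near the root
    beta = np.linalg.lstsq(X, np.log(Y.mean(axis=0)), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                                    # common mean profile
        D = mu[:, None] * X                                      # d mu / d beta (log link)
        A = (1.0 / mu)[:, None] * G_inv * (1.0 / mu)[None, :]    # V^{-1/2} G^{-1} V^{-1/2}
        U = D.T @ A @ (Y - mu).sum(axis=0)                       # equation (21), up to sigma^2
        H = n * D.T @ A @ D                                      # scoring matrix
        beta = beta + np.linalg.solve(H, U)
    return beta

rng = np.random.default_rng(1)
m, n = 10, 3000
t = np.arange(1, m + 1) / 10.0
X = np.column_stack([np.ones(m), t])
mu_true = np.exp(X @ np.array([1.0, 1.0]))
Y = rng.gamma(2.0, mu_true / 2.0, size=(n, m))        # independent gamma, sigma^2 = 0.5
Gamma_work = 0.7 * np.eye(m) + 0.3 * np.ones((m, m))  # exchangeable working correlation
beta_hat = gee_solve(Y, X, Gamma_work)
assert np.max(np.abs(beta_hat - 1.0)) < 0.1           # recovers alpha = beta = 1
```

Even with a misspecified working correlation (the simulated responses here are independent), the estimating function remains unbiased, so $\hat{\boldsymbol\beta}_a$ stays consistent; only efficiency is affected.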
Clearly the inference function $U(\boldsymbol\beta)$ is unbiased. Therefore, under mild regularity conditions (see for example Godambe, 1991), we have the asymptotic normality
$$\sqrt{n}\,(\hat{\boldsymbol\beta}_a - \boldsymbol\beta) \stackrel{d}{\to} N_p(\mathbf{0}, \Lambda(\boldsymbol\beta)) \quad \text{as } n \to \infty,$$
with $\Lambda(\boldsymbol\beta) = \lim_n nJ^{-1}(\boldsymbol\beta)$, where $J(\boldsymbol\beta)$ is the Godambe information matrix given by
$$J(\boldsymbol\beta) = S(\boldsymbol\beta)\,V^{-1}(\boldsymbol\beta)\,S^{\mathrm{T}}(\boldsymbol\beta).$$
Here $S(\boldsymbol\beta) = E_{\boldsymbol\beta}\,U'(\boldsymbol\beta)$ and $V(\boldsymbol\beta) = E_{\boldsymbol\beta}\,U(\boldsymbol\beta)U^{\mathrm{T}}(\boldsymbol\beta)$.
Applying the saddlepoint approximation to the function $a(\cdot)$ appearing in (20), we can also obtain an estimate of $\sigma^2$,
$$\hat\sigma_a^2 = \frac{1}{mn - p}\sum_{i=1}^n(\mathbf{y}_i - \boldsymbol\mu_i)^{\mathrm{T}}\,V_i^{-1/2}\,\Gamma^{-1}\,V_i^{-1/2}\,(\mathbf{y}_i - \boldsymbol\mu_i), \qquad (22)$$
which is of exactly the same form as Liang & Zeger's (1986) estimate of $\sigma^2$. Similarly, the estimate of $\Gamma$ is obtained by
$$\hat\Gamma = \frac{1}{n\hat\sigma^2}\sum_{i=1}^n V_i^{-1/2}\,(\mathbf{y}_i - \boldsymbol\mu_i)(\mathbf{y}_i - \boldsymbol\mu_i)^{\mathrm{T}}\,V_i^{-1/2}.$$
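These moment estimates are straightforward to compute once residuals are available. The sketch below (ours) uses independent gamma margins with known means, so that $\hat\Gamma$ should be close to $I_m$; the dispersion estimate shown is the Pearson-type form to which (22) reduces when $\Gamma = I_m$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, p, sigma2 = 20000, 4, 2, 0.5
mu = np.array([2.0, 3.0, 4.0, 5.0])                     # known means, for the sketch only
Y = rng.gamma(1.0 / sigma2, mu * sigma2, size=(n, m))   # independent gamma, v(mu) = mu^2

R = (Y - mu) / mu                         # V^{-1/2}(y_i - mu_i), since v(mu) = mu^2
sigma2_hat = (R ** 2).sum() / (m * n - p)               # Pearson-type dispersion estimate
Gamma_hat = (R.T @ R) / (n * sigma2_hat)                # moment estimate of Gamma

assert abs(sigma2_hat - sigma2) < 0.03                  # recovers sigma^2 = 0.5
assert np.max(np.abs(Gamma_hat - np.eye(m))) < 0.1      # near identity under independence
```

In practice $\boldsymbol\mu_i$ would be replaced by the fitted means $\hat{\boldsymbol\mu}_i(\hat{\boldsymbol\beta})$.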
4.3. A simulation study
We have conducted a simple simulation study to examine the relative efficiency of $\hat{\boldsymbol\beta}_{\mathrm{ML}}$, obtained from the likelihood equation (19), and $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$, obtained from the generalized estimating equation of Liang & Zeger (1986), i.e. from (21) with $\Sigma_i$ taken as the covariance matrix of $\mathbf{y}_i$. We assume the longitudinal data arise from a multivariate exponential dispersion model with gamma margins, so that the efficiency can be studied in terms not only of the correlation parameter but also of the dispersion parameter. Following Liang & Zeger (1986), the marginal expectation is specified as
$$\log(\mu_{it}) = \alpha + \beta t/10, \qquad t = 1, \ldots, 10,$$
with the true parameters set to $\alpha = \beta = 1$. We used the exchangeable correlation structure for the matrix $\Gamma(\gamma)$, where $\gamma$ is a one-dimensional dependence parameter. In the GEE we chose the covariance matrix $\operatorname{cov}(\mathbf{y}_i)$, calculated via formula (11), as the working covariance. According to Crowder (1987), the corresponding $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$ is then the optimal estimator of $\boldsymbol\beta$, having minimum asymptotic variance among all estimators obtained from linear estimating equations of the form $\sum_i W_i(\mathbf{y}_i - \boldsymbol\mu_i) = \mathbf{0}$, where $W_i$ is a non-random weight matrix.

Table 1 reports the asymptotic relative efficiency of $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ to $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$ for pairs of dispersion and dependence parameters $(\sigma^2, \gamma)$ ranging over (0.01, 2.00) and (0.0, 0.9), respectively.
The case $\gamma = 0$ corresponds to independence, in which the two estimators are equally efficient. When the dispersion parameter is less than 0.5, the relative efficiencies are close to 1, owing to the small-dispersion asymptotics.

Table 1. Asymptotic relative efficiency of $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ and $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$ using the gamma regression model

Dispersion $\sigma^2$   $\gamma = 0.0$   $\gamma = 0.3$   $\gamma = 0.6$   $\gamma = 0.9$
0.01                    1.0000           1.0039           1.0034           1.0311
0.10                    1.0000           1.0350           1.1014           1.2312
0.50                    1.0000           1.1580           1.3749           1.8148
1.00                    1.0000           1.2772           1.6083           2.0808
2.00                    1.0000           1.4553           2.0590           3.2390

The table clearly shows that the asymptotic relative efficiency of $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$ decreases as either the dependence parameter or the dispersion parameter increases. In particular, once $\sigma^2$ exceeds 1 and $\gamma$ approaches 1, $\hat{\boldsymbol\beta}_{\mathrm{GEE}}$ becomes very inefficient, and $\hat{\boldsymbol\beta}_{\mathrm{ML}}$ is surely preferred.
Acknowledgements

This work is part of the author's PhD dissertation, written under the supervision of Professor B. Jørgensen at the University of British Columbia. The research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada and by a grant from the Faculty of Arts, York University. The author thanks the two referees and the associate editor for helpful comments that led to a better exposition of this paper.
References

Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in item analysis and prediction (ed. H. Solomon), 158–168. Stanford Mathematical Studies in the Social Sciences VI, Stanford University Press, Stanford, CA.
Barndorff-Nielsen, O. E. & Jørgensen, B. (1991). Some parametric models on the simplex. J. Multivariate Anal. 39, 106–116.
Bishop, Y. M. M., Fienberg, S. E. & Holland, P. W. (1975). Discrete multivariate analysis: theory and practice. MIT Press, Cambridge, MA.
Crowder, M. (1987). On linear and quadratic estimating functions. Biometrika 74, 591–597.
Diggle, P. J., Liang, K.-Y. & Zeger, S. L. (1994). The analysis of longitudinal data. Oxford University Press, Oxford.
Fitzmaurice, G. M., Laird, N. M. & Rotnitzky, A. G. (1993). Regression models for discrete longitudinal responses. Statist. Sci. 8, 284–309.
Godambe, V. P. (1991). Estimating functions: an overview. Oxford University Press, Oxford.
Hutchinson, T. P. & Lai, C. D. (1990). Continuous bivariate distributions, emphasising applications. Rumsby, Sydney.
Joe, H. (1993). Parametric families of multivariate distributions with given margins. J. Multivariate Anal. 46, 262–282.
Joe, H. (1997). Multivariate models and dependence concepts. Chapman & Hall, London.
Jørgensen, B. (1987a). Exponential dispersion models (with discussion). J. Roy. Statist. Soc. Ser. B 49, 127–162.
Jørgensen, B. (1987b). Small-dispersion asymptotics. Braz. J. Probab. Statist. 1, 59–90.
Jørgensen, B. (1997). The theory of dispersion models. Chapman & Hall, London.
Jørgensen, B. & Lauritzen, S. L. (1998). Multivariate dispersion models. Research Report 2, Department of Statistics and Demography, Odense University, Denmark.
Kendall, M. & Gibbons, J. D. (1990). Rank correlation methods, 5th edn. Edward Arnold, London.
Liang, K.-Y. & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Marshall, A. W. & Olkin, I. (1988). Families of multivariate distributions. J. Amer. Statist. Assoc. 83, 834–841.
McCullagh, P. & Nelder, J. A. (1989). Generalized linear models, 2nd edn. Chapman & Hall, London.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229–231.
Zhao, L. P. & Prentice, R. L. (1990). Correlated binary regression using a generalized quadratic model. Biometrika 77, 642–648.
First received August 1997, in final form May 1999

P. X.-K. Song, Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada M3J 1P3.