Andersen E.W. - Composite Likelihood and Two-Stage Estimation in Family Studies (2004)

Biostatistics (2004), 5, 1, pp. 15–30
Printed in Great Britain
Composite likelihood and two-stage estimation in family studies
ELISABETH WREFORD ANDERSEN
The Danish Epidemiology Science Centre, Statens Serum Institut, Artillerivej 5, 2300 Copenhagen S,
Denmark and Department of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200
Copenhagen N, Denmark
ewan@lundbeck.com
SUMMARY
In this paper, register-based family studies provide the motivation for linking a two-stage estimation
procedure in copula models for multivariate failure time data with a composite likelihood approach.
The asymptotic properties of the estimators in both parametric and semi-parametric models are derived,
combining the approaches of Parner (2001) and Andersen (2003). The method is mainly studied when the
families consist of groups of exchangeable members (e.g. siblings) or members at different levels (e.g.
parents and children). The advantages of the proposed method are especially clear in this last case, where
very flexible modelling is possible. The suggested method is also studied in simulations and found to be
efficient compared with maximum likelihood. Finally, the suggested method is applied to a family study of
deep venous thromboembolism, where it is seen that the association between ages at onset is larger for
pairs of siblings than for pairs of parents or for parent–child pairs.
Keywords: All possible pairs; Composite likelihood; Copula; Family studies; Optimal weights; Two-stage estimation.
1. INTRODUCTION
In register-based family studies failure times on related individuals are observed and familial
aggregation of a disease can be regarded as correlation of failure times within families. A family may
be a group of exchangeable individuals such as siblings, but this is not always the case. The family may,
for example, consist of parents and siblings. Both of these types of family studies have been the practical
motivation for this work.
There have been two main approaches to modelling correlated data, namely random effects models
and marginal models, and this has also been the case for survival data (Lee et al., 1992; Wei et al., 1989;
Oakes, 1989; Nielsen et al., 1992).
This paper will, however, concentrate on copula models (Genest and MacKay, 1986), which offer a
very flexible framework for combining the marginal approach with a model for the dependence within
units. The joint survival function is modelled through the marginal survival functions and an association
parameter. In family studies the main interest lies in the association between family members, but it is also
important to be able to take possible confounders into account. Using the copula approach the association
is estimated while covariates are included in the marginal models.
A two-stage estimation procedure suggests itself for these models: the parameters in the marginal models
are estimated first and are then regarded as fixed when the association parameter is estimated. This
estimation procedure was suggested by Hougaard (1986) and has since been studied by Shih and Louis
(1995) and Genest et al. (1995). Glidden (2000) concentrated on the case where the marginal model is a
Cox model and the model for the association is based on the gamma model, whereas Andersen (2003)
studied a general choice of marginal model combined with a copula.
© Oxford University Press (2004); all rights reserved.
An extension of the copula models to hierarchical data was suggested by Bandeen-Roche and Liang
(1996), but as noted by the authors this approach is not so well suited to families of parents and children
because some choices of copula lead to unwanted constraints on the parameters.
In this paper the copula approach is extended to include families consisting of members at different
levels, e.g. parents and children, by combining the two-stage estimation procedure with the composite
likelihood approach (Parner, 2001; Heagerty and Lele, 1998). Although the methods are motivated by
family studies, they can also be used in other cases of correlated failure time data.
The paper is organized as follows. Copula models are briefly described in Section 2. In Section 3 the
composite likelihood approach for groups of siblings and families of parents and children is presented.
The statistical properties of the estimators obtained by two-stage estimation combined with composite
likelihood are derived in Section 4. In Section 5 different choices of weights are discussed. Section 6
concerns the ascertainment of data. In Section 7 the properties of the suggested estimators are studied in
simulations. An application to a family study of deep venous thromboembolism is described in Section 8.
Section 9 contains a discussion.
2. COPULA MODELS
Let $(T_1,\ldots,T_K)$ be the failure times from a family of exchangeable members and $S_1,\ldots,S_K$ the
marginal survival functions, possibly depending on covariates. The joint distribution of $(T_1,\ldots,T_K)$ is
fully specified by the joint survival function $S(t_1,\ldots,t_K)$. When $S(t_1,\ldots,t_K)$ can be written in the form
$$S(t_1,\ldots,t_K) = C_\theta\{S_1(t_1),\ldots,S_K(t_K)\}, \qquad t_1,\ldots,t_K \ge 0, \tag{2.1}$$
where $C_\theta : [0,1]^K \to [0,1]$ is a $K$-dimensional survival function with uniform margins and θ is
a parameter or possibly a vector of parameters, then $(T_1,\ldots,T_K)$ is said to come from the $C_\theta$ copula.
Different choices of $C_\theta$ give different joint distributions, but the marginal models are unaltered.
A special group of copulas is the Archimedean copula family, where the copulas are of the form
$$C_\theta(u_1,\ldots,u_K) = \phi_\theta\{\phi_\theta^{-1}(u_1) + \cdots + \phi_\theta^{-1}(u_K)\}$$
with $0 \le u_i \le 1$, $i = 1,\ldots,K$, $0 \le \phi_\theta$, $\phi_\theta(0) = 1$, $\phi_\theta' < 0$, $\phi_\theta'' > 0$.

In this paper the main example of an Archimedean copula is Clayton’s family. The survival times are
$(T_1,\ldots,T_K)$ with marginal survival functions $\{S_1(t_1),\ldots,S_K(t_K)\} = (u_1,\ldots,u_K)$ and joint survival
function $C(u_1,\ldots,u_K)$. Clayton’s family is then given as
$$C(u_1,\ldots,u_K) = \{u_1^{1-\theta} + \cdots + u_K^{1-\theta} - (K-1)\}^{1/(1-\theta)}, \qquad \theta > 1.$$
Here $\phi(u) = (1+u)^{1/(1-\theta)}$ is, up to a rescaling of its argument, the Laplace transform of a gamma
distribution with mean 1 and variance θ − 1; rescaling the argument of the generator leaves the copula unchanged.
The failure times $T_i$ and $T_h$ are positively associated when θ > 1 and independent in the limit θ → 1.
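As a quick numerical check of the two expressions above, the following Python sketch (standard library only; the function names are illustrative and not from the paper) builds Clayton's copula from the generator φ_θ(s) = (1 + s)^{1/(1−θ)}, whose inverse is φ_θ^{-1}(u) = u^{1−θ} − 1, and compares it with the closed form. Setting all but one argument to 1 also illustrates the uniform margins.

```python
import math


def clayton_generator(s: float, theta: float) -> float:
    """phi_theta(s) = (1 + s)^(1 / (1 - theta)), for theta > 1 and s >= 0."""
    return (1.0 + s) ** (1.0 / (1.0 - theta))


def clayton_generator_inv(u: float, theta: float) -> float:
    """phi_theta^{-1}(u) = u^(1 - theta) - 1, for 0 < u <= 1."""
    return u ** (1.0 - theta) - 1.0


def archimedean_copula(us, theta: float) -> float:
    """C_theta(u_1, ..., u_K) = phi{phi^{-1}(u_1) + ... + phi^{-1}(u_K)}."""
    return clayton_generator(sum(clayton_generator_inv(u, theta) for u in us), theta)


def clayton_closed_form(us, theta: float) -> float:
    """{u_1^(1-theta) + ... + u_K^(1-theta) - (K - 1)}^(1 / (1 - theta))."""
    k = len(us)
    return (sum(u ** (1.0 - theta) for u in us) - (k - 1)) ** (1.0 / (1.0 - theta))


us = [0.9, 0.7, 0.5]
print(archimedean_copula(us, 2.0), clayton_closed_form(us, 2.0))  # both approximately 0.39375
print(clayton_closed_form([0.3, 1.0, 1.0], 2.5))                  # uniform margin: approximately 0.3
```

Taking θ slightly above 1 in `clayton_closed_form` returns approximately the product of the arguments, which is the independence limit mentioned in the text.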
3. THE COMPOSITE LIKELIHOOD APPROACH

In the following it will be shown that the composite likelihood approach offers a very flexible way of
analysing clustered failure time data. With the work done by Andersen (2003) it is possible to study groups
of exchangeable members, e.g. siblings, where the groups can have any size. However, in this paper the
family members do not have to be exchangeable. This situation occurs when the families consist not only
of siblings, but also family members on another level, e.g. parents and children or half-siblings who live
in the same home. When the families are groups of siblings a composite likelihood approach is also a
possibility, in which instead of one contribution from each group, each possible pair of siblings gives rise
to a contribution. This means that software meant for analysing pairs can be used for groups of any size.
This is, however, just a special case of the situation where the members are no longer exchangeable.
3.1 Groups of siblings

The joint distribution for a family is given by the joint survival function, which in turn is modelled through
the marginal survival functions and a copula tying the marginal distributions together.
In the two-stage estimation the parameters in the margins, β, are estimated in the first stage taking
the clustering into account when estimating the variance of the parameters (Section 4). In the second
stage the association parameter θ is estimated using the score equation from the joint likelihood, but
with the estimates from stage one regarded as known. If the logarithm of the likelihood is
$\log L(\beta,\theta) = \sum_{j=1}^n \ell_j(\beta,\theta)$, then the score equation for θ is
$$U_\theta(\hat\beta,\theta) = \sum_{j=1}^n \frac{\partial}{\partial\theta}\,\ell_j(\hat\beta,\theta).$$
Instead of using the joint likelihood in the second stage we suggest using score equations based on
the bivariate distributions of all possible pairs of siblings. When the joint survival function is given by
a copula as in (2.1) then the bivariate survival functions are given by the same copula. For example,
the bivariate survival function for siblings 1 and 2 is $S_{12}(t_1,t_2) = C_\theta\{S_1(t_1),S_2(t_2)\}$. The composite
likelihood proposed in the second stage of the estimation is based on these bivariate distributions. Let $G_j$
be the set of possible pairs for family $j$ and $L_{ih}(\beta,\theta)$ the likelihood for pair $(i,h)$; then the composite
likelihood is
$$\log L^*(\theta,\beta) = \sum_{j=1}^n\sum_{(i,h)\in G_j} w_{ih}\log L_{ih}(\theta,\beta) = \sum_{j=1}^n\sum_{(i,h)\in G_j} w_{ih}\,\ell_{ih}(\theta,\beta), \tag{3.2}$$
where $w_{ih}$ are positive weights and $\ell_{ih} = \log L_{ih}$. Weights are introduced to compensate for the composite
likelihood putting more emphasis on the large families than the full likelihood does, since a family of size $k$
contributes $k(k-1)/2$ pairs. Parner (2001) showed that the estimates found by maximizing the composite likelihood
$L^*$ are asymptotically Normal under suitable assumptions. In this paper the composite likelihood (3.2) is used to
find pseudo score equations for θ in the second stage of the estimation.
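To make one term ℓ_ih of (3.2) concrete, here is a minimal sketch assuming Clayton's copula and, purely for illustration, known exponential margins S(t) = exp(−λt) with a hypothetical common rate `lam`. With u = S(x_i), v = S(x_h), a = 1 − θ and s = u^a + v^a − 1, a pair with two events contributes the copula density θ s^{1/a−2}(uv)^{−θ} times both marginal densities, a pair with one event contributes ∂C/∂u = s^{1/a−1}u^{a−1} times that member's density, and a doubly censored pair contributes C(u, v) = s^{1/a} itself.

```python
import math


def clayton_pair_loglik(x1, d1, x2, d2, theta, lam):
    """Log-likelihood of one pair under Clayton's copula with exponential margins.

    x1, x2: observed times; d1, d2: event indicators (1 = event, 0 = censored).
    theta > 1 is the association parameter; lam is an assumed known marginal rate.
    """
    a = 1.0 - theta                                   # negative for theta > 1
    u, v = math.exp(-lam * x1), math.exp(-lam * x2)   # marginal survival values
    log_f1 = math.log(lam) - lam * x1                 # log marginal densities
    log_f2 = math.log(lam) - lam * x2
    s = u ** a + v ** a - 1.0                         # inner term of Clayton's copula
    if d1 and d2:
        # log c(u, v) = log theta + (1/a - 2) log s - theta log(uv)
        return (math.log(theta) + (1.0 / a - 2.0) * math.log(s)
                - theta * math.log(u * v) + log_f1 + log_f2)
    if d1:
        # log dC/du = (1/a - 1) log s + (a - 1) log u
        return (1.0 / a - 1.0) * math.log(s) + (a - 1.0) * math.log(u) + log_f1
    if d2:
        return (1.0 / a - 1.0) * math.log(s) + (a - 1.0) * math.log(v) + log_f2
    return math.log(s) / a                            # log C(u, v)


def composite_loglik(pairs, theta, lam):
    """Weighted composite log-likelihood (3.2); pairs holds tuples (w, x1, d1, x2, d2)."""
    return sum(w * clayton_pair_loglik(x1, d1, x2, d2, theta, lam)
               for w, x1, d1, x2, d2 in pairs)
```

As θ approaches 1 each term reduces to the sum of two independent marginal contributions, which gives a simple sanity check of the four branches.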
Because there is just one association parameter, the model can be simplified so that each pair in a family
enters the composite likelihood (3.2) with the same weight and the weights depend only on the family
size. The choice of weights will be discussed in Section 5. The simplified version of the composite
likelihood (3.2) is
$$\log L^*(\theta,\beta) = \sum_{j=1}^n w_j\sum_{(i,h)\in G_j}\ell_{ih}(\theta,\beta). \tag{3.3}$$
This can be fitted by software for bivariate data by listing separate bivariate observations for the $k(k-1)/2$
pairs coming from a sibling group of size $k$.
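The pair listing just described can be sketched as follows. The data layout (a dictionary from family id to member records) and the `weight_for_size` function are assumptions made for illustration; Section 5 discusses how the family-size weights might be chosen. Groups of size one contribute no pairs and are skipped.

```python
from itertools import combinations


def expand_to_pairs(families, weight_for_size):
    """List one bivariate record per pair of siblings, as needed for (3.3).

    families: dict mapping family id -> list of member records.
    weight_for_size: function k -> w_j for a sibling group of size k (hypothetical).
    Returns a list of (family_id, weight, member_i, member_h).
    """
    rows = []
    for fam_id, members in families.items():
        if len(members) < 2:
            continue                      # a singleton has no pairs
        w = weight_for_size(len(members))
        # combinations yields the k(k-1)/2 unordered pairs (i, h)
        for m_i, m_h in combinations(members, 2):
            rows.append((fam_id, w, m_i, m_h))
    return rows


families = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1"]}
rows = expand_to_pairs(families, lambda k: 1.0)
print(len(rows))  # 1 + 3 + 0 = 4 pair records
```

Each record can then be passed to bivariate software as an ordinary pair, with the weight column carrying w_j.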
3.2 Families of parents and children

In the case where the families consist of members at different levels, e.g. parents and children, one
possibility is to consider the hierarchical model suggested by Bandeen-Roche and Liang (1996). Assume,
for instance, that the families consist of parents (1, 2) and children $(3,\ldots,K_j)$ with survival times
given by $T = \{(T_1,T_2),(T_3,\ldots,T_{K_j})\}$, where the separate survival functions for parents and children are
Archimedean copulas
$$S(t_1,t_2) = \phi_{\theta_1}[\phi_{\theta_1}^{-1}\{S_1(t_1)\} + \phi_{\theta_1}^{-1}\{S_2(t_2)\}],$$
$$S(t_3,\ldots,t_{K_j}) = \phi_{\theta_2}[\phi_{\theta_2}^{-1}\{S_3(t_3)\} + \cdots + \phi_{\theta_2}^{-1}\{S_{K_j}(t_{K_j})\}].$$
To find the simultaneous distribution for the family, Bandeen-Roche and Liang (1996) suggested
combining these two survival functions in a joint survival function using an Archimedean copula
$$S(t_1,\ldots,t_{K_j}) = \phi_{\theta_3}[\phi_{\theta_3}^{-1}\{S(t_1,t_2)\} + \phi_{\theta_3}^{-1}\{S(t_3,\ldots,t_{K_j})\}]. \tag{3.4}$$
Bandeen-Roche and Liang (1996) gave conditions ensuring that (3.4) is a valid survival function. These
conditions can in some cases lead to unwanted constraints on the association parameters. For instance, if
Clayton’s family is chosen for each of the three Archimedean copulas, so that $\phi_{\theta_1}(s) = (1+s)^{1/(1-\theta_1)}$,
$\phi_{\theta_2}(s) = (1+s)^{1/(1-\theta_2)}$ and $\phi_{\theta_3}(s) = (1+s)^{1/(1-\theta_3)}$, then (3.4) is a survival function under the
following constraints on the three parameters:
$$1 < \theta_3 < \theta_1 \quad\text{and}\quad 1 < \theta_3 < \theta_2. \tag{3.5}$$
The parameters in Clayton’s family can be interpreted in the following manner. One can define an
association measure γ as
$$\gamma(t_i,t_h) = \frac{\lambda_{T_i\mid T_h}(t_i\mid T_h = t_h)}{\lambda_{T_i\mid T_h}(t_i\mid T_h > t_h)},$$
where λ is the intensity of disease. Then γ measures the change in risk for person $i$ if person $h$ gets the
disease at time $t_h$ compared with when person $h$ is disease-free at time $t_h$. Only in Clayton’s family
is γ independent of time (i.e. constant), and moreover γ = θ. In the model with parents and children
the three parameters can be thought of as $\theta_1 = \gamma(t_1,t_2)$, the association between parents,
$\theta_2 = \gamma(t_i,t_h)$, $i,h \ge 3$, the association between two children, and $\theta_3 = \gamma(t_i,t_h)$, $i = 1,2$, $h \ge 3$,
the association between parents and children. The constraints (3.5) on the parameters imply that the association
between parents is stronger than that between parents and children, which may not be plausible because parents
and children are genetically more similar than the two parents are.
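The statement that γ is constant and equal to θ in Clayton's family can be checked numerically. Writing S(t_i, t_h) = C{S_i(t_i), S_h(t_h)}, the intensity given T_h = t_h is −(∂²S/∂t_i∂t_h)/(∂S/∂t_h) and the intensity given T_h > t_h is −(∂S/∂t_i)/S, so γ = S·(∂²S/∂t_i∂t_h)/{(∂S/∂t_i)(∂S/∂t_h)}. The marginal densities cancel, leaving γ = C·C_uv/(C_u·C_v) on the copula scale. The sketch below evaluates this ratio with central finite differences; the step `h` is an arbitrary choice.

```python
def clayton(u, v, theta):
    """Bivariate Clayton copula C(u, v) = (u^(1-theta) + v^(1-theta) - 1)^(1/(1-theta))."""
    a = 1.0 - theta
    return (u ** a + v ** a - 1.0) ** (1.0 / a)


def cross_ratio(u, v, theta, h=1e-4):
    """gamma = C * C_uv / (C_u * C_v), with derivatives by central differences."""
    c = clayton(u, v, theta)
    c_u = (clayton(u + h, v, theta) - clayton(u - h, v, theta)) / (2 * h)
    c_v = (clayton(u, v + h, theta) - clayton(u, v - h, theta)) / (2 * h)
    c_uv = (clayton(u + h, v + h, theta) - clayton(u + h, v - h, theta)
            - clayton(u - h, v + h, theta) + clayton(u - h, v - h, theta)) / (4 * h * h)
    return c * c_uv / (c_u * c_v)


for u, v in [(0.9, 0.8), (0.5, 0.3), (0.2, 0.7)]:
    print(round(cross_ratio(u, v, theta=2.5), 4))  # approximately 2.5 at every point
```

Repeating the calculation with another Archimedean generator would give a γ that varies with (u, v), which is the sense in which Clayton's family is special here.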
Instead of postulating a joint model such as (3.4), the suggestion here is to model the bivariate margins
and combine the likelihood contributions in a composite likelihood.
If $H_j$ is the set of possible pairs combining parents and children in family $j$, $G_j$ the set of possible
pairs of children and $\ell_{ih}$ the logarithm of the likelihood for pair $(i,h)$, then the logarithm of the composite
likelihood can be written as
$$\log L^*(\beta,\theta_1,\theta_2,\theta_3) = \sum_{j=1}^n\Big\{w_{12}^j\,\ell_{12}(\beta,\theta_1) + \sum_{(i,h)\in G_j} w_{ih}^j\,\ell_{ih}(\beta,\theta_2) + \sum_{(l,m)\in H_j} w_{lm}^j\,\ell_{lm}(\beta,\theta_3)\Big\}. \tag{3.6}$$
The composite likelihood in (3.6) is simplified in the same way as (3.2) so pairs of parents and children
from the same family get the same weight, and likewise pairs of children from the same family get the
same weight. This means that (3.6) can be written as
$$\log L^*(\beta,\theta_1,\theta_2,\theta_3) = \sum_{j=1}^n\Big\{w_{1j}\,\ell_{12}(\beta,\theta_1) + w_{2j}\sum_{(i,h)\in G_j}\ell_{ih}(\beta,\theta_2) + w_{3j}\sum_{(l,m)\in H_j}\ell_{lm}(\beta,\theta_3)\Big\}. \tag{3.7}$$
This is a composite likelihood of exactly the same type as (3.3).
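The bookkeeping behind one family's term in (3.7) can be sketched as below, assuming members are numbered so that 1 and 2 are the parents and 3, ..., K_j the children. The pair log-likelihood is passed in as a user-supplied function `pair_loglik(i, h, theta)`, a hypothetical interface standing in for ℓ_ih, and the weights (w_1j, w_2j, w_3j) are given.

```python
from itertools import combinations


def pair_sets(k):
    """Return (parent pair, G_j child pairs, H_j parent-child pairs) for a family of size k."""
    parents = (1, 2)
    children = list(range(3, k + 1))
    g_pairs = list(combinations(children, 2))                 # pairs of children
    h_pairs = [(p, c) for p in parents for c in children]     # parent-child pairs
    return parents, g_pairs, h_pairs


def family_composite_loglik(k, pair_loglik, thetas, weights):
    """One family's contribution to (3.7).

    thetas = (theta1, theta2, theta3); weights = (w1j, w2j, w3j).
    """
    (t1, t2, t3), (w1, w2, w3) = thetas, weights
    parents, g_pairs, h_pairs = pair_sets(k)
    return (w1 * pair_loglik(*parents, t1)
            + w2 * sum(pair_loglik(i, h, t2) for i, h in g_pairs)
            + w3 * sum(pair_loglik(l, m, t3) for l, m in h_pairs))


print(pair_sets(4))  # ((1, 2), [(3, 4)], [(1, 3), (1, 4), (2, 3), (2, 4)])
```

Because each pair type uses its own parameter, no joint model of the type (3.4) is needed, and no constraint such as (3.5) appears.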
4. TWO-STAGE ESTIMATION
As in Andersen (2003), two-stage estimation will be used to find the estimates of the parameters
(β, θ ). The two-stage estimation suits the way the models are constructed using copulas to tie the marginal
distributions together with an association parameter. In the first stage the parameters in the marginal
model, β, are estimated taking the clustering into account when the variance of the estimate is calculated.
In the second stage the estimates from the first stage are regarded as fixed in an estimating equation for
the association parameter θ. Calculation of the variance of the estimated association parameter then takes
into account the estimation uncertainty from the first stage.
4.1 Notation and some assumptions

There are $n$ families indexed $j = 1,\ldots,n$ with $K_j$ members in family $j$, indexed $i = 1,\ldots,K_j$.
Let $T_{ij}$ be the failure time for person $(i,j)$, $C_{ij}$ the censoring time and $Z_{ij}$ covariates. Define
$T_j = \{T_{ij}, i = 1,\ldots,K_j\}$ and similarly $C_j$ and $Z_j$. Suppose $(T_j,C_j)\mid Z_j$ $(j = 1,\ldots,n)$ are independent
identically distributed random variables and $T_j$ is independent of $C_j$ conditional on $Z_j$. We observe
$X_{ij} = \min(T_{ij},C_{ij})$ and $\delta_{ij} = I(T_{ij} \le C_{ij})$.
The composite likelihood used to find an estimate of the association parameter θ is based on the
bivariate distributions, so a model is assumed for all the ‘interesting’ combinations of pairs in the family.
Let $M_j$ be the set of ‘interesting’ pairs. The logarithm of the composite likelihood becomes
$$\log L^*(\theta,\beta) = \sum_{j=1}^n\sum_{(i,h)\in M_j} w_{ih}\log L_{ih}(\theta,\beta) = \sum_{j=1}^n\sum_{(i,h)\in M_j} w_{ih}\,\ell_{ih}(\theta,\beta), \tag{4.8}$$
where $w_{ih}$ are positive weights, $L_{ih}$ is the likelihood for pair $(i,h)$ and $\ell_{ih} = \log L_{ih}$. The composite
likelihood (4.8) covers both of the cases in Sections 3.1 and 3.2.
4.2 The asymptotic distribution in the parametric case

First assume that the margins are modelled parametrically depending on a finite number of parameters β,
which may include effects of the covariates Z .
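In the first stage, β is estimated by solving
$$U_\beta(\beta) = \sum_{j=1}^n\sum_{i=1}^{K_j}\Big\{\delta_{ij}\frac{\partial}{\partial\beta}\log f(x_{ij},\beta) + (1-\delta_{ij})\frac{\partial}{\partial\beta}\log S_{ij}(x_{ij},\beta)\Big\} = \sum_{j=1}^n U^\beta_{\cdot j}(\beta) = 0.$$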
This is also the score equation for β in the case of independence and is unrelated to the composite
likelihood (4.8).
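As one concrete parametric case, assume exponential margins with intensity exp(β^⊤Z_ij) (an illustrative choice; the text allows any parametric margin). Then δ log f + (1 − δ) log S = δβ^⊤Z − x exp(β^⊤Z), so each person contributes Z{δ − x exp(β^⊤Z)} to U_β. The NumPy sketch below solves U_β(β) = 0 by Newton–Raphson and also returns the per-family sums U^β_{·j}, which the variance estimate of Proposition 4.1 uses.

```python
import numpy as np


def fit_exponential_margins(x, delta, Z, family, n_iter=50, tol=1e-10):
    """First-stage estimate of beta under working independence (exponential margins).

    x: (N,) observed times; delta: (N,) event indicators; Z: (N, p) covariates
    (include a column of ones for an intercept); family: (N,) family ids.
    Returns beta_hat and the (n_families, p) matrix of per-family scores U^beta_.j.
    """
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        rate = np.exp(Z @ beta)                     # lambda_ij = exp(beta' Z_ij)
        score = Z.T @ (delta - x * rate)            # U_beta(beta)
        info = (Z * (x * rate)[:, None]).T @ Z      # -dU_beta/dbeta
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # per-person score contributions at beta_hat, summed within families
    contrib = Z * (delta - x * np.exp(Z @ beta))[:, None]
    fam_ids, idx = np.unique(family, return_inverse=True)
    per_family = np.zeros((len(fam_ids), Z.shape[1]))
    np.add.at(per_family, idx, contrib)
    return beta, per_family


x = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1.0, 0.0, 1.0, 1.0])
beta_hat, U_beta_j = fit_exponential_margins(x, delta, np.ones((4, 1)), np.array([0, 0, 1, 1]))
print(beta_hat)  # intercept-only fit: log(events / total time) = log(0.3)
```

The per-family scores sum to zero at the estimate, while their spread across families reflects the clustering that the first-stage variance must take into account.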
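In the second stage, the estimate $\hat\theta$ of the association parameter θ is found as the solution to the pseudo
score equation for θ based on the composite likelihood (4.8) with the estimate from stage one ($\hat\beta$) plugged
in, hence
$$U_\theta(\hat\beta,\theta) = \frac{\partial}{\partial\theta}\log L^* = \sum_{j=1}^n\sum_{(i,h)\in M_j} w_{ih}\frac{\partial}{\partial\theta}\log L_{ih}(\hat\beta,\theta) = \sum_{j=1}^n U^\theta_{\cdot j}(\hat\beta,\theta) = 0. \tag{4.9}$$
Let $V_\beta = \operatorname{var}U^\beta_{\cdot 1}(\beta_0)$, $V_\theta = \operatorname{var}U^\theta_{\cdot 1}(\beta_0,\theta_0)$, $V_{\beta\theta} = \operatorname{cov}\{U^\beta_{\cdot 1}(\beta_0), U^\theta_{\cdot 1}(\beta_0,\theta_0)\}$ and $R = I_\beta^{-1}V_\beta I_\beta^{-1}$.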
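PROPOSITION 4.1 Assume standard regularity conditions for the marginal models, the regularity
conditions stated in the appendix, and that $-n^{-1}\partial U_\beta/\partial\beta$, $-n^{-1}\partial U_\theta/\partial\beta$ and $-n^{-1}\partial U_\theta/\partial\theta$ converge to $I_\beta$, $I_{\beta\theta}$
and $I_\theta$ at $(\beta_0,\theta_0)$ as $n\to\infty$. Then $n^{1/2}(\hat\beta-\beta_0,\hat\theta-\theta_0)$ converges to a Normal distribution with mean
$(0,0)$ and variance–covariance
$$\begin{pmatrix} R & -R I_{\beta\theta}^\top I_\theta^{-1} + I_\beta^{-1}V_{\beta\theta}I_\theta^{-1} \\ -I_\theta^{-1}I_{\beta\theta}R + I_\theta^{-1}V_{\beta\theta}^\top I_\beta^{-1} & V \end{pmatrix},$$
where
$$V = I_\theta^{-1}V_\theta I_\theta^{-1} + I_\theta^{-1}I_{\beta\theta}RI_{\beta\theta}^\top I_\theta^{-1} - I_\theta^{-1}V_{\beta\theta}^\top I_\beta^{-1}I_{\beta\theta}^\top I_\theta^{-1} - I_\theta^{-1}I_{\beta\theta}I_\beta^{-1}V_{\beta\theta}I_\theta^{-1}.$$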
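The covariance in Proposition 4.1 follows from the linear approximations n^{1/2}(β̂ − β_0) ≈ I_β^{-1}n^{-1/2}U_β and n^{1/2}(θ̂ − θ_0) ≈ I_θ^{-1}n^{-1/2}(U_θ − I_βθ I_β^{-1}U_β), so it equals AΣA^⊤, where Σ is the joint covariance of the two scores. The NumPy sketch below assembles the blocks as written in the proposition, storing I_βθ as a (dim θ × dim β) matrix and assuming I_β and I_θ are symmetric; the function name is illustrative.

```python
import numpy as np


def two_stage_covariance(I_beta, I_theta, I_bt, V_beta, V_theta, V_bt):
    """Asymptotic covariance of n^{1/2}(beta_hat - beta0, theta_hat - theta0).

    I_bt: limit of -n^{-1} dU_theta/dbeta, shape (q, p).
    V_bt: cov(U_beta, U_theta), shape (p, q).
    """
    Ib_inv = np.linalg.inv(I_beta)
    It_inv = np.linalg.inv(I_theta)
    R = Ib_inv @ V_beta @ Ib_inv
    upper_right = -R @ I_bt.T @ It_inv + Ib_inv @ V_bt @ It_inv
    V = (It_inv @ V_theta @ It_inv
         + It_inv @ I_bt @ R @ I_bt.T @ It_inv
         - It_inv @ V_bt.T @ Ib_inv @ I_bt.T @ It_inv
         - It_inv @ I_bt @ Ib_inv @ V_bt @ It_inv)
    # the lower-left block is the transpose of the upper-right block
    return np.block([[R, upper_right], [upper_right.T, V]])
```

Writing the variance through the explicit blocks mirrors the proposition term by term; computing AΣA^⊤ directly is an equivalent and often simpler route in practice.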
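The proof of Proposition 4.1 is a straightforward generalization of the proof of Theorem 1 in Shih and
Louis (1995). The variance
$$\begin{pmatrix} V_\beta & V_{\beta\theta} \\ V_{\beta\theta}^\top & V_\theta \end{pmatrix} = E\{(U^\beta_{\cdot 1},U^\theta_{\cdot 1})^\top(U^\beta_{\cdot 1},U^\theta_{\cdot 1})\}$$
is estimated by
$$n^{-1}\sum_{j=1}^n (U^\beta_{\cdot j},U^\theta_{\cdot j})^\top(U^\beta_{\cdot j},U^\theta_{\cdot j}).$$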
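The empirical estimate above is an average of outer products of the stacked per-family scores. A minimal NumPy sketch, assuming the stage-one and stage-two contributions have already been evaluated at (β̂, θ̂) and stored one row per family; no centring is applied, matching the formula, since both score sums are zero at the estimates.

```python
import numpy as np


def score_covariance(U_beta, U_theta):
    """n^{-1} sum_j (U^beta_.j, U^theta_.j)^T (U^beta_.j, U^theta_.j).

    U_beta: (n, p) per-family first-stage scores; U_theta: (n, q) per-family
    second-stage scores. Returns the (p+q, p+q) matrix [[V_beta, V_bt], [V_bt^T, V_theta]].
    """
    U = np.hstack([U_beta, U_theta])   # one row per family
    return U.T @ U / U.shape[0]


U_beta = np.array([[0.5], [-0.2], [-0.3]])
U_theta = np.array([[1.0], [0.0], [-1.0]])
print(score_covariance(U_beta, U_theta))
```

The blocks of this matrix can be plugged into the Proposition 4.1 formula together with estimates of I_β, I_θ and I_βθ.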
4.3 The asymptotic distribution in the semi-parametric case

In this section we derive the asymptotic distribution of the parameters in the semi-parametric model using
the two-stage method and a composite likelihood for θ .
The marginal intensity $\lambda_{ij}(t)$ for person $i$ in family $j$ follows a Cox model
$$\lambda_{ij}(t) = \lambda_0(t)\exp(\beta^\top Z_{ij}),$$
where the baseline intensity $\lambda_0(t)$ is an unknown function of $t$ and $Z_{ij}$ is a vector of covariates for person
$(i,j)$. It is also possible to have a stratified model or a model without covariates, leaving the marginal
model purely non-parametric.
We denote the counting process by $N_{ij}(t) = I(X_{ij} \le t, \delta_{ij} = 1)$, the at-risk indicator by
$Y_{ij}(t) = I(X_{ij} \ge t)$, the maximum follow-up time by τ, and the integrated baseline intensity by
$\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. The composite log-likelihood is a sum over the possible pairs,
$\log L = \sum_{j=1}^n\sum_{(i,h)\in M_j}\ell_{ih}\{\theta,\beta,\Lambda_0(X_{ij}),\Lambda_0(X_{hj})\}$.
In the first stage of the estimation the marginal models are fitted taking the clustering into account
using the method of Lee et al. (1992). The resulting estimates are $\hat\beta$ and $\hat\Lambda_0(t)$. The estimate for β is
found by solving the marginal score equation
$$U_\beta(\beta) = \sum_{j=1}^n\sum_{i=1}^{K_j}\int_0^\tau\Big\{Z_{ij} - \frac{S^{(1)}(\beta,u)}{S^{(0)}(\beta,u)}\Big\}\,dN_{ij}(u) = \sum_{j=1}^n U^\beta_{\cdot j} = 0, \tag{4.10}$$
where $S^{(0)}(\beta,u) = n^{-1}\sum_{j=1}^n\sum_{i=1}^{K_j}Y_{ij}(u)\exp(\beta^\top Z_{ij})$ and
$S^{(1)}(\beta,u) = n^{-1}\sum_{j=1}^n\sum_{i=1}^{K_j}Y_{ij}(u)Z_{ij}\exp(\beta^\top Z_{ij})$, while the estimator for $\Lambda_0(t)$ is an Aalen–Breslow
type estimator
$$\hat\Lambda_0(t,\hat\beta) = \int_0^t \frac{dN_{\cdot\cdot}(u)}{nS^{(0)}(\hat\beta,u)}.$$
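A minimal NumPy sketch of the Aalen–Breslow estimator evaluated at the distinct event times: at each event time u the increment is dN..(u), the number of events at u, divided by nS^{(0)}(β̂, u) = Σ_ij Y_ij(u) exp(β̂^⊤Z_ij). The function name and inputs are illustrative; tied event times are pooled, Breslow-style.

```python
import numpy as np


def breslow_cumhaz(x, delta, Z, beta):
    """Aalen-Breslow estimate of Lambda_0 at each distinct event time.

    x: (N,) observed times; delta: (N,) event indicators; Z: (N, p); beta: (p,).
    Returns (event_times, Lambda0_hat at those times).
    """
    risk = np.exp(Z @ beta)
    event_times = np.unique(x[delta == 1])
    increments = []
    for u in event_times:
        d_u = np.sum((x == u) & (delta == 1))   # dN..(u)
        at_risk = np.sum(risk[x >= u])          # n S^(0)(beta, u) with Y_ij(u) = I(X_ij >= u)
        increments.append(d_u / at_risk)
    return event_times, np.cumsum(increments)


times, cumhaz = breslow_cumhaz(np.array([1.0, 2.0, 2.0, 3.0, 4.0]),
                               np.array([1, 1, 0, 1, 0]),
                               np.zeros((5, 1)), np.zeros(1))
print(times, cumhaz)  # with beta = 0 this is the Nelson-Aalen estimate
```

With β̂ = 0 and no covariates the estimator reduces to the Nelson–Aalen estimator, which corresponds to the purely non-parametric margin mentioned above.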
Spiekerman and Lin (1998) have shown that, under suitable regularity conditions, $\hat\beta$ is asymptotically
Normal around the true value and $n^{1/2}\{\hat\Lambda_0(t,\hat\beta) - \Lambda_0(t)\}$ converges to a zero-mean Gaussian random field.
At the second stage of estimation the estimates from the first stage are plugged into the pseudo score
function for θ, which is based on the composite log-likelihood (4.8). This creates the pseudo score function
$U_\theta$ for the parameter θ:
$$U_\theta(\theta,\hat\beta,\hat\Lambda_0) = \sum_{j=1}^n\sum_{(i,h)\in M_j} w_{ih}\frac{\partial}{\partial\theta}\ell_{ih}\{\theta,\hat\beta,\hat\Lambda_0(\hat\beta,t_i),\hat\Lambda_0(\hat\beta,t_h)\}. \tag{4.11}$$
The estimate $\hat\theta$ is found by solving the equation obtained by setting (4.11) equal to zero.
Under the regularity conditions stated in the appendix the estimator of the association parameter has
the following asymptotic distribution.
PROPOSITION 4.2 $n^{1/2}(\hat\theta - \theta_0)$ converges to a Normal distribution with mean zero and variance
Iθ−1 V (W1 + 1 )Iθ−1 .
The precise definition of V (W1 + 1 ) is found in the appendix. The proof of Proposition 4.2 follows
closely that of Proposition 3.2 in Andersen (2003). The variance from Proposition 4.2 is estimated by
inserting the estimates in the formulae in the appendix.
5. CHOICE OF WEIGHTS
In the composite likelihood approach (3.2) and (3.6) the likelihood contributions are weighted together
using positive weights, different choices of weights leading to different estimators. The maximum
likelihood estimate using the full likelihood is the most efficient method, but is not always available.
Andersen (2003) showed that the two-stage method has good efficiency and one would expect that the
two-stage method using a composite likelihood in the second stage would be less efficient. The question
is now whether it is possible to choose optimal weights so the loss of efficiency becomes as small as
possible. This will be investigated informally for the two situations from Sections 3.1 and 3.2.
5.1 One association parameter

First, we concentrate on the simplest case with a parametric model and just one association parameter as
in Section 3.1. Even in this case the resulting variance of $\hat\theta$ is quite complicated (Propositions 4.1 and
4.2), and the problem is simplified by assuming that the parameters from the first stage are known. This
leaves the variance as $V = I_\theta^{-1}V_\theta I_\theta^{-1}$. Lindsay (1988) has found an expression for optimal weights in the
one-dimensional case, and since the problem is here reduced to one dimension, these weights can be
calculated.
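From (3.3) one sees that the pseudo score equation for θ is
$$U_\theta(\hat\beta,\theta) = \frac{\partial}{\partial\theta}\log L^* = \sum_{j=1}^n w_j\sum_{(i,h)\in G_j}\frac{\partial}{\partial\theta}\ell_{ih}(\hat\beta,\theta) = \sum_{j=1}^n w_j\sum_{(i,h)\in G_j}S_{(ih)j}(\hat\beta,\theta), \tag{5.12}$$
where $S_{(ih)j}$ is the score contribution from pair $(i,h)$ in family $j$.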

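Let $U$ be the score function for θ based on the full likelihood with the marginal parameters β assumed
known, i.e.
$$U = \frac{\partial}{\partial\theta}\log L(\hat\beta,\theta).$$
Lindsay (1988) shows that the optimal weights are
$$w_{\text{opt}} = [\operatorname{var}S]^{-1}E(US), \tag{5.13}$$
where $S$ is the vector of score contributions. In this set-up $E(US) = E(S^2)$, where $S^2$ denotes the vector
whose elements are the squared elements of $S$; this holds because each $S_{(ih)j}$ is itself the score of a bivariate
marginal of the full likelihood, so $E(U S_{(ih)j}) = -E(\partial S_{(ih)j}/\partial\theta) = E(S_{(ih)j}^2)$.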
The variance used for the weights (5.13) is a block-diagonal matrix, since the families are assumed to be
independent, and the size of each block depends on the size of the family. It is modelled using three
parameters: $\sigma^2$ for the variance on the diagonal, ω for the covariance between score contributions from
pairs with one person in common and ρ for the covariance between score contributions from pairs with
nobody in common.
If, for instance, the families have 2, 3, 4 or 5 members, then the weights $(w_2,w_3,w_4,w_5)$ are found
by solving equation (5.13): $w_2 = 1$, $w_3 = \sigma^2/(\sigma^2+2\omega)$, $w_4 = \sigma^2/(\sigma^2+4\omega+\rho)$ and
$w_5 = \sigma^2/(\sigma^2+6\omega+3\rho)$.
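The four weights above follow one pattern. In a sibling group of size k each of the k(k−1)/2 pairs shares one member with 2(k−2) other pairs and no member with (k−2)(k−3)/2 pairs, so every row of the covariance block sums to r_k = σ² + 2(k−2)ω + (k−2)(k−3)ρ/2. Since E(US) = σ² for every pair, and the all-ones vector is then an eigenvector of the block with eigenvalue r_k, (5.13) gives w_k = σ²/r_k. The NumPy sketch below builds the block under the three-parameter model and solves (5.13) directly as a check; the parameter values are arbitrary.

```python
import numpy as np
from itertools import combinations


def score_cov_block(k, sigma2, omega, rho):
    """Covariance of the pair scores in a sibling group of size k (three-parameter model)."""
    pairs = list(combinations(range(k), 2))
    V = np.empty((len(pairs), len(pairs)))
    for a, p1 in enumerate(pairs):
        for b, p2 in enumerate(pairs):
            shared = len(set(p1) & set(p2))    # 2 = same pair, 1 = one member, 0 = none
            V[a, b] = {2: sigma2, 1: omega, 0: rho}[shared]
    return V


def optimal_weight(k, sigma2, omega, rho):
    """Solve w = [var S]^{-1} E(US) with E(US) = sigma2 for every pair, as in (5.13)."""
    V = score_cov_block(k, sigma2, omega, rho)
    w = np.linalg.solve(V, np.full(V.shape[0], sigma2))
    return w[0]                                # all pairs in the group get the same weight


def closed_form_weight(k, sigma2, omega, rho):
    return sigma2 / (sigma2 + 2 * (k - 2) * omega + (k - 2) * (k - 3) / 2 * rho)


for k in (2, 3, 4, 5):
    print(k, optimal_weight(k, 1.0, 0.3, 0.1), closed_form_weight(k, 1.0, 0.3, 0.1))
```

The closed form shows directly how the weights shrink for larger families as ω and ρ grow.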
The parameters $\sigma^2$, ω and ρ are estimated by
$$\hat\sigma^2 = \frac{1}{n_1}\sum_{j=1}^n\sum_{(ih)\in G_j}S_{(ih)j}^2,$$
$$\hat\omega = \frac{1}{n_2}\sum_{j=1}^n\;\sum_{\{((ih),(lm))\in G_j^2\,\mid\, i=l,\,h\ne m\text{ or } i\ne l,\,h=m\}}S_{(ih)j}S_{(lm)j},$$
$$\hat\rho = \frac{1}{n_3}\sum_{j=1}^n\;\sum_{\{((ih),(lm))\in G_j^2\,\mid\, i\ne l,\,h\ne m\}}S_{(ih)j}S_{(lm)j},$$
where $n_1$ is the total number of pairs, $n_2$ the number of elements in the sets $\{((ih),(lm))\in G_j^2 \mid i=l, h\ne m$
or $i\ne l, h=m\}$, $j = 1,\ldots,n$, and $n_3$ the number of elements in the sets $\{((ih),(lm))\in G_j^2 \mid i\ne l, h\ne m\}$,
$j = 1,\ldots,n$.
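The moment estimators σ̂², ω̂ and ρ̂ can be sketched as below, assuming the pair score contributions S_(ih)j are stored per family in a dictionary keyed by the member pair (a hypothetical data layout). Pairs of pairs are classified by how many members they share, following the verbal definitions (one person in common, nobody in common). Ordered pairs of pairs are counted, so each product appears twice in both numerator and denominator, which leaves the averages unchanged.

```python
def moment_estimates(scores_by_family):
    """Estimate (sigma2, omega, rho) from pair score contributions.

    scores_by_family: list of dicts {(i, h): S_(ih)j}, one dict per family.
    """
    sums = {"sigma2": 0.0, "omega": 0.0, "rho": 0.0}
    counts = {"sigma2": 0, "omega": 0, "rho": 0}
    for scores in scores_by_family:
        items = list(scores.items())
        for p1, s1 in items:
            for p2, s2 in items:
                shared = len(set(p1) & set(p2))   # 2 only when p1 and p2 are the same pair
                key = {2: "sigma2", 1: "omega", 0: "rho"}[shared]
                sums[key] += s1 * s2
                counts[key] += 1
    return {k: sums[k] / counts[k] if counts[k] else float("nan") for k in sums}


fams = [{(1, 2): 1.0, (1, 3): 2.0, (2, 3): -1.0},
        {(1, 2): 0.5, (1, 3): 0.5, (1, 4): 1.0, (2, 3): 0.0, (2, 4): -0.5, (3, 4): 1.0}]
print(moment_estimates(fams))
```

Only families with at least four members contribute to ρ̂, so ρ̂ may be poorly determined when large families are rare.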
5.2 More than one association parameter

In Section 3.2 families of parents and children were considered. The association parameter θ = (θ1, θ2, θ3)
(for parents, children and parent–child pairs) is now three-dimensional, and a choice of weights could be
defined as optimal if the resulting variance–covariance matrix is smaller than those from other choices, when
the matrices are ordered by positive definite differences. However, Lindsay (1988) notes that an optimal
choice of weights is not usually globally attainable.
Since the three likelihood contributions depend on separate parameters one possible approach is to
treat the choice of weights as three separate problems.
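The pseudo score equations for θ are found from (3.7):
$$U_{\theta_1}(\hat\beta,\theta_1) = \frac{\partial}{\partial\theta_1}\log L^* = \sum_{j=1}^n w_1\frac{\partial}{\partial\theta_1}\ell_{12}(\hat\beta,\theta_1), \tag{5.14}$$
$$U_{\theta_2}(\hat\beta,\theta_2) = \frac{\partial}{\partial\theta_2}\log L^* = \sum_{j=1}^n w_{2j}\sum_{(i,h)\in G_j}\frac{\partial}{\partial\theta_2}\ell_{ih}(\hat\beta,\theta_2), \tag{5.15}$$
$$U_{\theta_3}(\hat\beta,\theta_3) = \frac{\partial}{\partial\theta_3}\log L^* = \sum_{j=1}^n w_{3j}\sum_{(l,m)\in H_j}\frac{\partial}{\partial\theta_3}\ell_{lm}(\hat\beta,\theta_3). \tag{5.16}$$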
Here w1 is independent of family number, since θ1 is the association parameter for parents and estimation
is always based on the pair of parents, so the natural choice is w1 = 1.
Let $S_2$ be the vector of score contributions $S^{(2)}_{(ih)j}$ for θ2 and $S_3$ the vector of score contributions $S^{(3)}_{(lm)j}$
for θ3. Then one could choose weights as in (5.13):
$$w_2 = [\operatorname{var}S_2]^{-1}E(US_2), \tag{5.17}$$
$$w_3 = [\operatorname{var}S_3]^{-1}E(US_3). \tag{5.18}$$
Again the variances used to calculate the weights are block matrices since the families are assumed to
be independent and the size of each block depends on the size of the family. They are modelled as in
Section 5.1 with separate parameters in the two variances.
If, for example, the families have 1, 2 or 3 children, then the weights for θ2, $(w_2^{(2)}, w_2^{(3)})$, are now
found by solving equation (5.17). Similarly, the weights for θ3, $(w_3^{(1)}, w_3^{(2)}, w_3^{(3)})$, are found by solving
equation (5.18). This leads to $w_2^{(2)} = 1$, $w_2^{(3)} = \sigma_2^2/(\sigma_2^2+2\omega_2)$, $w_3^{(1)} = \sigma_3^2/(\sigma_3^2+\omega_3)$,
$w_3^{(2)} = \sigma_3^2/(\sigma_3^2+2\omega_3+\rho_3)$ and $w_3^{(3)} = \sigma_3^2/(\sigma_3^2+3\omega_3+2\rho_3)$.
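The same eigenvector argument explains the θ3 weights. With two parents and c children there are 2c parent–child pairs; each shares a member with c others (c − 1 through the parent, one through the child) and no member with c − 1 others, which gives w_3^{(c)} = σ_3²/{σ_3² + cω_3 + (c − 1)ρ_3}. For the child pairs with c ≤ 3 the analogous count gives w_2^{(c)} = σ_2²/{σ_2² + 2(c − 2)ω_2}. The NumPy sketch below checks the θ3 formula by solving (5.18) for a single family under the three-parameter covariance model; the function name and parameter values are illustrative.

```python
import numpy as np


def parent_child_weights(c, sigma2, omega, rho):
    """Solve (5.18) for one family with parents 1, 2 and children 3, ..., c + 2."""
    pairs = [(p, ch) for p in (1, 2) for ch in range(3, c + 3)]   # H_j
    m = len(pairs)
    V = np.empty((m, m))
    for a, p1 in enumerate(pairs):
        for b, p2 in enumerate(pairs):
            shared = len(set(p1) & set(p2))   # 2 = same pair, 1 = one member, 0 = none
            V[a, b] = {2: sigma2, 1: omega, 0: rho}[shared]
    return np.linalg.solve(V, np.full(m, sigma2))


for c in (1, 2, 3):
    w = parent_child_weights(c, 1.0, 0.3, 0.1)
    print(c, w[0], 1.0 / (1.0 + c * 0.3 + (c - 1) * 0.1))  # solved versus closed form
```

All parent–child pairs in a family receive the same weight, so the solution agrees with the simplified form (3.7).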
The parameters $\sigma_2^2$, $\omega_2$, $\sigma_3^2$, $\omega_3$ and $\rho_3$ are estimated in the following way:
$$\hat\sigma_2^2 = \frac{1}{n_1}\sum_{j=1}^n\sum_{(ih)\in G_j}S^{(2)}_{(ih)j}S^{(2)}_{(ih)j},$$
$$\hat\omega_2 = \frac{1}{n_2}\sum_{j=1}^n\;\sum_{\{((ih),(lm))\in G_j^2\,\mid\, i=l,\,h\ne m\text{ or } i\ne l,\,h=m\}}S^{(2)}_{(ih)j}S^{(2)}_{(lm)j},$$
$$\hat\sigma_3^2 = \frac{1}{n_3}\sum_{j=1}^n\sum_{(ih)\in H_j}S^{(3)}_{(ih)j}S^{(3)}_{(ih)j},$$
$$\hat\omega_3 = \frac{1}{n_4}\sum_{j=1}^n\;\sum_{\{((ih),(lm))\in H_j^2\,\mid\, i=l,\,h\ne m\text{ or } i\ne l,\,h=m\}}S^{(3)}_{(ih)j}S^{(3)}_{(lm)j},$$
$$\hat\rho_3 = \frac{1}{n_5}\sum_{j=1}^n\;\sum_{\{((ih),(lm))\in H_j^2\,\mid\, i\ne l,\,h\ne m\}}S^{(3)}_{(ih)j}S^{(3)}_{(lm)j},$$
where $n_1$ is the total number of pairs in $G_j$, $j = 1,\ldots,n$, $n_2$ the number of elements in the sets
$\{((ih),(lm))\in G_j^2 \mid i=l, h\ne m \text{ or } i\ne l, h=m\}$, $j = 1,\ldots,n$, $n_3$ the total number of pairs in $H_j$,
$j = 1,\ldots,n$, $n_4$ the number of elements in the sets $\{((ih),(lm))\in H_j^2 \mid i=l, h\ne m \text{ or } i\ne l, h=m\}$,
$j = 1,\ldots,n$, and $n_5$ the number of elements in the sets $\{((ih),(lm))\in H_j^2 \mid i\ne l, h\ne m\}$, $j = 1,\ldots,n$.
The weights calculated in this way are used in the second stage of estimation when θ = (θ1 , θ2 , θ3 ) is
estimated setting (5.14)–(5.16) equal to zero simultaneously.
6. A SAMPLED DATASET
Until now it has been assumed that the dataset is a random sample of families. For rare diseases
this design may be inefficient, and different sampling schemes may be considered. One possible strategy
could be to sample all families with at least one case and a random sample of families without a case.
Adapting the results from Binder (1992) this sampling scheme has been considered in Andersen (2003)
who suggests weighting the estimating equations by the inverse sampling probabilities. Let π j be the
sampling probability for family j, ξ j = 1 if family j is chosen and 0 otherwise, and n the total number of
families in the population. Taking the parametric case as an example, the estimating equation for β in the
first stage becomes
$$\tilde U_\beta = \sum_{j=1}^n\sum_{i=1}^{K_j}\frac{\xi_j}{\pi_j}U_\beta^{ij}.$$
The estimating equation for θ, (4.9) or (4.11), is weighted with the inverse sampling probabilities in the
same way, which means that in the second stage of estimation there are two sets of weights, one to take
the sampling into account and one for the pairwise comparisons, leading to
$$\tilde U_\theta = \sum_{j=1}^n\frac{\xi_j}{\pi_j}U_\theta^j.$$
The estimates derived from the weighted analysis are still asymptotically Normal with a distribution
derived in exactly the same way as in Andersen (2003). It is important to have a good approximation
to the true sampling probability $\pi_j$ for a family $j$, as simulations have shown that misspecified weights
give biased results. For the suggested sampling scheme the probability is known to be 1 when there is at
least one case in the family. The sampling probability $\pi_j$ is
$$\pi_j = \begin{cases} 1 & \text{if there is at least one case in family } j,\\ m_k/N_k & \text{if family } j \text{ is of size } k \text{ and has no case,}\end{cases}$$
where $m_k$ is the number of families of size $k$ in the sample and $N_k$ is the number of families of size $k$ in
the population.
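A small sketch of the sampling probabilities and of inverse-probability weighting of per-family estimating-equation contributions, assuming each sampled family is described by its size and whether it contains a case, and that the population counts N_k are available (exactly or approximately). Only sampled families (ξ_j = 1) are passed in, so each contribution is simply divided by π_j. Function names are illustrative.

```python
from collections import Counter


def sampling_probabilities(families, N_k):
    """pi_j = 1 for families with a case, m_k / N_k for case-free families of size k.

    families: list of (size, has_case) for the sampled families.
    N_k: dict size -> number of families of that size in the population.
    """
    m_k = Counter(size for size, _ in families)   # families of size k in the sample
    return [1.0 if has_case else m_k[size] / N_k[size] for size, has_case in families]


def ipw_sum(contributions, pis):
    """sum_j (xi_j / pi_j) U_j over the sampled families, for which xi_j = 1."""
    return sum(u / p for u, p in zip(contributions, pis))


fams = [(2, True), (2, False), (3, False), (3, False)]
pis = sampling_probabilities(fams, {2: 10, 3: 20})
print(pis)                                  # [1.0, 0.2, 0.1, 0.1]
print(ipw_sum([1.0, 1.0, 1.0, 1.0], pis))   # 1 + 5 + 10 + 10, approximately 26
```

The same weights would multiply the first-stage contributions U_β^{ij} and the second-stage contributions U_θ^j, on top of the pairwise weights w_ih.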
In practice it can be a problem to determine $N_k$, but an approximation can be found when the families
are constructed from a random sample of individuals. Let $X$ be the number of families constructed on the
basis of the random sample of persons drawn from a population of $N$ individuals. Then the number of
families is $X = \sum_{k\ge1}m_k$ and the number of individuals is $N = \sum_{k\ge1}kN_k$. An ad hoc approximation to
$N_k$ is $\tilde N_k = (m_kN)/(kX)$, which preserves the correct number of individuals, $N = \sum_{k\ge1}k\tilde N_k$.
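The ad hoc approximation is simple arithmetic; the sketch below uses made-up sample counts (not data from the study) and checks that Σ_k kÑ_k recovers N, since Σ_k kÑ_k = Σ_k m_kN/X = N.

```python
def approx_population_counts(m_k, N):
    """N~_k = m_k N / (k X), where X = sum_k m_k is the number of sampled families."""
    X = sum(m_k.values())
    return {k: m * N / (k * X) for k, m in m_k.items()}


m_k = {1: 50, 2: 30, 3: 20}                  # illustrative sample counts by family size
N_tilde = approx_population_counts(m_k, N=1000)
print(N_tilde)                               # sizes 1 and 2 give 500.0 and 150.0
print(sum(k * n for k, n in N_tilde.items()))  # approximately 1000
```

The resulting Ñ_k can be used in place of N_k when computing π_j for the case-free families.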
7. SIMULATION STUDIES
Some simulation studies were conducted to assess the statistical properties of the proposed method
specifically for the sib groups and families of parents and children.
A set of simulations was carried out comparing full maximum likelihood, the two-stage estimation
using the full family as suggested in Andersen (2003) and the composite likelihood approach suggested
here with different choices of weights for different distributions of sib group size.
As expected, the maximum likelihood method is the most efficient, followed by the two-stage method
where the full sib group is used in the second stage. For the two-stage method with the composite
likelihood in the second stage different sets of weights were chosen with the optimal weights doing slightly
better than the others.
The loss of information is largest in the case with a large proportion of large families (25% of the
families have five members), which is not surprising, but with the optimal weights the efficiency
is still 94% compared with maximum likelihood. In most practical applications in Denmark the average size
of sib groups will be closer to two than five.
Simulation studies were also performed for datasets consisting of parents and children and again the
method using a composite likelihood in the second stage of estimation had an efficiency of more than 90%
compared to maximum likelihood, and showed little bias.
A detailed description of the simulation results can be found at http://www.biostatistics.oupjournals.org.
8. A FAMILY STUDY OF DEEP VENOUS THROMBOEMBOLISM
Several studies have shown that there are genetic and acquired risk factors for developing deep venous
thrombosis and pulmonary embolism (in the following, thromboembolism) (Rosendaal, 1999; Seligsohn
and Lubetsky, 2001). The analysis presented in this section is an exploratory analysis of the different
amounts of familial aggregation in pairs of parents, parents and children, and sibling pairs.
The study makes use of the nation-wide Danish registers based on the Civil Registration System and
the Danish National Registry of Patients. Everybody in Denmark is registered in the Civil Registration
System with a personal identification number, which is used in all registers. The Civil Registration System
also includes a link to parents, making it possible to identify families. The Danish National Registry of
Patients started in 1977 and includes information on all admissions to Danish hospitals. The study base
was constructed by taking all patients from the Danish National Registry of Patients who were born in the
period 1953–94 and with a given set of diagnoses (5329 patients). A random sample of 49 224 persons,
born in the same period and alive on 1 January 1977, was drawn from the population using the Civil
Registration System. The parents and siblings of these 54 553 sampled persons were identified using the
link in the Civil Registration System. Together with the original sample of persons, these now constitute
the study base.
The events were identified in the Danish National Registry of Patients in the period 1 January 1977
until 31 December 1993. All in all there were 47 298 families where the children had both parents in
common. Within the study period, 3339 first-time events of thromboembolism were identified. When
studying familial aggregation the families with several events contain most information. In our data, 2871
families had one person with an event, 208 had two, 16 had three and one family has four persons with a
diagnosis of thromboembolism.
[Figure: flow diagram. 5329 from DNRP (born 1953–94) and 49 224 random sample from CRS (born 1953–94, alive 1/1 1977) → parents and sibs in CRS]
Fig. 1. Construction of the study base using the Danish National Registry of Patients (DNRP) and the Civil
Registration System (CRS).

Clayton’s distributional family was chosen for each of the three types of pairs. In the case where the
family consists of two parents $(T_1,T_2)$ and $k-2$ children $(T_3,\ldots,T_k)$, this means that
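$$\begin{aligned}
S(t_1,t_2) &= \{S(t_1)^{1-\theta_1} + S(t_2)^{1-\theta_1} - 1\}^{1/(1-\theta_1)},\\
S(t_i,t_j) &= \{S(t_i)^{1-\theta_2} + S(t_j)^{1-\theta_2} - 1\}^{1/(1-\theta_2)}, \quad i,j = 3,\ldots,k,\\
S(t_i,t_j) &= \{S(t_i)^{1-\theta_3} + S(t_j)^{1-\theta_3} - 1\}^{1/(1-\theta_3)}, \quad i = 1,2,\ j = 3,\ldots,k.
\end{aligned} \tag{8.19}$$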
Here θ1 is the association between parents, θ2 the association between two siblings and θ3 the association
between a parent and a child.
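To make (8.19) concrete, the sketch below enumerates all possible pairs in a family of two parents and $k-2$ children, labels each pair with the association parameter of its type, and evaluates the Clayton joint survival function from the two marginal survival values. The function names and the convention that members 0 and 1 are the parents are hypothetical choices for this illustration. Kendall's τ for this parameterization, τ = (θ − 1)/(θ + 1), is included because it reproduces the τ column of Table 1.

```python
from itertools import combinations

def clayton_joint_survival(s_i: float, s_j: float, theta: float) -> float:
    """Joint survival S(t_i, t_j) from marginal survival values, Clayton form of (8.19).

    theta > 1 gives positive association; theta = 1 is the independence limit.
    """
    if abs(theta - 1.0) < 1e-12:
        return s_i * s_j  # independence limit of the Clayton family
    a = 1.0 - theta
    return (s_i ** a + s_j ** a - 1.0) ** (1.0 / a)

def kendalls_tau(theta: float) -> float:
    """Kendall's tau for the Clayton family in this parameterization."""
    return (theta - 1.0) / (theta + 1.0)

def family_pairs(k: int):
    """All possible pairs in a family of k members, with their pair type.

    Hypothetical convention: members 0 and 1 are the parents, 2..k-1 the children.
    Pair type 1 = parents (theta_1), 2 = siblings (theta_2), 3 = parent/child (theta_3).
    """
    pairs = []
    for i, j in combinations(range(k), 2):
        if j < 2:                 # both members are parents
            pair_type = 1
        elif i >= 2:              # both members are children
            pair_type = 2
        else:                     # one parent, one child
            pair_type = 3
        pairs.append((i, j, pair_type))
    return pairs
```

For a family with two children, `family_pairs(4)` yields one parent pair, one sibling pair and four parent/child pairs, each of which contributes one bivariate term to the composite likelihood.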
There is delayed entry because the Danish National Registry of Patients started in 1977. This is taken into account by using the conditional survival function. If $v_i, v_j$ denote the ages at entry, then the conditional survival function is

$$
P(T_i > t_i, T_j > t_j \mid T_i > v_i, T_j > v_j) = \frac{S(t_i, t_j)}{S(v_i, v_j)}. \tag{8.20}
$$
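A minimal sketch of (8.20) using the Clayton form from (8.19). The marginal survival function `S` is passed in as a callable; any concrete marginal used with it (such as the geometric one in the example below) is a hypothetical placeholder, not the population rates used in the paper. The sketch assumes θ ≠ 1.

```python
def clayton_joint_survival(s_i, s_j, theta):
    """Clayton joint survival from marginal survival values (theta != 1)."""
    a = 1.0 - theta
    return (s_i ** a + s_j ** a - 1.0) ** (1.0 / a)

def conditional_pair_survival(S, t_i, t_j, v_i, v_j, theta):
    """P(T_i > t_i, T_j > t_j | T_i > v_i, T_j > v_j), as in (8.20).

    S: marginal survival function (callable); v_i, v_j: ages at entry.
    """
    numerator = clayton_joint_survival(S(t_i), S(t_j), theta)
    denominator = clayton_joint_survival(S(v_i), S(v_j), theta)
    return numerator / denominator
```

With, for example, `S = lambda t: 0.99 ** t`, evaluating at the entry ages themselves gives 1, and for later times the conditional probability exceeds the unconditional joint survival because the denominator is below 1.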
Time-dependent covariates are now difficult to handle correctly since the conditional survival function
(8.20) still depends on the time from birth until the person entered the study. In this example calendar
period is part of the model and it is assumed that the risk of thromboembolism was the same before the
register started as in the first period from 1977–79.
The data are sampled as described in Section 6, and this is taken into account in the analyses. In the first stage of the analysis, population rates have been used and assumed known. Consequently, no variation from the first stage is carried over into the second stage, which may lead to some underestimation of the variance. However, since the dataset is so large, the estimates from the marginal analysis will be close to the population rates.
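In the first stage the marginal survival function is built from population rates treated as known. One way to picture this, assuming the rates are available as a piecewise-constant hazard over age bands (the band edges and rates in the example are hypothetical, and the calendar-period dimension of the paper's model is ignored), is $S(t) = \exp\{-\Lambda(t)\}$ with $\Lambda$ the cumulative hazard:

```python
import math

def piecewise_survival(t, cuts, rates):
    """Marginal survival S(t) = exp(-cumulative hazard) for a piecewise-constant hazard.

    cuts: increasing band start points with cuts[0] == 0.0;
    rates[k] applies on [cuts[k], cuts[k+1]), and the last rate applies from cuts[-1] on.
    """
    if len(cuts) != len(rates) or cuts[0] != 0.0:
        raise ValueError("need one rate per band, with the first band starting at 0")
    cum_hazard = 0.0
    for k, rate in enumerate(rates):
        start = cuts[k]
        end = cuts[k + 1] if k + 1 < len(cuts) else math.inf
        if t <= start:
            break
        cum_hazard += rate * (min(t, end) - start)  # time spent in this band before t
    return math.exp(-cum_hazard)
```

For instance, with bands starting at ages 0, 20 and 40 and rates 0.001, 0.002 and 0.004, the cumulative hazard at age 30 is 0.001·20 + 0.002·10 = 0.04.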
In the second stage, the model (8.19) was fitted to the data, taking the sampling and the delayed entry into account. Two sets of pairwise weights were used: all weights set to 1, or the ‘optimal’ weights from Section 5.2. The results are shown in Table 1.
Table 1 shows that the association is largest for siblings, with $\hat\theta_2 = 10.0$, which is significantly larger than 1 (in this parameterization $\theta = 1$ corresponds to independence). The association for parents is smallest, with an estimate of $\hat\theta_1 = 2.5$ and a confidence interval that includes 1 when the optimal weights are chosen. Since $\hat\theta_3$ (about 4.1) exceeds $\hat\theta_1$, the constraints in (3.5) do not hold, and it would not be possible to fit a joint model of the type (3.4) using Clayton’s family for each copula.
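The estimates in Table 1 are on the log scale. The back-transformed $\hat\theta$, a 95% interval of the form $\exp(\log\hat\theta \pm 1.96 \times \text{s.e.})$ and Kendall’s τ = (θ − 1)/(θ + 1) can be recovered as in the sketch below; that the intervals were formed this way is inferred from the table, which this reproduces, rather than stated in the text.

```python
import math

def theta_summary(log_theta, se, z=1.96):
    """Back-transform a log-scale estimate: theta-hat, Wald 95% CI and Kendall's tau."""
    theta = math.exp(log_theta)
    lower = math.exp(log_theta - z * se)
    upper = math.exp(log_theta + z * se)
    tau = (theta - 1.0) / (theta + 1.0)   # Clayton relation between theta and tau
    return theta, lower, upper, tau

# Parent pair, unit weights: approximately 2.50 (1.07; 5.84), tau 0.4286.
unit = theta_summary(0.9164, 0.4328)
# Parent pair, 'optimal' weights: the lower limit falls just below 1.
optimal = theta_summary(0.9164, 0.4756)
```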
Table 1. Estimates of the association between parents ($\theta_1$), children ($\theta_2$) and parent/child ($\theta_3$) in the application to deep venous thromboembolism

| Pairwise weights | Parameter | Estimate | Std error | $\hat\theta$ (95% conf. int.) | Kendall’s τ |
|---|---|---|---|---|---|
| (1,1,1) | $\log\theta_1$ | 0.9164 | 0.4328 | 2.50 (1.07; 5.84) | 0.4286 |
| | $\log\theta_2$ | 2.3036 | 0.2358 | 10.01 (6.31; 15.89) | 0.8183 |
| | $\log\theta_3$ | 1.4353 | 0.0920 | 4.20 (3.51; 5.03) | 0.6155 |
| ‘optimal’ | $\log\theta_1$ | 0.9164 | 0.4756 | 2.50 (0.98; 6.35) | 0.4286 |
| | $\log\theta_2$ | 2.3029 | 0.2191 | 10.00 (6.51; 15.37) | 0.8182 |
| | $\log\theta_3$ | 1.4144 | 0.0981 | 4.11 (3.39; 4.99) | 0.6089 |

Since the association between the parents, who are not genetically related, is smaller than the association for the other types of pairs, this could indicate a genetic factor in the familial aggregation. A simple genetic model would imply that $\theta_2 = \theta_3$, as a parent and a child share 50% of their genes on average, as do two siblings. Testing this hypothesis with a Wald-type test in the setting with optimal weights gives a test statistic of 12.16 and a p-value of 0.0005, so the hypothesis is rejected.
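The reported p-value is the upper tail of a χ² distribution with one degree of freedom evaluated at 12.16. The sketch below also shows a Wald statistic for $H_0: \theta_2 = \theta_3$ built from two correlated estimates. The covariance between the two estimates is not reported in Table 1, so the statistic itself cannot be recomputed from the table, and performing the test on the log scale is an assumption.

```python
import math

def chi2_1_upper_tail(stat):
    """P(X > stat) for X ~ chi-square with 1 df, i.e. P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))

def wald_equality_test(est_a, est_b, var_a, var_b, cov_ab):
    """Wald test of H0: a == b for two correlated estimates (1 degree of freedom)."""
    diff = est_a - est_b
    var_diff = var_a + var_b - 2.0 * cov_ab   # variance of the difference
    stat = diff ** 2 / var_diff
    return stat, chi2_1_upper_tail(stat)

p_reported = chi2_1_upper_tail(12.16)   # about 0.0005, matching the text
```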
The ‘optimal’ choice of weights reduces the standard error for the association among siblings, and the relative improvement is larger than the loss of precision for the parent–child pairs, suggesting that the ‘optimal’ weights are a good choice.
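The comparison above is a ratio of standard errors on the log θ scale in Table 1; the relative changes when moving from unit weights to ‘optimal’ weights can be computed directly:

```python
# Standard errors of log(theta) from Table 1.
se_unit = {"parents": 0.4328, "siblings": 0.2358, "parent/child": 0.0920}
se_optimal = {"parents": 0.4756, "siblings": 0.2191, "parent/child": 0.0981}

def relative_change(before, after):
    """Relative change in a standard error (negative means more precise)."""
    return (after - before) / before

changes = {pair: relative_change(se_unit[pair], se_optimal[pair]) for pair in se_unit}
```

This gives about −7.1% for siblings and +6.6% for parent–child pairs; the same calculation gives about +9.9% for the parent pair.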
All calculations were carried out using SAS version 6.12.
9. DISCUSSION
In this paper a two-stage procedure combined with a composite likelihood in the second stage has
been studied, with particular reference to the two cases of sib groups and families consisting of parents
and children.
The sib groups can already be studied using a two-stage method as in Andersen (2003), but with the
method presented here it is only necessary to study all possible pairs. This means that software designed
for pairs can be used. The loss of efficiency compared to the methods using the full likelihood depends on
the amount of information outside the pairs and the choice of weights. Simulations indicate that choosing
the optimal weights from Lindsay (1988) is sensible. When 25% of sibling groups have five members the
efficiency is still above 90%.
For the families of parents and children the composite likelihood approach gives a flexible framework
in which to model this type of data. Other approaches have been suggested. The hierarchical models
in Bandeen-Roche and Liang (1996) are also based on copulas but they can give unwanted constraints
on the parameters. Additive and multiplicative frailty models have also been suggested (e.g. Petersen, 1998; Yashin et al., 1995). In these models the bivariate margins are not, in general, shared frailty models. Li and
Zhong (2002) suggested an additive genetic gamma frailty model, which can be used in family studies
where genetic information is available. The composite likelihood approach has also been studied by Parner
(2001), but in this paper the composite likelihood approach is linked to the two-stage estimation.
The weights suggested in Section 5 are not optimal in a mathematically precise sense, but they are
optimal in some simple situations and also seem to perform well in more complicated cases. The situation
where the families consist of members at the same level is very close to the one-dimensional case studied
by Lindsay (1988) where the weights are truly optimal. They are not difficult to calculate, and simulations
have shown that they perform well. In the case of family members at different levels the weights are more
complicated to calculate and the advantage is not as clear. In such cases, one might consider a simpler
choice of weights.
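For a single association parameter, the idea behind optimal weighting can be sketched as follows: stack per-family score components $S = (S_1, \dots, S_M)$, for example one per pair type or family size, and combine them with weights $w = \{\operatorname{var} S\}^{-1} E(-\partial S/\partial\theta)$, the linear combination of unbiased estimating functions with the smallest sandwich variance. Using $E(-\partial S/\partial\theta)$ as the sensitivity, and estimating both moments empirically at a preliminary value of θ, are assumptions of this sketch; it is not claimed to be the exact computation of Section 5.

```python
import numpy as np

def optimal_weights(scores, neg_derivs):
    """Weights w = Var(S)^{-1} E(-dS/dtheta) for combining M unbiased score components.

    scores: (n_families, M) array; column m holds score component m for each family.
    neg_derivs: (n_families, M) array of -dS_m/dtheta for the same families.
    The weights are only determined up to a positive scale factor.
    """
    scores = np.asarray(scores, dtype=float)
    neg_derivs = np.asarray(neg_derivs, dtype=float)
    var_s = np.atleast_2d(np.cov(scores, rowvar=False))  # M x M empirical covariance
    sensitivity = neg_derivs.mean(axis=0)                 # empirical E(-dS/dtheta)
    return np.linalg.solve(var_s, sensitivity)
```

With two uncorrelated components of equal sensitivity, the noisier component receives proportionally less weight, which is the behaviour the weights are meant to capture.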
In summary, the composite likelihood approach combined with the two-stage estimation promises to
be a useful tool in the analysis of sibling groups. It also presents new possibilities when studying families
with members at different levels.
ACKNOWLEDGEMENTS
Work for this paper was started while the author was visiting the MRC Biostatistics Unit, Cambridge,
UK. The author would like to thank Per Kragh Andersen and David Clayton for their valuable
suggestions and comments and Henrik Toft Sørensen and Jørn Olsen for making the data on deep venous
thromboembolism available. The activities of the Danish Epidemiology Science Centre are supported by
a grant from the Danish National Research Foundation.
APPENDIX
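We first give some notation. Let $M$ be the number of possible pairs, and let $X_1 = (X_{11}, \dots, X_{1M}), \dots, X_n = (X_{n1}, \dots, X_{nM})$ be $n$ independent identically distributed replications of the $M$ pairs $Y = (Y_1, \dots, Y_M)$. Let $L_{ik}(\beta, \theta_k)$ be the likelihood function for $X_{ik}$ and $U_{ik}(\beta, \theta_k)$ the score function,

$$
U_{ik}(\beta, \theta_k) = \frac{\partial}{\partial\theta_k} \log L_{ik}(\beta, \theta_k).
$$

Define the Fisher information matrix for the $k$th pair

$$
i_k(\theta_k) = E_0\left\{-\frac{\partial^2}{\partial\theta_k\,\partial\theta_k^{\top}} \log L_{ik}(\beta, \theta_k)\right\}
$$

and the observed information matrix

$$
j_k(\theta_k) = -\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta_k\,\partial\theta_k^{\top}} \log L_{ik}(\beta, \theta_k).
$$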
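The observed information $j_k(\theta_k)$ is an average of second derivatives of the pairwise log-likelihoods. A minimal sketch for a scalar $\theta_k$ with β held fixed, approximating each second derivative by a central difference; the step size `h` and the interface (one callable per family) are assumptions for illustration.

```python
def observed_information(log_lik_fns, theta, h=1e-4):
    """j(theta) = -(1/n) * sum_i d^2/dtheta^2 log L_i(theta), by central differences.

    log_lik_fns: one callable per family, each returning log L_ik at a scalar theta.
    """
    n = len(log_lik_fns)
    total = 0.0
    for f in log_lik_fns:
        # Central second difference approximates the second derivative at theta.
        total += (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h ** 2
    return -total / n
```

For a quadratic log-likelihood such as $-\tfrac12(x_i - \theta)^2$ the second difference is exact up to rounding, so the observed information is 1 for any θ.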
The following assumptions (A.1) concerning the bivariate models are adapted from Parner (2001) and
assumed for Proposition 4.1.
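ASSUMPTION A.1
1. The functions

$$
\frac{\partial^2}{\partial\theta_k\,\partial\theta_k^{\top}} L_{ik}(\theta_k), \qquad \frac{\partial^2}{\partial\theta_k\,\partial\theta_k^{\top}} \log L_{ik}(\theta_k)
$$

are locally, uniformly in $\theta_k$, dominated by integrable functions.
2. If $\Theta$ is unbounded, then for any sequence $\{\theta_{kn}\}_n$ in $\Theta$ such that $|\theta_{kn}| \to \infty$,

$$
\sum_{i=1}^{n} \log L_{ik}(\theta_{kn}) \to -\infty, \quad P\text{-a.s.}
$$

3. For any sequence $\{\theta_{kn}\}_n$ in $\Theta$ such that $\theta_{kn} \to \theta_k$,

$$
\frac{1}{n}\sum_{i=1}^{n} \log L_{ik}(\theta_{kn}) \to E_0\{\log L_{1k}(\theta_k)\}, \quad P\text{-a.s.}
$$

4. For any sequence $\{\theta_{kn}\}_n$ in $\Theta$ such that $\theta_{kn} \to \theta_k$, $j_{kn}(\theta_{kn}) \to i_k(\theta_k)$.
5. The parameter $\theta_k$ can be identified from the distribution of $Y_k$.
6. The Fisher information matrix $i_k(\theta_k)$ is positive definite.
7. The expectations $E_0\{U_{jl}(\theta_{0j})\,U_{kh}(\theta_{0k})\}^2 < \infty$ for $j, k = 1, \dots, K$, $l = 1, \dots, \dim(\theta_j)$ and $h = 1, \dots, \dim(\theta_k)$.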
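DEFINITION A.1 Let the quantities in Proposition 4.2 be

$$
\begin{aligned}
W_\theta(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{(i,h)\in M_j} \frac{\partial}{\partial\theta} w_{ih}(\theta, u, v_i, v_h), \\
V_\theta(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{(i,h)\in M_j} \frac{\partial^2}{\partial\theta^2} w_{ih}(\theta, u, v_i, v_h), \\
V_i(\theta, u, v_1, \dots, v_{K_j}) &= \sum_{\substack{(l,h)\in M_j \\ i = l \text{ or } i = h}} \frac{\partial^2}{\partial\theta\,\partial v_i} w_{lh}(\theta, u, v_l, v_h), \\
I_\theta &= E\{-V_\theta(\theta_0, \beta_0, \Lambda_0)\}, \\
M_{ij}(t) &= N_{ij}(t) - \int_0^t \exp(\beta Z_{ij})\, Y_{ij}(s)\, \lambda_0(s)\, ds.
\end{aligned}
$$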
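The martingale $M_{ij}(t)$ involves the baseline hazard $\lambda_0$, which in the semi-parametric case is replaced by an estimate. A minimal sketch, assuming the Breslow estimator of the cumulative baseline hazard (a standard choice for the Cox model; that it matches the estimator behind $\hat\Lambda_0$ here is our assumption), a scalar covariate and a known β; the function names are hypothetical.

```python
import math

def breslow_cumhaz(times, events, z, beta, entry=None):
    """Breslow estimate of the cumulative baseline hazard at each distinct event time.

    times: exit ages; events: 1 = event, 0 = censored; z: scalar covariates;
    entry: optional entry ages (at risk on (entry, exit]), default 0.
    Returns a sorted list of (event_time, cumulative_hazard).
    """
    n = len(times)
    entry = entry if entry is not None else [0.0] * n
    risk = [math.exp(beta * z[i]) for i in range(n)]
    event_times = sorted({times[i] for i in range(n) if events[i]})
    table, cum = [], 0.0
    for s in event_times:
        d = sum(1 for i in range(n) if events[i] and times[i] == s)        # events at s
        s0 = sum(risk[i] for i in range(n) if entry[i] < s <= times[i])   # S^(0)(beta, s)
        cum += d / s0
        table.append((s, cum))
    return table

def cumhaz_at(table, t):
    """Right-continuous step-function lookup of the cumulative hazard at t."""
    value = 0.0
    for s, c in table:
        if s > t:
            break
        value = c
    return value

def martingale_residual(i, t, times, events, z, beta, table, entry=None):
    """Estimate of M_i(t) = N_i(t) - exp(beta z_i) * integral_0^t Y_i(s) dLambda_0(s)."""
    e_i = entry[i] if entry is not None else 0.0
    n_i = 1.0 if events[i] and times[i] <= t else 0.0
    upper = min(t, times[i])
    exposure = max(0.0, cumhaz_at(table, upper) - cumhaz_at(table, e_i))
    return n_i - math.exp(beta * z[i]) * exposure
```

With β = 0 and no covariate effect, the residuals evaluated at the end of follow-up sum to zero, a known property of martingale residuals under the Breslow estimator.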
The following assumptions are used in the proof of Proposition 4.2.

ASSUMPTION A.2 Assume the regularity conditions from Spiekerman and Lin (1998), Assumption A.1, and that $\partial w_{ih}(\theta, u, v_i, v_h)/\partial\theta$ and $\partial^2 w_{ih}(\theta, u, v_i, v_h)/\partial\theta^2$ are continuous and bounded functions of $u$, $v_i$ and $v_h$.
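The variance from Proposition 4.2 is consistently estimated by inserting the estimates in the formulae in the following way:

$$
\begin{aligned}
\hat W_j &= W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat\Psi_j &= \hat I_{\theta\beta}\, \hat I_\beta^{-1}\, \hat U^{\beta}_{\cdot j} + \sum_{i=1}^{K_j} \int_0^{\tau} \hat I_i(t_i)\, d\hat\Phi_j(t_i), \\
\hat I_\theta &= -n^{-1} \sum_{j=1}^{n} \frac{\partial}{\partial\theta} W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat I_{\theta\beta} &= -n^{-1} \sum_{j=1}^{n} \frac{\partial}{\partial\beta} W_\theta\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat I_i(t_i) &= -n^{-1} \sum_{j=1}^{n} Y_{ij}(t_i)\, V_i\{\hat\theta, \hat\beta, \hat\Lambda_0(\hat\beta, X_{1j}), \dots, \hat\Lambda_0(\hat\beta, X_{K_j j})\}, \\
\hat\Phi_j(t) &= \int_0^t \frac{d\hat M_{\cdot j}(u)}{S^{(0)}(\hat\beta, u)} - \int_0^t E(\hat\beta, u)\, d\hat\Lambda_0(u, \hat\beta)\, \hat I_\beta^{-1}\, \hat U^{\beta}_{\cdot j}.
\end{aligned}
$$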
REFERENCES

ANDERSEN, E. W. (2003). Two-stage estimation in copula models used in family studies. Lifetime Data Analysis (accepted).

BANDEEN-ROCHE, K. J. AND LIANG, K.-Y. (1996). Modelling failure-time associations in data with multiple levels of clustering. Biometrika 83, 29–39.

BINDER, D. A. (1992). Fitting Cox’s proportional hazards models from survey data. Biometrika 79, 139–147.

GENEST, C., GHOUDI, K. AND RIVEST, L. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82, 543–552.

GENEST, C. AND MACKAY, J. (1986). The joy of copulas. The American Statistician 40, 280–283.

GLIDDEN, D. V. (2000). A two-stage estimator of the dependence parameter for the Clayton–Oakes model. Lifetime Data Analysis 6, 141–156.

HEAGERTY, P. J. AND LELE, S. R. (1998). A composite likelihood approach to binary spatial data. Journal of the American Statistical Association 93, 1099–1111.

HOUGAARD, P. (1986). A class of multivariate failure time distributions. Biometrika 73, 671–678.

LEE, E. W., WEI, L. J. AND AMATO, D. A. (1992). Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In Klein, J. and Goel, P. (eds), Survival Analysis: State of the Art. Dordrecht: Kluwer, pp. 237–247.

LI, H. AND ZHONG, X. (2002). Multivariate survival models induced by genetic frailties, with application to linkage analysis. Biostatistics 3, 57–75.

LINDSAY, B. G. (1988). Composite likelihood methods. Contemporary Mathematics 80, 221–239.

NIELSEN, G. G., GILL, R. D., ANDERSEN, P. K. AND SØRENSEN, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics 19, 25–43.

OAKES, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association 84, 487–493.

PARNER, E. (2001). A composite likelihood approach to multivariate survival data. Scandinavian Journal of Statistics 28, 295–302.

PETERSEN, J. H. (1998). An additive frailty model for correlated life times. Biometrics 54, 646–661.

ROSENDAAL, F. (1999). Venous thrombosis: a multicausal disease. The Lancet 353, 1167–1173.

SELIGSOHN, U. AND LUBETSKY, A. (2001). Genetic susceptibility to venous thrombosis. New England Journal of Medicine 344, 1222–1231.

SHIH, J. H. AND LOUIS, T. A. (1995). Inferences on association parameter in copula models for bivariate survival data. Biometrics 51, 1384–1399.

SPIEKERMAN, C. F. AND LIN, D. Y. (1998). Marginal regression models for multivariate failure time data. Journal of the American Statistical Association 93, 1164–1175.

WEI, L. J., LIN, D. Y. AND WEISSFELD, L. (1989). Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. Journal of the American Statistical Association 84, 1068–1073.

YASHIN, A., VAUPEL, J. AND IACHINE, I. (1995). Correlated individual frailty: an advantageous approach to survival analysis of bivariate data. Mathematical Population Studies 5, 145–159.
[Received June 10, 2002; first revision March 17, 2003; second revision April 14, 2003;
accepted for publication May 7, 2003]
