Key words: Compound symmetry projection; explained variance; R2 statistics; random intercept;
random slope.
MSC 2000: Primary 62H20; Secondary 62H10.
Abstract: Variability explained by covariates or explained variance is a well-known concept in assessing the
importance of covariates for dependent outcomes. In this paper we study R2 statistics of explained variance
pertinent to longitudinal data under linear mixed-effect models, where the R2 statistics are computed at
two different levels to measure, respectively, within- and between-subject variabilities explained by the
covariates. By deriving the limits of R2 statistics, we find that the interpretation of explained variance for
the existing R2 statistics is clear only in the case where the covariance matrix of the outcome vector is
compound symmetric. Two new R2 statistics are proposed to address the effect of time-dependent covariate
means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce
the concept of compound symmetry projection and use it to define level-one and level-two R2 statistics.
Numerical results are provided to support the theoretical findings and demonstrate the performance of the
R2 statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada
1. INTRODUCTION
In medical research studies, clustered or longitudinal data (i.e., repeated measurements from each
subject in the study) are often encountered. Linear mixed-effect models (Laird & Ware, 1982)
are particularly useful in applications as they allow assessment of within- and between-subject
variabilities. Measuring the proportion of variability in the outcomes explained by the covariates
in a linear mixed-effect model is of great interest to applied statisticians. Kent (1983) and Korn
& Simon (1991) discussed the general definition of measures of explained variance. A common
type of such measure is the R2 statistic. The concept of R2 statistics of explained variance is well
known in linear regression (Helland, 1987; Draper & Smith, 1998). Mittlböck & Schemper (1996)
and Hu, Palta & Shao (2006) studied the R2 statistics for logistic regression models. Schemper
& Henderson (2000) studied the R2 statistics for proportional hazard models.
For linear mixed-effect models, measuring variability explained by covariates is more com-
plicated, since there are multiple variance components. Zheng (2000) and Xu (2003) proposed
R2 statistics to assess how the covariates explain the within-subject variance, whereas Snijders &
Bosker (1999) proposed an R2 to assess the covariate effect on the total variance of the outcomes.
In view of the particular variance components structure in linear mixed-effect models, it is useful
to consider different R2 statistics for different variance components. In the special case where
there is only a random intercept but no random slopes, Raudenbush & Bryk (2002) suggested
constructing one R2 for assessing the covariate effect on the within-subject variance and another
R2 for the between-subject variance. The R2 statistic for the within-subject variance is referred
to as a level-one R2 while the R2 statistic for the between-subject variance is referred to as a
level-two R2 . Singer & Willett (2003) studied R2 statistics for general linear mixed-effect models
that have both random intercept and slopes. Although these R2 statistics have become popular in
practice, their statistical properties have not been fully studied. It is not clear, for example, why
some R2 statistics take negative values while a statistic measuring proportion should be between
0 and 1. The purposes of the present paper are to explore what the existing R2 statistics measure
generally, to construct some new R2 statistics, and to study statistical properties of the existing
and proposed R2 statistics.
To develop the idea, we first consider in Section 2 the simple case of random intercept models,
where the covariance matrix of the outcomes of each subject is compound symmetric (see Section
2 for the definition of compound symmetry). By deriving the limits of the existing R2 statistics,
we show what they measure and how to interpret them. We also show that the limit of the existing
level-two R2 could be negative when the means of the covariates vary with time. We propose
two new R2 statistics to address this problem. Furthermore, we derive approximate sampling
distributions of these R2 statistics and construct confidence intervals of interest.
Section 3 considers general linear mixed-effect models. Although R2 statistics are defined
as for the random intercept models, their interpretation is not straightforward in the general
case since the covariance matrix of the outcome vector, unconditional on the covariates, is not
compound symmetric. We define a geometric projection of the covariance matrix onto the subspace
corresponding to a compound symmetric matrix. We then use the compound symmetry projection
to describe what the proposed R2 statistics measure.
In Section 4, we carry out a simulation study to examine the performance of the R2 statistics.
Section 5 applies the R2 statistics to data from a randomized clinical trial. Section 6 concludes
the paper with a brief summary and discussion.
2. RANDOM INTERCEPT MODELS

Consider the random intercept model

$$y_{it} = \alpha + \beta^T x_{it} + b_i + e_{it}, \qquad i = 1, \ldots, n, \; t = 1, \ldots, k, \tag{1}$$

where $(\alpha, \beta^T)^T$ is the parameter vector, $b_i \sim N(0, \sigma_b^2)$ and $e_{it} \sim N(0, \sigma_e^2)$ are, respectively, the random intercept and error, and the $b_i$'s, $e_{it}$'s, and $X_i$'s are independent. Here $x_{it}$ is the covariate vector for subject $i$ at time $t$ and $X_i = (x_{i1}, \ldots, x_{ik})^T$. The corresponding null model without covariates is

$$y_{it} = \alpha_{00} + b_{0i} + e_{0it}, \tag{2}$$
where $\alpha_{00}$ is a constant. In fitting this model by the maximum likelihood procedure, it is assumed that $b_{0i} \sim N(0, \sigma_{b0}^2)$, $e_{0it} \sim N(0, \sigma_{e0}^2)$, and the $b_{0i}$'s and $e_{0it}$'s are independent. These assumptions need not hold exactly when the data are generated from model (1), but the maximum likelihood estimates under (2) still converge to well-defined limits. Suppose first that the covariates have a constant mean and follow the model

$$x_{it} = \mu_x + b_{xi} + e_{xit}, \tag{3}$$

where $b_{xi} \sim N(0, \Sigma_x)$, $e_{xit} \sim N(0, \Sigma_{ex})$, and the $b_{xi}$'s and $e_{xit}$'s are independent. Under models (1) and (3), $\mathrm{Var}(X_i\beta) = (\beta^T \Sigma_{ex}\beta) I + (\beta^T \Sigma_x \beta)\mathbf{1}\mathbf{1}^T$ and

$$\mathrm{Var}(y_i) = (\sigma_e^2 + \beta^T \Sigma_{ex}\beta) I + (\sigma_b^2 + \beta^T \Sigma_x \beta)\mathbf{1}\mathbf{1}^T, \tag{4}$$

that is, both covariance matrices $\mathrm{Var}(X_i\beta)$ and $\mathrm{Var}(y_i)$ are compound symmetric.
Let $\hat\sigma_e^2$ and $\hat\sigma_b^2$ be the maximum likelihood estimates (see Harville, 1977; Verbeke & Molenberghs, 2000) of the variance components $\sigma_e^2$ and $\sigma_b^2$ in model (1), respectively, and let $\hat\sigma_{e0}^2$ and $\hat\sigma_{b0}^2$ be the maximum likelihood estimates of $\sigma_{e0}^2$ and $\sigma_{b0}^2$ in the null model (2), respectively. These estimated variance components also depend on the maximum likelihood estimates of the regression coefficients in (1) and (2). Raudenbush & Bryk (2002) defined the level-one and level-two $R^2$ statistics as

$$R_1^2 = 1 - \frac{\hat\sigma_e^2}{\hat\sigma_{e0}^2} \quad\text{and}\quad R_2^2 = 1 - \frac{\hat\sigma_b^2}{\hat\sigma_{b0}^2}. \tag{5}$$
The level-one $R_1^2$ was also proposed by Xu (2003). Some variants of these $R^2$ statistics exist in the literature. For example, Zheng (2000) and Xu (2003) proposed a different level-one $R^2$ that uses the means of squared residuals to estimate $\sigma_e^2$ and $\sigma_{e0}^2$.
It follows directly from the likelihood theory and Equation (4) that, under the above assumptions, as the number of subjects $n \to \infty$, the two $R^2$ statistics converge in probability to

$$\Lambda_1 = 1 - \frac{\sigma_e^2}{\sigma_e^2 + \beta^T \Sigma_{ex}\beta} \quad\text{and}\quad \Lambda_2 = 1 - \frac{\sigma_b^2}{\sigma_b^2 + \beta^T \Sigma_x \beta}, \tag{6}$$

respectively. Note that $\sigma_e^2$ is the coefficient of the $I$ term of the covariance matrix of $y_i$ in model (1), while $\sigma_e^2 + \beta^T \Sigma_{ex}\beta$ is the corresponding coefficient of $\mathrm{Var}(y_i)$ in the null model (2) without covariates. In other words, $\sigma_e^2$ and $\sigma_e^2 + \beta^T \Sigma_{ex}\beta$ are, respectively, the conditional and unconditional within-subject variances. Thus, $R_1^2$ measures the within-subject variability explained by the covariates. Similarly, $\Lambda_2$ is determined by the ratio of the coefficients of the $\mathbf{1}\mathbf{1}^T$ term in the two models, and $R_2^2$ measures the between-subject variability explained by the covariates.
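The limits in (6) are simple functions of the variance components and are easy to check numerically. The sketch below is not from the paper; it uses a scalar covariate with illustrative, hypothetical parameter values, computes the two limits, and uses Monte Carlo to confirm that the covariance matrix of $y_i$ has the compound symmetric form (4), whose $I$ and $\mathbf{1}\mathbf{1}^T$ coefficients are the unconditional within- and between-subject variances.

```python
import numpy as np

# Limits (6) of the level-one and level-two R^2 statistics (scalar covariate).
def r2_limits(sigma_e2, sigma_b2, beta, Sigma_x, Sigma_ex):
    lam1 = 1 - sigma_e2 / (sigma_e2 + beta**2 * Sigma_ex)  # within-subject
    lam2 = 1 - sigma_b2 / (sigma_b2 + beta**2 * Sigma_x)   # between-subject
    return lam1, lam2

# Illustrative values (hypothetical, for this sketch only).
sigma_e2, sigma_b2, beta, Sigma_x, Sigma_ex = 0.16, 1.0, 0.5, 0.64, 0.36
lam1, lam2 = r2_limits(sigma_e2, sigma_b2, beta, Sigma_x, Sigma_ex)

# Monte Carlo check of the compound symmetric form (4) of Var(y_i):
# the 11^T coefficient should be sigma_b2 + beta^2 * Sigma_x and the
# I coefficient should be sigma_e2 + beta^2 * Sigma_ex.
rng = np.random.default_rng(0)
n, k = 50000, 4
x = rng.normal(0, np.sqrt(Sigma_x), (n, 1)) + rng.normal(0, np.sqrt(Sigma_ex), (n, k))
y = beta * x + rng.normal(0, np.sqrt(sigma_b2), (n, 1)) + rng.normal(0, np.sqrt(sigma_e2), (n, k))
cov = np.cov(y, rowvar=False)                    # k x k sample covariance of y_i
coef_ones = cov[~np.eye(k, dtype=bool)].mean()   # estimates sigma_b2 + beta^2*Sigma_x
coef_eye = cov.diagonal().mean() - coef_ones     # estimates sigma_e2 + beta^2*Sigma_ex
```

With these values the two limits are 0.36 and about 0.14, and the estimated coefficients should be close to 0.25 and 1.16.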
In many studies, however, the covariates may have time-dependent means and follow a more general model than model (3):

$$x_{it} = \mu_{xt} + b_{xi} + e_{xit}, \tag{7}$$

where the mean vector $\mu_{xt}$ depends on $t$ and $b_{xi}$ and $e_{xit}$ are the same as those in (3). Let $\bar\mu_x = \sum_{t=1}^k \mu_{xt}/k$ and $D = \sum_{t=1}^k [\beta^T(\mu_{xt} - \bar\mu_x)]^2/(k-1)$. Since $E(y_{it}) = \alpha + \beta^T \mu_{xt}$, $D$ represents the variability of the outcome means caused by the time-dependent covariate means. It is shown in the Appendix that, as $n \to \infty$, $R_1^2$ and $R_2^2$ defined in (5) converge in probability to

$$\Lambda_1^D = 1 - \frac{\sigma_e^2}{\sigma_e^2 + \beta^T \Sigma_{ex}\beta + D} \quad\text{and}\quad \Lambda_2^D = 1 - \frac{\sigma_b^2}{\sigma_b^2 + \beta^T \Sigma_x \beta - D/k}, \tag{8}$$

respectively.
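The effect of $D$ in (8) can be made concrete with a few numbers. The sketch below uses hypothetical, illustrative values (a scalar covariate, so $\beta^T\Sigma_{ex}\beta$ and $\beta^T\Sigma_x\beta$ are single numbers) and evaluates both limits as $D$ grows: the level-one limit increases with $D$, while the level-two limit decreases, reaching 0 at $D = k\,\beta^T\Sigma_x\beta$ and turning negative beyond that.

```python
# Limits (8) under time-dependent covariate means; illustrative values only.
sigma_e2, sigma_b2, k = 0.16, 1.0, 4
bSexb, bSxb = 0.09, 0.16   # beta^T Sigma_ex beta and beta^T Sigma_x beta

def limits_with_D(D):
    lam1_D = 1 - sigma_e2 / (sigma_e2 + bSexb + D)       # level one: grows with D
    lam2_D = 1 - sigma_b2 / (sigma_b2 + bSxb - D / k)    # level two: shrinks with D
    return lam1_D, lam2_D

for D in (0.0, k * bSxb, 2 * k * bSxb):
    l1, l2 = limits_with_D(D)
    print(f"D={D:.2f}: level-one limit {l1:.3f}, level-two limit {l2:.3f}")
```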
Under (7), model (1) can be written as

$$y_{it} = \alpha + \beta^T \mu_{xt} + \beta^T \tilde x_{it} + b_i + e_{it},$$

where $\tilde x_{it} = x_{it} - \mu_{xt}$ has mean zero. The covariate mean $\mu_{xt}$ is absorbed into the outcome mean $\alpha + \beta^T \mu_{xt}$, which is deterministic and is not involved in $\mathrm{Var}(y_i)$. We then consider the following null model:

$$y_{it} = \alpha_{0t} + b_{0i} + e_{0it}, \tag{9}$$

where $\alpha_{0t}$ is an unknown parameter. This null model differs from the null model (2) in that it models time-dependent means for the outcomes even without any covariates. Model (9) is nested in model (1) under assumption (7).
Let $\tilde\sigma_{e0}^2$ and $\tilde\sigma_{b0}^2$ be the maximum likelihood estimates of $\sigma_{e0}^2$ and $\sigma_{b0}^2$, respectively, under model (9). We propose two new $R^2$ statistics:

$$\tilde R_1^2 = 1 - \frac{\hat\sigma_e^2}{\tilde\sigma_{e0}^2} \quad\text{and}\quad \tilde R_2^2 = 1 - \frac{\hat\sigma_b^2}{\tilde\sigma_{b0}^2}, \tag{10}$$

where $\hat\sigma_e^2$ and $\hat\sigma_b^2$ are the same variance estimates used in the definition of the $R^2$ statistics in (5). As $n \to \infty$, $\tilde R_l^2$ converges in probability to $\Lambda_l$ given in (6), $l = 1, 2$. Hence, the new statistics $\tilde R_1^2$ and $\tilde R_2^2$ measure the variability explained by the covariates regardless of whether the covariates have time-dependent means, and their limits are always between 0 and 1.
The new R̃21 in (10) is more appropriate from the perspective of studying how much the covari-
ates reduce the within-subject variance of the outcomes. If one wants a level-one R2 that measures
not only the within-subject variance explained by the covariates but also how the outcome means
are explained by the covariate means, R21 in (5) is preferred. To measure the covariate effect on
the between-subject variance, R̃22 in (10) is always more appropriate than R22 in (5) because R22
has the problems described in issue 3 in Section 2.1.
It can be shown that

$$\sqrt{n}\,(R_l^2 - \Lambda_l^D) \to_d N(0, \sigma_l^2) \quad\text{and}\quad \sqrt{n}\,(\tilde R_l^2 - \Lambda_l) \to_d N(0, \tilde\sigma_l^2) \tag{11}$$

for $l = 1, 2$, where $\to_d$ denotes convergence in distribution. The variances $\sigma_l^2$ and $\tilde\sigma_l^2$ can be estimated, respectively, by

$$\hat\sigma_l^2 = \nabla f_l(\bar T)^T \hat\Sigma_T \nabla f_l(\bar T) \quad\text{and}\quad \hat{\tilde\sigma}_l^2 = \nabla \tilde f_l(\bar T)^T \hat\Sigma_T \nabla \tilde f_l(\bar T),$$

where $\nabla f$ represents the gradient operator; the functions $f_l$ and $\tilde f_l$ are defined in (19) and (20) in the Appendix; $\bar T$ and $\hat\Sigma_T$ are the sample mean and covariance of the vectors $\hat T_i = (\hat r_{i1}^2, \ldots, \hat r_{ik}^2, \bar{\hat r}_{i.}^2, y_{i1}^2, \ldots, y_{ik}^2, \bar y_{i.}^2, y_{i1}, \ldots, y_{ik})^T$, $i = 1, \ldots, n$; the $\hat r_{it}$'s are the residuals of model (1); $\bar y_{i.} = \sum_{t=1}^k y_{it}/k$; and $\bar{\hat r}_{i.} = \sum_{t=1}^k \hat r_{it}/k$. For $n$ not very large, a better confidence interval for $\Lambda_l$ or $\Lambda_l^D$ can be obtained by using the transformation $\log(1 + R_l^2)$.
3. GENERAL LINEAR MIXED-EFFECT MODELS

Consider the general linear mixed-effect model

$$y_{it} = \alpha + \beta^T x_{it} + z_{it}^T c_i + b_i + e_{it}, \tag{12}$$

where $z_{it}$ is a covariate vector with mean $\mu_{zt}$ and $c_i \sim N(0, \Sigma_c)$ is a vector of random slopes with $\mathrm{Cov}(b_i, c_i) = \Sigma_{bc}$. The $R^2$ statistics in (5) and (10) are defined in the same way as for the random intercept models. It is shown in the Appendix that, for balanced data, as $n \to \infty$, $(\tilde R_1^2, \tilde R_2^2)$ and $(R_1^2, R_2^2)$ converge in probability to

$$\Lambda_1 = 1 - \frac{\sigma_e^2}{\sigma_e^2 + E_w} \quad\text{and}\quad \Lambda_2 = 1 - \frac{\sigma_b^2}{\sigma_b^2 + E_b}, \tag{13}$$

$$\Lambda_1^D = 1 - \frac{\sigma_e^2}{\sigma_e^2 + E_w + D} \quad\text{and}\quad \Lambda_2^D = 1 - \frac{\sigma_b^2}{\sigma_b^2 + E_b - D/k}, \tag{14}$$

respectively, where

$$E_w = \frac{\sum_{t=1}^k \mathrm{Var}(x_{it}^T\beta) - k\,\mathrm{Var}(\bar x_{i.}^T\beta)}{k-1} + \frac{\sum_{t=1}^k E(z_{it}^T\Sigma_c z_{it}) - k\,E(\bar z_{i.}^T\Sigma_c \bar z_{i.})}{k-1},$$

$$E_b = \frac{2\sum_{t<l}\big[\beta^T\mathrm{Cov}(x_{it}, x_{il})\beta + E(z_{it}^T\Sigma_c z_{il})\big]}{k(k-1)} + 2\bar\mu_z^T\Sigma_{bc},$$

$\bar\mu_z = \sum_{t=1}^k \mu_{zt}/k$, and $D$ is the same as that in Section 2.1. Although $E_w$ is always nonnegative, the value of $E_b$ may be negative in some situations. For longitudinal data, the covariate vectors $x_{it}$ and $x_{il}$ (and $z_{it}$ and $z_{il}$) are usually positively correlated and, thus, $E_b$ is typically nonnegative.
The interpretation of these limits is not trivial, since the meanings of $E_w$ and $E_b$ are not clear. The complication arises from the fact that the covariance matrix $\mathrm{Var}(y_i)$ in the null models may not be compound symmetric. Hence, the within- and between-subject variances are not well defined, and it is difficult to interpret $\sigma_e^2 + E_w$ and $\sigma_b^2 + E_b$. For example, even in the special case where the covariates follow model (7), the covariance matrix of $y_i$ is

$$\mathrm{Var}(y_i) = \big[\sigma_e^2 + \beta^T\Sigma_{ex}\beta + \mathrm{tr}(\Sigma_c\Sigma_{ez})\big] I + \big[\sigma_b^2 + \beta^T\Sigma_x\beta + \mathrm{tr}(\Sigma_c\Sigma_z)\big]\mathbf{1}\mathbf{1}^T + M_z\Sigma_c M_z^T + M_z\Sigma_{bc}\mathbf{1}^T + \mathbf{1}\Sigma_{bc}^T M_z^T,$$

where $M_z = (\mu_{z1}, \ldots, \mu_{zk})^T$, $\mathrm{Var}(z_{it}) = \Sigma_{ez} + \Sigma_z$, and $\mathrm{Cov}(z_{it}, z_{il}) = \Sigma_z$ for $t \neq l$. $\mathrm{Var}(y_i)$ is not compound symmetric unless (i) $\Sigma_c = 0$ and $\Sigma_{bc} = 0$, so that the model reduces to a random intercept model, or (ii) $\mu_{zt} = \mu_z$ for all $t$, so that $M_z = \mathbf{1}\mu_z^T$.
To interpret what the $R^2$ statistics in (5) and (10) measure under the general linear mixed-effect model (12), we introduce the concept of compound symmetry projection. For any random $k$-vector $y$, the covariance matrix $\mathrm{Var}(y) = \Sigma$ with $(j, l)$ element $\sigma_{jl}$ can be expressed as a vector

$$V = (\sigma_{11}, \ldots, \sigma_{kk}, \sigma_{12}, \ldots, \sigma_{1k}, \sigma_{23}, \ldots, \sigma_{k-1,k})^T.$$

Let $A_1, \ldots, A_{k(k+1)/2}$ be a set of base vectors for the $k(k+1)/2$-dimensional Euclidean space $R^{k(k+1)/2}$, with $A_1 = (\mathbf{1}_k^T, \mathbf{0}_{k(k-1)/2}^T)^T$ (corresponding to the identity matrix $I_k$) and $A_2 = \mathbf{1}_{k(k+1)/2}$ (corresponding to $\mathbf{1}_k\mathbf{1}_k^T$), where $\mathbf{1}_d$ is a $d$-vector of ones and $\mathbf{0}_d$ is a $d$-vector of zeros. The rest of the base vectors can be any vectors linearly independent of $A_1$ and $A_2$. Then $V$ can be written as $V = \sum_{j=1}^{k(k+1)/2} c_j A_j$, where the $c_j$'s are coefficients depending on $\Sigma$.
In particular,

$$c_1 = \frac{\sum_{j=1}^k \mathrm{Var}(y_j) - k\,\mathrm{Var}(\bar y)}{k-1} = \frac{\sum_{j=1}^k \sigma_{jj}}{k} - c_2, \qquad c_2 = \mathrm{Var}(\bar y) - \frac{c_1}{k} = \frac{2\sum_{1\le j<l\le k}\sigma_{jl}}{k(k-1)}. \tag{15}$$

We define the compound symmetry projection of $\Sigma$ to be the matrix $c_1 I + c_2 \mathbf{1}\mathbf{1}^T$. Under the general model (12),
$$\mathrm{Var}(y_i) = \sigma_e^2 I + \sigma_b^2 \mathbf{1}\mathbf{1}^T + \mathrm{Var}(X_i\beta) + E(Z_i \Sigma_c Z_i^T) + M_z\Sigma_{bc}\mathbf{1}^T + \mathbf{1}\Sigma_{bc}^T M_z^T, \tag{16}$$

whose compound symmetry projection is exactly $(\sigma_e^2 + E_w)I + (\sigma_b^2 + E_b)\mathbf{1}\mathbf{1}^T$. Note that $\sigma_e^2 + E_w$ appears in the limit $\Lambda_1$ given by (13), representing the within-subject variance of the outcomes in the null model without covariates. Thus, $\tilde R_1^2$ measures how the covariates explain the within-subject variance. The old $R_1^2$, however, mixes the covariate effects on the within-subject variance and on the means of the outcomes. Similarly, $\tilde R_2^2$ measures how the covariates explain the between-subject variance. $R_2^2$, however, is still not an appropriate measure unless the covariate mean is constant over time, for the reasons discussed in Section 2.
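In practice, the projection coefficients in (15) reduce to simple averages: $c_2$ is the average off-diagonal element of $\Sigma$ and $c_1$ is the average diagonal element minus $c_2$. A minimal sketch (assuming only an in-memory numpy matrix):

```python
import numpy as np

def cs_projection(S):
    """Compound symmetry projection c1*I + c2*11^T of a k x k covariance
    matrix, with c1 and c2 as in (15)."""
    k = S.shape[0]
    c2 = S[~np.eye(k, dtype=bool)].mean()   # average off-diagonal element
    c1 = S.diagonal().mean() - c2           # average diagonal element minus c2
    return c1, c2, c1 * np.eye(k) + c2 * np.ones((k, k))

# A compound symmetric matrix is its own projection.
C = 0.7 * np.eye(4) + 0.3 * np.ones((4, 4))
c1, c2, P = cs_projection(C)
```

For a general $\Sigma$, the identity $c_2 = \mathrm{Var}(\bar y) - c_1/k$ in (15) also holds, with $\mathrm{Var}(\bar y) = \mathbf{1}^T\Sigma\mathbf{1}/k^2$.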
For unbalanced data, suppose that subject $i$ has $k_i$ observations and define $E_{wi}$ by

$$\sum_{t=1}^{k_i} \mathrm{Var}(x_{it}^T\beta) - k_i\,\mathrm{Var}(\bar x_{i.}^T\beta) + \sum_{t=1}^{k_i} E(z_{it}^T\Sigma_c z_{it}) - k_i\,E(\bar z_{i.}^T\Sigma_c \bar z_{i.}) = (k_i - 1)E_{wi}.$$

As $n \to \infty$, $\tilde R_1^2 - \Lambda_1$ converges to 0 in probability, where

$$\Lambda_1 = 1 - \frac{\sigma_e^2}{\sigma_e^2 + \sum_{i=1}^n \frac{k_i-1}{N-n}E_{wi}} \tag{17}$$

and $N = \sum_{i=1}^n k_i$.
Following the discussion in Section 3.1, $\sigma_e^2 + E_{wi}$ is the within-subject variance of $y_i$ unconditional on the covariates, and the weighted average $\sum_{i=1}^n \frac{k_i-1}{N-n}(\sigma_e^2 + E_{wi})$ can be considered as an overall within-subject variance of the outcomes in the null model. The interpretation of $\tilde R_1^2$ is thus the same as that for balanced data.
For the between-subject variance, the limit of the maximum likelihood estimate $\tilde\sigma_{b0}^2$ of $\sigma_{b0}^2$ in model (9) satisfies the equation

$$\sigma_{b0}^2 = \sum_{i=1}^n w_i\left[\mathrm{Var}(\bar y_{i.}) - \frac{\sigma_{e0}^2}{k_i}\right], \qquad w_i = \frac{k_i^2/(\sigma_{e0}^2 + k_i\sigma_{b0}^2)^2}{\sum_{j=1}^n k_j^2/(\sigma_{e0}^2 + k_j\sigma_{b0}^2)^2}.$$

This equation is not an explicit solution for $\sigma_{b0}^2$, since $w_i$ depends on both $\sigma_{e0}^2$ and $\sigma_{b0}^2$ unless $k_i = k$ for all $i$. Since $\sigma_{e0}^2$ is the unconditional within-subject variance, $\mathrm{Var}(\bar y_{i.}) - \sigma_{e0}^2/k_i$ can be considered as an approximation to the unconditional between-subject variance of $y_i$, following the definition of $c_2$ in the compound symmetry projection (15). The weighted average $\sum_{i=1}^n w_i[\mathrm{Var}(\bar y_{i.}) - \sigma_{e0}^2/k_i]$ is then an approximation to the between-subject variance of the outcomes in the null model without covariates. As $n \to \infty$, $\tilde R_2^2 - \Lambda_2$ converges to 0 in probability, where

$$\Lambda_2 = 1 - \frac{\sigma_b^2}{\sum_{i=1}^n w_i\left[\mathrm{Var}(\bar y_{i.}) - \sigma_{e0}^2/k_i\right]}. \tag{18}$$

Hence, the interpretation of the level-two statistic $\tilde R_2^2$ is also the same as that for balanced data.
On the other hand, the $R^2$ statistics defined by (5) still include the effect of the time-dependent covariate means. As $n \to \infty$, $R_1^2 - \Lambda_1^D$ converges to 0 in probability, where

$$\Lambda_1^D = 1 - \frac{\sigma_e^2}{\sigma_e^2 + \sum_{i=1}^n \frac{k_i-1}{N-n}E_{wi} + D}, \qquad D = \frac{\sum_{i=1}^n\sum_{t=1}^{k_i}\big[\beta^T(\mu_{xit} - \bar\mu_{xi.})\big]^2}{N-n}.$$

Note that $D$ quantifies the effect of the covariate means on the outcome means. Similarly, $R_2^2$ also involves $D$ and can be negative in some cases.
Results (17) and (18) reduce to the earlier results (13) for balanced data and further reduce to
(6) for the random intercept model with balanced data.
Under the general model (12), it is difficult to explicitly derive the asymptotic distributions of
the R2 statistics, although R2 statistics are still asymptotically normal since the maximum likeli-
hood estimators are asymptotically normal. The bootstrap technique can be applied to calculate
the confidence intervals. Ukoumunne et al. (2003) proposed a nonparametric bootstrap proce-
dure to calculate the confidence intervals for the intraclass correlation coefficient by resampling
the subjects. Their method can be easily applied to the R2 statistics. A transformation such as
log(1 + R2 ) may be applied to improve the performance of confidence intervals when n is not
very large.
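Subject-level resampling in the spirit of Ukoumunne et al. (2003) is straightforward to implement. The sketch below is not the authors' procedure verbatim: for brevity it replaces the ML fits with simple ANOVA-type moment estimates on balanced simulated data, and all parameter values are illustrative. It builds a percentile interval on the $\log(1 + R^2)$ scale and transforms back.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, beta = 200, 4, 1.0
# Simulated balanced data: random intercept model with a scalar covariate.
x = rng.normal(0, 0.8, (n, 1)) + rng.normal(0, 0.6, (n, k))
y = beta * x + rng.normal(0, 1.0, (n, 1)) + rng.normal(0, 0.4, (n, k))

def r2_level1(x, y):
    # Level-one R^2 from within-subject variances (moment version);
    # OLS residuals stand in for the ML fit of model (1).
    n_, k_ = y.shape
    coef = np.polyfit(x.ravel(), y.ravel(), 1)
    r = y - np.polyval(coef, x)
    s_e = ((r - r.mean(axis=1, keepdims=True)) ** 2).sum() / (n_ * (k_ - 1))
    s_e0 = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n_ * (k_ - 1))
    return 1 - s_e / s_e0

r2_hat = r2_level1(x, y)
boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)              # resample whole subjects
    boot.append(r2_level1(x[idx], y[idx]))
# Percentile interval on the log(1 + R^2) scale, transformed back.
lo, hi = np.exp(np.percentile(np.log1p(boot), [2.5, 97.5])) - 1
```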
Table 1: Simulation results of R2 statistics for random intercept models with balanced data.

D        β     R²₁ (Λ₁^D)    CP     R̃²₁ (Λ₁)     CP     R²₂ (Λ₂^D)    CP     R̃²₂ (Λ₂)     CP

(n, k) = (100, 4)
0        0.5   0.36 (0.36)   95.0   0.36 (0.36)   94.3   0.13 (0.14)   93.3   0.13 (0.14)   93.4
         1.0   0.69 (0.69)   94.5   0.69 (0.69)   95.1   0.38 (0.39)   93.8   0.38 (0.39)   94.6
1        0.5   0.68 (0.68)   94.5   0.35 (0.36)   94.0   0.08 (0.09)   93.7   0.13 (0.14)   94.2
         1.0   0.89 (0.89)   94.0   0.69 (0.69)   95.2   0.26 (0.28)   94.5   0.38 (0.39)   93.1
kβ²Σx    0.5   0.87 (0.87)   95.1   0.36 (0.36)   94.8   0.01 (0.00)   94.7   0.14 (0.14)   94.6
         1.0   0.96 (0.97)   95.3   0.69 (0.69)   95.2   −0.06 (0.00)  93.6   0.38 (0.39)   93.1

(n, k) = (50, 8)
0        0.5   0.36 (0.36)   94.9   0.35 (0.36)   94.8   0.13 (0.14)   92.7   0.13 (0.14)   92.8
         1.0   0.69 (0.69)   93.5   0.69 (0.69)   94.6   0.37 (0.39)   93.8   0.37 (0.39)   94.1
1        0.5   0.68 (0.68)   93.6   0.35 (0.36)   93.8   0.11 (0.11)   93.6   0.13 (0.14)   93.2
         1.0   0.90 (0.89)   95.8   0.69 (0.69)   94.6   0.32 (0.34)   93.2   0.38 (0.39)   93.1
kβ²Σx    0.5   0.90 (0.90)   94.4   0.35 (0.36)   93.8   −0.02 (0.00)  94.2   0.13 (0.14)   93.0
         1.0   0.97 (0.97)   94.1   0.69 (0.69)   93.8   −0.06 (0.00)  93.5   0.38 (0.39)   92.9
4. A SIMULATION STUDY
In this section we demonstrate by simulation the finite sample performance of the R2 statistics.
We considered linear mixed-effect models with a single covariate, which was assumed to follow
decomposition (7), that is, xit = µxt + bxi + exit with bxi ∼ N(0, 0.64) and exit ∼ N(0, 0.36).
The covariate means were chosen to give different values of D.
We first considered balanced data. Two different sample sizes were considered: n = 100
subjects with k = 4 observations for each subject; and n = 50 subjects with k = 8 observations
for each subject. For the random intercept model (1), the data were generated from

$$y_{it} = \alpha + \beta x_{it} + b_i + e_{it},$$

where $b_i \sim N(0, 1)$ and $e_{it} \sim N(0, 0.16)$ (so $\sigma_b^2 = 1$ and $\sigma_e^2 = 0.16$, consistent with the limits reported in Table 1). The coefficient $\beta$ was 0.5 or 1. For each combination of sample size and model parameters, we simulated the $R^2$ statistics 1000 times.
Table 1 shows the simulation results. The $R^2$ statistics shown are averages over the 1000 simulations. All the $R^2$ statistics are very close to their limits. The values of the $R^2$ statistics increase with the coefficient $\beta$. For each fixed $\beta$, the $R^2$ statistics behave differently for different values of $D$. In the first case, where $D = 0$, $R_1^2$ and $\tilde R_1^2$ are about the same since $\Lambda_1$ is identical to $\Lambda_1^D$ when the covariate mean is constant over time. Similar results are obtained for $R_2^2$ and $\tilde R_2^2$. In the second case, where $D = 1$, $R_1^2$ becomes larger than in the first case while the value of $R_2^2$ decreases. These results suggest that the time-dependent covariate means affect $R_1^2$ and $R_2^2$, as discussed in the previous sections: the time-dependent covariate means inflate $R_1^2$ but lower $R_2^2$. By contrast, the values of the proposed $\tilde R_1^2$ and $\tilde R_2^2$ tend to be the same as those in the first case, which indicates that they do not depend on the covariate means. In the third case, where $D = k\beta^2\Sigma_x$, $R_1^2$ is very close to its maximum since $D$ is much larger than $\sigma_e^2$ and
Table 2: Simulation results of R2 statistics for general linear mixed-effect models with balanced data.

D        β     R²₁ (Λ₁^D)    CP     R̃²₁ (Λ₁)     CP     R²₂ (Λ₂^D)    CP     R̃²₂ (Λ₂)     CP

(n, k) = (100, 4)
0        0.5   0.53 (0.53)   93.4   0.52 (0.53)   93.7   0.37 (0.36)   93.3   0.37 (0.36)   93.4
         1.0   0.74 (0.74)   93.8   0.74 (0.74)   93.2   0.52 (0.51)   94.3   0.52 (0.51)   94.3
kβ²Σx    0.5   0.90 (0.90)   92.5   0.83 (0.84)   91.8   0.32 (0.33)   94.6   0.39 (0.40)   93.6
         1.0   0.96 (0.96)   95.2   0.87 (0.87)   94.1   0.34 (0.33)   94.1   0.54 (0.53)   94.4

(n, k) = (50, 8)
0        0.5   0.53 (0.53)   92.0   0.52 (0.53)   91.5   0.37 (0.36)   92.4   0.37 (0.36)   93.5
         1.0   0.74 (0.74)   93.8   0.74 (0.74)   93.3   0.51 (0.51)   93.5   0.51 (0.51)   93.4
kβ²Σx    0.5   0.95 (0.95)   92.9   0.90 (0.90)   93.7   0.38 (0.38)   94.4   0.44 (0.44)   94.1
         1.0   0.98 (0.98)   93.6   0.91 (0.92)   93.6   0.39 (0.38)   95.2   0.56 (0.56)   93.9
$\beta^2\Sigma_{ex}$. A disturbing result is observed for the level-two $R_2^2$ in this case: $R_2^2$ is nearly zero because $\Lambda_2^D = (\beta^2\Sigma_x - D/k)/(1 + \beta^2\Sigma_x - D/k)$. On the other hand, the proposed $\tilde R_1^2$ and $\tilde R_2^2$ have the same limits as those in the first two cases. They appear to be more appropriate in the presence of strong variation in the covariate means. The confidence intervals obtained from the asymptotic distributions (11) perform well, as the percentages covering the true limits are close to the nominal level of 95%.
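The contrast between the old and new level-one statistics under a covariate-mean trend can be reproduced directly from the explicit null-model estimates given in the Appendix. The sketch below uses illustrative, hypothetical parameters, with OLS residuals standing in for the ML fit of model (1); the time trend in the covariate means inflates the old statistic but not the new one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, beta = 4000, 4, 1.0
mu_xt = np.array([0.0, 1.0, 2.0, 3.0])   # time-dependent covariate means
x = mu_xt + rng.normal(0, 0.8, (n, 1)) + rng.normal(0, 0.6, (n, k))
y = beta * x + rng.normal(0, 1.0, (n, 1)) + rng.normal(0, 0.4, (n, k))

# Within-subject variance of the full-model residuals (OLS in place of ML).
coef = np.polyfit(x.ravel(), y.ravel(), 1)
r = y - np.polyval(coef, x)
s_e = ((r - r.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))

# Null model (2): one common mean.  Null model (9): one mean per time point.
s_e0 = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
s_e0_t = (((y - y.mean(axis=0)) ** 2).sum()
          - k * ((y.mean(axis=1) - y.mean()) ** 2).sum()) / (n * (k - 1))

R1_old = 1 - s_e / s_e0      # picks up the covariate-mean trend
R1_new = 1 - s_e / s_e0_t    # does not
```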
Table 2 shows the simulation results for the general linear mixed-effect model, in which the random slope $c_i$ has variance $\Sigma_c = 0.25$ and is independent of the random intercept ($\Sigma_{bc} = 0$). The comparison of $R_l^2$ and $\tilde R_l^2$ is very similar to that in Table 1. In the case where $D = 0$, $R_1^2$ is very close to $\tilde R_1^2$ and $R_2^2$ is close to $\tilde R_2^2$. In the case where $D = k\beta^2\Sigma_x$, $R_1^2$ increases while $R_2^2$ decreases. Unlike in the random intercept model, $R_2^2$ does not degenerate to zero, since the random slope also contributes to the explained variance. The bootstrap method was applied to construct confidence intervals for the limits of the $R^2$ statistics with the transformation $\log(1 + R^2)$. The bootstrap Monte Carlo size was 200. The coverage percentages are generally close to the nominal level.
The second part of the simulation was for unbalanced data with n = 100 subjects. For each
subject, the number of observations ki was randomly chosen from 2, 3, and 4. The data were
generated from the random intercept model described earlier in this section. We considered two cases for the covariates: constant means ($D = 0$) and time-dependent means ($D \neq 0$). Figure 1 plots the means and the 2.5th and 97.5th percentiles of 1000 simulated $R^2$ statistics for different values of $\beta$ in each case. All $R^2$ statistics increase as $\beta$ increases. For each fixed $\beta$, the new and old $R^2$ statistics are almost identical when $D = 0$. When $D \neq 0$, the new $R^2$ statistics exclude the effect of variation in the covariate means while the old $R^2$ statistics include it. In particular, the level-two $R_2^2$ is around zero in some cases.
Figure 1: Means, 2.5th and 97.5th percentiles of the simulated R2 statistics for unbalanced data. [Four panels: level-one R2 (top row) and level-two R2 (bottom row), old and new measures plotted against β = 1, 1.25, 1.5, 2, 2.5, 3.]
5. AN EXAMPLE
In this section we illustrate our results with data from the African American Study of Kidney Disease (AASKD). The AASKD study was a randomized clinical trial studying end-stage renal disease in African Americans. During a 5-year study period, a total of 1094 patients were enrolled, and each patient was randomized to one of two treatment groups. We study the biomarker albumin, which was measured annually for each patient (k = 5). There were 264 patients who reached a terminal event of death or end-stage renal disease. These patients were excluded from our analysis because their albumin measurements were not available after the occurrence of the terminal
events. We used a linear mixed-effect model to relate albumin to six covariates. Four covariates,
gender, baseline age, baseline albumin, and treatment, are time-independent. Urinary protein
was measured annually and it is thus a time-dependent covariate. The time at which albumin was
measured is also a time-dependent continuous covariate, and there are between- and within-patient
variations. Figure 2 shows a boxplot of the covariate “time” at five annual visits, which indicates
a time-trend in the mean of the covariate “time.”
We first consider the random intercept model (1). Table 3 shows the values of the $R^2$ statistics, the estimated fixed covariate effects, and their 95% confidence intervals under model (1). The level-one $R_1^2$ is much larger than the proposed $\tilde R_1^2$. This is expected from our discussion in Section 2 (i.e., the effect of $D$), since the time covariate has a strong trend in its mean (Figure 2) and the time effect on the outcome is significant (P-value < 0.001). On the other hand, $\tilde R_2^2$ is only slightly larger than $R_2^2$. This can be explained by the fact that the effect of $D$ is divided by $k = 5$ in $\Lambda_2^D$ given by (8). In this example, the level-two $R^2$ statistics are much larger than the level-one $R^2$ statistics, which suggests that the covariates contribute more to the between-subject variability than to the within-subject variability. In theory, the four time-independent covariates do not contribute to the within-subject variability.
[Figure 2: boxplot of the covariate "time" (in months, 0 to 48) at the five annual visits.]
We next consider model (12) with $z_{it}$ containing the time-dependent covariates time and urinary protein. The values of the $R^2$ statistics, the estimated fixed covariate effects, and their 95% confidence intervals under model (12) are also given in Table 3. Although the estimates of the fixed covariate effects under the two models are nearly equal and the relationship between $R_l^2$ and $\tilde R_l^2$ remains the same, including random slopes substantially increases the $R^2$ values. Since model (1) is more restrictive than model (12), the results under model (12) are more reliable.
Table 3: R2 statistics and estimated fixed effects for the albumin data example.
APPENDIX
We assume that the covariates are random and independent of the random effects and error.
Moreover, for any fixed t, x1t , · · · , xnt are i.i.d. and z1t , · · · , znt are i.i.d.
(i) Proof of the limits (6), (8), (13), and (14) of the R2 statistics. Since (6) and (8) are special
cases of (13) and (14), respectively, we only prove (13) and (14). When the data are balanced, the
maximum likelihood estimates of the variance components in the null models (2) and (9) have
the following explicit forms:
$$\hat\sigma_{e0}^2 = \frac{\sum_{i=1}^n\sum_{t=1}^k (y_{it} - \bar y_{i.})^2}{n(k-1)}, \qquad \hat\sigma_{b0}^2 = \frac{\sum_{i=1}^n (\bar y_{i.} - \bar y_{..})^2}{n} - \frac{\hat\sigma_{e0}^2}{k};$$

$$\tilde\sigma_{e0}^2 = \frac{\sum_{i=1}^n\sum_{t=1}^k (y_{it} - \bar y_{.t})^2 - k\sum_{i=1}^n (\bar y_{i.} - \bar y_{..})^2}{n(k-1)}, \qquad \tilde\sigma_{b0}^2 = \frac{\sum_{i=1}^n (\bar y_{i.} - \bar y_{..})^2}{n} - \frac{\tilde\sigma_{e0}^2}{k},$$
which only involve sample moments of the outcome variables. Because $(y_{1t}, \ldots, y_{nt})$ are i.i.d. and $(\bar y_{1.}, \ldots, \bar y_{n.})$ are also i.i.d., it follows from the Law of Large Numbers that

$$\hat\sigma_{e0}^2 \to_p \sigma_{e1}^2 = \frac{\sum_{t=1}^k E(y_{it}^2) - k\,E(\bar y_{i.}^2)}{k-1}, \qquad \tilde\sigma_{e0}^2 \to_p \sigma_{e2}^2 = \frac{\sum_{t=1}^k \mathrm{Var}(y_{it}) - k\,\mathrm{Var}(\bar y_{i.})}{k-1},$$

$$\hat\sigma_{b0}^2 \to_p \mathrm{Var}(\bar y_{i.}) - \frac{\sigma_{e1}^2}{k}, \qquad \tilde\sigma_{b0}^2 \to_p \mathrm{Var}(\bar y_{i.}) - \frac{\sigma_{e2}^2}{k},$$
where $\to_p$ denotes convergence in probability. Under the general model (12), we have

$$\mathrm{Var}(\bar y_{i.}) = \frac{\sigma_e^2}{k} + \sigma_b^2 + \beta^T \mathrm{Var}(\bar x_{i.})\beta + E(\bar z_{i.}^T \Sigma_c \bar z_{i.}) + 2\bar\mu_z^T \Sigma_{bc}.$$

Therefore,

$$\sigma_{e2}^2 = \sigma_e^2 + \beta^T\,\frac{\sum_{t=1}^k \mathrm{Var}(x_{it}) - k\,\mathrm{Var}(\bar x_{i.})}{k-1}\,\beta + \frac{\sum_{t=1}^k E(z_{it}^T \Sigma_c z_{it}) - k\,E(\bar z_{i.}^T \Sigma_c \bar z_{i.})}{k-1} = \sigma_e^2 + E_w.$$

Similarly,

$$\mathrm{Var}(\bar y_{i.}) - \frac{\sigma_{e2}^2}{k} = \sigma_b^2 + \frac{2\sum_{t<l}\big[\beta^T \mathrm{Cov}(x_{it}, x_{il})\beta + E(z_{it}^T \Sigma_c z_{il})\big]}{k(k-1)} + 2\bar\mu_z^T \Sigma_{bc} = \sigma_b^2 + E_b.$$

Hence, result (13) holds since $\hat\sigma_e^2$ and $\hat\sigma_b^2$ from the full model (12) are consistent estimates of $\sigma_e^2$
Hence, result (13) holds since σ̂e2 and σ̂b2 from the full model (12) are consistent estimates for σe2
and $\sigma_b^2$, respectively. The limits (14) for the old $R^2$ statistics follow from the fact that

$$\sigma_{e1}^2 - \sigma_{e2}^2 = \frac{\sum_{t=1}^k E^2(y_{it}) - k\,E^2(\bar y_{i.})}{k-1} \equiv D$$

and $\sigma_{b1}^2 - \sigma_{b2}^2 = -D/k$, where $\sigma_{b1}^2$ and $\sigma_{b2}^2$ denote the limits of $\hat\sigma_{b0}^2$ and $\tilde\sigma_{b0}^2$, respectively. ∎
(ii) Proof of (11) for random intercept models. Let $r_{it} = y_{it} - \alpha - \beta^T x_{it}$ and $\hat r_{it} = y_{it} - \hat\alpha - \hat\beta^T x_{it}$, where $\hat\theta = (\hat\alpha, \hat\beta^T)^T$ denotes the maximum likelihood estimate of $(\alpha, \beta)$ in model (1). When the data are balanced, we have

$$\hat\sigma_e^2 = \frac{\sum_{i=1}^n\sum_{t=1}^k \hat r_{it}^2 - k\sum_{i=1}^n \bar{\hat r}_{i.}^2}{n(k-1)}, \qquad \hat\sigma_b^2 = \frac{k^2\sum_{i=1}^n \bar{\hat r}_{i.}^2 - \sum_{i=1}^n\sum_{t=1}^k \hat r_{it}^2}{nk(k-1)}.$$

Let $T_i = (r_{i1}^2, \ldots, r_{ik}^2, \bar r_{i.}^2, y_{i1}^2, \ldots, y_{ik}^2, \bar y_{i.}^2, y_i^T)^T$ be a random $(3k+2)$-vector with mean $\mu_T$ and covariance matrix $\Sigma_T$. Define $f_1(\cdot)$ and $f_2(\cdot)$ as two functions from the Euclidean space $R^{3k+2}$ to $R$: for a vector $a = (a_1, \ldots, a_{3k+2})^T$,

$$f_1(a) = 1 - \frac{\sum_{i=1}^k a_i - k a_{k+1}}{\sum_{i=k+2}^{2k+1} a_i - k a_{2k+2}},$$

$$f_2(a) = 1 - \frac{k^2 a_{k+1} - \sum_{i=1}^k a_i}{k^2 a_{2k+2} - \sum_{i=k+2}^{2k+1} a_i - \frac{k-1}{k}\big(\sum_{i=2k+3}^{3k+2} a_i\big)^2}. \tag{19}$$
Consider $R_1^2$ as a function of the maximum likelihood estimate $\hat\theta$, and write $R_1^2 = R_1^2(\hat\theta)$. Then $f_1(\bar T) = R_1^2(\hat\theta)$, which has the same sampling distribution as $R_1^2$. By the Central Limit Theorem and the delta method,

$$\sqrt{n}\,\big(f_1(\bar T) - f_1(\mu_T)\big) \to_d N(0, \sigma_1^2)$$

with $\sigma_1^2 = \nabla f_1(\mu_T)^T \Sigma_T \nabla f_1(\mu_T)$. Note that $f_1(\mu_T)$ is identical to $\Lambda_1^D$. Hence $\sqrt{n}\,(R_1^2(\hat\theta) - \Lambda_1^D) \to_d N(0, \sigma_1^2)$. The result for $R_2^2$ can be established in the same way. Similarly, for $l = 1, 2$, $\sqrt{n}\,(\tilde R_l^2 - \Lambda_l) \to_d N(0, \tilde\sigma_l^2)$ with $\tilde\sigma_l^2 = \nabla \tilde f_l(\mu_T)^T \Sigma_T \nabla \tilde f_l(\mu_T)$, where

$$\tilde f_1(a) = 1 - \frac{\sum_{i=1}^k a_i - k a_{k+1}}{\sum_{i=k+2}^{2k+1} a_i - k a_{2k+2} - \sum_{i=2k+3}^{3k+2} a_i^2 + \frac{1}{k}\big(\sum_{i=2k+3}^{3k+2} a_i\big)^2},$$

$$\tilde f_2(a) = 1 - \frac{k^2 a_{k+1} - \sum_{i=1}^k a_i}{k^2 a_{2k+2} - \sum_{i=k+2}^{2k+1} a_i + \sum_{i=2k+3}^{3k+2} a_i^2 - \big(\sum_{i=2k+3}^{3k+2} a_i\big)^2}. \tag{20}$$
(iii) Proof of (17) for unbalanced data. Let $\hat\mu_{0it}$ be the estimate of the mean of $y_{it}$ in the null model (9). The score equations for the variance components are

$$\frac{N-n}{\sigma_{e0}^2} - \frac{1}{\sigma_{e0}^4}\sum_{i=1}^n\sum_{t=1}^{k_i}(r_{it} - \bar r_{i.})^2 = \sum_{i=1}^n \frac{k_i \bar r_{i.}^2}{\lambda_i^2} - \sum_{i=1}^n \frac{1}{\lambda_i},$$

$$\sum_{i=1}^n \frac{k_i}{\lambda_i} = \sum_{i=1}^n \frac{k_i^2 \bar r_{i.}^2}{\lambda_i^2},$$

where $r_{it} = y_{it} - \hat\mu_{0it}$, $\lambda_i = \sigma_{e0}^2 + k_i \sigma_{b0}^2$, and $N = \sum_{i=1}^n k_i$.
First, the second score equation can be rewritten as

$$\sigma_{b0}^2 = \sum_{i=1}^n w_i\left(\bar r_{i.}^2 - \frac{\sigma_{e0}^2}{k_i}\right) \qquad\text{with}\qquad w_i = \frac{k_i^2/\lambda_i^2}{\sum_{j=1}^n k_j^2/\lambda_j^2}.$$
It is easy to verify that $\hat\mu_{0it}$ is a consistent estimate of the mean $E(y_{it})$, since the null model (9) correctly specifies the mean of $y_{it}$ (though the covariance structure could be misspecified). In fact, $\hat\mu_{0it}$ is a weighted average of the responses measured at time point $t$. The first score equation can be written as

$$\hat\sigma_{e0}^2 = \frac{\sum_{i=1}^n\left(\sum_{t=1}^{k_i} r_{it}^2 - k_i \bar r_{i.}^2\right)}{N-n} + \frac{\hat\sigma_{e0}^4}{N-n}\sum_{i=1}^n \frac{\lambda_i - k_i \bar r_{i.}^2}{\lambda_i^2}.$$
i=1
The second term on the right-hand side is of order op (1) because of the second score equation
2 by
and the fact that ki is bounded. Thus, we approximate σ̂e0
n ki 2
i=1 r
t=1 it − k i r̄ 2
i.
,
N −n
which is essentially the ANOVA estimate. The limit (17) holds since
n ki 2
i=1 r
t=1 it − k r̄
i i.
2
N −n
is asymptotically equivalent to
n
ki
i=1 t=1 Var(yit ) − ki Var(ȳi. )
.
N −n
䊏
ACKNOWLEDGMENTS
The authors thank the editor, the associate editor and two referees for their helpful comments and
suggestions.
BIBLIOGRAPHY
N. R. Draper & H. Smith (1998). “Applied Regression Analysis.” New York: John Wiley & Sons, Inc.
D. A. Harville (1977). Maximum likelihood approaches to variance component estimation and to related
problems. Journal of the American Statistical Association, 72, 320–338.
I. S. Helland (1987). On the interpretation and use of R2 in regression analysis. Biometrics, 43, 61–69.
B. Hu, M. Palta & J. Shao (2006). Properties of R2 statistics for logistic regression. Statistics in Medicine,
25, 1383–1395.
J. T. Kent (1983). Information gain and a general measure of correlation. Biometrika, 70, 163–173.
E. L. Korn & R. Simon (1991). Explained residual variation, explained risk, and goodness of fit. American
Statistician, 45, 201–206.
N. M. Laird & J. H. Ware (1982). Random effects models for longitudinal data. Biometrics, 38, 963–974.
M. Mittlböck & M. Schemper (1996). Explained variation for logistic regression. Statistics in Medicine, 15,
1987–1997.
S. W. Raudenbush & A. S. Bryk (2002). "Hierarchical Linear Models: Applications and Data Analysis
Methods." Newbury Park, CA: Sage Publications.
M. Schemper & R. Henderson (2000). Predictive accuracy and explained variation in Cox regression. Bio-
metrics, 56, 249–255.
J. D. Singer & J. B. Willett (2003). "Applied Longitudinal Data Analysis: Modeling Change and Event
Occurrence." New York: Oxford University Press.
T. A. Snijders & R. J. Bosker (1999). "Multilevel Analysis: An Introduction to Basic and Advanced Multilevel
Modeling." London: Sage Publications.
O. C. Ukoumunne, A. C. Davison, M. C. Gulliford & S. Chinn (2003). Non-parametric bootstrap confidence
intervals for the intraclass correlation coefficient. Statistics in Medicine, 22, 3805–3821.
G. Verbeke & E. Lesaffre (1997). The effect of misspecifying the random-effects distribution in linear mixed
models for longitudinal data. Computational Statistics & Data Analysis, 23, 541–556.
G. Verbeke & G. Molenberghs (2000). "Linear Mixed Models for Longitudinal Data." New York: Springer.
R. Xu (2003). Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22,
3527–3541.
B. Y. Zheng (2000). Summarizing the goodness of fit of generalized linear models for longitudinal data.
Statistics in Medicine, 19, 1265–1275.