Professional Documents
Culture Documents
Kwi 017
Kwi 017
1
Copyright © 2005 by the Johns Hopkins Bloomberg School of Public Health Printed in U.S.A.
All rights reserved DOI: 10.1093/aje/kwi017
1 Clinical Research Unit, Hvidovre University Hospital, University of Copenhagen, Hvidovre, Denmark.
2 Department of Community Medicine, Malmö University Hospital, Lund University, Malmö, Sweden.
The logistic regression model is frequently used in epidemiologic studies, yielding odds ratio or relative risk
interpretations. Inspired by the theory of linear normal models, the logistic regression model has been extended
to allow for correlated responses by introducing random effects. However, the model does not inherit the
interpretational features of the normal model. In this paper, the authors argue that the existing measures are
unsatisfactory (and some of them are even improper) when quantifying results from multilevel logistic regression
analyses. The authors suggest a measure of heterogeneity, the median odds ratio, that quantifies cluster
heterogeneity and facilitates a direct comparison between covariate effects and the magnitude of heterogeneity
in terms of well-known odds ratios. Quantifying cluster-level covariates in a meaningful way is a challenge in
multilevel logistic regression. For this purpose, the authors propose an odds ratio measure, the interval odds ratio,
that takes these difficulties into account. The authors demonstrate the two measures by investigating
heterogeneity between neighborhoods and effects of neighborhood-level covariates in two examples—public
physician visits and ischemic heart disease hospitalizations—using 1999 data on 11,312 men aged 45–85 years
in Malmö, Sweden.
data interpretation, statistical; epidemiologic methods; hierarchical model; logistic models; odds ratio; residence
characteristics
A growing number of epidemiologic studies apply multi- The partition of variance at different levels (e.g., neighbor-
level regression analysis for the investigation of associations hood and individual) is the sine qua non of multilevel regres-
between area of residence (e.g., neighborhood) and indi- sion analysis, and its consideration is relevant for both
vidual health (1, 2). A majority of studies have focused on statistical reasons (improved estimation) and substantive
traditional measures of association such as fixed effects epidemiologic reasons (quantification of the importance of
using regression models for the relation between neighbor- the neighborhoods for understanding individual health) (5).
hood (or cluster) characteristics and individual health. Other However, contrary to normally distributed continuous vari-
multilevel regression analysis approaches have concentrated ables, components of variance are tricky to investigate when
on determining the components of health variation (3). On it comes to dichotomous response variables. Forcing clas-
some occasions, difficult questions arise, such as “Is it sical interpretative schemes wastes information and may be
worthwhile to study the association between neighborhood inappropriate. New measures are needed in order to quantify
characteristics and health when the neighborhood variation effects and ultimately provide a better understanding of the
is very small?” (4). Neither pure fixed-effects regression data.
analysis nor calculation of intraclass correlations can In this paper, our aim is to highlight two measures previ-
provide sufficient insight to answer such questions in a satis- ously described (6): the median odds ratio (MOR) and the
factory way. interval odds ratio (IOR). These measures facilitate the inte-
Correspondence to Dr. Klaus Larsen, Clinical Research Unit, Section 136, Hvidovre University Hospital, Kettegård Allé 30, DK-2650 Hvidovre,
Denmark (e-mail: klaus.larsen@hh.hosp.dk).
81 Am J Epidemiol 2005;161:81–88
82 Larsen and Merlo
gration and presentation of both fixed and random effects in The model
logistic regression.
Consider a population of N individuals. Each individual
has a vector of covariates, x, and each individual belongs to
MOTIVATION FOR NEW MEASURES OF EFFECT SIZE IN one of K clusters. The parameters corresponding to the cova-
TWO-LEVEL MODELS WITH DICHOTOMOUS riates are in the vector β. The K mutually independent cluster
RESPONSES variables, u1, u2, ..., uK, are not to be estimated, since they are
The multivariate normal distribution is an attractive frame- not of interest per se; rather, it is the variation between clus-
work for statistical modeling, when data on the response ters that needs be quantified. Therefore, a normal distribu-
variable are continuous and normal. However, when the tion is assumed for the u’s, and parameters characterizing
response variable is dichotomous, the distribution is incor- this distribution can then be used to characterize the hetero-
rect, and therefore conclusions based on the normal distribu- geneity induced by the random effects.
tion are likely to be flawed. The response variable, Y, is a dichotomous variable. That
When formulating a two-level model, it is common to is, for each individual, it is observed whether Y = 0 or 1. The
assume normality for the cluster-level (level 2) variation and model has two levels.
Am J Epidemiol 2005;161:81–88
Quantification of Heterogeneity due to Clustering 83
Am J Epidemiol 2005;161:81–88
84 Larsen and Merlo
TABLE 1. Estimates and standard errors from analyses of hospitalization for ischemic heart disease (models 1 and 2)
and visiting a public physician (models 3 and 4), Malmö, Sweden, 1999
Fixed-effects variables ( β̂ )
Age group (years)
70–74 vs. 65–69 0.458 0.132 0.458 0.132 0.116 0.066 0.116 0.066
75–79 vs. 65–69 0.498 0.137 0.506 0.137 0.262 0.073 0.264 0.074
80–85 vs. 65–69 0.733 0.142 0.750 0.142 0.437 0.083 0.442 0.083
Education (low vs. high) 0.223 0.116 0.200 0.117 0.243 0.062 0.239 0.062
Neighborhood education
had visited a public physician (coded as 1) or not (coded as tion versus a high level were 1.25 in model 1 and 1.22 in
0). We restricted our analysis to persons who had used the model 2. These odds ratios are conditional on age and
health-care system and visited a general practitioner at least neighborhood.
once during the year 1999. Note that the above odds ratios are correct only for
Parameters in the random-effects model were estimated comparisons of persons belonging to the same cluster. When
using restricted iterative generalized least squares. The comparing persons from different clusters, it is necessary to
MLwiN software package (15), version 1.1, was used to calculate an IOR or leave the subject-specific interpretation
perform the analyses. Extrabinomial variation was explored and consider population-averaged odds ratios. The popula-
systematically in all of the models, and there was no indica- tion-averaged odds ratio is the odds ratio between two
tion of either under- or overdispersion. persons from different clusters. In models 1 and 2, the esti-
Population-averaged parameters were calculated on the mates on the linear predictor scale are only shrunk by factors
basis of the approximate formula (7) of
2 2
2 2 2
β̃ = β̂ ⁄ 1 + ( 16 × 3 ⁄ ( 15 × π ) ) × σˆ , 1 ⁄ 1 + ( 16 × 3 ⁄ ( 15 × π ) ) × 0.028 = 0.995
and
where β̃ is the population-averaged parameter and β̂ is the
2
subject-specific parameter. The cluster variance is σ̂ . 2 2
1 ⁄ 1 + ( 16 × 3 ⁄ ( 15 × π ) ) × 0.009 = 0.998,
Example 1: hospitalization for ischemic heart disease respectively. This indicates an apparently small amount of
heterogeneity of the clusters. Thus, in effect, the attenuation
In example 1, two models are considered. In model 1, age is hardly visible in this analysis.
and individual education were included together with a In model 1, the cluster heterogeneity is interpreted in the
random neighborhood effect. Model 2 was an extension of following way. Consider two randomly chosen persons with
model 1 that also included the cluster-level covariate neigh- the same covariates from two different clusters (e.g., two
borhood education. more-educated persons aged 65–69 years) and conduct the
In both analyses, it appeared that older people were more hypothetical experiment of calculating the odds ratio for the
likely to be hospitalized than younger people and that less- person with the higher propensity to be hospitalized versus
educated persons were more likely to be hospitalized than the person with the lower propensity. Repeating this compar-
the more educated. The estimates shown in table 1 were very ison of two randomly chosen persons will lead to a series of
similar in the two models, but there were some differences odds ratios. This distribution is estimated when estimating
(discussed in detail below). The parameter estimates were the parameters, and it is shown in figure 1 (on the log scale).
transformed into odds ratios, which are shown in table 2. The median of these odds ratios between the person with a
The individual-specific fixed effects are conditional on the higher propensity and the person with a lower propensity is
random effects. That is, they may be interpreted as odds estimated to be 1.17 in model 1. This is a low odds ratio, and
ratios for within-cluster comparisons. For the individual it suggests that the clustering effect is small even without
education variable, the odds ratios for a low level of educa- inclusion of any cluster-level covariates. When neighbor-
Am J Epidemiol 2005;161:81–88
Quantification of Heterogeneity due to Clustering 85
TABLE 2. Odds ratios for being hospitalized for ischemic heart disease (models 1 and 2) and visiting a public
physician (models 3 and 4), Malmö, Sweden, 1999
* Not included.
hood education is included (model 2), the unexplained [1.07; 1.50]. This means that when randomly choosing two
cluster heterogeneity (comparing persons from neighbor- persons with identical individual covariates (e.g., 80- to 85-
hoods of the same kind—for example, both neighborhoods year-olds with a low individual level of education) from a
with a high level of education) decreases, yielding an MOR low-education neighborhood and a high-education neighbor-
of 1.09, which is a very low odds ratio. Thus, there is very hood, the odds ratio between the two lies within the interval
little variation between neighborhoods in the propensity for in 80 percent of the cases. The estimated distribution of the
hospitalization for ischemic heart disease. odds ratios is shown in figure 2 (on the log scale). Two
Although the cluster heterogeneity is small in both models, things are worth noticing regarding this interval. First, the
it is interesting that neighborhood education accounts for interval does not contain 1. This suggests that the effect of
quite a large part of the variation between clusters. This can neighborhood education is large relative to the cluster effect,
best be seen from the IOR in the following way. The IOR is because apparently there is little chance that the person from
Am J Epidemiol 2005;161:81–88
86 Larsen and Merlo
the less-educated neighborhood has a lower propensity for reflected in the IOR, which is very broad: [0.28; 27.3]. The
hospitalization than the person from the more-educated interpretation is that if one is randomly selecting two
neighborhood. Second, the interval appears to be relatively persons, one from a low-education neighborhood and one
narrow, which suggests that there is little cluster heteroge- from a high-education neighborhood, and comparing their
neity (as does the MOR). Therefore, a fairly large proportion odds of having visited a public physician, the middle 80
of the (small) variation between neighborhoods in the percent of the odds ratios will lie within this interval. The
propensity to be hospitalized may be explained by neighbor- interval contains 1, which implies that neighborhood educa-
hood educational level. tion does not account for a substantial amount of the neigh-
A population-averaged approach cannot convey the rami- borhood heterogeneity. In addition, the interval is quite
fications of the effect of neighborhood education on the broad, reflecting a large amount of unexplained variation
propensity to be hospitalized in an equally satisfactory way. between neighborhoods in the propensity to visit a public
The results of such an analysis would be quantified in terms physician. Other cluster-level variables are needed to explain
of a (slightly attenuated) odds ratio of the cluster heterogeneity.
A population-averaged approach might lead to a different
2 2
exp ( 0.236 ⁄ 1 + ( 16 × 3 ⁄ ( 15 × π ) ) × 0.009 ) = 1.27 conclusion. The population-averaged effect of neighborhood
Am J Epidemiol 2005;161:81–88
Quantification of Heterogeneity due to Clustering 87
examples 1 and 2, the effect of individual education is condi- The authors express their gratitude to Prof. Niels Keiding
tional on neighborhood. A subject-specific approach is of the Department of Biostatistics, University of Copen-
proper, because it facilitates measures of effect on the indi- hagen, for his comments on an early draft of this paper. The
vidual level: What is the direct effect of having a high indi- authors also thank Dr. Basile Chaix of the Research Team on
vidual level of education versus a low level for a person the Social Determinants of Health and Health Care, French
under his/her specific conditions? In this sense, the subject- National Institute of Health and Medical Research, for his
specific parameters are more “clean” measures of the effect comments on the final manuscript.
of individual education, because they are not attenuated by
unmeasured heterogeneity.
A general argument favoring the random-effects model
REFERENCES
is the possibility of quantification of heterogeneity in an
intuitively attractive way. As is shown in the four models 1. Diez-Roux AV. Multilevel analysis in public health research.
in the examples, the heterogeneity may be quantified by Annu Rev Public Health 2000;21:171–92.
characteristics from the distribution of the odds ratios 2. Kawachi I, Berkman LF. Neighborhoods and health. New
Am J Epidemiol 2005;161:81–88
88 Larsen and Merlo
APPENDIX 2 –1
z = exp ( 2 × σ × Φ ( 0.75 ) ).
Mathematical Derivations
The MOR is the median of this distribution, so it can be and the 90th percentile is
calculated as the solution to the equation, F(z) = 0.5, which
2 –1
leads to IOR upper = exp ( β × ( x 1 – x 2 ) + 2 × σ × Φ ( 0.9 ) ).
Am J Epidemiol 2005;161:81–88