
HAL MORGENSTERN, PHD

Abstract: Despite the widespread use of ecologic analysis in epidemiologic research and health planning, little attention has been given by health scientists and practitioners to the methodological aspects of this approach. This paper reviews the major types of ecologic study designs, the analytic methods appropriate for each, the limitations of ecologic data for making causal inferences and what can be done to minimize these problems, and the relative advantages of ecologic analysis. Numerous examples are provided to illustrate the important principles and methods. A careful distinction is made between ecologic studies that generate or test etiologic hypotheses and those that evaluate the impact of intervention programs or policies (given adequate knowledge of disease etiology). Failure to recognize this difference in the conduct of ecologic studies can lead to results that are not very informative or that are misinterpreted by others. (Am J Public Health 1982; 72: 1336-1344.)

Address reprint requests to Hal Morgenstern, PhD, Assistant Professor, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, Box 3333, New Haven, CT 06510. Professor Morgenstern is also affiliated with the Yale Center for Health Studies, Institution for Social and Policy Studies. This paper, submitted to the Journal December 21, 1981, was revised and accepted for publication March 11, 1982.

© 1982 American Journal of Public Health

Morgenstern H. Uses of ecologic analysis in epidemiologic research. Am J Public Health 1982; 72: 1336-1344. (AJPH December 1982, Vol. 72, No. 12)

Ecologic studies are empirical investigations involving the group as the unit of analysis. Typically, the group is a geographically defined area, such as a state, county, or census tract. Because they can often be done by combining existing data files on large populations, ecologic studies are generally less expensive and take less time than studies involving the individual as the unit of analysis; ecologic studies can also achieve certain objectives generally not met with nonecologic designs. On the other hand, data on many variables (e.g., behaviors, attitudes, and medical histories) may not be available at the ecologic level, and the results of ecologic analyses are subject to certain limitations not applicable to many other study designs.

The purposes of this paper are to show how ecologic analyses may be used in epidemiologic research and health planning, and to illustrate some of the advantages and disadvantages of this approach. It is important to recognize that ecologic studies have two major aims: 1) to generate or test etiologic hypotheses, i.e., to explain disease occurrence; and 2) to evaluate the effectiveness of population interventions, i.e., to test the application of our knowledge for preventing disease and promoting health. In the discussion that follows, emphasis will be given to the methodological problems that are especially relevant to the interpretation of ecologic findings for achieving these two different aims.

The Simple Cohort Study and Ecologic Data

As a basis for comparing and thus interpreting the results of an ecologic study, we will first review the fundamentals of analyzing a simple cohort study. To simplify the presentation, we will consider the possible association between a dichotomous study factor (x = exposed or unexposed) and a dichotomous disease outcome (y = case or noncase). On the basis of data obtained from a cohort study (see Table 1 for notations), epidemiologists usually estimate at least one of the following parameters:1

1. The risk ratio (or relative risk) is the risk of developing the disease for exposed persons divided by the corresponding risk for unexposed persons. Using the data notations in Table 1, we can estimate the risk ratio (RR) as

$$\mathrm{RR} = \frac{a/n_1}{c/n_0} \tag{1}$$

where a/n1 is the estimated risk for exposed subjects and c/n0 is the estimated risk for unexposed subjects. Under the null hypothesis (i.e., no association between the study factor and the disease), the RR is equal to one; a value greater than one indicates a positive association; and a value less than one indicates a negative association. Of course, RR can differ from its null value (one), even under the null hypothesis, because of random and nonrandom errors (i.e., chance and bias, respectively).

2. The Pearson product-moment correlation coefficient is the covariance (Cxy) of the study factor (x) and the disease (y) divided by the square root of the product of the two variances (Vx and Vy). The estimated correlation coefficient (r) from simple cohort data is

$$r = \frac{C_{xy}}{\sqrt{V_x V_y}} = \frac{ad - bc}{\sqrt{m_1 m_0 n_1 n_0}} = \phi \tag{2}$$

For 2 x 2 tables, the correlation coefficient is usually called the phi (φ) coefficient. This measure reflects the extent to which each variable is able to predict the other. More specifically, r^2 (or φ^2) is the proportion of the variance in each variable explained by the other.
USE OF ECOLOGIC ANALYSIS

TABLE 1-Data Layout: Cohort Study

                       Disease Status (y)
Study Factor (x)     Case    Noncase    Total

Exposed              a       b          n1
Unexposed            c       d          n0
TOTAL                m1      m0         n
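As a minimal sketch of expressions 1 and 2, both measures can be computed directly from the four cell counts in the Table 1 layout. The function names are illustrative (they are not from the paper), and the example counts are taken from the TOTAL row of the hypothetical Table 3 data that appear later in the paper.

```python
import math

def risk_ratio(a, b, c, d):
    """Expression 1: risk among the exposed divided by risk among the unexposed."""
    n1 = a + b  # exposed total
    n0 = c + d  # unexposed total
    return (a / n1) / (c / n0)

def phi_coefficient(a, b, c, d):
    """Expression 2: correlation (phi) coefficient for a 2 x 2 table."""
    n1, n0 = a + b, c + d  # row totals (exposed, unexposed)
    m1, m0 = a + c, b + d  # column totals (cases, noncases)
    return (a * d - b * c) / math.sqrt(m1 * m0 * n1 * n0)

# Cell counts from the TOTAL row of Table 3 (hypothetical data in the paper)
a, b, c, d = 600, 1900, 400, 2100
rr = risk_ratio(a, b, c, d)
phi = phi_coefficient(a, b, c, d)
print(f"RR = {rr:.2f}, phi = {phi:.3f}")
```

With these counts the sketch gives the crude values reported in the later illustration (RR = 1.50 and r = .100).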

The value of r can range from negative one (indicating a perfect negative association) to one (indicating a perfect positive association), and the null value of r is zero.

3. The etiologic fraction (or population attributable risk per cent) is the proportion of all new cases in the population due to the exposure. We can estimate the etiologic fraction (EF) in the source population as

$$\mathrm{EF} = \frac{p(\mathrm{RR} - 1)}{p(\mathrm{RR} - 1) + 1} \tag{3}$$

where p is the estimated proportion of exposed persons at risk in the source population = n1/n, if n is sampled from a single population. This measure reflects the potential impact on disease frequency of eliminating the exposure in the population; thus, it is particularly useful in health planning. The null value of the EF is zero, and a value of one indicates that the exposure is a necessary cause of the disease (i.e., the RR is infinite).

As shown in Table 2, the key feature of ecologic data, relative to cohort data, is the lack of information about the joint distribution of the study factor and the disease within each group (i.e., unit of analysis). That is, we know the number of exposed subjects (n1j) and the number of cases (m1j) within each group, but we do not know the number of exposed cases (aj). In ecologic analysis, the independent variable (X) is the proportion of exposed subjects within the group (n1j/nj), and the dependent variable (Y) is the rate (or risk) of disease (m1j/nj). Notice that both ecologic variables are continuous, even though the two factors are dichotomous at the individual level.

Ecologic Study Designs

The fundamentals of ecologic analysis can best be presented by dividing ecologic study designs into four types. The simplest of these is the exploratory study in which we observe geographic differences in the disease rate among several regions (groups). The objective is to search for spatial patterns that might suggest an environmental etiology or a special etiologic hypothesis. No exposures are measured and, generally, no formal data analysis is used. For example, the National Cancer Institute (NCI) mapped the age-adjusted cancer mortality rates in the US by county for the period 1950-69.2 For oral cancers, they found a striking difference in geographic patterns by sex: among men, the mortality rates were greatest in the urban Northeast, which is consistent with the geographic distributions of known risk factors (i.e., tobacco smoking and alcohol consumption); but among women, the mortality rates were greatest in the Southeast. These findings led some cancer epidemiologists at NCI to speculate that snuff dipping, which is common among rural southern women, may be a risk factor for oral cancer.3 In fact, the results of a recent case-control study4 support this hypothesis.

In the multiple-group comparison study, we observe the association between the average exposure level (X) and the disease rate (Y) among several groups. We can quantify and test this ecologic association by regressing Y on X, i.e., by fitting the data to a mathematical model,5 such as

$$\hat{Y} = B_0 + B_1 X \tag{4}$$

where Ŷ is the predicted value of Y for any given value of X, B1 is the estimated slope, and B0 is the estimated Y-intercept. To simplify the presentation, a simple linear model will be used throughout this paper; however, the method can be extended to include several predictors (Xs), polynomial and interaction terms, and other nonlinear models. What is most relevant to epidemiologists about this type of analysis is that the regression coefficients (B0 and B1) may be used to estimate the same categorical measures normally computed from studies conducted at the individual level. To estimate the RR from ecologic data,6,7 we can divide the predicted rate (Ŷ) where X is set equal to one (all exposed) by the predicted rate where X is set equal to zero (all unexposed),* i.e.,

$$\mathrm{RR} = \frac{\hat{Y} \mid X = 1}{\hat{Y} \mid X = 0} = \frac{B_0 + B_1}{B_0} = 1 + \frac{B_1}{B_0} \tag{5}$$

The ecologic correlation coefficient (R) is equal to

$$R = B_1 \sqrt{V_x / V_y} \tag{6}$$

where Vx and Vy are the ecologic (between-group) variances of X and Y, respectively.8 To estimate the EF, we simply substitute the above estimate of the RR into expression 3, where p is the estimated proportion of exposed persons in the entire source population.

*Notice that, assuming a linear model, the estimated risk difference (R1 - R0) is equal to the slope (B1).
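As a short worked step that is not part of the original text, substituting RR - 1 = B1/B0 from expression 5 into expression 3 gives the EF directly in terms of the regression coefficients. The second line rests on two added assumptions: the line is fitted by unweighted least squares (so it passes through the point of means), and p is set equal to the unweighted mean of X.

```latex
\begin{aligned}
\mathrm{EF} &= \frac{p(\mathrm{RR}-1)}{p(\mathrm{RR}-1)+1}
             = \frac{p\,B_1/B_0}{p\,B_1/B_0 + 1}
             = \frac{p\,B_1}{B_0 + p\,B_1}, \\[4pt]
\text{if } p = \bar{X}:\quad
\mathrm{EF} &= \frac{\bar{Y} - B_0}{\bar{Y}},
\qquad \text{because } \bar{Y} = B_0 + B_1 \bar{X}.
\end{aligned}
```

Under these assumptions, the ecologic EF is the share of the mean observed rate that lies above the intercept, i.e., above the predicted rate for a wholly unexposed group.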
MORGENSTERN

(Note that p may not be equal to X̄, the unweighted mean of the proportion exposed within each group.)

FIGURE 1-Suicide Rate (per 10^5 population) by Religious Composition (Per Cent Protestant) for Four Groups of Prussian Provinces, 1883-1890.* [Plot not reproduced; horizontal axis: Percent Protestant (X).]
*The four observed points (X, Y) on the graph are: (30, 9.56), (45, 16.36), (78.5, 22.0), and (95, 26.46). The fitted line (Ŷ) is based on unweighted least squares regression. SOURCE: Durkheim9

As an illustration of an ecologic comparison study, consider the work of Emile Durkheim,9 who investigated suicide in western Europe during the 19th century. The data in Figure 1 represent some of Durkheim's findings for four groups of Prussian provinces between 1883 and 1890. The suicide rate (Y) is regressed on the per cent of each group that is Protestant (X). Using unweighted** least squares regression, B1 is 24.0 x 10^-5 and B0 is 36.6 x 10^-6, where X̄ = .62, Vx = .0891, and Vy = 53 x 10^-10. From expression 5, we calculate RR to be 1 + (24 x 10^-5/36.6 x 10^-6) = 7.57, i.e., it appears that Protestants were 7 1/2 times as likely to commit suicide as were other residents (most of whom were Catholic). From expression 6, we find R to be 24 x 10^-5 x sqrt(.0891/53 x 10^-10) = .983. Thus, 97 per cent (= .983^2) of the between-group variance in suicide rate is explained by religious composition of the region. The fact that this value is so close to one indicates a near perfect fit of the linear model. From expression 3, assuming that p is equal to X̄,*** we calculate EF to be .62(7.57 - 1)/[.62 x (7.57 - 1) + 1] = .803, i.e., it appears that about 80 per cent of all suicides in Prussia between 1883 and 1890 were due to being Protestant (or to other factors associated with religion).

**Since Y is a proportion or rate, the variance of Y may vary appreciably over values of X. In this situation, it is preferable to use weighted least squares regression.
***Since Durkheim did not provide the number of residents in each group of provinces, we cannot determine p directly from his data.

FIGURE 2-Time Trends in the Age-Standardized Mortality Rates (per 10^[exponent illegible] population) for Two Infectious Diseases in Relation to the Introduction of a Vaccine: US 1900-1973. [Plot not reproduced.]
Source: McKinlay and McKinlay10 (Reprinted in revised form by permission of Milbank Memorial Fund Quarterly.)

In the time trend study (or time series study), we observe the relationship between the change in the average exposure level (or intervention) and the change in the disease rate for a single population.† With time trend studies involving a sudden change in exposure, such as the start of an intervention program, we compare the slope in the disease trend before and after the intervention. For example, in Figure 2, the age-adjusted mortality rates for two infectious diseases are graphed for the period 1900-1973. While the data suggest that the introduction of the polio vaccine in the 1950s was effective in preventing polio deaths in the US, it appears that the introduction of the measles vaccine in the 1960s had little or no effect on measles mortality.10 Note that the measles mortality rate had been declining in this country since 1910 and reached its current low level about 20 years before the vaccine was introduced.

†Formal statistical analysis of the relationship between time trends (lagged time series regression, cohort analysis, and Fourier analysis) is beyond the scope of this paper.
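Returning to the Durkheim illustration, the estimates quoted in the text can be recomputed from the four points listed under Figure 1. The following is a sketch only: the function name and the returned dictionary are illustrative, the variances use the k - 1 divisor (which matches the Vx and Vy values quoted in the text), and p is set equal to the unweighted mean of X, as the text itself assumes.

```python
import math

# The four (X, Y) points listed under Figure 1:
# X = per cent Protestant, Y = suicides per 100,000 population
points = [(30.0, 9.56), (45.0, 16.36), (78.5, 22.0), (95.0, 26.46)]
X = [x / 100 for x, _ in points]  # proportion Protestant
Y = [y / 1e5 for _, y in points]  # suicide rate per person

def ecologic_estimates(X, Y, p=None):
    """Unweighted least squares fit of Y on X, then expressions 5, 6, and 3."""
    k = len(X)
    x_bar, y_bar = sum(X) / k, sum(Y) / k
    sxx = sum((x - x_bar) ** 2 for x in X)
    syy = sum((y - y_bar) ** 2 for y in Y)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    b1 = sxy / sxx                           # estimated slope
    b0 = y_bar - b1 * x_bar                  # estimated intercept
    vx, vy = sxx / (k - 1), syy / (k - 1)    # between-group variances
    rr = 1 + b1 / b0                         # expression 5
    r = b1 * math.sqrt(vx / vy)              # expression 6
    if p is None:
        p = x_bar                            # assumption: p equals the unweighted mean of X
    ef = p * (rr - 1) / (p * (rr - 1) + 1)   # expression 3
    return {"B1": b1, "B0": b0, "Vx": vx, "Vy": vy, "RR": rr, "R": r, "EF": ef}

durkheim = ecologic_estimates(X, Y)
for name, value in durkheim.items():
    print(f"{name} = {value:.4g}")
```

Apart from rounding, the results agree with the values given in the text (B1 near 24.0 x 10^-5, B0 near 36.6 x 10^-6, RR near 7.57, R near .983, EF near .80).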
With time trend studies involving a gradual change in average exposure level (observational designs), we must compare trends in both variables. The objective is to test for a relationship between trends, taking into account a possible lag period between a given change in exposure and the subsequent disease effect. For example, Figure 3 shows the age-adjusted coronary heart disease (CHD) mortality rate and the per capita alcohol consumption in the US between 1945 and 1975.11 Since the two trends oppose each other systematically throughout the period, we might infer from these data that alcohol consumption is a protective (negative) risk factor for CHD. Alternatively, the observed relationship between the two variables may be due to changes in diagnostic custom or to trends in other CHD risk factors associated with changes in alcohol consumption.

FIGURE 3-Time Trends in the Age-Standardized Coronary Heart Disease (CHD) Mortality Rate (per 10^5 population) and Per Capita Alcohol Consumption (gal/yr): US, 1945-1975. [Plot not reproduced.]
Source: Kuller, et al.11

A similar analysis of artificial sweeteners and bladder cancer mortality in the US between 1950 and 1967 showed no relationship between these two trends12 for either males or females. It is possible that the exposure effect is too weak to be detected with this type of design. Alternatively, because of the long latency period of bladder cancer, the increased use of artificial sweeteners beginning in the early 1960s may not show an effect on the disease rate until the 1980s or 1990s.

In the mixed study, we observe the relationship between the change in the average exposure level and the change in the disease rate among several groups. The analysis and interpretation of such data are similar to that of the comparison study; the only difference is that in the mixed design both variables are measured as absolute changes between the same two times (or periods). Thus, we can estimate the risk ratio (RR), the correlation coefficient (R), and the etiologic fraction (EF) from the ecologic regression coefficients, using expressions 5, 6, and 3. Frequently, it is more convenient or informative to categorize the exposure variable and to compare the changes in disease rate among groups having different mean exposure levels. For example, Crawford, et al,13 observed the absolute changes in the average annual cardiovascular disease (CVD) mortality rate between 1948 and 1964 for 83 British towns, by sex, age, and change in water hardness. For all sex-age groups, especially for males, they found an inverse association between trends in water hardness and CVD mortality. In middle-age men, for instance, the increase in CVD mortality was less in towns that made their water harder than in towns that made their water softer. Thus, we might infer that water hardness protects one against CVD. In general, the mixed design provides a better test of an etiologic hypothesis than do the other ecologic designs, because the results of the mixed study are less likely to be due to the confounding effects of extraneous risk factors.

The Ecological Fallacy

The major limitation of ecologic analysis for testing etiologic hypotheses is the potential for substantial bias in effect estimation. The central problem, known as the "ecological fallacy",14 results from making a causal inference about individual phenomena on the basis of observations of groups. In Durkheim's data,9 for example, it may have been Catholics (not Protestants) who were committing suicide in predominantly Protestant provinces. This alternative explanation is possible, because none of the provinces were entirely homogeneous with respect to religion. Indeed, it is quite plausible that members of a certain religious minority might have been more likely to take their own lives than were members of the majority.††

The ecological fallacy was first demonstrated mathematically in 1950 by William Robinson,15 who showed that the total covariance of two variables can be expressed as the sum of a within-group component and a between-group (ecologic) component. Using this form of the "covariance theorem," Robinson derived the mathematical relationship between the average within-group correlation coefficient (r) at the individual level and the between-group correlation coefficient (R) at the ecologic level. Several years later, Duncan, et al,16 extended the covariance theorem to express the relationship between the average within-group and between-group regression coefficients (b1 and B1, respectively).

To understand better the meaning of the ecological fallacy, we may partition the bias resulting from ecologic analysis into two components: 1) aggregation bias, due to the grouping of individuals; and 2) specification bias, due to the confounding effect of the "group" itself. In the latter component, either certain extraneous risk factors are differentially distributed by group, or some property of the group itself, e.g., social disorganization, affects disease occurrence.

††In fact, Durkheim actually compared suicide rates for Protestants, Catholics, and Jews living in Prussia. From his data, we find that the rate was about twice as great among Protestants as among other religious groups, suggesting a substantial difference between the results obtained at the ecologic level (RR = 7.57) and those obtained at the individual level (RR ≈ 2). Yet Durkheim failed to notice this quantitative difference because he did not actually estimate the magnitude of the effect in either analysis.
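The covariance decomposition described above can be checked numerically. The sketch below uses the hypothetical counts of Table 3 and exact rational arithmetic. The choice of population covariances (dividing by N) and of weighting each group by its share of subjects is made here for the illustration and is not necessarily the form used in the cited derivations.

```python
from fractions import Fraction as F

# Table 3 counts per group:
# (exposed cases, exposed noncases, unexposed cases, unexposed noncases)
groups = [
    (10, 90, 90, 810),
    (45, 255, 105, 595),
    (100, 400, 100, 400),
    (175, 525, 75, 225),
    (270, 630, 30, 70),
]

def cov_binary(a, b, c, d):
    """Covariance of exposure (x) and disease (y), both coded 0/1, in a 2 x 2 table."""
    n = a + b + c + d
    return F(a, n) - F(a + b, n) * F(a + c, n)  # E[xy] - E[x]E[y]

N = sum(sum(g) for g in groups)
totals = [sum(col) for col in zip(*groups)]     # pooled 2 x 2 table
total_cov = cov_binary(*totals)

x_bar = F(totals[0] + totals[1], N)             # overall proportion exposed
y_bar = F(totals[0] + totals[2], N)             # overall proportion diseased

# Within-group component: weighted average of the group-specific covariances
within = sum(F(sum(g), N) * cov_binary(*g) for g in groups)

# Between-group (ecologic) component: covariance of the group means
between = sum(
    F(sum(g), N)
    * (F(g[0] + g[1], sum(g)) - x_bar)
    * (F(g[0] + g[2], sum(g)) - y_bar)
    for g in groups
)

print("total   =", float(total_cov))
print("within  =", float(within))
print("between =", float(between))
```

For these data the within-group component is zero, so the entire individual-level covariance comes from differences between groups.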



TABLE 3-Number of Subjects, by Exposure Status, Disease Status, and Group: Hypothetical Data to Illustrate Cross-Level Bias

                    Exposed                      Unexposed
Group (j)    Case   Noncase   Total       Case   Noncase   Total

1              10       90     100          90      810     900
2              45      255     300         105      595     700
3             100      400     500         100      400     500
4             175      525     700          75      225     300
5             270      630     900          30       70     100
TOTAL         600     1900    2500         400     2100    2500

The sum of these components is called cross-level bias, which, in theory, can make the ecologic association appear stronger or weaker than it is at the individual level.17 In fact, the two components may cancel each other out, resulting in no bias in the estimation of effect. However, in practice, this latter possibility is very infrequent; ordinarily, cross-level bias exaggerates the magnitude of the true association.

There will be no cross-level bias when, and only when, the group mean of the exposure variable (X) has no effect on disease risk (y) at the individual level, controlling for the individual's exposure value (x).17 Expressed more formally, there will be no cross-level bias (or ecological fallacy) if the regression coefficient β2 (the population parameter) in the following structural equation is equal to zero:

$$y = \beta_0 + \beta_1 x + \beta_2 X + \varepsilon \tag{7}$$

where the unit of analysis is the individual, X refers to the average exposure level of the group to which the individual belongs, and ε is the error term. Essentially, then, cross-level bias occurs when an ecologic predictor variable (X) measures a different underlying construct(s) than the corresponding variable (x) does at the individual level.17

For a given statistical model at the individual level, there can be only specification bias, i.e., confounding; while for the same model at the ecologic level, there can be both specification and aggregation biases. The net effect of grouping, therefore, is that a confounding variable at the individual level may not be confounding at the ecologic level.18 For example, although the risk of bladder cancer is much greater for men than for women, sex, as an ecologic variable (per cent male in each group), will not be very predictive of the disease rate (or anything else) across geographically defined groups. The reason for this difference is that there are approximately the same proportion of males and females in all groups. Conversely, a confounding variable at the ecologic level may not be confounding at the individual level.18 This latter discrepancy is greatest for ecologic analyses involving grouping by the dependent variable, thereby increasing the between-group variance of Y. This type of grouping will result in confounding due to any correlate of disease status, even if the correlate is unrelated to the exposure variable at the individual level (and therefore not an individual confounder).

Illustration of Cross-Level Bias

The hypothetical data in Table 3 involve 5,000 subjects aggregated into five equal-size groups. Using linear regression, as described earlier for multiple-group comparison studies, we find that B1 is .25 and B0 is .075, where X̄ = .50, Vx = .10, and Vy = .00625 (see Figure 4). According to expression 6, R is equal to .25 sqrt(.10/.00625) = 1.00; thus, the linear model fits the data perfectly, as indicated in Figure 4. (Note that all observations fall on the fitted line.) From expression 5, we estimate the RR to be 1 + .25/.075 = 4.33, suggesting a fairly strong positive association. The estimated EF, according to expression 3, is .5(4.33 - 1)/[.5(4.33 - 1) + 1] = .625, where p = .50. That is, of the 1,000 observed cases, we estimate that 62.5 per cent (625 cases) were due to the exposure.

FIGURE 4-The Proportion of Diseased Subjects by the Proportion of Exposed Subjects for Five Groups* (Based on the Hypothetical Data in Table 3). [Plot not reproduced; horizontal axis: Proportion Exposed (X), 0 to 1; fitted line Ŷ = B0 + B1 X.]
*In this ecologic analysis, the fitted line (Ŷ) is based on least squares regression.

Next, we will conduct a crude analysis at the individual level, ignoring group affiliation. According to expression 1, the estimated crude RR is (600/2500)/(400/2500) = 1.50 (refer to the row of "Totals" in Table 3).
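The ecologic and crude estimates for the Table 3 data can be reproduced with a few lines of arithmetic. This is a sketch under the same conventions as the text (unweighted least squares, p = .50). The variable names are illustrative. Only the group margins enter the ecologic part, which is exactly the information an ecologic analyst would have.

```python
import math

# Table 3 margins only: (number exposed, number of cases, group size) per group
margins = [(100, 100, 1000), (300, 150, 1000), (500, 200, 1000),
           (700, 250, 1000), (900, 300, 1000)]

X = [n1 / n for n1, _, n in margins]  # proportion exposed in each group
Y = [m1 / n for _, m1, n in margins]  # proportion diseased in each group
k = len(margins)
x_bar, y_bar = sum(X) / k, sum(Y) / k

sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

b1 = sxy / sxx                          # ecologic slope
b0 = y_bar - b1 * x_bar                 # ecologic intercept
ecologic_rr = 1 + b1 / b0               # expression 5
ecologic_r = b1 * math.sqrt(sxx / syy)  # expression 6; the (k - 1) divisors cancel
p = x_bar                               # equals .50 here because groups are equal in size
ecologic_ef = p * (ecologic_rr - 1) / (p * (ecologic_rr - 1) + 1)  # expression 3

# Crude individual-level risk ratio from the TOTAL row of Table 3 (expression 1)
crude_rr = (600 / 2500) / (400 / 2500)

print(f"B1 = {b1:.3f}, B0 = {b0:.3f}, R = {ecologic_r:.2f}")
print(f"ecologic RR = {ecologic_rr:.2f}, ecologic EF = {ecologic_ef:.3f}")
print(f"crude RR = {crude_rr:.2f}")
```

The ecologic risk ratio (about 4.33) is far larger than the crude individual-level value (1.50), although both are computed from the same 5,000 subjects.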

Thus, the crude association between the exposure and disease at the individual level is weaker than the ecologic association (1.50 vs 4.33). Similarly, the crude (zero-order) correlation coefficient (r = φ) is also smaller than the corresponding ecologic measure (R). According to expression 2, r is [600(2100) - 1900(400)]/sqrt(1000(4000)(2500)(2500)) = .100. Thus, differences in exposure explain only 1 per cent (= .10^2) of the variance of disease status. From expression 3, we estimate the EF to be .5(1.5 - 1)/[.5(1.5 - 1) + 1] = .200, which is also less than the ecologic estimate of .625.

In the last analysis, we adjust (or standardize) for group affiliation by controlling for "group" in the individual analysis. The general formula for computing an (internally) standardized risk ratio1 (sRR) is

$$\mathrm{sRR} = \frac{\sum_j a_j}{\sum_j c_j n_{1j} / n_{0j}} \tag{8}$$

where aj and cj refer to the number of exposed cases and unexposed cases, respectively, in the j-th stratum (group) (refer to Tables 1 and 2), sRR is a weighted average of the RRj, and sRR is also called the standardized morbidity ratio (SMR). We find that sRR for the data in Table 3 is 1.00, i.e., there is no association between the exposure and the disease, controlling for group. In fact, the estimated RRj are exactly one in every stratum (group). The standardized (or partial) correlation coefficient is also a weighted average of the stratum-specific estimates (rj = φj), all of which are equal to zero in the example; thus, the partial r is equal to zero. The standardized EF is estimated from expression 3, substituting sRR for RR. For the hypothetical data in Table 3, therefore, the estimate of the standardized EF is zero.

A summary of results for all three analyses is given in Table 4. The difference between estimates for the crude and standardized analyses is due to the confounding effect of extraneous risk factors differentially distributed across groups (specification bias). The difference between estimates for the ecologic and crude individual analyses is due to the different units of analysis (aggregation bias). We would conclude from these data that the exposure is not a risk factor for the disease. Estimates obtained from the ecologic analysis are strongly biased as a result of the cumulative effects of misspecification and aggregation.

TABLE 4-Summary of Results for the Three Analyses, Using the Hypothetical Data in Table 3

Other Problems with Ecologic Analysis

To compare the results of ecologic studies with the results of nonecologic studies dealing with the same hypothesized relationship, we must consider the relative accuracy of our measurements at each level, since errors in measurement are an important source of bias in the estimation of effect.1 In certain situations, aggregate measurements may be more accurate than individual measurements, especially if the phenomenon being measured involves a sensitive issue. For example, sales or tax records may provide a better estimate of alcohol consumption for an area than asking individuals in that area about their drinking habits. On the other hand, aggregate measurements may be less accurate than individual measurements, especially when the former are based on small samples within each group. For example, errors in the measurement of serum cholesterol are likely to be more extensive at the ecologic level than at the individual level. Thus, we may conclude that inconsistencies between the results of ecologic and nonecologic studies may be due, at least in part, to differences in the relative accuracy of measurements.

Another type of bias that can occur in many types of observational studies is a reversal of the hypothesized cause and effect, i.e., the observed association is due to the direct or indirect influence of the disease on exposure status. What makes ecologic studies particularly vulnerable to this error is that it can occur with prevalence, mortality, or even incidence data. Suppose, for example, we were to observe an ecologic relationship between the decreasing prevalence of oral contraceptive (OC) use and the increasing incidence rate of benign breast disease in a population. It is possible that this inverse association is due to the prescribed termination of OC use following disease detection. Researchers must use their knowledge of disease, clinical practice, and human behavior to rule out this alternative explanation for ecologic associations involving observational data.

Another problem with ecologic analysis is that certain predictor variables (especially sociodemographic and environmental variables) tend to be more highly correlated with each other than they are at the individual level,18 a phenomenon called multicolinearity. Consequently, the increased correlations between these Xs make it particularly difficult to isolate their effects on the disease in an ecologic analysis. For example, with the water basin area as the unit of analysis, it is most difficult to separate the effects of different trace elements in water supplies on cancer mortality.19 Because average trace element levels are so highly intercorrelated, an estimate of the effect of one exposure, controlling for the others, will be very imprecise (or equivalently, the statistical test of the null hypothesis will be inefficient). In general, multicolinearity is most problematic for ecologic studies involving geographically defined units of analysis that are large and/or few in number.18,20
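Returning to the standardized analysis of the Table 3 data, expression 8 can be sketched as follows. The function names are illustrative, and p = .50 is the overall proportion exposed used in the text.

```python
# Table 3 strata: (a_j, n1_j, c_j, n0_j) =
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = [(10, 100, 90, 900), (45, 300, 105, 700), (100, 500, 100, 500),
          (175, 700, 75, 300), (270, 900, 30, 100)]

def standardized_rr(strata):
    """Expression 8: observed exposed cases divided by the number expected
    if the exposed had the stratum-specific risks of the unexposed."""
    observed = sum(a for a, _, _, _ in strata)
    expected = sum(c * n1 / n0 for _, n1, c, n0 in strata)
    return observed / expected

def etiologic_fraction(p, rr):
    """Expression 3."""
    return p * (rr - 1) / (p * (rr - 1) + 1)

stratum_rrs = [(a / n1) / (c / n0) for a, n1, c, n0 in strata]
srr = standardized_rr(strata)
standardized_ef = etiologic_fraction(0.5, srr)

print("stratum-specific RRs:", [round(v, 2) for v in stratum_rrs])
print(f"sRR = {srr:.2f}, standardized EF = {standardized_ef:.3f}")
```

Set beside the crude (RR = 1.50) and ecologic (RR = 4.33) estimates, the standardized result reproduces the three-way comparison that the text summarizes in Table 4.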


Minimizing the Problems

Bias and multicolinearity may seriously limit our ability to test an etiologic hypothesis with ecologic data. Yet there are a few things that we can do to minimize these problems. First, as done in the previous illustrations, we should use ecologic regression, not correlation, to estimate the magnitude of the desired association. Although grouping by x does not result in any aggregation bias of the regression coefficient (if X is in the model), it does increase the variance of X relative to the variance of Y. Therefore, the absolute value of the correlation coefficient will also increase.8,18,20 In other words, in the situation where groups tend to be homogeneous with respect to one of the independent variables, ecologic regression coefficients, but not ecologic correlation coefficients, will result in unbiased estimates of their corresponding individual measures. Since grouping by x is very common in ecologic studies involving geographically defined groups, the independent variables in an ecologic model may easily explain 75 per cent or more of the variance of Y. This finding does not necessarily indicate that the included variables are the most important determinants of the disease; it may only indicate that, as a result of the grouping process, we have controlled for the effects of other important risk factors.

The second way to minimize inferential problems in ecologic studies is to make the groups as homogeneous as possible by using smaller units of analysis, e.g., counties instead of states, and more groups.16,21 In theory, this strategy amounts to grouping by x, thereby ensuring a valid and more precise estimate of effect. Unfortunately, this option is not often feasible because of the limited availability of data. In addition, grouping on one x included in the model is likely to create grouping unintentionally on another risk factor not included in the model. This situation could produce the opposite effect of what is intended: aggregation bias might increase.8 By using smaller groups, the investigator may also increase the distorting effects of migration between regions. This problem is most serious for individual exposures or diseases that often lead to migration, e.g., asthmatics moving to a drier climate. Furthermore, the use of smaller units of analysis tends to make the estimates of disease frequency for groups less stable, which may become a serious limitation, since most diseases are "rare" events. Unstable rate estimates for smaller groups may contribute to less accurate estimates of effect, if we cannot also increase the number of groups.

(because it has not been measured) or that subjects are grouped on a correlate of an excluded risk factor. In this situation, we should include in the model all variables thought to be related to the grouping process, even if these variables are not risk factors for the disease.1

The findings of an ecologic analysis can be compared with the findings of other observational studies designed to test the same etiologic hypothesis. If the estimated effect is consistent across studies involving different designs and different populations, a causal interpretation is enhanced. For example, the results of two case-control studies22,23 linking number of pregnancies to ovarian cancer are consistent with the results of ecologic analyses linking observed changes in these two variables.24 On the other hand, the negative ecologic association reported earlier between water hardness and CVD mortality13 is not consistent with the results of a case-control study25 and a cohort (mortality) study,26 both of which failed to find a significant association in the hypothesized direction. Unfortunately, it is not possible to determine from these results alone the extent to which the difference in observed effects is due to aggregation bias, specification bias, other sources of bias, or sampling error. Therefore, we cannot say which finding is most accurate.

Evaluation of Interventions

In addition to generating and testing etiologic hypotheses, ecologic analyses can be used to evaluate the effectiveness of population interventions.27 Aggregation bias and the ecological fallacy are moot issues in a population intervention study, if the link between a modifiable risk factor and the disease has already been established and if modification of the risk factor can favorably affect disease outcome, i.e., the treatment is efficacious. In this situation, we do not wish to make inferences about individuals but about groups, i.e., we are chiefly concerned with the impact of collective action on disease rates. An ecologic study in this setting is not merely an inferior substitute for a nonecologic study; in fact, ecologic analysis is preferred for evaluating the effectiveness of a population intervention. For example, we might assess the impact of a high blood pressure control program on the rate of hypertensive-related mortality in a target population.

Many population interventions do not involve direct manipulation of a risk factor; they involve some collective action aimed at changing the targeted exposure. This feature applies to all kinds of intervention strategies, including
persuasion or educational approaches. financial incentives
A third way to minimize the problems of ecologic
or barriers. and regulatory or legal sanctions. For example.
inference is to assess how our groups were, in fact, formed in the past several years. several states and local jurisdic-
and analyze the data accordingly.8 If subjects tend to be tions have passedlaws requiring smoke detectors in all new.
grouped by disease status, ecologic estimates of effect will and sometimes existing. dwelling units.30We might test the
be more biased than the corresponding estimates in an effectiveness of this legislation with a mixed ecologic designo
individual analysis. The researcher can, however, eliminate comparing changes in fire-related death rates during the past
lhis aggregation bias analyticalIy for ecologic models involv- lO years in states or localities with and without such
ing one or two independent variables. The method, proposed laws.*** The researchobjective is not to test whether smoke
by Leo Goodman,"is based on a reversal ofindependent and
dependent variables (X is regressed on Y). ***This design is based on an unpublished research protocol by
Alternatively, we mar believe that subjects are grouped Cyndie B. Gareleck from the Vale Department of Epidemiology and
on a risk factor that is excluded from the ecologic model Public Health.


detectors can indicate the presence of smoke; we know they can. Instead, we want to assess the cumulative impact of several related factors that determine the success or failure of the legislation: whether the laws are enforced; whether the detectors are installed correctly; whether they are operated and maintained properly by the residents; and, ultimately, whether they prevent injuries and deaths from fires. Moreover, the impact of smoke detector laws on fire-related deaths could not be assessed adequately at the individual level, because a smoke detector in one unit can prevent deaths in nearby units or buildings that are themselves unprotected.

Another relative advantage of ecologic analysis for evaluation is that population interventions often have unintended consequences that are not likely to be spotted by studies conducted at the individual level. For example, in an ecologic comparison study of high school driver education and motor vehicle fatalities involving teenage drivers, by state, Robertson and Zador31 found no significant association between driver education and the rate of fatal crash involvement among licensed teenage drivers. However, they did find that driver education was associated with a substantial increase in the number of licensed teenage drivers. The net effect was a higher fatal crash involvement rate among all 16-17 year-olds in states with greater proportions of teenagers receiving driver education. In other words, given the current licensure laws for young drivers, the end result of publicly funded driver education programs may be to increase the motor vehicle fatality rate in the population by increasing the number of licensed teenage drivers.

Summary

This paper has reviewed four types of ecologic study designs and the analytic methods appropriate for each. In many ecologic studies, we can estimate the same parameters that are estimable at the individual level for quantifying the magnitude of the association between the exposure and the disease. Despite the practical advantages of ecologic data for generating and testing etiologic hypotheses, causal inference about individual events from grouped data is limited by certain methodological problems, including specification bias, aggregation bias, measurement error, ambiguity of cause and effect, migration between groups, and multicollinearity. To minimize these problems, the researcher can: 1) use ecologic regression, instead of correlation, including in the statistical model as many risk factors as possible; 2) use data that are grouped into the smallest geographic units of analysis possible, subject to the constraints of intergroup migration and unstable rate estimation; and 3) attempt to ascertain how the groups were formed and analyze accordingly, which ordinarily means including in the model all variables thought to be related to the grouping process.

In addition to testing etiologic hypotheses, ecologic analyses can also be used to evaluate population interventions. Given adequate knowledge of disease etiology, ecologic studies are particularly appropriate for assessing the impact of collective actions on disease rates and for spotting unintended consequences of population interventions. Although a single ecologic study may be designed to achieve etiologic and evaluative objectives, the scope of an ecologic analysis, to be most informative, should be more focused: either it sheds some new light on our understanding of disease occurrence, or it tests the application of our knowledge to controlling the disease in society. Failure to recognize the difference in principles between etiologic and evaluative research can lead to ecologic results that are not very informative or that are misunderstood by others.

REFERENCES

1. Kleinbaum DG, Kupper LL, Morgenstern H: Epidemiologic Research: Principles and Quantitative Methods. Belmont, CA: Lifetime Learning Publications, 1982.
2. Mason TJ, McKay FW, Hoover R, Blot WJ, Fraumeni JF Jr: Atlas of Cancer Mortality for US Counties: 1950-1969. DHEW Pub. No. (NIH) 75-780. Washington, DC: Govt Printing Office, 1975, pp 36-37.
3. Blot WJ, Fraumeni JF Jr: Geographic patterns of oral cancer in the United States: etiologic implications. J Chron Dis 1977; 30:745-757.
4. Winn DM, Blot WJ, Shy CM, Pickle LW, Toledo A, Fraumeni JF Jr: Snuff dipping and oral cancer among women in the southern United States. N Engl J Med 1981; 304:745-749.
5. Goodman LA: Ecological regression and behavior of individuals. Am Sociol Rev 1953; 13:663-664.
6. Goodman LA: Some alternatives to ecological correlation. Am Sociol Rev 1959; 64:610-625.
7. Beral V, Chilvers C, Fraser P: On the estimation of relative risk from vital statistical data. J Epidemiol & Comm Health 1979; 33:159-162.
8. Langbein LI, Lichtman AJ: Ecological Inference. Series on Quantitative Applications in the Social Sciences, No. 07-010. Beverly Hills: Sage, 1978.
9. Durkheim E: Suicide: A Study in Sociology. New York: Free Press, 1951, p 153.
10. McKinlay JB, McKinlay SM: The questionable contribution of medical measures to the decline of mortality in the United States in the Twentieth Century. Milbank Meml Fund Q 1977; 55:405-428.
11. Kuller LH, LaPorte RE, Weinberg GB: The decline in ischemic heart disease mortality: environmental and social variables. In: Havlik RJ, Feinleib M (eds): Proceedings of the Conference on the Decline in Coronary Heart Disease Mortality. DHEW Pub. No. (NIH) 79-1610. Washington, DC: Govt Printing Office, 1979, pp 312-339 (see Figure 6, p 334).
12. Burbank F, Fraumeni JF Jr: Synthetic sweetener consumption and bladder cancer trends in the United States. Nature 1970; 227:296-297.
13. Crawford MD, Gardner MJ, Morris JN: Changes in water hardness and local death-rates. Lancet 1971; 2:327-329.
14. Selvin HC: Durkheim's "Suicide" and problems of empirical research. Am J Sociol 1958; 63:607-619.
15. Robinson WS: Ecological correlations and the behavior of individuals. Am Sociol Rev 1950; 15:351-357.
16. Duncan OD, Cuzzort RP, Duncan B: Statistical Geography: Problems in Analyzing Areal Data. Westport, CT: Greenwood Press, 1961.
17. Firebaugh G: A rule for inferring individual-level relationships from aggregate data. Am Sociol Rev 1978; 43:557-572.
18. Blalock HM Jr: Causal Inferences in Nonexperimental Research. New York: WW Norton & Co, 1964, Chpt 4, pp 97-114.
19. Stavraky KM: The role of ecologic analysis in studies of the etiology of disease: a discussion with reference to large bowel cancer. J Chron Dis 1976; 29:435-444.
20. Valkonen T: Individual and structural effects in ecological research. In: Dogan M, Rokkan S (eds): Social Ecology. Cambridge, MA: MIT Press, 1969, Chpt 3, pp 53-68.


21. Oreglia A, Duncan RP: Health planning and the problem of the ecological fallacy. Am J Health Plan 1977; 2:1-6.
22. Joly DJ, Lilienfeld AM, Diamond EL, Bross IDJ: An epidemiologic study of the relationship of reproductive experience to cancer of the ovary. Am J Epidemiol 1974; 99:190-209.
23. Newhouse ML, Pearson RM, Fullerton JM, Boesen EAM, Shannon HS: A case control study of carcinoma of the ovary. Br J Prev & Soc Med 1977; 31:148-153.
24. Beral V, Fraser P, Chilvers C: Does pregnancy protect against ovarian cancer? Lancet 1978; 1:1083-1086.
25. Comstock GW: Fatal arteriosclerotic heart disease, water hardness at home, and socioeconomic characteristics. Am J Epidemiol 1971; 94:1-10.
26. Comstock GW, Cauthen GM, Helsing KJ: Water hardness at home and deaths from arteriosclerotic heart disease in Washington County, Maryland. Am J Epidemiol 1980; 112:209-216.
27. Campbell DT, Stanley JC: Experimental and Quasi-experimental Research. Chicago: Rand McNally & Co, 1963.
28. Menzel H: Comment on Robinson's "Ecological correlations and the behavior of individuals." Am Sociol Rev 1950; 15:674.
29. Allardt E: Aggregate analysis: the problem of its informative value. In: Dogan M, Rokkan S (eds): Social Ecology. Cambridge, MA: MIT Press, 1969, Chpt 2, pp 41-51.
30. US Dept of Commerce: Smoke Detectors and Legislation. Washington, DC: National Fire Prevention and Control Administration, 1978.
31. Robertson LS, Zador PL: Driver education and fatal crash involvement of teenaged drivers. Am J Public Health 1978; 68:959-965.

ACKNOWLEDGMENTS

I would like to thank Dr. Doug Thompson of Yale University for his helpful suggestions. An earlier version of this paper was presented at the 109th Annual Meeting of the American Public Health Association, Los Angeles, CA, November 1-5, 1981. This work was supported by US Public Health Service Special Project Grant No. 5 D04 AH 01759-3.
