STATISTICS IN MEDICINE, VOL. 15, 1927-1934 (1996)

USE OF STONE’S METHOD IN STUDIES OF DISEASE RISK AROUND POINT SOURCES OF ENVIRONMENTAL POLLUTION
GAVIN SHADDICK AND PAUL ELLIOTT
Small Area Health Statistics Unit, Department of Epidemiology and Public Health,
Imperial College School of Medicine at St. Mary’s, Norfolk Place, London W2 1PG, U.K.

SUMMARY
The Small Area Health Statistics Unit is a national facility funded by the U.K. government for the analysis of
disease risk around sources of environmental pollution. It holds cancer incidence (from 1974) and mortality
data (from 1981) for Great Britain. Data retrieval is based on the postcode of residence, relating on average
to 14 households. Population data for the calculation of disease rates and small area measures of
socioeconomic deprivation are from census small area statistics for 1981 and 1991. Isotonic regression
methods first described by Stone are used to test for declines in disease risk with distance from point sources
of environmental pollution. This paper describes modifications of the method to include adjustments for
socioeconomic confounding, a conditional approach to allow for generally elevated risks near the source,
and methods to deal with pooling of data around a number of point sources. Examples from recent studies
are given.

INTRODUCTION
There are increasing concerns about the possible effects on health of point sources of
environmental pollution, but the analysis of disease risk around a source is often constrained by
the lack of actual or modelled exposure estimates.1 Often the only practical approach is to test for
an association of disease risk with distance from the source according to some general exposure
decline-distance model.1 This paper describes how methods originally proposed by Stone2 have
been developed and are used within the U.K. government funded Small Area Health Statistics
Unit (SAHSU).3-6 Implementation of these methods requires the retrieval of observed and
expected numbers of cases for small sub-areas around a point source. In the simplest case, where
distance is used as a surrogate for exposure, these sub-areas would be defined by a series of
concentric circles around the source (Figure 1). If additional information were available, such as
pollution measurements or wind patterns, areas defined by differing levels of exposure could be
created within the framework of Geographical Information Systems.
In view of the rarity of the diseases that are often studied, and because the inquiry may have
been carried out in response to an apparent excess of cases around the source, it is often desirable
to widen an inquiry by including other sources exhibiting the same, or similar, pollution effects.
This presents a new set of problems, both in the allocation of populations and cases to particular
sources and in the statistical analysis. Where the definition of the sub-areas around each of the
sites is consistent then the analysis is much simplified, and methods have been developed to
accommodate such pooling of data.

Figure 1. Schematic representation of study area around a point source s with sub-areas for analysis defined by a set of
concentric circles around the source. The hatched area represents one distance band, lying between two successive radii.

CCC 0277-6715/96/181927-08
© 1996 by John Wiley & Sons, Ltd.

METHODS
Before applying Stone’s methods, observed and expected numbers are first required for the
set of sub-areas selected around the point source (Figure 1). In SAHSU, these numbers
are retrieved from our own national database, using the postcode of residence to locate
cases. There are currently 1.7 million unit postcodes covering 25 million postal addresses,
each postcode containing on average 15 households (about 40 persons). Postcoded individual
health events stored in the database include, amongst others, all deaths (by specific cause) from
1981 and cancer registrations from 1974. Population data are held by age quinquennia and sex
for enumeration districts, which are the smallest aggregate units for which census data are
released. On average, enumeration districts contain data on around 400 persons. The population
at risk at each distance is found by linking postcodes in the study area to the enumeration
districts.6
One major difficulty in the interpretation of small area health statistics is the problem
of potential confounding, as sources of pollution tend to be located in deprived areas,
and deprivation itself is a strong predictor of disease incidence and mortality.7 To allow
for that possibility, a measure of deprivation has been calculated for each census enumeration
district using variables derived from the census small area statistics, for example, unemployment
rates and household overcrowding.7 Each variable was standardized over Great Britain to have zero
mean and unit variance and then for each enumeration district their values were summed to give
the deprivation score. These scores were then grouped into national quintiles, together with
a small (sixth) stratum for those enumeration districts where data were insufficient to provide
a score.
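
The construction of the deprivation strata described above can be sketched as follows. This is a minimal illustration only: the variable names, the handling of missing values (None), the use of the population standard deviation and the rank-based quintile cut points are all assumptions made for the example, not details taken from the SAHSU implementation.

```python
from statistics import mean, pstdev


def deprivation_scores(variables):
    """Sum of standardized (zero mean, unit variance) census variables per district.

    variables: dict mapping a variable name to a list with one value per
    enumeration district; None marks a missing value (hypothetical convention).
    Districts with any missing value get a score of None.
    """
    names = list(variables)
    n_districts = len(variables[names[0]])
    stats = {}
    for name in names:
        present = [v for v in variables[name] if v is not None]
        stats[name] = (mean(present), pstdev(present))
    scores = []
    for d in range(n_districts):
        row = [variables[name][d] for name in names]
        if any(v is None for v in row):
            scores.append(None)
        else:
            scores.append(sum((v - stats[name][0]) / stats[name][1]
                              for name, v in zip(names, row)))
    return scores


def deprivation_strata(scores):
    """Quintile group 1..5 by rank of score; stratum 6 where no score exists."""
    scored = sorted((d for d, s in enumerate(scores) if s is not None),
                    key=lambda d: scores[d])
    strata = [6] * len(scores)
    for rank, d in enumerate(scored):
        strata[d] = rank * 5 // len(scored) + 1
    return strata
```

With these conventions, a district with an incomplete set of census variables falls into the sixth stratum, and the remaining districts are split into five groups of equal size by increasing score.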
National rates ($r_{ijkl}$) are calculated for each calendar year for the 216 strata defined by
deprivation score (6 groups), sex and age (18 x 5-year groups):

$r_{ijkl}$, where
i = 1, ..., 6 deprivation group,
j = 1, 2 sex,
k = 1, ..., 18 age-group,
l = 1, ... calendar year (number of years, for example, from 1974 for cancers).

Expected numbers by four digit code according to the International Classification of Diseases 8th
or 9th revision8,9 are then obtained by applying these rates to the populations of the study
sub-areas, for example, the hatched area in Figure 1, using indirect standardization.10 This is
done by calendar year, age and sex with further adjustment for region (Regional Health
Authority) to allow for regional variations in incidence or mortality, and, for cancers, in
completeness of registration.
First, expected numbers standardized for age and deprivation are obtained separately for males
and females and each calendar year ($E_{jl}$):

$$E_{jl} = \sum_{i,k} E_{ijkl} = \sum_{i,k} r_{ijkl} P_{ijkl}$$

where $P_{ijkl}$ is the population of the study sub-area, stratified by deprivation, sex, age and calendar
year. Adjustment is made for region by multiplying these expected values by standardized
incidence or mortality ratios for the region, adjusted for deprivation,

$$E_{jl} \times \frac{RO_{jl}}{RE_{jl}}$$

where $RO_{jl}$ and $RE_{jl}$ are year-and-sex-specific regional observed and expected values,
respectively. These numbers are then summed over calendar years for males, females and for
both sexes combined, that is, for both sexes combined,

$$E = \sum_{j} \sum_{l} E_{jl} \frac{RO_{jl}}{RE_{jl}}$$
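
The indirect standardization and regional adjustment just described can be illustrated with a short sketch. The dictionary layout keyed by (deprivation, sex, age-group, year) and the function name are hypothetical choices made for this example; only the arithmetic (stratum rate times stratum population, summed over deprivation and age, then scaled by the regional observed/expected ratio for each sex and year) follows the text.

```python
def expected_cases(rates, population, regional_observed, regional_expected):
    """Expected number of cases in one sub-area, both sexes and all years combined.

    rates, population: dicts keyed by (i, j, k, l) =
        (deprivation group, sex, age-group, calendar year).
    regional_observed, regional_expected: dicts keyed by (j, l) = (sex, year).
    """
    # Step 1: sum rate x population over deprivation (i) and age (k)
    # to obtain one expected value per sex and calendar year.
    e_by_sex_year = {}
    for (i, j, k, l), persons in population.items():
        key = (j, l)
        e_by_sex_year[key] = e_by_sex_year.get(key, 0.0) + rates[(i, j, k, l)] * persons

    # Step 2: scale each value by the regional observed/expected ratio
    # and sum over sex and calendar year.
    return sum(e * regional_observed[key] / regional_expected[key]
               for key, e in e_by_sex_year.items())
```

For instance, two male strata each contributing one expected case in a region with a ratio of 1.1, plus one female stratum contributing one expected case in a region with a ratio of 0.9, give 2 x 1.1 + 1 x 0.9 = 3.1 expected cases.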

Stone's methods

Unconditional test
Given sets of observed and expected numbers for the set of sub-areas around the source, these are
ranked according to their distance (or some other exposure measurement) from the source. In
order to test for a decline in risk over distance from the source, a null hypothesis that relative risk
in each sub-area is equal to one is tested against an alternative of non-increasing risk over
distance. This implies a very general underlying model of exposure decline with distance, which is
a desirable feature when actual exposure information is unavailable (as is often the case). In this
first test, known as the unconditional test, the assumption is made that the number of observed
cases in each sub-area is independently Poisson distributed with mean proportional to
the corresponding expected values, that is, $O_i \sim \mathrm{Poisson}(\lambda_i E_i)$, where $\lambda_i$ is the relative risk in the
sub-area:

$H_0$: $\lambda_1 = \lambda_2 = \cdots = \lambda_k = 1$, where k = number of discrete sub-areas,
$H_1$: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$, but excluding $H_0$.
In performing the test, an estimate of the relative risk in each sub-area is required under the
constraints of the alternative hypothesis, which is analogous to the problem of isotonic regression
under simple ordering11 and can be achieved using the following formula:

$$\hat{\lambda}_i = \max_{t \ge i} \min_{s \le i} \frac{\sum_{r=s}^{t} O_r}{\sum_{r=s}^{t} E_r}$$

An estimate, $\hat{\lambda}_i$, of the relative risk in a particular area is therefore the maximum possible
cumulative observed/expected ratio available using that and any number of the subsequent areas.
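
One common way to compute such order-restricted estimates is the pool-adjacent-violators algorithm, sketched below. The choice of algorithm and the function name are assumptions made for illustration; the paper itself states only the max-min type formula. Adjacent sub-areas are merged whenever a nearer block has a lower observed/expected ratio than the block that follows it, so that the resulting estimates are non-increasing with distance.

```python
def isotonic_relative_risks(observed, expected):
    """Non-increasing relative risk estimates by pool-adjacent-violators.

    observed, expected: sequences ordered by increasing distance from the
    source; all expected values are assumed to be positive.
    """
    blocks = []  # each block holds [sum of O, sum of E, number of sub-areas]
    for o, e in zip(observed, expected):
        blocks.append([o, e, 1])
        # Merge while the previous block's ratio is below the current one
        # (ratios compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            o_last, e_last, n_last = blocks.pop()
            blocks[-1][0] += o_last
            blocks[-1][1] += e_last
            blocks[-1][2] += n_last
    estimates = []
    for o_sum, e_sum, n_areas in blocks:
        estimates.extend([o_sum / e_sum] * n_areas)
    return estimates
```

Applied to the observed and expected numbers of Table I, this pooling gives 1.09 for the first six distance bands and 1.05 for the last two, in agreement with the RR(MLE) column of that table.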
Stone originally suggested a statistic known as the Poisson maximum, which uses only the first
of the estimates, equating to the largest possible cumulative ratio available over the entire study
area.2 Any decline in risk will therefore be represented by a single downward step. Often, in
studies of rare diseases or where the areas close to the source are small (with a low number of
expected cases), this largest cumulative observed/expected ratio will arise from just a single case in
the first one or two sub-areas, reflecting only the discrete nature of the Poisson distribution, and
will not necessarily give a good summary of the level of risk around the source.12
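
As an illustration of the Poisson maximum statistic, the following sketch computes the largest cumulative observed/expected ratio and the number of sub-areas over which it is attained. Only the statistic is computed; assessing its significance is not attempted here, and the function name is hypothetical.

```python
def poisson_maximum(observed, expected):
    """Largest cumulative O/E ratio, working outwards from the source.

    Returns (ratio, number of innermost sub-areas that attain it).
    Expected values are assumed to be positive.
    """
    best_ratio, best_count = 0.0, 0
    cumulative_o = 0.0
    cumulative_e = 0.0
    for count, (o, e) in enumerate(zip(observed, expected), start=1):
        cumulative_o += o
        cumulative_e += e
        ratio = cumulative_o / cumulative_e
        if ratio > best_ratio:
            best_ratio, best_count = ratio, count
    return best_ratio, best_count
```

For the Table I data the maximum is reached over the first six bands, where the cumulative ratio is 3777/3468.6, about 1.09.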
For this reason, when dealing with small areas, an approach suggested by Stone using
likelihood ratio tests is preferred. This uses the full set of estimates, enabling the decline to be
represented by up to k - 1 downward steps over the study area. The maximum likelihood ratio
(MLR) using the given set of observed and expected numbers is calculated and compared with
a large number of MLRs calculated using simulated sets of observed numbers generated from the
Poisson distribution under the terms of the null hypothesis. A p-value is thus obtained by ranking
the MLR from the given data within the set calculated using the simulated values.
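
The simulation procedure for the unconditional test might be sketched as follows. Several details are assumptions made for this example rather than taken from the paper: the statistic is written as a Poisson log-likelihood ratio against the null value of one, the order-restricted estimates are obtained by pool-adjacent-violators, and Poisson variates are drawn with Knuth's multiplication method (in chunks, so that large means do not underflow) because the Python standard library has no Poisson sampler.

```python
import math
import random


def isotonic_relative_risks(observed, expected):
    """Non-increasing relative risk estimates (pool-adjacent-violators)."""
    blocks = []
    for o, e in zip(observed, expected):
        blocks.append([o, e, 1])
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            o_last, e_last, n_last = blocks.pop()
            blocks[-1][0] += o_last
            blocks[-1][1] += e_last
            blocks[-1][2] += n_last
    return [o / e for o, e, n in blocks for _ in range(n)]


def log_likelihood_ratio(observed, expected):
    """Poisson log-likelihood ratio: ordered alternative versus all risks = 1."""
    estimates = isotonic_relative_risks(observed, expected)
    total = 0.0
    for o, e, lam in zip(observed, expected, estimates):
        if o > 0:  # a zero count contributes nothing to the log term
            total += o * math.log(lam)
        total -= (lam - 1.0) * e
    return total


def poisson_draw(mean, rng):
    """Poisson variate via Knuth's method, summed over chunks of mean <= 500."""
    count = 0
    while mean > 0:
        chunk = min(mean, 500.0)
        limit = math.exp(-chunk)
        product = rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        mean -= chunk
    return count


def unconditional_test(observed, expected, n_sim=999, seed=0):
    """Monte Carlo p-value: rank of the observed statistic among simulations."""
    rng = random.Random(seed)
    observed_stat = log_likelihood_ratio(observed, expected)
    exceed = 0
    for _ in range(n_sim):
        simulated = [poisson_draw(e, rng) for e in expected]
        if log_likelihood_ratio(simulated, expected) >= observed_stat:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

With 999 simulations, an observed statistic larger than every simulated value yields p = 1/1000 = 0.001, the smallest p-value this procedure can report.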
A statistically significant result from this unconditional test could arise either from a decline in
the relative risks over distance or because the relative risk in the study area as a whole differs from
unity, or from some combination of the two. In order to help distinguish between these
possibilities, a second test - known as the conditional test - is performed.

Conditional test
The conditional test has the more general null hypothesis that the relative risks in the sub-areas
are all equal to a constant but not necessarily equal to one, thus allowing for a generally elevated
background risk:
$H_0'$: $\lambda_1 = \lambda_2 = \cdots = \lambda_k$ (= constant),
$H_1$: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$, but excluding $H_0'$.
A significant result from this test would imply a genuine decline in relative risk over distance from
the source, although this can also include the situation where the relative risk over the study area
as a whole is less than one.
Again a p-value is obtained by the comparison of the MLR from the data within a set
calculated using simulation values, but some modification to the method is required. The more
general hypothesis introduces an additional unknown parameter, the constant. For this reason,
the number of observed cases in each area is assumed to come from the multinomial distribution
(with probabilities proportional to the number of expected cases in each area). By conditioning
the sum of expected cases to be equal to the number of observed cases, this unknown parameter is
eliminated.
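
A sketch of the conditional test under these assumptions is given below. The expected numbers are first rescaled so that their sum equals the observed total, which removes the unknown constant; simulated data sets are then drawn from a multinomial distribution with probabilities proportional to the expected numbers. The form of the statistic (the sum of O log of the order-restricted estimate, computed on the rescaled expected numbers) and the function names are illustrative assumptions, and at least one observed case is required.

```python
import math
import random
from collections import Counter


def isotonic_relative_risks(observed, expected):
    """Non-increasing relative risk estimates (pool-adjacent-violators)."""
    blocks = []
    for o, e in zip(observed, expected):
        blocks.append([o, e, 1])
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            o_last, e_last, n_last = blocks.pop()
            blocks[-1][0] += o_last
            blocks[-1][1] += e_last
            blocks[-1][2] += n_last
    return [o / e for o, e, n in blocks for _ in range(n)]


def conditional_log_lr(observed, expected):
    """Log-likelihood ratio: ordered alternative versus a constant relative risk."""
    scale = sum(observed) / sum(expected)  # estimate of the constant under the null
    rescaled = [e * scale for e in expected]
    estimates = isotonic_relative_risks(observed, rescaled)
    return sum(o * math.log(lam) for o, lam in zip(observed, estimates) if o > 0)


def conditional_test(observed, expected, n_sim=999, seed=0):
    """Monte Carlo p-value conditional on the total number of observed cases."""
    rng = random.Random(seed)
    total = sum(observed)
    n_areas = len(expected)
    observed_stat = conditional_log_lr(observed, expected)
    exceed = 0
    for _ in range(n_sim):
        # Allocate the fixed total across sub-areas in proportion to E.
        counts = Counter(rng.choices(range(n_areas), weights=expected, k=total))
        simulated = [counts.get(i, 0) for i in range(n_areas)]
        if conditional_log_lr(simulated, expected) >= observed_stat:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

A data set in which every sub-area shows the same doubling of risk gives a statistic of zero and hence a large conditional p-value, even though the unconditional test on the same data would be expected to reject.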

Multiple Sources
Where a study involves the analysis of data from a number of sources, it is desirable to produce
a single overall result by pooling information from each of the sources. If the sub-areas chosen
around each source are similarly defined, for example, a series of concentric circles with the same
radii, the data for the unconditional test can be pooled directly, assuming there is no interaction
between sources. If two sources have study areas that overlap, the circles can be modified to form
a series of concentric shapes around the sources, with appropriate allocation of sub-areas to each
source.6
For the conditional test, the null hypotheses are unique to each source in order to allow for
different background risks. The tests therefore have to be performed separately for each source
and the results pooled. One approach is to combine the p-values from each source in standard
fashion, that is, $-2 \sum_{i=1}^{n} \log p_i \sim \chi^2_{2n}$ for n sources. Methods are available to incorporate arbitrary weightings in
such an approach,13 a desirable option if the population sizes around the sources are markedly
different. An alternative approach, and the one routinely used within SAHSU, is to sum the
maximum likelihood ratios from the individual sources at each simulation run, and to examine
the set of summed MLRs. The overall p-value is therefore obtained from a ‘study wide’
simulation, lessening the problems of potentially unstable or unobtainable p-values arising from
sources with small populations. In practice, the two approaches produce similar results.
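
Both pooling approaches can be illustrated briefly. The first function combines per-source p-values using the chi-squared reference distribution on 2n degrees of freedom, evaluated with the closed-form tail probability available for even degrees of freedom; the second sums the per-source statistics within each simulation run. The function names and the layout of the inputs are hypothetical, the sketch is unweighted, and it assumes that every source has been simulated the same number of times.

```python
import math


def fisher_combined_p(p_values):
    """Combined p-value from -2 * sum(log p) referred to chi-squared on 2n d.f."""
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the test statistic
    # For even degrees of freedom 2n:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{n-1} (x/2)^j / j!
    term = 1.0
    tail_sum = 0.0
    for j in range(n):
        tail_sum += term
        term *= half_stat / (j + 1)
    return math.exp(-half_stat) * tail_sum


def summed_mlr_p(observed_stats, simulated_stats):
    """'Study wide' p-value from statistics summed over sources.

    observed_stats: one statistic per source.
    simulated_stats: one list per source, holding that source's statistic
    for each simulation run (same number of runs for every source).
    """
    observed_total = sum(observed_stats)
    run_totals = [sum(run) for run in zip(*simulated_stats)]
    exceed = sum(1 for total in run_totals if total >= observed_total)
    return (exceed + 1) / (len(run_totals) + 1)
```

Summing within runs means that a source with very few cases contributes a small amount to every total rather than producing an unstable p-value of its own.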

EXAMPLES
Table I shows a set of observed and expected numbers of incident cases of all cancers, 1974-84,
for eight distance bands within 7.5 km of a petrochemical plant in Baglan Bay, South Wales.14
Observed/expected (O/E) ratios and the maximum likelihood estimates of the relative risks in
each band calculated under the constraints of the alternative hypothesis are also shown.
The unconditional test for these data gives a p-value of 0.001, that is, the observed set of data
was more extreme than all of 999 simulated sets. However, examination of the data suggests that
there is a generally elevated risk over the entire study area. In fact, this was found to be the case
more generally for West Glamorgan, which contains the study area, and had a relative risk of
1.05 (95% CI 1.04-1.07) relative to Wales as a whole.15 In contrast, the conditional test gave
a p-value of 0.489. This was not indicative of a decline in risk with distance from the plant. Further
study of cancer incidence around all industrial complexes in Great Britain that contain a major
oil refinery is underway.
Table I1 shows data from a study of cancer incidence around 72 municipal incinerators in
Great Britain.16 Aggregated observed and expected numbers of lung cancer are given from
a sample of 20 incinerators for 1974-1986, allowing for a lag period of 10 years between first
operation of a plant and inclusion of incident cancer cases into the study. The p-value for the
single unconditional test on the aggregated data was p = 0.001. The conditional test for multiple
sources showed that the summed MLR over all incinerator sites was again greater than any
obtained from the simulated data sets (p = 0.001). Comparing pooled p-values with the
chi-squared distribution also gave a highly significant result (p < 0.001). Thus in this example the
combination of Stone’s tests, unconditional and conditional, suggested a ‘true’ decline in lung
cancer risk with distance. Subsequent analyses suggested that this finding was most likely due to
residual confounding rather than to a possible effect of pollution from the incinerators: a
significant result was also obtained amongst the remaining 52 incinerators during the period
before they came into operation.16

Table I. Observed (O) and expected (E) numbers of all cancer incident cases by eight distance bands around the Baglan Bay petrochemical plant, 1974-1984, observed/expected ratios (O/E) and maximum likelihood estimates (MLE) of the relative risks (RR) calculated under the constraint of the alternative hypothesis of Stone’s tests

Distance (km)      O         E     O/E   RR(MLE)
0.5                0       7.7    0.00      1.09
1                  0       3.2    0.00      1.09
2                541     532.7    1.02      1.09
3                634     551.8    1.15      1.09
4.6             1169    1083.8    1.08      1.09
5.7             1433    1289.4    1.11      1.09
6.7             1106    1064.6    1.04      1.05
7.5              534     496.9    1.07      1.05

Table II. Aggregated observed (O) and expected (E) numbers of incident cases of lung cancer from around a sample of municipal incinerators within Great Britain, 1974-1987 by eight distance bands, observed/expected ratios (O/E) and maximum likelihood estimates (MLE) of the relative risks (RR) calculated under the constraint of the alternative hypothesis of Stone’s tests

Distance (km)      O         E     O/E   RR(MLE)
0.5               98      98.3    1.00      1.14
1                526     463.6    1.13      1.14
2               1564    1479.5    1.06      1.14
3               2794    2341.1    1.19      1.14
4.6             4490    4037.7    1.11      1.11
5.7             3038    2770.6    1.10      1.10
6.7             2004    1915.8    1.05      1.05
7.5             1569    1530.7    1.03      1.03

DISCUSSION
Stone’s method is a valuable addition to the analysis of disease risk around a point source,
particularly in the absence of relevant exposure information. Under such circumstances, the
adoption of a very general model of decline of exposure and risk with distance is often the only
viable approach.
There are a number of problems associated with the application of these methods. One is the
computer resources that are required; obtaining p-values by simulation is a computer intensive
process, especially when dealing with many causes or multiple sources of pollution. In his original
paper, Stone ranked administrative areas (enumeration districts) according to their distance from
source;2 to run the maximum likelihood methods presented here using such an approach would
be computationally prohibitive, especially when dealing with a large number of sources. For this
reason, together with the need for clear presentation of the data, SAHSU uses sets of arbitrarily
chosen distance bands around the source(s). The inner sub-areas are chosen to concentrate on
local clustering and are small, whilst the remainder are chosen to have approximately equal areas.
This approach has the advantage that, by using the same definition for all sources, data from
several sources can simply be aggregated by summing within the bands. This process would not
be possible if, for example, administrative boundaries were used.
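
The aggregation across sources by common distance bands can be sketched as follows. The outer radii are those listed in the Distance (km) column of Tables I and II; treating each radius as an inclusive upper limit of its band, and the layout of the input records, are assumptions made for the example.

```python
import math
from bisect import bisect_left

# Outer radii (km) of the eight bands, as listed in Tables I and II.
BAND_RADII_KM = [0.5, 1.0, 2.0, 3.0, 4.6, 5.7, 6.7, 7.5]


def band_index(distance_km, radii=BAND_RADII_KM):
    """Index of the band containing a distance, or None beyond the outer radius."""
    index = bisect_left(radii, distance_km)
    return index if index < len(radii) else None


def aggregate_by_band(records, radii=BAND_RADII_KM):
    """Sum observed and expected numbers within bands over any number of sources.

    records: iterable of (distance_km, observed, expected), where the distance
    is measured from whichever source the record has been allocated to.
    Returns one [observed, expected] pair per band.
    """
    totals = [[0, 0.0] for _ in radii]
    for distance_km, observed, expected in records:
        index = band_index(distance_km, radii)
        if index is not None:
            totals[index][0] += observed
            totals[index][1] += expected
    return totals


def band_areas(radii=BAND_RADII_KM):
    """Area (square km) of each annulus between successive radii."""
    inner = [0.0] + list(radii[:-1])
    return [math.pi * (r_out ** 2 - r_in ** 2) for r_in, r_out in zip(inner, radii)]
```

With these radii the four outer bands have areas of roughly 36 to 39 square km each, consistent with the description of the outer bands as approximately equal in area.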
Another potential problem relates to the choice of expected values. As currently formulated,
covariate information for the calculation of the expected numbers has to be included in the
standardization procedure. SAHSU is fortunate in this respect to have information on potential
confounders, such as small area measures of deprivation, for the entire population. These can
then be used in the calculation of the stratified national rates, although such an approach is only
really practical for a small number of covariates. A further issue is that Stone’s method, although it
gives a p-value, does not offer a quantitative estimate of the size of risk in proximity to the source.
Work is in progress on developing a non-linear regression approach to deal with covariates
and effect estimation, whilst retaining some of the attractive features of Stone’s method. This
would include the ability to allow constant risk up to a certain point, with a decline in risk over
the remaining distance.

ACKNOWLEDGEMENTS
The Small Area Health Statistics Unit is funded by grants from the Department of Health,
Department of the Environment, Health and Safety Executive, Scottish Office Home and Health
Department, Welsh Office, and Northern Ireland Department of Health and Social Services. We
thank the Office of National Statistics (formerly the Office of Population, Census and Surveys) and
the Information and Statistics Division of the Scottish Health Service for the cancer incidence data.
We also thank Jeremy Bullard, Marco Martuzzi and Rob Nichols for their helpful comments.

REFERENCES
1. Elliott, P., Martuzzi, M. and Shaddick, G. ‘Spatial statistical methods in environmental epidemiology:
a critique’, Statistical Methods in Medical Research, 4, 137-159 (1995).
2. Stone, R. A. ‘Investigations of excess environmental risks around putative sources: statistical problems
and a proposed test’, Statistics in Medicine, 7, 649-660 (1988).
3. Elliott, P., Westlake, A. J., Kleinschmidt, I., Rodrigues, L., Hills, M., McGale, P., et al. ‘The Small
Area Health Statistics Unit: a national facility for investigating health around point sources of
environmental pollution in the United Kingdom’, Journal of Epidemiology and Community Health, 46,
345-349 (1992).
4. Elliott, P., Hills, M., Beresford, J., Kleinschmidt, I., Pattenden, S., Jolley, D. et al. ‘Incidence of cancer of the
larynx and lung near incinerators of waste solvents and oils in Great Britain’, Lancet, 339, 854-858 (1992).
5. Elliott, P., Kleinschmidt, I. and Westlake, A. J. ‘Use of routine data in studies of point sources of
environmental pollution’, in Elliott, P., Cuzick, J., English, D. and Stern, R. (eds), Geographical and
Environmental Epidemiology: Methods for Small Area Studies, Oxford University Press, 1992.
6. Kleinschmidt, I., Pattenden, S., Walls, P., Grundy, C., Stevenson, S., Shaddick, G. and Elliott, P. ‘A
national health statistics database: data quality requirements for small area health studies’, in
Frederiksen, P. (ed), Eurocarto XII: Geo-related Databases. 10-12 October 1994, Copenhagen,
Denmark, Proceedings, 1994, XVI-1-XVI-10.
7. Jolley, D., Jarman, B. and Elliott, P. ‘Socio-economic confounding’, in Elliott, P., Cuzick, J., English, D.
and Stern, R. (eds), Geographical and Environmental Epidemiology: Methods for Small Area Studies,
Oxford University Press, Oxford, 1992, pp. 115-124.
8. World Health Organisation. Manual of the International Statistical Classification of Diseases, Injuries
and Causes of Death, (8th revision conference), WHO, Geneva, 1967.
9. World Health Organisation. Manual of the International Statistical Classification of Diseases, Injuries
and Causes of Death, (9th revision conference), WHO, Geneva, 1978.
10. Breslow, N. E. and Day, N. E. Statistical Methods in Cancer Research. Volume II. The Design and
Analysis of Cohort Studies, IARC No. 82, Lyon, France, 1987.
11. Barlow, R. E., Bartholomew, D. J., Bremner, J. M. et al. Statistical Inference Under Order Restrictions:
the Theory and Application of Isotonic Regression, Wiley, New York, 1972.
12. Hills, M. ‘Some comments on methods for investigating disease risk around a point source’, in Elliott, P.,
Cuzick, J., English, D. and Stern, R. (eds), Geographical and Environmental Epidemiology: Methods for
Small Area Studies, Oxford University Press, Oxford, 1992, pp. 231-237.
13. Lancaster, H. ‘The combination of probabilities: an application of orthonormal functions’, Australian
Journal of Statistics, 3, 20-33 (1961).
14. Sans, S., Elliott, P., Kleinschmidt, I., Shaddick, G., Pattenden, S., Walls, P., Grundy, C. and Dolk, H.
‘Cancer incidence and mortality near the Baglan Bay petrochemical works, South Wales’, Occupational
and Environmental Medicine, 52, 217-224 (1995).
15. Office of Population Censuses and Surveys. Cancer Registrations 1987, HMSO, London, 1993 (OPCS Series MB1).
16. Elliott, P., Shaddick, G., Kleinschmidt, I., Jolley, D., Walls, P., Beresford, J. and Grundy, C. ‘Cancer
incidence near municipal solid waste incinerators in Great Britain’, British Journal of Cancer, 73,
702-710 (1996).
