Land use regression (LUR) models have been used successfully for predicting local variation in
traffic pollution, but few studies have explored this method for deriving fine particle exposure
surfaces. The primary purpose of this study is to develop a LUR model for predicting fine
particle (PM2.5) mass over the five-county metropolitan statistical area (MSA) of Los Angeles.
PM2.5 includes all particles with diameter less than or equal to 2.5 microns. In the Los Angeles
MSA, 23 monitors of PM2.5 were available in the year 2000. This study uses GIS to integrate
data regarding land use, transportation and physical geography to derive a PM2.5 dataset
covering Los Angeles. Multiple linear regression was used to create the model for predicting the
PM2.5 surface. Our parsimonious model explained 69% of the variance in PM2.5 with three
predictors: (1) traffic density within 300 m, (2) industrial land area within 5000 m, and (3)
government land area within 5000 m of the monitoring site. These results suggest the LUR
method can refine exposure models for epidemiologic studies in a North American context.
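
The regression summarized above can be illustrated with a minimal sketch. It is not the authors' code (the study used Stata and S-Plus): it fits an ordinary least-squares model with an intercept and three predictors and reports R². The function names `solve` and `fit_ols`, the synthetic predictor values, and the coefficients used to simulate the response are all assumptions made for illustration only.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivot is handled).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit with an intercept via the normal equations.

    rows: one tuple of predictor values per monitor.
    Returns (coefficients, R^2); coefficient 0 is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic stand-ins for the three predictors at 23 monitors (not study data):
# traffic density within 300 m, industrial and government land area within 5000 m.
random.seed(0)
rows = [(random.random(), random.random(), random.random()) for _ in range(23)]
# Simulated response with made-up coefficients and noise.
y = [2.8 + 0.3 * t + 0.2 * i + 0.1 * g + random.gauss(0, 0.05) for t, i, g in rows]

beta, r2 = fit_ols(rows, y)
print("coefficients:", [round(b, 3) for b in beta])
print("R^2:", round(r2, 3))
```

With real monitor data, `rows` would hold the buffer-based predictors for each site and `y` the measured (or log-transformed) PM2.5 values.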
This journal is © The Royal Society of Chemistry 2007 J. Environ. Monit., 2007, 9, 246–252 | 247
Published on 19 January 2007. Downloaded by University of Massachusetts - Amherst on 26/10/2014 15:25:20.

major road would have a class of 3 and a variation of speed limits. For class 3 roads that had a
speed limit of 35 but no traffic count data, we assigned the average traffic count of all class 3
roads with a speed limit of 35. The average traffic count over all road segments within each
buffer was calculated and assigned to each monitor. Using this method we were able to obtain an
accurate measurement of AADT throughout the MSA.

Population data

Population density is an important factor in determining how much and what type of pollution is
produced in a given area. Densely populated areas typically contribute more traffic-related
pollution than sparsely populated areas, and within the city, density may also influence
emissions.12 In addition to the number of people, we also posit that the volume of traffic, number
of businesses and lack of green space relate directly to population density. Higher population
density generates both an increase in traffic pollution from commuting and traveling to commercial
areas, and an increase in pollution from heating combustion. To determine the population density
across the SCAG area, a kernel estimate was calculated using the 2000 census population assigned
to the census tract centroids. A moving window roams over the population surface, calculating the
number of people within a given radius at each 50 m grid cell. The window uses a Gaussian
distribution in which points further from its center have less influence on the estimate.15 The
radius of the kernel was determined through analyses with a semivariogram. This process revealed
that a radius in the 5–10 km range of spatial autocorrelation was optimal.

Physical geography

Closer proximity to a large body of water, such as the ocean, reduces the sources and
concentrations of pollutants. The onshore marine breeze also maintains pollution at relatively low
levels near the coast.16 The distance to the ocean was calculated for each monitor location and
was tested in relation to measured pollution levels. Elevation data were acquired from the United
States Geological Survey (USGS) at 30 metre resolution and each monitoring area was assigned an
elevation to test in the pollution model.

Modeling methods

Regression and spatial analysis were used to create an interpolated PM2.5 pollution surface.
ArcView v3.3, ArcMap v9.0, ArcInfo v9.0 (Redlands, CA, USA), Splus 2000 (Boston, MA, USA) and
Stata v8 (College Station, TX, USA) were used for these analyses. We used an inverse distance
weighting method to create the surface, as discussed in more detail below.

Model selection

Linear regression was conducted using the natural-logarithm transformation of the PM2.5
measurements. We used the logarithm of the PM2.5 estimates to normalize the distribution for
statistical analysis. Bivariate linear regressions were first implemented to determine which
variables were most strongly related to PM2.5. This first step tested over 140 independent
variables, as land use and road variables not only have a large number of categories but also
varying buffer sizes. A multiple linear regression model was developed using the significant
parameters from the bivariate models with a manual forward selection process, based on the highest
t-score for each variable. This process builds from the most significant variable, adding the next
most significant variable until a model of maximum prediction is achieved. The variance inflation
factor (VIF) was then examined to identify variables that were collinear and could be eliminated.
Variables with the highest VIF and variables with the lowest t-scores were removed until a
parsimonious prediction model with the highest R² and acceptable collinearity was derived.

Using the bootstrap method, a sensitivity analysis was completed to test the stability of the
estimates from the multiple regression model.17 From the 23 monitor data points, a random sample
of 15 data points was selected with replacement, for 1000 repetitions. This enabled determination
of bias in the estimates and of how accurately the model predicted PM2.5 when multiple points were
excluded from the regression analysis. Additional model diagnostics included the Cook–Weisberg
test for heteroskedasticity, which tests whether the residuals have constant variance, and the
DFBETAS and Cook’s distance to examine outliers. Specifically, the DFBETAS measure how each
estimate changes when observations are left out, one at a time.18 This allows us to determine
whether specific points have undue influence on the model. Cook’s distance is also used to
determine whether a single observation changes the regression estimates. The statistic measures
how far the estimates move when an observation point is left out.18

Visualizing the surface

Visualization is an important diagnostic for assessing the face validity of a predicted LUR model.
Approximately 18 000 lattice points were created for the SCAG area with a cell resolution of
2.3 km. We created lattice points that were relatively close together to estimate the PM2.5
surface more finely, but were partly constrained by computational capacity. Buffers for the
independent variables in the final regression model were created around each lattice point, where
the areas of each variable were once again calculated in hectares. Using the fitted regression
equation, we calculated a predicted PM2.5 value for each of the 18 000 lattice points. This method
allows for visualization, but for subsequent health analysis, geographical points corresponding to
study subjects could serve as the lattice assignment points, to minimize assignment error.

We then used the inverse distance weighting (IDW) method to interpolate and visualize the
predicted pollution surface. The IDW interpolator assumes that at each prediction point there is
local influence that lessens with distance away from that location.19 A specified number of
predicted PM2.5 points, or all points within a specified radius, can be used to determine the
output value for each location, creating a surface. The power in the IDW interpolation determines
how influential
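
The IDW step described in this section can be sketched as follows. This is a minimal illustration, not the ESRI implementation used in the study: each output value is a weighted mean of nearby predicted values, with weights proportional to 1/distance^power, so a larger power concentrates influence on the nearest points. The function name `idw`, its parameters, and the four-point lattice with made-up values are assumptions chosen for illustration.

```python
import math


def idw(points, values, target, power=2.0, k=None):
    """Inverse distance weighted estimate at `target`.

    points: (x, y) coordinates of lattice points with predicted values.
    values: predicted value at each lattice point.
    power:  exponent on distance; larger values favor the nearest points.
    k:      if given, use only the k nearest points (hypothetical option
            mirroring the "specified number of points" in the text).
    """
    pairs = sorted(
        ((math.dist(p, target), v) for p, v in zip(points, values)),
        key=lambda dv: dv[0],
    )
    if k is not None:
        pairs = pairs[:k]
    nearest_dist, nearest_val = pairs[0]
    if nearest_dist == 0.0:
        # Target coincides with a lattice point: return its value directly.
        return nearest_val
    weights = [(1.0 / d ** power, v) for d, v in pairs]
    total = sum(w for w, _ in weights)
    return sum(w * v for w, v in weights) / total


# Made-up lattice (unit square corners) and predicted values, for illustration.
lattice = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
predicted = [10.0, 20.0, 20.0, 30.0]

center = idw(lattice, predicted, (0.5, 0.5))
near_low_p = idw(lattice, predicted, (0.25, 0.25), power=1.0)
near_high_p = idw(lattice, predicted, (0.25, 0.25), power=4.0)
print(center, near_low_p, near_high_p)
```

At the center of the square all four points are equidistant, so the estimate is their plain average; near the corner valued 10, raising the power pulls the estimate toward 10.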
Fig. 3 Graph showing the predicted PM2.5 vs. the measured PM2.5
from the land use regression model
Fig. 4 Map of over-predicted values from the land use regression
model, showing most over-prediction occurs at freeway intersections
distribution and storage centers in inland areas. There are more transport trucks, which combust
diesel fuel, traveling on the 710 freeway than any other freeway in the Los Angeles basin.23
Westerdahl et al.24 found that measured freeway concentrations of PM2.5 had a range of 60 to
820 µg m⁻³ on the 10 East Freeway, and that concentrations along major roadways with high traffic
density were up to 20 times higher than residential concentrations. These measurements were taken
over a five-day period in April of 2003. Moreover, increased concentrations of PM2.5 were
associated with high diesel traffic along freeways. Thus, the over-predicted values in our
analysis are at plausible levels, although further field validation work is needed to assess
whether annual average levels on or near freeway intersections are indeed this high.

Brauer et al.5 and Hochadel et al.6 have also used LUR to predict PM2.5 in Europe. Brauer et al.
modeled air pollution in communities throughout the Netherlands, in Munich, Germany and in
Stockholm, Sweden. Although Brauer et al. used only traffic indicators and no land use
classifications in their multivariate regression model, they were able to derive significant
prediction models for each of the three locations, both for PM2.5 and filter absorbance
(reflectance method), which is a marker for diesel exhaust.5 For the Netherlands, Munich and
Stockholm, the R² values for the prediction model were 0.78, 0.76 and 0.63, respectively, for the
PM2.5 model and 0.9, 0.83 and 0.76, respectively, for the absorbance model. Because absorbance is
used to assess traffic-related pollutants and the model variables were chosen specifically to
measure traffic, the absorbance models fit better than the PM2.5 models. Hochadel et al. conducted
a study in Wesel, Germany using primarily traffic-based indicators as the geographic factors. The
regression models predicted strongly for PM absorbance (R² = 0.65), but not for PM mass
(R² = 0.094). As mentioned above, when using traffic indicators the relationship with PM
absorbance should be stronger than with PM mass.

The buffers that were used in our analysis to measure the area of land use were quite large when
compared to other studies.4,5,12,25 The Los Angeles MSA is a massive and sprawling urban area,
with one of the highest levels of employment and population dispersion in the US.26 This dispersed
urban structure, transected by major highways and commercial areas, leads to a broader regional
scale of influence for processes generating PM pollution, and the 2000–5000 m buffers surrounding
the PM2.5 monitors probably reflect this dispersed form of urban development.

The most closely related study geographically to this Los Angeles study is the work done by Ross
et al.,12 in San Diego County, California, which predicted ambient NO2. The study area in San
Diego County was much smaller than the Los Angeles MSA, 11 721 km² versus 98 500 km² for the SCAG
region. While the Ross study found smaller buffer radii were more effective predictors than in Los
Angeles, the traffic count variable and the industrial land use were common to both models. NO2 is
known to vary over smaller areas than PM2.5 in proximity to traffic.27 Larger buffer sizes used in
this study may reflect either the inherent spatial scale of variation in the pollutants or the
more dispersed urban structure of Los Angeles. NO2 in the Ross study arises from local sources,
whereas PM2.5 is more of a mixture of local, primary sources and secondary formation that we may
expect to be more regionally dispersed.

A kriging model has been previously developed for the Los Angeles basin, based on PM2.5 values
from the 23 monitors.14 This model did not show as much of the local-area variation as the LUR
models that were developed for this study (see Fig. 5). The success of the models in the Los
Angeles MSA shows that traffic and land use are strong predictors of PM2.5. Although the scale of
variation appears larger than what we might expect from purely local processes around the emission
source, the covariance of PM2.5 with land use predictors suggests a significant role for local
land use and traffic in PM2.5 distributions within this large urban area. The generalizability of
this model may be lacking as these monitors were run by government agencies for the purpose of
regulatory compliance, and therefore may not reflect the true variability in mixture of land use.
If the network is not representative of the true pollution variability we would expect to see bias
in the predicted pollution surface. Our diagnostics with the bootstrap method demonstrated that
influential points are unlikely to
drive the regression model. For example, none of the monitor locations are specifically designed
to measure traffic impacts and the closest proxy we have available for this is densely populated
urban areas. While there is some overlap between densely populated areas and traffic, we may find
more local variation in locations that would include traffic sites.

Fig. 5 Universal kriging map of PM2.5 in the Los Angeles area; adapted from Künzli et al. (2005)11

There is another caveat for the use of land use regression surfaces for health research: because
the surface depends intrinsically on covariates such as traffic, and these covariates are related
to other health factors such as socio-economic position (SEP), the surface may include hidden
confounders. Consequently, researchers must adjust for other contextual confounders that may be
related to the land use and traffic input to ensure unbiased estimates of air pollution health
effects.

We found that land use regression predicts 69% of the variance in PM2.5 in the Los Angeles MSA.
With sensitivity analysis we observed few influential monitors. Traffic, industrial and government
areas are of most significance because of the small particulates that are released into the air,
most likely as a result of the increased traffic on roadways and within the government areas, and
point source pollution from the industries. More work is needed to further investigate the
extraordinarily high levels of PM2.5 predicted around the intersections of freeways since these
levels may constitute a significant threat to public health.

Abbreviations

AADT   Average annual daily traffic
AQMD   Air Quality Management District
ARB    Air Resources Board
EPA    Environmental Protection Agency
GIS    Geographic information systems
IDW    Inverse distance weighting
FRM    Federal reference method
LUR    Land use regression
MSA    Metropolitan statistical area
NAMS   National air monitoring stations
NOx    Nitrogen oxides
PM     Particulate matter
SCAG   Southern California Association of Governments
SCC    Southern California Compass
SLAMS  State and local monitoring stations
USGS   United States Geological Survey
VIF    Variance inflation factor

Acknowledgements

This work was funded by the Southern California Environmental Health Sciences Center, supported by
NIEHS grant 5P30 ES07048. Additionally, we acknowledge funding from EPA grant RD83186101, NIEHS
grants 5P01 ES11627, 5P01 ES09581, the Health Effects Institute, and Health Canada and the
Canadian Institutes of Health Research. We would also like to thank Bernie Beckerman for his GIS
contributions.

References

1 B. Brunekreef and S. T. Holgate, Lancet, 2002, 360, 1233–1242.
2 D. J. Briggs, C. de Hoogh, J. Gulliver, J. Wills, P. Elliott, S. Kingham and K. Smallbone, Sci. Total Environ., 2000, 253, 151–197.
3 N. L. Gilbert, M. S. Goldberg, B. Beckerman, J. R. Brook and M. Jerrett, J. Air Waste Manage. Assoc., 2005, 55, 1059–1063.
4 T. Sahsuvaroglu, A. Arain, P. Kanaroglou, N. Finkelstein, B. Newbold, M. Jerrett, B. Beckerman, J. Brook, M. Finkelstein and N. L. Gilbert, J. Air Waste Manage. Assoc., 2006, 56, 1059–1069.
5 M. Brauer, G. Hoek, P. van Vliet, K. Meliefste, P. Fischer, U. Gehring, J. Heinrich, J. Cyrys, T. Bellander, M. Lewne and B. Brunekreef, Epidemiology, 2003, 14, 228–239.
6 M. Hochadel, J. Heinrich, U. Gehring, V. Margenstern, T. Kuhlbusch, E. Link, H. Wichmann and U. Kramer, Atmos. Environ., 2006, 40, 542–553.
7 Southern California Association of Governments, Spring 2006, www.scag.ca.gov/factsheets.
8 Southern California Compass, Growth Vision Report, Southern California Compass, 2004.
9 W. J. Gauderman, E. Avol, F. Gilliland, H. Vora, D. Thomas, K. Berhane, R. McConnell, N. Kuenzli, F. Lurmann, E. Rappaport, H. Margolis, D. Bates and J. Peters, N. Engl. J. Med., 2004, 351, 1057–1067.
10 California Environment Protection Agency and Air Resources Board, 2002 California PM2.5 monitoring network description, California Environment Protection Agency, 2003.
11 N. Künzli, M. Jerrett, W. J. Mack, B. Beckerman, L. LaBree, F. Gilliland, D. Thomas, J. Peters and H. N. Hodis, Environ. Health Perspect., 2005, 113, 201–206.
12 Z. Ross, P. B. English, R. Scalf, R. Gunier, S. Smorodinsky, S. Wall and M. Jerrett, J. Exposure Anal. Environ. Epidemiol., 2005, 1–9.
13 M. Jerrett, A. Arain, P. Kanaroglou, B. Beckerman, D. Potoglou, T. Sahsuvaroglu, J. Morrison and C. Giovis, J. Exposure Anal. Environ. Epidemiol., 2005, 15, 185–204.
14 M. Jerrett, R. T. Burnett, A. Willis, D. Krewski, M. S. Goldberg, P. DeLuca and N. Finkelstein, J. Toxicol. Environ. Health A, 2003, 66, 1735–1777.
15 T. Bailey and A. Gatrell, Interactive Spatial Data Analysis, Prentice Hall, New York, 1995.
16 BAAQMD, Climate, physiography and air pollution potential–Bay Area and its subregions, Bay Area Air Quality Management District, 2005.
17 P. Burrough and R. McDonnell, Principles of Geographical Information Systems, Oxford University Press, New York, 1998.
18 D. G. Kleinbaum, L. L. Kupper, K. E. Muller and A. Nizam, Applied Regression Analysis and Other Multivariate Methods, 3rd edn., Duxbury, North Scituate, 1997.
19 ESRI, Deterministic methods for spatial interpolation, Help File, Environmental Systems Research Institute, Redlands, California, USA, 2004.
20 A. Hricko, Road to an unhealthy future for Southern California’s children, Urban Initiative Policy Brief, University of Southern California, Los Angeles, California, USA, 2004.
21 A. M. Hricko, Environ. Health Perspect., 2006, 114, A204–205.
22 J. J. Corbett and D. Chapman, J. Air Waste Manage. Assoc., 2006, 56, 841–851.
23 M. Meyer, Transportation Study, Meyer Mohaddes Associates, Inc., Los Angeles, CA, USA, 2003.
24 D. Westerdahl, S. Fruin, T. Sax, P. M. Fine and C. Sioutas, Atmos. Environ., 2005, 39, 3597–3610.
25 M. Hochadel, J. Heinrich, U. Gehring, V. Margenstern, T. Kuhlbusch, E. Link, H. Wichmann and U. Kramer, Atmos. Environ., 2006, 40, 542–553.
26 P. Gordon and H. Richardson, J. Am. Plann. Assoc., 1996, 62, 289–295.
27 F. Gilliland, E. Avol, P. Kinney, M. Jerrett, T. Dvonch, F. Lurmann, T. Buckley, P. Breysse, G. Keeler, T. de Villiers and R. McConnell, Environ. Health Perspect., 2005, 113, 1447–1454.