
Science of the Total Environment 701 (2020) 134463


Review

Statistical spatial-temporal modeling of ambient ozone exposure for environmental epidemiology studies: A review
Runmei Ma, Jie Ban, Qing Wang, Tiantian Li ⇑
National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, No. 7, Panjiayuan Nanli, Chaoyang District, Beijing 100021, China

Highlights

• Comprehensively reviewed main statistical models for ambient ozone exposure assessment.
• Advantages and disadvantages of the above models were summarized.
• Suggestions on future application of these models in environmental epidemiology were provided.

Article info

Article history:
Received 29 June 2019
Received in revised form 28 August 2019
Accepted 13 September 2019
Available online 5 October 2019

Guest editor: Pavlos Kassomenos

Keywords:
Ambient ozone
Statistical model
Exposure assessment
Environmental epidemiology

Abstract

Background: Studies have discovered the adverse health impacts of ambient ozone. Most epidemiological studies explore the relationship between ambient ozone and health effects based on fixed site monitoring data. Fine modeling of ground-level ozone exposure conducted by statistical models has great advantages for improving exposure accuracy and reducing exposure bias. However, there is no review summarizing such studies.
Objectives: A review is presented to summarize the basic process of model development and to provide some suggestions for researchers.
Methods: A search of PubMed, Web of Science and the Wanfang Database was performed for dates through July 1, 2019 to obtain relevant studies worldwide. We also examined the references of the articles of interest to ensure that as many articles as possible were included.
Results: The land use regression model (LUR model), random forest model and artificial neural network model have been used in this field. We summarized these studies in terms of model selection, data preparation, simulation scale selection, and model establishment and validation. Multiparameters are a major feature of models. Parameters that influence the formation of ground-level ozone concentrations and parameters that have been extremely important in previous articles should be considered first. The process of model establishment and validation is essentially a process of continuously optimizing the model performance, but there are certain differences in the specific models.
Conclusion: This review summarized the basic process of the statistical model for ambient ozone exposure. We gave the applicable conditions and application scope of different models and summarized the advantages and disadvantages of various models in ozone modeling research. In the future, research is still needed to explore this area based on its own research purposes and capabilities.
© 2019 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
E-mail address: litiantian@nieh.chinacdc.cn (T. Li).

https://doi.org/10.1016/j.scitotenv.2019.134463
0048-9697/© 2019 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Methods
3. Results
    3.1. Land use regression model
    3.2. Random forest model
    3.3. Artificial neural network model
4. Discussion
    4.1. Model selection
    4.2. Data preparation
        4.2.1. Monitoring data
        4.2.2. Parameter selection and preparation
    4.3. Simulation scale determination
    4.4. Model development and validation
    4.5. Application of the simulation results
    4.6. Prospects and limitations
5. Conclusion
Declaration of Competing Interest
Acknowledgments
References

1. Introduction

As a secondary air pollutant, ambient ozone is a concern due to its widespread pollution (Anenberg et al., 2010). The adverse health effects of ground-level ozone have been demonstrated by many international studies, and the outcomes have shown varied effects, such as mortality from all non-accidental causes, cardiovascular, respiratory and coronary diseases, and hypertension (Yin et al., 2017; Bell, 2004; Wong et al., 2008; Peng et al., 2013; Di et al., 2017b; Yang et al., 2017). The harmful effects of ozone have also been tested at more sophisticated levels by using, for example, glucose (Yang et al., 2018), blood pressure (Cole-Hunter et al., 2018) and inflammatory markers (Lee et al., 2018). Therefore, ambient ozone causes serious public health problems. Exposure assessments of ozone are key to exploring the relationship between ambient ozone and health.

For environmental epidemiological studies that focus on the health effects caused by ambient ozone, there is a great demand for the accurate assessment of ambient ozone exposure. For long-term exposure studies, high-resolution exposure data for historical periods are sparse, while for short-term exposure studies, many regions lack continuous daily ozone data from monitoring sites. Monitoring data have been commonly used in previous studies, but they are insufficient at the spatial and temporal scales and thus result in incomplete exposure assessment. Even when the monitoring data are complete, using the data to represent personal exposure results in errors caused by the low precision of instruments or the gap between averaged individual exposure and the true level of air pollutants. Some studies further improved exposure accuracy by matching monitoring data based on home address, but Berkson error still exists due to differences between personal exposure level and averaged individual exposure measurements. To solve the above problems, some studies have explored the use of models to estimate ground-level ozone exposure at high spatial and temporal resolutions to try to avoid exposure assessment errors as much as possible. In recent years, with growing attention to the adverse health impacts of ozone and the accumulation of various data (satellite data, chemical transport modeling data, etc.), studies have used multiparameter statistical models to simulate ground-level ozone exposure, and the results have been applied to epidemiological studies; however, a review of these studies has not been conducted.

The objectives of this review are to discuss recent ground-level ozone statistical models for environmental epidemiology studies and to provide suggestions for future studies. We will provide a brief introduction of the models mostly used in this field. In this review, four factors will be considered: model selection, data preparation, simulation scale determination, and model development and validation. Applications and limitations will also be discussed.

2. Methods

To include a wide range of relevant studies, we searched important databases in English and Chinese, including PubMed, Web of Science and the Wanfang Database. The search was performed for dates through July 1, 2019 to obtain relevant studies worldwide. The search words included were "ozone", "O3", "estimate", "predict", "forecast", "spatiotemporal", "spatial", and "temporal", which were searched in both English and Chinese. We also examined the references of the articles of interest to ensure that as many articles as possible were included.

The exclusion criteria for the review are as follows: 1) studies that do not mainly rely on mathematical statistics theories, such as chemical transport models (CTMs) and air quality models; 2) studies with unspecified descriptions of the parameters or modeling process, or with unclear results; and 3) literature with repeated reports (Fig. 1).

Fig. 1. The study selection process.
Table 1
Details of the studies included in this review.

| No. | Authors (year) | Study location | Study period | Temporal resolution | Spatial resolution | Model | Parameters | Results |
|---|---|---|---|---|---|---|---|---|
| 1 | Zhan et al. (2018) | China | 2015 | Daily maximum 8 h mean | 0.1° × 0.1° | Random forest model | Atmospheric pressure, evaporation, precipitation, relative humidity, sunshine duration, temperature, wind speed, planetary boundary layer height, elevation, anthropogenic emission inventory, land use, normalized difference vegetation index (NDVI), road density, and population density | R² = 0.69; RMSE = 26 µg/m³ |
| 2 | Di et al. (2017) | America | 2000–2012 | Daily maximum 8 h mean | 1 km × 1 km | Neural network | Satellite ozone measurements, CTM output, air temperature, accumulated total precipitation, downward shortwave radiation flux, accumulated total evaporation, planetary boundary layer height, low cloud area fraction, precipitation rate, precipitable water for the entire atmosphere, pressure, specific humidity at 2 m, visibility, wind speed, medium cloud area fraction, high cloud area fraction, albedo, NDVI and vegetation percentage | R² = 0.76 (0.74–0.80) |
| 3 | Wang et al. (2015) | Six metropolitan areas in America | 1999–2013 | Average two-week concentration | 50 m × 50 m | LUR with universal kriging | Traffic, industrial and port emissions, population density, land use and land cover, annual average of specific emission sources | City R²: 0.65–0.88; local R²: 0.60–0.91 |
| 4 | Malmqvist et al. (2014) | Umeå and Malmö, Sweden | 2012 | Daily average | – | LUR and temporal model | Traffic data, land use data, population density, altitude, wind speed, wind direction, global radiation, net radiation, temperature, vertical temperature difference (2–8 m) and vertical temperature difference (24–8 m) | Malmö: adjusted R² = 0.40, CV R² = 0.17. Umeå: adjusted R² = 0.67, CV adjusted R² = 0.48 |
| 5 | Adam-Poupart et al. (2014) | Quebec, Canada | 2005 | 8-hr midday concentration | 1 km × 1 km | LUR mixed-effects model, Bayesian maximum entropy model and kriging method model | Temperature, precipitation, days, years, road density and latitude | BME-LUR: RMSE = 7.06 ppb, R² = 0.653; LUR: R² = 0.466, RMSE = 8.747; BME kriging: R² = 0.414, RMSE = 9.164 |
| 6 | Kerckhoffs et al. (2015) | Netherlands | 2012: 2.28–3.15; 4.24–5.10; 9.4–9.20; 11.28–12.14 | Average of the summer and annual period | 50 m × 50 m | LUR | Traffic intensity, length of major roads, low density residential land, urban green, region | Summer: R² = 0.71; annual: R² = 0.77 |
| 7 | Beelen et al. (2009) | Europe | 2001 | Annual average | 1 km × 1 km | LUR | Land use data, road traffic, population density, meteorology, altitude, distance from the ocean | R² = 0.70; RMSE = 7.69 |
| 8 | Wang et al. (2016) | Los Angeles Basin, America | 2000–2008 | Average two-week concentration | 4 km × 4 km | LUR | CTM data, road networks, industrial and port emissions, population density, land use | Local: RMSE = 4.56, R² = 0.78. Routine monitoring and fixed sites: RMSE = 5.64, R² = 0.86 |
| 9 | Wolf et al. (2017) | Augsburg, Germany | 2014.3–2015.4 | Annual average | 1 km × 1 km | LUR | Local land use, building density, population density, household density, topography, coordinates, traffic variables, road network | R² = 0.92 |
| 10 | Son et al. (2018) | Metropolitan area of Mexico | 2011, 2014 | Hourly and monthly | 30 m × 30 m | LUR | Temperature, humidity, precipitation, wind speed and hourly traffic density variables, elevation | Hourly R² = 0.653; monthly R² = 0.719 |
| 11 | De Hoogh et al. (2018) | Western Europe | 2010 | Annual average | 100 m × 100 m | LUR | CTM data, all roads, major roads, ports area, residential land cover, altitude, north-south trend, east-west trend, MACC dispersion model product, natural land, total build up, urban green | Annual model: R² = 0.63; warm model: R² = 0.44; cold model: R² = 0.67 |
| 12 | Huang et al. (2017) | Nanjing, China | 2013 | Annual average | 100 m × 100 m | LUR | Longitude and slope | R² = 0.65 |
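The Results column of Table 1 reports R² and RMSE values. As a minimal sketch of how such figures are typically computed from paired observed and predicted concentrations, the snippet below assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; individual studies may instead report a squared correlation or a cross-validated variant. The function name `r2_and_rmse` and all concentration values are hypothetical and serve only as an illustration.

```python
import math

def r2_and_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted concentrations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sum of squares: model error.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variability of the observations themselves.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse

# Hypothetical held-out ozone concentrations (ppb); the numbers are invented.
observed = [30.0, 42.0, 55.0, 61.0, 48.0]
predicted = [33.0, 40.0, 52.0, 64.0, 47.0]
r2, rmse = r2_and_rmse(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f} ppb")
```

RMSE carries the units of the pollutant concentration, so values from studies reporting in ppb and in µg/m³ are not directly comparable, whereas R² is unitless.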


3. Results

Based on a preliminary analysis of the literature, we found three types of models that have been widely used: the land use regression model (LUR model), the random forest model and the artificial neural network model. Detailed information on the studies included in the review is shown in Table 1.

3.1. Land use regression model

The LUR model is a multivariate regression model established by the least squares method to simulate the spatial distribution of air pollutant concentrations in the study area. The model integrates land use data and traffic and population density data around the monitoring sites into a geographic information system (GIS) to analyze the relationship between these parameters and the spatial distribution of pollutant concentrations, and thereby to estimate the concentration at any location that does not have monitoring data (Wu et al., 2016). The variables are generated by setting increasing buffer distances and are selected with backward and forward selection algorithms. Based on the pollutant concentrations and the geographic variables, the multivariate regression model is then fitted by the least squares method.

LUR models were introduced in the air pollution modeling field in 1997 (Briggs et al., 1997), and they are normally used in regression mapping for ambient particulate matter, NOx or VOCs (Hoek et al., 2008). To the best of our knowledge, the earliest studies using LUR models to simulate ambient ozone exposure were conducted in 2009 (Beelen et al., 2009). For the spatial scale, LUR models mainly focus on a certain city or region, such as Quebec in Canada (Adam-Poupart et al., 2014) or metropolitan areas in the United States (Wang et al., 2015). However, some studies have modeled ground-level ozone exposure at a broader scale, such as Beelen et al. (2009), which was conducted in Europe. For the temporal scale, studies that have focused on long-term exposure are generally based on the annual mean value (Beelen et al., 2009; Kerckhoffs et al., 2015; Wolf et al., 2017; De Hoogh et al., 2018; Huang et al., 2017), whereas studies that have focused on short-term exposure have used a variety of metrics: hourly values (Son et al., 2018), 8-h averages (Adam-Poupart et al., 2014), daily means (Malmqvist et al., 2014) and two-week concentrations (Wang et al., 2015, 2016). Most LUR models were applied in large metropolitan areas with high spatial resolutions (Hoek et al., 2008), such as the study of the metropolitan area of Mexico City, with a resolution of 30 m × 30 m (Son et al., 2018), or the research of Wang et al. (2015), which was based on six metropolitan areas in America and used a resolution of 50 m × 50 m. Commonly used parameters include land use terms, traffic density, population density, emissions inventories and meteorological parameters. Most studies evaluated the performance of the model through the coefficient of determination (R²) or the root mean square error (RMSE). The performances were generally good, with R² values up to 0.92 (Augsburg, Germany, 1 km × 1 km, annual average) (Wolf et al., 2017). The LUR model has been widely used in exposure assessments in Europe and North America, and the simulation results have been used in epidemiological studies (Holle et al., 2005).

3.2. Random forest model

The random forest model is an algorithm built from a collection of tree classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the $\{\Theta_k\}$ are independent and identically distributed random vectors and each tree votes on the input $x$. For regression, the final result is obtained by averaging the predictions of regression trees trained on multiple subsample sets through several rounds of training (Yang, 2014).

Random forest models have been widely used to simulate the spatial and temporal distributions of air pollutants (Hu et al., 2017; Zhan et al., 2017), but few studies have used random forest models in ambient ozone exposure simulations. Zhan et al. (2018) assessed the ambient ozone exposure intensity and duration in China in 2015. Meteorological data, elevation data, emissions inventories, land use data, vegetation indexes, road density data and population density data were prepared at a 0.1° × 0.1° resolution to model the daily maximum 8-h mean ozone concentration. The contribution of meteorological parameters was found to be 65%, and the contribution of evaporation, which is a comprehensive index of temperature and humidity and has an important influence on the formation and stability of ozone, was especially notable. The authors also compared the results with the spatiotemporal kriging interpolation method, showing that the performance of the random forest model was slightly higher (the R² of the random forest model was 0.69, while the R² of the interpolation method was 0.68).

3.3. Artificial neural network model

The artificial neural network is a nonlinear complex network system composed of a large number of simple neurons. In the learning process of the neural network, the connections and topology of the neurons change continually under external stimulus until the network output gradually approaches the expected output. Such models can be divided into feedforward neural networks and feedback neural networks. Convolutional neural networks, as a type of deep feedforward neural network model, reduce the complexity of network models, have the ability to perform hierarchical learning and can accurately extract features (Chang, 2013; Lei, 2018).

Similar to the random forest model, the artificial neural network has many advantages, but few studies have so far applied this method to ground-level ozone simulation. Di et al. (2017a) established a convolutional neural network model that considered neighboring information through the introduction of a convolutional layer. The model used remote sensing ozone products, nitrogen oxide and sulfur dioxide data from both monitoring sites and satellite-based products, volatile organic compound (VOC) data from monitoring sites, chemical transport model outputs, meteorological data and land use variables to simulate near-surface ozone exposure on a 1 km × 1 km grid in the United States from 2000 to 2012. R² reached 0.76, which indicates good performance. These predictive results were used in epidemiological studies that assessed the relationship between ozone and mortality in the U.S. from 2000 to 2012 (Di et al., 2017b).

4. Discussion

We have reviewed the multiparameter statistical models used to simulate high-resolution ground-level ozone exposure. This review is of great significance for the development of future epidemiological studies and facilitates the use of accurate exposure assessment methods in subsequent studies. Furthermore, this review can provide a basic understanding, from some perspectives, for the introduction of policies and management measures to prevent and control the adverse effects of ambient ozone on human health. We will introduce the basic process of developing research that simulates ground-level ozone exposure using statistical models and compare the differences between studies to provide suggestions for future studies.
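The LUR procedure described in Section 3.1 (buffer-based predictors around monitoring sites, a least-squares fit, then estimation at unmonitored locations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any reviewed study: the site coordinates, road segments, population densities and coefficients are all invented, the helper names (`buffer_sum`, `ols_fit`, `predict`) are hypothetical, a single 300 m buffer is used, and the forward/backward variable selection step is omitted.

```python
import math

def buffer_sum(site, features, radius_m):
    """Sum feature weights (e.g. road length) lying within radius_m of a site."""
    sx, sy = site
    return sum(w for x, y, w in features
               if math.hypot(x - sx, y - sy) <= radius_m)

def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, row):
    """Linear prediction: dot product of coefficients and predictor values."""
    return sum(b * v for b, v in zip(beta, row))

# Hypothetical monitoring sites (x, y in metres) and road segments
# (x, y, length in km); all values are invented for illustration.
sites = [(0, 0), (1000, 0), (0, 1000), (1000, 1000), (500, 500)]
roads = [(100, 0, 0.30), (900, 100, 0.20), (950, 50, 0.15),
         (100, 900, 0.10), (600, 500, 0.40), (2000, 2000, 0.50)]
pop_density = [1.2, 3.0, 0.8, 0.5, 2.5]  # thousand persons per km^2 (assumed)

# Design matrix: intercept, road length within a 300 m buffer, population density.
X = [[1.0, buffer_sum(s, roads, 300), d] for s, d in zip(sites, pop_density)]

# Synthetic "observed" ozone generated from made-up coefficients, so the fit
# should recover them; real monitoring data would include noise.
true_beta = [60.0, -20.0, 1.0]
ozone = [predict(true_beta, row) for row in X]

beta = ols_fit(X, ozone)

# Estimate at an unmonitored location with no road inside the buffer.
new_site, new_density = (500, 0), 1.0
estimate = predict(beta, [1.0, buffer_sum(new_site, roads, 300), new_density])
print([round(b, 3) for b in beta], round(estimate, 3))
```

In practice the same predictor would be computed for several buffer radii, and only the variables retained by the selection procedure would enter the final regression.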

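The averaging idea behind the random forest model in Section 3.2 (many trees, each trained on a different subsample, with predictions averaged) can be illustrated with a deliberately simplified stand-in. The sketch below is an assumption-laden toy, not a full random forest: each "tree" is a single-split regression stump fitted to a bootstrap sample on one randomly chosen feature, whereas real implementations grow deep trees and draw a random feature subset at every split. The function names and the temperature, wind speed and ozone values are hypothetical.

```python
import random

def fit_stump(rows, targets, feature):
    """Best single split on one feature, minimising squared error."""
    best = None
    values = sorted(set(r[feature] for r in rows))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= thr]
        right = [t for r, t in zip(rows, targets) if r[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - ml) ** 2 for t in left)
               + sum((t - mr) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # feature is constant in this bootstrap sample
        mean = sum(targets) / len(targets)
        return (feature, 0.0, mean, mean)
    return (feature, best[1], best[2], best[3])

def fit_forest(rows, targets, n_trees=50, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(p)                  # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest

def predict(forest, x):
    """Average the predictions of all stumps."""
    preds = [ml if x[f] <= thr else mr for f, thr, ml, mr in forest]
    return sum(preds) / len(preds)

# Hypothetical daily records: (temperature in deg C, wind speed in m/s)
# with invented ozone concentrations (ppb).
rows = [(18, 3.0), (22, 2.5), (25, 2.0), (28, 1.5),
        (31, 1.0), (34, 0.8), (20, 4.0), (30, 3.5)]
targets = [35, 42, 50, 61, 72, 80, 33, 58]

forest = fit_forest(rows, targets, n_trees=50, seed=0)
estimate = predict(forest, (29, 1.2))
print(round(estimate, 1))
```

Because every stump predicts a mean of observed targets, the averaged estimate always lies within the range of the training concentrations, which is one reason tree ensembles cannot extrapolate beyond observed values.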
4.1. Model selection

The most suitable model should be chosen after considering the study purpose, study area, data availability, and computing resources.

Currently, the LUR model is more widely used than the random forest and artificial neural network models, for the following reasons. First, LUR models have undergone long-term development, and all aspects of the technology are mature, with fewer types of parameters, simpler operations and higher spatial resolutions. Furthermore, the model incorporates practical experience through the setting of parameter weights, which can avoid unreasonable results (Shi and Wang, 2016). However, it also has disadvantages, such as unstable modeling methods and poor spatial and temporal transferability (Hoek et al., 2008). Different urban structures may result in different performances for the same model. We found that the LUR model was more likely to be used in regions than in whole countries, because its pursuit of high spatial resolution makes it difficult to produce predictions for long-term time series. The LUR model also has difficulty capturing small spatial changes in concentrations when the parameters of the study area are not very variable. When the variation is too large, the modeled concentration tends to be higher than the true exposure, and the correlation is poor (Briggs et al., 2000).

The random forest and neural network models are black-box models that explore the relationship between ozone exposure and different variables and allow the rapid calculation of large-scale pollution simulations. The random forest model improves performance without drastically increasing computational complexity. In addition, it can rank the contributions of the parameters, is largely insensitive to multicollinearity, and is stable with missing and unbalanced data; it can therefore handle thousands of input variables, which do not need to be normalized (Li, 2013). Furthermore, random forest models have certain advantages that help to avoid overfitting: as more trees are added, the generalization error converges to a limit (Breiman, 2001; Shu, 2013). However, the results are not easy to interpret because random forest models are black-box models. The artificial neural network has excellent performance and wide adaptability; however, it converges slowly, easily overfits high-dimensional features, and requires a large amount of computation, which burdens the computing platform of the research team. The random forest model and the artificial neural network are more advantageous than the regression model when handling multiparameter data, and they have prediction advantages for long-term time series and large study areas.

After obtaining a good understanding of the basic research situation, including the study purpose and the advantages and disadvantages of different models, researchers can choose the appropriate model according to their own needs. On this basis, parameters can be collected and prepared.

4.2. Data preparation

4.2.1. Monitoring data

The studies included in this review differ greatly in the area of the study region, from national scale to local scale, which means that the number of monitoring sites varies widely. Instead of relying on routine monitoring networks, some investigators deployed passive samplers or selected stations according to a purpose-built design (Wolf et al., 2017; Malmqvist et al., 2014; Wang et al., 2015, 2016; Kerckhoffs et al., 2015). Purpose-designed monitoring sites normally consider population exposure. The researchers divide the sites into categories, such as regional background, urban background, traffic sites and industrial sites, or simply set monitoring sites at the location of the subjects. Wang et al. (2015) compared the difference between fixed sites and purpose-designed samplers: the R² value for fixed locations ranged from 0.65 to 0.88, while for the predictions at home sites, the R² value was between 0.60 and 0.91. Using purpose-designed monitoring or multiple sources of monitoring data can improve the ability of the model to capture spatiotemporal variations on a small scale, but it cannot meet the requirements of large-scale studies.

In addition to differences in the number and type of monitoring networks, the length and continuity of the time series of the monitoring data also differ. The metric used in the model should take future applications into account. For health effects caused by long-term exposure, complete time series are more important than high temporal resolutions for the modeled results. For example, Wang et al. (2015) simulated two-week, ground-level ozone exposure from 1999 to 2013 in America. Due to the sample period, the weekly time frame for the monitoring data limited the prediction ability for the commonly used 8-h maximum values, but the investigators believed that the two-week time scale can reduce the impact of meteorological factors and time autocorrelation. For studies focusing on acute health effects, high-temporal-resolution metrics are needed. LUR models often fail to form long-term time series in pursuit of a high spatial resolution, which is one of the issues that researchers should consider when choosing models. For studies that require both long time scales and high resolutions, machine-learning methods are better suited.

4.2.2. Parameter selection and preparation

The use of multiple parameters is a major feature of statistical models that simulate ambient ozone concentrations. Multiple parameters extend the categories of variables and allow the form of the variables to be adjusted. In this way, the explanatory ability of the model can be improved, and the influences of specific variables can be highlighted (Wu et al., 2016). The trend toward multiparameter models in ambient ozone simulation is clear. Parameters can be considered in several ways.

First, the mechanism of ground-level ozone formation should be considered. Ground-level ozone is a typical secondary atmospheric pollutant and is formed by a series of photochemical reactions of nitrogen oxides (NOx) and VOCs under ultraviolet light at a suitable temperature. A variety of typical factors directly affect the formation of ground-level ozone, including the concentration of precursor materials, meteorological factors, and traffic-related factors affecting propagation. Socioeconomic, land use, vegetation coverage and other factors also have a certain impact on the formation and diffusion of ground-level ozone (Chen et al., 2017). More specifically, the parameters can include 1) meteorological parameters such as wind speed, wind direction, solar radiation, temperature, rainfall, and relative humidity, 2) geographic variables such as land use data, green space data, traffic data, residential and population density data, regional indicators, and elevation, 3) emissions inventory parameters such as NO, NO2, organic carbon, NOx, VOCs, SO2 and particulate matter, and 4) monitoring site parameters such as NOx, sulfur oxides, VOCs and other pollutants at the ground monitoring site. Data products reflecting the formation, distribution and simulation of ground-level ozone concentrations, such as satellite data products provided by open platforms like the National Aeronautics and Space Administration (NASA) and CTM products from previous studies, were also used in models to improve performance.

Furthermore, after considering the factors that influence the formation of ground-level ozone, researchers can draw on previous research on the contribution of parameters. The contributions of variables vary by study. Zhan et al. (2018) found that meteorological variables have the highest relative importance, accounting for 65%, with the highest contributions being

attributed to evaporation, temperature and humidity. Son et al. (2018) found that detailed emission patterns from local pollution sources, coupled with wind field data (wind speed, wind direction and boundary layer height), are needed to improve current LUR models. Altitude (Beelen et al., 2009; Wang et al., 2016), road density, residential land (Beelen et al., 2009), green space and primary emission feature (traffic, population and impervious surfaces) (Wang et al., 2016) data are also contributors. Furthermore, Wang et al. (2016) and De Hoogh et al. (2018) found that the use of CTMs results in models with improved performance. Some variable contributions were opposite in different studies. For example, Wang et al. (2015) found that smaller-scale GIS covariates, such as road network and population density, cannot represent the characteristics of ambient ozone well, which may lead to poor performance of the ozone simulation model. More kinds of parameters can be involved in the model, and the appropriate parameters can be selected during the model development process.

After parameter selection, the process of preparing key parameters varied in previous articles. The specific metrics of some variables were different based on data availability and the aims of the investigators. Taking traffic data as an example, a multicenter study conducted in Europe found that there was no high-resolution European database for roads, and traffic flow data were not consistently available in these countries; thus, road length was the only traffic variable in the model (Beelen et al., 2009). Some studies calculated conclusive variables to represent local traffic conditions, like traffic density. However, the definition of traffic density is not the same in each study. Most LUR studies considered traffic density to be a long-term variable (Hoek et al., 2008). Adam-Poupart et al. (2014) defined traffic density as kilometers of road within a circular area with a 1-km radius, while Son et al. (2018) manually coded traffic density from the Google traffic website, and Kerckhoffs et al. (2015) used traffic counts multiplied by 48 to obtain traffic intensities during daytime hours and then by 1.29 to calculate traffic intensities per 24-hour period. Some studies used multiple variables instead of conclusive variables, including road length, distance to nearby major roads and, within buffers, lengths of roads and truck routes and counts of intersections, to determine the specific relationship between traffic and ambient ozone (Wang et al., 2015). The quality of the raw traffic data was variable. Wang et al. (2015) used monitoring data from air quality system (AQS) monitors, which were purposefully placed away from roads, and studies using these data may have underestimated the effects of traffic indicators. Researcher considerations also impact data preparation. For example, Wolf et al. (2017) divided land use data into residential, industrial, built-up, urban green, forest, seminatural and water body areas, while Malmqvist et al. (2014) defined land use data as high-density residential, low-density residential, industrial, port, urban green, seminatural and forested areas. Four types of land use data were defined by Huang et al. (2017), including residential land, agricultural land, green spaces and water bodies.

The selection process mainly depends on the formation mechanism of ambient ozone and previous studies or experience. Parameters commonly used in the model mainly include 1) meteorological variables such as temperature, humidity, evaporation, solar radiation, wind speed, boundary layer height, and precipitation, 2) precursor concentration, 3) geographical variables such as land use term, vegetation cover, and elevation, 4) emission inventory, 5) socioeconomic factors such as transportation, residential and population density, and 6) simulation products such as chemical transport model output (Table 1). Researchers can change the definitions of specific parameters based on their aims to obtain an optimal combination.

4.3. Simulation scale determination

Due to the characteristics of data availability and study purposes, the time-scale simulation can be divided into continuous simulations and simulations for specific periods. Furthermore, the spatial and temporal resolutions can be divided into high and low resolutions. The high-spatiotemporal-resolution results can provide fine-concentration population exposure, which can improve accuracy and reduce conflicting factors, especially for studies focusing on acute health effects. However, this also increases the difficulty of data acquisition and processing and presents higher requirements for the hardware and software facilities of the research team. The low-spatiotemporal-resolution results can save time and manpower and obtain results more quickly. For long-term exposure studies that have low requirements for time-resolution results, researchers can choose metrics at the annual or monthly level to improve effectiveness.

Based on study purpose and need, the model type can be chosen. For finer simulations, especially on a large study scale with complex trends, machine-learning methods perform better than the LUR model. Although studies that focused on ambient ozone exposure were rare, applications for other air pollutants proved superior in some contexts. Using PM2.5 as an example, Hu et al. (2017) used a random forest model in the United States in 2001 and obtained a simulation result of R² = 0.8 based on a resolution of 1 km × 1 km; Zhan et al. (2017) used a geographically weighted gradient boosting machine in China in 2014, and the result was obtained based on a spatial resolution of 0.5° × 0.5° (R² = 0.76). Di et al. (2016) used a convolutional neural network from 2000 to 2012 at a 1 km × 1 km resolution and obtained a total R² of 0.84; Li et al. (2017) used a neural network in China in 2015, with an R² up to 0.88 based on a resolution of 0.1° × 0.1°.

4.4. Model development and validation

The process of model development varies based on model type. The land use regression model builds a linear regression model using a series of related variables to simulate ozone exposure. The random forest model can be regarded as a regression model when simulating ozone concentration by building relationships between independent and dependent variables. The neural network freely learns any function form from the training data to simulate the ozone concentration. Di et al. (2017a) considered neighboring information during the modeling process through the introduction of a convolutional layer.

The processes of developing models based on regression algorithms and machine-learning algorithms are similar: model parameters and features are constantly adjusted to achieve optimal performance. Regression algorithms and machine-learning algorithms have similarities and differences in terms of model establishment and verification.

However, the development processes of the three models could be different. For the LUR model, parameter selection, which is mainly conducted through a supervised stepwise procedure, is important. Basically, a regression model is used for all potential predictors and the one with the highest explained variance is chosen. Then, researchers add further predictors step by step if the increase in adjusted R² is >1%. Some researchers use a priori effect direction for each variable and require that the effect direction of the variable included must be the same as the a priori direction and that the effect direction of variables already in the model does not change. According to these researchers, not constraining a priori directions leads to models that make unrealistic predictions. A constraint was needed because there was considerable collinearity between predictor variables (Beelen et al., 2009). Others had the opposite view: ozone, as a secondary pollutant, was involved in
many reactions, and the effect directions were unclear (Malmqvist extrapolated to other regions, should be considered. Some studies
et al., 2014). Other researchers (Son et al., 2018) used the least have found factors that influence the robustness of the models. The
absolute shrinkage and selection operator (LASSO) method, which degree of urbanization and the distribution of the monitoring sites
aims to improve the prediction accuracy and interpretability of the vary by region, so care should be taken when the data are extrap-
model (Tibshirani, 1996) and the performance of the traditional olated to other regions. For example, Wang et al. (2015) noted that
LUR method. Developing a multivariate logistic regression model models are limited when applied to areas where monitoring net-
requires knowing the optimal number of prognostic factors to works were sparse; Adam-Poupart et al. (2014) found that esti-
include. The LASSO method has a smaller mean square error mates exhibit higher discrepancies from data values that are
(MSE) than conventional methods and handles the multicollinear- mostly from regions characterized by data scarcity. Kerckhoffs
ity problem. Some researchers defined their own methods; for et al. (2015) found that, by excluding some observations, which
example, Wang et al. (2015) developed a hierarchical spatiotempo- increased the urbanization of sites, the robustness of the model
ral model to accommodate unique features and selected the best remained good. Furthermore, Beelen et al. (2009) found that
model based on the cross-validated R² value. Furthermore, restric- extrapolating far beyond the range of measured concentrations
tions were proposed to remove variables in some situations. These would lead to errors, with a possible solution being truncating
restrictions aimed to improve the stability of the models (Wolf the predicted concentrations to the highest measured concentra-
et al., 2017; Wang et al., 2015). For example, Wolf et al. set the fol- tion. To explore the relationship between ambient ozone and
lowing conditions: at least five sites must exhibit different values health more precisely, more high-resolution exposure assessments
and the minimum or maximum values must lie within threefold are needed, and care should be taken when assessment results are
of the 10th to 90th percentile range below or above the 10th and extrapolated from other places.
90th percentiles. In this way, the selection of specific predictors
that included mainly zeros or extreme outliers was prevented. 4.6. Prospects and limitations
For the machine-learning method, like the ensemble learning
algorithm, parameter selection is mainly achieved by performing With the development of epidemiological studies, further stud-
subsequent operations based on model importance ordering of ies are needed that focus on simulating ozone with high spatial and
all parameters. The important parameters are the number of vari- temporal resolutions. Complete time series and large coverage
ables included in every tree and the number of trees. Meanwhile, areas also provide convenience for the exposure assessments of
the deep learning algorithm has the ability to automatically learn populations on a fine scale. On one hand, the daily modeling data
features. The hidden layer number, hidden node number and activation complement the time series and especially provide historical expo-
function should be set. These model parameters should be adjusted sure data; on the other hand, the high-spatial-resolution modeling
during the model development process depending on model per- results provide exposure data for areas without monitoring sites.
formance, mainly reflected in R², R, MSE, and RMSE, which are calcu- Exposure data at high spatial and temporal resolutions improve
lated during the validation process. exposure accuracy and reduce possible exposure assessment
The main validation method is the cross-validation (CV) errors for both acute and chronic environmental epidemiological
method, which can be divided into two types in these articles. studies. The modeling results are derived from statistical models
The first type is leave-one-out cross-validation (Wolf et al., 2017), based on a variety of parameters affecting ozone formation and
which is the most common CV method (Geisser, 1974). In this field, elimination. The errors caused by relying solely on ozone monitor-
the basic idea is that a monitoring station is excluded from the ing instruments are reduced, as well as the errors due to the lim-
total number of sampling sites (K) to generate training datasets ited number and placement of monitoring stations. Furthermore,
with K-1 sites, resulting in fitted models that simulate pollutant the modeling results with high spatial and temporal resolutions
concentrations at the excluded sites (test set). This method is com- can better represent the spatiotemporal variability in the ambient
monly used for the LUR model and has been criticized for overes- ozone distribution and reduce the exposure error. Compared with
timating predictive ability (Wang et al., 2013). The other method developed countries, developing countries with shorter monitoring
is V-fold CV (Son et al., 2018). The most common choice is V = 10 histories and sparse networks of air pollution monitoring stations
(Wang et al., 2015), in which the monitoring data are divided into have greater demand for these studies. Furthermore, considering
10 random splits, 9 for training the model and 1 for testing the the variance caused by the form of varied variables and the difficul-
model performance, until the best model performance is achieved. ties in comparing results, standard databases that are capable of
Some researchers set the cut-off values at 75% and multiple parameter requirements and shared platforms for
25% (Beelen et al., 2009) or 80% and 20% (De Hoogh et al., 2018). researchers are inevitable.
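
The two cross-validation schemes described for model validation (leaving one monitoring site out at a time, and V-fold CV with V = 10) can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the data are synthetic, the three predictors and their coefficients are hypothetical, and an ordinary least-squares fit stands in for an LUR model. The helper names (`fit_ols`, `cv_r2`, `loo_folds`, `v_folds`) are assumptions introduced only for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (a stand-in for an LUR fit)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cv_r2(X, y, folds):
    """Cross-validated R2: every site is predicted by a model that never saw it."""
    pred = np.empty_like(y, dtype=float)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # hold out the test sites
        beta = fit_ols(X[train], y[train])
        pred[test_idx] = predict_ols(beta, X[test_idx])
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def loo_folds(n):
    """Leave-one-out: K sites give K folds, each training on K-1 sites."""
    return [np.array([i]) for i in range(n)]


def v_folds(n, v, rng):
    """V-fold: shuffle the sites and split them into v roughly equal groups."""
    return np.array_split(rng.permutation(n), v)


# Synthetic example: 40 hypothetical monitoring sites, 3 hypothetical predictors.
rng = np.random.default_rng(0)
n_sites = 40
X = rng.normal(size=(n_sites, 3))
y = 60 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=2.0, size=n_sites)

r2_loo = cv_r2(X, y, loo_folds(n_sites))
r2_10fold = cv_r2(X, y, v_folds(n_sites, 10, rng))
print(f"LOOCV R2 = {r2_loo:.2f}, 10-fold CV R2 = {r2_10fold:.2f}")
```

A single 75%/25% or 80%/20% hold-out split corresponds to calling `cv_r2` logic once with one test group instead of looping over all folds. In practice, folds are often defined by monitoring site rather than by individual observation, so that repeated measurements from one site never appear in both the training and test sets.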
In addition to model performance, the set of model parameters Limitations still exist in this field. First, current studies that
is also limited by the computing resources; for example, a higher have focused on fine ambient ozone exposure modeling are limited
tree number and number of steps for iteration would lead to better and need to be continuously developed. Most studies have focused
model performance in some contexts but also to a longer calcula- on low-pollution areas in developed countries, while few research-
tion time. The research team can set the parameters according to ers have paid attention to highly polluted areas in developing
research purpose and ability and finally establish the optimal countries, where monitoring stations are sparsely distributed and
model at the current level. the monitoring history is not long. The demand for the fine-
resolution assessment from epidemiological studies has not been
4.5. Application of the simulation results completely met. Second, the validation of the modeling results
remains a limitation. Availability and effectiveness should be
A few studies modeled ground-level ozone exposure in Europe, tested not only by the model performance but also by the results
North America and China, and some results have been used in epi- of epidemiological studies that use modeling results as exposure
demiological studies (Holle et al., 2005; Yang et al., 2018). The high data. Few studies have evaluated error type and the amount of
resolution of modeled ambient ozone exposure data provided a measurement error by comparing the difference between true expo-
chance, in some contexts, to effectively explore the relationship sure and simulated exposure using different exposure data, which
between ozone and human health. will affect health risk estimates and statistical power (Goldman
Some studies achieved high model performance, but the robust- et al., 2011, 2012), and validation of modeling results was rarely
ness of the studies, i.e., what might happen when the model is tested in environmental epidemiology. A joint indicator system
should be established to assess the applicability of exposure and provide a theoretical basis for use in environmental epidemiology.

5. Conclusion

Multiparameter statistical models have been used to simulate ambient ozone exposure, which improves the accuracy of exposure assessment in epidemiological studies. This review summarized the basic process of model development in terms of model selection, data preparation, determination of simulation scale, and model development and validation, introducing the underlying context and method for whole processes. Land use regression models are mature modeling techniques with low computational complexity, and they are easy to use to obtain satisfying model performance at small scales; however, they have limited capacities to capture temporal variations and can miss short-term and regional patterns. Comparatively, machine-learning algorithms are better at processing multiparameter data in larger research areas, which increases computational requirements. The models have their own characteristics, and the study purpose, study area, data availability, and computing resources should be considered during the model selection and development process. In the future, further research based on multiple parameters and different simulation scales is needed.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was funded by grants from the National Key Research and Development Program of China (Grant: 2017YFC0211706), the Special Foundation of Basic Science and Technology Resources Survey of the Ministry of Science and Technology of China (Grant: 2017FY101204), and the Beijing Natural Science Foundation (7172145).

References

Anenberg, S.C. et al., 2010. An estimate of the global burden of anthropogenic ozone and fine particulate matter on premature human mortality using atmospheric modeling. Environ. Health Perspect. 118 (9), 1189–1195. https://doi.org/10.1289/ehp.0901220.
Adam-Poupart, A. et al., 2014. Spatiotemporal modeling of ozone levels in Quebec (Canada): a comparison of kriging, land-use regression (LUR), and combined Bayesian maximum entropy–LUR approaches. Environ. Health Perspect. 122 (9), 970–976. https://doi.org/10.1289/ehp.1306566.
Beelen, R. et al., 2009. Mapping of background air pollution at a fine spatial scale across the European Union. Sci. Total Environ. 407 (6), 1852–1867. https://doi.org/10.1016/j.scitotenv.2008.11.048.
Bell, M.L. et al., 2004. Ozone and short-term mortality in 95 US urban communities, 1987-2000. JAMA 292 (19), 2372–2378. https://doi.org/10.1001/jama.292.19.2372.
Breiman, L., 2001. Random forests. Machine Learning 45 (1), 5–32.
Briggs, D.J., Collins, S., Elliott, P., et al., 1997. Mapping urban air pollution using GIS: a regression-based approach. Int. J. Geographical Information Systems 11 (7), 699–718.
Briggs, D.J. et al., 2000. A regression-based method for mapping traffic-related air pollution: application and testing in four contrasting urban environments. Sci. Total Environ. 253 (1–3), 151–167. https://doi.org/10.1016/s0048-9697(00)00429-0.
Chang, R.F., 2013. Prediction and Study of Atmospheric Pollutant Concentration in Ningdong based on Artificial Neural Network. Ningxia University (in Chinese).
Chen, L. et al., 2017. Ozone pollution in China and its adverse health effects. J. Environ. Occup. Med. 34 (11), 1025–1030 (in Chinese).
Cole-Hunter, T. et al., 2018. Estimated effects of air pollution and space-time-activity on cardiopulmonary outcomes in healthy adults: a repeated measures study. Environ. Int. 111, 247–259. https://doi.org/10.1016/j.envint.2017.11.024.
De Hoogh, K. et al., 2018. Spatial PM2.5, NO2, O3 and BC models for Western Europe – evaluation of spatiotemporal stability. Environ. Int. 120, 81–92. https://doi.org/10.1016/j.envint.2018.07.036.
Di, Q. et al., 2016. Assessing PM2.5 exposures with high spatiotemporal resolution across the continental United States. Environ. Sci. Technol. 50 (9), 4712–4721. https://doi.org/10.1021/acs.est.5b06121.
Di, Q. et al., 2017a. A hybrid model for spatially and temporally resolved ozone exposures in the continental United States. J. Air Waste Manage. Assoc. 67 (1), 39–52. https://doi.org/10.1080/10962247.2016.1200159.
Di, Q. et al., 2017b. Air pollution and mortality in the medicare population. N. Engl. J. Med. 376 (26), 2513–2522. https://doi.org/10.1056/NEJMc1709849.
Geisser, S., 1974. A predictive approach to the random effect model. Biometrika 61 (1), 101–107. https://doi.org/10.1093/biomet/61.1.101.
Goldman, G.T. et al., 2011. Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ. Health 10 (1), 61. https://doi.org/10.1186/1476-069X-10-61.
Goldman, G.T. et al., 2012. Characterization of ambient air pollution measurement error in a time-series health study using a geostatistical simulation approach. Atmos. Environ. 57, 101–108. https://doi.org/10.1016/j.atmosenv.2012.04.045.
Holle, R. et al., 2005. KORA–a research platform for population based health research. Gesundheitswesen 67 (Suppl. 1), S19–S25. https://doi.org/10.1055/s-2005-858235.
Hoek, G., Beelen, R., Hoogh, K.D., et al., 2008. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos. Environ. 42 (33), 7561–7578.
Hu, X. et al., 2017. Estimating PM2.5 concentrations in the conterminous United States using the random forest approach. Environ. Sci. Technol. 51 (12), 6936–6944. https://doi.org/10.1021/acs.est.7b01210.
Huang, L. et al., 2017. Development of land use regression models for PM2.5, SO2, NO2 and O3 in Nanjing, China. Environ. Res. 158, 542–552. https://doi.org/10.1016/j.envres.2017.07.010.
Kerckhoffs, J. et al., 2015. A national fine spatial scale land-use regression model for ozone. Environ. Res. 140, 440–448. https://doi.org/10.1016/j.envres.2015.04.014.
Lee, H. et al., 2018. Short- and long-term exposure to ambient air pollution and circulating biomarkers of inflammation in non-smokers: a hospital-based cohort study in South Korea. Environ. Int. 119, 264–273. https://doi.org/10.1016/j.envint.2018.06.041.
Lei, H.J., 2018. The review of artificial neural network. China Science and Technology Overview 16, 44–47 (in Chinese).
Li, X.H., 2013. Using random forest for classification and regression. Chinese J. Appl. Entomol. 50 (4), 1190–1197 (in Chinese).
Li, T. et al., 2017. Estimating ground-level PM2.5 by fusing satellite and station observations: a geo-intelligent deep learning approach. Geophys. Res. Lett. https://doi.org/10.1002/2017gl075710.
Malmqvist, E. et al., 2014. Assessing ozone exposure for epidemiological studies in Malmö and Umeå, Sweden. Atmos. Environ. 94, 241–248. https://doi.org/10.1016/j.atmosenv.2014.05.038.
Peng, R.D. et al., 2013. Acute effects of ambient ozone on mortality in Europe and North America: results from the APHENA study. Air Qual. Atmos. Health 6 (2), 445–453. https://doi.org/10.1007/s11869-012-0180-9.
Shi, X., Wang, F.H., 2016. Application of Geospatial information Technologies in Public Health. Higher Education Press, Beijing, China (in Chinese).
Shu, X., 2013. Research of Object Tracking Based on Random Forest. Hefei University of Technology (in Chinese).
Son, Y. et al., 2018. Land use regression models to assess air pollution exposure in Mexico City using finer spatial and temporal input parameters. Sci. Total Environ. 639, 40–48. https://doi.org/10.1016/j.scitotenv.2018.05.144.
Tibshirani, R., 1996. Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58 (1), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Wang, M. et al., 2013. Evaluation of land use regression models for NO2 and particulate matter in 20 European study areas: the ESCAPE project. Environ. Sci. Technol. 47 (9), 4357–4364. https://doi.org/10.1021/es305129t.
Wang, M. et al., 2015. Development of long-term spatiotemporal models for ambient ozone in six metropolitan regions of the United States: The MESA Air Study. Atmos. Environ. (1994) 123 (A), 79–87. https://doi.org/10.1016/j.atmosenv.2015.10.042.
Wang, M. et al., 2016. Combining land-use regression and chemical transport modeling in a spatiotemporal geostatistical model for ozone and PM2.5. Environ. Sci. Technol. 50 (10), 5111–5118. https://doi.org/10.1021/acs.est.5b06001.
Wolf, K. et al., 2017. Land use regression modeling of ultrafine particles, ozone, nitrogen oxides and markers of particulate matter pollution in Augsburg, Germany. Sci. Total Environ. 579, 1531–1540. https://doi.org/10.1016/j.scitotenv.2016.11.160.
Wong, C.M. et al., 2008. Public Health and Air Pollution in Asia (PAPA): a multicity study of short-term effects of air pollution on mortality. Environ. Health Perspect. 116 (9), 1195–1202. https://doi.org/10.1289/ehp.11257.
Wu, J.S. et al., 2016. Application of land-use regression model in spatial-temporal differentiation of air pollution. Environ. Sci. 37 (2), 413–419 (in Chinese).
Yang, B.Y. et al., 2017. Is prehypertension more strongly associated with long-term ambient air pollution exposure than hypertension? Findings from the 33 Communities Chinese Health Study. Environ. Pollut. 229, 696–704. https://doi.org/10.1016/j.envpol.2017.07.016.
Yang, B.Y. et al., 2018. Ambient air pollution in relation to diabetes and glucose-homoeostasis markers in China: a cross-sectional study with findings from the 33 Communities Chinese Health Study. The Lancet Planetary Health 2 (2), e64–e73. https://doi.org/10.1016/S2542-5196(18)30001-9.
Yang, S.Q., 2014. Application of Random Forest Model In Fine Particle Concentration Prediction In Taiyuan. Taiyuan University of Technology, Taiyuan, pp. 17–19 (in Chinese).
Yin, P. et al., 2017. Ambient ozone pollution and daily mortality: a nationwide study in 272 Chinese cities. Environ. Health Perspect. 125 (11), 117006. https://doi.org/10.1289/EHP1849.
Zhan, Y. et al., 2017. Spatiotemporal prediction of continuous daily PM2.5 concentrations across China using a spatially explicit machine learning algorithm. Atmos. Environ. 155, 129–139. https://doi.org/10.1016/j.atmosenv.2017.02.023.
Zhan, Y. et al., 2018. Spatiotemporal prediction of daily ambient ozone levels across China using random forest for human exposure assessment. Environ. Pollut. 233, 464–473. https://doi.org/10.1016/j.envpol.2017.10.029.
