
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 19, 2022 1004005

Satellite Remote Sensing of Daily Surface Ozone in a Mountainous Area
Songyan Zhu, Hao Zhu, Jian Xu, Member, IEEE, Qiaolin Zeng, Dejun Zhang, and Xiaoran Liu

Abstract— High levels of surface ozone (O3) pollution threaten human and environmental health. Chongqing, a mountainous municipality located in southwest China, is exposed to serious O3 pollution and requires more study. Because of its complex terrain and frequently foggy weather, it is difficult to maintain many in situ sites in Chongqing, and chemical transport model (CTM) simulations are also challenged. The Sentinel-5p satellite, launched in 2017, provides O3 columns with high spatiotemporal resolution. Without depending on CTMs, we linked O3 columns and surface monitoring data from 2019 to 2021 by means of a deep forest machine learning model. Compared with another widely used machine learning model and with previous studies, our results showed clear advantages in estimating surface O3 on a daily scale. Validated against in situ sites in Chongqing, the averaged R2 of the cross validations reached 0.9, while the root-mean-squared error (RMSE) and mean bias error (MBE) were 13.57 and 0.37 µg/m3, respectively. We found that the model performance is associated with the relative height difference between the training sites and the test site. The model performed stably when the height difference was lower than 200 m, but its performance degraded noticeably when the height difference exceeded 400 m.

Index Terms— Deep forest, machine learning, mountainous areas, O3 pollution, Sentinel-5p, TROPOspheric Monitoring Instrument (TROPOMI).

Manuscript received July 10, 2021; revised September 6, 2021; accepted September 27, 2021. Date of publication October 13, 2021; date of current version January 6, 2022. This work was supported in part by the Chongqing Meteorological Department Business Technology Research Project under Grant YWJSGG-202105. (Corresponding author: Hao Zhu.)

Songyan Zhu is with the Department of Geography, University of Exeter, Exeter EX4 4RJ, U.K. (e-mail: sz394@exeter.ac.uk).

Hao Zhu, Dejun Zhang, and Xiaoran Liu are with the Chongqing Institute of Meteorological Sciences, Chongqing 401147, China, and also with the Chongqing Engineering Research Center of Agrometeorology and Satellite Remote Sensing, Chongqing 401157, China (e-mail: zhuh1993@yeah.net).

Jian Xu is with the National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China (e-mail: xujian@nssc.ac.cn).

Qiaolin Zeng is with the College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China, and also with the Chongqing Institute of Meteorological Sciences, Chongqing 401147, China (e-mail: zengql@cqupt.edu.cn).

This article has supplementary material provided by the authors and color versions of one or more figures available at https://doi.org/10.1109/LGRS.2021.3119699.

Digital Object Identifier 10.1109/LGRS.2021.3119699

1558-0571 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

SURFACE ozone (O3), one predominant air pollutant in China [1], is formed through photochemical reactions between anthropogenic/biogenic volatile organic compounds (VOCs) and human-released nitrogen oxides (NOx = NO + NO2). The formation pathways of O3 depend on the ratio of the two precursors (VOCs/NOx) [2]. The O3 concentration varies by region due to many factors (e.g., meteorology and industrial structures). Previous studies mainly focused on flat eastern China (e.g., the North China Plain and the Yangtze River Delta) [3], [4], but the impact of terrain cannot be neglected in mountainous regions.

Mountainous southwest China has been experiencing high levels of O3 pollution [3]. In situ monitoring [5], satellite remote sensing [6], and chemical transport models (CTMs) [7] are the main approaches for O3 monitoring/estimation. The complex terrain causes extra challenges in maintaining in situ sites (i.e., high financial and labor costs) and in modeling pollutant concentration/dispersion (e.g., difficulties in parameterizing height layers) [8]. In addition, studies that rely only on in situ monitoring might face undersampling issues in capturing regional O3 variations, and the combined use of in situ and gridded data could be the solution [9], [10].

Estimating surface O3 directly from satellite column products is an alternative to improving CTM simulations, because the latter approach requires more advanced hardware (e.g., computer clusters), which might not be available to all researchers across the globe [11]. Deep learning has demonstrated robustness in estimating air pollution owing to its complex architecture [12]. In the literature, mountainous areas have received less attention than hot-spot areas (e.g., the North China Plain) [1], [3]. For the first time, we used a deep forest (DF21) [13] to estimate surface O3 from the TROPOspheric Monitoring Instrument (TROPOMI) level-3 O3 columns [14] in Chongqing from 2019 to early 2021. The estimated O3 was validated against both the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn/en/) in situ network [9] and the widely used CTM Community Multiscale Air Quality (CMAQ) model [15]. Our aim is to provide a low-cost (i.e., the data and methods used are freely accessible, and the computation does not require advanced hardware) but high-accuracy approach to monitor regional surface O3 from space.

II. METHODOLOGY

A. Study Area and Data

Chongqing is a large, mountainous municipality (82 300 km2) in southwest China where about one-third of the days each year are foggy [16], which makes estimating surface O3 from space more challenging. There are 17 CNEMC sites that provide hourly measured O3 mass density in µg/m3 between 2019 and 2021 (Fig. 1), and their elevations differ considerably (Table I).
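The letter reports hourly CNEMC measurements but does not spell out how they are reduced to the daily, site-averaged series compared later (e.g., in Fig. 3). A minimal sketch, assuming a plain calendar-day mean per site, timestamps already in local time, and no data-completeness threshold (all assumptions, as is the hypothetical record layout), could look like this:

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

# Hypothetical hourly record: (site_id, local ISO timestamp, O3 mass density in µg/m3).
HourlyRecord = Tuple[str, str, float]
SiteDay = Tuple[str, str]  # (site_id, "YYYY-MM-DD")


def daily_means(records: Iterable[HourlyRecord]) -> Dict[SiteDay, float]:
    """Average all hourly readings of a site that fall on the same calendar day."""
    buckets: Dict[SiteDay, List[float]] = defaultdict(list)
    for site, stamp, o3 in records:
        day = datetime.fromisoformat(stamp).date().isoformat()
        buckets[(site, day)].append(o3)
    return {key: fmean(values) for key, values in buckets.items()}


def site_averaged(daily: Dict[SiteDay, float]) -> Dict[str, float]:
    """Average the per-site daily means across sites, giving one value per day."""
    per_day: Dict[str, List[float]] = defaultdict(list)
    for (_site, day), value in daily.items():
        per_day[day].append(value)
    return {day: fmean(values) for day, values in sorted(per_day.items())}


if __name__ == "__main__":
    hourly = [
        ("site_a", "2019-07-01T00:00", 100.0),
        ("site_a", "2019-07-01T13:00", 140.0),
        ("site_b", "2019-07-01T12:00", 60.0),
        ("site_b", "2019-07-02T12:00", 50.0),
    ]
    print(site_averaged(daily_means(hourly)))  # {'2019-07-01': 90.0, '2019-07-02': 50.0}
```

A real pipeline would probably also discard days with too few valid hours; that rule is omitted here because the letter does not state one.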

Authorized licensed use limited to: Tsinghua University. Downloaded on May 22,2023 at 02:10:51 UTC from IEEE Xplore. Restrictions apply.

The height difference between sites can exceed 600 m, and all sites showed high levels of O3 pollution (Table I). Spatially, the O3 concentration in the northwest area was higher than in the southeast (Fig. 1). In contrast, the majority of sites are located in the southeast area, and only two sites (1414A and 1416A) are in the northwest (Fig. 1).

All datasets cover January 1, 2019 to January 31, 2021, because the TROPOMI Level-3 O3 product has only been available since late 2018. The TROPOMI Level-3 O3 product provides global gridded (0.01° × 0.01°) O3 columns on a daily scale [14]. The hourly reanalyzed meteorology data were derived from the fifth-generation atmospheric reanalysis (ERA5) of the European Centre for Medium-Range Weather Forecasts (ECMWF) [17]. ERA5 provides a large number of reanalyzed meteorological parameters. Of these, solar radiation, fraction of cloud cover, ozone mass mixing ratio, relative humidity, rainwater content, temperature, and the U-/V-components of wind were used at five pressure levels (200, 500, 700, 900, and 1000 hPa) to implicitly represent (by nonlinear fitting) the impacts of meteorology on O3 in the deep forest model.

Fig. 1. Distribution of CNEMC sites and mean O3 (2019–2021) columns in Chongqing. The base map shows the terrain of Chongqing.

TABLE I
Latitude/Longitude (°), Height (m), and Site-Level O3 Measurement of 17 Used CNEMC Sites in Chongqing. Mean O3 (O3(MN), µg/m3) Is Site-Measured Mass Density of 2019–2021 and Mean Summer O3 (O3(MNS), µg/m3) Is Site-Measured Mass Density in Summer of 2019–2021

B. Surface O3 Estimation Methods

1) Deep Forest: The deep forest (DF21) is an ensemble of ensembles of decision trees [13], i.e., a cascade of tree-based models (e.g., XGBoost [18]). Instead of deep neural networks (DNNs) [7], DF21 was adopted here for the following reasons [13].
1) Training DNNs requires a huge amount of data, whereas DF21 performs well on small training datasets.
2) The architecture of DNNs could be too complicated for many cases, whereas the complexity of DF21 is data-dependent, i.e., the model complexity is determined automatically by terminating training via predefined thresholds [13].
3) The elements of DF21, i.e., tree-based models, outperform DNNs on many tasks.
These features of DF21 are favorable because a model for estimating surface O3 should be light (without many hyperparameters or heavy dependence on CTMs) and off-the-shelf for other researchers who might be interested in other scales/regions with limited training data.

2) Model Implementation: The O3 column is the integral of O3 density over height layers, and the ratio of surface O3 to the O3 column depends on meteorological profiles that affect the dispersion, depletion, and transformation of O3. This series of physical and chemical O3 processes can be considered as stacked nonlinear functions

$$O_{3(\mathrm{col.})} = f_n\left(f_{n-1}\left(\cdots f_2\left(f_1\left(O_{3(\mathrm{surf.})},\, O_{3(\mathrm{prof.})},\, \mathrm{SR},\, T,\, \mathrm{RH},\, W_{(U,V)},\, \mathrm{Vap},\, \mathrm{cloud}\right)\right)\cdots\right)\right)$$

where O3(col.) is the TROPOMI O3 column, O3(surf.) is the CNEMC surface O3 mass density, O3(prof.) is the ERA5 O3 mixing ratio profile, SR is the solar radiation, T is the temperature profile, RH is the relative humidity profile, W(U,V) is the U-/V-wind component profile, Vap is the rainwater content profile, and cloud is the cloud cover profile. TROPOMI O3 columns and ERA5 meteorology are the inputs (i.e., explanatory variables) of the model, and the CNEMC O3 mass density is the dependent variable, serving as both the training target and the validation reference. We adopted a leave-one-out cross-validation (LOOCV) strategy over the sites; therefore, we trained and validated DF21 17 times. Each time, all data from 16 sites were used to train the model. The O3 columns and ERA5 meteorology of the remaining site were then input to the trained model to obtain predictions, and the model skill was assessed by comparing these predictions with the CNEMC O3 mass density of that held-out site. Four statistical measures were used for this assessment.
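The site-wise LOOCV procedure described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the sample layout and the `mean_trainer` placeholder (which only predicts the training-set mean) are hypothetical stand-ins, and in practice the `train` callable would fit DF21 on the data of the 16 training sites.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample layout: (site_id, explanatory variables, CNEMC O3 in µg/m3).
Sample = Tuple[str, Sequence[float], float]
Predictor = Callable[[Sequence[float]], float]
Trainer = Callable[[List[Sample]], Predictor]


def leave_one_site_out(
    samples: List[Sample], train: Trainer
) -> Dict[str, List[Tuple[float, float]]]:
    """Hold out each site in turn, train on the others, and predict the held-out site.

    Returns {held_out_site: [(predicted, observed), ...]}; 17 sites give 17 rounds.
    """
    sites = sorted({site for site, _, _ in samples})
    folds: Dict[str, List[Tuple[float, float]]] = {}
    for held_out in sites:
        train_set = [s for s in samples if s[0] != held_out]
        test_set = [s for s in samples if s[0] == held_out]
        model = train(train_set)  # the trainer never sees the held-out site
        folds[held_out] = [(model(x), y) for _, x, y in test_set]
    return folds


def mean_trainer(train_set: List[Sample]) -> Predictor:
    """Hypothetical placeholder for DF21: always predicts the training-set mean O3."""
    mu = fmean(y for _, _, y in train_set)
    return lambda _features: mu


if __name__ == "__main__":
    toy = [
        ("site_a", [0.1], 10.0),
        ("site_a", [0.2], 20.0),
        ("site_b", [0.3], 30.0),
        ("site_c", [0.4], 40.0),
    ]
    for site, pairs in leave_one_site_out(toy, mean_trainer).items():
        print(site, pairs)
```

Splitting by site rather than by individual sample keeps every observation of the held-out site away from the trainer, so each round tests how well the model transfers to an unseen location.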

ZHU et al.: SATELLITE REMOTE SENSING OF DAILY SURFACE OZONE IN MOUNTAINOUS AREA 1004005

TABLE II
Statistical Measures Between DF21 Estimated Surface O3 and CNEMC Measurements, R2 Between CNEMC Measurements and TROPOMI Columns (R2C_T), and R2 Between CNEMC Measurements and CMAQ Estimation (R2C_C). The Unit of RMSE and MBE Is µg/m3

Fig. 2. Site-averaged DF21 estimated surface O3 (blue line) and CNEMC measurements (orange line). The temporal resolution is three days for better visual demonstration. Both blue and orange shaded areas represent the intersite standard deviation.

The four statistical measures were the determination coefficient (R2), the linear regression slope, the root-mean-squared error (RMSE), and the mean bias error (MBE).

III. RESULTS

Over Chongqing, DF21 reproduced the surface O3 time series in excellent agreement with the CNEMC measurements (Fig. 2). Seasonal patterns (e.g., summertime peaks and wintertime troughs) found in the CNEMC measurements were captured by DF21 as well. Surface O3 reached around 100 µg/m3 in summer and was around 20 µg/m3 in winter. The O3 concentrations (i.e., mass density) in 2019 seemed higher than those in 2020, but no obvious interannual difference can be found in the figure. As the narrower shaded band of the DF21 time series shows, the intersite variation of the DF21 results was smaller than that of the CNEMC measurements.

The daily DF21 O3 time series agreed well with the CNEMC measurements in all 12 months (Fig. 3). The daily time series grouped by month differed considerably between years, but no overall interannual pattern was seen. In Chongqing, summer O3 was higher than spring O3, which in turn was higher than autumn O3. In winter, when the lowest O3 of the year was seen, the mean O3 concentration was approximately one-third of that in summer.

Fig. 3. Daily site-averaged DF21 estimated, CNEMC, and CMAQ surface O3 grouped by months.

Table II gives the statistical measures of the DF21 validation, the R2 between the CNEMC-measured surface O3 and the TROPOMI columns (R2C_T), and the R2 between the CNEMC-measured surface O3 and the CMAQ estimation (R2C_C). The first quartile (Q1), median, and third quartile (Q3) of R2 between CNEMC-measured and DF21-estimated surface O3 were 0.89, 0.92, and 0.94, respectively. The medians of the slope (of linear regression), RMSE, and MBE were 0.97, 10.84 µg/m3, and 4.55 µg/m3, respectively. In contrast, R2C_T and R2C_C were low, with medians of 0.18 and 0.40, respectively. In terms of R2, the DF21 median was 4.1 times higher than the median R2C_T and 1.30 times higher than the median R2C_C (i.e., relative increases of (0.92 − 0.18)/0.18 ≈ 4.1 and (0.92 − 0.40)/0.40 = 1.30). Large spatial differences in the statistical measures were seen, especially for RMSE and MBE.

Fig. 4 shows the spatial distribution of R2 [Fig. 4(a)] and RMSE [Fig. 4(b)]; only these two measures are mapped because the slope was positively correlated with R2 (Pearson correlation coefficient (PCC) = 0.48, p-value = 0.05) and the absolute MBE was significantly correlated with RMSE (PCC = 0.98, p-value < 0.01). Both panels show that DF21 performed better in southeast Chongqing than in northwest Chongqing. Site 1414A, whose elevation is about twice the average elevation of the other sites, had the worst performance. This site had both the highest O3 concentration and the highest elevation (600 m higher than the nearest site, 1416A, and 344 m higher than the second-highest site, 3015A) (Fig. 1 and Table I). Site 3015A also showed a large RMSE [Fig. 4(b) and Table II]. In contrast, DF21 performed well in the southeast area.
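The four measures listed at the start of this section, together with the quartiles and PCC values quoted for Table II and Fig. 4, can be computed with the Python standard library alone. This sketch rests on assumptions the letter does not state: R2 is taken as the squared PCC of a least-squares fit, the slope regresses estimates on observations, MBE is estimate minus observation, and quartiles use the default ("exclusive") method of `statistics.quantiles`. The `times_higher` helper is a hypothetical name for the relative-increase arithmetic behind the "4.1 times" and "1.30 times" comparisons.

```python
import math
from statistics import fmean, quantiles
from typing import Sequence, Tuple


def _paired(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with at least two values")


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither series is constant)."""
    _paired(x, y)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2(est: Sequence[float], obs: Sequence[float]) -> float:
    """Determination coefficient of a linear fit with intercept, i.e. PCC squared."""
    return pcc(est, obs) ** 2


def slope(est: Sequence[float], obs: Sequence[float]) -> float:
    """Least-squares slope of estimates regressed on observations."""
    _paired(est, obs)
    mo, me = fmean(obs), fmean(est)
    num = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    return num / sum((o - mo) ** 2 for o in obs)


def rmse(est: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-squared error."""
    _paired(est, obs)
    return math.sqrt(fmean((e - o) ** 2 for e, o in zip(est, obs)))


def mbe(est: Sequence[float], obs: Sequence[float]) -> float:
    """Mean bias error; positive values mean overestimation."""
    _paired(est, obs)
    return fmean(e - o for e, o in zip(est, obs))


def q1_median_q3(per_site: Sequence[float]) -> Tuple[float, float, float]:
    """Quartiles of a per-site statistic, e.g. the 17 LOOCV R2 values."""
    q1, q2, q3 = quantiles(per_site, n=4)
    return q1, q2, q3


def times_higher(a: float, b: float) -> float:
    """Relative increase of a over b ("a is k times higher than b")."""
    return (a - b) / b
```

With the reported medians, `times_higher(0.92, 0.18)` evaluates to about 4.11 and `times_higher(0.92, 0.40)` to about 1.30, which is how the DF21 median R2 compares with the median R2C_T and R2C_C.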


In the southeast area, the sites were more densely distributed, the differences in site height were smaller, and the overall O3 concentration was lower.

Fig. 4. Spatial distribution of (a) R2 and (b) RMSE for DF21 in estimating surface O3. The base map shows the terrain of Chongqing.

IV. DISCUSSION

DF21 was validated for estimating daily surface O3 without dependence on CTMs. The good performance (mean R2 > 0.9) can be attributed to the complexity of the DF21 architecture [13] and the use of hourly CNEMC O3 and meteorology. According to the importance of the explanatory variables [19], relative humidity, temperature, and cloud cover were the most important features, contributing 25%, 15%, and 10% of the importance, respectively (Table S1). This suggests that in Chongqing, foggy weather and clouds could heavily affect both the O3 photochemical process and the model accuracy. XGBoost validated in the same scenario (mean R2 = 0.8) performed worse than DF21 (Table S2). The choice of DF21 is further supported by the poor performance (R2 < 0.6) of 2-/8-/100-layer multilayer perceptron (MLP) neural networks (Tables S3–S5) and by its balance between accuracy and minimum run-time/hardware requirements (Table S6). Using hourly data increased the size of the dataset 24-fold, and both DF21 and XGBoost performed better than when trained with daily data (Tables S7 and S8). The use of hourly data also suggests that DF21 can theoretically estimate hourly surface O3, but its performance would be expected to rely heavily on the meteorology data.

On an hourly scale, R2 between CMAQ-modeled O3 and CNEMC measurements was lower than 0.10 at all sites. On a daily scale, by contrast, the mean R2 increased to 0.44 (Table II), which shows how difficult it is to estimate surface O3 in Chongqing, especially on an hourly scale. The performance of DF21 and XGBoost for estimating hourly surface O3 dropped by 8% (Table S9) and 20% (Table S10), respectively. Therefore, unlike previous work aiming at improving the hourly performance of CTM(s) [11], we would not recommend estimating hourly surface O3 from satellite columns. In short, for researchers without easy, timely access to CTMs, estimating surface O3 from satellite columns via DF21 can be a good alternative for tracking O3 pollution on a daily scale.

The location of ground sites can have a significant impact on the DF21 model performance (Fig. 4), and high-altitude and/or highly polluted areas require a dense distribution of ground sites to achieve reliable performance. However, most sites in Chongqing were operated in areas with relatively low heights and O3 concentrations. This analysis can also help select suitable locations for future ground sites. For instance, in Chongqing, more studies are needed to determine the exact locations and minimum number of ground sites that guarantee the model performance.

According to the China Traffic Management Bureau (https://122.gov.cn/), the number of vehicles in Chongqing reached five million in April 2021, ranking among the top three Chinese cities. Vehicle combustion is a major source of NOx (a precursor of O3) [20], but the measured O3 in the first four months of 2020 was 0.5 µg/m3 per day higher than in the same period of 2019, despite the lockdown during the coronavirus disease 2019 (COVID-19) pandemic. This differs from other areas, where significant variations were observed [21], and implies that the O3 pathway in Chongqing might be more correlated with VOCs, as suggested in [22]. Future work is required to address this issue, which is beyond the scope of the current work. DF21 achieves a performance in estimating surface O3 in Chongqing similar to that of previous work in other regions [21]. Considering this promising performance despite the difficulties of estimating surface O3 in Chongqing, DF21 could potentially make a valuable contribution to surface O3 research in many other areas.

V. CONCLUSION

In this study, we trained a deep forest (DF21)-based model to estimate surface O3 concentration in a mountainous area (Chongqing, China) from satellite observations on a daily scale. This task is challenging considering the complex terrain and cloud coverage in Chongqing, but the performance of DF21 was good (R2 = 0.9) compared with previous studies, and the stacked “forests” outperformed a single “forest” (i.e., XGBoost). Although DF21 can theoretically estimate surface O3 on an hourly scale, the hourly results were not convincing. The DF21 model successfully reproduced surface O3 time series at test sites, but the performance varied spatially. The distribution density of the ground validation site network had an important impact on the performance of DF21. Owing to an insufficient number of in situ sites in the current training, the model performance still needs improvement in regions with high altitude and high O3 pollution. In the future, we will investigate the exact number and locations of ground sites needed to ensure good DF21 performance, and the application of DF21 will be extended to other areas.

REFERENCES

[1] H. Liu et al., “Ground-level ozone pollution and its health impacts in China,” Atmos. Environ., vol. 173, pp. 223–230, Jan. 2018.
[2] X. Jin and T. Holloway, “Spatial and temporal variability of ozone sensitivity over China observed from the ozone monitoring instrument,” J. Geophys. Res., Atmos., vol. 120, no. 14, pp. 7229–7246, Jul. 2015.
[3] T. Wang, L. Xue, P. Brimblecombe, Y. F. Lam, L. Li, and L. Zhang, “Ozone pollution in China: A review of concentrations, meteorological influences, chemical precursors, and effects,” Sci. Total Environ., vol. 575, pp. 1582–1596, Jan. 2017.


[4] X. Lu et al., “Severe surface ozone pollution in China: A global perspective,” Environ. Sci. Technol. Lett., vol. 5, no. 8, pp. 487–494, 2018.
[5] L. Kong, X. Tang, J. Zhu, Z. Wang, H. Wu, and J. Li, “Developing high-resolution air quality reanalysis dataset over China for years 2013–2018 based on ensemble Kalman filter and surface observations from CNEMC,” in Proc. EGU Gen. Assem. Conf. Abstr., 2020, p. 6848.
[6] A. H. Souri et al., “Revisiting the effectiveness of HCHO/NO2 ratios for inferring ozone sensitivity to its precursors using high resolution airborne remote sensing observations in a high ozone episode during the KORUS-AQ campaign,” Atmos. Environ., vol. 224, Mar. 2020, Art. no. 117341.
[7] H. Lu et al., “Adjusting prediction of ozone concentration based on CMAQ model and machine learning methods in Sichuan-Chongqing region, China,” Atmos. Pollut. Res., vol. 12, no. 6, Jun. 2021, Art. no. 101066.
[8] U. Radok, “Air pollution in the mountains,” Mountain Res. Develop., vol. 2, no. 4, pp. 385–389, 1982.
[9] G. Yang, Y. Liu, and X. Li, “Spatiotemporal distribution of ground-level ozone in China at a city level,” Sci. Rep., vol. 10, no. 1, pp. 1–12, Dec. 2020.
[10] L. Gao et al., “Comparison of ozone and PM2.5 concentrations over urban, suburban, and background sites in China,” Adv. Atmos. Sci., vol. 37, no. 12, pp. 1297–1309, Dec. 2020.
[11] S. Zhu et al., “An optimization approach for hourly ozone simulation: A case study in Chongqing, China,” IEEE Geosci. Remote Sens. Lett., early access, Jul. 31, 2020, doi: 10.1109/LGRS.2020.3010416.
[12] T. Li, H. Shen, Q. Yuan, X. Zhang, and L. Zhang, “Estimating ground-level PM2.5 by fusing satellite and station observations: A geo-intelligent deep learning approach,” Geophys. Res. Lett., vol. 44, no. 23, pp. 11,985–11,993, 2017.
[13] Z.-H. Zhou and J. Feng, “Deep forest,” 2017, arXiv:1702.08835. [Online]. Available: http://arxiv.org/abs/1702.08835
[14] K.-P. Heue, K. U. Eichmann, and P. Valks, “TROPOMI/S5P ATBD of tropospheric ozone data products,” Deutsches Zentrum für Luft- und Raumfahrt (DLR), Helmholtz-Gemeinschaft, Tech. Rep. S5P-L2-IUP-ATBD-400C, 2018.
[15] M. Astitha, H. Luo, S. T. Rao, C. Hogrefe, R. Mathur, and N. Kumar, “Dynamic evaluation of two decades of WRF-CMAQ ozone simulations over the contiguous United States,” Atmos. Environ., vol. 164, pp. 102–116, Sep. 2017.
[16] D. Liu, Z. Li, W. Yan, and Y. Li, “Advances in fog microphysics research in China,” Asia–Pacific J. Atmos. Sci., vol. 53, no. 1, pp. 131–148, Feb. 2017.
[17] H. Hersbach et al., “The ERA5 global reanalysis,” Quart. J. Roy. Meteorol. Soc., vol. 146, no. 730, pp. 1999–2049, 2020.
[18] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 785–794.
[19] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[20] Rethinking the Ozone Problem in Urban and Regional Air Pollution, Nat. Academies Press, Washington, DC, USA, 1992.
[21] Y. Wang, Q. Yuan, T. Li, L. Zhu, and L. Zhang, “Estimating daily full-coverage near surface O3, CO, and NO2 concentrations at a high spatial resolution over China based on S5P-TROPOMI and GEOS-FP,” ISPRS J. Photogramm. Remote Sens., vol. 175, pp. 311–325, May 2021.
[22] B. Liang, X. Yu, H. Mi, D. Liu, Q. Huang, and M. Tian, “Health risk assessment and source apportionment of VOCs inside new vehicle cabins: A case study from Chongqing, China,” Atmos. Pollut. Res., vol. 10, no. 5, pp. 1677–1684, Sep. 2019.
