
Environmental Pollution 273 (2021) 116473
journal homepage: www.elsevier.com/locate/envpol
Forecasting PM2.5 using hybrid graph convolution-based model considering dynamic wind-field to offer the benefit of spatial interpretability*
Hongye Zhou a, Feng Zhang a, b, *, Zhenhong Du a, b, Renyi Liu a, b, c
a School of Earth Sciences, Zhejiang University, Hangzhou, 310027, China
b Zhejiang Provincial Key Laboratory of Geographic Information Science, Hangzhou, 310028, China
c Ocean Academy, Zhejiang University, Zhoushan, 316021, China
Article info

Article history:
Received 19 November 2020
Accepted 6 January 2021
Available online 19 January 2021

Keywords:
PM2.5 concentration forecast
Domain knowledge
Dynamic wind-field
Graph convolution network
Temporal convolution network

Abstract

Air pollution is a complex process and is affected by meteorological conditions and other chemical components. Numerous studies have demonstrated that data-driven spatio-temporal prediction models of PM2.5 concentration are comparable with model-driven models. However, data-driven models usually depend on the statistical correlation between PM2.5 and other factors and have difficulty dealing with causality in complex systems. In this paper, we argue that domain knowledge should be incorporated into data-driven models to enhance prediction accuracy and make the model more physically realistic. We focus on the influence of the dynamic wind-field on PM2.5 concentration distribution and fuse the pollution diffusion distance with the deep learning model based on a wind-field surface. In order to model spatial dependence between monitoring stations, which is dynamic and anisotropic because of the wind-field, we proposed a hybrid deep learning framework, dynamic directed spatio-temporal graph convolution networks (DD-STGCN). It expanded the ability to deal with space-time prediction in the continuous and dynamic wind-field. We used a directed graph time-series to describe the vertex state and topological relationship between vertices and replaced traditional Euclidean distance with wind-field diffusion distance to describe the proximity relationship between vertices. Our experiment results demonstrated that the DD-STGCN model achieved a better prediction ability than LSTM, GC-LSTM, and STGCN models. Compared to the best comparison model, MAPE, MAE, and RMSE were improved by 10.2%, 9.7%, and 9.6% in 12 h on average, respectively. The performance of our model was further tested during a haze period. In the case that two models both considered the effect of wind, compared with the pure data-driven model, our model performed better in prediction distribution and showed the benefit of spatial interpretability provided by domain knowledge.
© 2021 Elsevier Ltd. All rights reserved.
* This paper has been recommended for acceptance by Prof. Pavlos Kassomenos.
* Corresponding author. School of Earth Sciences, Zhejiang University, Hangzhou, 310027, China.
E-mail address: zfcarnation@zju.edu.cn (F. Zhang).
https://doi.org/10.1016/j.envpol.2021.116473
0269-7491/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

PM2.5 (atmospheric particles with aerodynamic diameters less than or equal to 2.5 μm) is the main component of hazy episodes. Exposure to ambient PM2.5 can have detrimental acute and chronic health effects, including cardiac and pulmonary disease (Leclercq et al., 2018; Hamra et al., 2014; Hoek et al., 2013), detrimental effects on birth outcomes (Sun et al., 2020; Brauer et al., 2008), and infant mortality (Heft-Neal et al., 2018). With the increasing development of the economy, air pollution problems are becoming more and more serious all over the world. Requirements for managing haze are no longer limited to the acquisition of real-time air quality monitoring data. Efficiently and accurately predicting the PM2.5 concentration for a long period in the future has implications for air pollution prevention and urban management planning.

PM2.5 forecasting methods are mainly divided into two categories: model-driven methods and data-driven methods. In the last two decades, model-driven methods have been widely used for predicting the spatial distribution of pollutants at various scales, represented by the Community Multi-scale Air Quality (CMAQ) model (Foley et al., 2010), the Nested Air Quality Prediction Modeling System
(NAQPMS) (Wang et al., 2001), and the WRF-Chem model (Hong et al., 2020; Chuang et al., 2011). This type of model does not require a large amount of historical weather data, and the accuracy of the model prediction depends on how well the model agrees with the actual atmospheric conditions. In addition, the clear causal relationship between pollution sources and air pollution makes the model very interpretable. However, the emission source data that can be obtained are rarely comprehensive or accurate, and the amount of calculation is so large that even a supercomputer takes a long time.

Most recently, data-driven approaches have shown their advantages in predicting the trend of pollutants. As the change of PM2.5 concentration is periodic and trending, statistical models for time series were commonly used in early studies, such as the autoregressive integrated moving average (ARIMA) model (Wang and Guo, 2009). In contrast to the statistical models, non-parametric methods, such as artificial neural networks (ANN) (Pérez et al., 2000; Lu et al., 2002) and support vector machines (SVM) (Lu and Wang, 2005), can better handle the complex nonlinear relationships of temporal data. As a branch of machine learning, deep learning algorithms extract complex high-level abstractions as data representations through a hierarchical learning process (Najafabadi et al., 2015), which offers significant promise for modeling dynamic temporal correlation. Extreme gradient boosting (XGBoost) (Pan, 2018), deep belief net (DBN) (Li et al., 2019), and recurrent neural network (RNN) (Zhao et al., 2019; Bai et al., 2019) models have been widely applied to PM2.5 concentration forecasting.

The methods mentioned above focus on the dynamic characteristics of time and ignore the spatial dependence between monitoring stations. In deep learning methods, in order to take the spatial dependency into account, there is a growing interest in blending spatial learning and sequence learning (Reichstein et al., 2019), which can take full advantage of spatial and temporal dependency. The most common combination is the fusion of convolutional neural networks (CNN) and RNN, such as the Convolutional LSTM (Xingjian et al., 2015; Pak et al., 2020), the Quasi-RNN model (Bradbury et al., 2016), and the APNet model (Huang and Kuo, 2018). However, the regular convolution operation of these CNN-based methods only applies to grid structures (images, videos), so they are appropriate for spatial relationships only in Euclidean space. To solve this problem, Qi et al. (2019) constructed a station topology map to describe the spatial relationship based on the stations' spatial distribution. Through a convolution operation similar to CNN, the graph is convolved to extract the spatial relationship.

When forecasting PM2.5 concentration, much of the available research has considered many other factors related to PM2.5 concentration, such as meteorological conditions (e.g., temperature, humidity, and wind) and other pollutants (e.g., SO2, O3 and PM2.5). These studies mainly considered correlations between PM2.5 and these factors but did not consider the causal relationship between them. In order to make the modeling results more consistent with the actual situation, we consider incorporating domain knowledge to enhance the model's prediction ability. Han et al. (2020) included a regularization term in the model based on the strong statistical relationship between PM2.5 and PM10 pollution. However, there is no strong causal relationship between the two, so this constraint is relatively weak. Comparatively, the wind-field has more significant impacts on the movement and distribution of air pollutants in a region (Li et al., 2014). The influence of the wind-field on PM2.5 can be considered in terms of direction and speed. Firstly, different wind speeds have different effects on the diffusion of pollutants. The wind speed largely determines the diffusion distance of air pollutants, which directly affects the interaction between stations. Furthermore, because the wind-field is dynamic, the spatial autocorrelation of stations is unstable. Secondly, the wind direction largely determines the direction in which air pollutants spread. Pollutants at a given point spread more easily to the downwind area and with more difficulty to the upwind area. This makes the distribution of PM2.5 under the wind-field anisotropic.

For forecasting PM2.5 concentration under the effect of a dynamic wind-field, in this paper, we propose a novel deep-learning framework, the Dynamic Directed Spatio-temporal Graph Convolution Network (DD-STGCN). We use graph convolution and time convolution blocks to extract spatial dependency and temporal dependency, respectively. Besides, by combining these two modules, we can further explore the changes in the dependence relationship that varies with the wind-field.

2. Materials

2.1. Study area

Our study area is situated in the Yangtze River Delta region (Fig. 1), including Shanghai, Jiangsu, and Zhejiang provinces, which is one of the most economically developed regions in China. However, economic development, population growth, and pollution emissions have led to an increasing decline of the urban ecological environment. Atmospheric composite pollution dominated by PM2.5 has a significant impact on the Yangtze River Delta region, resulting in a continuous increase of haze weather and seriously jeopardizing people's health.

2.2. Datasets

The datasets used in this study mainly consisted of hourly in-situ PM2.5 concentrations and wind-field data.

Observed PM2.5 concentration. In 2012, according to the Environmental Air Quality Standards revised by the Ministry of Environmental Protection, China listed PM2.5 as an air quality monitoring indicator for the first time. In January 2013, the real-time release platform for urban air quality established by the China National Environmental Monitoring Center was officially launched (http://106.37.208.233:20035/). As of 2015, there were 108 national control stations in the Yangtze River Delta region. In this paper, we selected the stations with a PM2.5 data loss rate of less than 5%, a total of 57.

Continuous wind-field data. Wind-field data was obtained from the CERA-SAT atmosphere reanalysis products. The period of this dataset is from January 1, 2008 to December 31, 2016. We collected 10-m wind data with u and v components for 2015–2016. These two components represent the airflow in the zonal (east-west) and meridional (north-south) directions, respectively. Since the time resolution of the prediction in this paper is 1 h and the temporal resolution of this dataset is 3 h, we linearly interpolated between data points to generate an hourly time-series.

3. Methods

3.1. Problem definition

3.1.1. Definition 1

Dynamic graph time-series. We used a directed graph time-series $\mathcal{G} = \{G_1, G_2, \ldots, G_t\}$ to describe the topological structure of stations in a dynamic wind-field. $G_t = (V_t, E_t, A_t)$ $(t \in [1, T])$ represents a digraph network at time t. $V_t = \{V_t^1, V_t^2, \ldots, V_t^n\}$ denotes values of all stations at time t. $E_t = \{E_t^1, E_t^2, \ldots, E_t^n\}$ represents the series of links between stations at time t. $A_t \in \mathbb{R}^{N \times N}$
Fig. 1. Study area and dataset. (a) the location of Yangtze River Delta region, including Shanghai, Jiangsu and Zhejiang provinces. (b) The spatial distribution of PM2.5 monitoring
stations (marked by red circles). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
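The hourly wind series described in Section 2.2 is obtained by linear interpolation of the 3-hourly reanalysis values. A minimal sketch of that step follows, assuming the u and v components are interpolated independently and that the series starts at hour 0; the function name `interpolate_to_hourly` and the sample values are hypothetical and not taken from the paper.

```python
import numpy as np

def interpolate_to_hourly(values_3h):
    """Linearly interpolate a 3-hourly series onto an hourly grid."""
    values_3h = np.asarray(values_3h, dtype=float)
    t_3h = np.arange(len(values_3h)) * 3      # hours at which samples exist
    t_1h = np.arange(t_3h[-1] + 1)            # hourly target grid
    return np.interp(t_1h, t_3h, values_3h)

# Hypothetical 3-hourly u-component values (m/s) at one grid cell.
u_3h = [0.0, 3.0, 6.0]
u_1h = interpolate_to_hourly(u_3h)
```

The same call would be applied to the v component. Interpolating the components rather than speed and direction avoids the wrap-around problem at 0°/360°, although this choice is an assumption here.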
denotes the adjacency matrix of $G_t$. The elements of $A_t$ are computed based on the distances between stations.

$$a_{ij} = \begin{cases} \dfrac{1}{d_{ij}}, & i \neq j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Notably, $d_{ij}$ does not indicate Euclidean distance, but the wind-field distance.

3.1.2. Definition 2

PM2.5 prediction in dynamic wind-field. For the input dynamic graph series $\mathcal{G}$ with length T, by considering the temporal correlation and spatial topological relationship of the graph time-series $\{G_{t-T}, \ldots, G_{t-1}\}$, we can predict the state of the vertex at time t. The problem can be defined as

$$\hat{V}_t = \underset{V_t}{\arg\max}\, P(V_t \mid V_{t-T}, \ldots, V_{t-1}) \tag{2}$$

where $\{V_{t-T}, \ldots, V_{t-1}\}$ represents the concentration values of the previous graphs, $V_t$ represents the observed concentration values at time t, and $\hat{V}_t$ represents the predicted concentration values at time t.

3.2. Graph convolution

3.2.1. Graph construction

A graph can represent the spatial association between geospatial data. When forecasting PM2.5 concentration, we need to consider the spatial connection between the ground monitoring stations and train them collaboratively. Traditionally, the straight-line distance between stations is used directly to represent the spatial association among the stations and serves as the edge weight of the graph. In fact, this value can be understood as the difficulty of interaction between stations. PM2.5 does not diffuse completely freely but is affected by the wind-field. Therefore, when considering the influence of the wind-field in the PM2.5 prediction problem, we introduced the wind-field distance to measure the impact factor between the stations.

Calculation of the wind-field distance.

The calculation of the wind-field distance can be divided into two steps (Fig. 2). First, with the full-coverage wind-field data obtained from the meteorological reanalysis product, we calculated the wind-field distance between adjacent units in the wind-field to obtain a diffusion cost surface. Then, based on the cost surface, we used a shortest path algorithm (e.g., the Dijkstra or Floyd algorithm) to calculate the wind-field distance between any two target points.

The critical point here is to calculate the diffusion distance. Diffusion distance is used to describe the difficulty of diffusing air pollutants between adjacent units of the wind-field. The Gaussian diffusion model is a standard model for solving the diffusion problem in the wind-field. The basic formula of the model is as follows

$$C_0(x, y, z, u) = \frac{Q}{\pi u \sigma_y \sigma_z} \exp\left(-\frac{y^2}{2\sigma_y^2} - \frac{z^2}{2\sigma_z^2}\right) \tag{3}$$

where $C_0(x, y, z, u)$ indicates the air pollutant concentration. x and y represent the downwind distance and the horizontal distance from the centerline of the wind direction, respectively. z denotes the height of the pollution source. u is the horizontal wind speed. $\sigma_y$ and $\sigma_z$ represent the standard deviation of diffusion in the horizontal and vertical directions, respectively. When considering only horizontal diffusion, the formula can be simplified to (Li et al., 2014)

$$\mathrm{cost}(E_{AB}) = [F(D_A, D_M) + F(D_B, D_M)]\, L_{AB} \tag{4}$$

where A and B are the starting and target points. $E_{AB}$ is the edge between the two points. $D_A$ and $D_B$ denote the azimuths of these two points. $D_M$ denotes the azimuth of $E_{AB}$. $L_{AB}$ represents the length of
Fig. 2. Steps of calculating wind-field distance. (a) full-coverage wind-field data. (b) Calculate the diffusion cost between adjacent cells and obtain a diffusion cost surface. (c) the
shortest path between two points in the wind-field based on the cost surface.
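The two steps in Fig. 2 can be sketched as follows. This is an illustrative reading of Eq. (4), not the authors' implementation. It assumes that azimuths are in degrees clockwise from north, that $D_A$ and $D_B$ are the directions toward which the wind blows at the two cells, that cells are 8-connected with unit size, and that row indices increase northward. All function names are hypothetical.

```python
import heapq
import math

def azimuth(dx, dy):
    """Azimuth in degrees, clockwise from north, of the vector (dx east, dy north)."""
    return math.degrees(math.atan2(dx, dy)) % 360.0

def azimuth_diff(a, b):
    """F in Eq. (4): absolute azimuth difference, folded into [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def edge_cost(wind_az, a, b, cell_size=1.0):
    """Eq. (4): [F(D_A, D_M) + F(D_B, D_M)] * L_AB for the directed edge a -> b."""
    (ra, ca), (rb, cb) = a, b
    dx, dy = cb - ca, rb - ra                 # assumption: rows increase northward
    d_m = azimuth(dx, dy)
    length = math.hypot(dx, dy) * cell_size
    return (azimuth_diff(wind_az[ra][ca], d_m)
            + azimuth_diff(wind_az[rb][cb], d_m)) * length

def wind_field_distance(wind_az, source, target):
    """Dijkstra shortest path over the directed cost surface."""
    rows, cols = len(wind_az), len(wind_az[0])
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue                          # stale heap entry
        r, c = node
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if (dr, dc) == (0, 0):
                    continue
                if not (0 <= nb[0] < rows and 0 <= nb[1] < cols):
                    continue
                nd = dist + edge_cost(wind_az, node, nb)
                if nd < best.get(nb, math.inf):
                    best[nb] = nd
                    heapq.heappush(heap, (nd, nb))
    return math.inf

# Uniform wind blowing toward the east (azimuth 90 degrees) on a 3 x 3 grid.
wind_az = [[90.0] * 3 for _ in range(3)]
downwind = wind_field_distance(wind_az, (1, 0), (1, 2))   # west to east
upwind = wind_field_distance(wind_az, (1, 2), (1, 0))     # east to west
```

Under these assumptions the downwind distance is smaller than the upwind one, which is the asymmetry that makes the resulting station graph directed.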
$E_{AB}$. The function F is used to calculate the absolute value of the azimuth difference.

3.3. Dynamic directed graph time-series construction

We used the wind-field distance to construct a directed graph between PM2.5 stations. The graph structure generated at a particular timestamp is shown in Fig. 3(a). We used two directed edges to connect each station with the other stations, and the wind-field distance between the two stations is used to calculate the weight of the edges. Fig. 3(b) shows the adjacency matrix at this moment. It can be seen that in this matrix, the values on both sides of the diagonal are complementary. This is easy to understand: when the pollutants from station A are blown downwind to station B, the status of station B hardly affects station A. Because we need to consider the dynamic changes of the wind network, we constructed a directed graph of the wind-field every hour.

3.4. Graph convolution neural networks

There are two basic approaches currently adopted for extracting the spatial character of a graph. One works in the spatial domain (vertex domain), which is similar to the CNN on pixels, and the other operates in the spectral domain, which is derived from graph signal processing. Of these two convolution methods, spatial domain-based GCN models, such as GraphSAGE (Hamilton et al., 2017) and GAT (Veličković et al., 2017), can directly handle directed graphs. Spatial convolution is similar to the application of convolution in deep learning. Its core lies in aggregating the information of neighbour vertices. As for the spectral domain-based approach, since the spectral convolution method is based on the assumption of an undirected graph, its Laplacian matrix is
Fig. 3. (a) Generated graph structure of PM2.5 monitoring stations. (b) Weighted adjacency matrix. A deeper color indicates a larger value. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
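The weighted adjacency matrix in Fig. 3(b) follows from Eq. (1), and the same matrix then enters the normalized spatial-domain propagation discussed in Section 3.4. The sketch below illustrates both for a single time slice. The distance values are invented for illustration, ReLU is assumed as the activation, the weights are untrained placeholders, and the text does not state which index of $a_{ij}$ is the source station.

```python
import numpy as np

def adjacency_from_distance(d):
    """Eq. (1): a_ij = 1 / d_ij for i != j, and 0 otherwise."""
    d = np.asarray(d, dtype=float)
    a = np.zeros_like(d)
    # Assumption: zero distances off the diagonal are skipped rather than inverted.
    mask = ~np.eye(d.shape[0], dtype=bool) & (d > 0)
    a[mask] = 1.0 / d[mask]
    return a

def graph_conv(x, a, w):
    """Normalized propagation with self-loops: x is (N, d), a is (N, N), w is (d, h)."""
    a_tilde = a + np.eye(a.shape[0])          # add self-loops
    deg = a_tilde.sum(axis=1)                 # row sums of the self-looped matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)    # ReLU activation (assumed)

# Hypothetical wind-field distances among three stations (asymmetric on purpose).
d = np.array([[0.0, 2.0, 4.0],
              [8.0, 0.0, 5.0],
              [10.0, 20.0, 0.0]])
A = adjacency_from_distance(d)
x = np.array([[35.0], [60.0], [48.0]])        # one PM2.5 value per station
w = np.ones((1, 2))                           # placeholder weights, not trained
h = graph_conv(x, A, w)
```

For a dynamic graph time-series, the same two functions would be applied once per hour with that hour's distance matrix.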
symmetric. Because the definition of the Laplacian matrix of a directed graph is ambiguous, scholars have made many improvements to directed graph convolution (Kampffmeyer et al., 2019; Monti et al., 2018; Ma et al., 2019), which are more complex than the former. Furthermore, spectral convolution tends to take more time than spatial convolution in training. Therefore, in this paper, we performed the convolution in the spatial domain.

Graph convolution in the spatial domain can be analogized to convolution directly on the pixels of a picture. Each vertex collects information from its neighbours. Since the strength of the relationship between the vertex and its neighbours differs, we need to perform a weighted average calculation on the values of all vertices. We used the tensor X to denote the signal on the vertices of graph G. The process of updating the value of the vertex can be written as

$$X^{*} = AX \tag{5}$$

To combine the characteristics of the vertices themselves, a self-loop is usually added when updating the vertices' states,

$$X_i^{*} = \sum_j A_{ij} X_j + X_i \tag{6}$$

$$X^{*} = (A + I)X \tag{7}$$

then the adjacency matrix and the corresponding degree matrix can be written as $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.

Because the edge weights of vertices differ widely, the information needs to be normalized before updating vertices. The geometric mean $\sqrt{ab}$ is used for normalization. Thus, Equation (7) transforms to

$$X^{*} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \tag{8}$$

Then, to transform the aggregated feature $X^{*} \in \mathbb{R}^{d}$ of this vertex to the h dimension, a linear transformation matrix $W \in \mathbb{R}^{d \times h}$ is applied on the aggregated result

$$X^{l+1} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{l} W^{l}\right) \tag{9}$$

where $\sigma$ represents the activation function.

3.5. Temporal convolution

In space-time forecasting problems, RNN-based models are widely used to extract temporal features. Because each step of these models depends on the result of the previous step, recurrent neural network training is inefficient. Also, the computational complexity of RNN forward propagation is O(|Sequence Length|), so the amount of calculation depends on the length of the sequence. Besides, each time step in a long sequence includes a memory I/O operation, constrained by the GPU's maximum threads and maximum memory bandwidth.

To make the temporal operation more efficient, we applied the temporal convolution model, employing a convolution structure along the time dimension to extract the temporal feature. Even though the LSTM model has a memory gate, it cannot fully remember all historical information. In contrast, the convolutional network layer has a causal relationship between the layers, which means that no historical information is missed and no future data is leaked.

For the vertices series $V = \{V_{t-T_P+1}, \ldots, V_t\}$ in the dynamic graph series $\mathcal{G}$ of length $T_P$, we can treat the timestep as the x axis, the number of vertices as the y axis, and the features of each vertex as the channels. Thus the input of the time convolution block $X_v^{l} \in \mathbb{R}^{C^{l} \times T^{l} \times N}$ can be analogized to an image with n channels. The size of the convolution kernel f is $[C^{l}, C^{l+1}, Q, 1]$, and the convolution is performed as

$$Z_v^{l+1} = X_v^{l} * f_v^{l} \tag{10}$$

where Q denotes the size of the time window.

For the adjacency matrix time-series, the input of the temporal convolution is no longer an image, but a 3-D matrix of size $N \times N \times T^{l}$. In our model, whether it is 3-D or 2-D convolution, the size of the time window Q remains the same

$$Z_A^{l+1} = X_A^{l} * f_A^{l} \tag{11}$$

3.6. DD-STGCN model

For a dynamic graph time-series, the temporal convolution layer is used to update the vertex and adjacency matrix separately by merging the dynamic information from neighbouring time slices, and the spatial graph convolution is used to capture the neighbouring status of each vertex on the graph in the spatial dimension. Before the spatial convolution operation, we first performed time convolution on the dynamically changing vertex and adjacency matrix time-series to obtain the variation characteristics of adjacent time slices. Thus, when we collected the state of the neighbour vertices in the graph convolution to extract the spatial dependency, what we captured is not the state of the vertex at a single timestamp, but the temporal characteristics of each neighbour vertex over time. Since we are modeling the prediction problem in a dynamic vector field, the spatial dependence between vertices is continuously changing. We then stacked a temporal convolution after the spatial convolution to extract the temporal feature of the spatial characteristics. This combination integrates the spatio-temporal features better. The convolution process can be expressed as

$$X^{tc} = \sigma(F *_T X) \tag{12}$$

$$A^{tc} = \sigma(F *_T A) \tag{13}$$

$$X^{l+1} = \sigma_{temp}\left(F *_T \sigma_{graph}\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X^{l} W^{l}\right)\right) \tag{14}$$

where $X^{tc}$ and $A^{tc}$ represent the new vertex and adjacency matrix time-series after a temporal convolution operation $*_T$. F denotes the temporal convolution kernel. $\sigma_{temp}$ and $\sigma_{graph}$ are the activation functions after temporal and graph convolution, respectively.

Finally, we used a fully-connected layer network as a decoder to turn the hidden vector into the final prediction vertex result. The schematic of our proposed model DD-STGCN is illustrated in Fig. 4.

The main characteristics of the proposed model can be summarized as follows:

1) The introduction of domain knowledge. We use the dynamic wind-field distance to define the spatial correlation between stations, which describes the spatial relationship better.
Fig. 4. The architecture of DD-STGCN. The framework is mainly integrated by temporal convolution block and graph convolution blocks. The temporal convolution blocks are
divided into 2-D and 3-D convolution blocks, which are for vertex and adjacency matrix time-series, respectively. For the input graph series fG tT ; /; G t1 g with length T, we
transform the vertex and adjacency matrix time-series into new ones after temporal convolution. The result can be recombined as a new graph time-series and flows to the graph
convolution block for dynamic spatial feature extraction. In order to fully integrate the spatial and temporal features, we further added a temporal block on the vertex time-series.
Finally, we stack a fully-connected layer in the end to generate the final prediction result.
2) Support for dynamic directed graphs. Previous studies of space-time prediction have not dealt with the dynamic non-Euclidean dependency. By mining and reorganizing the time-series relationships between vertices and adjacency matrices, we can explore the changes in the dependence relationship that varies with the wind-field.

3) More efficient extraction. The main module of our model, the spatio-temporal convolution module, consists of time convolution and space convolution. Compared with other complex neural network models, the calculation is more straightforward.

3.7. Experimental settings

We used the open-source TensorFlow framework to implement our experiment, and all models were trained on Linux servers. First, we divided the data into multiple data blocks with a length of 60 h (history length + prediction length), then all blocks were randomly partitioned into a training set (60%), a validation set (20%), and a test set (20%). In each block, we used the first 12 h of observed PM2.5 values to forecast 8 time points (1, 2, 4, 8, 12, 24, 36, and 48 h) in the next 48 h. In order to alleviate the over-fitting of the neural network, we used the ReLU activation function with sparse activation characteristics and the dropout strategy to reduce the connection complexity between neurons. We selected ADAM as the gradient update rule.

For evaluating the prediction effect, we calculated the deviation between the predicted and observed values and the degree of consistency between the two. Therefore, this paper uses Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE) as evaluation indicators.

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |o_i - p_i|}{n} \tag{15}$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{o_i - p_i}{o_i} \right| \tag{16}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (o_i - p_i)^2}{n}} \tag{17}$$

where $o_i$ and $p_i$ denote the observed value and predicted value of the i-th sample. $\bar{o}$ denotes the mean value of observations.

We evaluated our proposed model by comparing it with three baselines:

(1) LSTM. The LSTM model is a type of RNN designed to capture long-term dependencies in sequential data. It is widely used in many time-series prediction problems, but cannot model spatial relationships.
(2) GC-LSTM (Qi et al., 2019). GC-LSTM is the state-of-the-art model in PM2.5 forecasting based on graph convolution. It embeds the graph convolution network and the long short-term memory network. Experiments (Qi et al., 2019) have shown that GC-LSTM performs better than LSTM because it considers spatial dependency.
(3) STGCN (Yu et al., 2017). To illustrate the advantages of considering dynamic wind-fields, we use the Euclidean
distance between the stations as the input of the DD-STGCN model, and keep the same input for each time step.

4. Result and discussion

4.1. Comparison of model performance

In this part, DD-STGCN is compared with the state-of-the-art forecasting approaches: the LSTM, GC-LSTM, and STGCN models. Table 1 shows the performances of our approach and those used as baselines for hourly PM2.5 forecasting. The comparison of the four models reveals that the performance of the DD-STGCN model is better than the other three models according to the MAPE, MAE, and RMSE values. This highlights the importance of considering the dynamic directional spatial and temporal dependency in forecasting PM2.5 concentration.

Compared to LSTM, which only considers the time dimension, models using a graph structure exhibited better results. The result shows that, for the prediction results in the short, middle, and long term, compared with LSTM, the MAPE, MAE, and RMSE of GC-LSTM improved by 10.9%, 12.1%, 14.9% (1–8 h), 6.1%, 7.2%, 12.2% (12–24 h), and 2.3%, 7.5%, 4.9% (36–48 h), respectively. The experiments show a significant increase in accuracy after accounting for spatial correlation, especially in short-term forecasting.

For GC-LSTM and STGCN, which both account for spatial correlation, the accuracy improvement of STGCN was 10.3% (MAPE), 11.6% (MAE), and 8.0% (RMSE) over a 48-h period. In addition, we found that for the first 12 h, the improvement of forecast accuracy was below average (7.7% for MAPE, 8.9% for MAE, 5.6% for RMSE), while after 12 h, the accuracy improvement is more obvious (14.5% for MAPE, 16.2% for MAE, 12.0% for RMSE). This demonstrates that for long-term forecasts over 12 h, our stacked network structure and the replacement of LSTM with TCN have advantages in reducing accuracy degradation.

The previous two spatial topological models only consider the static spatial dependence relationship under the Euclidean distance and do not take into account the dynamic change of the field. By modeling directed dynamic topological relationships, the DD-STGCN model effectively improved the prediction accuracy. The three evaluation indicators of the DD-STGCN model in the first hour were 11.8% (MAPE), 13.3% (MAE), and 9.5% (RMSE) lower than those of STGCN, which indicates that the directional information of the wind-field plays a supporting role in forecasting and helps describe the spatial dependence between stations.

In order to further compare the prediction capabilities of the four models, we drew line charts and scatter plots (Fig. 5) of predicted and observed values on the test set of each model. The slopes of the scatter trend lines of the four models are all less than 1. For DD-STGCN and STGCN, there was an underestimation of the predicted value in high-value areas, but the fitting in the low-value area was much better. However, for LSTM and GC-LSTM, there was a significant underestimation in both high-value and low-value cases. Of the four models, it can be seen from Fig. 5 that our model showed the best fit between the predicted and observed values: the correlation coefficient (R²) is 0.75, which is visibly higher than those of the other three models (0.72, 0.67 and 0.63, respectively). This shows that there was a remarkable improvement in prediction accuracy from considering the directed wind-fields.

To study the difference in error between the two models after considering the wind-field, we used the wind-field data to draw wind frequency rose charts. Fig. 6(a) is the wind speed-wind frequency rose chart, and Fig. 6(b) is the error difference-wind frequency rose chart. It can be seen that the data with the highest wind speed in the test set are concentrated in the northeast (Fig. 6(a)). In conjunction with Fig. 6(b), this suggests that the data with the larger error in the test set are also concentrated in the northeast and southwest directions. We can infer that this is because the higher wind speed leads to a more apparent difference between the wind-field distance and the conventional distance. Therefore, we believe that the consideration of wind speed is helpful in improving the prediction accuracy of the model.

4.2. Forecast of PM2.5 concentration during a haze period

For air pollutants such as PM2.5, the influence of the wind-field on concentration is more evident on the daily or hourly scale. When the various meteorological factors (e.g., rainfall, air pressure, temperature) are relatively stable, the effect of the wind-field is the main factor that changes the spatial distribution of PM2.5 concentration. In conditions of strong wind, it is more important to consider the wind-field factor in forecasting. We used a short-term case to observe the impact of the wind-field on the prediction result.

We selected data from December 13 to December 15, 2015 as the research period for the Yangtze River Delta region, which is a typical case of pollution intrusion. As can be seen from Fig. 7, on December 13, the pollutants slowly moved to the southwest, and the wind-field turned to a northwest wind the next day. The pollutants then diffused from west to east and north to south. On the 15th, the pollutants continued migrating to the southeast, the Yangtze River Delta region was severely polluted, and pollution in North China had dissipated, so the air quality there was getting better.

We used kriging interpolation for the prediction results at each station. Fig. 7(a) shows the result after the observations are interpolated, and Fig. 7(b) shows the result from our model. From the perspective of the trend, the difference between our model and the
Table 1
The result for hourly prediction values of PM2.5 of different models.
Model      Metric     +1 h    +2 h    +4 h    +8 h    +12 h   +24 h   +36 h   +48 h
LSTM       MAPE (%)   26.43   28.35   28.78   33.22   33.98   35.67   36.33   37.65
           MAE        16.45   18.84   20.22   24.87   24.43   27.56   29.87   30.61
           RMSE       25.25   27.93   28.90   30.34   32.11   36.30   37.21   37.01
GC-LSTM    MAPE (%)   24.20   25.28   25.59   28.75   31.88   34.56   36.01   36.29
           MAE        15.20   17.48   18.01   19.27   22.44   25.80   27.63   28.31
           RMSE       22.19   23.73   24.34   25.28   27.99   32.06   34.89   35.66
STGCN      MAPE (%)   21.81   23.40   24.33   26.29   28.48   30.73   30.98   30.69
           MAE        13.98   16.34   16.49   17.94   19.15   22.32   22.99   23.13
           RMSE       20.87   22.12   22.89   24.67   26.02   28.34   30.23   31.67
DD-STGCN   MAPE (%)   19.23   20.44   22.23   23.45   26.41   28.02   28.78   29.03
           MAE        12.13   14.12   15.08   16.33   18.40   20.44   22.9    23.01
           RMSE       18.90   19.41   20.22   22.51   24.62   27.23   28.11   29.47
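The relative improvements quoted in the text can be checked against Table 1. The following is a minimal sketch, assuming the percentages are relative reductions with respect to the baseline model; the dictionary layout and the function names are illustrative and do not come from the paper.

```python
# Values transcribed from the +1 h column of Table 1.
TABLE1_1H = {
    "STGCN": {"MAPE": 21.81, "MAE": 13.98, "RMSE": 20.87},
    "DD-STGCN": {"MAPE": 19.23, "MAE": 12.13, "RMSE": 18.90},
}


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def compare(table, baseline_name, candidate_name):
    """Relative reduction of every metric of the candidate versus the baseline."""
    base = table[baseline_name]
    cand = table[candidate_name]
    return {metric: relative_reduction(base[metric], cand[metric]) for metric in base}


reductions = compare(TABLE1_1H, "STGCN", "DD-STGCN")
for metric, pct in reductions.items():
    print(f"{metric}: {pct:.1f}% lower than STGCN")
```

With the rounded table entries, the results agree with the quoted 11.8%, 13.3%, and 9.5% to within about 0.1 percentage point; the small differences are presumably due to rounding of the tabulated values.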
Fig. 5. Line charts of daily predicted and observed values of the different models on the test data (left). Correlation between the predicted and observed values on the test data for the 1-h prediction (right).
real values was not large, and both showed a trend of migration from northwest to southeast.
To better illustrate the advantages of considering the wind-field, we chose the concentration distribution at noon on the 14th as an example. In order to show the situation of the entire contaminated area more clearly, we expanded the scope of our research to the west. We selected the STGCN and DD-STGCN models to study the spatial distribution of the standard deviation ellipse for the predicted pollutant distribution. To make the comparison more equitable, we also fed wind-field information into STGCN as two input channels, that is, the wind speed in the u and v directions.
From the results of kriging interpolation (Fig. 8), the distribution
Fig. 6. (a) Wind rose diagram for the test dataset. (b) Wind rose diagram of the RMSE difference.
Fig. 7. Spatial variations of the interpolation result. (a) Interpolation result of observed data. (b) Interpolation result of predicted data of DD-STGCN.
of the two models in this region was basically the same on December 14, generally low in the south and high in the north, with a clear striped area oriented east-west. In addition, the STGCN model clearly overestimated the concentration in the central region of Jiangsu province.
For a better comparison of the spatial distribution of the forecast results, we used the direction distribution tool to generate the standard deviation ellipse containing 68% of the data. It can be seen from Fig. 8 that the standard deviation ellipses of the three cases basically covered the highly polluted areas, and they were all oriented in a northeast-southwest direction. Due to the overestimation of STGCN, the area of STGCN's standard deviation ellipse is larger than the areas of the ellipses of the observed values and of DD-STGCN. We superimposed the standard deviation ellipse on the wind-field. The direction of the long axis of the two ellipses was consistent with the wind-field (blue arrows in Fig. 8), indicating that the directionality of the pollution distribution is closely related to the wind-field. We further calculated the ellipse's oblateness. For the
Fig. 8. Standard deviational ellipse (the green ellipse) on December 14 for the different models: (a) observed data, (b) STGCN, (c) DD-STGCN. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
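A standard deviational ellipse such as those in Fig. 8, together with its oblateness, can be computed directly from point coordinates. The paper does not spell out the formulas behind the direction distribution tool or its definition of oblateness, so the sketch below rests on two stated assumptions: the axes are taken from the eigen-decomposition of the (optionally weighted) covariance matrix, and oblateness is taken to be the flattening (a - b) / a of the semi-major axis a and semi-minor axis b. The function name, the angle convention, and the sample coordinates are hypothetical.

```python
import numpy as np


def standard_deviational_ellipse(x, y, weights=None):
    """One-standard-deviation ellipse of 2-D points from the weighted covariance.

    Returns the mean centre, the semi-major and semi-minor axis lengths, the
    major-axis orientation in degrees counter-clockwise from the +x axis
    (folded into the range 0 to 180), and the flattening (a - b) / a.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to one

    cx, cy = np.sum(w * x), np.sum(w * y)  # weighted mean centre
    dx, dy = x - cx, y - cy
    sxx = np.sum(w * dx * dx)
    syy = np.sum(w * dy * dy)
    sxy = np.sum(w * dx * dy)
    cov = np.array([[sxx, sxy], [sxy, syy]])

    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    semi_minor, semi_major = np.sqrt(eigvals)

    vx, vy = eigvecs[:, 1]  # eigenvector of the largest eigenvalue: major axis
    angle = float(np.degrees(np.arctan2(vy, vx)) % 180.0)
    flattening = 0.0 if semi_major == 0 else float((semi_major - semi_minor) / semi_major)
    return {
        "center": (float(cx), float(cy)),
        "semi_major": float(semi_major),
        "semi_minor": float(semi_minor),
        "angle_deg": angle,
        "flattening": flattening,
    }


# Hypothetical station coordinates (arbitrary planar units) weighted by
# hypothetical concentrations; these are not data from the paper.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 1.5]
ys = [4.1, 2.9, 2.2, 0.8, 0.1, 2.6]
conc = [80.0, 120.0, 150.0, 110.0, 70.0, 140.0]
ellipse = standard_deviational_ellipse(xs, ys, weights=conc)
print(ellipse)
```

A flattening near 0 indicates a nearly circular distribution, while values closer to 1 indicate a strongly elongated, directional one. A compass-bearing convention (clockwise from north) would need an additional angle conversion.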
spatial distribution of observed data, the oblateness of the ellipse is 0.58, which shows an obvious distribution along the direction of 135°. Compared with the pure data-driven model (oblateness = 0.47), our model (oblateness = 0.60) performs better in maintaining directionality.
In this case, both models consider wind speed and wind direction at the same time, but the STGCN model simply used the wind-field information as input factors, while our model used domain knowledge to measure the interaction between two stations. The experimental results showed that our model not only improves the prediction accuracy but also makes the spatial distribution of the prediction results more reasonable.

5. Conclusion
In this study, we have proposed a hybrid deep learning model, DD-STGCN, for forecasting PM2.5 concentration in the dynamic wind-field. First, we constructed a dynamic graph time-series by calculating the field distance, and then adopted temporal convolution and graph convolution models to perform temporal and spatial feature extraction on the vertexes and edges of the graph time-series, respectively. Our experimental results indicated that the proposed model achieved better fitting and prediction performance than the LSTM, GC-LSTM, and STGCN models in terms of RMSE, MAE, and MAPE. After incorporating domain knowledge in our model, we found a significant improvement in prediction accuracy, especially in the case of high wind speed. Furthermore, we chose a special period of heavy pollution to analyze the performance of our model over a short period. The experimental results showed that the interpolated distribution of our predicted values is more realistic and has an advantage in maintaining directionality. The model can be further optimized in the following aspects: (i) conduct more experiments on other regions and periods for further verification, (ii) extend the model from single-factor prediction to multi-factor prediction, (iii) consider the trend and periodicity of PM2.5 concentration changes.
In this paper, we use domain knowledge to optimize the spatio-temporal prediction model of PM2.5 concentration, but this is still not sufficient. In the next step of our work, we will continue to explore hybrid modeling, which combines the strengths of physical modeling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness) (Reichstein et al., 2019). We will consider combining the propagation process mechanism of PM2.5 with deep learning methods to make the model more physically realistic.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
This research was funded by the National Key R&D Program of China (2018YFB0505000) and the National Natural Science Foundation of China (41671391, 41922043, 41871287).

References
Bai, Y., Zeng, B., Li, C., Zhang, J., 2019. An ensemble long short-term memory neural network for hourly PM2.5 concentration forecasting. Chemosphere 222, 286-294.
Bradbury, J., Merity, S., Xiong, C., Socher, R., 2016. Quasi-recurrent Neural Networks. arXiv preprint arXiv:1611.01576.
Brauer, M., Lencar, C., Tamburic, L., Koehoorn, M., Demers, P., Karr, C., 2008. A cohort study of traffic-related air pollution impacts on birth outcomes. Environ. Health Perspect. 116 (5), 680-686.
Chuang, M.-T., Zhang, Y., Kang, D., 2011. Application of WRF/Chem-MADRID for real-time air quality forecasting over the southeastern United States. Atmos. Environ. 45 (34), 6241-6250.
Foley, K., Roselle, S., Appel, K., Bhave, P., Pleim, J., Otte, T., Mathur, R., Sarwar, G., Young, J., Gilliam, R., et al., 2010. Incremental testing of the community multiscale air quality (CMAQ) modeling system version 4.7. Geosci. Model Dev. (GMD) 3 (1), 205.
Hamilton, W., Ying, Z., Leskovec, J., 2017. Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, pp. 1024-1034.
Hamra, G.B., Guha, N., Cohen, A., Laden, F., Raaschou-Nielsen, O., Samet, J.M., Vineis, P., Forastiere, F., Saldiva, P., Yorifuji, T., et al., 2014. Outdoor Particulate Matter Exposure and Lung Cancer: a Systematic Review and Meta-Analysis. Environmental Health Perspectives.
Han, Y., Lam, J.C., Li, V.O., Zhang, Q., 2020. A Domain-specific Bayesian Deep-Learning Approach for Air Pollution Forecast. IEEE Transactions on Big Data.
Heft-Neal, S., Burney, J., Bendavid, E., Burke, M., 2018. Robust relationship between air quality and infant mortality in Africa. Nature 559 (7713), 254-258.
Hoek, G., Krishnan, R.M., Beelen, R., Peters, A., Ostro, B., Brunekreef, B., Kaufman, J.D., 2013. Long-term air pollution exposure and cardio-respiratory mortality: a review. Environ. Health 12 (1), 43.
Hong, J., Mao, F., Min, Q., Pan, Z., Wang, W., Zhang, T., Gong, W., 2020. Improved PM2.5 Predictions of WRF-Chem via the Integration of Himawari-8 Satellite Data and Ground Observations. Environmental Pollution, p. 114451.
Huang, C.-J., Kuo, P.-H., 2018. A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 18 (7), 2220.
Kampffmeyer, M., Chen, Y., Liang, X., Wang, H., Zhang, Y., Xing, E.P., 2019. Rethinking
knowledge graph propagation for zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11487-11496.
Leclercq, B., Kluza, J., Antherieu, S., Sotty, J., Alleman, L., Perdrix, E., Loyens, A., Coddeville, P., Guidice, J.-M.L., Marchetti, P., et al., 2018. Air pollution-derived PM2.5 impairs mitochondrial function in healthy and chronic obstructive pulmonary diseased human bronchial epithelial cells. Environ. Pollut. 243, 1434-1449.
Li, L., Gong, J., Zhou, J., 2014. Spatial interpolation of fine particulate matter concentrations using the shortest wind-field path distance. PloS One 9 (5), e96111.
Li, J., Shao, X., Sun, R., 2019. A DBN-based deep neural network model with multitask learning for online air quality prediction. J. Contr. Sci. Eng. 2019.
Lu, W.-Z., Wang, W.-J., 2005. Potential assessment of the "support vector machine" method in forecasting ambient air pollutant trends. Chemosphere 59 (5), 693-701.
Lu, W., Fan, H., Leung, A., Wong, J., 2002. Analysis of pollutant levels in central Hong Kong applying neural network method with particle swarm optimization. Environ. Monit. Assess. 79 (3), 217-230.
Ma, Y., Hao, J., Yang, Y., Li, H., Jin, J., Chen, G., 2019. Spectral-based Graph Convolutional Network for Directed Graphs. arXiv preprint arXiv:1907.08990.
Monti, F., Otness, K., Bronstein, M.M., 2018. Motifnet: a motif-based graph convolutional network for directed graphs. In: 2018 IEEE Data Science Workshop (DSW). IEEE, pp. 225-228.
Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E., 2015. Deep learning applications and challenges in big data analytics. Journal of Big Data 2 (1), 1.
Pak, U., Ma, J., Ryu, U., Ryom, K., Juhyok, U., Pak, K., Pak, C., 2020. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: a case study of Beijing, China. Sci. Total Environ. 699, 133561.
Pan, B., 2018. Application of XGBoost algorithm in hourly PM2.5 concentration prediction. In: IOP Conference Series: Earth and Environmental Science, vol. 113, 012127.
Pérez, P., Trier, A., Reyes, J., 2000. Prediction of PM2.5 concentrations several hours in advance using neural networks in Santiago, Chile. Atmos. Environ. 34 (8), 1189-1196.
Qi, Y., Li, Q., Karimian, H., Liu, D., 2019. A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 664, 1-10.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., et al., 2019. Deep learning and process understanding for data-driven earth system science. Nature 566 (7743), 195-204.
Sun, X., Liu, C., Wang, Z., Yang, F., Liang, H., Miao, M., Yuan, W., Kan, H., 2020. Prenatal Exposure to Residential PM2.5 and Anogenital Distance in Infants at Birth: A Birth Cohort Study from Shanghai, China. Environmental Pollution, p. 114684.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y., 2017. Graph Attention Networks. arXiv preprint arXiv:1710.10903.
Wang, W., Guo, Y., 2009. Air pollution PM2.5 data analysis in Los Angeles Long Beach with seasonal ARIMA model. In: 2009 International Conference on Energy and Environment Technology, vol. 3. IEEE, pp. 7-10.
Wang, Z., Maeda, T., Hayashi, M., Hsiao, L.-F., Liu, K.-Y., 2001. A nested air quality prediction modeling system for urban and regional scales: application for high-ozone episode in Taiwan. Water, Air, and Soil Pollution 130 (1-4), 391-396.
Xingjian, S., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., Woo, W.-c., 2015. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, pp. 802-810.
Yu, B., Yin, H., Zhu, Z., 2017. Spatio-temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. arXiv preprint arXiv:1709.04875.
Zhao, J., Deng, F., Cai, Y., Chen, J., 2019. Long short-term memory-fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 220, 486-492.