
Journal of Air Transport Management 106 (2023) 102301

journal homepage: www.elsevier.com/locate/jairtraman
Temporal attention aware dual-graph convolution network for air traffic flow prediction
Kaiquan Cai a, Zhiqi Shen a, Xiaoyan Luo b,*, Yue Li a
a School of Electronics and Information Engineering, Beihang University, Beijing, 100191, China
b School of Astronautics, Beihang University, Beijing, 100191, China
ARTICLE INFO

Keywords: Air traffic flow prediction; Spatial and temporal dependencies; Dual-graph convolution; Long path dependencies; Temporal attention

ABSTRACT

Air traffic flow prediction is vital because it supports collaborative decision making in Air Traffic Management. However, due to the inherent spatial and temporal dependencies of air traffic flow and the irregular sector structure in which the flow operates, it remains a challenging problem. To solve this problem, numerous methods have been proposed that consider airspace adjacency, while flight routes and the origin-destination dependency are not taken into account. In this paper, we propose a temporal attention aware dual-graph convolution network (TAaDGCN) to predict air traffic flow, in which the airspace structure and the routes of flow are both included. Firstly, a complementary spatial dual-graph convolution module is constructed to capture the dependencies of adjacent sectors and origin-destination (OD) sectors. Then, to include long path information, a spatial embedding (SE) block is adopted to represent potentially related sectors of flight traversal. Furthermore, to characterize the temporal evolution pattern, a temporal attention (TA) module is applied to access past features of the input sequence. Based on the blocks stated above, a spatio-temporal block is constructed in which multiple spatial and temporal dependencies are covered. The experimental results on real-world flight data demonstrate that the proposed method achieves better prediction performance than other state-of-the-art comparison methods, and is especially superior to the methods that ignore the sector spatial structure.
1. Introduction

With rapid economic development, the demand for air traffic in many fields has greatly increased (Kuhn, 2016; Wang and Cai et al., 2020). However, there is a growing conflict between the rapid growth of air traffic demand and the limited capacity of the air traffic management (ATM) system, which further contributes to airspace congestion and flight delays (Li et al., 2008; Montlaur and Delgado, 2017; Chen et al., 2019; Yang and Mao et al., 2018; Cai and Zhang et al., 2017). These problems give rise to higher requirements for air traffic flow management (ATFM) (Zhang and Hao et al., 2015; Rocha et al., 2018). Efficient air traffic management decisions depend on accurate air traffic flow prediction. Air traffic flow prediction (ATFP), as the core technology of ATFM, provides a decision-making basis for formulating effective management strategies and has received increasing worldwide attention (Xu and Prats et al., 2020). Accurate ATFP enables air traffic controllers to be aware of the trend of air traffic flow, which is significant for flight safety assurance (Sandamali and Su et al., 2020).

Air traffic flow prediction can be divided into four levels from the perspective of the prediction area: sector, route, route points and airport (Murça and Hansman, 2018). Since air traffic controllers manage flights in units of sectors, we mainly focus on the flow of sectors in this paper. In recent years, massive efforts have been made to solve the problem (Cao and Zhang, 2013; Sridhar and Soni et al., 2006; Bayen and Grieder et al., 2002). Related ATFP studies can be roughly divided into two categories: the model-driven approach and the data-driven approach (Li and Zhang et al., 2016; Tian and Zhang et al., 2016; Xu and Yin et al., 2015). The model-driven approach formulates traffic problems by constructing a model, which requires complex system programming and high computational complexity, especially when the number of aircraft has increased significantly (Bayen and Grieder et al., 2002; Lin and Zhang et al., 2018). The data-driven approach mainly includes time series algorithms and machine learning algorithms. The time series algorithms regard air traffic flow as a time series and predict future flow by mining the potential evolution laws in the historical series (Vardaro and Doan et al., 2013; Cadenas Rivera et al., 2016; Mehrmolaei and Keyvanpour, 2016). With the rapid development of artificial intelligence, numerous researchers have turned their attention to the machine learning
* Corresponding author. XueYuan Road No.37, HaiDian District, BeiJing, China.

E-mail address: luoxy@buaa.edu.cn (X. Luo).
https://doi.org/10.1016/j.jairtraman.2022.102301
Received 24 February 2022; Received in revised form 7 August 2022; Accepted 7 September 2022
Available online 26 September 2022
0969-6997/© 2022 Elsevier Ltd. All rights reserved.
methods, such as support vector machines (SVM) and shallow neural networks (Qiu and Li, 2014; Zhang and Jiang et al., 2016; Wang and Liang et al., 2018). However, these methods neglect the complex actual environment in air traffic and cannot capture the spatial correlation of different regions, resulting in unsatisfactory performance. Considering that the air traffic flow of different regions is closely related, some researchers attempt to encode the air traffic flow into a traffic flow matrix (TFM) and model the spatial and temporal dependencies by convolutional neural network (CNN) and long short-term memory (LSTM) (Lin and Zhang et al., 2019; Liu and Lin et al., 2019). Although spatial dependencies are considered in these methods by modelling air traffic as a grid matrix, this is not suitable for flow prediction of airspace sectors, which have irregular topology. In addition, there are still shortcomings in modelling the long-term temporal dependencies.

In summary, the ATFP performance in existing research is still far from satisfactory, mainly due to the following challenges:

(i) It is difficult to efficiently model the irregular sector structure and fully capture the complex spatial dependencies. Firstly, it should be noted that the air traffic flow distribution of any sector has significant dependencies with its neighboring sectors. Secondly, the traffic flow between long-distance sectors will also affect each other due to the fixed routes. For example, there are many scheduled flights between Beijing and Shanghai. Although they are far apart, there is still a great spatial dependency between them. Finally, aircraft trajectories have fixed routes, resulting in a dependency relationship among the sectors crossed by a certain path, which is called long path dependency in this paper.
(ii) It is difficult to accurately capture the temporal dependencies between the historical flow and the future flow. Historical traffic patterns may affect the current traffic flow, which reveals the temporal dependencies of air traffic flow. However, long-term information is difficult to capture accurately since most existing models cannot directly access past features in long input sequences, which implies a limitation in capturing long temporal dependencies.

Based on the above observations, we propose a temporal attention aware dual-graph convolution network (TAaDGCN) to predict air traffic flow across airspace sectors, which can capture the spatial dimension dependencies and the temporal dimension dependencies. To efficiently model the irregular structures of airspace sectors, a dual-graph module is specifically designed for the characteristics of air traffic flow, i.e., the adjacency graph and the origin-destination (OD) graph. The air traffic network has irregular topologies which can be well modelled by graph convolutional networks, but how to construct the graph is an open problem. Existing strategies of constructing graphs are not suitable for flight data. Therefore, we construct a dual-graph structure combining the characteristics of air traffic and integrate it into the graph convolution, which is called the dual-graph strategy and includes the adjacency graph and the origin-destination (OD) pair graph. To characterize the evolution of temporal information, we employ a temporal attention (TA) module. In addition, to combine the characteristics of long paths in air traffic, the TA module is implemented after concatenating the input history flow and the path-based spatial embedding (SE). Specifically, the proposed spatial embedding (SE) block encodes sector nodes as vectors that preserve the long path information. To take multiple spatial and temporal dependencies into account, a spatio-temporal block (ST block) based on the dual-graph module and the attention module is designed, in which two complementary graphs are incorporated into the graph convolution to capture complex spatial dependencies, and an attention mechanism is adopted to efficiently capture complex temporal dependencies. Overall, our model follows the encoder-decoder architecture, where the encoder encodes the input traffic features and the decoder predicts the output sequence. Between the encoder and the decoder, there is a transform layer, which is used to transform the time dimension of the hidden features. By stacking the ST blocks in the encoder-decoder architecture, multiple spatial and temporal dependencies can be effectively modelled.

2. Related works

In recent years, traffic flow prediction has attracted widespread attention from researchers all over the world. The existing methods can be roughly divided into two categories: the model-driven approach and the data-driven approach. These methods are described in detail below.

Firstly, the model-driven approach is a formulaic representation of the traffic flow evolution law by constructing a model, and such approaches require comprehensive and detailed systematic modelling based on prior knowledge. The representative algorithms include the flight plan-based algorithms and the uncertainty analysis algorithms. The flight plan-based algorithms are traditional air traffic flow prediction approaches, which mainly use rule-based knowledge in the aviation system to infer the future air traffic flow (Lin and Zhang et al., 2018). For example, Chen et al. (2013) utilize the real-time flight plan data in the EuroCat-X system to make air traffic forecasts on positioning points, airports, routes, sectors, etc. The model makes good use of the powerful computing abilities and abundant information resources of the EuroCat-X system, and gets the latest information on aircraft movements directly from air traffic management, so that the real-time performance can be improved. The uncertainty analysis algorithms focus on the uncertainty in the flow evolution process, and model uncertain factors to predict flight flow (Tian and Zhang et al., 2016; Steiner and Krozel et al., 2009; Xu and Yin et al., 2015). Zerrouki and Bouchon et al. (1999) propose a modified interaction prediction method using a fuzzy model for predicting air traffic flow, in which a fuzzy constraint is applied to describe uncertain information in the air traffic system, yielding an air traffic flow prediction model based on the fuzzy constraint. Meyn (2002) establishes a simple and efficient probabilistic method for air traffic demand forecast. The method analyzes the uncertain factors affecting air traffic and models how aircraft interact using the uncertainties, which provides better flow estimates. The flight plan-based algorithms and the uncertainty analysis algorithms explicitly model flow variation patterns. However, in reality, traffic flow data is affected by many factors, which makes it difficult to obtain accurate traffic prediction models. The existing models cannot accurately describe the changes of traffic flow in complex realistic environments, and the construction of these models requires high computational complexity.

Secondly, the data-driven approaches infer trends based on statistical patterns in the data, which are ultimately used to predict and assess the traffic conditions. Such methods do not analyze the physical characteristics and dynamic behavior of the traffic system, so they are highly flexible. Early data-driven methods include time series algorithms, which regard the historical air traffic flow as a time series and fit the observed time series into a parametric model to predict future traffic flow. Since the flight flow of sectors for a certain period is affected by the flight flow of the preceding several hours, the flow data of those hours can be used to predict the flow of the subsequent period. As early as 1976, the autoregressive integrated moving average model (ARIMA) was proposed and became the most widely used time series model, which is a regression method using its own historical data (Vardaro and Doan et al., 2013; Cadenas Rivera et al., 2016; Mehrmolaei and Keyvanpour, 2016). To improve the prediction precision, different variants have been produced, including subset ARIMA (Lee and Fambro, 1999), seasonal ARIMA (Fabian et al., 2003), and so on, which further improve the prediction performance. With the development of artificial intelligence, machine learning methods have become representative of data-driven methods. More researchers are trying to use machine learning models to predict air traffic flow. These methods use smarter statistical models, which enable computer systems to find more complex patterns in large amounts of data. The machine learning algorithms, as typical data-driven approaches, include support vector machines (SVM) and shallow neural networks (Wang and Liang et al., 2018). For example,
Fig. 1. (a) The spatio-temporal structure of air traffic flow, where the sector structure and flow at each time slice forms a graph. (b) The flow vector generated by
each node on the sector network.
Zhang and Jiang et al. (2016) use support vector machines to predict air traffic flow, which improves real-time monitoring and control in terminal areas. Qiu and Li (2014) propose a prediction method based on the wavelet neural network, which uses a nonlinear wavelet to replace the nonlinear activation function in the conventional neural network. The development of these methods demonstrates that machine learning is a powerful tool in ATFP.

However, none of the above methods considers spatial correlation. To capture the spatial correlation of the air traffic flow, some end-to-end deep learning models have been proposed, which encode the air traffic flow into a traffic flow matrix (TFM), and then extract spatial and temporal dependencies by CNN and LSTM (Lin and Zhang et al., 2019; Liu and Lin et al., 2019). However, there are still limitations in these ATFP methods. Air traffic controllers manage air traffic flow in units of sectors, so modelling air traffic as a grid matrix is not suitable for flow prediction of airspace sectors, which have irregular topology.

In contrast, in road traffic flow prediction, many researchers have made great efforts to solve the above problems. For the spatial dependencies, recent research shows the effectiveness of the graph convolution network (GCN) in irregular data embedding (Kipf and Welling et al., 2016; Yao and Tian and Zhang, 2016; Cai et al., 2021; Chen et al., 2021). GCN generalizes CNN to irregular data forms, and existing variants can be mainly divided into two categories according to the convolution operator. One rearranges the vertices into grid forms and processes them by normal convolution operations (Niepert et al., 2016); the other uses the graph Fourier transform to convert vertices into the spectral domain (Bruna et al., 2013; Defferrard et al., 2016; Kipf and Welling et al., 2016). The latter applies convolutions in spectral domains by introducing the spectral framework, which has been widely used in existing works (Chai et al., 2019). Yu and Yin et al. (2017) propose a spatio-temporal graph convolutional network (STGCN), which formulates the traffic prediction problem on graphs and builds the model with complete convolutional structures, with faster training speed and fewer parameters. To capture the varying spatial dependencies among traffic data, a dynamic spatio-temporal graph convolutional neural network (DSTGCN) is proposed, which can model dynamic spatial dependencies by finding the changes of the Laplacian matrix. Zhao and Song et al. (2020) design a temporal graph convolutional network (T-GCN) model, which combines the graph convolutional network (GCN) and the gated recurrent unit (GRU). In T-GCN, the GCN is used to learn complex topological structures for capturing spatial dependencies and the GRU is used to learn dynamic changes of traffic data for capturing temporal dependencies. It is worth noting that strategies for constructing graphs often differ between tasks. Some directly use distance information to construct graphs (Yu and Yin et al., 2017), and others construct graphs with similarity or correlation (Liu and Chen et al., 2020). However, these strategies of constructing graphs in existing methods are not suitable for flight data, since there are scheduled flight plans between some long-distance sectors. For example, there are many scheduled flights between Beijing Capital Airport and Shanghai Hongqiao Airport. Although these two sectors are far away from each other, there is still a great spatial dependency between them.

For the temporal dependencies, Wu and Tan (2016) present a deep architecture combining CNN and LSTM to predict traffic flow. Shi and Chen et al. (2015) propose a convolutional LSTM (ConvLSTM) by extending the fully connected LSTM (FC-LSTM) to build a trainable model for end-to-end precipitation nowcasting problems. Furthermore, attention mechanisms are widely applied to various domains due to their high efficiency and flexibility in modelling temporal dependencies (Vaswani and Shazeer et al., 2017; Devlin and Chang et al., 2019; Park and Lee et al., 2020; Zheng and Fan et al., 2020), and the core idea is to adaptively focus on the most relevant features according to the input data.

In summary, in the field of ATFM, there is a lack of satisfactory performance in capturing the complex spatial and temporal correlation of the airspace. Therefore, for temporal dimensional modelling, we employ a temporal attention (TA) module to directly access past features in a long input sequence, which contributes to mining complex temporal dependencies. For spatial dimensional modelling, although GCN has been proven to be suitable for processing data with irregular structure, how to construct the graph reasonably in GCN is still an open problem. Thus, we propose a dual-graph module specifically for the characteristics of air traffic flow. Specifically, since adjacent sectors have flow transfer characteristics, an adjacency graph based on flight trajectory is designed. Since there are flow dependencies in origin-destination (OD) sectors, an OD graph based on OD pairs is proposed. In addition, considering that aircraft trajectories have obvious path dependencies, we design a path-based spatial embedding module to capture the long path dependencies.

3. Problem definition

3.1. Air traffic flow

Air traffic flow is defined as the number of aircraft at a certain time in a specific area. In air traffic flow management, the sectors are a number of polygonal areas of different geographical sizes. Since air traffic flow management is based on sectors, we focus on the number of aircraft in sectors in this paper. Specifically, we construct a graph to model the national air traffic flow transfer pattern, where each sector is a node of the graph. Each node has the property of traffic flow, and each edge represents the relationship between different nodes. Putting it into
Fig. 2. The framework of the proposed TAaDGCN. GCN: Graph Convolution Network; FC: Fully Connected.
mathematical terms, the whole set of airspace sectors is regarded as a graph structure, and each sector is a node $v \in V$ of the graph, where $V$ represents the sector set, as shown in Fig. 1 (a). Each node on the network generates a flow vector $X_i = [X_i^{t_0}, X_i^{t_1}, \ldots, X_i^{t_j}, \ldots]$, as shown by the solid lines in Fig. 1 (b). The flow of all sectors at the $t_j$-th time slice is represented as $X^{(t_j)} = [X_1^{t_j}, X_2^{t_j}, \ldots, X_i^{t_j}, \ldots, X_N^{t_j}] \in \mathbb{R}^{N}$, where $N$ is the number of sectors and $X_i^{t_j} \in \mathbb{R}^{1}$ represents the air traffic flow of the $i$-th sector at the $t_j$-th time slice.

3.2. Air traffic flow prediction

Given the historical flight flow of all sectors with a fixed length of time, the ATFP problem can be defined as a multi-step spatio-temporal prediction problem for all airspace sectors, i.e., learning a function $f: \mathbb{R}^{N \times P} \rightarrow \mathbb{R}^{N \times Q}$ that maps the flight flow of $P$ historical time steps $X^{(t_j-P+1)}, \ldots, X^{(t_j)}$ to the flight flow in the next $Q$ time steps $X^{(t_j+1)}, \ldots, X^{(t_j+Q)}$:

$$X^{(t_j+1)}, \ldots, X^{(t_j+Q)} = f\left(X^{(t_j-P+1)}, \ldots, X^{(t_j)}\right), \tag{1}$$

where $X^{(t_j+1)}, \ldots, X^{(t_j+Q)}$ are the output of our model, and $X^{(t_j+Q)} = [X_1^{t_j+Q}, X_2^{t_j+Q}, \ldots, X_i^{t_j+Q}, \ldots, X_N^{t_j+Q}] \in \mathbb{R}^{N}$ is the flow of all sectors at the $(t_j+Q)$-th time slice. In ATFP, the spatial dependencies between different sectors should be considered to improve prediction accuracy.

4. Method

To model the temporal and spatial dependencies of air traffic flow and predict the flight flow of all sectors across the country, we propose a novel temporal attention aware dual-graph convolution network (TAaDGCN), and the overall architecture of the proposed model is shown in Fig. 2. Considering the flow transfer characteristics between adjacent sectors and the flow dependencies of origin-destination (OD) sectors, we design a complementary dual-graph module to capture the adjacent dependencies and OD dependencies. In addition, we employ a temporal attention (TA) module for modelling the evolution of temporal information. Due to the long-path dependencies of air traffic flow, a path-based spatial embedding (SE) is designed, and the TA module is implemented after concatenating the input and the SE. Specifically, the input is firstly concatenated with the generated spatial embedding vectors, and then fed into the encoder-decoder architecture. In the encoder and the decoder, to take multiple spatial and temporal dependencies into account, $L$ ST blocks are designed. Each ST block consists of a novel complementary dual-graph module and two temporal attention mechanisms. Between the encoder and the decoder, a transform layer is added to model the relationships between the $P$ historical time steps and the $Q$ future time steps.
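The $P$-step input and $Q$-step target of formula (1) can be illustrated with a short sliding-window sketch. It is not taken from the paper: the function name `make_windows`, the time-first array layout (one row per time slice, one column per sector) and the toy data are all assumptions made for illustration.

```python
import numpy as np

def make_windows(flow, P, Q):
    """Split a flow matrix of shape (T, N) into (input, target) pairs.

    Each input holds P consecutive time slices and each target holds the
    Q slices that immediately follow, mirroring formula (1).
    """
    T = flow.shape[0]
    n_samples = T - P - Q + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested P and Q")
    inputs = np.stack([flow[s:s + P] for s in range(n_samples)])
    targets = np.stack([flow[s + P:s + P + Q] for s in range(n_samples)])
    return inputs, targets

# Toy data (hypothetical): 10 time slices, 3 sectors.
flow = np.arange(30, dtype=float).reshape(10, 3)
X, Y = make_windows(flow, P=4, Q=2)   # X: (5, 4, 3), Y: (5, 2, 3)
```

Each sample pairs one history window with the window that follows it, so a series of $T$ slices yields $T - P - Q + 1$ training pairs under this layout.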
Fig. 3. Dual-graph. (a) The adjacency graph based on flight trajectory (b) The OD graph based on OD pairs.
By stacking ST blocks, our model can effectively model the complex temporal and spatial dependencies of air traffic flow, and can achieve state-of-the-art performance. The details of each component are described as follows:

4.1. Spatial dual-graph

Graph generation plays an important role in tasks that use GCN (Bruna et al., 2013; Defferrard et al., 2016; Kipf and Welling et al., 2016). Although it has been proven that GCN can effectively capture irregular data relationships, how to construct graphs is still an unsolved problem, especially in the air traffic field. The weight between nodes on a graph represents the dependency and mutual influence. Since any sector has significant dependencies with its neighbors, distance is usually used to calculate the weight between nodes in road traffic, and a short distance represents a large weight, which is very reasonable in many fields. However, this is not the case in air traffic. For air traffic flow, there are two dependencies worth considering. Firstly, the dependency between adjacent sectors needs to be considered. Due to the directionality of air traffic flow, the influence between different adjacent sectors is very different. Secondly, due to the scheduled planning of air traffic, some origin-destination sector pairs will also have great dependency although they are far apart. Therefore, we construct a spatial dual-graph structure specifically to capture the two dependencies between sectors more effectively, which includes the adjacency graph and the OD graph. Specifically, the adjacency graph is constructed according to the historical flight trajectories, while the OD graph modelling long-distance sector relationships is constructed according to the origin-destination pairs in the flight plan, as shown in Fig. 3. From these two different perspectives, the two complementary graphs can capture the influence between the short-distance adjacent sectors and the potential influence between the long-distance OD sectors.

In this section, we firstly describe how to construct the adjacency graph based on the historical flight trajectories $G_{Adj} = (V, E_{Adj}, W_{Adj})$ and the OD graph based on OD pairs $G_{OD} = (V, E_{OD}, W_{OD})$. $V$ is the set of nodes ($|V| = N$) and each node represents a sector. $E_{Adj}$ and $W_{Adj}$ represent the edges between nodes and the weights between nodes, respectively. Both $G_{Adj}$ and $G_{OD}$ share the same nodes, but have different edges and weights.

4.1.1. The adjacency graph based on flight trajectory

In view of the air traffic flow pattern, the flight flow between adjacent sectors may affect each other. However, the impacts between different adjacent sectors are very distinct due to the directionality of flight flow. For example, although Taiyuan and Shijiazhuang, as well as Taiyuan and Hohhot, are adjacent sectors, the flow of the former pair is larger than that of the latter due to the air traffic flow pattern. Naturally, the weight of the former must be much larger than the latter. Following this idea, we build the adjacency graph based on historical flight trajectories by using the number of flights from one sector to another. Specifically, suppose there are $N_f$ historical flight trajectories $[F_1, F_2, \ldots, F_i, \ldots, F_{N_f}]$, and the $i$-th trajectory $F_i$ can be represented as $[f_i^{0}, f_i^{1}, \ldots, f_i^{m_i}]$, where $f_i^{0}$ represents the origin route point of the trajectory, $f_i^{m_i}$ represents the destination route point, and $m_i$ represents the number of route points of $F_i$. Then, the number of flights $Flow_{Adj}(v_a, v_b)$ between sector $v_a$ and sector $v_b$ can be calculated as formula (2), and the number of flights between any two sectors $Flow_{Adj} \in \mathbb{R}^{N \times N}$ can be calculated by traversal.

$$Flow_{Adj}(v_a, v_b) = \sum_{i=1}^{N_f} \sum_{j=0}^{m_i} y, \quad y = \begin{cases} 1, & f_i^{j} \in v_a,\ f_i^{j+1} \in v_b \\ 0, & \text{others} \end{cases}, \tag{2}$$

$$W_{Adj}(v_a, v_b) = Flow_{Adj}(v_a, v_b) \Big/ \sum_{\alpha=1}^{N} Flow_{Adj}(v_a, v_\alpha). \tag{3}$$

Finally, the weight matrix of the adjacency graph, $W_{Adj}$, is defined as the normalized number of flights as in formula (3), and the diagonal values of $W_{Adj}$ are directly set to 0.

4.1.2. The OD graph based on OD pairs

There are scheduled flight plans between some long-distance sectors. Therefore, although some origin-destination sector pairs are far apart, they also have a great dependency. For example, there are many scheduled flights between Beijing Capital Airport and Shanghai Hongqiao Airport. Although they are far apart, there is still a great dependency of traffic flow between them. Therefore, we utilize the origin-destination pairs to build the OD graph. Specifically, we use historical flight mission data $[F_1, F_2, \ldots, F_i, \ldots, F_{N_f}]$ to count the number of flights between any OD pair as in formula (4). Then, the weight matrix of the OD graph, $W_{OD}$, is defined as the normalized number of flights between OD sector pairs as in formula (5), and the diagonal values of $W_{OD}$ are directly set to 0.

$$Flow_{OD}(v_a, v_b) = \sum_{i=1}^{N_f} z, \quad z = \begin{cases} 1, & f_i^{0} \in v_a,\ f_i^{m_i} \in v_b \\ 0, & \text{others} \end{cases}, \tag{4}$$

$$W_{OD}(v_a, v_b) = Flow_{OD}(v_a, v_b) \Big/ \sum_{\alpha=1}^{N} Flow_{OD}(v_a, v_\alpha). \tag{5}$$

4.2. Path-based spatial embedding block

The dual-graph module reflects the relationship between the adjacent sectors and the long-distance OD sector pairs. However, the dependencies on the long path during flight cannot be captured. Therefore, a path-based spatial embedding module is proposed.
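The two weight matrices that the dual-graph module relies on, formulas (2) to (5), can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes every route point has already been mapped to the index of the sector containing it, so that a trajectory is a list of sector indices, and the function names and toy trajectories are hypothetical.

```python
import numpy as np

def adjacency_flow(trajectories, n_sectors):
    """Count transitions between consecutive route points (formula (2))."""
    flow = np.zeros((n_sectors, n_sectors))
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            flow[a, b] += 1
    return flow

def od_flow(trajectories, n_sectors):
    """Count flights per (origin sector, destination sector) pair (formula (4))."""
    flow = np.zeros((n_sectors, n_sectors))
    for traj in trajectories:
        flow[traj[0], traj[-1]] += 1
    return flow

def normalize_rows(flow):
    """Divide each row by its sum, then zero the diagonal (formulas (3) and (5))."""
    row_sums = flow.sum(axis=1, keepdims=True)
    # Rows without any outgoing flight stay all-zero instead of dividing by 0.
    weights = np.divide(flow, row_sums, out=np.zeros_like(flow), where=row_sums > 0)
    np.fill_diagonal(weights, 0.0)
    return weights

# Toy trajectories over 3 sectors (hypothetical data).
trajectories = [[0, 1, 2], [0, 1, 2], [0, 2], [1, 2]]
W_adj = normalize_rows(adjacency_flow(trajectories, 3))
W_od = normalize_rows(od_flow(trajectories, 3))
```

Both matrices are directed: a row describes where traffic leaving a sector goes, which matches the directionality argument of Section 4.1.1. Leaving rows without outgoing flights at zero is a choice of this sketch; the text does not say how such rows are handled.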
Fig. 4. Attention mechanism models temporal correlation.
The path-based spatial embedding module incorporates the long path information during flight into prediction models. Specifically, the proposed spatial embedding (SE) block encodes sector nodes as vectors that preserve the long path information.

The idea of the spatial embedding block is inspired by word2vec in natural language processing (NLP) (Mikolov and Chen et al., 2013; Le and Mikolov, 2014), which can encode each word into a vector while retaining the contextual meaning of the word. The proposed path-based spatial embedding block firstly performs a random walk on the constructed adjacency graph to obtain multiple paths, and then the obtained paths are used as the input of the skip-gram network, which is a shallow feedforward neural network and can embed each node as a vector while retaining the contextual dependencies in the path (Grover and Leskovec, 2016). The details of the random walk and the skip-gram network are described in the appendix. Through the above processes, the vector representation of each node can be obtained, while retaining the long path information. Subsequently, we obtain the spatial embedding, represented as $se_{v_i} \in \mathbb{R}^{d}$, where $v_i \in V$.

4.3. Spatio-temporal block

The spatio-temporal block is formed as a “sandwich” structure which includes two temporal attention layers and one spatial graph convolution layer in between. The attention layer is implemented for extracting temporal features, and the length of the time dimension is not changed by the attention layer.

4.3.1. Spatial dual-graph convolution

To fully exploit the dual-graph structure containing complex spatial correlation information, a novel spatial dual-graph convolution layer is proposed, as defined in formula (6):

$$H_{l+1} = \sigma\left(\coprod_{W \in \{W_{Adj}, W_{OD}\}} \Theta *_{G(W)} H_l\right), \tag{6}$$

where $H_l \in \mathbb{R}^{T \times N \times d}$ and $H_{l+1} \in \mathbb{R}^{T \times N \times d}$ represent the feature vectors of the $N$ nodes in layers $l$ and $l+1$ respectively, $N$ is the number of sectors, $T$ represents the time dimension, and $d$ is the feature dimension. $\sigma$ denotes the activation function, and $\coprod$ denotes the aggregation function, e.g., sum, max, average, etc. $W_{Adj}$ and $W_{OD}$ are the graph based on flight trajectories and the graph based on OD pairs, respectively. $\Theta$ represents the convolution kernel, and $*_{G(W)}$ represents the graph convolution operator on graph $W$ based on the concept of spectral graph convolution.

Spectral graph convolution extends the idea of the traditional convolution into the spectral domain to solve the feature extraction problem of the irregular structure graph. Taking the air traffic flow at time $t_j$ as an example, for a single graph $W \in \{W_{Adj}, W_{OD}\}$, suppose $x \in \mathbb{R}^{N}$ is a signal defined on the $t_j$-th time slice graph. The spectral convolution of the signal $x$ and the kernel $\Theta$ on graph $W$ can be defined as (Kipf and Welling et al., 2016):

$$\Theta *_{G(W)} x = \Theta(M_L)x = \Theta\left(U \Lambda U^{T}\right)x = U \Theta(\Lambda) U^{T} x, \tag{7}$$

$$M_L = I_n - D^{-\frac{1}{2}} W_{Adj} D^{-\frac{1}{2}} = U \Lambda U^{T} \in \mathbb{R}^{N \times N}, \tag{8}$$

where $M_L$ is the Laplacian matrix that can represent the graph $W$, $I_n$ is an identity matrix, $D$ is a diagonal degree matrix with $D_{ii} = \sum_{j} W_{ij}$, $U$ is the matrix of the eigenvectors of the normalized graph Laplacian matrix, and $\Lambda$ is the diagonal matrix of the eigenvalues of $M_L$.

The above graph convolution operation transforms both the signal $x$ and the kernel $\Theta$ into the Fourier spectrum domain, multiplies the transformed results, and then performs the inverse Fourier transform to obtain the final result of the graph convolution operation. However, the computational complexity of equation (7) is $O(n^{2})$. Therefore, the Chebyshev polynomial approximation is adopted to reduce the cost of equation (7) to $O(K|\varepsilon|)$:

$$\Theta *_{G(W)} x = \Theta(M_L)x \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{M}_L)x, \tag{9}$$

where $\theta_k \in \mathbb{R}^{k}$ is a vector of polynomial coefficients. $K$ is the kernel size of graph convolution, which determines the maximum radius of the convolution from the central node. $\tilde{M}_L = \frac{2 M_L}{\lambda_{max}} - I_n$, where $\lambda_{max}$ is the maximum eigenvalue of the Laplacian matrix. $T_k(\tilde{M}_L) \in \mathbb{R}^{N \times N}$ is the Chebyshev polynomial of order $k$. For multi-channel input $X^{(t_j)} \in \mathbb{R}^{N \times d}$, the graph convolution can be generalized by $\sum_{i=1}^{d} \Theta_i *_{G(W)} x_i$, where $x_i \in \mathbb{R}^{N \times 1}$ is the $i$-th feature channel of $X^{(t_j)}$.

Overall, the dual-graph convolution module takes $H_l \in \mathbb{R}^{T \times N \times d}$ as input. For all $T$ time slices $H_{lt} \in \mathbb{R}^{N \times d}$ and $H_{lt}^{adj} \in \mathbb{R}^{N \times d}$, the same graph convolution operation with the same parameters is imposed in parallel, and then $T \times H_{lt}^{OD} \in \mathbb{R}^{N \times d}$ and $T \times H_{lt}^{adj} \in \mathbb{R}^{N \times d}$ can be calculated using the OD graph and the adjacency graph, respectively. Subsequently, the output of the OD graph convolution $H_l^{OD} \in \mathbb{R}^{T \times N \times d}$ is obtained by concatenating the $T$ slices $H_{lt}^{OD} \in \mathbb{R}^{N \times d}$ on the time dimension, while the output of the adjacency graph convolution $H_l^{adj} \in \mathbb{R}^{T \times N \times d}$ is obtained by concatenating the $T$ slices $H_{lt}^{adj} \in \mathbb{R}^{N \times d}$.
$T \times H^{adj}_{lt} \in \mathbb{R}^{N \times d}$. The final output of the dual-graph convolution module $H^{dual}_{l} \in \mathbb{R}^{T \times N \times d}$ is obtained by adding $H^{OD}_{l}$ and $H^{adj}_{l}$ as follows:

$$H^{dual}_{l} = H^{OD}_{l} + H^{adj}_{l}. \tag{10}$$

4.3.2. Temporal attention
In the temporal dimension, the flight flow of a certain sector is related to its previous observations, and the correlation changes nonlinearly with the time step. To mine complex temporal dependencies, a temporal attention (TA) module is employed. Specifically, the attention mechanism is used to adaptively assign different importance to the previous observations, as shown in Fig. 4. For node $v_i$, the hidden state of layer $H_l$ at step $t_j$ is directly determined by the information from time steps earlier than the $t_j$-th step, formulated as follows:

$$h^{(l)}_{v_i,t_j} = \sum_{t \in \mathcal{N}_{t_j}} \beta_{t_j,t} \cdot f_{Value}\big(h^{(l-1)}_{v_i,t}\big), \tag{11}$$

where $h^{(l)}_{v_i,t_j}$ represents the hidden state of layer $H_l$ at step $t_j$, $\mathcal{N}_{t_j}$ represents the set of time steps before $t_j$, and $f_{Value}(\cdot)$ represents a learnable transform, which corresponds to the value vector in the attention. $\beta_{t_j,t}$ represents the attention score, defined as the correlation between time steps $t_j$ and $t$:

$$u_{t_j,t} = \frac{\big\langle f_{Key}\big(h^{(l-1)}_{v_i,t}\big),\, f_{Query}\big(h^{(l-1)}_{v_i,t_j}\big) \big\rangle}{\sqrt{d}}, \tag{12}$$

$$\beta_{t_j,t} = \frac{\exp(u_{t_j,t})}{\sum_{t_r \in \mathcal{N}_{t_j}} \exp(u_{t_j,t_r})}, \tag{13}$$

where $f_{Key}$ and $f_{Query}$ denote two learnable transforms, which correspond to the key vector and the query vector in the attention, respectively (Vaswani and Shazeer et al., 2017). $\langle \cdot, \cdot \rangle$ represents the dot product operation, and $d$ is a scaling factor. In practice, the attention function is calculated simultaneously on a set of query vectors that are packed into a matrix $M_Q$. Similarly, key vectors and value vectors are also packed into matrices $M_K$ and $M_V$, respectively. Therefore, $h^{(l)}_{v_i,t_j}$ can be computed as:

$$h^{(l)}_{v_i,t_j} = \mathrm{softmax}\left(\frac{M_Q M_K^{T}}{\sqrt{d}}\right) M_V. \tag{14}$$

In this way, our model can adaptively select the relevant hidden states and model the complex temporal correlation. Since attention can directly access the past information in long input sequences, our model performs well in modelling long-term dependencies. Since formula (14) is applied to each time slice, the length of the time dimension is not changed after extracting features in the attention layer.

4.4. Encoder-decoder

As shown in Fig. 2, $X \in \mathbb{R}^{P \times N}$ is first concatenated with the generated spatial embedding vectors, and then converted to $H_0 \in \mathbb{R}^{P \times N \times d}$ by two fully connected layers. Next, $H_0 \in \mathbb{R}^{P \times N \times d}$ is fed to the encoder. The encoder consists of $L$ spatio-temporal blocks with residual connections, in which the residual structure is conducive to the back propagation of errors. Since the spatio-temporal blocks do not change the dimensions of the input feature, the encoder encodes the input data as $H_L \in \mathbb{R}^{P \times N \times d}$. Following the encoder, a transform layer is designed to change the time dimension from the historical sequence features $H_L \in \mathbb{R}^{P \times N \times d}$ into the future sequence features $H_{L'} \in \mathbb{R}^{Q \times N \times d}$. The transform layer is composed of two convolution layers, in which the first convolution layer is followed by a ReLU activation function, and the second convolution directly generates the output of the transform layer. Through the transform layer, the time dimension is transformed from $P$ to $Q$. Next, the future sequence feature $H_{L'} \in \mathbb{R}^{Q \times N \times d}$ is input to the decoder, which has the same structure as the encoder, and the predicted value $\hat{Y} \in \mathbb{R}^{Q \times N}$ is finally generated after several spatio-temporal blocks and two fully connected layers. Our TAaDGCN is trained by minimizing the mean square error (MSE) between the ground truth $Y_{i,t_j}$ and the predicted value $\hat{Y}_{i,t_j}$:

$$loss = \frac{1}{NQ} \sum_{i=1}^{N} \sum_{t_j=1}^{Q} \big(Y_{i,t_j} - \hat{Y}_{i,t_j}\big)^2. \tag{15}$$

5. Experiments

5.1. Experiments settings

5.1.1. Dataset
The original data is provided by the Aviation Data Communication Corporation (ADCC), China. It mainly includes dynamic flight operation data and static airspace data. The flight operation data covers the flight plan data and the real trajectory data from September 1, 2018 to November 31, 2018, with an average daily number of flights of 18,000. The flight plan data includes airports, planned flight routes, planned departure and arrival time, real departure and arrival time, etc. The real trajectory data is composed of flight mission ID, route point names, latitude and longitude, flight level, speed, etc. The airspace data includes sector names, longitudes and latitudes of 1770 route points and 302 airports, as well as the boundaries of various sectors. Along the temporal dimension, the data is further divided into three parts: 70% for training, 10% for validation and 20% for testing.

5.1.2. Evaluation metrics
To demonstrate the performance of the proposed model, three widely used metrics are applied, i.e., Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), which are defined as:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|, \tag{16}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \tag{17}$$

$$MAPE = \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{\hat{y}_i} \right| \times \frac{100\%}{n}, \tag{18}$$

where $n$ is the number of testing samples, and $\hat{y}_i$ and $y_i$ denote the predicted value and the ground truth of air traffic flow, respectively. In addition, we use the range of prediction results at different time steps to analyze the multi-step prediction performance of different methods, which is calculated as the difference between the maximum prediction error and the minimum prediction error over the time steps.

5.2. Baselines

We compare the proposed method with four other flow prediction methods: ARIMA, a well-known model used to predict future values in a time series (Mehrmolaei and Keyvanpour, 2016); support vector regression (SVR), a single-sector traffic prediction model based on the support vector machine (Zhang and Jiang et al., 2016); the spatio-temporal graph convolutional network (STGCN), which combines spatial graph convolutional layers and temporal gated convolutional layers (Yu and Yin et al., 2017); and GMAN, a graph multi-attention network for traffic prediction (Zheng and Fan et al., 2020).
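The three metrics of Eqs. (16)-(18) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' evaluation code: the function names, the `relative_to` switch and the toy flow values are assumptions made here. The default MAPE follows Eq. (18) as printed, which normalizes by the predicted value; normalizing by the ground truth is another common convention and is offered as an option.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error, Eq. (16)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error, Eq. (17)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred, relative_to="pred"):
    """Mean Absolute Percentage Error in percent, Eq. (18).

    relative_to="pred" divides by the predicted value (Eq. (18) as printed);
    relative_to="truth" divides by the ground truth (assumed alternative).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = p if relative_to == "pred" else y
        total += abs((y - p) / denom)
    return 100.0 * total / len(y_true)


# Toy sector flows (hypothetical values, not taken from the dataset).
y_true = [10.0, 20.0, 40.0]
y_pred = [8.0, 25.0, 50.0]

print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because RMSE squares each error before averaging, a single large miss raises it far more than it raises MAE, which is why the two metrics are reported side by side.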
Table 1
The experimental results (Average ± Standard Deviation) of the proposed TAaDGCN and four other comparison methods.
Metric  Method   20 min        40 min        60 min        80 min        100 min       120 min       Range
MAE     TAaDGCN  5.77 ± 0.01   5.91 ± 0.01   5.99 ± 0.01   6.02 ± 0.02   6.07 ± 0.03   6.20 ± 0.05   0.43
MAE     STGCN    5.83 ± 0.03   6.23 ± 0.05   6.44 ± 0.08   6.69 ± 0.07   6.99 ± 0.10   7.34 ± 0.22   1.51
MAE     GMAN     7.39 ± 0.09   6.75 ± 0.08   6.35 ± 0.06   6.36 ± 0.03   6.77 ± 0.07   7.44 ± 0.10   1.05
MAE     SVR      7.60 ± 0.00   9.17 ± 0.01   10.24 ± 0.02  11.30 ± 0.03  12.30 ± 0.04  13.27 ± 0.05  5.67
MAE     ARIMA    8.39          10.77         12.63         14.65         16.80         18.98         10.59
RMSE    TAaDGCN  8.04 ± 0.03   8.26 ± 0.01   8.34 ± 0.01   8.38 ± 0.02   8.45 ± 0.03   8.61 ± 0.06   0.57
RMSE    STGCN    8.19 ± 0.05   8.72 ± 0.10   8.99 ± 0.13   9.31 ± 0.12   9.75 ± 0.17   10.35 ± 0.37  2.16
RMSE    GMAN     10.25 ± 0.16  9.28 ± 0.13   8.74 ± 0.09   8.78 ± 0.03   9.36 ± 0.05   10.37 ± 0.11  1.63
RMSE    SVR      10.48 ± 0.02  12.60 ± 0.04  14.10 ± 0.06  15.60 ± 0.09  16.99 ± 0.12  18.28 ± 0.16  7.8
RMSE    ARIMA    11.34         14.22         16.33         18.68         21.09         23.54         12.2
MAPE    TAaDGCN  0.23 ± 0.00   0.24 ± 0.00   0.25 ± 0.01   0.26 ± 0.01   0.26 ± 0.01   0.28 ± 0.01   0.05
MAPE    STGCN    0.23 ± 0.01   0.25 ± 0.01   0.27 ± 0.01   0.29 ± 0.02   0.32 ± 0.03   0.36 ± 0.04   0.13
MAPE    GMAN     0.36 ± 0.01   0.30 ± 0.01   0.27 ± 0.01   0.28 ± 0.01   0.31 ± 0.01   0.37 ± 0.01   0.10
MAPE    SVR      0.32 ± 0.00   0.41 ± 0.00   0.49 ± 0.00   0.58 ± 0.00   0.66 ± 0.00   0.74 ± 0.01   0.42
MAPE    ARIMA    0.35          0.49          0.64          0.81          0.99          1.18          0.83
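The Range column of Table 1 is the difference between the largest and the smallest error over the six horizons. As a small sketch, the snippet below recomputes it for two MAE rows copied from the table and also derives a relative error reduction at the 120 min horizon. The helper names `error_range` and `relative_improvement` are hypothetical and introduced only for illustration.

```python
# Mean MAE per horizon (20, 40, ..., 120 min), copied from Table 1.
mae_by_horizon = {
    "TAaDGCN": [5.77, 5.91, 5.99, 6.02, 6.07, 6.20],
    "STGCN": [5.83, 6.23, 6.44, 6.69, 6.99, 7.34],
}


def error_range(errors):
    """Range of a multi-step error profile: max error minus min error."""
    return max(errors) - min(errors)


def relative_improvement(baseline, ours):
    """Error reduction relative to a baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline


for method, errors in mae_by_horizon.items():
    print(method, round(error_range(errors), 2))

# MAE at 120 min: STGCN versus TAaDGCN.
print(round(relative_improvement(mae_by_horizon["STGCN"][-1],
                                 mae_by_horizon["TAaDGCN"][-1]), 2))
```

A small range means the error grows little as the horizon lengthens, which is the property the Range column is meant to summarize.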
Fig. 5. Predicted performance of all sectors at different times.
Fig. 6. Number of sectors in different RMSE error intervals for the proposed TAaDGCN and the five variants.
5.3. Experimental results

5.3.1. Model comparison
To demonstrate the prediction performance of the proposed method, Table 1 shows the experimental results of five different methods on the real aviation data set for the next 20 min (Q = 1), 40 min (Q = 2), 60 min (Q = 3), 80 min (Q = 4), 100 min (Q = 5) and 120 min (Q = 6). To compare the variances of different methods, we list the average and standard deviation of each metric over 5 runs. As shown in Table 1, our method has the following advantages: (1) Low error value. Whether compared with the traditional methods or the deep learning methods, our method achieves the lowest error value. The improvement is significant compared with non-deep learning methods, because non-deep learning methods ignore the vital spatial information. In addition, our method achieves the best performance among the deep learning methods, which is due to the consideration of the characteristics of air traffic in our method. (2) More robust. In terms of stability, although the traditional methods are more stable, their large prediction error is not suitable for practical applications. In contrast, deep learning methods can obtain better but unstable results since they have a large number of parameters. However, our method combines the characteristics of dual graphs for capturing multiple spatial and temporal dependencies, and can still achieve more robust results than other deep learning methods. (3) Significant advantage in error range. Our method obtains better multi-step prediction accuracy, especially when the prediction interval is long. For example, when the prediction interval is 120 min, MAE is decreased by 1.14 (improved by 15.53%) and RMSE is decreased by 2.19 (improved by 21.16%) compared with the best baseline model. Besides, our method achieves the lowest range of prediction results on different time steps, which means that the difference
between the maximum prediction error and the minimum prediction error on different time steps is small. In practical applications, the long-term horizon prediction is more important, since it provides air traffic controllers with more time to dispatch aircraft and eliminate potential conflicts. (4) Fewer abnormally large errors and low relative error. MAE is defined as the average of the absolute value of the error, which measures the actual error, while RMSE is defined as the square root of the mean of squared errors, which magnifies the effect of larger errors and is extremely susceptible to outliers. As shown in Table 1, the MAE value of the proposed method is smaller than that of the other comparison methods, which shows that the proposed method predicts more accurately. The smaller RMSE value shows that the proposed method produces fewer abnormally large errors, i.e., it is more stable than the other comparison methods. In addition, the proposed TAaDGCN achieves the lowest MAPE values, which shows that the prediction error is the smallest relative to the original values.

Fig. 7. The performance of the proposed TAaDGCN with TAaDGCN-NG1 and TAaDGCN-NG2 in all sectors.

5.3.2. Variant comparison
To analyze the effect of each component in the proposed model, we evaluate five variants by removing graph convolution, temporal attention, SE block, adjacency graph, and OD pairs graph from our TAaDGCN, respectively, called TAaDGCN-NGCN (without graph convolution), TAaDGCN-NAt (without temporal attention), TAaDGCN-NSE (without spatial embedding), TAaDGCN-NG1 (without adjacency graph), and
TAaDGCN-NG2 (without OD graph).

1) Evaluation on all variants

We first evaluate the prediction performance of all sectors at different times, as shown in Fig. 5, and we observe that no matter which component is removed, prediction errors increase, indicating that each component plays an essential role in the entire model. Specifically, the results of TAaDGCN-NGCN are the worst, mainly because the graph convolution plays a very important role in extracting spatial structure features. In addition, the error slope of TAaDGCN-NAt increases significantly when Q is greater than 4, since the attention mechanism, which directly accesses the past features in long input sequences, is more important in long-term prediction. Fig. 6 shows the number of sectors in different RMSE error intervals for the proposed TAaDGCN and the five variants. It shows that the proposed TAaDGCN has more sectors in smaller error intervals than the five variants.

2) Evaluation of dual-graph complementarity

To demonstrate the superiority of the proposed dual-graph convolution based on aerostatic characteristics, we compare the performance of the proposed TAaDGCN with TAaDGCN-NG1 and TAaDGCN-NG2 in all sectors, as shown in Fig. 7. It can be seen that the TAaDGCN with dual graphs achieves the lowest prediction error in all sectors. Furthermore, of the two variants, TAaDGCN-NG1 performs better on 52% of sectors, while TAaDGCN-NG2 performs better on 48% of sectors, which shows that the adjacency graph and the OD pair graph are crucial and complementary in air traffic flow prediction.

Fig. 8. Predicted air traffic flow and MAE error in three representative sectors, namely Beijing, Guangzhou, and Shanghai, when Q is 1.

5.4. Case study

5.4.1. Analysis of typical sector prediction results
To further investigate the proposed TAaDGCN, we selected three representative sectors to show the average prediction results for 24 consecutive hours. For comparison, two deep learning methods that consider the spatial dependencies, i.e., GMAN and STGCN, are used. Fig. 8 shows the predicted air traffic flow and MAE error in three representative sectors, namely Beijing, Guangzhou, and Shanghai, when Q is 1. The black solid lines represent the ground truth, while the blue, green and red solid lines represent the predicted flow of STGCN, GMAN and the proposed TAaDGCN, respectively. The blue, green and red histograms represent the MAE of STGCN, GMAN and the proposed TAaDGCN, respectively. It can be seen from Fig. 8 that the proposed TAaDGCN can
more effectively capture the dynamic changes of air traffic flow, and obtain a smaller prediction error at most hours compared with the other two deep learning methods.

5.4.2. Case analysis of model robustness
We have selected some cases to demonstrate that our model is robust to a sudden reduction of air traffic flow. Fig. 9 shows the flow curves of four sectors in the same period of time, where the yellow curve indicates the actual value of the flow and the blue curve indicates the predicted value of our model. The flow in sector A and sector B on September 15th is anomalous (compared to other days), while the flow of sector C and sector D is normal. It can be seen that the proposed model achieves accurate prediction results not only under normal conditions, but also under abnormal conditions.

Fig. 9. Flow curves of four areas (two areas with abnormal flow, two areas with normal flow) in a certain period of time.

6. Conclusions

This paper proposes a temporal attention aware dual-graph convolution network (TAaDGCN) to predict air traffic flow across national airspace sectors. Since there is an obvious spatial and temporal dependence of air traffic flow, we have modelled it from both temporal and spatial perspectives. For the spatial dimension, considering the flight flow transfer characteristics between adjacent sectors and the flow dependencies between origin-destination sector pairs, we have designed a dual-graph strategy including an adjacency graph and an origin-destination (OD) pair graph. The adjacency graph captures the directional dependencies between adjacent sectors, while the OD pair graph captures the potential influence between long-distance sectors due to the fixed flight schedules. In addition, considering the obvious long path
dependencies during flight, we have designed a path-based spatial embedding module to represent potential related sectors of flight traversal. For the temporal dimension, considering the difficulty of capturing long temporal dependencies, a temporal attention (TA) mechanism has been adopted and implemented after concatenating the input history flow and the path-based spatial embedding (SE), enabling better fusion of spatial long path and temporal information. By stacking the ST blocks in the encoder-decoder architecture, multiple spatial and temporal dependencies have been effectively modelled.
From the experimental results on real aviation datasets, we find that: 1) Compared with other models, our model achieves better prediction performance, which is reflected in a low error value, greater robustness, a significant advantage in error range, and fewer abnormally large errors. 2) Each module in the proposed method, i.e., the dual-graph convolution strategy, the path-based spatial embedding module and the temporal attention module, plays an indispensable role. In particular, the proposed dual-graph module based on spatial sector characteristics has superiority in air traffic flow prediction, where the adjacency graph and the OD pair graph are both crucial and complementary. 3) The proposed method is applied to a real-world case study, and the results have shown that our model is robust and can achieve accurate prediction results compared to other methods. In addition, since the proposed method captures the general patterns of air traffic and is independent of the selected data and spatial regions, the findings are transferable internationally.
Our findings have new managerial implications for air traffic management. Firstly, the flow prediction in most existing air traffic management systems relies on accurate flight plans that change over time. When the flight plan is changed, management strategies have to be adjusted accordingly, which is complex and time-consuming. The proposed method focuses on modelling data-driven spatial and temporal dependence, thus controllers can be aware of the trend of air traffic flow and make management strategies without an accurate flight plan. Secondly, the information sharing mechanism between regions in the existing air traffic management is not sufficient. The proposed method constructs a nationwide network that captures the global spatial and temporal correlation, enabling controllers to obtain the global impact when making management decisions. Finally, with the increase of air traffic flow, inaccurate flow prediction leads controllers to adopt conservative strategies, resulting in low airspace utilization and serious flight delays. At the same time, controllers with a heavy workload are prone to mistakes and omissions. Our study has mined the complex spatial-temporal dependencies of the air traffic environment, which are helpful for making more accurate predictions. By obtaining accurate information about the future air traffic flow distribution in advance, controllers are able to anticipate potential risks and then regulate flights through measures such as rerouting and ground holding.
However, the research findings in this paper have some limitations and are open to new directions for further study. Although the proposed method is able to capture the complex spatio-temporal dependencies in the current air traffic network, it requires reconstructing the graph structure and retraining the network if the airspace structure changes, since the designed dual graph is fixed. In the future, we will try to design dynamic graph structures to accommodate possible dynamic airspace changes.

Author statement

Kaiquan Cai: Conceptualization, Methodology, Validation, Supervision; Zhiqi Shen: Methodology, Software, Validation, Writing - Original Draft; Xiaoyan Luo: Methodology, Validation, Writing - Review & Editing; Yue Li: Software, Data Curation, Validation, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2021YFB2601700) and the Funds of the National Natural Science Foundation of China (Grant Nos. U2033215, U2133210).

Fig. 10. The schematic diagram of transition probability in the random walk.
Fig. 11. The overall framework of the skip-gram network.
Appendix
A Random Walk
Random walk takes any node on the graph as the starting node, and walks according to the transition probability between nodes to obtain some random paths (Grover and Leskovec, 2016). The core of the random walk is to define the probability of walking from the current node to the next node. As shown in Fig. 10, $v_i$, $v_{i-1}$ and $v_{i+1}$ represent the current node, the node traversed in the previous step and the next walk target, respectively. With the walk currently at node $c_i = v_i$, the next walk target $c_{i+1}$ is drawn from the following distribution:

$$P(c_{i+1} = v_{i+1} \mid c_i = v_i) = \begin{cases} \dfrac{\alpha_{pq}(v_{i-1}, v_{i+1}) \cdot W_{Adj}(v_i, v_{i+1})}{Z}, & \text{if } (v_i, v_{i+1}) \in E_{Adj} \\ 0, & \text{otherwise} \end{cases} \tag{19}$$

where $Z$ represents the normalization constant and $W_{Adj}(\cdot, \cdot)$ represents the weight between nodes. $\alpha_{pq}(v_{i-1}, v_{i+1})$ is the transition probability coefficient, defined as:

$$\alpha_{pq}(v_{i-1}, v_{i+1}) = \begin{cases} \dfrac{1}{p}, & \text{if the next walk target } v_{i+1} \text{ is node } v_{i-1} \\ 1, & \text{if node } v_{i+1} \text{ is connected to node } v_{i-1} \\ \dfrac{1}{q}, & \text{if node } v_{i+1} \text{ and node } v_{i-1} \text{ are not connected} \end{cases} \tag{20}$$

where $p$ and $q$ are hyperparameters. When $p$ is large, the walk is less likely to return to the visited node $v_{i-1}$. When $q$ is less than 1, the walk tends to visit nodes far away from node $v_{i-1}$, such as $v_{i+1}$.
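The transition rule of Eqs. (19) and (20) can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is stored as a dictionary of weighted neighbors, the toy graph is undirected with unit weights, and the names `alpha_pq`, `transition_probs` and the four node labels are hypothetical; the paper's adjacency graph and weights are not reproduced here.

```python
def alpha_pq(adj, prev, nxt, p, q):
    """Transition coefficient of Eq. (20) for a candidate next node."""
    if nxt == prev:
        return 1.0 / p          # step back to the previously visited node
    if nxt in adj[prev] or prev in adj[nxt]:
        return 1.0              # candidate is connected to the previous node
    return 1.0 / q              # candidate is not connected to the previous node


def transition_probs(adj, prev, cur, p, q):
    """Normalized transition distribution of Eq. (19) from node `cur`.

    Only neighbors of `cur` receive probability mass; every other node has
    probability 0 and is therefore omitted from the returned dictionary.
    """
    scores = {nxt: alpha_pq(adj, prev, nxt, p, q) * weight
              for nxt, weight in adj[cur].items()}
    z = sum(scores.values())    # normalization constant Z
    return {nxt: score / z for nxt, score in scores.items()}


# Toy undirected graph with unit weights (assumed for illustration).
adj = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 1.0, "B": 1.0},
    "D": {"B": 1.0},
}

# The walk moved A -> B; where does it go next with p = 2 and q = 0.5?
print(transition_probs(adj, prev="A", cur="B", p=2.0, q=0.5))
```

With these settings the unnormalized scores are 0.5 for returning to A, 1 for C and 2 for D, so the walk favors D, the neighbor that is not connected to the previous node A. This matches the statement that q below 1 pushes the walk outward.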
B The Skip-gram Network
Given a node path set composed of unlabeled nodes, the skip-gram network can generate a vector that expresses the context of the path for each node (Grover and Leskovec, 2016). Specifically, we use each current node as an input to a log-linear classifier with continuous projection layers, and predict the neighbor nodes of the current node in the same path. The overall framework is shown in Fig. 11. By training the network, we can obtain the weights of the hidden layers as the vector representation for each node, which encodes the context of the node in the path set.
To optimize the above network, the objective function is defined to maximize the log probability of the neighborhood $N_S(v)$ for a node $v$, given by $f$:

$$\max_{f} \sum_{v \in V} \log \Pr(N_S(v) \mid f(v)) \tag{21}$$

where $f$ is a mapping function and $V$ is the node set.

To make the optimization problem tractable, a conditional independence assumption and a symmetry assumption in feature space are made.
The conditional independence assumption states that the likelihood of observing a neighborhood node is independent of observing any other neighborhood node given the feature representation of the source node. Therefore, $\Pr(N_S(v) \mid f(v))$ can be factorized as:

$$\Pr(N_S(v) \mid f(v)) = \prod_{n_i \in N_S(v)} \Pr(n_i \mid f(v)) \tag{22}$$

The symmetry assumption in feature space models the conditional likelihood of each source-neighbor node pair as a softmax unit, which is parameterized by the dot product of their features:

$$\Pr(n_i \mid f(v)) = \frac{\exp(f(n_i) \cdot f(v))}{\sum_{u \in V} \exp(f(u) \cdot f(v))} \tag{23}$$

With the two assumptions, Formula (21) can be simplified as:

$$\max_{f} \sum_{v \in V} \Big[ -\log \sum_{u \in V} \exp(f(u) \cdot f(v)) + \sum_{n_i \in N_S(v)} f(n_i) \cdot f(v) \Big] \tag{24}$$

By using stochastic gradient descent over the model parameters, we can obtain the vector representation for each node and preserve the path semantics of the node.
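To make the softmax unit of Eq. (23) and the factorization of Eq. (22) concrete, the sketch below evaluates the log-likelihood of one neighborhood for a fixed embedding table. It is an illustration only: the embedding values, node names and function names are assumptions, the full softmax over all nodes is computed exactly, and no training step is shown.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_prob_neighbor(emb, v, n):
    """log Pr(n | f(v)) from Eq. (23), using a stable log-sum-exp."""
    scores = [dot(emb[u], emb[v]) for u in emb]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return dot(emb[n], emb[v]) - log_z


def neighborhood_log_likelihood(emb, v, neighbors):
    """log Pr(N_S(v) | f(v)) under the factorization of Eq. (22)."""
    return sum(log_prob_neighbor(emb, v, n) for n in neighbors)


# Hypothetical 2-dimensional embeddings f(.) for three nodes.
emb = {
    "A": [1.0, 0.0],
    "B": [0.8, 0.2],
    "C": [-1.0, 0.5],
}

# Log-likelihood of observing B as the only path neighbor of A.
print(neighborhood_log_likelihood(emb, "A", ["B"]))
```

Training would adjust the embeddings to increase this quantity summed over all nodes, as in Eq. (21). For large node sets the sum over all nodes in the denominator is expensive, which is why practical skip-gram implementations usually approximate it.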
References Cao, Y., Zhang, L., et al., 2013. An Air Traffic Prediction Model Based on Kernel Density
Estimation. American Control Conference. IEEE, pp. 6333–6338, 2013.
Chai, D., Wang, L., Yang, Q., 2019. Bike flow prediction with multi-graph convolutional
Bayen, A., Grieder, P., Tomlin, C., 2002. A control theoretic predictive model for sector-
networks. In: "[C]//Proceedings of the 26th ACM SIGSPATIAL International
based air traffic flow. AIAA Guidance, Navigation, And Control Conference And
Conference on Advances in Geographic Information Systems. 2018, pp. 397–400.
Exhibit 5011.
Chen, S., 2013. Short-term air traffic flow prediction based on run-time data of Eurocat-X
Bruna, Joan, Zaremba, Wojciech, Arthur Szlam, LeCun, Yann, 2013. Spectral Networks
system[J]. Inf. Commun. 2013 (5), 42–44.
and Locally Connected Networks on Graphs arXiv preprint arXiv:1312.6203.
Chen, X., Yu, H., Cao, K., et al., 2019. Uncertainty-aware flight scheduling for airport
Cadenas, E., Rivera, W., et al., 2016. Wind speed prediction using a univariate ARIMA
throughput and flight delay optimization[J]. IEEE Transactions on Aerospace and
model and a multivariate NARX model. Energies 9 (2), 109.
Electronic Systems 56 (2), 853–862, 2019.
Cai, Kai-Quan, et al., 2017. Simultaneous optimization of airspace congestion and flight
Chen, Jiatong, et al., 2021. An airspace capacity estimation model based on spatio-
delay in air traffic network flow management. IEEE Transactions on Intelligent
temporal graph convolutional networks considering weather impact. In: 2021 IEEE/
Transportation Systems 18 (11), 3072–3082.
AIAA 40th Digital Avionics Systems Conference (DASC). IEEE.
Cai, Kaiquan, et al., 2021. A deep learning approach for flight delay prediction through
time-evolving graphs. IEEE Transactions on Intelligent Transportation Systems.
13
Defferrard, M., Bresson, X., Vandergheynst, P., 2016. Convolutional neural networks on Rocha, M., 2018. Collaborative air traffic flow management: incorporating airline
graphs with fast localized spectral filtering[J]. Advances in neural information preferences in rerouting decisions[J]. Journal of Air Transport Management 71,
processing systems 29, 3844–3852. 97–107.
Devlin, J., Chang, M., et al., 2019. BERT: Pre-training of Deep Bidirectional Transformers Sandamali, G. G. N. and R. Su, et al. "Two-stage scalable air traffic flow management
for Language Understanding arXiv:1810.04805v2,2019, 24 May. model under uncertainty." IEEE Transactions on Intelligent Transportation Systems:
Fabian, X., Ban, G., Boussad, R., Breitenfeldt, M., Couratin, C., Delahaye, P., Durand, D., 1-13.
Finlay, P., Flchard, X., Guillon, B., Feb. 2003. Modelling and forecasting vehicular Shi, X., Chen, Z., et al., 2015. Convolutional LSTM Network: A Machine Learning
traffic flow as a seasonal arima process: theoretical basis and empirical results. Approach for Precipitation Nowcasting. NIPS, pp. 802–810, 2015.
Journal of Transportation Engineering 129 (6), 664–672. Sridhar, B., Soni, T., Sheth, K., et al., 2006. Aggregate flow model for air-traffic
Grover, A., Leskovec, J., 2016. node2vec: scalable feature learning for networks. KDD management[J]. Journal of Guidance Control & Dynamics 29 (4), 992–997, 2006.
2016, 855–864. Steiner, M., Krozel, J., 2009. Translation of Ensemble-Based Weather Forecasts into
Kipf, T.N., Welling, M., 2016. Semi-Supervised Classification with Graph Convolutional Probabilistic Air Traffic Capacity impact"[C]2009 IEEE/AIAA 28th Digital Avionics
Networks[J] arXiv preprint arXiv:1609.02907, 2016. Systems Conference. IEEE, 2009: 2. D. 6-1-2. D. 6-7.
Kuhn, K.D., 2016. A methodology for identifying similar days in air traffic flow Tian, W., Zhang, Y., et al., 2016. Probabilistic demand prediction model for en-route
management initiative planning. Transportation Research Part C: Emerging sector. International Journal of Computer Theory and Engineering 8 (6), 495–499.
Technologies 69, 1–15. Vardaro, A., Doan, C.T., et al., 2013. Graph time-series mixture models for air traffic
Le, Q.V., Mikolov, T., 2014. Distributed representations of sentences and documents. In: prediction. In: IEEE. 2013 Integrated Communications, Navigation and Surveillance
International Conference on Machine Learning. PMLR, pp. 1188–1196. Conference (ICNS). IEEE, pp. 1–19, 2013.
Lee, S., Fambro, D., 1999. Application of subset autoregressive integrated moving Vaswani, A., Shazeer, N., Parmar, N., et al., 2017. Attention Is All You Need[J]. arXiv,
average model for short-term freeway traffic volume forecasting. Transportation 2017.
Research Record Journal of the Transportation Research Board 1678 (1), 179–188. Wang, W., Cai, K., et al., 2020. Analysis of the Chinese railway system as a complex
Li, Q., Zhang, Y., et al., 2016. A flow-based flight scheduler for en-route air traffic network. Chaos, Solitons & Fractals 130, 109408.
management. IFAC-PapersOnLine 49 (3), 353–358. Wang, Z., Liang, M., et al., 2018. A hybrid machine learning model for short-term
Li, W., Souza, B., Crespo, A., et al., 2008. Decision support system in tactical air traffic estimated time of arrival prediction in terminal manoeuvring area. Transportation
flow management for air traffic flow controllers[J]. Journal of Air Transport Research Part C: Emerging Technologies 95, 280–294.
Management 14 (6), 329–336. Wu, Y., Tan, H., 2016. Short-term Traffic Flow Forecasting with Spatial-Temporal
Lin, Y., Zhang, J., et al., 2018. An algorithm for trajectory prediction of flight plan based Correlation in a Hybrid Deep Learning Framework arXiv preprint arXiv:1612.01022,
on relative motion between positions. Frontiers of information technology & 2016.
electronic engineering 19 (7), 905–916. Xu, Y., Prats, X., et al., 2020. Synchronised demand-capacity balancing in collaborative
air traffic flow management. Transportation Research Part C: Emerging Technologies 114, 359–376.
Lin, Y., Zhang, J., et al., 2019. Deep learning based short-term air traffic flow prediction considering temporal–spatial correlation. Aerospace Science and Technology 93, 105113.
Liu, L., Chen, J., et al. Physical-virtual collaboration modelling for intra- and inter-station metro ridership prediction. IEEE Transactions on Intelligent Transportation Systems, 1–15.
Liu, H., Lin, Y., et al., 2019. Research on the air traffic flow prediction using a deep learning approach. IEEE Access 7, 148019–148030.
Mehrmolaei, S., Keyvanpour, M.R., 2016. Time series forecasting using improved ARIMA. In: 2016 Artificial Intelligence and Robotics (IRANOPEN). IEEE.
Meyn, L., 2002. Probabilistic methods for air traffic demand forecasting. In: AIAA Guidance, Navigation, and Control Conference and Exhibit, p. 4766.
Mikolov, T., Chen, K., et al., 2013. Efficient estimation of word representations in vector space. Computer Science.
Montlaur, A., Delgado, L., 2017. Flight and passenger delay assignment optimization strategies. Transportation Research Part C: Emerging Technologies 81, 99–117.
Murça, M.C.R., Hansman, R.J., 2018. Predicting and planning airport acceptance rates in metroplex systems for improved traffic flow management decision support. Transportation Research Part C: Emerging Technologies 97, 301–323.
Niepert, M., Ahmed, M., Kutzkov, K., 2016. Learning convolutional neural networks for graphs. In: ICML, pp. 2014–2023.
Park, C., Lee, C., et al., 2020. ST-GRAT: a novel spatio-temporal graph attention network for accurately forecasting dynamically changing road speed. In: Proceedings of the AAAI Conference on Artificial Intelligence.
Qiu, F., Li, Y., 2014. Air traffic flow of genetic algorithm to optimize wavelet neural network prediction. In: 2014 IEEE 5th International Conference on Software Engineering and Service Science. IEEE, pp. 1162–1165.
Xu, K., Yin, H., et al., 2015. Game theory with probabilistic prediction for conflict resolution in air traffic management. In: International Conference on Intelligent Systems & Knowledge Engineering. IEEE, 2016.
Yang, C., Mao, J., Qian, X., et al., 2018. Designing robust air transportation networks via minimizing total effective resistance. IEEE Transactions on Intelligent Transportation Systems, pp. 2353–2366.
Yu, B., Yin, H., et al., 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In: Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18. 2018.
Zerrouki, L., Bouchon-Meunier, B., Fondacci, R., 1999. Fuzzy System for Air Traffic Flow Management. Physica-Verlag HD.
Zhang, H., Jiang, C., et al., 2016. Forecasting traffic congestion status in terminal areas based on support vector machine. Advances in Mechanical Engineering 8 (9), 168781401666738.
Zhang, Z., Hao, Z., Gao, Z., 2015. A dynamic adjustment and distribution method of air traffic flow en-route. Journal of Air Transport Management 42, 15–20.
Zhao, L., Song, Y., et al., 2020. T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems 21 (9), 3848–3858.
Zheng, C., Fan, X., Wang, C., et al., 2020. GMAN: a graph multi-attention network for traffic prediction. Proceedings of the AAAI Conference on Artificial Intelligence 34 (1), 1234–1241.