Proc. of 2020 7th Int. Conf. on Information Tech., Computer, and Electrical Engineering (ICITACEE)

Predicting Future Potential Flight Routes via Inductive Graph Representation Learning
Arie Wahyu Wijayanto
Department of Computational Statistics
Politeknik Statistika STIS
Jakarta, Indonesia
ariewahyu@stis.ac.id

Farid Ridho
Department of Computational Statistics
Politeknik Statistika STIS
Jakarta, Indonesia
faridr@stis.ac.id

Abstract—Air transport is an important part of modern societies, which have a high demand for large-scale mobility. One of the fundamental challenges in air transportation is to predict future potential flight routes connecting newly developed cities around the world. In this paper, we treat flight route prediction from a graph theory perspective as a link prediction task, mapping existing airports to graph vertices and flight routes to graph links. We generate low-dimensional feature vectors of vertices by learning an embedding function. For each selected vertex in the input network, we sample its neighborhood and aggregate feature information from its neighbors, which lets the model predict and generalize to unseen links and vertices. We show the effectiveness of our prediction approach on the OpenFlights database, which contains 568 airlines with 65,535 routes between 3,425 airports. The prediction model achieves a promising accuracy of 90%.

Keywords—graph representation learning, graph mining, network embedding, link prediction, air transportation network

I. INTRODUCTION
The vast distances among major cities around the world lead to heavy demand for air transport [1]. To meet the need for interconnectivity, air transportation has also become a comfortable means of travel to many remote areas [2, 3]. At the same time, current advances in automatic data collection technologies provide an abundant amount of traffic data, including flight trajectories among countries [4]. However, a crucial challenge remains unaddressed: how to accurately predict future potential flight routes among the airports of major cities.

Predicting future potential flight routes is of great importance for comprehensive infrastructure planning and tourism demand forecasting [4, 5]. It also helps local governments anticipate future growth in passenger arrivals, aircraft movements, and cargo demand. Further, the airline industry could plan and optimize flight frequency for each designated airport more efficiently.

Networks are useful models for representing and analyzing the structure of real-world complex systems, such as the interconnection of flight routes between airports. Networks are also known as graphs in graph theory and discrete mathematics [6, 7]. We may analyze and exploit the network structure and the features of vertices to uncover valuable hidden information [8, 9]. In this paper, we take the graph theory perspective and map the route prediction problem to the link prediction problem, where the graph vertices represent airports and the graph links represent flight routes.

Fig. 1. Illustration of the flight routes prediction problem.

The link prediction problem aims to discover missing links and predict the emergence of potential future links in networks based on the observed features of links and vertices [5]. In the language of discrete mathematics, a node is also called a vertex and an edge is also called a link; in this paper, we use these terms interchangeably. Given the network structure and the features of the vertices, we construct a low-dimensional feature vector for each vertex with a learnable embedding function. The embedding function is learned by sampling vertices and aggregating the feature information of their local neighborhoods. Accurate predictors can then be applied to discover the most likely latent links. This inductive approach is also more cost-efficient than naively checking all possible link combinations.

978-1-7281-7226-2/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Dedan Kimathi University of Technology. Downloaded on May 17, 2021 at 05:57:29 UTC from IEEE Xplore. Restrictions apply.

Using the OpenFlights database of 65,535 flight routes
between 3,425 airports, we show the effectiveness of our prediction approach. Our proposed approach is potentially beneficial for the airline industry in anticipating and capturing crucial market opportunities.

II. PRELIMINARIES

A. Flight Routes Prediction Problem
We first define the flight routes prediction problem as follows. Given a network $G$ of existing flight routes among a set of airports $V$, the objective is to predict future possible flights within $G$ that connect a subset of the airports in $V$.
Figure 1 shows a schematic illustration of this problem. Using the existing flight routes (shown as blue lines), we aim to obtain the predicted future flight routes (shown as red dashed lines).

B. Link Prediction Problem
Link prediction is widely used in everyday applications such as recommending friends in social networks, proposing items to purchase in e-commerce, and showing suitable matches in online dating applications [3, 5, 10, 11]. We can use link prediction to predict missing links in incomplete data or to predict possible future links [12–15]. Formally, given a static input network or a snapshot $G$ of a dynamic network, the link prediction problem is to infer the links most likely to form in the network based on a partially observed version of $G$.

C. Graph Representation Learning
The fundamental objective of graph representation learning is to preserve the topology and features of graph-structured data and to extract its valuable information into a low-dimensional space [16–18]. Most studies on learning graph representations can be roughly categorized into two classes: (1) graph Laplacian regularization, which includes manifold regularization and label propagation, and (2) graph embedding approaches.
In this paper, we focus on graph embedding approaches. For each node in the input network, a graph embedding approach encodes it into a $d$-dimensional real-valued feature vector. Figure 2 illustrates the embedding process in detail. Node $v$ is encoded by the embedding function $f : v \to \mu_v$ into its feature vector representation $\mu_v$ of length $d$.

III. RELATED WORK
In this section, we review existing methods for the link prediction problem. Some of the most popular works include Node2vec [19], Graph Convolutional Network (GCN) [20], and Graph Attention Network (GAT) [21].

A. Node2vec
Node2vec introduced a node embedding approach based on biased random walks and the Skip-Gram strategy. It iteratively maps each node in a graph into a low-dimensional feature space that maximizes the log-probability of preserving the global neighborhoods of nodes. Let $G = (V, E)$ be an input network with node set $V$ and edge set $E$. For each node $v \in V$, let $N_S(v) \subset V$ be the neighbors of $v$ generated by the Skip-Gram sampling strategy. Node2vec aims to maximize the following objective function:

$$\max_{f} \sum_{v \in V} \log \Pr(N_S(v) \mid f(v)), \tag{1}$$

where $f : V \to \mathbb{R}^d$ is the mapping function from nodes to a feature space with $d$ dimensions, and $\Pr(\cdot)$ denotes the likelihood (probability) to be maximized. The Node2vec embedding algorithm is one of the most popular benchmarks for graph representation learning.

B. Graph Convolutional Network (GCN)
GCN brings the ideas of the convolutional neural network (CNN) to graph-structured data by convolving the input graph directly according to its connectivity structure. GCN uses a layer-wise propagation approach that approximates first-order spectral graph convolutions in each layer to represent one-hop local neighborhoods. The resulting hidden layer representations preserve not only the features of each node but also the local subgraph structure.
Let us consider a simple two-layer GCN and an input graph $G$ with adjacency matrix $A$ and graph Laplacian matrix $L$. The resulting GCN feature vectors $Z$ of the nodes in graph $G$ are calculated as follows:

$$Z = \mathrm{softmax}\big(L \, \mathrm{ReLU}(L X W_1) \, W_2\big), \tag{2}$$

where $X$ is the node feature matrix, and $W_1$ and $W_2$ are the weight matrices of the first and second hidden layers, respectively. ReLU is the rectified linear unit activation function. Softmax is an activation function that computes $\mathrm{softmax}(x_i) = \frac{1}{\mathcal{Z}} \exp(x_i)$ in a row-wise manner, with the normalizing constant $\mathcal{Z} = \sum_i \exp(x_i)$.

C. Graph Attention Network (GAT)
In GAT, the contributions of neighboring nodes to the focused node are assumed to be neither pre-determined as in GCN nor identical as in GraphSAGE. GAT learns the relative weights among connected nodes using attention mechanisms and constructs each node representation adaptively from a combination of its neighborhood vectors. The attention is computed as adjustable weights on the different connecting neighbors and is then iteratively updated based on the feature vectors of the local neighboring nodes.
The graph convolutional operation in GAT is calculated as follows,

$$h_v^{(k)} = \sigma\Big(\sum_{u \in N(v) \cup \{v\}} \alpha_{vu}^{(k)} W^{(k)} h_u^{(k-1)}\Big), \tag{3}$$

where $h_v^{(0)} = x_v$. The attention weight $\alpha_{vu}^{(k)}$ measures the connective strength between node $v$ and its neighbor $u$ as follows:

$$\alpha_{vu}^{(k)} = f\Big(g\big(a^{T} [W^{(k)} h_v^{(k-1)} \,\|\, W^{(k)} h_u^{(k-1)}]\big)\Big), \tag{4}$$

where $f(\cdot)$ and $g(\cdot)$ are the softmax and LeakyReLU activation functions, respectively, $a$ is a vector of learnable parameters, and $W^{(k)}$ is the weight matrix of the corresponding $k$-th attention mechanism.

Fig. 2. Schematic illustration of vertex embedding.

IV. FLIGHT ROUTES PREDICTION WITH INDUCTIVE GRAPH REPRESENTATION LEARNING

A. Feature Extraction and Selection
We evaluate our proposed method on a real-world airflight network dataset, which is summarized in Table I. The OpenFlights dataset is a directed network of regular flights connecting more than three thousand airports and consists of more than sixty thousand flight routes [22, 23]. Self-loops are removed from the dataset.

TABLE I. STATISTICS OF DATASET

Name         #nodes   #edges   #features
OpenFlights  3,425    65,535   738

The following 738 flight-route features are extracted from the dataset and selected as input:
• Codeshare (1 feature)
• Stop (1 feature)
• Airlines (568 features)
• Equipment (168 features)

B. Inductive Graph Representation Learning
Given an input graph with its selected node features, we need a node embedding method that learns and preserves the structural neighborhood information of each node. A limitation of some existing graph representation learning methods, such as Node2vec [19] and DeepWalk [24], is that they are transductive. Transductive learning needs to learn from the whole graph to construct the embedding of each node. Hence, if any new nodes are introduced into the existing input graph, the embedding has to be constructed again from scratch.
In inductive graph representation learning, to deal with this issue, the embedding method is required to create the feature vectors of new nodes without re-training the whole node embedding. A common approach to inductive learning is to introduce an aggregator function over node features and neighborhood structure that induces the embedding of any new node.
We use the Graph Sample and Aggregate (GraphSAGE) method to inductively learn the low-dimensional feature vectors of the nodes in the given input graph. GraphSAGE learns node embeddings through a general inductive framework consisting of several feature aggregators [25]. In each iteration, GraphSAGE addresses the memory bottleneck by sampling a fixed-size neighborhood and then applies a specific aggregator to the features of the sampled neighbors. The sampling strategy in GraphSAGE yields impressive performance on node labeling tasks over several large-scale networks.
GraphSAGE commonly adopts supervised node classification tasks as the evaluation benchmark, under the assumption that a better embedding algorithm leads to higher node classification accuracy. Given a graph and its features, GraphSAGE generates a vector representation $z_v$ of each node $v \in V$ as follows:

$$h_{N(v)}^{k} = \mathrm{agg}_k\big(h_u^{k-1}, \forall u \in N(v)\big), \tag{5}$$

where $\mathrm{agg}_k(\cdot)$ is the aggregator function of the $k$-th layer ($\forall k \in \{1, \ldots, K\}$) denoting the information aggregation from $k$-hop
node neighborhoods as well as the corresponding weight matrices $W^{k}$. $N(v)$ is the set of neighboring nodes of $v$.

$$h_v^{k} = \sigma\big(W^{k} \cdot \mathrm{concat}(h_v^{k-1}, h_{N(v)}^{k})\big), \tag{6}$$

where $\mathrm{concat}(\cdot)$ is the concatenation of the node's most recent vector representation $h_v^{k-1}$ with its aggregated neighborhood information $h_{N(v)}^{k}$.
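To make Eqs. (5) and (6) concrete, the following is a minimal NumPy sketch of a single GraphSAGE layer. It is an illustration under stated assumptions rather than the implementation used in this paper: the aggregator is taken to be the mean, the nonlinearity $\sigma$ is taken to be ReLU, and the function name `graphsage_mean_layer`, the toy graph, and the random weights are all hypothetical.

```python
import numpy as np

def graphsage_mean_layer(h, neighbors, W, num_samples, rng):
    """One GraphSAGE layer with a mean aggregator (Eqs. 5 and 6).

    h:          (n, d_in) array, row v holds h_v^{k-1}
    neighbors:  dict mapping node id -> list of neighbor ids
    W:          (d_out, 2 * d_in) weight matrix W^k
    """
    n, d_in = h.shape
    h_next = np.zeros((n, W.shape[0]))
    for v in range(n):
        nbrs = neighbors.get(v, [])
        if nbrs:
            # Fixed-size neighborhood sample; sample with replacement
            # only when the node has fewer neighbors than requested.
            sampled = rng.choice(nbrs, size=num_samples,
                                 replace=len(nbrs) < num_samples)
            h_nbr = h[sampled].mean(axis=0)      # Eq. (5), mean aggregator (assumption)
        else:
            h_nbr = np.zeros(d_in)               # isolated node: no neighbor signal
        z = W @ np.concatenate([h[v], h_nbr])    # Eq. (6) before the nonlinearity
        h_next[v] = np.maximum(z, 0.0)           # sigma taken as ReLU (assumption)
    return h_next

# Toy usage: 4 nodes with one-hot input features and random weights.
rng = np.random.default_rng(0)
h0 = np.eye(4)
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W1 = rng.normal(size=(3, 8))
h1 = graphsage_mean_layer(h0, neighbors, W1, num_samples=2, rng=rng)
print(h1.shape)
```

Because the layer depends only on a node's own features and those of its sampled neighbors, the same weights can be applied to a node that was not present during training, which is the inductive property discussed in this subsection.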
V. EXPERIMENTAL RESULTS
A. Evaluation Criteria
To evaluate the link prediction task, we measure the performance of our method as a binary classifier that predicts whether an edge exists between two given vertices. To train any link prediction model, we need to provide the train and test sets of edges together with the modified graphs from which those edges have been removed. In this work, the input graph is split into a train and a test set, each with the same number of nodes but a different number of edges.
From the original graph $G$, we perform uniform random sampling to extract a subset of negative samples and positive samples. The negative samples consist of node pairs with no edge connecting them. The positive samples consist of node pairs that are connected by an edge. The test set $G_{test}$ is obtained by removing all positive samples from the original graph $G$. To get the training set $G_{train}$, we apply the same procedure to the reduced graph. To compare the performance of link prediction methods, we use the standard evaluation metrics of accuracy and loss.

Fig. 3. Accuracy and loss on train and validation set during the initial 100 epochs training.
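The sampling procedure described in this subsection can be sketched in plain Python as follows. This is a hedged illustration, not the code used in the experiments: the helper name `sample_link_examples`, the toy graph, and the fractions are hypothetical, and edges are treated as ordered pairs because the network is directed.

```python
import random

def sample_link_examples(nodes, edges, fraction, seed=0):
    """Return (positives, negatives, reduced_edges) for link prediction.

    positives:     a random `fraction` of the existing edges, hidden from the graph
    negatives:     the same number of ordered node pairs with no edge between them
    reduced_edges: the remaining edges, i.e. the graph the model may see
    """
    rng = random.Random(seed)
    edge_set = set(edges)
    n_pos = int(len(edge_set) * fraction)
    positives = rng.sample(sorted(edge_set), n_pos)

    # Rejection sampling; assumes the graph is sparse enough that
    # at least n_pos unconnected pairs exist.
    negatives = set()
    while len(negatives) < n_pos:
        u, v = rng.sample(nodes, 2)      # two distinct nodes, so no self-loops
        if (u, v) not in edge_set:
            negatives.add((u, v))

    reduced_edges = sorted(edge_set - set(positives))
    return positives, sorted(negatives), reduced_edges

# Toy usage: build the test split first, then the train split from the reduced graph.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
         (5, 0), (0, 2), (1, 3), (2, 4), (3, 5)]
test_pos, test_neg, g_test_edges = sample_link_examples(nodes, edges, 0.2, seed=1)
train_pos, train_neg, g_train_edges = sample_link_examples(nodes, g_test_edges, 0.25, seed=2)
print(len(g_test_edges), len(g_train_edges))
```

One caveat of this simple sketch is that negatives for the train split are drawn against the reduced graph, so a held-out test edge could in principle be sampled as a training negative; a stricter variant would check candidates against the original edge set.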
B. Effectiveness Evaluation
The performance of the trained GraphSAGE model on the train set $G_{train}$ and the test set $G_{test}$ is as follows:
• On the train set, loss and accuracy are 22.22% and 93.36%, respectively.
• On the test set, loss and accuracy are 24.72% and 90.00%, respectively.
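For reference, the two reported metrics could be computed for a binary link classifier as in the sketch below. The paper does not state which loss function was used; binary cross-entropy is assumed here, and the function names, the 0.5 decision threshold, and the toy labels and probabilities are illustrative only.

```python
import math

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of node pairs whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy (assumed loss; not confirmed by the paper)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Toy usage: 1 = edge exists, 0 = no edge; probabilities are made up.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.6]
acc = binary_accuracy(y_true, y_prob)
loss = binary_cross_entropy(y_true, y_prob)
print(acc, round(loss, 4))
```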
C. Experimental Setting
In this work, we randomly select 50% of the edges of the input network as the test set $G_{test}$. The remaining 50% of the edges are used as the training set. All node features are converted into numeric values as input vectors. A fraction of 10% of random edges is selected from the test set as positive and negative samples, respectively. The same fraction applies to the training set.
To build the prediction model, we use a 2-layer architecture with a hidden layer size of 50 for both layers, a bias term, a dropout rate of 0.3, and a batch size of 20. The experiments are performed on Google Colaboratory [26], and we implement the simulation using the StellarGraph library [27], which is built on TensorFlow and Keras.
Figure 3 shows the training history of the trained GraphSAGE model on the train and test (validation) sets during the first 100 epochs. In the early epochs, the model begins with quite low accuracy, less than 70%, and reaches 85% accuracy after 5 epochs. After 100 epochs, the model achieves 93.36% and 90.00% accuracy on the training and test sets, respectively. As both results show similar accuracy, the proposed model largely avoids overfitting.

Fig. 4. Visualization of the resulting low-dimensional space of node embeddings generated using the t-SNE method.

The visualization of the low-dimensional feature space of the nodes in the OpenFlights network, produced with t-SNE, is shown in Figure 4. Each point (shown as a blue dot) represents a node in the airflight network,

which is the airport. In the two-dimensional plot of two t-SNE-generated variables, we see that some airports naturally form clusters due to the similarity of their attributes. A deeper exploration of similarities among airports and flight routes opens another interesting line of investigation and is left for our future work.

VI. CONCLUSION
In this paper, we have addressed the problem of predicting future potential flight routes using inductive graph representation learning. Specifically, we map the problem of flight route prediction to graph link prediction using vector/node embedding, as illustrated in Figure 5. We use a real-world airport and flight-route dataset to show the effectiveness of our proposed approach, which could help capture growing market opportunities. The prediction model shows a promising accuracy of 90%. Our future work includes exploring the most influential features of airports and flight routes that determine the likelihood of two or more airports being connected and the strength of their connection.

Fig. 5. Problem mapping.

REFERENCES
[1] P. Srisaeng, G. Baxter, S. Richardson, and G. Wild, “A forecasting tool for predicting Australia's domestic airline passenger demand using a genetic algorithm,” Journal of Aerospace Technology Management, vol. 7, no. 4, pp. 476–489, 2015.
[2] S. Ayhan and H. Samet, “Aircraft trajectory prediction made easy with predictive analytics,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16, San Francisco, California, USA: Association for Computing Machinery, 2016, pp. 21–30.
[3] M.-Y. Zhou, H. Liao, W.-M. Xiong, X.-Y. Wu, and Z.-W. Wei, “Connecting patterns inspire link prediction in complex networks,” Complexity, vol. 2017, no. 8581365, pp. 1–12, Dec. 2017.
[4] Y. Takahashi, R. Osawa, and S. Shirayama, “A basic study of the forecast of air transportation networks using different forecasting methods,” Journal of Data Analysis and Information Processing, vol. 5, pp. 49–66, 2017.
[5] B. Zhu and Y. Xia, “Link prediction in weighted networks: A weighted mutual information model,” PLOS ONE, vol. 11, no. 2, pp. 1–13, Feb. 2016.
[6] A. W. Wijayanto and T. Murata, “Learning adaptive graph protection strategy on dynamic networks via reinforcement learning,” in 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), ser. WI 2018, New York, USA: IEEE, 2018, pp. 534–539.
[7] A. W. Wijayanto and T. Murata, “Effective and scalable methods for graph protection strategies against epidemics on dynamic networks,” Applied Network Science, vol. 4, no. 1, p. 18, 2019.
[8] A. W. Wijayanto and T. Murata, “Flow-aware vertex protection strategy on large social networks,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ser. ASONAM '17, Sydney, Australia: ACM, 2017, pp. 58–63.
[9] A. W. Wijayanto and T. Murata, “Pre-emptive spectral graph protection strategies on multiplex social networks,” Applied Network Science, vol. 3, no. 1, p. 5, 2018.
[10] Y. Jiao, Y. Xiong, J. Zhang, and Y. Zhu, “Collective link prediction oriented network embedding with hierarchical graph attention,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, ser. CIKM '19, Beijing, China: Association for Computing Machinery, 2019, pp. 419–428.
[11] M. Zhang and Y. Chen, “Weisfeiler-Lehman neural machine for link prediction,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. Part F129685, pp. 575–583, 2017.

[12] A. K. Menon and C. Elkan, “Link prediction via matrix factorization,” in Machine Learning and Knowledge Discovery in Databases, D. Gunopulos, T. Hofmann, D. Malerba, and M. Vazirgiannis, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 437–452.
[13] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, “New perspectives and methods in link prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '10, Washington, DC, USA: Association for Computing Machinery, 2010, pp. 243–252.
[14] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, “Complex embeddings for simple link prediction,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ser. ICML '16, New York, NY, USA: JMLR.org, 2016, pp. 2071–2080.
[15] M. A. Hasan and M. J. Zaki, “A survey of link prediction in social networks,” in Social Network Data Analytics, C. C. Aggarwal, Ed. Boston, MA: Springer US, 2011, pp. 243–275.
[16] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “Gram: Graph-based attention model for healthcare representation learning,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '17, Halifax, NS, Canada: Association for Computing Machinery, 2017, pp. 787–795.
[17] J.-P. Vert and Y. Yamanishi, “Supervised graph inference,” in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds., MIT Press, 2005, pp. 1433–1440.
[18] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., Curran Associates, Inc., 2018, pp. 4800–4810.
[19] A. Grover and J. Leskovec, “Node2vec: scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16, New York, New York, USA: ACM Press, 2016, pp. 855–864.
[20] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations (ICLR), 2017, pp. 1–14.
[21] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations (ICLR), 2018, pp. 1–12.
[22] Openflights network dataset – KONECT, http://konect.uni-koblenz.de/networks/openflights, 2016.
[23] J. Kunegis, “KONECT – The Koblenz Network Collection,” in Proc. Int. Conf. on World Wide Web Companion, 2013, pp. 1343–1350.
[24] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, New York, New York, USA: ACM Press, 2014, pp. 701–710.
[25] W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017, pp. 1–19.
[26] Google colaboratory, https://colab.research.google.com, 2017.
[27] C. Data61, Stellargraph machine learning library, https://github.com/stellargraph/stellargraph, 2018.
