Abstract: The congestion and disruption of information infrastructures frequently happen during disasters, which hinders the understanding of disaster scenarios and thus impedes rapid response activities. Given the high flexibility and efficiency of UAVs, this paper proposes to use them as temporary and mobile relays for disaster data collection. However, different from many existing data collection scenarios in industrial sectors, the disaster data value varies with UAV arrival time and service time in terms of their importance for disaster response, which makes the scheduling of UAVs challenging. To address such a problem, this paper proposes an attention-based Deep Reinforcement Learning (DRL) method for multi-UAV scheduling considering time-varying data value. Specifically, the problem is modeled as a specific team orienteering problem with time-varying value. Then the relationships between UAV route selection and service time at each node are analyzed, based on which the computing efficiency for solution algorithms can be improved. After that, an attention-based DRL method is developed, with a calibrated attention model and decoding method. Finally, systematic computational experiments are conducted to evaluate the performance of the proposed method, which demonstrate its superiority over popular methods in UAV scheduling, especially for large-scale and complex scenarios.

Index Terms: Disaster response, unmanned aerial vehicle, multi-UAV scheduling, data collection, deep reinforcement learning, time-varying value.

I. INTRODUCTION

TIMELY disaster data is essential for efficient responses, and thus vital for saving lives and preventing economic losses [1], [2], [3]. Nevertheless, information infrastructures are vulnerable and may be congested or even destroyed by disasters, which may impede the disaster data collection processes and leave many blind spots in disaster areas [4]. To address such problems, many alternative approaches have been proposed, e.g., data crowd-sourcing [5], social media mining [6], and deploying mobile communication units [7]. Recently, with the technological advancement of the Unmanned Aerial Vehicle (UAV), it has been proposed to use UAVs as temporary and mobile relays for disaster data collection, with many pilot applications in the 2015 Nepal earthquake, the 2021 Henan Flash Flood, the 2022 Luding earthquake, etc.

UAV-based data collection is emerging in various scenarios, e.g., large-scale wireless sensor networks [8], communication networks [9], [10], [11], infrastructure inspection [12], and construction process monitoring [13], in which UAVs offer high flexibility and speed, economic efficiency, and ease of deployment. Meanwhile, many works have been conducted to improve collection quality and efficiency [14], [15], [16], in which UAV route planning and scheduling has attracted much attention, with objectives of minimizing UAV energy consumption [17], [18], flight time [19], task completion time [20], data packet losses [21], etc. These works provide extensive knowledge and valuable insights about UAV-based data collection. However, they cannot be directly adopted in disaster data collection, which has four distinct features that make the problem more complex and challenging.

Firstly, the value of disaster data is highly dependent on collection time and varies with time. On the one hand, since the timeliness of data is vital in disasters, the data value will decrease dramatically with time. On the other hand, disaster data value peaks at the time collection starts and decreases dramatically afterward. This is because the most critical data (e.g., rescue and relief demand data) are usually sent by victims at the very beginning; thereafter, victims tend to use the temporary relay (UAV) for non-critical or even disaster-irrelevant communications. Incorporating such time-varying characteristics of data value greatly increases the complexity of the problem and makes efficient scheduling of UAVs more challenging.

Secondly, the disaster data volumes or values usually differ across regions according to their populations and degrees of damage. As a result, the disaster area contains heterogeneous data points (regions), which differs from previous works in industrial scenarios. Such heterogeneity further affects the decisions on UAVs' service time at different points and makes the scheduling problem more complex.

Manuscript received 17 April 2023; revised 10 October 2023 and 9 December 2023; accepted 16 December 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 72174042 and Grant 72101223, in part by the Natural Science Foundation of Guangdong Province under Grant 2023A1515011402, in part by the Natural Science Foundation of Shenzhen Municipality under Grant JCYJ20230807140406013, and in part by the Startup Fund of The Hong Kong Polytechnic University. The Associate Editor for this article was J. Li. (Corresponding author: Gangyan Xu.)
Pengfu Wan, Gangyan Xu, and Jiawei Chen are with the Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong (e-mail: pengfu.wan@connect.polyu.hk; gangyan.xu@polyu.edu.hk; superlaser-jw.chen@connect.polyu.hk).
Yaoming Zhou is with the Department of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: iezhou@sjtu.edu.cn).
Digital Object Identifier 10.1109/TITS.2023.3345280
1558-0016 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: BLDEA's College of Eng & Tech - Vijayaura. Downloaded on February 09,2024 at 07:08:39 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Thirdly, due to the scarcity of UAVs in disasters, they cannot cover all affected regions but only some of them, as in the case of the 2021 Henan Flood. This distinguishes our work from previous ones in industrial data collection, where the problems can be modeled and solved based on the variants of Vehicle Routing Problems (VRPs). In this work, besides efficient routing decisions, the subsets of regions to be covered should also be decided.

Fourthly, decisions in emergency situations should be made efficiently. Nevertheless, considering the complexities of the problem discussed above, and its relatively large scale in terms of the number of demand regions and UAVs, it is a challenge to design efficient decision-making algorithms.

Taking the above features and challenges into consideration, this work develops a Deep Reinforcement Learning (DRL) based multi-UAV scheduling method to maximize the value of disaster data collected, and finally support efficient and effective disaster responses. The contributions of this work lie in the following four aspects:
• An integrated mathematical model is developed that captures the features of time-varying data value and partial coverage of demand regions for multiple UAVs scheduling in disaster data collection.
• Through analyzing the interaction between UAV arrival time and service time on potential disaster data value collected, an approximate analytical solution is developed to accelerate the UAV scheduling process.
• Through embedding the interdependent decision processes of UAV routes and service time at each region, a new attention-based DRL framework is developed that can support real-time decision-making for multi-UAV scheduling in disaster data collection.
• Systematic experimental case studies are conducted that verify the advantages of the proposed method over existing learning-based and heuristics-based methods in different scenarios. Results can also be adopted as benchmarks for future research.

The rest of the paper is structured as follows. Related works are reviewed in Section II. The mathematical model of the multi-UAV based disaster data collection problem is presented and analyzed in Section III. Section IV discusses the DRL-based solution method. Then experimental case studies and results analysis are given in Section V. Finally, Section VI concludes the whole paper and points out future works.

II. RELATED WORK

In this section, the relevant works are reviewed from three streams: UAV-assisted data collection, team orienteering problems with time-varying value, and reinforcement learning for combinatorial optimization.

A. UAV-Assisted Data Collection

UAV-assisted data collection has attracted a lot of attention in recent years, and extensive work has been done across different areas [14], [22]. However, in practical applications, there are some challenges faced by UAV-assisted data collection, including the selection of data collection mode [23], sensor deployment [24], UAV speed control [25], etc. In particular, as an important issue in this field, UAV scheduling has attracted much attention with different performance measures, such as energy consumption, coverage ranges, collection cost, and efficiency [26]. For example, considering the high energy consumption of sensors in data transmission, Baek et al. [27] proposed an energy-efficient UAV routing by maximizing the minimum residual energy of sensors. Li et al. [28] considered the cooperation among vehicles and UAVs in 6G-based IoT networks, and designed data collection routes to improve the coverage ratio and reduce collection costs. Wang et al. [29] took multiple objectives of UAV data collection into account, and proposed two schemes for flight cycle minimization and energy efficiency maximization.

There are also different techniques developed for solving the UAV scheduling problems in data collection, e.g., graph-theory-based, optimization-based, and learning-based methods [14]. Specifically, graph-theory-based methods can convert the geographical space into graphs and generate data collection routes based on graph analysis [30], which are widely applied in space division problems [31]. Optimization-based methods can get the optimal or near-optimal route by optimization-related techniques, such as branch and bound [32], dynamic programming [33], and successive convex optimization [34]. However, they cannot cope with complex and large-scale problems, and are thus inappropriate for disaster scenarios that require efficient decision-making [14]. Recently, learning-based methods that adopt supervised learning [35] and reinforcement learning [36] techniques are emerging, which can deal with dynamic and uncertain environments, as well as large-scale problems. There are currently two mainstream reinforcement learning frameworks for UAV-based data collection. One adopts grid world models for depicting data collection scenarios and uses value-based methods [37], [38], [39]. The other applies policy-based methods, taking into account more flexible scenario settings and UAV action spaces [40], [41].

However, previous works focus on trajectory design, and UAV route scheduling issues are still not well covered. Besides, existing works mainly consider the amount of data collected, while in a disaster scenario, both the value and amount of data collected should be considered, which is more complex and challenging.

B. Team Orienteering Problem With Time-Varying Value

The Team Orienteering Problem (TOP) [42] refers to the problem in which, given a set of nodes with different values and a fixed amount of time, each team member decides its path to visit these nodes such that the total value of all paths is maximized. Different from VRPs [43], members in TOP only need to visit a subset of the nodes, which makes it more appropriate for scenarios with insufficient resources, such as healthcare logistics [44], disaster response [45], etc.

Driven by various practical cases, especially in emergency response systems, TOP with time-varying value has also been studied [46], [47]. Generally, the time-varying properties can be classified into three groups. The first is the arrival-time-dependent value, where the value varies with the arrival
WAN et al.: DRL ENABLED MULTI-UAV SCHEDULING FOR DISASTER DATA COLLECTION 3
TABLE I
NOTATION TABLE
$$s_i \ge 0 \quad \forall i \in N \tag{8}$$
$$x_{i,j,k} \in \{0, 1\} \quad \forall i, j \in N,\ k \in V \tag{9}$$

The objective function (1) maximizes the data value collected by all UAVs, which depends on the routes of UAVs and data collection time (service time) at each region. Constraint (2) means every UAV should leave the disaster region $m \in C$ it visited, which guarantees flow conservation of the problem. Constraint (3) ensures every disaster region is visited at most once. Inequality (4) limits the maximum number of UAVs departing from the depot. The visiting sequence within each route is specified in constraint (5), where $M$ is a large positive constant to linearize the inequality. If $x_{i,j,k} = 1$, it means that UAV $k$ would visit the region $j$ after visiting the region $i$, and the arrival time at region $j$ must be greater than or equal to the sum of the arrival time at region $i$, traveling time from region $i$ to region $j$, and the service time at region $i$. If $x_{i,j,k} = 0$, inequality (5) is always satisfied since $M$ is large enough. Constraint (6) initializes the departure time, while constraint (7) is the endurance limitation for UAVs. Constraints (8) and (9) are the ranges of decision variables $s_i$ and $x_{i,j,k}$.

B. Time-Varying Data Value

According to the analysis in Section I and Section II, the disaster data value $v_{ik}$ collected by UAV $k$ at node $i$ is a function of $a_{ik}$ and $s_i$, denoted as $v_{ik} = f(a_{ik}, s_i)$.

1) Arrival-Time-Dependent Value: Due to the timeliness requirement in disasters, the earlier the disaster data are collected, the more value they may contain. According to the investigations on the survival rate with rescue time [70], the instant data value of node $i$ (with original instant data value of $\beta_i$) is considered to decrease linearly with $a_{ik}$, denoted as $b_{ik}$:

$$b_{ik} = \beta_i - \alpha \cdot a_{ik} \quad (\alpha > 0) \tag{10}$$

where the decreasing rate of data value is $\alpha$.

2) Service-Time-Dependent Value: Different from service-time-dependent value models adopted in emergency rescue scenarios, the data value decreasing process over $s_i$ contains three stages: (i) During the initial stage right after the UAV arrives, a large amount of informative and critical disaster data will be quickly collected. Since this stage is usually very short, the data value is assumed to be constant at $b_{ik}$. (ii) After a certain time, the data value begins to decrease at an increasing rate as new disaster data is generated at a decreasing rate while non-critical data begin to show up [71]. (iii) When most critical disaster data are collected, the instant data value will stay at a very low level. With the above analysis, the logistic curve $f_{ik}$ is appropriate for approximating such a process. In this work, the midpoint of the logistic curve is proportional to $\beta_i$ with the ratio $g$. The reason is that if one node contains more data value, the UAV should take more time to collect its data. Integrating with (10), the instant data value $f_{ik}$ after the data collection starts can be depicted as (11).

$$f_{ik} = \frac{b_{ik}}{1 + e^{x - g\beta_i}} \quad (g > 0,\ x \in [0, s_i]) \tag{11}$$

Based on Equations (10) and (11), the collected data value $v_{ik}$ can be obtained by integrating the logistic function over the service time $s_i$ as (12) and illustrated in Figure 2.

$$v_{ik} = \int_0^{s_i} f_{ik}\,dx = \int_0^{s_i} \frac{\beta_i - \alpha \cdot a_{ik}}{1 + e^{x - g\beta_i}}\,dx \tag{12}$$

Fig. 2. Data value with UAV arrival time and service time.

C. Model Analysis

The TOP with time-varying data value is more complex than the basic TOP (which is already an NP-hard problem), in that not only the traveling routes with arrival time at each node should be determined, but also the service time of the UAV at each node. It can hardly be solved by heuristic-based methods within a short time, let alone exact optimal solutions.
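To make the value model of Section III-B concrete, the following is a minimal Python sketch, not the authors' implementation. It uses the fact that the antiderivative of $1/(1 + e^{x-c})$ is $x - \ln(1 + e^{x-c})$, so the integral in (12) can be evaluated in closed form. The function names, the Euclidean travel time, the depot coordinates, the departure at time 0, and the choice to treat constraint (5) as tight (a UAV leaves a node as soon as service ends) are all illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def collected_value(beta: float, arrival: float, service: float,
                    alpha: float, g: float) -> float:
    """Closed form of Eq. (12) for one node.

    b = beta - alpha * arrival is Eq. (10); c = g * beta is the logistic
    midpoint of Eq. (11). The antiderivative of 1 / (1 + exp(x - c)) is
    x - softplus(x - c), evaluated between 0 and `service`.
    """
    b = beta - alpha * arrival
    c = g * beta
    return b * (service - softplus(service - c) + softplus(-c))


def route_value(nodes: Sequence[Tuple[float, float, float, float]],
                alpha: float, g: float, speed: float,
                depot: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float]:
    """Value collected along one UAV route and the time it is back at the depot.

    `nodes` lists (x, y, beta, service_time) in visiting order.
    Assumptions (not from the paper): Euclidean travel at constant `speed`,
    departure at time 0, and arrival times that satisfy constraint (5)
    with equality.
    """
    clock = 0.0
    position = depot
    total = 0.0
    for x, y, beta, service in nodes:
        clock += math.dist(position, (x, y)) / speed  # arrival time a_ik
        total += collected_value(beta, clock, service, alpha, g)
        clock += service                              # leave after service
        position = (x, y)
    clock += math.dist(position, depot) / speed       # return trip
    return total, clock


if __name__ == "__main__":
    # Hypothetical two-node route inside a unit square.
    route = [(0.3, 0.4, 25.0, 10.0), (0.6, 0.8, 22.0, 8.0)]
    print(route_value(route, alpha=0.005, g=0.5, speed=0.05))
```

The returned finishing time can be compared against a UAV endurance limit in the spirit of constraint (7); that check is left out because the exact form of the constraint is not reproduced here.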
As UAVs should finally return to the depot (equation (2)), the depot embedding is included in the context embedding. Similar to graph embedding, both the current state embedding and depot embedding are encoded by MHA. With context embedding $h_c$ and query weight $W^q$, the query for context embedding $q_c$ is formulated as:

$$q_c = W^q h_c \tag{25}$$

3) Decoding Mechanism: The probabilities of the next action are obtained by the decoding mechanism. Since the context embedding includes information about the current state, $q_c$ is used to query $k_i$ of other nodes and calculate the attention weight between them. Note that not all nodes are available, since visited nodes cannot be revisited (equation (3)), and nodes that would make the UAV exceed its maximum flight time are prohibited (equation (7)). The set of unavailable nodes forms a mask, and attention weights of unavailable nodes are masked. Then the attention weight $u_i$ of node $i$ is:

$$u_i = \begin{cases} \dfrac{q_c^{T} k_i}{\sqrt{d_k}} & \text{if node } i \text{ is available} \\ -\infty & \text{otherwise} \end{cases} \tag{26}$$

To reduce the varying range of $u_i$, the tanh function is used to clip the result. Probabilities of all actions are obtained through the softmax function of attention weights.

C. Decoding Method

Different from single-agent problems, multiple UAVs with different states are considered in this work. One feasible method is to compare the actions of different UAVs and choose the most appropriate decision. Unlike generating joint actions, UAVs in the competitive mechanism have to compete with each other, and only one action of one UAV is chosen each time. The main advantage of this method is avoiding conflicts when UAVs tend to select the same node. As the current states of UAVs are different, the context embedding $h_c^j$ has to be calculated for each UAV $j$; then the query $q_c^j$ and the attention weight $u_i^j$ between UAV $j$ and node $i$ are calculated as:

$$q_c^j = W^q h_c^j \tag{27}$$

$$u_i^j = \begin{cases} \dfrac{(q_c^j)^{T} k_i}{\sqrt{d_k}} & \text{if node } i \text{ is available for UAV } j \\ -\infty & \text{otherwise} \end{cases} \tag{28}$$

Over all attention weights calculated above, the probability of action $a_{ij}$ (choosing UAV $j$ and node $i$) is

$$p(a_{ij}) = \frac{e^{u_i^j}}{\sum_j \sum_i e^{u_i^j}} \tag{29}$$

Greedy decoding is adopted and the action with the largest probability is chosen. After getting the new action, the service time of the previous node is calculated if the previous node is not the depot. Then the environment information, including arrival time and UAV positions, will be updated based on constraint (5), and new states of UAVs will be observed. Such interactions are repeated until all UAVs return to the depot; then the total value is obtained according to formula (1), and the policy is updated according to the REINFORCE framework. The pseudo-code of REINFORCE with an exponential moving average baseline is presented in Algorithm 1.

V. EXPERIMENTAL CASE STUDY

A. Experiment Setting

In this work, a square disaster area with size [0, 1] × [0, 1] is built for the experimental case study. A set of disaster regions with data to be collected are randomly generated within the area. Their instant data values $\beta$ are also randomly generated in the range of [20, 30]. In the experiments, the data value decreasing rate $\alpha$ is set to 0.005, implying that data value will decrease to 0 after 200 minutes, which is consistent with the understanding of the golden window (around 3 hours) in emergencies. Meanwhile, $g$ is set as 0.5 in formula (11). Besides, a fixed airspeed of 0.05 unit distance per minute is set for UAVs. To verify the performance of the proposed method in different scales of problems, the number of disaster regions $N_{region}$ is set from 5 to 500 while the number of UAVs $N_{UAV}$ ranges from 1 to 20. In the following, each scenario is labeled by its size, e.g., 'U10-N50' denotes the scenario with 10 UAVs and 50 disaster regions.

Algorithm 1 REINFORCE With Exponential Moving Average Baseline
Input: number of epochs N, steps per epoch T, batch size B, decay factor γ
Initialize network weights of encoder and decoder;
for epoch = 1, 2, ..., N do
    for step = 1, 2, ..., T do
        generate B instances X_1, X_2, ..., X_B;
        for i = 1, 2, ..., B do
            while termination condition is not satisfied do
                select actions through current policy;
                if previous node is not the depot then
                    calculate service time of previous node;
                end
                update the environment and observe current states;
            end
            record actions a_1, a_2, ..., a_B and states s_1, s_2, ..., s_B;
            compute total rewards G_1, G_2, ..., G_B;
        end
        if no baseline is generated before then
            b ← avg(G_1, G_2, ..., G_B);
        else
            b ← γ avg(G_1, G_2, ..., G_B) + (1 − γ) b;
        end
        ∇θ ← Σ_{i=1}^{B} (G_i − b) ∇θ log π(a_i | s_i, θ);
    end
end
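Two small pieces of the procedure described in Section IV can be sketched in plain Python: the joint softmax over (UAV, node) pairs in equation (29) with masking of unavailable nodes, followed by greedy selection, and the exponential moving average baseline update of Algorithm 1. This is an illustrative sketch under stated assumptions rather than the authors' code: the compatibility scores $u_i^j$ are taken as given inputs (the scaling by $\sqrt{d_k}$ and the tanh clipping are assumed to have been applied already), and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def joint_action_probs(scores: Sequence[Sequence[float]],
                       available: Sequence[Sequence[bool]]) -> List[List[float]]:
    """Eq. (29): softmax over all (UAV j, node i) pairs.

    scores[j][i] is the compatibility u_i^j; available[j][i] is False when
    node i is masked for UAV j, which sets its score to -inf as in Eq. (28).
    """
    masked = [[u if ok else NEG_INF for u, ok in zip(row, mask_row)]
              for row, mask_row in zip(scores, available)]
    peak = max(max(row) for row in masked)
    if peak == NEG_INF:
        raise ValueError("no feasible (UAV, node) pair")
    # Subtracting the peak keeps exp() from overflowing; exp(-inf) is 0.0.
    exps = [[math.exp(u - peak) for u in row] for row in masked]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def greedy_action(probs: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return the (UAV index, node index) pair with the largest probability."""
    best = max((p, j, i) for j, row in enumerate(probs) for i, p in enumerate(row))
    return best[1], best[2]


def update_baseline(baseline: Optional[float], rewards: Sequence[float],
                    gamma: float) -> float:
    """Baseline update of Algorithm 1: b <- gamma * avg(G) + (1 - gamma) * b."""
    mean_reward = sum(rewards) / len(rewards)
    if baseline is None:          # first batch: no baseline generated before
        return mean_reward
    return gamma * mean_reward + (1.0 - gamma) * baseline


def advantages(rewards: Sequence[float], baseline: float) -> List[float]:
    """Weights (G_i - b) that multiply grad log pi in the REINFORCE update."""
    return [reward - baseline for reward in rewards]


if __name__ == "__main__":
    # Two UAVs, two candidate nodes; node 1 is masked for UAV 1.
    probs = joint_action_probs([[1.0, 2.0], [0.5, 3.0]],
                               [[True, True], [True, False]])
    print(greedy_action(probs))
```

Choosing a single (UAV, node) pair per step, rather than one node per UAV, is what prevents two UAVs from being assigned the same node in the same step.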
TABLE IV
COMPARISON IN MEDIUM-SCALE AND LARGE-SCALE PROBLEMS
TABLE V
IMPACT OF DIFFERENT ROUTES SEQUENCE LENGTH ON EXPERIMENTAL PERFORMANCE
TABLE VI
IMPACT OF DIFFERENT PROBLEM SCALES ON EXPERIMENTAL PERFORMANCE
fixing the number of UAVs at 20 (see the lower part of Table IV), we can find that the superiority of the proposed method becomes more dominant. In addition, Figure 7 is employed to visualize the performance of algorithms in large-scale scenarios.

4) Advantages of Proposed Method in Long-Sequence and Large-Scale Problems: Table V examines the impact of route sequence length, which is determined by the ratio of $N_{region}$ to $N_{UAV}$, on algorithmic efficiency. The ratio N/U is the average number of disaster regions allocated to each UAV in different scenarios. Conventionally, the higher the ratio, the more disaster regions are visited by each UAV. Table V shows the performance of different algorithms by using the results of the Greedy Algorithm as the benchmark. According to Table V and Figure 7, it can be observed that the higher the ratio, the better the performance of both learning-based methods, which shows that they have advantages over heuristic methods when dealing with long route sequence problems. In comparison, there is no evident growing trend in the performance of these heuristic methods.

Fig. 7. Comparative trend curves across different problem scales under 20 UAVs.

In Table VI, the product of the number of disaster regions and number of UAVs, N·U, is regarded as an indicator to measure the scale of each problem, which varies from 500 to 10000. The percentage results of the four methods listed in Table VI are still benchmarked against the Greedy Algorithm. As the scale of the instances increases, the performance gap between our proposed method and the compared algorithms becomes increasingly larger (as shown in the last column of Table VI). This phenomenon illustrates that in some small-scale and medium-scale problems, other methods, especially heuristic methods, can quickly traverse the neighborhood and continuously update their optimal solutions. However, in large-scale problems, the neighborhood of the current solution grows exponentially, which dramatically reduces their iterative efficiency. By contrast, the efficiency of the proposed DRL method is minimally affected by the scale of the problem, and it continues to perform well on large-scale problems.

D. Discussions

According to the above experimental results, several advantages and managerial implications can be concluded.

Firstly, the proposed method is effective in realizing efficient and high-quality scheduling of UAVs for disaster data collection, for both small-scale and large-scale problems. In practice, the proposed method can be adopted for real-time decision-making in disasters.

Secondly, the proposed method performs better in long-sequence problems where the UAV will visit many regions in one trip. In the future, with the improved capacity of UAVs and their prolonged endurance, the long-sequence feature will be more prominent as UAVs become capable of visiting more nodes in one trip. Therefore, the proposed method will become applicable to more scenarios with the development of UAV technologies.

Thirdly, in real-life applications, given the number of regions in an area (e.g., the area of responsibility) and the number of UAVs, the multi-UAV scheduling model can be pre-trained using a simulation-based environment and then directly applied in practice. Our experimental results have shown that

VI. CONCLUSION

To address the problem of UAV-assisted disaster data collection, this paper proposed an efficient multi-UAV scheduling method considering the feature of time-varying data value. First of all, a TOP-based mathematical model is developed to maximize the data value collected by all UAVs. Besides, the features of the time-varying data value in terms of both UAV arrival time and service time are analyzed and modeled. Meanwhile, to accelerate the solution algorithms, the relationships between UAV route selection and service time are analyzed. An attention-based DRL method is then proposed, which can obtain high-quality solutions in near real-time. Several typical scenarios are simulated and different algorithms are tested, which verify the advantages of the proposed method in both computing efficiency and solution quality.

This work can be extended in the following directions. Firstly, the proposed mathematical model with time-varying value can be further developed to incorporate more complex scenarios, such as multiple disaster types, dynamic and uncertain environments, and non-uniform spatial distribution of disasters. Secondly, the dynamics of the scenarios can be considered with regard to the development of disasters and the changes of UAV numbers. Thirdly, the method proposed in this paper can be further improved with higher robustness to different scenarios. Fourthly, multi-agent reinforcement learning frameworks can be studied to deal with decentralized scenarios.

REFERENCES

[1] I. Nourbakhsh, R. Sargent, A. Wright, K. Cramer, B. McClendon, and M. Jones, “Mapping disaster zones,” Nature, vol. 439, no. 7078, pp. 787–788, Feb. 2006.
[2] M. Zook, M. Graham, T. Shelton, and S. Gorman, “Volunteered geographic information and crowdsourcing disaster relief: A case study of the Haitian earthquake,” World Med. Health Policy, vol. 2, no. 2, pp. 7–33, Jul. 2010.
[3] M. Morton and J. L. Levy, “Challenges in disaster data collection during recent disasters,” Prehospital Disaster Med., vol. 26, no. 3, pp. 196–201, Jun. 2011.
[4] I. Junglas and B. Ives, “Recovering IT in a disaster: Lessons from Hurricane Katrina,” MIS Quart. Executive, vol. 6, no. 1, pp. 39–51, 2007.
[5] H. To, S. H. Kim, and C. Shahabi, “Effectively crowdsourcing the acquisition and analysis of visual data for disaster response,” in Proc. IEEE Int. Conf. Big Data, Oct. 2015, pp. 697–706.
[6] P. R. Spence, K. A. Lachlan, and A. M. Rainear, “Social media and crisis research: Data collection and directions,” Comput. Hum. Behav., vol. 54, pp. 667–672, Jan. 2016.
[7] T. Sakano et al., “Disaster-resilient networking: A new vision based on movable and deployable resource units,” IEEE Netw., vol. 27, no. 4, pp. 40–46, Jul. 2013.
[8] S. Wang, Y. Long, Y. Zhou, and G. Xu, “Multi-UAV route planning for data collection from heterogeneous IoT devices,” in Proc. IEEE Int. Conf. Ind. Eng. Eng. Manag. (IEEM), Dec. 2022, pp. 1556–1560.
[9] X. Pang, M. Sheng, N. Zhao, J. Tang, D. Niyato, and K.-K. Wong, “When UAV meets IRS: Expanding air-ground networks via passive reflection,” IEEE Wireless Commun., vol. 28, no. 5, pp. 164–170, Oct. 2021.
[10] T. Ma et al., “UAV-LEO integrated backbone: A ubiquitous data collection approach for B5G Internet of Remote Things networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 11, pp. 3491–3505, Nov. 2021.
[11] X. Pang, W. Mei, N. Zhao, and R. Zhang, “Intelligent reflecting surface assisted interference mitigation for cellular-connected UAV,” IEEE Wireless Commun. Lett., vol. 11, no. 8, pp. 1708–1712, Aug. 2022.
[12] Y. Tan, S. Li, H. Liu, P. Chen, and Z. Zhou, “Automatic inspection data collection of building surface based on BIM and UAV,” Autom. Construct., vol. 131, Nov. 2021, Art. no. 103881.
[13] K. Asadi et al., “An integrated UGV-UAV system for construction site data collection,” Autom. Construct., vol. 112, Apr. 2020, Art. no. 103068.
[14] Z. Wei et al., “UAV-assisted data collection for Internet of Things: A survey,” IEEE Internet Things J., vol. 9, no. 17, pp. 15460–15483, Sep. 2022.
[15] I. Jawhar, N. Mohamed, J. Al-Jaroodi, D. P. Agrawal, and S. Zhang, “Communication and networking of UAV-based systems: Classification and associated architectures,” J. Netw. Comput. Appl., vol. 84, pp. 93–108, Apr. 2017.
[16] D. Liu et al., “Opportunistic UAV utilization in wireless networks: Motivations, applications, and challenges,” IEEE Commun. Mag., vol. 58, no. 5, pp. 62–68, May 2020.
[17] W. Ejaz, A. Ahmed, A. Mushtaq, and M. Ibnkahla, “Energy-efficient task scheduling and physiological assessment in disaster management using UAV-assisted networks,” Comput. Commun., vol. 155, pp. 150–157, Apr. 2020.
[18] X. Pang, J. Tang, N. Zhao, X. Zhang, and Y. Qian, “Energy-efficient design for mmWave-enabled NOMA-UAV networks,” Sci. China Inf. Sci., vol. 64, no. 4, Apr. 2021, Art. no. 140303.
[19] J. Gong, T.-H. Chang, C. Shen, and X. Chen, “Flight time minimization of UAV for data collection over wireless sensor networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 9, pp. 1942–1954, Sep. 2018.
[20] Z. Wang, G. Zhang, Q. Wang, K. Wang, and K. Yang, “Completion time minimization in wireless-powered UAV-assisted data collection system,” IEEE Commun. Lett., vol. 25, no. 6, pp. 1954–1958, Jun. 2021.
[21] Y. Emami, B. Wei, K. Li, W. Ni, and E. Tovar, “Deep Q-networks for aerial data collection in multi-UAV-assisted wireless sensor networks,” in Proc. Int. Wireless Commun. Mobile Comput. (IWCMC), Jun. 2021, pp. 669–674.
[22] M. T. Nguyen et al., “UAV-assisted data collection in wireless sensor networks: A comprehensive survey,” Electronics, vol. 10, no. 21, p. 2603, Oct. 2021.
[23] S. R. Yeduri, N. S. Chilamkurthy, O. J. Pandey, and L. R. Cenkeramaddi, “Energy and throughput management in delay-constrained small-world UAV-IoT network,” IEEE Internet Things J., vol. 10, no. 9, pp. 7922–7935, May 2023.
[24] Q. Wu, P. Sun, and A. Boukerche, “Unmanned aerial vehicle-assisted energy-efficient data collection scheme for sustainable wireless sensor networks,” Comput. Netw., vol. 165, Dec. 2019, Art. no. 106927.
[25] X. Li, J. Tan, A. Liu, P. Vijayakumar, N. Kumar, and M. Alazab, “A novel UAV-enabled data collection scheme for intelligent transportation system through UAV speed control,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 4, pp. 2100–2110, Apr. 2021.
[26] B. Alzahrani, O. S. Oubbati, A. Barnawi, M. Atiquzzaman, and D. Alghazzawi, “UAV assistance paradigm: State-of-the-art in applications and challenges,” J. Netw. Comput. Appl., vol. 166, Sep. 2020, Art. no. 102706.
[27] J. Baek, S. I. Han, and Y. Han, “Energy-efficient UAV routing for wireless sensor networks,” IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1741–1750, Feb. 2020.
[28] T. Li, W. Liu, Z. Zeng, and N. N. Xiong, “DRLR: A deep-reinforcement-learning-based recruitment scheme for massive data collections in 6G-based IoT networks,” IEEE Internet Things J., vol. 9, no. 16, pp. 14595–14609, Aug. 2022.
[29] T. Wang, X. Pang, J. Tang, N. Zhao, X. Zhang, and X. Wang, “Time and energy efficient data collection via UAV,” Sci. China Inf. Sci., vol. 65, no. 8, Aug. 2022, Art. no. 182302.
[30] R. Penicka, J. Faigl, and M. Saska, “Physical orienteering problem for unmanned aerial vehicle data collection planning in environments with obstacles,” IEEE Robot. Autom. Lett., vol. 4, no. 3, pp. 3005–3012, Jul. 2019.
[31] S. Aggarwal and N. Kumar, “Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges,” Comput. Commun., vol. 149, pp. 270–299, Jan. 2020.
[32] M. Samir, S. Sharafeddine, C. M. Assi, T. M. Nguyen, and A. Ghrayeb, “UAV trajectory planning for data collection from time-constrained IoT devices,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 34–46, Jan. 2020.
[33] H. Hu, K. Xiong, G. Qu, Q. Ni, P. Fan, and K. B. Letaief, “AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks,” IEEE Internet Things J., vol. 8, no. 2, pp. 1211–1223, Jan. 2021.
[34] W. Chen, S. Zhao, Q. Shi, and R. Zhang, “Resonant beam charging-powered UAV-assisted sensing data collection,” IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 1086–1090, Jan. 2020.
[35] J. Chen et al., “Efficient data collection in large-scale UAV-aided wireless sensor networks,” in Proc. 11th Int. Conf. Wireless Commun. Signal Process. (WCSP), Oct. 2019, pp. 1–5.
[36] L. Liu, K. Xiong, J. Cao, Y. Lu, P. Fan, and K. B. Letaief, “Average AoI minimization in UAV-assisted data collection with RF wireless power transfer: A deep reinforcement learning scheme,” IEEE Internet Things J., vol. 9, no. 7, pp. 5216–5228, Apr. 2022.
[37] S. Fu et al., “Energy-efficient UAV-enabled data collection via wireless charging: A reinforcement learning approach,” IEEE Internet Things J., vol. 8, no. 12, pp. 10209–10219, Jun. 2021.
[38] P. Tong, J. Liu, X. Wang, B. Bai, and H. Dai, “Deep reinforcement learning for efficient data collection in UAV-aided Internet of Things,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Jun. 2020, pp. 1–6.
[39] K. K. Nguyen, T. Q. Duong, T. Do-Duy, H. Claussen, and L. Hanzo, “3D UAV trajectory and data collection optimisation via deep reinforcement learning,” IEEE Trans. Commun., vol. 70, no. 4, pp. 2358–2371, Apr. 2022.
[40] M. Sun, X. Xu, X. Qin, and P. Zhang, “AoI-energy-aware UAV-assisted data collection for IoT networks: A deep reinforcement learning method,” IEEE Internet Things J., vol. 8, no. 24, pp. 17275–17289, Dec. 2021.
[41] Y. Wang et al., “Trajectory design for UAV-based Internet of Things data collection: A deep reinforcement learning approach,” IEEE Internet Things J., vol. 9, no. 5, pp. 3899–3912, Mar. 2022.
[42] I.-M. Chao, B. L. Golden, and E. A. Wasil, “The team orienteering problem,” Eur. J. Oper. Res., vol. 88, no. 3, pp. 464–474, Feb. 1996.
[43] H. Qin, X. Su, T. Ren, and Z. Luo, “A review on the electric vehicle routing problems: Variants and algorithms,” Frontiers Eng. Manag., vol. 8, no. 3, pp. 370–389, Sep. 2021.
[44] R. Aringhieri, S. Bigharaz, D. Duma, and A. Guastalla, “Novel applications of the team orienteering problem in health care logistics,” in Optimization in Artificial Intelligence and Data Sciences. Rome, Italy: Springer, 2022, pp. 235–245.
[45] S. Saeedvand, H. S. Aghdasi, and J. Baltes, “Novel hybrid algorithm for team orienteering problem with time windows for rescue applications,” Appl. Soft Comput., vol. 96, Nov. 2020, Art. no. 106700.
[46] V. F. Yu, P. Jewpanya, S.-W. Lin, and A. N. P. Redi, “Team orienteering problem with time windows and time-dependent scores,” Comput. Ind. Eng., vol. 127, pp. 213–224, Jan. 2019.
[47] Q. Yu, Y. Adulyasak, L.-M. Rousseau, N. Zhu, and S. Ma, “Team orienteering with time-varying profit,” INFORMS J. Comput., vol. 34, no. 1, pp. 262–280, Jan. 2022.
[48] E. Erkut and J. Zhang, “The maximum collection problem with time-dependent rewards,” Nav. Res. Logistics, vol. 43, no. 5, pp. 749–763, Aug. 1996.
[49] A. Ekici and A. Retharekar, “Multiple agents maximum collection problem with time dependent rewards,” Comput. Ind. Eng., vol. 64, no. 4, pp. 1009–1018, Apr. 2013.
[50] G. Erdoğan and G. Laporte, “The orienteering problem with variable profits,” Networks, vol. 61, no. 2, pp. 104–116, Mar. 2013.
[51] Q. Yu, C. Cheng, and N. Zhu, “Robust team orienteering problem with decreasing profits,” INFORMS J. Comput., vol. 34, no. 6, pp. 3215–3233, Nov. 2022.
[52] S. A. Shah, D. Z. Seker, S. Hameed, and D. Draheim, “The rising role of big data analytics and IoT in disaster management: Recent advances, taxonomy and prospects,” IEEE Access, vol. 7, pp. 54595–54614, 2019.
[53] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song, “Learning combinatorial optimization algorithms over graphs,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[54] Q. Cappart, T. Moisan, L.-M. Rousseau, I. Prémont-Schwarz, and A. A. Cire, “Combining reinforcement learning and constraint programming for combinatorial optimization,” in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, no. 5, pp. 3677–3687.
[55] H. Hu, X. Zhang, X. Yan, L. Wang, and Y. Xu, “Solving a new 3D bin packing problem with deep reinforcement learning method,” 2017, arXiv:1708.05930.
[56] Q. Cai, W. Hang, A. Mirhoseini, G. Tucker, J. Wang, and W. Wei, “Reinforcement learning driven heuristic optimization,” 2019, arXiv:1906.06639.
[57] S. Manchanda, A. Mittal, A. Dhawan, S. Medya, S. Ranu, and A. Singh, “GCOMB: Learning budget-constrained combinatorial algorithms over billion-sized graphs,” in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 20000–20011.
[58] J. Li, M. Zhou, Q. Sun, X. Dai, and X. Yu, “Colored traveling salesman problem,” IEEE Trans. Cybern., vol. 45, no. 11, pp. 2390–2401, Nov. 2015.
[59] X. Meng, J. Li, X. Dai, and J. Dou, “Variable neighborhood search for a colored traveling salesman problem,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 4, pp. 1018–1026, Apr. 2018.
[60] X. Xu, J. Li, and M. Zhou, “Delaunay-triangulation-based variable neighborhood search to solve large-scale general colored traveling salesman problems,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 3, pp. 1583–1593, Mar. 2021.
[61] X. Xu, J. Li, M. Zhou, and X. Yu, “Precedence-constrained colored traveling salesman problem: An augmented variable neighborhood search approach,” IEEE Trans. Cybern., vol. 52, no. 9, pp. 9797–9808, Sep. 2022.
[62] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” 2016, arXiv:1611.09940.
[63] W. Kool, H. Van Hoof, and M. Welling, “Attention, learn to solve routing problems!” 2018, arXiv:1803.08475.
[64] R. Gama and H. L. Fernandes, “A reinforcement learning approach to the orienteering problem with time windows,” Comput. Oper. Res., vol. 133, Sep. 2021, Art. no. 105357.
[65] B. Lin, B. Ghaddar, and J. Nathwani, “Deep reinforcement learning for the electric vehicle routing problem with time windows,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 11528–11538, Aug. 2022.
[66] J. Li et al., “Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem,” IEEE Trans. Cybern., vol. 52, no. 12, pp. 13572–13585, Dec. 2022.
[67] T. Barrett, W. Clements, J. Foerster, and A. Lvovsky, “Exploratory combinatorial optimization with reinforcement learning,” in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 4, pp. 3243–3250.
[68] M. Nazari, A. Oroojlooy, L. Snyder, and M. Takáč, “Reinforcement learning for solving the vehicle routing problem,” in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 1–11.
[69] L. Duan et al., “A multi-task selected learning approach for solving 3D flexible bin packing problem,” 2018, arXiv:1804.06896.
[70] Z.-C. Li and Q. Liu, “Optimal deployment of emergency rescue stations in an urban transportation corridor,” Transportation, vol. 47, no. 1, pp. 445–473, Feb. 2020.
[71] L. Zhuang, J. He, Z. Yong, X. Deng, and D. Xu, “Disaster information acquisition by residents of China’s earthquake-stricken areas,” Int. J. Disaster Risk Reduction, vol. 51, Dec. 2020, Art. no. 101908.
[72] S. Vigerske and A. Gleixner, “SCIP: Global optimization of mixed-integer nonlinear programs in a branch-and-cut framework,” Optim. Methods Softw., vol. 33, no. 3, pp. 563–593, May 2018.
[73] Y. Pang, Y. Zhang, Y. Gu, M. Pan, Z. Han, and P. Li, “Efficient data collection for wireless rechargeable sensor clusters in harsh terrains using UAVs,” in Proc. IEEE Global Commun. Conf., Dec. 2014, pp. 234–239.
[74] Y. Long, G. Xu, J. Zhao, B. Xie, and M. Fang, “Dynamic Truck–UAV collaboration and integrated route planning for resilient urban emergency response,” IEEE Trans. Eng. Manag., pp. 1–13, 2023.
[75] O. Ghdiri, W. Jaafar, S. Alfattani, J. B. Abderrazak, and H. Yanikomeroglu, “Energy-efficient multi-UAV data collection for IoT networks with time deadlines,” in Proc. IEEE Global Commun. Conf., Dec. 2020, pp. 1–6.
[76] F. S. Moosavi Heris, S. F. Ghannadpour, M. Bagheri, and F. Zandieh, “A new accessibility based team orienteering approach for urban tourism routes optimization (a real life case),” Comput. Oper. Res., vol. 138, Feb. 2022, Art. no. 105620.
[77] S. Alfattani, W. Jaafar, H. Yanikomeroglu, and A. Yongacoglu, “Multi-UAV data collection framework for wireless sensor networks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2019, pp. 1–6.

Pengfu Wan received the B.S. degree in industry engineering from Nanjing University, Nanjing, China, in 2021, and the M.Sc. degree in engineering enterprise management from The Hong Kong University of Science and Technology, Hong Kong, in 2022. He is currently pursuing the Ph.D. degree with the Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong. His research interests include data-driven optimization and control, reinforcement learning, and emergency management.

Gangyan Xu (Member, IEEE) received the B.S. degree in automation and the M.E. degree in systems engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2009 and 2012, respectively, and the Ph.D. degree in systems engineering from The University of Hong Kong, Hong Kong, in 2016. He is currently an Assistant Professor with The Hong Kong Polytechnic University, Hong Kong. Prior to that, he was an Assistant Professor with the Harbin Institute of Technology, Shenzhen, China; a Research Fellow with Nanyang Technological University, Singapore; and a Research Assistant with the City University of Hong Kong, Hong Kong. His research interests include data-driven optimization and control, intelligent transportation systems, resilient engineering, and emergency management. Dr. Xu is an Editorial Board Member of Advanced Engineering Informatics and a Special Corresponding Expert of Frontiers of Engineering Management.

Jiawei Chen received the B.S. degree in automation from the Harbin Institute of Technology, Shenzhen, China, in 2022. She is currently pursuing the Ph.D. degree with the Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong. Her research interests include data-driven optimization and control, intelligent transportation systems, and reinforcement learning.

Yaoming Zhou received the B.Eng. degree in mechatronics from Zhejiang University, Hangzhou, China, in 2014, and the Ph.D. degree in operations research from The University of Hong Kong, Hong Kong, in 2018. From 2018 to 2019, he was a Senior Algorithm Engineer with Alibaba. Since 2019, he has been an Associate Professor with the Department of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai, China. His research interests include the modeling and analysis of transportation systems, and the integration of operations research, data analytics, and artificial intelligence and their application to transportation.