Reinforcement Learning and Particle Swarm Optimization Supporting Real-Time Rescue Assignments For Multiple Autonomous Underwater Vehicles
Authorized licensed use limited to: Kasetsart University provided by UniNet. Downloaded on April 24,2024 at 08:54:08 UTC from IEEE Xplore. Restrictions apply.
6808 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 23, NO. 7, JULY 2022
Underwater environments such as submarine reefs and different types of obstacles make rescue missions more complicated. Multi-AUV systems should consider obstruction threats when rescue missions are performed, for safety reasons. Fig. 1 illustrates that AUV1 encounters a cuboid obstacle before approaching rescue mission T1, at which point AUV1 needs to change its scheduled rescue route. To ensure rescue efficiency, the multi-AUV system still needs to make sure that the new rescue route is cost-effective and time-saving.

Selecting an optimal rescue strategy is important for the multi-AUV system to complete the rescue missions; this issue is a nondeterministic polynomial (NP) complete problem [4]. As the number of rescue missions increases, it becomes difficult to find an optimal rescue strategy [5]. Related algorithms mainly include negotiation algorithms [6], auction algorithms [7]–[9], genetic algorithms [10]–[12], ant colony algorithms [13]–[15], and neural networks [16], [17]. Although researchers have proposed many approaches for task assignment, these algorithms may not be suitable for actual underwater rescue scenarios. The most critical issues for rescue missions are how to speed up the rescue and how to decrease resource consumption. Negotiation and auction algorithms for rescue assignment are required to handle complicated relations between AUVs and rescue missions. Besides, negotiation and auction algorithms generally involve two separate processes, rescue assignment and path planning, which makes them unsuitable for real-time rescue missions. Meanwhile, these algorithms mainly concentrate on rescue assignment and pay less attention to path planning. For underwater rescue missions, path planning is also an important aspect. Biologically inspired heuristic optimization algorithms have been widely used in path planning. When genetic algorithms and ant colony algorithms are applied to rescue missions, they have high computational complexity. Besides, these algorithms require prior experience. The ant colony algorithm, in particular, needs to determine the reachability between two rescue missions, which may easily lead to rescue failure [18].

To overcome the weaknesses of the algorithms mentioned above, it is necessary to provide a real-time strategy to solve the real-time rescue assignment for the multi-AUV system. Therefore, we propose the approach of Reward acting on Reinforcement Learning and Particle Swarm Optimization (R-RLPSO) to resolve the rescue assignment for the multi-AUV system in the 3-D complex underwater environment. Compared with negotiation and auction algorithms, R-RLPSO not only pays attention to path planning but also merges rescue task assignment and path planning into one process, which reduces the run time and suits real-time rescue tasks. Compared with biologically inspired heuristic algorithms, R-RLPSO does not need to consider the relationships between tasks; its behavior is determined by the locations of the rescue points, which reduces the computational complexity and improves efficiency. In R-RLPSO, the rescue state of each rescue mission is represented by a reward, which is obtained based on reinforcement learning (RL). Particle swarm optimization (PSO) is utilized to produce a rescue route with obstacle avoidance. The quality of the rescue route is evaluated by the cost function. The cost function includes two parameters: one is the length of the rescue route, and the other is the reward within the Attraction Rescue Area and the Rescue Area. R-RLPSO is a real-time algorithm that can make a rapid response to rescue missions. Besides, it takes into account the overall rescue efficiency of the multi-AUV system and ignores the relations between rescue missions.

The main contribution of this paper is to provide a real-time underwater rescue algorithm for multi-AUV systems. The algorithm mainly includes the following innovations:

1) We propose a reward-based real-time rescue assignment algorithm, R-RLPSO, based on RL and PSO, to solve rescue missions for the multi-AUV system in the 3-D underwater environment.

2) We propose the concept of the Attraction Rescue Area. Meanwhile, we propose a linear reward function based on the proposed Attraction Rescue Area. For the waypoints in the Attraction Rescue Area, the reward value is calculated by this linear reward function.

3) We propose the Reward Coefficient, based on the rewards of all Attraction Rescue Areas and Rescue Areas, aiming to speed up the convergence of R-RLPSO and mark the current reward states of the rescue missions.

4) To make the simulation match the actual rescue environment, we construct rescue missions in the 3-D underwater environment, including submarine reefs and different types of obstacles.

The rest of the paper is organized as follows: Section II analyzes the related work on task assignment. Section III describes the problem statement. Section IV gives the proposed algorithm. Section V shows the simulation results and analysis. Section VI concludes the paper.

II. RELATED WORKS

The core processing in searching for the optimal rescue assignment strategy belongs to task assignment, which has been extensively studied. There have been mainly three types of methods: linear programming, market mechanisms, and intelligence algorithms [19].

In the early study of task assignment, linear programming was a classic method of solving task assignments. Darrah et al. [20] studied multi-UAV dynamic task assignment using mixed integer linear programming. Although this method can accurately provide a strategy for task assignment, its computational complexity is high and does not meet the requirements of real-time rescue missions. Zu et al. [21] applied the Hungarian algorithm to solve the task assignment. It resolved how a robot acquires missions and realizes them at minimal cost, but the computational complexity is quite high, which cannot meet real-time requirements in complex scenarios.

The advantage of market-mechanism-based approaches is that the calculation is simple. However, before a multi-AUV system performs rescue missions, it needs to consider an optimal rescue assignment strategy and, after that, an efficient implementation strategy for the rescue missions. Meanwhile, these approaches mainly focus on rescue assignment rather than how to complete the rescue missions. Charles et al. [22] introduced a Bayesian formulation for
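The cost function COST_F described above combines the rescue-route length with the reward collected inside the Attraction Rescue Areas and Rescue Areas. Its exact form is not reproduced in this excerpt, so the sketch below assumes a simple linear combination of the two terms; the function name and signature are illustrative, and the weights α = 2 and β = 10 are the values later reported in Section V:

```python
import math

def cost_f(route, mission_rewards, alpha=2.0, beta=10.0):
    """Illustrative two-term cost: shorter routes and larger collected
    rewards both lower the cost. The linear form is an assumption; the
    paper only states that COST_F depends on the route length and on the
    reward within the Attraction Rescue Area and Rescue Area."""
    # Total sailing distance over consecutive (x, y, z) waypoints.
    length = sum(math.dist(route[i], route[i + 1])
                 for i in range(len(route) - 1))
    # Route length is penalized; accumulated mission reward is credited.
    return alpha * length - beta * sum(mission_rewards)
```

Under a form like this, PSO would select the particle whose route minimizes the cost, trading extra travel distance against reward collected in the rescue areas.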
then the waypoint will fall within the Rescue Area, which means that the current rescue mission will be completed and the reward γ tends to ε. For each rescue mission, there is not only a single waypoint p(x, y, z) within the Attraction Rescue Area and the Rescue Area. Therefore, the reward of each rescue mission should be the summation of the rewards of a certain number of waypoints on the rescue route. These waypoints are determined by measuring the overall rescue efficiency of the multi-AUV system. Each AUV in the multi-AUV system produces an optimal rescue strategy through COST_F, which only measures the summation of the rewards of all rescue missions and is less concerned with the relationships between rescue missions.

When R_0 ≤ dist ≤ R_1, it is novel and efficient to propose such a linear reward function. If this situation were not considered, the reward γ for the waypoint p(x, y, z) would become a Boolean problem. The disadvantage of this way

Then, the Reward Coefficient vector W^(j) at the j-th iteration can be shown in (11):

W^{(j)} = \left[ \frac{\gamma_1^{(j)}}{W_{sum}^{(j)}}, \ldots, \frac{\gamma_k^{(j)}}{W_{sum}^{(j)}}, \ldots, \frac{\gamma_N^{(j)}}{W_{sum}^{(j)}} \right], \qquad (11)

For the waypoint p(x, y, z) at the k-th rescue mission, the reward γ_k^(j+1) at the (j+1)-th iteration can be obtained from (12) and (13), where dist is the distance from p to the mission center (x_i, y_i, z_i):

dist = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}, \qquad (12)

Then, the reward value γ can be expressed as (13):

\gamma_k^{(j+1)} = \begin{cases} (1 + W^{(j)}(k))\,\varepsilon, & dist < R_0, \\ (1 + W^{(j)}(k))\bigl(1 - (dist - R_0)/(R_1 - R_0)\bigr)\,\varepsilon, & R_0 \le dist \le R_1, \\ 0, & dist > R_1, \end{cases} \qquad (13)
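The piecewise linear reward in (12)–(13) can be written directly as a function of the waypoint-to-mission distance. A minimal sketch; the radii r0 = 3 m and r1 = 10 m are example values taken from the simulation settings (the Attraction Rescue Area radius is 10 m, and the Rescue Area radius of T3 is 3 m), and the function name is our own:

```python
import math

def waypoint_reward(p, center, w_k, eps=0.1, r0=3.0, r1=10.0):
    """Reward of waypoint p for one rescue mission, following (12)-(13).

    p, center : (x, y, z) tuples; center is the mission center point.
    w_k       : Reward Coefficient W^(j)(k) of the mission.
    eps       : base reward epsilon.
    r0, r1    : Rescue Area and Attraction Rescue Area radii.
    """
    dist = math.dist(p, center)        # Eq. (12)
    if dist > r1:                      # outside the Attraction Rescue Area
        return 0.0
    if dist < r0:                      # inside the Rescue Area
        return (1.0 + w_k) * eps
    # inside the Attraction Rescue Area: decays linearly from eps to 0
    return (1.0 + w_k) * (1.0 - (dist - r0) / (r1 - r0)) * eps
```

The reward is continuous at dist = R0 and falls to zero at dist = R1, so a waypoint earns more the deeper it penetrates the Attraction Rescue Area, and a positive Reward Coefficient amplifies that reward.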
where W^(j)(k) is the Reward Coefficient of the k-th rescue mission at the j-th iteration. The proposed Reward Coefficient vector is necessary for performing rescue missions in the multi-AUV system for two reasons. One is that waypoints of the rescue route that fall within the Attraction Rescue Area and the Rescue Area make the cost function COST_F decrease more significantly; such a route is therefore more easily selected as the optimal rescue route by the cost function COST_F. The other, more important, reason is that the Reward Coefficient vector W^(j) at the j-th iteration is passed to the reward at the (j+1)-th iteration. Reward transmission means that the reward is related not only to the current iteration but also to the next reward. Due to the limitation of the cost function COST_F, it is impossible for vehicle V_i in the multi-AUV system to perform all rescue missions. It will have a particular "preference" for some rescue missions, and this "preference" makes part of the Reward Coefficient vector W^(j) equal to zero. Meanwhile, the "preferred" part has a larger reward weight, and the more significant weights in the Reward Coefficient vector W^(j) are exactly the reward weights of the Rescue Area and the Attraction Rescue Area. As shown in (11), this reward weight is passed to the next iteration, which we call the "strength to strength" of the reward.

However, the reward γ_k^(j+1) does not always increase; it should be punished in two cases. In the first, waypoints accumulate excessively in an Attraction Rescue Area and Rescue Area. In the second, no waypoints fall within an Attraction Rescue Area and Rescue Area. The reason for the first phenomenon is that the Attraction Rescue Area and Rescue Area use a greedy method to attract waypoints. To avoid the rescue route distortion caused by excessive accumulation of waypoints in the Attraction Rescue Area and Rescue Area, the reward γ_k has to be punished. For each rescue mission, the upper limit of waypoints is set to κ and the penalty value is set to ε_1, where ε_1 is a constant. The reward γ_k^(j+1) is then penalized if the number of waypoints in the Attraction Rescue Area and Rescue Area exceeds the upper limit κ, as shown in (14):

\gamma_k^{(j+1)} = \begin{cases} \gamma_k^{(j+1)} - \varepsilon_1, & \eta > \kappa, \\ \gamma_k^{(j+1)}, & \eta \le \kappa, \end{cases} \qquad (14)

where η represents the number of waypoints within each Attraction Rescue Area and Rescue Area. The reason for the second phenomenon is that the AUV abandons performing the related rescue missions. It is impossible for a single AUV to complete all rescue missions in the 3-D underwater environment, so it is normal for an AUV to abandon some rescue missions. As shown in (11), each AUV has a particular "preference" for some rescue missions. For a single AUV, "non-preference" rescue missions should not have waypoints. However, for "preference" rescue missions, the reward γ_k^(j+1) needs to be punished if there are no waypoints. For the AUV at the k-th Rescue Area, if W^(j)(k) > 0 at the j-th iteration but W^(j+1)(k) = 0 at the (j+1)-th iteration, then the reward value γ_k^(j+1) needs to be punished, as shown in (15):

\gamma_k^{(j+1)} = \begin{cases} \gamma_k^{(j+1)} - \varepsilon_1, & W^{(j)}(k) > 0 \text{ and } W^{(j+1)}(k) = 0, \\ 0, & \text{otherwise}, \end{cases} \qquad (15)

where W^(j)(k) is the Reward Coefficient of the k-th rescue mission at the j-th iteration, and ε_1 is the penalty value of the reward.

V. PERFORMANCE

To demonstrate the effectiveness of R-RLPSO in underwater rescue missions, several simulations are constructed in MATLAB R2016b on a personal computer configured with an Intel Core i3-7100U @ 3.9 GHz and 8 GB of RAM. In our simulations, the number of rescue missions is 7, and there are four cuboid objects and six sphere objects in the 3-D underwater environment. We use the sailing distance of each AUV through the task area to evaluate the quality of the given algorithms.

A. Simulation Environment Settings of the R-RLPSO Algorithm

The simulations were constructed in a three-dimensional underwater environment. The data for the underwater environment are real data downloaded from the National Marine Science Data Center; the environment includes obstacles and an uneven seabed. The parameters for R-RLPSO are designed as follows: the inertia weight factor w linearly decreases from 0.9 to 0.4; the learning factor c1 linearly decreases from 2.5 to 0.5; and the learning factor c2 linearly increases from 0.5 to 2.5. The parameters α and β in COST_F are 2 and 10, respectively. The maximum number of iterations is 50, and the number of particles is 300. Besides, the reward ε is 0.1, the penalty reward value ε_1 is 0.5, the upper limit κ is 10, and the radius of the Attraction Rescue Area is 10 meters (m). The background of the simulation is as follows: seven shipwreck accidents occur, and the multi-AUV system needs to move from the base station S, whose coordinate is (5, 5, 5), and perform seven rescue missions T = {T1, T2, ..., T7}. After the rescue missions are completed, the AUV swarm needs to reach its substations G1, G2, and G3, whose coordinates are (180, 180, 0), (130, 180, 0), and (180, 130, 0), respectively. Meanwhile, we set different types of obstacles along the rescue route, including cuboid obstacles C = {Ci | i = 1, 2, 3, 4}, sphere obstacles S = {Si | i = 1, 2, ..., 6}, and submarine reefs. Each Rescue Area is represented by a sphere object. Each Attraction Rescue Area is also a virtual sphere object, which contains the Rescue Area. According to the Degree of Mission Completion, the AUV passing through the predetermined Rescue Area indicates that the rescue mission is completed. The description of the rescue missions is shown in Table I. As shown in Table II and Table III, P represents the center point of the rescue mission, and R represents the coverage area. For obstacles in the environment model, the descriptions of the sphere obstacles and cuboid obstacles are shown in Table II and Table III, respectively. In Table II, P and R represent the centers of the sphere objects and the radii of the sphere objects,
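The linearly varying PSO coefficients listed above can be produced by a simple per-iteration schedule. A minimal sketch, assuming each coefficient is interpolated linearly in the iteration index over the stated 50 iterations (the function name and 0-based indexing are our choices):

```python
def pso_coefficients(j, j_max=50):
    """Linear schedules matching the reported R-RLPSO settings:
    inertia weight w decreases 0.9 -> 0.4, learning factor c1
    decreases 2.5 -> 0.5, learning factor c2 increases 0.5 -> 2.5.

    j is the 0-based iteration index, so t sweeps from 0 at the
    first iteration to 1 at the last."""
    t = j / (j_max - 1)

    def lerp(a, b):
        # Linear interpolation between the start and end values.
        return a + (b - a) * t

    return lerp(0.9, 0.4), lerp(2.5, 0.5), lerp(0.5, 2.5)
```

At iteration 0 this yields (0.9, 2.5, 0.5); by the final iteration it reaches (0.4, 0.5, 2.5), shifting the swarm from exploration (large w and c1) toward exploitation around the global best (large c2).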
TABLE IV
THE DESCRIPTION OF CUBOID OBSTACLES (m)

TABLE V
THE DISTANCE BETWEEN THE AUV SWARM AND RESCUE MISSIONS AFTER THE FIRST STAGE (m)

TABLE VI
THE DISTANCE BETWEEN THE AUV SWARM AND RESCUE MISSIONS AFTER THE SECOND STAGE (m)

TABLE VII
THE CLOSEST DISTANCE BETWEEN THE AUV SWARM AND RESCUE MISSION T3 DURING THE THIRD STAGE (m)

rescue missions, where C1–C4 represent the cuboid obstacles, S1–S6 represent the sphere obstacles, and T1–T7 represent the rescue missions. Three AUVs move from the base station to perform the rescue missions. We assume that each AUV completes its rescue in 100 steps. In the simulation, the rescue process of each AUV is divided into four stages. Fig. 5(b) shows the first stage of the rescue process. At this stage, the AUV swarm is far away from the rescue missions, and no rescue missions are completed. The three AUVs will face three missions: T1, T4, and T6. After the first stage of the rescue process, the distances between the AUV swarm and the rescue missions are shown in Table V. Besides, AUV1 will face the cuboid obstacle C2, and AUV2 avoids the sphere obstacle S1 and the cuboid obstacle C2. Fig. 5(c) shows the second stage of the rescue process. At this stage, the AUVs each perform one of these missions. Table V shows that, after the first stage of the rescue process, AUV1 is the closest to T6, AUV2 is the closest to T1, and AUV3 is the closest to T6. However, the simulation result shows that, under the action of R-RLPSO, the AUVs do not rush to their nearest missions: AUV1 tends to T1, AUV2 tends to T4, and AUV3 tends to T6. Here, the most crucial point is that the AUVs are not pre-assigned rescue missions, and the algorithm has lightweight computational complexity, which meets the real-time requirement. Especially for AUV2, Table V shows that its closest distance, from rescue mission T1, is 27.62 m, and its farthest distance, from T4, is 41.78 m. However, considering the overall efficiency of performing the rescue missions, AUV2 still selects to perform T4 rather than T1. After the second stage, the distances between the AUV swarm and the rescue missions are shown in Table VI. In addition, AUV1 avoids the cuboid obstacle C2, and AUV3 avoids the sphere obstacle S3. Fig. 5(d) shows the third stage of the rescue process. At this stage, all AUVs each perform one of these missions. Table VI shows that, after the second stage, AUV1 is the closest to T2, AUV2 is also the closest to T2, and AUV3 is the closest to T7. However, the simulation result shows that AUV1 tends to T2, AUV2 tends to T5, and AUV3 tends to T7. In addition, AUV1 avoids the sphere obstacle S4, and AUV2 avoids the cuboid obstacle C3. Fig. 5(d) shows that AUV1 has completed T3 at the end of the third stage. Table VII shows the closest distance between the AUV swarm and rescue mission T3 during the third stage. The result indicates that AUV1 is only 2.34 m away from the center point of T3. This waypoint occurs at step 79, and its coordinate is p(148.97, 141.10, 2.20). Table II shows that the radius of T3 is 3 m. Therefore, AUV1 completed T3 during the third stage. The closest distances of AUV2 and AUV3 to T3 are 44.94 m and 31.34 m, respectively. Fig. 5(e) shows that the AUV swarm has performed all rescue missions, and the AUVs return to their respective substations.

C. Algorithm Performance Analysis of the R-RLPSO Algorithm

The reward plays a vital role in the rescue assignment. Each AUV automatically selects rescue missions based on the current reward. Fig. 6(a) shows the rewards of the rescue missions for AUV1. The simulation result shows that AUV1 is scheduled to perform T1, T2, and T3 in the initial rescue state. The rewards of the other rescue missions are zero. The reason for this phenomenon is the role of COST_F: the multi-AUV system needs to be cost-effective and time-saving during the rescue process. T1, T2, and T3 show temporary reward stability during the iterations. The reason for this phenomenon is that AUV1 does not find better waypoints for performing rescue missions. Then, the reward value of T3 gradually increases over the iterations. The reward value of T2 increases rapidly and stabilizes after a brief decrease. The reward of T1 decreases at the beginning of the iterations, but it rises and stabilizes in the later iterations. When AUV1 performs rescue missions, it should not pay too much attention to a single rescue mission, because doing so can distort the rescue route. Considering the rescue cost, AUV1 should have a comprehensive measurement of all rescue missions. When the reward of T1 decreases, the reward values of T2 and T3 must increase. This shows that the AUV can comprehensively weigh the situation of each rescue mission to find the best locations for waypoints.

In Fig. 7, the blue dotted line shows the total reward of the rescue missions for AUV1. The total reward consists of
TABLE VIII
THE WAYPOINTS OF THE MULTI-AUV SYSTEM FALLING WITHIN ATTRACTION RESCUE AREAS AND RESCUE AREAS (m)
Fig. 11. The rescue mission assignment by the IACO algorithm.

If we change DIST_min = 30 m and DIST_max = 100 m, the rescue result of the multi-AUV system is shown in Fig. 11. Fig. 11 shows that AUV1 and AUV3 completed their rescue missions successfully. For AUV2, because there is no relation between the starting point S and T4, AUV2 is limited in performing rescue mission T4. Meanwhile, to ensure the rescue cost of the multi-AUV system, AUV2 has to go through rescue mission T1, which has already been rescued by AUV1, and thus fails T4. The location of a Rescue Area can be easily detected by radar, but prior experience is required for the values of DIST_min and DIST_max; improper experience can easily lead to failure of rescue missions. R-RLPSO is a real-time rescue algorithm that does not consider the complex relationships between rescue missions: as long as the location information of the rescue missions is obtained, the multi-AUV system can find the most cost-effective rescue strategy and generate the rescue route in a short time, and its low computing time can guarantee the real-time performance of the rescue task.

VI. CONCLUSION

In this paper, we have provided the R-RLPSO algorithm to achieve real-time rescue assignments for the multi-AUV system in the 3-D complex underwater environment. Compared with the existing algorithms, the obvious advantage of the R-RLPSO algorithm is that it ensures the rescue missions are completed under the premise of cost-effectiveness and rapid rescuing, with less concern about the relationships between rescue missions. With the R-RLPSO algorithm, the multi-AUV system can adaptively select rescue missions to find the optimal rescue strategy, which meets the needs of real-time rescue in actual scenarios.

Our future works may focus on the following.

1) Considering the establishment of a communication mechanism in the multi-AUV system. When one of the AUVs fails to perform its rescue missions, it will automatically transmit the subsequent rescue mission queue to the nearest AUV. When the nearest neighbor AUV receives the rescue mission queue, it will add the queue to its own rescue missions and then weigh the overall rescue efficiency to execute the rescue missions.

2) Building a more intelligent real-time multi-AUV rescue system. The heuristic information is produced by the neural network in the process of rescue missions. The heuristic

REFERENCES

[1] S. M. Zadeh, D. M. W. Powers, K. Sammut, and A. M. Yazdani, "A novel versatile architecture for autonomous underwater vehicle's motion planning and task assignment," Soft Comput., vol. 22, no. 5, pp. 1687–1710, Mar. 2018.
[2] D. Zhu, H. Huang, and S. X. Yang, "Dynamic task assignment and path planning of multi-AUV system based on an improved self-organizing map and velocity synthesis method in three-dimensional underwater workspace," IEEE Trans. Cybern., vol. 43, no. 2, pp. 504–514, Apr. 2013.
[3] W. K. Zhang, G. X. Wang, G. H. Xu, C. Liu, and X. Shen, "Development of control system in abdominal operating ROV," Chin. J. Ship Res., vol. 12, no. 2, pp. 124–132, 2017.
[4] J. Faigl, P. Vana, and J. Deckerova, "Fast heuristics for the 3-D multi-goal path planning based on the generalized traveling salesman problem with neighborhoods," IEEE Robot. Autom. Lett., vol. 4, no. 3, pp. 2439–2446, Jul. 2019.
[5] S. MahmoudZadeh, D. M. W. Powers, and A. M. Yazdani, "A novel efficient task-assign route planning method for AUV guidance in a dynamic cluttered environment," in Proc. IEEE Congr. Evol. Comput. (CEC), Jul. 2016, pp. 678–684.
[6] S. Moon, E. Oh, and D. H. Shim, "An integral framework of task assignment and path planning for multiple unmanned aerial vehicles in dynamic environments," J. Intell. Robot. Syst., vol. 70, nos. 1–4, pp. 303–313, Apr. 2013.
[7] W. Yao, N. Qing, N. Wan, and Y. Liu, "An iterative strategy for task assignment and path planning of distributed multiple unmanned aerial vehicles," Aerosp. Sci. Technol., vol. 86, pp. 455–464, Mar. 2019.
[8] J. Zhang, G. Wang, X. Yao, Y. Song, and F. Zhao, "Research on task assignment optimization algorithm based on multi-agent," in Proc. Chin. Automat. Congr. (CAC), Xi'an, China, Nov. 2018, pp. 2179–2183.
[9] G. Ferri, A. Munafo, A. Tesei, and K. LePage, "A market-based task allocation framework for autonomous underwater surveillance networks," in Proc. OCEANS, Aberdeen, U.K., Jun. 2017.
[10] A. Alvarez, A. Caiti, and R. Onken, "Evolutionary path planning for mobile robot navigation," IEEE J. Ocean Eng., vol. 29, no. 2, pp. 418–429, Apr. 2004.
[11] X. Bai, W. Yan, S. S. Ge, and M. Cao, "An integrated multi-population genetic algorithm for multi-vehicle task assignment in a drift field," Inf. Sci., vol. 453, pp. 227–238, Jul. 2018.
[12] S. Li, X. Xu, and L. Zuo, "Task assignment of multi-robot systems based on improved genetic algorithms," in Proc. IEEE Int. Conf. Mechatronics Autom. (ICMA), Beijing, China, Aug. 2015, pp. 1430–1435.
[13] Z. Xu, Y. Li, and X. Feng, "Constrained multi-objective task assignment for UUVs using multiple ant colonies system," in Proc. ISECS Int. Colloq. Comput., Commun., Control, Manage., Guangzhou, China, Aug. 2008, pp. 462–466.
[14] X. Qin et al., "Task allocation of multi-robot based on improved ant colony algorithm," Space Control Technol. Appl., vol. 44, no. 5, pp. 55–59, Oct. 2018.
[15] G. Li, L. Boukhatem, and J. Wu, "Adaptive quality-of-service-based routing for vehicular ad hoc networks with ant colony optimization," IEEE Trans. Veh. Technol., vol. 66, no. 4, pp. 3249–3264, Apr. 2017.
[16] X. Cao, D. Zhu, and S. X. Yang, "Multi-AUV target search based on bioinspired neurodynamics model in 3-D underwater environments," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 11, pp. 2364–2374, Nov. 2016.
[17] D. Zhu, X. Cao, B. Sun, and C. Luo, "Biologically inspired self-organizing map applied to task assignment and path planning of an AUV system," IEEE Trans. Cognit. Develop. Syst., vol. 10, no. 2, pp. 304–313, Jun. 2018.
[18] I. Younas, F. Kamrani, M. Bashir, and J. Schubert, "Efficient genetic algorithms for optimal assignment of tasks to teams of agents," Neurocomputing, vol. 314, pp. 409–428, Nov. 2018.
[19] S. MahmoudZadeh, D. M. W. Powers, K. Sammut, and A. Yazdani, "Toward efficient task assignment and motion planning for large-scale underwater missions," Int. J. Adv. Robot. Syst., vol. 13, no. 5, pp. 1–13, 2016.
[20] M. A. Darrah, W. Niland, and B. M. Stolarik, "Multiple UAV dynamic task allocation using mixed integer linear programming in a SEAD mission," in Proc. Infotech, 2006, p. 7164.
[21] L.-N. Zu, Y.-T. Tian, J.-C. Fu, and J.-F. Liu, "Algorithm of task-allocation based on realizing at the lowest cost in multimobile robot system," in Proc. Int. Conf. Mach. Learn. Cybern., Shanghai, China, Aug. 2004, pp. 152–156.
[22] E. P. Charles and C. Henrik, "A Bayesian formulation for auction-based task allocation in heterogeneous multi-agent teams," Proc. SPIE, vol. 8047, no. 23, May 2011, Art. no. 804710.
[23] G. Oh, Y. Kim, J. Ahn, and H. L. Choi, "Market-based distributed task assignment of multiple unmanned aerial vehicles for cooperative timing mission," J. Aircr., vol. 54, no. 6, pp. 2298–2310, 2017.
[24] L. Wang and Z. Wang, "Collection path ant colony optimization for multi-agent static task allocation," J. Inf. Comput. Sci., vol. 9, no. 18, pp. 5689–5696, 2012.
[25] L. Lin, S. Qibo, W. Shangguang, and Y. Fangchun, "Research on PSO based multiple UAVs real-time task assignment," in Proc. 25th Chin. Control Decis. Conf. (CCDC), Guiyang, China, May 2013, pp. 1530–1536.
[26] G. Oh, Y. Kim, J. Ahn, and H.-L. Choi, "PSO-based optimal task allocation for cooperative timing missions," IFAC-PapersOnLine, vol. 49, no. 17, pp. 314–319, Aug. 2016.
[27] H. Wang, J. Yuan, H. Lv, and Q. Li, "Task allocation and online path planning for AUV swarm cooperation," in Proc. OCEANS, Aberdeen, U.K., Jun. 2017, pp. 1–6.
[28] Y. Shi and R. C. Eberhart, "Parameter selection in particle swarm optimization," in Proc. Int. Conf. Evol. Program., Berlin, Germany, 1998, pp. 592–600.
[29] S. MahmoudZadeh, A. M. Yazdani, K. Sammut, and D. M. W. Powers, "Online path planning for AUV rendezvous in dynamic cluttered undersea environment using evolutionary algorithms," Appl. Soft Comput., vol. 70, pp. 929–945, Sep. 2018.
[30] W. Song, Y. Zhou, X. Hu, S. Duan, and H. Lai, "Memristive neural network based reinforcement learning with reward shaping for path finding," in Proc. 5th Int. Conf. Inf., Cybern., Comput. Social Syst. (ICCSS), Hangzhou, China, Aug. 2018, pp. 200–205.
[31] M. Dorigo, M. Birattari, and T. Stutzle, "Ant colony optimization," IEEE Comput. Intell. Mag., vol. 1, no. 4, pp. 28–39, Nov. 2006.
[32] B. Sun, D. Zhu, and S. X. Yang, "An optimized fuzzy control algorithm for three-dimensional AUV path planning," Int. J. Fuzzy Syst., vol. 20, no. 2, pp. 597–610, Feb. 2018.
[33] T. Kohonen, "The self-organizing map," Proc. IEEE, vol. 78, no. 9, pp. 1464–1480, Sep. 1990.

Jiehong Wu (Member, IEEE) received the Ph.D. degree in computer architecture from Northeastern University in 2008. She was sponsored by the Chinese Government as a Visiting Scholar with Wright State University, Dayton, OH, USA, in 2011. She is currently a Professor and a Prominent Teacher with Shenyang Aerospace University, China. Her main research interests include UAV/AUV/UUV system correspondence security, autonomous obstacle avoidance and defense, and power consumption optimization.

Chengxin Song received the B.S. degree from Shenyang Aerospace University in 2017, where he is currently pursuing the master's degree. His current research interests include intelligent path planning and energy optimization on AUVs.

Jian Ma received the B.S. degree from Shenyang Aerospace University in 2018, where he is currently pursuing the master's degree. His main research interests include multi-UAV group algorithms and UAV network communication security.

Jinsong Wu (Senior Member, IEEE) received the Ph.D. degree from the Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada. He was the Founder and the Founding Chair of the IEEE Technical Committee on Green Communications and Computing (TCGCC). He is elected as a Vice-Chair, Technical Activities, IEEE Environmental Engineering Initiative, a pan-IEEE effort under the IEEE Technical Activities Board (TAB). He is also the Co-Founder and the Founding Vice-Chair of the IEEE Technical Committee on Big Data (TCBD).

Guangjie Han (Senior Member, IEEE) received the Ph.D. degree from Northeastern University, Shenyang, China, in 2004. He is currently a Professor with the Department of Information and Communication System, Hohai University, Changzhou, China, and a Distinguished Professor with the Dalian University of Technology, Dalian, China.