
This article has been accepted for publication in IEEE Transactions on Industrial Informatics.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TII.2023.3268760

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. XX, NO. XX, XXXX 1

A Cooperative Evolutionary Computation Algorithm for Dynamic Multiobjective Multi-AUV Path Planning

Xiao-Fang Liu, Member, IEEE, Yongchun Fang, Senior Member, IEEE, Zhi-Hui Zhan, Senior Member, IEEE, Yun-Liang Jiang, and Jun Zhang, Fellow, IEEE

Abstract— Multiple autonomous underwater vehicles (AUVs) are popular for executing submarine missions, which involve multiple targets distributed in a large and complex underwater environment. The path planning of multiple AUVs is a significant and challenging problem, which determines the location of surface points for AUV launch and plans the paths of AUVs for target traveling. Most existing works model the problem as a single-objective static optimization problem. However, the target missions may change over time, and multiple optimization objectives are usually expected for decision making. Thus, this paper models the problem as a dynamic multiobjective optimization problem and proposes a cooperative evolutionary computation algorithm to provide diverse and high-quality solutions for decision makers. In the proposed method, solutions are represented using a bi-layer encoding scheme, in which the first layer indicates the surface location points and the second layer represents the traveling sequences of target missions. The multiple populations for multiple objectives (MPMO) framework is adopted to efficiently solve the dynamic multiobjective AUV optimization problem. In addition, a recombination-based sampling strategy is developed to improve convergence by fusing the information of multiple populations. Once a change occurs, an incremental response strategy is adopted to generate high-quality solutions for population evolution. Based on the dataset of New Zealand bathymetry, six complex underwater scenarios are constructed with a size of 50 km × 50 km × 10 km and 400 target missions for tests. Experimental results show that the proposed method outperforms the state-of-the-art algorithms in terms of solution diversity and optimality.

Index Terms— Dynamic multiobjective; multi-AUV path planning; differential evolution; ant colony system; evolutionary computation; multiple populations for multiple objectives

Manuscript . . . This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62103202, 61903139, and U22A20102, and in part by the Natural Science Foundation of Tianjin under Grant 21JCQNJC00140. (Corresponding authors: Zhi-Hui Zhan and Jun Zhang)
Xiao-Fang Liu and Yongchun Fang are with the Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, and also with the Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, China (email: liuxiaofang@nankai.edu.cn; fangyc@nankai.edu.cn).
Zhi-Hui Zhan is with the College of Artificial Intelligence, Nankai University, Tianjin 300350, China, and also with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: zhanapollo@163.com).
Yun-Liang Jiang is with the School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, Zhejiang, China, and also with the School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang, China (email: jyl2022@zjnu.cn).
Jun Zhang is with Hanyang University, Ansan 15588, South Korea, and also with Zhejiang Normal University, Jinhua 321004, China (e-mail: junzhang@ieee.org).

I. INTRODUCTION

Autonomous underwater vehicles (AUVs) are qualified for a wide range of subsea missions [1], such as deep surveys of hydrothermal vent fields [2] and underwater communication [3]. Due to the limitation of a single AUV in terms of energy and capacities, multiple AUVs usually work cooperatively to complete complex missions in a large underwater environment [4]. The target missions can be decomposed into multiple subsets and each subset is assigned to one AUV for execution. For example, in monitoring scenarios, a group of AUVs is used to visit particular targets in parallel for quickly investigating the whole area [5]; in tactical edge networks, AUVs work together to achieve a high degree of coordination and communications [6]. To save energy, AUVs are usually launched from movable platforms, e.g., unmanned surface vehicles, on the ocean rather than from a coast. The movable surface vehicles and AUVs form a multi-AUV system to fulfill target missions. To build such a system, the path planning of multiple AUVs is one of the challenging problems, especially in a large underwater environment with complex bathymetry, irregular obstacles, and risky areas.

The multi-AUV path planning problem aims to locate the surface points and find the optimal paths of AUVs to travel target missions. In the literature, many research efforts are devoted to developing various methods [7]. Most of them do not consider the location of surface points and solve the mission assignment and path planning of AUVs separately. Particularly, the mission assignment problem is NP-hard, and the methods for mission assignment can be roughly classified into four categories, i.e., mixed-integer linear programming [8], market-based methods [9], evolutionary computation [10], and learning-based methods [11]. Among these methods, mixed-integer linear programming usually requires a long time for instances with a large number of target missions. Market-based methods use auction mechanisms to determine the AUV assigned to each target mission. They are simple

© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.

but may not find the optimal solutions since they use local information only. Evolutionary computation models the mission assignment as a combinatorial optimization problem and adopts multiple strategies for population evolution to approximate the optimal solutions, such as the genetic algorithm. These methods have shown good performance, especially on large-scale instances. Learning-based methods use machine learning techniques to assign missions, such as Q-learning [12] and self-organizing map [13]. The performance of these algorithms depends greatly on the training data and the learning strategy.

Fig. 1. An example of environment modeling: (a) the search graph $G$, (b) the cost map $G'$, where $V_S = \{v_1^S, v_2^S, v_3^S, v_5^S\}$ is the set of candidate points for AUV launch and recovery, and $V_T = \{v_1^T, v_2^T, v_3^T\}$ is the set of target vertices.

Given the assigned missions, the paths of multiple AUVs are planned using a multidimensional random tree star algorithm [14]. However, the method is task-related. A genetic algorithm is employed to plan paths of AUVs for sensor placement [15]. Differential evolution is adopted to plan the path of each AUV according to the estimated environmental scalar field [16], [17]. These methods are designed for small-scale missions. In addition, the Q-learning method is applied to AUV path planning for reducing the energy consumption and travel distance of AUVs [11]. An on-policy reinforcement learning approach plans paths to target missions without any collisions by gaining information and rewards from the environment [18]. However, the trained network may not work well in new environments that are quite different from the trained ones.

Some works also deal with the mission assignment and path planning simultaneously. For example, a clustering method is adopted for task assignment and ant colony optimization is used to construct paths of multiple AUVs [19]. A biologically inspired neural network is adopted to select one AUV and plan the path to each target mission according to the active values in the grid maps of underwater environments [20]. However, the method is only applied to a simple underwater scenario with a size of 10 m × 10 m × 10 m. The risky areas and other features of complex underwater environments are not considered. In addition, reinforcement learning provides rewards of allocation strategies and waypoints to guide particle swarm optimization to find the optimal strategy and paths in a real-time rescue system [21]. However, the training samples of reinforcement learning are rare and only small-scale target missions are considered.

To the best of our knowledge, most existing works consider one optimization objective only, such as energy consumption, load balance, or mission running time. However, multiple optimization objectives are usually expected by decision makers. For example, energy consumption and load balance are usually both of concern in complex scenarios for efficiently utilizing the AUVs. In addition, these works often assume that the target missions are invariant. Indeed, new target missions can be discovered constantly, such as in rescue scenarios, and they may also change due to environmental factors. Thus, this paper studies the multi-AUV path planning with multiple optimization objectives and changing target missions in complex underwater environments.

This paper models the multi-AUV path planning as a dynamic multiobjective optimization problem and proposes a cooperative evolutionary computation algorithm, named CEA, to solve it. In the proposed method, the underwater environment is converted into a search graph for surface point location and path planning. Multiple populations are adopted to find the optimal solutions while maintaining population diversity, since the multiple populations for multiple objectives (MPMO) framework has shown good performance on multiobjective optimization [22]. The MPMO framework creates one population for every objective and the multiple populations coevolve towards the Pareto fronts [23]. Particularly, solutions are represented using a bi-level encoding scheme, which includes the surface location points in the first layer and the voyages of AUVs in the second layer. In addition, a recombination-based sampling strategy is developed to fuse information from multiple populations for improving algorithm convergence. Once a change occurs, an incremental response strategy is adopted to update the outdated populations using historical information. Experiments are performed on six complex scenarios, which are constructed with irregular obstacles and risky areas based on the dataset of New Zealand bathymetry. The experimental results show the advantage of the proposed method over existing methods in terms of solution diversity and optimality.

The contributions of the work are summarized as follows:
• A new dynamic multiobjective optimization model is developed for multi-AUV path planning.
• A new cooperative evolutionary computation algorithm is proposed to plan the paths of multiple AUVs, which visit a large number of changing target missions in a complex environment.

The paper is organized as follows. Section II gives the background. Section III presents the details of the proposed method. Experiments are performed in Section IV and conclusions are drawn in Section V.

II. BACKGROUND

A. Problem Formulation

Given multiple target traveling missions $M = \{m_1, \ldots, m_N\}$ distributed in a large and complex underwater environment, a multi-AUV system determines the location of movable surface vehicles and launches AUVs from them to complete missions in parallel. Every AUV can be released into water multiple times and recovered before the energy is exhausted, resulting in multiple voyages. Among all voyages, each target mission is required to be visited by one voyage exactly once. The total payload of a voyage is the sum of the payload requirement $l_i$ of each visited target mission $m_i$


and should not be higher than the payload capacity $L_{max}$ of an AUV. To obtain the optimal voyages in terms of resource utilization, this paper considers two common objectives, i.e., minimize the total cost $f_1$ of all voyages and minimize the maximum ending time $f_2$ of all voyages. Particularly, $f_1$ is related to the energy consumption and fixed costs of AUVs and $f_2$ is about the load balance among voyages. Since the target missions may be changed due to ocean flows or other environmental factors, the multi-AUV path planning is formulated as a dynamic multiobjective problem

$$
\begin{aligned}
\text{minimize } & F(X, t) = (f_1(X, t), f_2(X, t)), \\
& f_1(X, t) = \sum_{1 \le j \le n_v} g_j + w_c \cdot n_v, \\
& f_2(X, t) = \max_{1 \le j \le n_v} h_j, \\
\text{s.t. } & \sum_{1 \le j \le n_v} x_{ij} = 1, \quad \forall 1 \le i \le N, \\
& \sum_{1 \le i \le N} l_i \cdot x_{ij} \le L_{max}, \quad \forall 1 \le j \le n_v, \\
& g_j \le E_{max}, \quad \forall 1 \le j \le n_v.
\end{aligned}
\tag{1}
$$

where $F(X, t)$ is the fitness function vector, $X$ is a solution including multiple voyages to travel all target missions, $t$ is the time, $f_1(X, t)$ is the traveling cost of all voyages, $g_j$ is the energy consumption of the $j$-th voyage, $w_c$ is the fixed vehicle cost for the launch and recovery of an AUV, $n_v$ is the number of voyages, $f_2(X, t)$ is the maximum completion time among all voyages, $h_j$ is the completion time of the $j$-th voyage, $x_{ij}$ represents whether a target $m_i$ is involved in voyage $p_j$, $l_i$ is the payload requirement of the target $m_i$, and $L_{max}$ and $E_{max}$ are the maximum payload capacity and the maximum available energy of an AUV, respectively.

Particularly, the energy cost $g_j$ of a voyage $p_j$ is calculated following [19] as

$$
g_j = w_1 \cdot len(p_j) + w_2 \cdot ht(p_j) + w_3 \cdot turn(p_j) + w_4 \cdot risk(p_j)
\tag{2}
$$

where $w_1$, $w_2$, $w_3$, and $w_4$ are the weights of different energy costs, i.e., traveling length cost $len(p_j)$, height cost $ht(p_j)$, turning cost $turn(p_j)$, and risk cost $risk(p_j)$. To calculate the energy costs of a voyage, the trajectory between each pair of successive points in a voyage is defined first. Denote a voyage as $p_j = (v_j^0, \ldots, v_j^n = v_j^0)$, where $v_j^0$ is the surface location point to launch the AUV, $v_j^i$ $(1 \le i \le n-1)$ is the $i$-th target mission visited, and the trajectory $t_j^i$ between $v_j^i$ and $v_j^{i+1}$ is a sequence $t_j^i = (w_j^{i,0} = v_j^i, \ldots, w_j^{i,m} = v_j^{i+1})$ of waypoints (turning points to avoid obstacles). The length cost of $p_j$ is the sum of Euclidean distances between any two successive waypoints as

$$
len(p_j) = \sum_{i=0}^{n-1} dist(v_j^i, v_j^{i+1}) = \sum_{i=0}^{n-1} \sum_{k=0}^{m-1} \| w_j^{i,k} - w_j^{i,k+1} \|
\tag{3}
$$

where $\| w_j^{i,k} - w_j^{i,k+1} \|$ is the Euclidean distance between the waypoints $w_j^{i,k}$ and $w_j^{i,k+1}$. The height cost of $p_j$ is the sum of the height differences $height(w_j^{i,k}, w_j^{i,k+1})$ between any two successive waypoints $w_j^{i,k}$ and $w_j^{i,k+1}$ as

$$
ht(p_j) = \sum_{i=0}^{n-1} \sum_{k=0}^{m-1} height(w_j^{i,k}, w_j^{i,k+1})
\tag{4}
$$

The turning cost of $p_j$ is the sum of the turning cost $turn(w_j^{i,k})$ of all waypoints $w_j^{i,k}$

$$
turn(w_j^{i,k}) = 1 - \cos(\theta_j^{i,k}) = 1 - \frac{s_j^{i,k-1} \cdot s_j^{i,k}}{|s_j^{i,k-1}| \, |s_j^{i,k}|}
\tag{5}
$$

where $s_j^{i,k-1}$ is the line segment connecting $w_j^{i,k-1}$ and $w_j^{i,k}$, and $s_j^{i,k}$ is the one connecting $w_j^{i,k}$ and $w_j^{i,k+1}$. Similarly, the risk cost of a voyage is the sum of the risk intensity of each line segment between waypoints in the voyage as

$$
\begin{aligned}
risk(p_j) &= rt(w_j^{0,0}) + \sum_{i=0}^{n-1} \sum_{k=1}^{m} rt(w_j^{i,k}), \\
rt(w_j^{i,k}) &= \sum_{r=1}^{n_r} RI \times \max\{0, R_r - \| w_j^{i,k} - rc_r \|\} / R_r.
\end{aligned}
\tag{6}
$$

where $r$ is the index of a risk area, $n_r$ is the number of risk areas, $RI$ is the maximal risk intensity, $R_r$ is the maximal risk radius, and $rc_r$ is the center of the $r$-th risk area.

B. Dynamic Multiobjective Optimization

To solve dynamic multiobjective optimization problems, an algorithm is expected to track the changing Pareto optimal solutions. To deal with the multiobjective issue, existing methods can be roughly classified into three categories, i.e., Pareto local search, metaheuristics, and learning-based methods. First, Pareto local search locates a large number of initial solutions and searches the neighboring areas [24]. Exploration and exploitation are usually balanced to improve algorithm performance [25]. To reduce the computational cost, ND-tree is developed for archive update [26]. Collaborative mechanisms are proposed to facilitate the search of neighboring areas [27]. Second, metaheuristics use multiple strategies to enhance solution diversity and improve algorithm convergence [28], such as nature-inspired genetic algorithm [29], particle swarm optimization [30], [31], and ant colony system [32]. Metaheuristics have successfully solved many multiobjective combinatorial optimization problems. Third, learning-based methods use machine learning techniques to extract the pattern for solution construction or evaluation [33]. For example, Gaussian process models are adopted to approximate the objective functions and search for Pareto optimal solutions within trust regions [34]. Features are extracted to characterize the landscape for performance prediction [35]. Graph neural networks are trained to generate solutions for large-scale combinatorial optimization problems [36]. Attention-based models are trained to learn the heuristic for routing problems [37].

In response to changes, multiple methods are developed to generate new high-quality solutions, i.e., solution reuse, prediction, transfer learning, and reinforcement learning. Particularly, solution reuse methods generate new solutions based on historical ones [38], [39]. Prediction methods learn the changing pattern from historical information for prediction.


For example, an interaction-based prediction method is proposed to predict task scheduling for multiple robots in a cooperative multirobot system [40]. Prediction methods may fail if the changes do not follow a stable pattern. In contrast, transfer learning shifts historical data to adapt to new environments. For example, historical data is shifted to new environments for training a surrogate using transfer learning techniques [41]. Reinforcement learning trains a policy network to give actions when interacting with environments [42]. It requires a large training set to get good performance.

III. THE PROPOSED COOPERATIVE EVOLUTIONARY COMPUTATION ALGORITHM

A. Underwater Environment Modeling

The 3D underwater environment can be represented by a graph. To set up this correspondence, the underwater environment is evenly divided into multiple cubes, which are represented by vertices. Edges connect two vertices if the cubes represented by these vertices are adjacent. The resultant graph $G = (V, E)$ is called a search graph, where $V$ is the set of cubes and $E$ is the edge set.

Denote the length, width, and height of the underwater environment as $L$, $W$, and $H$, and those of a cube as $L_C$, $W_C$, and $H_C$. Each cube is indexed by a triple-tuple $(i, j, k)$ and its center point is denoted as $c_{ijk} = (x, y, z)$

$$
x = (i + 0.5) L_C, \quad y = (j + 0.5) W_C, \quad z = (k + 0.5) H_C.
\tag{7}
$$

where $1 \le i \le L/L_C$, $1 \le j \le W/W_C$, and $1 \le k \le H/H_C$. Every line segment connecting the center points of two adjacent cubes represents an edge. Since each cube is adjacent to the 26 cubes surrounding it, $|E| \le 13|V|$. Specifically, a cube is considered infeasible if it intersects with an obstacle or the bathymetry and is then removed from $G$ to ensure the connectivity of vertices. A cube is risky if it intersects with a risky area, and the risky value is defined according to its distance to the risky area. The cubes intersecting with the water surface make up a subset $V_S$ of $V$, which contains all candidate points for AUV launch and recovery. The cubes in which targets are located are called target vertices and make up a subset $V_T$ of $V$. An example of $G$ is illustrated in Fig. 1(a).

The problem of multi-AUV path planning is equivalent to the problem of selecting vertices from $V_S$ and finding the "shortest" paths that visit every target vertex in $V_T$ exactly once. A problem involving energy cost and traveling time can be modeled by assigning energy costs and traveling time between vertices to the edges. The resultant graph that has a number assigned to each edge is called a weighted graph $G$. In $G$, two vertices $u$ and $v$ are connected by a path if there is a sequence of vertices starting with $u$ and ending with $v$ such that the endpoints of each edge in the path are adjacent vertices. Let the energy or time cost of a path between vertices be the sum of the weights of the edges of this path. A path is shortest if it takes the lowest energy consumption and the shortest time. Specifically, if two paths are nondominated on energy consumption and time cost, then the one with a lower energy cost is considered shorter since energy cost is a critical constraint. The shortest path between two vertices can be calculated using the shortest path faster algorithm [19], which is adopted due to its popularity and low computational complexity on sparse graphs.

Since the voyage planning involves the surface location points and target missions only, we induce a subgraph $G' = (V_S \cup V_T, E')$ of $G$ by a subset $V_S \cup V_T$ of the vertex set $V$, where $E'$ contains an edge in $E$ if and only if both endpoints of this edge are in $V_T$ or one endpoint is in $V_S$ and one endpoint is in $V_T$. We also add an edge $e$ between vertices from $V_S \cup V_T$ to $G'$ if there is a shortest path between them in $G$. Note that edges between vertices from $V_S$ are not added to $G'$ since they are not used for voyage planning. The energy and time costs of the shortest path are assigned to the edges. The resulting $G'$ provides a cost map for voyage planning as illustrated in Fig. 1(b).

B. Cooperative Evolutionary Computation Algorithm

Based on the cost map $G'$, CEA first searches for the surface location points and optimal voyages to travel all targets. Then, between any two successive vertices in voyages, a trajectory is planned based on $G$ to form complete paths of AUVs.

1) Solution Representation: Denote the number of surface points to locate as $K$. A solution $X$ is represented by a bi-level encoding scheme

$$
X = (S, P), \quad S = (s_1, \ldots, s_i, \ldots, s_K) = (a_1, \ldots, a_{2K}), \quad P = (p_1, \ldots, p_i, \ldots, p_K).
\tag{8}
$$

where $S$ is the surface location points of movable surface vehicles, and $s_i$ is the $i$-th surface point selected from $V_S$. A surface location point can be represented by a 2-tuple cube index, i.e., $(m, n)$, since the height is definite. Thus, $S$ is a $2K$-dimensional vector and $a_i$ is a number related to a specific surface point. In addition, $P$ is the optimal voyages to travel all targets and includes $K$ vertex sequences. Particularly, $p_i$ is a sequence, e.g., $x_0 = s_i, \ldots, x_j, \ldots, x_{n-1}, x_n = s_i$ ($x_j \in V_T$, $\forall 1 \le j \le n-1$), of vertices for a voyage or the combination of multiple voyages starting from $s_i$. Two voyages are separated by the surface point $s_i$. For example, if two voyages start from $s_i$, $p_i$ is a sequence, e.g., $x_0 = s_i, \ldots, x_j, \ldots, x_n = s_i, x_{n+1}, \ldots, x_{m+n} = s_i$ ($x_j \in V_T$, $\forall 1 \le j \le n-1$, $n+1 \le j \le m+n-1$), of vertices.

2) Evolution of Multiple Populations: Two populations are adopted to optimize different objective functions. One population finds the optimal solution of $f_1$ and the other that of $f_2$. In both populations, the surface location points of individuals are updated using differential evolution [43] and the voyages are constructed using the ant colony system [44].

In the $i$-th ($1 \le i \le 2$) population, $NP$ individuals are initialized first. Since $S_{ij}$ of an individual $X_{ij} = (S_{ij}, P_{ij})$ is a real vector, it is randomly generated in the search space or set as the surface points closest to the cluster centers of $V_T$ by performing K-means [45]. For the voyage planning of $P_{ij}$, an $N \times N$ pheromone matrix $\tau_i$ is used to record the search experience. Each $\tau_i(u, v)$ represents the preference of selecting $v$ after $u$ and is initialized as $\tau_i^0 = 1/(N f_i(X^0))$, where $N$


is the number of target missions, $X^0$ is a solution constructed using a greedy mechanism, in which the target missions are divided into multiple groups using K-means clustering and the voyages of each group of target missions are constructed using the record-to-record travel algorithm [46]. In $X^0$, the surface location points are the cubes closest to the cluster centers of $V_T$. The best solution in the population is denoted as $X_{i,best} = (S_{i,best}, P_{i,best})$. In every generation, every individual $X_{ij}$ is updated using a new solution $X'_{ij} = (S'_{ij}, P'_{ij})$, where $S'_{ij} = (s'_{ij1}, \ldots, s'_{ij2K})$ and $P'_{ij} = (p'_{ij1}, \ldots, p'_{ijK})$.

Fig. 2. An example of the voyage planning for a set $C_1 = \{1, 3, 5, 6\}$ of target missions in the recombination-based sampling strategy.

Surface Point Location: The $S'_{ij}$ of $X'_{ij}$ is generated using the "DE/best/1" mutation and crossover operators. A mutant vector $S'_{ij}$ is first generated by

$$
S'_{ij} = S_{i,best} + F (S_{ir_1} - S_{ir_2})
\tag{9}
$$

where $F$ is the scale factor, and $r_1$ and $r_2$ are two distinct random integers in the range of $[1, NP]$. Then the crossover operator is performed to generate a trial vector

$$
S'_{ijd} =
\begin{cases}
S'_{ijd}, & \text{if } r < CR \text{ or } d = d_{rand}, \\
S_{ijd}, & \text{otherwise.}
\end{cases}
\tag{10}
$$

where $1 \le d \le 2K$ is the dimension index, $r$ is a random number in the range of $[0, 1]$, $CR$ is the crossover rate, and $d_{rand}$ is a random integer in the range of $[1, 2K]$. Specifically, if some target missions are unreachable by all surface points in $S'_{ij}$, then $S'_{ij}$ is readjusted as follows. The closest reachable surface points to these targets and the surface points in $S'_{ij}$ are collected to perform K-means clustering. The surface points closest to the cluster centers are used to update $S'_{ij}$. If some target missions are still unreachable after five tries, the fitness value of the solution is set as a very large value.

Multi-AUV Voyage Planning: Based on $S'_{ij}$, the targets are assigned to the nearest surface location point according to the optimization objective, resulting in multiple subsets. Denote the set of the targets assigned to the $k$-th surface point $s'_k$ as $C_k = \{v_1, \ldots, v_{|C_k|}\}$. Every $C_k$ is used to construct voyages starting from $s'_k$. To achieve this, targets are selected from $C_k$ step by step. In the first step, one target $x_1$ is randomly selected from $C_k$ and a voyage is initialized as $p = (s'_k, x_1)$. The selected target $x_1$ is then removed from $C_k$. At the $t$-th step, the set of available target vertices to select is

$$
T = \{ v \mid c'(p) + c'(x_{t-1}, v) + c'(v, s'_k) \le E_{max} \ \&\ l(p) + l(v) \le L_{max}, \ \forall v \in C_k \}.
\tag{11}
$$

where $c'(p)$ is the energy cost of the voyage to the last target $x_{t-1}$ in the voyage $p$, $c'(x_{t-1}, v)$ is the energy cost between $x_{t-1}$ and $v$, $c'(v, s'_k)$ is the energy cost between $v$ and $s'_k$, $l(p)$ is the total payload of the current voyage $p$ before visiting $v$, and $l(v)$ is the payload requirement of $v$. Then one target $x_t$ is selected from $T$ according to the state transfer rule

$$
x_t =
\begin{cases}
\arg\max_{v \in T} pro(v), & \text{if } r \le q_0, \\
v', & \text{otherwise.}
\end{cases}
\tag{12}
$$

where $pro(v)$ is the selection probability of $v$ as

$$
pro(v) = \frac{\tau_i(x_{t-1}, v) \, \eta_i(x_{t-1}, v)^{\beta}}{\sum_{u \in T} \tau_i(x_{t-1}, u) \, \eta_i(x_{t-1}, u)^{\beta}}, \qquad
\eta_i(x_{t-1}, v) = \frac{1}{c_i(x_{t-1}, v)}.
\tag{13}
$$

$r$ is a random number in the range of $[0, 1]$, and $q_0$ is a predefined parameter. If $r \le q_0$, the target vertex with the highest probability is selected; otherwise, roulette selection is performed to choose $v'$ based on the probability distribution. In addition, $\tau_i(x_{t-1}, v)$ is the pheromone value of $v$, $\eta_i(x_{t-1}, v)$ is the heuristic value of $v$, and $c_i(x_{t-1}, v)$ is the traveling cost between $x_{t-1}$ and $v$ according to $f_i$. Specifically, if $T$ is empty, then the voyage $p$ is added to $p'_{ijk}$ and a new one is constructed until $C_k$ is empty. After finishing the construction of voyages of all $C_k$, the local update of the pheromone is performed

$$
\tau_i(u, v) = (1 - \rho) \tau_i(u, v) + \rho \Delta\tau, \qquad
\Delta\tau =
\begin{cases}
\tau_i^0, & \text{if } (u, v) \in P'_{ij}, \\
0, & \text{otherwise.}
\end{cases}
\tag{14}
$$

where $\rho$ is the evaporation parameter.

Then $X'_{ij}$ is evaluated and used to update $X_{ij}$ and $X_{i,best}$ if it is better on $f_i$. After the update of all individuals, the global update of the pheromone is performed

$$
\tau_i(u, v) = (1 - \alpha) \tau_i(u, v) + \alpha \Delta\tau,
\tag{15}
$$

where

$$
\Delta\tau =
\begin{cases}
1 / f_i(X_{i,best}), & \text{if } (u, v) \in P_{i,best}, \\
0, & \text{otherwise.}
\end{cases}
\tag{16}
$$

$\alpha$ is the pheromone decay parameter, and $f_i(X_{i,best})$ is the fitness value of the best solution. After both populations are updated, their individuals are collected together with the archived solutions and the nondominated ones are stored in the archive.

3) Recombination-based Sampling Strategy: Since each population optimizes one objective only, they mainly gather around the margins of the Pareto front. To reach the areas missed by the populations, a recombination-based sampling strategy is developed to generate new solutions by fusing


information of different populations. First, two solutions X1 = (S1, P1) and X2 = (S2, P2) are randomly selected from the archive as parents. Then the first part S′ of surface location points of a new solution X′ = (S′, P′) is generated using simulated binary crossover and polynomial mutation [47], and P′ is constructed by recombining the vertices in parents P1 = (p_1^1, ..., p_1^K) and P2 = (p_2^1, ..., p_2^K). The construction procedure of P′ is described as follows. To reduce the traveling cost, each target vertex is assigned to the nearest surface location point in S′, resulting in one cluster Ck for every s′_k (1 ≤ k ≤ K). For the vertices in Ck, two sequences a and b are constructed according to their orders in P1 and P2, respectively. The two sequences represent the traveling preferences of the parents and can be used to generate new sequences using crossover. Two crossover points are randomly selected to split a into three segments, i.e., a1, a2, and a3. Similarly, b is split into b1, b2, and b3. Then a new sequence is obtained by combining a3, a1, and b2 in order. Notably, repeated targets are removed. In addition, the unvisited targets in Ck are appended to the new sequence according to their order in b. The new sequence is then used to construct voyages using the splitting method [48], which splits the sequence into (near) optimal voyages.

An example of the recombination-based sampling is illustrated in Fig. 2. There are two surface points s1 and s2. The parent P1 includes two voyages p1 = {s1, 3, 2, 6, 5, s1} and p2 = {s2, 4, 7, 1, 8, s2}. The voyages in P2 are p1 = {s1, 5, 4, 8, 2, s1} and p2 = {s2, 3, 1, 6, 7, s2}. Given a set of target missions C1 = {1, 3, 5, 6}, we rearrange the target missions in C1 based on their order in p1 and p2 of P1, resulting in a sequence a = {3, 6, 5, 1}. Similarly, a sequence b is obtained from P2, i.e., b = {5, 3, 1, 6}. Then two random points are adopted to cut a and b into three subsequences, i.e., a1 = {3}, a2 = {6, 5}, a3 = {1}, and b1 = {5}, b2 = {3, 1}, b3 = {6}. A new sequence is generated by concatenating a3, a1, and b2 and removing repeated targets, i.e., c = {1, 3}. The remaining target missions, i.e., 5 and 6, are appended to c in their order in b, resulting in c = {1, 3, 5, 6}. Finally, the vertex sequence c is split to generate multiple new voyages.

In each generation, |A|/2 solutions are generated using the recombination-based sampling, where |A| is the size of the archive. The newly generated solutions are used to update the archive and only nondominated ones are preserved. Through the evolution of multiple generations, the algorithm converges towards the Pareto front. The archive provides multiple high-quality solutions to decision makers.

4) Incremental Response Strategy to Changes: Due to the ocean flows or other factors, the target missions may change with time. For example, some target missions are canceled and some are moved to other locations due to risk; new target missions arrive in search and rescue. In response to such changes, the paths of AUVs should be replanned based on a newly constructed cost map G′. Since targets are still distributed in a large underwater environment, the surface location points and voyages found before may be useful for the planning of new target missions. Thus, to utilize the historical search experience, the surface location points of all individuals in the populations remain the same and their voyages are reconstructed using the ant colony system. Similarly, for the archived solutions, the surface location points are kept the same and the voyages are reconstructed using the ant colony system based on the pheromone and heuristic information of a random population. The populations and the archived solutions are evaluated and the nondominated ones are stored in the archive. The populations then continue to evolve until a new change occurs or the termination conditions are met.

5) Trajectory Planning: The trajectory planning is performed at the end of each time step. After finishing the voyage planning, the trajectory of an AUV between two adjacent vertices (u, v) in the voyages is planned using A* search [49]. Indeed, the trajectory between u and v can be recovered from the shortest path search in G during the environment modeling procedure, but storing these paths may require a large amount of memory. Thus, if the shortest path is not stored, A* search is used for finding the optimal path between u and v in G. To ensure the optimality of A* search, the heuristic value (the estimated cost) of a vertex x to the endpoint v should not exceed the optimal traveling cost between them [19]. We assume that there is a direct line segment between x and v in G and set its traveling cost as the heuristic value of x.

C. Worst Computational Complexity of CEA in a Time Step

The computational complexity of CEA includes those of four parts, i.e., cost map building, initialization, evolution of populations, and incremental response strategy. Denote the size of a population as NP, the number of target missions as N, the number of surface points as K, the number of optimization objectives as M, the number of iterations as I, the number of vertices in the cost map G′ as |V′|, the average number of times a cube enters the queue in the shortest path faster algorithm as k, and the number of vertices in the graph G as |V|.

First, the computational complexity of the cost map building is O(k × N × |V′|).

Second, in the initialization, the setting of initial surface points requires O(N K n_c) computations for an individual, where n_c is the number of iterations for k-means clustering and is usually a small value. The initialization of voyages requires O((n_d n_R n_U n_L N + N + N²) n_P) computations using the record-to-record travel algorithm, where n_d = 30 is the number of loops in the diversification phase, n_R = 30 is the size of the neighbor list, n_U = 3 is the number of heuristics used, n_L = 5 is the number of local minima to reach, and n_P is the number of perturbations. Thus, the initialization of M × NP individuals requires O(M × N × K × NP + M × N² × NP) computations.

Third, in every iteration, the update of surface points in an individual requires O(K) computations. The clustering of target missions based on surface points requires O(N K) computations. Voyage planning requires O(N²) computations. The local update of a pheromone matrix requires O(N²) computations. The global update of the pheromone matrix requires O(N²) computations. Thus, the update of M populations requires O(((K + N K + N² + N²) × NP + N²) × M) = O(N × M × K × NP + N² × M × NP)


computations, which can be reduced to O(M × N² × NP) since we have K ≪ N. The recombination-based sampling strategy requires O(N K + N²) = O(N²) computations. Then generating solutions using the recombination-based sampling strategy requires O(N² n_A) computations, where n_A is the number of solutions in the archive. The archive update requires O((M × NP + n_A)² × M) computations. Thus, the computational complexity of I iterations is O((M × N² × NP + N² n_A + (M × NP + n_A)² × M) × I), which can be reduced to O(M³ × NP² × I + M × N² × NP × I) since we have n_A ≈ M × NP in the experiments. At the end of the algorithm, trajectory planning is performed for the archived solutions. The worst complexity of the trajectory planning is O((|V| + |E|) N n_A) = O(M × N × |V| × NP).
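
The O(N K + N²) cost of one recombination comes from clustering the targets and crossing the parents' visiting orders, as described in the recombination-based sampling strategy. The Python sketch below is only an illustration of the sequence-crossover step, run on the worked example with cluster C1 = {1, 3, 5, 6}. The function names `order_in_parent` and `recombine` are hypothetical, and reusing the same two cut points for a and b is an assumption read off that example rather than a rule stated in the text. The voyage-splitting step [48] is not shown.

```python
def order_in_parent(cluster, voyages):
    """List the targets of `cluster` in the order a parent's voyages visit them."""
    members = set(cluster)
    return [v for voyage in voyages for v in voyage if v in members]


def recombine(a, b, cut1, cut2):
    """Build a child sequence from a3 + a1 + b2, then append leftovers in b's order."""
    a1, a3 = a[:cut1], a[cut2:]
    b2 = b[cut1:cut2]
    child = []
    for v in a3 + a1 + b2:      # concatenate the segments, skipping repeated targets
        if v not in child:
            child.append(v)
    for v in b:                 # append targets not yet visited, in b's order
        if v not in child:
            child.append(v)
    return child


# Parents from the worked example (surface points are strings, targets are ints).
P1 = [["s1", 3, 2, 6, 5, "s1"], ["s2", 4, 7, 1, 8, "s2"]]
P2 = [["s1", 5, 4, 8, 2, "s1"], ["s2", 3, 1, 6, 7, "s2"]]
C1 = [1, 3, 5, 6]

a = order_in_parent(C1, P1)           # [3, 6, 5, 1]
b = order_in_parent(C1, P2)           # [5, 3, 1, 6]
c = recombine(a, b, cut1=1, cut2=3)   # [1, 3, 5, 6]
print(a, b, c)
```

With cut points 1 and 3, the sketch reproduces the sequences a, b, and c given in the example.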
Fig. 3. The six underwater environment scenarios: (a)-(f) I1–I6.

Fourth, in the incremental response strategy, the update of populations and the archive requires O(M × N² × NP + M³ × NP²) computations.
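
The O((M × NP + n_A)² × M) archive-update term arises because, in the worst case, every pair of candidate solutions is compared on all M objectives. A minimal Python sketch of such a pairwise nondominated filter is given below. It assumes that every objective is minimized, and the names `dominates` and `update_archive` are illustrative rather than taken from the paper.

```python
def dominates(f, g):
    """True if objective vector f is no worse than g everywhere and better somewhere."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def update_archive(archive, candidates):
    """Keep only the nondominated objective vectors among archive + candidates."""
    pool = list(archive) + list(candidates)
    return [
        f
        for i, f in enumerate(pool)
        # O(|pool|) dominance checks per vector, each touching M objectives
        if not any(dominates(g, f) for j, g in enumerate(pool) if j != i)
    ]


archive = [(1, 5), (3, 3)]
candidates = [(2, 2), (4, 4), (5, 1)]
print(update_archive(archive, candidates))  # [(1, 5), (2, 2), (5, 1)]
```

Identical objective vectors do not dominate each other, so this sketch keeps duplicates; a practical archive would likely remove them.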
Therefore, the worst complexity of CEA in a time step is O(k × N × |V′| + M³ × NP² × I + M × N² × NP × I + M × N × |V| × NP).

IV. EXPERIMENTS

A. Experimental Settings

1) Underwater Environment Scenarios: To simulate the underwater environments, the dataset of 250 m resolution gridded New Zealand bathymetry¹ [19] is used to construct six complex underwater scenarios, named I1–I6, with complex bathymetry, irregular and branched obstacles, and global risks, as illustrated in Fig. 3, where the 3D surface is the bathymetry, red blocks are obstacles, blue balls are risky areas, and black points are target missions. The length, width, and depth of the underwater environment are 50 km, 50 km, and 10 km, respectively. The underwater environment is modeled as a search graph based on cubes with a size of 250 m × 250 m × 50 m. Each scenario instance includes 400 target traveling missions randomly distributed underwater. Each target mission requires 1 unit payload. Each instance experiences nine changes, in each of which the positions of 10% of the target missions are randomly changed. A change occurs every 20,000 function evaluations. In the following, we refer to the period between two consecutive changes as a time step. The available energy of an AUV is 1200, the payload capacity is 15, and the speed is 2 m/s. The weights in cost functions w1, w2, w3, w4, and wc are mission-related and set according to the AUV operators provided by decision makers. In the experiments, w1 = 1, w2 = 10, w3 = 50, w4 = 50, and wc = 200 following [19]. The maximum number of surface points is set as 5.

¹ https://niwa.co.nz/oceans/resources/bathymetry/download-the-data

2) Competing Algorithms: Since there are few works dealing with the dynamic multiobjective path planning of multiple AUVs, three typical algorithms are adopted for comparisons, i.e., the vehicle routing heuristics with K-means (K-VRPH) [46], non-dominated sorting-based genetic mission planner (NSGMP) [50], and multiobjective evolutionary algorithm based on decomposition with DE-C-ACO (MOEA/D-DE-C-ACO) [19]. Particularly, K-VRPH uses K-means to cluster the target missions and adopts local search heuristics (i.e., record-to-record travel algorithm) to generate voyages for each cluster [46]. NSGMP uses edge recombination crossover and swap mutation to generate new solutions, which are further refined by a local greedy search [50]. Nondominated solutions are selected using the crowding comparison procedure [51]. MOEA/D-DE-C-ACO decomposes the problem into multiple single-objective subproblems using weight vectors, and every subproblem is optimized using DE-C-ACO for multi-AUV path planning [19]. Once a change occurs, half of the populations are reinitialized.

The parameters of the competing algorithms are set the same as in the references. In K-VRPH, the size of the main loop is nL = 20, δ = 0.01 controls the deterioration degree of the objective function, the size of the neighbor list is nN = 30, and the number of times that the solution is perturbed once the search is stuck in a local optimum is nP = 2. In both NSGMP and MOEA/D-DE-C-ACO, the population size is set as NP = 100. In NSGMP, the crossover rate is cr = 1.0, and the mutation rate is mr = 1/D, where D is the problem dimension. In MOEA/D-DE-C-ACO, F = 0.75, CR = 0.95, β = 2, q0 = 0.9, α = ρ = 0.1. In CEA, NP = 10, F = 0.75 and CR = 0.95 following [19], and the parameters in the ant colony system are set following [44], i.e., β = 2, q0 = 0.9, α = ρ = 0.1. The distribution indexes for crossover and mutation in the sampling strategy are set as 20.

3) Performance Metrics: Two widely used performance metrics, mean inverted generational distance (MIGD) and mean hypervolume (MHV), are adopted to evaluate the algorithm performance. The MIGD is calculated as

$$\mathrm{MIGD} = \frac{\sum_{1 \le t \le N_{change}+1} \mathrm{IGD}(PF_t^*, P_t)}{N_{change}+1}, \qquad \mathrm{IGD}(PF_t^*, P_t) = \frac{\sum_{p^* \in PF_t^*} \min_{p \in P_t} d(p^*, p)}{|PF_t^*|} \tag{17}$$

where t is the index of the time step, N_change is the number of changes, PF_t^* is the Pareto front of the instance at t and approximated using the nondominated solutions of all algorithms, P_t is the set of nondominated solutions found by an algorithm at time t, and |PF_t^*| is the number of points in


PF_t^*. Since the PF is unknown, the nondominated solutions obtained by all algorithms in all runs are used as reference points to calculate IGD. The mean hypervolume is calculated as

$$\mathrm{MHV} = \frac{\sum_{1 \le t \le N_{change}+1} HV_t}{N_{change}+1} \tag{18}$$

where HV_t is the hypervolume of the nondominated solutions obtained by an algorithm based on a reference point (1, 1). The solutions are normalized using (1.1 f_{1,max}, 1.1 f_{2,max}), where f_{1,max} and f_{2,max} are the maximum fitness values in PF_t^*. Each algorithm performs 20 runs, and the median values and the standard deviations of the performance metrics are reported. A Wilcoxon rank-sum test is performed between the proposed method and every competing algorithm at a 5% significance level. The results are marked as “+”, “=”, or “-” to represent that the proposed method is significantly better than, equal to, or worse than the competing algorithm, respectively.

TABLE I
PARAMETER SETTINGS OF ALL ALGORITHMS
Algorithm          Parameter values
K-VRPH             nL = 20, δ = 0.01, nN = 30, nP = 2
NSGMP              NP = 100, cr = 1, mr = 1/D
MOEA/D-DE-C-ACO    NP = 100, F = 0.75, CR = 0.95, β = 2, q0 = 0.9, α = ρ = 0.1
CEA                NP = 10, F = 0.75, CR = 0.95, β = 2, q0 = 0.9, α = ρ = 0.1

TABLE II
MHV RESULTS OF CEA AND ITS TWO VARIANTS noRS AND noIS
Ins     noRS                   noIS                   CEA
I1      3.98e-02±4.46e-03 +    5.40e-02±2.36e-03 +    5.87e-02±5.88e-03
I2      1.02e-01±1.41e-02 +    1.18e-01±7.36e-03 +    1.36e-01±1.17e-02
I3      7.65e-02±3.86e-03 +    9.43e-02±3.58e-03 =    9.57e-02±4.74e-03
I4      4.79e-02±3.99e-03 +    6.41e-02±4.22e-03 +    7.28e-02±1.08e-02
I5      6.09e-02±3.85e-03 +    7.45e-02±3.82e-03 =    7.60e-02±5.82e-03
I6      5.15e-02±5.36e-03 +    6.38e-02±4.36e-03 =    6.72e-02±6.25e-03
Best    0                      0                      6
+/=/-   6/0/0                  3/3/0

Fig. 4. Nondominated solutions obtained by noRS, noIS, and CEA in 2nd-7th time steps of I2. Panels: (a) 2nd, (b) 3rd, (c) 4th, (d) 5th, (e) 6th, (f) 7th; axes f1 and f2.

TABLE III
MIGD RESULTS OF ALL ALGORITHMS
Ins     K-VRPH                 NSGMP                  MOEA/D-DE-C-ACO        CEA
I1      1.22e+04±3.13e+02 +    3.86e+04±6.31e+02 +    8.57e+03±1.28e+02 +    8.17e+03±2.57e+02
I2      2.52e+04±1.28e+02 +    3.64e+04±1.32e+03 +    2.36e+04±6.39e+01 +    2.33e+04±1.27e+02
I3      9.90e+03±2.41e+02 +    4.08e+04±5.06e+02 +    7.12e+03±1.95e+02 +    6.96e+03±1.63e+02
I4      1.42e+04±2.25e+02 +    4.89e+04±4.46e+02 +    1.13e+04±9.77e+01 +    1.09e+04±2.42e+02
I5      1.76e+04±3.32e+02 +    4.43e+04±3.42e+02 +    1.53e+04±7.99e+01 +    1.47e+04±2.50e+02
I6      1.93e+04±2.17e+02 +    3.96e+04±3.98e+02 +    1.70e+04±1.37e+02 +    1.60e+04±3.21e+02
Best    0                      0                      0                      6
+/=/-   6/0/0                  6/0/0                  6/0/0

B. Effects of Components in CEA
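
The results in this section rely on the MIGD and MHV metrics of (17) and (18). As a reading aid, the Python sketch below computes IGD and a two-objective hypervolume for one time step and averages a metric over time steps. It assumes Euclidean distance for d(·,·), minimization of both objectives, and objective vectors that are already normalized so that the reference point is (1, 1). All function names are hypothetical.

```python
import math


def igd(reference_front, approx_set):
    """Mean distance from each reference point to its nearest obtained point."""
    total = sum(min(math.dist(r, p) for p in approx_set) for r in reference_front)
    return total / len(reference_front)


def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by `points` and bounded by `ref` (two minimized objectives)."""
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in inside:            # sweep in increasing f1
        if f2 < best_f2:             # this point contributes a new horizontal slab
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def mean_over_time_steps(values):
    """Average a per-time-step metric over the N_change + 1 time steps."""
    return sum(values) / len(values)


front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
print(hypervolume_2d(front))                         # about 0.37
print(igd([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0)]))   # sqrt(2) / 2
```

Larger hypervolume and smaller IGD indicate a better solution set, which is the convention used when reading Tables II to IV.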

This subsection tests the effects of the recombination-based sampling strategy and the incremental response strategy. Two CEA variants are constructed for the test, i.e., noRS, which does not use the recombination-based sampling strategy, and noIS, which does not adopt the incremental response strategy. The MHV results of noRS, noIS, and CEA are reported in Table II. From Table II, CEA performs significantly better than noRS on all instances, showing that the recombination-based sampling strategy can improve algorithm performance. On the one hand, the recombination-based sampling strategy can generate high-quality solutions for enhancing solution diversity. On the other hand, the sampling strategy takes advantage of multiple archived solutions to improve solution convergence towards the Pareto front. Compared with noIS, CEA performs significantly better on 3 out of 6 instances, whereas there is no significant difference between them on the other instances. This shows that the historical information can help the algorithm quickly find new high-quality solutions after changes. In addition, CEA obtains the best results among the three algorithms on all instances. Thus, both the recombination-based sampling strategy and the incremental response strategy play important roles in the proposed method.

To further observe the solution quality, taking I2 as an example, Fig. 4 illustrates the nondominated solutions obtained by noRS, noIS, and CEA in the 2nd to 7th time steps. One unit of f2 represents one minute. Note that the incremental response strategy works after the 2nd time step. From Fig. 4, the solutions of CEA converge better and are better distributed than those of the two variants in each time step. In contrast, the solutions of noRS are far away from the Pareto front, showing poor convergence. Similarly, though a few solutions of noIS are nondominated with respect to those of CEA in the 2nd time step, the solutions lack diversity and are less convergent in the other time steps. This shows that the incremental response strategy can make good use of the historical solutions to improve solution quality and enhance diversity. In addition, the effect of the response strategy becomes more significant as time progresses due to the accumulation of experience.

C. Compared with Other Algorithms

The MIGD and MHV results of all algorithms on the six instances are reported in Tables III and IV. From Table


TABLE IV
MHV RESULTS OF ALL ALGORITHMS
Ins     K-VRPH                 NSGMP                  MOEA/D-DE-C-ACO        CEA
I1      7.34e-03±3.59e-03 +    2.41e-02±3.06e-03 +    5.22e-02±1.71e-03 +    5.87e-02±5.88e-03
I2      1.44e-03±2.61e-03 +    8.56e-02±4.63e-03 +    1.12e-01±4.28e-03 +    1.36e-01±1.17e-02
I3      1.14e-02±5.17e-03 +    3.74e-02±2.18e-03 +    8.81e-02±6.97e-03 +    9.57e-02±4.74e-03
I4      0.00e+00±2.60e-03 +    1.89e-02±8.16e-04 +    6.92e-02±4.82e-03 =    7.28e-02±1.08e-02
I5      2.90e-03±4.20e-03 +    2.55e-02±1.88e-03 +    6.69e-02±5.38e-03 +    7.60e-02±5.82e-03
I6      7.72e-03±5.16e-03 +    2.10e-02±2.71e-03 +    5.86e-02±9.21e-03 +    6.72e-02±6.25e-03
Best    0                      0                      0                      6
+/=/-   6/0/0                  6/0/0                  5/1/0

Fig. 6. Nondominated solutions obtained by K-VRPH, NSGMP, MOEA/D-DE-C-ACO, and CEA in 2nd-7th time steps of I5. Panels: (a) 2nd, (b) 3rd, (c) 4th, (d) 5th, (e) 6th, (f) 7th; axes f1 and f2.
III, CEA performs significantly better than all competing algorithms on all instances in terms of MIGD. From Table IV, CEA works significantly better than K-VRPH, NSGMP, and MOEA/D-DE-C-ACO on 6, 6, and 5 out of 6 instances, respectively, in terms of MHV. This shows that CEA can find solution sets with higher quality and stronger diversity. The good performance of CEA benefits from the multipopulation mechanism, the recombination-based sampling strategy, and the incremental response strategy. Particularly, the multiple populations help locate multiple high-quality solutions in the search space, which are the data source for the recombination-based sampling strategy to generate better solutions. The cooperation of multiple populations and the sampling strategy enables the algorithm to find well-distributed and high-quality solutions. In addition, the incremental response strategy helps the algorithm quickly react to changes by efficiently utilizing historical information.

Fig. 5. Nondominated solutions obtained by K-VRPH, NSGMP, MOEA/D-DE-C-ACO, and CEA in 2nd-7th time steps of I2. Panels: (a) 2nd, (b) 3rd, (c) 4th, (d) 5th, (e) 6th, (f) 7th; axes f1 and f2.

Taking I2 and I5 as examples, the nondominated solutions obtained by each algorithm in the 2nd-7th time steps are illustrated in Figs. 5 and 6. The solutions of CEA dominate those of other algorithms in almost all time steps. Particularly, the solutions of NSGMP take a higher energy cost (f1) since NSGMP uses neighbor information only from past solutions and tends to assign the target missions into more voyages. In contrast, the solutions of K-VRPH require a longer traveling time since K-VRPH constructs solutions using local heuristic information and is easily trapped in local optima. Compared with NSGMP and K-VRPH, both CEA and MOEA/D-DE-C-ACO obtain better solutions. This shows that differential evolution can locate good surface points and that the ant colony system can plan voyages well using the historical experience and heuristic information in a global scheme. Specifically, MOEA/D-DE-C-ACO obtains fewer solutions than CEA, and its solutions are worse. This shows that the multipopulation mechanism and the recombination-based sampling strategy can help CEA better maintain population diversity and approximate the Pareto front.

Thus, CEA has a strong global search ability to find high-quality and diverse solutions in complex underwater environments with dynamic target missions.

V. CONCLUSION

This paper tackles the dynamic multiobjective multi-AUV path planning problem in large underwater environments with complex obstacles and risky areas. A cooperative evolutionary computation algorithm is proposed to locate the surface points for AUV launch and plan the paths of AUVs for traveling


target missions. Experimental results on multiple complex underwater environments show that the proposed method can find better solution sets than the state-of-the-art algorithms in dynamic scenarios. Particularly, the multipopulation mechanism helps maintain population diversity and the recombination-based sampling strategy improves convergence by fusing information from multiple populations. In addition, the incremental response strategy improves algorithm performance by using historical information after changes. The proposed method is able to provide diverse and high-quality solutions for decision makers.

Several avenues are open for further investigation. First, since the proposed method uses historical solutions to update the populations after changes, it may fail if problems become substantially different after changes. Transfer learning methods could be developed to adapt to random changes. Second, moving obstacles and changing ocean flows are not considered in the problem model. In future work, new methods could be developed for multi-AUV path planning in complex underwater environments with moving obstacles and changing ocean flows.

REFERENCES

[1] Y. Wang, X. Ma, J. Wang, S. Hou, J. Dai, D. Gu, and H. Wang, “Robust auv visual loop-closure detection based on variational autoencoder network,” IEEE Trans. Ind. Informat., vol. 18, no. 12, pp. 8829–8838, 2022.
[2] D. Roper, C. A. Harris, G. Salavasidis, M. Pebody, R. Templeton, T. Prampart, M. Kingsland, R. Morrison, M. Furlong, A. B. Phillips, and S. McPhail, “Autosub long range 6000: A multiple-month endurance auv for deep-ocean monitoring and survey,” IEEE J. Ocean. Eng., vol. 46, no. 4, pp. 1179–1191, 2021.
[3] I. Jawhar, N. Mohamed, J. Al-Jaroodi, and S. Zhang, “An architecture for using autonomous underwater vehicles in wireless sensor networks for underwater pipeline monitoring,” IEEE Trans. Ind. Informat., vol. 15, no. 3, pp. 1329–1340, 2019.
[4] M. Katwe, K. Singh, P. K. Sharma, C.-P. Li, and Z. Ding, “Dynamic user clustering and optimal power allocation in uav-assisted full-duplex hybrid noma system,” IEEE Wireless Commun., vol. 21, no. 4, pp. 2573–2590, 2022.
[5] M. Chen and D. Zhu, “A workload balanced algorithm for task assignment and path planning of inhomogeneous autonomous underwater vehicle system,” IEEE Trans. Cogn. Develop. Syst., vol. 11, no. 4, pp. 483–493, 2019.
[6] G. Han, Z. Tang, Y. He, J. Jiang, and J. A. Ansere, “District partition-based data collection algorithm with event dynamic competition in underwater acoustic sensor networks,” IEEE Trans. Ind. Informat., vol. 15, no. 10, pp. 5755–5764, 2019.
[7] Y. Rizk, M. Awad, and E. W. Tunstel, “Cooperative heterogeneous multi-robot systems: A survey,” ACM Comput. Surv., vol. 52, no. 2, apr 2019.
[8] J. Lee and V. Friderikos, “Interference-aware path planning optimization for multiple uavs in beyond 5g networks,” J. Commun. Netw., vol. 24, no. 2, pp. 125–138, 2022.
[9] Z. Zhang, J. Wang, D. Xu, and Y. Meng, “Task allocation of multi-auvs based on innovative auction algorithm,” in ISCID, vol. 2, 2017, pp. 83–88.
[10] S. Mahmoudzadeh, D. M. Powers, K. Sammut, A. M. Yazdani, and A. Atyabi, “Hybrid motion planning task allocation model for auv’s safe maneuvering in a realistic ocean environment,” J. Intell. Robotics Syst., vol. 94, no. 1, p. 265–282, apr 2019.
[11] G. Han, A. Gong, H. Wang, M. Martínez-García, and Y. Peng, “Multi-auv collaborative data collection algorithm based on q-learning in underwater acoustic sensor networks,” IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 9294–9305, 2021.
[12] F. Zitouni and R. Maamri, “Cooperative learning-agents for task allocation problem,” in Interactive Mobile Communication Technologies and Learning, M. E. Auer and T. Tsiatsos, Eds. Cham: Springer International Publishing, 2018, pp. 952–968.
[13] A. Atyabi, S. MahmoudZadeh, and S. Nefti-Meziani, “Current advancements on autonomous mission planning and management systems: An auv and uav perspective,” Annu. Rev. Control, vol. 46, pp. 196–215, 2018.
[14] R. Cui, Y. Li, and W. Yan, “Mutual information-based multi-auv path planning for scalar field sampling using multidimensional rrt*,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 7, pp. 993–1004, 2016.
[15] J.-V. Sørli, O. H. Graven, and J. D. Bjerknes, “Multi-uav cooperative path planning for sensor placement using cooperative coevolving genetic strategy,” in Advances in Swarm Intelligence, Y. Tan, H. Takagi, Y. Shi, and B. Niu, Eds. Cham: Springer International Publishing, 2017, pp. 433–444.
[16] J. Zhang, M. Liu, S. Zhang, R. Zheng, and S. Dong, “Multi-auv adaptive path planning and cooperative sampling for ocean scalar field estimation,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[17] D. Wei, H. Ma, H. Yu, X. Dai, G. Wang, and B. Peng, “A hyperheuristic algorithm based on evolutionary strategy for complex mission planning of auvs in marine environment,” IEEE J. Ocean. Eng., vol. 47, no. 4, pp. 936–949, 2022.
[18] W. Luo, Q. Tang, C. Fu, and P. Eberhard, “Deep-sarsa based multi-uav path planning and obstacle avoidance in a dynamic environment,” in Advances in Swarm Intelligence, Y. Tan, Y. Shi, and Q. Tang, Eds. Cham: Springer International Publishing, 2018, pp. 102–111.
[19] X. Yu, W.-N. Chen, X.-M. Hu, T. Gu, H. Yuan, Y. Zhou, and J. Zhang, “Path planning in multiple-auv systems for difficult target traveling missions: A hybrid metaheuristic approach,” IEEE Trans. Cogn. Develop. Syst., vol. 12, no. 3, pp. 561–574, 2020.
[20] D. Zhu, B. Zhou, and S. X. Yang, “A novel algorithm of multi-auvs task assignment and path planning based on biologically inspired neural network map,” IEEE Trans. Intell. Veh., vol. 6, no. 2, pp. 333–342, 2021.
[21] J. Wu, C. Song, J. Ma, J. Wu, and G. Han, “Reinforcement learning and particle swarm optimization supporting real-time rescue assignments for multiple autonomous underwater vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6807–6820, 2022.
[22] Z.-H. Zhan, J. Li, J. Cao, J. Zhang, H. S.-H. Chung, and Y.-H. Shi, “Multiple populations for multiple objectives: A coevolutionary technique for solving multiobjective optimization problems,” IEEE Trans. Cybern., vol. 43, no. 2, pp. 445–463, 2013.
[23] X.-F. Liu, Z.-H. Zhan, Y. Gao, J. Zhang, S. Kwong, and J. Zhang, “Coevolutionary particle swarm optimization with bottleneck objective learning strategy for many-objective optimization,” IEEE Trans. Evol. Comput., vol. 23, no. 4, pp. 587–602, 2019.
[24] J. Shi, J. Sun, Q. Zhang, H. Zhang, and Y. Fan, “Improving pareto local search using cooperative parallelism strategies for multiobjective combinatorial optimization,” IEEE Trans. Cybern., pp. 1–14, 2022.
[25] A. Jaszkiewicz and T. Lust, “Proper balance between search towards and along Pareto front: biobjective TSP case study,” Ann. Oper. Res., vol. 254, no. 1, pp. 111–130, July 2017.
[26] A. Jaszkiewicz, “Many-objective pareto local search,” Eur. J. Oper. Res., vol. 271, no. 3, pp. 1001–1013, 2018.
[27] X. Cai, C. Xia, Q. Zhang, Z. Mei, H. Hu, L. Wang, and J. Hu, “The collaborative local search based on dynamic-constrained decomposition with grids for combinatorial multiobjective optimization,” IEEE Trans. Evol. Comput., vol. 51, no. 5, pp. 2639–2650, 2021.
[28] F. Yang, X. Lei, J. Le, N. Mu, and X. Liao, “Minable data publication based on sensitive association rule hiding,” IEEE Trans. Emerg. Topics Comput., vol. 6, no. 5, pp. 1247–1257, 2022.
[29] J. M. Sanchez-Gomez, M. A. Vega-Rodríguez, and C. J. Pérez, “Automatic update summarization by a multiobjective number-one-selection genetic approach,” IEEE Trans. Cybern., pp. 1–0, 2022.
[30] X.-F. Liu, Y. Fang, Z.-H. Zhan, and J. Zhang, “Strength learning particle swarm optimization for multiobjective multirobot task scheduling,” IEEE Trans. Syst., Man, Cybern., Syst., pp. 1–12, 2023.
[31] R. Lan, Y. Zhu, H. Lu, Z. Liu, and X. Luo, “A two-phase learning-based swarm optimizer for large-scale optimization,” IEEE Trans. Cybern., vol. 51, no. 12, pp. 6284–6293, 2021.
[32] M. Z. Kreishan and A. F. Zobaa, “Allocation of dump load in islanded microgrid using the mixed-integer distributed ant colony optimization,” IEEE Syst. J., vol. 16, no. 2, pp. 2568–2579, 2022.
[33] Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: A methodological tour d’horizon,” Eur. J. Oper. Res., vol. 290, no. 2, pp. 405–421, 2021.
[34] K. Touloupas and P. P. Sotiriadis, “Locomobo: A local constrained multiobjective bayesian optimization for analog circuit sizing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 9, pp. 2780–2793, 2022.


[35] A. Liefooghe, F. Daolio, S. Verel, B. Derbel, H. Aguirre, and K. Tanaka, “Landscape-aware performance prediction for evolutionary multiobjective optimization,” IEEE Trans. Evol. Comput., vol. 24, no. 6, pp. 1063–1077, 2020.
[36] M. J. A. Schuetz, J. K. Brubaker, and H. G. Katzgraber, “Combinatorial optimization with physics-inspired graph neural networks,” Nat. Mach. Intell., vol. 4, pp. 367–377, 2021.
[37] W. Kool, H. van Hoof, and M. Welling, “Attention, Learn to Solve Routing Problems!” ICLR, pp. 1–7, 2019.
[38] D. Sidoti, G. V. Avvari, M. Mishra, L. Zhang, B. K. Nadella, J. E. Peak, J. A. Hansen, and K. R. Pattipati, “A multiobjective path-planning algorithm with time windows for asset routing in a dynamic weather-impacted environment,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 12, pp. 3256–3271, 2017.

Yongchun Fang (Senior Member, IEEE) received the B.S. degree in electrical engineering and the M.S. degree in control theory and applications from Zhejiang University, Hangzhou, China, in 1996 and 1999, respectively, and the Ph.D. degree in electrical engineering from Clemson University, Clemson, SC, USA, in 2002.
He is currently a Professor with the Institute of Robotics and Automatic Information System (IRAIS), Nankai University, Tianjin, China. His research interests include nonlinear control, visual servoing, AFM-based nano-systems, and control of underactuated systems.
[39] J. Cheng, M. Ju, M. Zhou, C. Liu, S. Gao, A. Abusorrah, and C. Jiang,
“A dynamic evolution method for autonomous vehicle groups in a
highway scene,” IEEE Internet Things J., vol. 9, no. 2, pp. 1445–1457, Zhi-Hui Zhan (Senior Member, IEEE) received
2022. the Bachelor’s degree and the Ph. D. degree in
[40] X.-F. Liu, X.-X. Xu, Z.-H. Zhan, Y. Fang, and J. Zhang, “Interaction- Computer Science from the Sun Yat-Sen Uni-
based prediction for dynamic multiobjective optimization,” IEEE Trans. versity, Guangzhou China, in 2007 and 2013,
Evol. Comput., pp. 1–1, 2023. respectively.
[41] J. Li, T. Sun, Q. Lin, M. Jiang, and K. C. Tan, “Reducing negative He is currently the Changjiang Scholar Young
transfer learning via clustering for dynamic multiobjective optimization,” Professor with the School of Computer Sci-
IEEE Trans. Evol. Comput., vol. 26, no. 5, pp. 1102–1116, 2022. ence and Engineering, South China University
[42] R. Zhang, J. Hao, R. Wang, H. Deng, and H. Wang, “Multi-objective of Technology, Guangzhou, China. His current
global path planning for uav-assisted sensor data collection using drl research interests include evolutionary compu-
and transformer,” in Web and Big Data, B. Li, L. Yue, C. Tao, tation, swarm intelligence, and their applications
X. Han, D. Calvanese, and T. Amagasa, Eds. Cham: Springer Nature in real-world problems and in environments of cloud computing and big
Switzerland, 2023, pp. 492–500. data.
[43] X.-F. Liu, Z.-H. Zhan, Y. Lin, W.-N. Chen, Y.-J. Gong, T.-L. Gu, H.-Q. Dr. Zhan was a recipient of the IEEE Computational Intelligence
Yuan, and J. Zhang, “Historical and heuristic-based adaptive differential Society (CIS) Outstanding Early Career Award in 2021, the Outstanding
evolution,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 49, no. 12, pp. Youth Science Foundation from National Natural Science Foundations of
2623–2635, 2019. China (NSFC) in 2018, and the Wu Wen-Jun Artificial Intelligence Excel-
[44] M. Dorigo and L. Gambardella, “Ant colony system: a cooperative lent Youth from the Chinese Association for Artificial Intelligence in 2017.
learning approach to the traveling salesman problem,” IEEE Trans. Evol. His doctoral dissertation was awarded the IEEE CIS Outstanding Ph.
Comput., vol. 1, no. 1, pp. 53–66, 1997. D. Dissertation and the China Computer Federation Outstanding Ph. D.
[45] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recog- Dissertation. He is one of the World’s Top 2% Scientists for both Career-
nit. Lett., vol. 31, no. 8, pp. 651–666, 2010. Long Impact and Year Impact in Artificial Intelligence and one of the
[46] C. S. Groer, B. Golden, and W. Edward, “A library of local search Highly Cited Chinese Researchers in Computer Science. He is currently
heuristics for the vehicle routing problem,” Math. Prog. Comp., vol. 2, the Chair of Membership Development Committee in IEEE Guangzhou
no. 2, p. 79–101, 2010. Section and the Vice-Chair of IEEE CIS Guangzhou Chapter. He is
[47] K. Deb and H. Jain, “An evolutionary many-objective optimization currently an Associate Editor of the IEEE Transactions on Evolutionary
algorithm using reference-point-based nondominated sorting approach, Computation, the Neurocomputing, the Memetic Computing, and the
part i: Solving problems with box constraints,” IEEE Trans. Evol. Machine Intelligence Research.
Comput., vol. 18, no. 4, pp. 577–601, 2014.
[48] C. Prins, “A simple and effective evolutionary algorithm for the vehicle
routing problem,” Comput. Oper. Res., vol. 31, no. 12, pp. 1985–2002, Yun-Liang Jiang received the Ph.D. degree in
2004. computer science and technology from Zhejiang
[49] B. Garau, A. Alvarez, and G. Oliver, “Path planning of autonomous University, Hangzhou, China, in 2006. He is
underwater vehicles in current fields with complex spatial variability: currently a Professor with the School of Com-
an a* approach,” in ICRA, 2005, pp. 194–198. puter Science and Technology, Zhejiang Nor-
[50] B. Miloradović, B. Çürüklü, M. Ekström, and A. V. Papadopoulos, mal University, Jinhua, China, and also with the
“Gmp: A genetic mission planner for heterogeneous multirobot system School of Information Engineering, Huzhou Uni-
applications,” IEEE Trans. Cybern., vol. 52, no. 10, pp. 10 627–10 638, versity, Huzhou, China. His research interests
2022. include intelligent information processing and
[51] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist geographic information systems.
multiobjective genetic algorithm: Nsga-ii,” IEEE Trans. Evol. Comput.,
vol. 6, no. 2, pp. 182–197, 2002.

Jun Zhang (Fellow, IEEE) received the Ph.D.


degree from the City University of Hong Kong,
Kowloon, Hong Kong, in 2002.
He is currently an Honorary Professer with
Xiao-Fang Liu (Member, IEEE) received the Zhejiang Normal University, China and also a
B.S. degree and Ph.D degree in computer sci- BP professor with the Hanyang University, ER-
ence from Sun Yat-Sen University, Guangzhou, ICA, South Korea. He has published over 180
China, in 2015 and 2020, respectively. IEEE Transactions papers in his research area.
She is currently an Assistant Professor with His research interests include computational in-
College of Artificial Intelligence, Nankai Uni- telligence, cloud computing, data mining, and
versity. Her current research interests include power-electronic circuits. Professor Zhang was a
artificial intelligence, evolutionary computation, recipient of the China National Funds for Distinguished Young Scientists
swarm intelligence, and their applications in de- from the National Natural Science Foundation of China in 2011 and was
sign and optimization, such as cloud computing appointed as a Cheung Kong Chair Professor in 2013 from the Ministry
resources scheduling, multirobot systems. of Education, China, in 2009. He is currently an Associate Editor of the
IEEE Transactions on Evolutionary Computation and IEEE Transactions
on Cybernetics.
