Path Planning for Automatic Berthing Using Ship-Maneuvering
Simulation-Based Deep Reinforcement Learning
Anh Khoa Vo 1, Thi Loan Mai 2 and Hyeon Kyu Yoon 2,*

1 Department of Smart Environmental Energy Engineering, Changwon National University, Changwon 51140, Republic of Korea; anhkhoak3t@gmail.com
2 Department of Naval Architecture and Marine Engineering, Changwon National University, Changwon 51140, Republic of Korea; mailoankttt@gmail.com
* Correspondence: hkyoon@changwon.ac.kr; Tel.: +82-055-2133-683

Abstract: Despite receiving much attention from researchers in the field of naval architecture and
marine engineering since the early stages of modern shipbuilding, the berthing phase is still one of
the biggest challenges in ship maneuvering due to the potential risks involved. Many algorithms
have been proposed to solve this problem. This paper proposes a new approach with a path-
planning algorithm for automatic berthing tasks using deep reinforcement learning (RL) based on
a maneuvering simulation. Unlike the conventional path-planning algorithm using the control
theory or an advanced algorithm using deep learning, a state-of-the-art path-planning algorithm
based on reinforcement learning automatically learns, explores, and optimizes the path for berthing
performance through trial and error. The results of performing the twin delayed deep deterministic
policy gradient (TD3) combined with the maneuvering simulation show that the approach can be
used to propose a feasible and safe path for high-performing automatic berthing tasks.

Keywords: path planning; deep reinforcement learning; TD3; maneuvering simulation; automatic
berthing

1. Introduction

Since the early stages of modern shipbuilding, much attention has been paid to automated methods of ship navigation, particularly with the continuous advancements in artificial intelligence (AI). As a result, the number of autonomous ships has rapidly grown. Autonomous ship navigation offers substantial advantages in terms of safety, efficiency, reliability, and environmental sustainability. By harnessing advanced technologies such as sensor systems, data analysis, and artificial intelligence, and by reducing the risk of human error, autonomous navigation systems ensure the safety of ship operations. These systems operate consistently and reliably, unhindered by human limitations, resulting in more predictable performance and fewer accidents. Autonomous ships can process vast amounts of data, enabling informed decision making, collision avoidance, and adaptation to changing conditions [1]. However, automatic ship berthing remains an extremely complex task, particularly under low-speed conditions where the hydrodynamic forces acting on the ship are highly nonlinear [2]. Controlling the ship becomes challenging and necessitates the expertise of an experienced commander. Numerous researchers have conducted extensive studies on the principles and algorithms for automatic ship berthing. Researchers have developed various ship control algorithms based on control theories and maneuverability assumptions [3–9]. These approaches proved effective under defined berthing conditions before the existence of AI. The development of AI algorithms has propelled the creation of algorithms and methods that enhance ship control performance, improve safety, and greatly reduce accidents in the marine industry. Many supervised learning algorithms based on neural networks have exhibited promising results with a high success rate, as in [10–13].


Applying AI algorithms eliminates the need to clearly understand the mathematical model of ships. However, acquiring a substantial number of labeled training data can be time-consuming and costly.
Unlike the aforementioned methods, reinforcement learning techniques, which constitute an area of machine learning, do not require a training dataset; they allow the ship to learn and optimize its berthing maneuvers through interactions with a simulated environment. The application of RL to the automatic berthing task has shown good results, with the ship automatically learning the strategy and optimizing the control policy to move to the berthing point [14,15].
In this paper, the initial development of a novel path-planning algorithm for autonomous ship berthing that uses a recent reinforcement learning technique, the twin delayed deep deterministic policy gradient (TD3), is proposed. The TD3 algorithm was introduced by Fujimoto et al. (2018) and is specifically designed for continuous action spaces. TD3 is an extension of the deep deterministic policy gradient (DDPG) and aims to address certain challenges and improve the stability of learning in complex environments. TD3 exploration allows the agent to explore the environment and gain new experiences that optimize rewards through trial and error. It employs two distinct value function estimators, which mitigate the overestimation bias and stabilize the learning process. Leveraging these two critics, TD3 provides more accurate value estimates and facilitates better policy updates. High performance and stability compared with other algorithms in the field of RL were shown in [16]. In combination with the MMG model for ship-maneuvering motion simulation, proposed by the Maneuvering Modeling Group (MMG) in 1977 [17], the method suggests a feasible path, resulting in faster convergence and improved accuracy.
This article is organized into five parts. The first section introduces previous studies conducted in this field. Section 2 presents the equation of motion for the ship based on the MMG model along with the hydrodynamic and interaction coefficients. Section 3 outlines the path-planning algorithm based on the deep reinforcement learning algorithm TD3. Section 4 showcases and discusses simulation results for two berthing cases. Finally, Section 5 concludes this research.

2. Mathematical Model

2.1. Coordinate System

This paper focuses on the motions of USVs in the horizontal plane only. Thus, two coordinate systems were defined for a maneuvering ship based on the right-hand rule, as shown in Figure 1. The earth-fixed coordinate system is Oxy, where the origin is located on the water surface, and the body-fixed coordinate system is ox_b y_b, where the origin is at the midship. The x_b and y_b axes point toward the ship's bow and starboard, respectively. The heading angle ψ represents the angle between the x and x_b axes.

Figure 1. Coordinate system of the twin-propeller and twin-rudder ship model.
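To make the kinematic relation between the two coordinate systems concrete, the following minimal Python sketch (not taken from the paper; the function name and interface are illustrative) converts the body-fixed velocities u, v, r and the heading ψ into earth-fixed position and heading rates.

```python
import math

def earth_fixed_rates(u: float, v: float, r: float, psi: float):
    """Map body-fixed surge/sway/yaw velocities to earth-fixed rates.

    u, v : surge and sway velocities in the body-fixed frame (m/s)
    r    : yaw rate (rad/s)
    psi  : heading angle between the earth-fixed x-axis and the ship's bow (rad)
    """
    x_dot = u * math.cos(psi) - v * math.sin(psi)   # earth-fixed x velocity
    y_dot = u * math.sin(psi) + v * math.cos(psi)   # earth-fixed y velocity
    psi_dot = r                                     # heading rate equals the yaw rate
    return x_dot, y_dot, psi_dot

# Example: 1 knot of surge with a small yaw rate and a 30-degree heading
print(earth_fixed_rates(u=0.514, v=0.0, r=math.radians(0.5), psi=math.radians(30.0)))
```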

2.2. Mathematical Model of USV


The motion equation with three degrees of freedom (3-DOF) is established based on Newton's second law. In this paper, the berthing task is assumed to be performed in calm water, so disturbances from the environment, such as waves and wind in the port area, are ignored for simplicity of the equation of motion. The heave, roll, and pitch motions are relatively small and do not significantly affect the equation of motion; thus, they can be neglected. Furthermore, the low-speed condition makes the 3-DOF motions sufficient to simulate the motion of the vehicle. Additionally, the main purpose of this paper is to focus on the path-planning algorithm that generates a feasible path for the berthing task based on the maneuvering simulation. The 3-DOF motion equations provide simplicity while retaining the characteristics of the system. The MMG model for the 3-DOF equation of motion suggested in [17] divides the total ship force and moment into sub-components: the hull, the thruster, and the steering system. Thus, the 3-DOF motion equation is expressed as

$$
\begin{aligned}
m\left(\dot{u} - vr - x_{G}r^{2}\right) &= X_H + X_P + X_R \\
m\left(\dot{v} + ur + x_{G}\dot{r}\right) &= Y_H + Y_P + Y_R \\
I_{zz}\dot{r} + m x_{G}\left(\dot{v} + ur\right) &= N_H + N_P + N_R
\end{aligned} \tag{1}
$$

where m is the mass of the ship; x_G is the longitudinal position of the center of gravity of the ship; u, v, and r denote the surge, sway, and yaw velocities, respectively; a dot over a variable denotes its derivative with respect to time; I_zz is the mass moment of inertia about the z-axis; X, Y, and N represent the surge force, lateral force, and yaw moment, respectively, at the midship; and the subscripts H, P, and R denote the hull, propeller, and rudder, respectively.
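As an illustration of how Equation (1) can be integrated in time, the sketch below is a hedged example under assumed interfaces: it moves the acceleration-dependent added-mass terms of Equation (2) to the left-hand side, solves the resulting linear system for the accelerations, and takes an explicit Euler step. The dictionary keys and helper names are placeholders, not the authors' implementation.

```python
import numpy as np

def accelerations(state, forces, mass_props):
    """Solve Eq. (1) for (u_dot, v_dot, r_dot).

    state      : dict with u, v, r (body-fixed velocities)
    forces     : dict with X, Y, N = hull (without added-mass), propeller,
                 and rudder contributions summed per Eq. (1)
    mass_props : dict with m, xG, Izz and the added-mass derivatives
                 Xudot, Yvdot, Yrdot, Nvdot, Nrdot
    """
    u, v, r = state["u"], state["v"], state["r"]
    m, xG, Izz = mass_props["m"], mass_props["xG"], mass_props["Izz"]

    # Surge: (m - Xudot) * u_dot = m * (v*r + xG*r**2) + X
    u_dot = (m * (v * r + xG * r**2) + forces["X"]) / (m - mass_props["Xudot"])

    # Coupled sway/yaw equations written as A @ [v_dot, r_dot] = b
    A = np.array([[m - mass_props["Yvdot"], m * xG - mass_props["Yrdot"]],
                  [m * xG - mass_props["Nvdot"], Izz - mass_props["Nrdot"]]])
    b = np.array([-m * u * r + forces["Y"],
                  -m * xG * u * r + forces["N"]])
    v_dot, r_dot = np.linalg.solve(A, b)
    return u_dot, v_dot, r_dot

def euler_step(state, forces, mass_props, dt=0.1):
    """Advance the body-fixed velocities by one explicit Euler step."""
    u_dot, v_dot, r_dot = accelerations(state, forces, mass_props)
    return {"u": state["u"] + dt * u_dot,
            "v": state["v"] + dt * v_dot,
            "r": state["r"] + dt * r_dot}
```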
Due to the operating conditions in the berthing phase, the hydrodynamic forces and moments around the midship acting on the ship hull were investigated at a low speed with a wide range of drift angles [2]. Thus, the equation of motion considers some of the high-order hydrodynamic coefficients, and the hydrodynamic forces and moments caused by the hull are expressed as follows:

$$
\begin{aligned}
X_H &= X_{\dot{u}}\dot{u} + X_{u|u|}u|u| + X_{vv}v^{2} + X_{rr}r^{2} + X_{vr}vr \\
Y_H &= Y_{\dot{v}}\dot{v} + Y_{\dot{r}}\dot{r} + Y_{v}v + Y_{vvv}v^{3} + Y_{vvvvv}v^{5} + Y_{r}r + Y_{r|r|}r|r| + Y_{vvr}v^{2}r + Y_{vrr}vr^{2} \\
N_H &= N_{\dot{v}}\dot{v} + N_{\dot{r}}\dot{r} + N_{v}v + N_{uv}uv + N_{vvv}v^{3} + N_{uvvv}uv^{3} + N_{r}r + N_{r|r|}r|r| + N_{vvr}v^{2}r + N_{vrr}vr^{2}
\end{aligned} \tag{2}
$$

The hydrodynamic coefficients were expressed using a Taylor series expansion in terms of the surge velocity, sway velocity, and yaw rate.
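Because the velocity-dependent part of Equation (2) is a polynomial in u, v, and r, it can be evaluated directly once the coefficients are known. The sketch below is illustrative only: it assumes the coefficients have already been dimensionalized, and it omits the acceleration-dependent (added-mass) terms, which are handled together with the inertia terms of Equation (1).

```python
def hull_forces(u, v, r, c):
    """Velocity-dependent hull force/moment polynomial of Eq. (2).

    c is a dict of (already dimensional) hydrodynamic derivatives,
    e.g. c["Xvv"], c["Yvvv"], c["Nr"], ...; key names are illustrative.
    """
    XH = (c["Xu_abs_u"] * u * abs(u) + c["Xvv"] * v**2
          + c["Xrr"] * r**2 + c["Xvr"] * v * r)
    YH = (c["Yv"] * v + c["Yvvv"] * v**3 + c["Yvvvvv"] * v**5
          + c["Yr"] * r + c["Yr_abs_r"] * r * abs(r)
          + c["Yvvr"] * v**2 * r + c["Yvrr"] * v * r**2)
    NH = (c["Nv"] * v + c["Nuv"] * u * v + c["Nvvv"] * v**3
          + c["Nuvvv"] * u * v**3 + c["Nr"] * r + c["Nr_abs_r"] * r * abs(r)
          + c["Nvvr"] * v**2 * r + c["Nvrr"] * v * r**2)
    return XH, YH, NH
```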
The ship model was equipped with twin-propeller and twin-rudder systems [18]. The
thruster model is expressed as follows:
$$
\begin{aligned}
X_P &= (1-t)\,\rho D_P^{4}\left[n_P^{2} K_T^{P}\!\left(J_P^{P}\right) + n_S^{2} K_T^{S}\!\left(J_P^{S}\right)\right] \\
Y_P &= 0 \\
N_P &= y_P (1-t)\,\rho D_P^{4}\left[n_P^{2} K_T^{P}\!\left(J_P^{P}\right) - n_S^{2} K_T^{S}\!\left(J_P^{S}\right)\right]
\end{aligned} \tag{3}
$$

where D_P is the propeller diameter; n is the propeller revolution rate; t denotes the thrust deduction factor; y_P is the lateral position of the propeller from the centerline; and the superscripts P and S denote the port and starboard propellers, respectively.
The thrust coefficient K_T is described as a function of the advance ratio J_P, which was obtained through the propeller open-water test:

$$
K_T = k_0 + k_1 J_P + k_2 J_P^{2} \tag{4}
$$

The parameters required for the estimation of thrust are given as

$$
\begin{aligned}
J_P^{P,S} &= \frac{u_P^{P,S}}{n^{P,S} D_P} \\
u_P^{P,S} &= \left(1 - w_P^{P,S}\right) u \\
w_P^{P,S} &= w_{P0}\exp\!\left(-C_P^{P,S}\,{v'_P}^{2}\right) \\
v_P &= v + x_P r \\
C_P^{P} &= C_P^{-},\; C_P^{S} = C_P^{+} \quad \text{when } \beta_P > 0 \\
C_P^{P} &= C_P^{+},\; C_P^{S} = C_P^{-} \quad \text{when } \beta_P < 0
\end{aligned} \tag{5}
$$

where the wake fraction at the propeller position w P was estimated using the wake fraction
at the propeller in the straight motion w P0 ; the geometrical inflow angle to the propeller
position is denoted by β P ; CP+ and CP− describe the wake-changing coefficients for plus and
minus β P due to lateral motion; and x P denotes the longitudinal position from the midship.
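A possible implementation of the twin-propeller model of Equations (3)–(5) is sketched below. It is an assumption-laden example rather than the code used in this study: the key names are invented, the revolutions are assumed to be given in rev/s, and the inflow-angle sign convention and the normalization of v_P are simplified.

```python
import math

def propeller_forces(u, v, r, n_port, n_stbd, p):
    """Twin-propeller surge force and yaw moment following Eqs. (3)-(5).

    n_port, n_stbd : propeller revolutions (rev/s; convert from rpm upstream)
    p : dict of propeller data; the key names are illustrative only.
    """
    v_p = v + p["xP"] * r                      # lateral velocity at the propeller position, Eq. (5)
    beta_p = math.atan2(-v_p, u)               # geometrical inflow angle (assumed sign convention)
    # Wake-changing coefficient swaps between the sides with the sign of beta_p, Eq. (5)
    C_port, C_stbd = (p["Cp_minus"], p["Cp_plus"]) if beta_p > 0 else (p["Cp_plus"], p["Cp_minus"])

    def thrust(n, C_p):
        w_p = p["wP0"] * math.exp(-C_p * v_p ** 2)        # wake fraction in maneuvering, Eq. (5)
        u_p = (1.0 - w_p) * u                             # inflow speed at the propeller
        J = u_p / (n * p["DP"]) if n != 0.0 else 0.0      # advance ratio
        K_T = p["k0"] + p["k1"] * J + p["k2"] * J ** 2    # open-water curve, Eq. (4)
        return p["rho"] * n ** 2 * p["DP"] ** 4 * K_T     # single-propeller thrust

    T_port, T_stbd = thrust(n_port, C_port), thrust(n_stbd, C_stbd)
    XP = (1.0 - p["t"]) * (T_port + T_stbd)               # Eq. (3)
    YP = 0.0
    NP = p["yP"] * (1.0 - p["t"]) * (T_port - T_stbd)
    return XP, YP, NP
```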
Forces and moments due to the steering system for the twin rudder were calculated
based on the normal force FN and are expressed as

$$
\begin{aligned}
X_R &= -(1-t_R)\left(F_N^{P} + F_N^{S}\right)\sin\delta \\
Y_R &= (1+a_H)\left(F_N^{P} + F_N^{S}\right)\cos\delta \\
N_R &= (x_R + a_H x_H)\left(F_N^{P} + F_N^{S}\right)\cos\delta - y_R (1-t_R)\left(F_N^{P} - F_N^{S}\right)\sin\delta
\end{aligned} \tag{6}
$$

where the normal force acting on the rudder is described as follows (Equation (7)):

$$
F_N^{P,S} = \frac{1}{2}\rho A_R \left(U_R^{P,S}\right)^{2} f_{\alpha}\sin\alpha_R^{P,S} \tag{7}
$$
The parameters required for estimating the rudder forces and moment during the
maneuver are given as
$$
\begin{aligned}
U_R^{P,S} &= \sqrt{\left(u_R^{P,S}\right)^{2} + \left(v_R^{P,S}\right)^{2}} \\
f_{\alpha} &= \frac{6.13\Lambda}{\Lambda + 2.25} \\
\alpha_R^{P,S} &= \delta - \tan^{-1}\!\left(\frac{v_R^{P,S}}{u_R^{P,S}}\right) \\
u_R^{P,S} &= \varepsilon\, u_P^{P,S} \sqrt{\eta\left\{1 + \kappa\left(\sqrt{1 + \frac{8 K_T^{P,S}}{\pi \left(J_P^{P,S}\right)^{2}}} - 1\right)\right\}^{2} + (1-\eta)} \\
v_R^{P,S} &= \gamma_R^{P,S}\,(v + l_R r) \\
\gamma_R^{P} &= \gamma_R^{-},\; \gamma_R^{S} = \gamma_R^{+} \quad \text{when } \beta_R > 0 \\
\gamma_R^{P} &= \gamma_R^{+},\; \gamma_R^{S} = \gamma_R^{-} \quad \text{when } \beta_R < 0
\end{aligned} \tag{8}
$$

where FN is the normal rudder force; t R , a H , and x H are the steering resistance deduction
factor, the rudder increase factor, and the position of an additional lateral force component,
respectively; UR is the resultant rudder inflow velocity; f a is the rudder lift gradient
coefficient; Λ is the rudder aspect ratio; α R is the effective inflow angle to the rudder; u R
and v R are longitudinal and lateral inflow velocity components to the rudder; ε is a ratio of
a wake fraction at the propeller and rudder position; γ is the flow straightening coefficient;
and β R is the effective inflow angle to the rudder in the maneuvering motion.
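The rudder model of Equations (6)–(8) can be organized in the same way. The following sketch is illustrative only; the dictionary keys and sign conventions are assumptions, and the propeller quantities K_T, J_P, and u_P are taken from the propeller model above.

```python
import math

def rudder_forces(u, v, r, delta, sides, q):
    """Twin-rudder surge/sway forces and yaw moment following Eqs. (6)-(8).

    delta : rudder angle (rad), assumed identical for both rudders
    sides : {"port": {...}, "stbd": {...}} with u_p, K_T, J_P per side
            taken from the propeller model (Eqs. (4)-(5))
    q     : dict of rudder data; all key names here are illustrative
    """
    f_alpha = 6.13 * q["aspect"] / (q["aspect"] + 2.25)      # rudder lift gradient, Eq. (8)
    beta_r = math.atan2(-(v + q["lR"] * r), u)               # effective inflow angle (assumed convention)
    gammas = (q["gamma_minus"], q["gamma_plus"]) if beta_r > 0 else (q["gamma_plus"], q["gamma_minus"])

    F_N = {}
    for name, gamma in (("port", gammas[0]), ("stbd", gammas[1])):
        s = sides[name]
        # Longitudinal rudder inflow accelerated by the propeller race, Eq. (8)
        root = math.sqrt(1.0 + 8.0 * s["K_T"] / (math.pi * s["J_P"] ** 2))
        u_r = q["eps"] * s["u_p"] * math.sqrt(
            q["eta"] * (1.0 + q["kappa"] * (root - 1.0)) ** 2 + (1.0 - q["eta"]))
        v_r = gamma * (v + q["lR"] * r)                      # lateral rudder inflow
        alpha_r = delta - math.atan2(v_r, u_r)               # effective rudder angle
        F_N[name] = 0.5 * q["rho"] * q["AR"] * (u_r**2 + v_r**2) * f_alpha * math.sin(alpha_r)  # Eq. (7)

    FN_sum = F_N["port"] + F_N["stbd"]
    FN_diff = F_N["port"] - F_N["stbd"]
    XR = -(1.0 - q["tR"]) * FN_sum * math.sin(delta)         # Eq. (6)
    YR = (1.0 + q["aH"]) * FN_sum * math.cos(delta)
    NR = ((q["xR"] + q["aH"] * q["xH"]) * FN_sum * math.cos(delta)
          - q["yR"] * (1.0 - q["tR"]) * FN_diff * math.sin(delta))
    return XR, YR, NR
```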
2.3. Hydrodynamic and Interaction Coefficients

Previous studies were carried out by research groups at Changwon National University on hydrodynamic properties under operating conditions [19]. Experiments were conducted on the Korean autonomous surface ship (KASS) model, a ship model used in the project carried out by many universities and research institutes to develop autonomous ships. The main characteristics and the shape of the ship model are shown in Table 1 and Figure 2. The cross-comparison of the results between the previous studies and [20] shows similarities in the results. Figure 3 shows the comparison of the turning maneuverability at a rudder angle of 35 degrees at three and six knots. Similarly, in order to obtain a feasible path for the automatic berthing task, the high-accuracy equation of motion and the hydrodynamic coefficients of the USV should be investigated carefully using the KASS model. In this paper, the hydrodynamic coefficients were estimated using captive model tests at Changwon National University and compared with the CFD method as presented in [21]. The coefficients relative to only the surge velocity were estimated through the resistance test. The hydrodynamic coefficients related to the surge and sway velocities for forces and moments were estimated through the static drift test. The hydrodynamic coefficients related to the yaw rate were estimated using the circular motion test, and the hydrodynamic coefficients related to the combined effect of sway velocity and yaw rate were estimated using the combined circular motion with drift test. The added mass and interaction coefficients were selected from [21]. A summary of the hydrodynamic and interaction coefficients is given in Tables 2 and 3.

Table 1. Principal dimensions.

Length perpendicular, L_pp (m): 22.000
Breadth, B (m): 6.000
Draft, T (m): 1.250
Displacement volume, ∇ (m³): 86.681
Rudder area, A_R (m²): 0.518
Rudder span, H_R (m): 0.900
Propeller diameter, D_P (m): 0.950

Figure 2. Geometry of an autonomous surface ship (KASS).

Figure 3. Simulation results of turning trajectories at a rudder angle of 35 degrees at three and six knots [20].

Table 2. Hydrodynamic force and moment coefficients (hull, ×10⁻⁵).

X_u̇ = −81; Y_v̇ = −1034; N_v̇ = 64
X_u|u| = −627; Y_ṙ = −126; N_ṙ = −33
X_vv = −407; Y_v = −2610; N_v = −130
X_rr = 675; Y_vvv = −3530; N_uv = −513
X_vr = 226; Y_vvvvv = 3080; N_vvv = −2
Y_r = 390; N_uvvv = −138
Y_r|r| = −47; N_r = −178
Y_vrr = −2170; N_r|r| = −253
Y_vvr = −3590; N_vrr = −420
N_vvr = −1830

Table 3. Interaction coefficients (propeller and rudder).

1 − t_R = 0.934; 1 + a_H = 0.702; γ_R⁺ = 0.342
ε = 0.960; C_P⁺ = −2.713; γ_R⁻ = 0.634
κ = 0.695; C_P⁻ = 11.211

2.4. Maneuverability

To define the problem, it is necessary to assess the maneuverability of the ship. Understanding the maneuverability makes it reasonable to determine where to start the automatic berthing process. A maneuvering simulation at low speed (1 knot) was conducted to investigate the ship-maneuvering characteristics in the port environment. Figure 4 shows the trajectory of the turning circle test at a rudder angle of 35 degrees at 1 knot. The simulation results show that the ship can easily turn within a range of approximately 3L_PP. Thus, the berthing area should be greater than three times L_PP from the berthing point. The maneuvering characteristics are shown in Table 4.

Figure 4. Simulation results of turning trajectories at a rudder angle of 35 degrees.

Table 4. Turning maneuverability characteristics.

Advance (L_PP): starboard 2.681, port 2.680
Transfer (L_PP): starboard 1.114, port 1.115
Turning radius (L_PP): starboard 2.625, port 0.623
Tactical diameter (L_PP): starboard 2.935, port 2.932

3. Path-Planning Approach

In the last few decades, significant advancements have been made in the field of artificial intelligence, particularly in reinforcement learning, a subfield of machine learning. Reinforcement learning involves training by assigning rewards and punishments based on behavior and state. Unlike supervised and semi-supervised learning, reinforcement learning does not rely on pairs of input data and true results, and it does not explicitly evaluate near-optimal actions as true or false. As a result, reinforcement learning offers a solution to tackle complex problems, including the control of robots, self-driving cars, and even applications in the aerospace industry. A noteworthy advancement in reinforcement learning is the introduction of the twin delayed deep deterministic policy gradient (TD3) in 2018. TD3 is an effective model-free policy reinforcement learning method. The TD3 agent is an actor–critic reinforcement learning agent that optimizes the expected long-term reward. Specifically, TD3 builds upon the success of the deep deterministic policy gradient (DDPG) algorithm developed in 2016. DDPG remains highly regarded and successful in the continuous action space, finding extensive applications in fields such as robotics and self-driving systems.
However, like many algorithms, DDPG has its limitations, including instability and the need for fine-tuning hyperparameters for each task. Estimation errors gradually accumulate during training, leading to suboptimal local states, overestimation, or severe forgetfulness on the part of the agent. To address these issues, TD3 was developed with a focus on reducing the overestimation bias prevalent in previous reinforcement learning algorithms. This is achieved through the incorporation of three key features:

• The utilization of twin critic networks, which work in pairs.


• Delayed updates of the actor.
• Action noise regularization.
By implementing these features, TD3 aims to enhance the stability and performance of re-
inforcement learning algorithms, ultimately improving their applicability in various domains.

3.1. Conception
The path-planning algorithm in this paper was applied to the KASS model described in Section 2.3. The port selected was the Busan port, whose geometry is shown in Figure 5. The objective is to use TD3 (the pseudocode is shown in Algorithm 1) to train a model that can generate the path for the berthing process. First, the TD3 algorithm trains the model in combination with the maneuvering simulation suggested by the MMG model. This approach allows realistic ship motion dynamics to be integrated into the training process. Then, this model is used to generate the desired path for the berthing task by taking the ship state s(x, y, ψ, u, v, r) as input and predicting the control signals n (propeller speed) and δ (rudder angle). The concept of path planning for automatic berthing tasks is shown in Figure 6.

Algorithm 1: Pseudocode of the TD3 algorithm

1. Initialize the critic networks Q_φ1, Q_φ2 and the actor network μ_θ with random parameters φ1, φ2, θ
2. Initialize the target parameters to the main parameters: θ_targ ← θ, φ_targ,1 ← φ1, φ_targ,2 ← φ2
3. For t = 0 to T − 1 do:
4.   Observe the state s of the environment and choose the action a = clip(μ_θ(s) + ε, a_low, a_high), where ε ~ N
5.   Execute action a in the TD3 environment to observe the new state s′, the reward r, and the done signal d that stops training for this step
6.   Store the transition (s, a, r, s′, d) in the replay buffer D
7.   If s′ reaches the goal point, reset the environment state
8.   If it is time for an update, for j in range (custom decided) do:
9.     Randomly sample a batch of transitions B = {(s, a, r, s′, d)} from D
10.    Compute the target actions a′(s′) = clip(μ_θtarg(s′) + clip(ε, −c, c), a_low, a_high), where ε ~ N(0, σ)
11.    Compute the targets y(r, s′, d) = r + γ(1 − d) min_{i=1,2} Q_φtarg,i(s′, a′(s′))
12.    Update the Q-functions using gradient descent on ∇_φi (1/|B|) Σ_{(s,a,r,s′,d)∈B} (Q_φi(s, a) − y(r, s′, d))², for i = 1, 2
13.    If j mod policy_delay == 0, update the policy by one step of deterministic policy gradient ascent using ∇_θ (1/|B|) Σ_{s∈B} Q_φ1(s, μ_θ(s))
14.    Update the target networks: φ_targ,i ← ρ φ_targ,i + (1 − ρ) φ_i for i = 1, 2; θ_targ ← ρ θ_targ + (1 − ρ) θ
       End if
15.    End for
     End if
   End until convergence
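The core of lines 10–12 of Algorithm 1, target policy smoothing combined with the clipped double-Q target, can be written compactly. The PyTorch sketch below is a generic illustration under assumed network interfaces, not the training code used in this study.

```python
import torch

def td3_targets(batch, actor_target, critic1_target, critic2_target,
                gamma=0.99, sigma=0.2, noise_clip=0.5, act_low=-1.0, act_high=1.0):
    """Clipped double-Q targets of Algorithm 1 (lines 10-11).

    batch : dict of tensors with keys "next_state", "reward", "done".
    The target networks are assumed to be torch.nn.Module objects that map
    state -> action and (state, action) -> Q-value, respectively.
    """
    with torch.no_grad():
        next_state = batch["next_state"]
        # Line 10: target policy smoothing with clipped Gaussian noise
        mu = actor_target(next_state)
        noise = (torch.randn_like(mu) * sigma).clamp(-noise_clip, noise_clip)
        next_action = (mu + noise).clamp(act_low, act_high)
        # Line 11: clipped double-Q, the minimum of the two target critics
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        target_q = torch.min(q1, q2)
        # y(r, s', d) = r + gamma * (1 - d) * min_i Q_i(s', a'(s'))
        return batch["reward"] + gamma * (1.0 - batch["done"]) * target_q
```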
Figure 5. Satellite image of the Busan port.

Figure 6. Concept of the path-planning algorithm.

3.2. Setting for Reinforcement Learning

In this section, the parameters and variables for the TD3 algorithm to deal with the automatic berthing task were set as follows:
• Observation space and state: The observation space and state were defined as the set of physical velocity, position, and orientation. The state vector s(x, y, ψ, u, v, r) includes the position x, y; the orientation ψ, which is the heading angle; the linear velocities u, v; and the angular velocity r;
• Action: The control action includes the control input of the thruster (revolution of the propeller) and the steering (rudder angle) system. The action signal is continuous in the range [−1, 1], where [−1, 1] maps to [−300, 100] rpm for the thrust system and to [−35, 35] degrees for the steering system;
• Reward function: This plays a crucial role in the design of a reinforcement learning application. It serves as a guide for the network training process and helps optimize the model's performance throughout each episode. If the reward function does not accurately capture the objectives of the target task, the model may struggle to achieve desirable performance.

$$
\begin{aligned}
r_S &= r_i - r_{i-1} \\
r &= r_{Dist} + r_{LinVel} + r_{AngVel} + r_{Heading} \\
r_{Dist} &= -100\sqrt{x^{2} + y^{2}} \\
r_{Heading} &= -1000\,\lvert\psi_{Target} - \psi\rvert \\
r_{LinVel} &= -2000\sqrt{u^{2} + v^{2}} \\
r_{AngVel} &= -1000\,\lvert r\rvert
\end{aligned}
$$

In this paper, based on the boundary state, the weights 100, 1000, 2000, and 1000 were assigned to the distance, heading, linear velocity, and angular velocity terms, respectively. The reward value was described as the sum of the reward in each time step. It received a positive value if the state variable changed toward the required value and vice versa. Furthermore, for faster convergence of the rewards in the first stage of the berthing task, the distance has a higher priority relative to the target state than the speed and heading. Thus, the reward function of the distance is multiplied by the distance coefficient, while the reward functions of the heading, resultant velocity, and yaw rate are multiplied by (1.1 − distance coefficient). The reward coefficients are described in Figure 7.

Figure 7. Reward coefficients.
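To make the action scaling and the reward terms above concrete, the following sketch maps the normalized TD3 actions to propeller rpm and rudder angle and evaluates the weighted per-step reward. The weights and ranges are those stated in the text; the function names and the exact weighting scheme are illustrative assumptions.

```python
import math

def scale_action(a_rpm: float, a_rudder: float):
    """Map normalized actions in [-1, 1] to physical commands.

    [-1, 1] -> [-300, 100] rpm for the propellers and [-35, 35] deg for the rudders.
    """
    rpm = -300.0 + (a_rpm + 1.0) / 2.0 * (100.0 - (-300.0))
    rudder_deg = 35.0 * a_rudder
    return rpm, rudder_deg

def step_reward(x, y, psi, u, v, r, psi_target, dist_coeff):
    """Weighted reward terms from Section 3.2 (distance, heading, speeds)."""
    r_dist = -100.0 * math.hypot(x, y)              # distance to the berthing point
    r_heading = -1000.0 * abs(psi_target - psi)     # heading error
    r_linvel = -2000.0 * math.hypot(u, v)           # resultant linear speed
    r_angvel = -1000.0 * abs(r)                     # yaw rate
    # The distance term is weighted by the distance coefficient; the remaining
    # terms by (1.1 - distance coefficient), as described in the text and Figure 7.
    return dist_coeff * r_dist + (1.1 - dist_coeff) * (r_heading + r_linvel + r_angvel)

print(scale_action(0.0, 0.5))   # -> (-100.0, 17.5), i.e. rpm and degrees
```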

• Environment: The environment receives the control input and the state as inputs and then returns the ship's new state and the reward for this action. The environment function was built based on a maneuvering simulation that uses the MMG model as the mathematical model;
• Agent: The hyperparameters for the TD3 model were selected as follows: the number of hidden layers was set to two layers with 512 units each. The learning rates for the actor and critic networks, α and β, were set to 0.0001. The discount factor γ was 0.99. The soft update coefficient τ was 0.005. The batch size was 128. The training process was set to 20,000/50,000 steps for the warmup of the model, with the exploration noise set in Table 5.

Table 5. Exploration rates.

Step [0–5000]: exploration rate ϵ = 0.5
Step [5000–10,000]: 0.4
Step [10,000–15,000]: 0.3
Step [15,000–20,000]: 0.2
Step [20,000–50,000]: 0.1

3.3. Boundary Conditions
The simulations were performed at Busan port, with the satellite image shown in Figure 5.
The geometry of this port was simplified as Figure 8. Considering the geometry of this port,
3.3. cases
two Boundary Conditions
of berthing were selected to investigate the path-plan-generating system.
Case 1: Parallel berthing task in a 30.5
The simulations were performed × 202 m
at Busan water
port, area.
with theInsatellite
this case, the ship’s
capture as in Figure
initial states were generated randomly, as described in Table 6. The berthing point
7. The geometry of this port was simplified as Figure 8. Considering the geometry of thi was
assumed, as shown in Table 7. The berthing task can be considered a success if the ship
port, two cases of berthing were selected to investigate the path-plan-generating system.
state is in the range of the values described in Table 7. The boundary area is described as a
Figure 8a and Table 8.
Case 2: Perpendicular berthing task in a 42 × 170.5 m water area. In this case, the
ship’s initial states were generated randomly, as described in Table 9. The berthing point
was assumed, as shown in Table 10. The berthing task can be considered a success if the
ship state is in the range of the values described in Table 10. The boundary area is described
as a Figure 8b and Table 11.
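As an illustration of the case 1 setup (Tables 6 and 7), the snippet below samples a random initial state from the stated ranges and checks whether a terminal state lies within the success tolerances. It is a hedged sketch, not the authors' implementation.

```python
import random

# Case 1 initial-state ranges (Table 6) and target tolerances (Table 7)
INIT_RANGES = {"x": (-20.0, 20.0), "y": (-10.0, 10.0), "psi": (-5.0, 5.0)}
TARGET = {"x": (180.0, 2.0), "y": (20.0, 0.5), "psi": (0.0, 3.0),
          "u": (0.0, 0.1), "v": (0.0, 0.05), "r": (0.0, 1.0)}

def sample_initial_state():
    """Random initial pose; initial speed fixed at 1 knot of surge, no sway or yaw."""
    return {"x": random.uniform(*INIT_RANGES["x"]),
            "y": random.uniform(*INIT_RANGES["y"]),
            "psi": random.uniform(*INIT_RANGES["psi"]),
            "u": 0.514, "v": 0.0, "r": 0.0}   # 1 knot is roughly 0.514 m/s

def is_successful(state):
    """Berthing succeeds when every variable lies within its allowed tolerance."""
    return all(abs(state[k] - centre) <= tol for k, (centre, tol) in TARGET.items())

print(sample_initial_state())
```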
Figure 8. Geometry of the port and berthing situation: (a) parallel berthing task (case 1) and (b) perpendicular berthing task (case 2).

Table 6. Initial state values (case 1).

x0 (m): [−20, 20]
y0 (m): [−10, 10]
ψ0 (°): [−5, 5]
u0 (knot): 1
v0 (knot): 0
r0 (°/s): 0

Table 7. Target point (case 1).

x (m): 180 ± 2
y (m): 20 ± 0.5
ψ (°): 0 ± 3
u (m/s): 0 ± 0.1
v (m/s): 0 ± 0.05
r (°/s): 0 ± 1

Table 8. Boundary values (case 1).

x (m): [−20, 182]
y (m): [−10, 20.5]

Table 9. Initial state values (case 2).

x0 (m): [−20, 20]
y0 (m): [−10, 10]
ψ0 (°): [−5, 5]
u0 (knot): 1
v0 (knot): 0
r0 (°/s): 0

Table 10. Target point (case 2).

x (m): 150 ± 0.5
y (m): 30 ± 2
ψ (°): 90 ± 3
u (m/s): 0 ± 0.1
v (m/s): 0 ± 0.05
r (°/s): 0 ± 1

Table 11. Boundary values (case 2).

x (m): [−20, 150.5]
y (m): [−10, 32]

4. Simulation Results and Discussion

The USV is considered to have berthed successfully if it approaches the target berthing point with an error within the allowable range established in Section 3.3. The training process stops when the number of training episodes reaches 50,000.
Figures 9 and 10 sequentially present a set of information that includes the trajectory, surge, sway, yaw rate, heading angle, and control inputs (propeller revolution and rudder angle). The USV starts from the random state described in Section 3.3. Under the automatic control of the TD3 model, the ship successfully berthed at the target location. The ship states gradually change from the initial state to the required range. In particular, the combination with the adjusted reward function made the ship berthing process happen faster and more optimally. The time series of surge velocity shows that, in the first phase of the berthing process, the ship's speed was increased to reduce the distance to the berthing point, while the sway velocity and yaw rate did not change much. In the last phase of the berthing process, the surge velocity, yaw rate, and heading angle become much more important than the distance, which dominated the first phase, so these values change more abruptly to adapt to the required values of the berthing process.
Figure 11a,b shows the learning performance of TD3 for case 1 with 50,000 episodes. Although the warmup episode number is 20,000, the average reward shows that the model berths successfully and stably before the warmup phase is finished. This demonstrates that TD3 provides good reinforcement learning for the automatic berthing process. Regarding the effect of different penalty values on the training process, too high a penalty will cause the model to misjudge the states, for instance, in cases where the vessel has moved reasonably close to the berthing position but then collides with the wall and receives a heavy point deduction, causing the model to judge that process as wrong and to try actions other than that process. This makes the training process longer. These results show that the method proposed in this paper has a high performance and success rate with a low penalty value.

Figure 9. Simulation results in a parallel berthing task (case 1).

Figure 10. Simulation results in the perpendicular berthing task (case 2).

Figure 11. Learning performance of TD3: (a) parallel berthing (case 1) and (b) perpendicular berthing (case 2).

The simulation results show that the combination of TD3 and the maneuvering simulation provides a powerful and accurate system for the automatic berthing process. Comparing the shape of the average reward with the results shown in [15] demonstrates that the stability of the TD3 algorithm is better than that of the older algorithms in the field of reinforcement learning. In particular, the method proposed in this paper is easier to apply because of its ability to learn, explore, and optimize the policy automatically. It can be used for another ship model if the ship's hydrodynamic characteristics are known.
However, the limitations of this paper are evident. Firstly, the simplification of the port condition: due to the simplicity of the model, the effect of disturbances was ignored in the simulation. This causes inaccuracy if there is wind or waves. The second limitation of this approach is the simplification of the obstacles and the port geometry. It has a significant effect on the determination of the initial state and the berthing point. The presence of moving obstacles can make the training time increase significantly. So, the method proposed in this paper should only be used in the determined port and not for moving obstacles. Finally,
this paper proposed the initial development of the path planning system. The results are
performed based on maneuver simulation. The accuracy and performance of this approach
in real-world operations need to be carefully considered and evaluated.
Although the approach in this paper uses the newest technique in reinforcement
learning at present, the performance of this method should be investigated and com-
pared carefully.

5. Conclusions and Remarks


In this study, the Korea autonomous surface ship (KASS) model was selected as the target ship to perform the training for path planning for the autonomous berthing task. The mathematical model and the hydrodynamic coefficients suggested in previous research conducted at Changwon National University provided an accurate model for solving the motion of a slow ship. By performing the path-planning algorithm based on the combination of TD3 and a maneuvering simulation, the automatic berthing task could be conducted with stable performance using reinforcement learning.
Even though the high performance of the path-planning system was shown, the
complex environmental disturbance in the port area needs to be included in the model. It
takes more time to train the model but is a necessary factor in the real situation. Additionally,
several algorithms based on control theory must be considered for faster convergence.

Author Contributions: Conceptualization, H.K.Y. and A.K.V.; methodology, H.K.Y., A.K.V. and
T.L.M.; software, A.K.V.; validation, A.K.V. and T.L.M.; formal analysis, A.K.V. and T.L.M.; inves-
tigation, A.K.V. and T.L.M.; resources, T.L.M.; data curation, A.K.V. and T.L.M.; writing—original
draft preparation, A.K.V.; writing—review and editing, H.K.Y. and T.L.M.; visualization, A.K.V.;
supervision, H.K.Y.; project administration, H.K.Y.; funding acquisition, H.K.Y. All authors have read
and agreed to the published version of the manuscript.
Funding: This research was funded by the Development of Autonomous Ship Technology (PJT201313,
Development of Autonomous Navigation System with Intelligent Route Planning Function), funded
by the Ministry of Oceans and Fisheries (MOF, Korea).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available in this article (Tables
and Figures).
Conflicts of Interest: The authors declare no conflict of interest.

Nomenclature

α_R (–): Effective inflow angle to the rudder
β (rad): Drift angle of the ship
β_P (rad): Inflow angle to the propeller
β_R (rad): Inflow angle to the rudder
γ_R (–): Flow straightening coefficient of the rudder
δ (rad): Rudder angle
η (–): Propeller diameter to rudder span ratio
Λ (–): Rudder aspect ratio
κ (–): Experimental coefficient for the longitudinal inflow velocity to the rudder
∇ (m³): Displacement of the ship
ψ (rad): Heading angle of the ship
ρ (kg/m³): Water density
ε (–): Ratio of the wake fraction at the propeller to that at the rudder
A_R (m²): Rudder profile area
a_H (–): Increase factor of the rudder force
B (m): Breadth of the ship
B_R (m): Average rudder chord length
C_P (–): Experimental constant due to the wake characteristic in maneuvering
D_P (m): Diameter of the propeller
d (m): Draft of the ship
F_N (N): Normal force of the rudder
F_X, F_Y (N): Surge and sway forces acting on the ship
f_α (–): Lift gradient coefficient of the rudder
H_R (m): Span of the rudder
I_Z (kg·m²): Moment of inertia of the ship
J_P (–): Advance ratio of the propeller
K_T (–): Propeller thrust open-water characteristic
k_0, k_1, k_2 (–): Coefficients of K_T
L_PP (m): Ship length between perpendiculars
l_R (m): Effective longitudinal length of the rudder position
M_Z (N·m): Yaw moment acting on the ship
m (kg): Ship mass
n_P, n_S (rpm): Propeller revolutions per minute
O−xyz (–): Earth-fixed coordinate system
o−x_b y_b z_b (–): Body-fixed coordinate system
R′_0 (–): Resistance of the ship in straight motion
r (deg/s): Yaw rate
r (–): Reward
T (N): Longitudinal propeller force
t (s): Time
t_P (–): Thrust deduction factor
t_R (–): Steering deduction factor
U (m/s): Resultant velocity
U_0 (m/s): Initial resultant velocity
U_R (m/s): Resultant inflow velocity to the rudder
u, v (m/s): Longitudinal and lateral velocities of the ship in the body-fixed coordinate system
u_R, v_R (m/s): Longitudinal and lateral inflow velocity components at the rudder position
w_P (–): Wake coefficient at the propeller in maneuvering motion
w_P0 (–): Wake coefficient at the propeller in straight motion
w_R (–): Wake coefficient at the rudder position
X, Y, N (N, N, N·m): Surge force, sway force, and yaw moment around the midship
X_H, Y_H, N_H (N, N, N·m): Surge force, sway force, and yaw moment acting on the ship's hull
X_P, Y_P, N_P (N, N, N·m): Surge force, sway force, and yaw moment due to the propeller
X_R, Y_R, N_R (N, N, N·m): Surge force, sway force, and yaw moment due to the rudder
x_G (m): Longitudinal position of the center of gravity
x_H (m): Longitudinal position of the acting point of the additional lateral force
x_P (m): Longitudinal position of the propeller
x_R (m): Longitudinal position of the rudder

Abbreviations

Item
AI Artificial Intelligence
CFD Computational Fluid Dynamics
DDPG Deep Deterministic Policy Gradients
DRL Deep Reinforcement Learning
MMG Maneuvering Modeling Group
RL Reinforcement Learning
TD3 Twin Delayed DDPG (a variant of the DDPG algorithm)
USV Unmanned Surface Vehicle

References
1. Chaal, M.; Ren, X.; BahooToroody, A.; Basnet, S.; Bolbot, V.; Banda, O.A.V.; van Gelder, P. Research on risk, safety, and reliability
of autonomous ships: A bibliometric review. In Safety Science (Vol. 167); Elsevier B.V.: Amsterdam, The Netherlands, 2023.
[CrossRef]
2. Oh, K.G.; Hasegawa, K. Low speed ship manoeuvrability: Mathematical model and its simulation. In Proceedings of the
International Conference on Offshore Mechanics and Arctic Engineering—OMAE, Nantes, France, 9–14 June 2013; p. 9. [CrossRef]
3. Shouji, K. An Automatic Berthing Study by Optimal Control Techniques. IFAC Proc. Vol. 1992, 25, 185–194. [CrossRef]
4. Skjåstad, K.G.; Barisic, M. Automated Berthing (Parking) of Autonomous Ships. Ph.D. Thesis, NTNU, Trondheim, Norway, 2018.
5. Mizuno, N.; Uchida, Y.; Okazaki, T. Quasi real-time optimal control scheme for automatic berthing. IFAC-Pap. 2015, 28, 305–312.
[CrossRef]
6. Nguyen, V.S.; Im, N.K. Automatic ship berthing based on fuzzy logic. Int. J. Fuzzy Log. Intell. Syst. 2019, 19, 163–171. [CrossRef]
7. Zhang, Y.; Zhang, M.; Zhang, Q. Auto-berthing control of marine surface vehicle based on concise backstepping. IEEE Access
2020, 8, 197059–197067. [CrossRef]
8. Sawada, R.; Hirata, K.; Kitagawa, Y.; Saito, E.; Ueno, M.; Tanizawa, K.; Fukuto, J. Path following algorithm application to
automatic berthing control. J. Mar. Sci. Technol. 2021, 26, 541–554. [CrossRef]
9. Wu, G.; Zhao, M.; Cong, Y.; Hu, Z.; Li, G. Algorithm of berthing and maneuvering for catamaran unmanned surface vehicle
based on ship maneuverability. J. Mar. Sci. Eng. 2021, 9, 289. [CrossRef]
10. Im, N.; Seong Keon, L.; Hyung Do, B. An Application of ANN to Automatic Ship Berthing Using Selective Controller. Int. J. Mar.
Navig. Saf. Sea Transp. 2007, 1, 101–105.
11. Ahmed, Y.A.; Hasegawa, K. Automatic ship berthing using artificial neural network trained by consistent teaching data using
nonlinear programming method. Eng. Appl. Artif. Intell. 2013, 26, 2287–2304. [CrossRef]
12. Im, N.; Hasegawa, K. Automatic ship berthing using parallel neural controller. IFAC Proc. Vol. 2001, 34, 51–57. [CrossRef]
13. Im, N.K.; Nguyen, V.S. Artificial neural network controller for automatic ship berthing using head-up coordinate system. Int. J.
Nav. Archit. Ocean. Eng. 2018, 10, 235–249. [CrossRef]
14. Marcelo, J.; Figueiredo, P.; Pereira, R.; Rejaili, A. Deep Reinforcement Learning Algorithms for Ship Navigation in Restricted Waters. Mecatrone 2018, 3, 151953. [CrossRef]
15. Lee, D. Reinforcement Learning-Based Automatic Berthing System. arXiv 2021, arXiv:2112.01879.
16. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. Int. Conf. Mach. Learn.
2018, 80, 1587–1596.
17. Yasukawa, H.; Yoshimura, Y. Introduction of MMG Standard Method for Ship Maneuvering Predictions. J. Mar. Sci. Technol. 2015,
20, 37–52. [CrossRef]
18. Khanfir, S.; Hasegawa, K.; Nagarajan, V.; Shouji, K.; Lee, S.K. Manoeuvring characteristics of twin-rudder systems: Rudder-hull
interaction effect on the manoeuvrability of twin-rudder ships. J. Mar. Sci. Technol. 2011, 16, 472–490. [CrossRef]
19. Vo, A.K.; Mai, T.L.; Jeon, M.; Yoon, H.k. Experimental Investigation of the Hydrodynamic Characteristics of a Ship due to Bank
Effect. Port. Res. 2022, 46, 294–301. [CrossRef]
20. Kim, D.J.; Choi, H.; Kim, Y.G.; Yeo, D.J. Mathematical Model for Harbour Manoeuvres of Korea Autonomous Surface Ship
(KASS) Based on Captive Model Tests. In Proceedings of the Conference of Korean Association of Ocean Science and Technology
Societies, Incheon, Republic of Korea, 13–14 May 2021.
21. Vo, A.K. Application of Deep Reinforcement Learning on Ship’s Autonomous Berthing Based on Maneuvering Simulation. Ph.D.
Thesis, Changwon National University, Changwon, Republic of Korea, 2022.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
