
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/360046322

COMPARATIVE ANALYSIS OF DIFFERENT CONTROL STRATEGIES FOR ACTIVE


HEAVE COMPENSATION

Thesis · June 2021

1 citation · 59 reads

1 author:

Shrenik Zinage
Purdue University
4 PUBLICATIONS 26 CITATIONS

SEE PROFILE

All content following this page was uploaded by Shrenik Zinage on 13 July 2023.



COMPARATIVE ANALYSIS OF DIFFERENT
CONTROL STRATEGIES FOR ACTIVE HEAVE
COMPENSATION

A Project Report

submitted by

SHRENIK ZINAGE

in partial fulfilment of the requirements


for the award of the degree of

BACHELOR OF TECHNOLOGY

DEPARTMENT OF OCEAN ENGINEERING


INDIAN INSTITUTE OF TECHNOLOGY MADRAS
JUNE 2021
THESIS CERTIFICATE

This is to certify that the thesis titled COMPARATIVE ANALYSIS OF DIFFERENT


CONTROL STRATEGIES FOR ACTIVE HEAVE COMPENSATION, submitted
by Shrenik Zinage, to the Indian Institute of Technology, Madras, for the award of the
degree of Bachelor of Technology, is a bonafide record of the research work done by
him under our supervision. The contents of this thesis, in full or in parts, have not been
submitted to any other Institute or University for the award of any degree or diploma.

Dr. Abhilash Somayajula


Research Guide
Assistant Professor
Dept. of Ocean Engineering
IIT Madras, 600 036

Place: Chennai
Date: 11th June 2021
ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Dr. Abhilash Somayajula, for
his patient guidance, continuous motivation and valuable suggestions throughout my
education at IIT Madras. Without his support this feat would have been impossible.
His logical approach to difficult problems has always been of great value to me.
Most importantly, his passion and conviction for his work always inspire me, and I hope
to continue working with him in the future. One of the main transformations I find
in myself after working under him is calmness and responsiveness under any
adversity. I am deeply indebted to him for the changes he made in my scientific thinking
and personal attitude. I would also like to thank him for giving me a head-start in
research by guiding me through writing various research papers and encouraging me to
pursue novel ideas.

I sincerely thank my BTech Project Committee members Prof. Suresh Rajendran,


Prof. Vijaykumar and Prof. Ananthakrishnan for their invaluable suggestions. The
healthy discussions during the monthly presentations and report reviews helped me
formulate new ideas and, ultimately, write this thesis.

Last but not least, I am truly grateful to my family and friends, who have been
patient with me and motivated me every day to pursue my interests. They have been
very understanding and have supported me at every step of this journey.

ABSTRACT

KEYWORDS: Active Heave Compensation; Winch Control; Classical Control;


Data Driven Control; Learning Based Control

Heave compensation is an essential part of various offshore operations. Its applications
include the transfer of cargo between two vessels in the
open ocean, installation of the topsides of offshore structures, on-loading and off-loading
systems, offshore drilling, landing helicopters on oscillating structures, deploying and
retrieving manned submersibles, stabilizing ROVs, and surveillance, reconnaissance
and monitoring. These applications typically involve a load suspended from a hydrauli-
cally powered winch that is connected to a vessel that is undergoing dynamic motion in
the ocean environment. The goal in these applications is to design a winch controller
to keep the load at a regulated height by rejecting the net heave motion of the winch
arising from ship motions at sea.

In this project, we analyze and compare the performance of various control techniques
ranging from classical control to machine learning based control in stabilizing a sus-
pended load while the vessel is subjected to changing sea conditions. The KRISO con-
tainer ship is chosen as the vessel undergoing dynamic motion in the ocean. The oppo-
site of the net heave motion at the winch is provided as a reference signal to track. The
control strategies implemented are Proportional-Derivative (PD) Control, Model Pre-
dictive Control (MPC), Linear Quadratic Integral Control (LQI), Sliding Mode Control
(SMC), Data Driven Control, and Reinforcement Learning Based Control. The per-
formance of all these controllers except data-driven control is compared with respect
to heave compensation, offset tracking, disturbance rejection, and measurement noise
attenuation.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS i

ABSTRACT ii

LIST OF TABLES v

LIST OF FIGURES vii

ABBREVIATIONS viii

NOTATION ix

1 INTRODUCTION 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objective of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . 3

2 SHIP MOTION MODELING 5

3 STATE SPACE MODEL OF THE WINCH DRIVE 11


3.1 Dynamic Model of the Winch Drive . . . . . . . . . . . . . . . . . 11
3.2 Data Values for the Winch Drive . . . . . . . . . . . . . . . . . . . 14

4 CLASSICAL CONTROL 15
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Proportional-Derivative Control . . . . . . . . . . . . . . . . . . . 16
4.3 Linear Quadratic Integral Control . . . . . . . . . . . . . . . . . . 19
4.4 Model Predictive Control . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Sliding Mode Control . . . . . . . . . . . . . . . . . . . . . . . . . 22

5 DATA DRIVEN CONTROL 25


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.2 Long short-term memory (LSTM) . . . . . . . . . . . . . . . . . . 25
5.3 LSTM for dynamic model identification . . . . . . . . . . . . . . . 26
5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6 REINFORCEMENT LEARNING BASED CONTROL 30


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.2 Deep Deterministic Policy Gradient Algorithm . . . . . . . . . . . 30
6.3 Training and Hyperparameter Tuning . . . . . . . . . . . . . . . . . 32

7 SIMULATION RESULTS 36
7.1 Case 1: With no disturbance and no measurement noise . . . . . . . 37
7.2 Case 2: With an offset . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.3 Case 3: With disturbance but no measurement noise . . . . . . . . . 42
7.3.1 Ability of classical controllers in rejecting disturbance . . . 43
7.3.2 Ability of RL controller in rejecting disturbance . . . . . . . 43
7.4 Case 4: With measurement noise but no disturbance . . . . . . . . . 44
7.4.1 With low measurement noise . . . . . . . . . . . . . . . . . 45
7.4.2 With high measurement noise . . . . . . . . . . . . . . . . 46

8 CONCLUSION 49
LIST OF TABLES

2.1 KCS Particulars . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.1 Data Values for Winch Drive . . . . . . . . . . . . . . . . . . . . . 14

4.1 Tuned Values for MPC . . . . . . . . . . . . . . . . . . . . . . . . 22

5.1 Details of Assumed Architecture . . . . . . . . . . . . . . . . . . . 27

7.1 Slight Sea State: Hs = 1.5 m, Tp = 6 sec . . . . . . . . . . . . . . . 37


7.2 Moderate Sea State: Hs = 4 m, Tp = 9 sec . . . . . . . . . . . . . . 38
7.3 Rough Sea State: Hs = 6 m, Tp = 12 sec . . . . . . . . . . . . . . . 38
7.4 Very Rough Sea State: Hs = 8.5 m, Tp = 14 sec . . . . . . . . . . . 38

LIST OF FIGURES

1.1 Passive Heave Compensation System (Calnan, 2016) . . . . . . . . 2


1.2 Active Heave Compensation System (Calnan, 2016) . . . . . . . . . 2

2.1 PM Spectrum for 4 sea states . . . . . . . . . . . . . . . . . . . . . 6


2.2 RAO’s obtained using MDLHydroD . . . . . . . . . . . . . . . . . 7
2.3 Ship with Installed Crane (O is the intersection point of the waterline,
centerline and midship) . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Net heave time history at the winch generated from a PM spectra . . 10

3.1 Schematic diagram of hydraulic driven winch . . . . . . . . . . . . 11

4.1 Schematic diagram of closed loop winch control system . . . . . . . 15


4.2 Schematic of proportional-derivative control for a plant . . . . . . . 16
4.3 Loop Shaping for tuning PD controller . . . . . . . . . . . . . . . . 18
4.4 Schematic of linear quadratic integral control for a plant . . . . . . 19
4.5 Schematic of model predictive control for a plant . . . . . . . . . . 21
4.6 Schematic of conventional sliding mode control for a plant . . . . . 23

5.1 Schematic architecture of the LSTM method . . . . . . . . . . . . . 26


5.2 Accuracy Vs Learning Rate . . . . . . . . . . . . . . . . . . . . . . 27
5.3 MSE Loss Vs Epochs for the training data set . . . . . . . . . . . . 28
5.4 Reeled length comparison between LSTM and PD controller . . . . 29

6.1 DDPG Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 34


6.2 Learning curve of the RL agent . . . . . . . . . . . . . . . . . . . . 35

7.1 Uncompensated motion time history at the winch for four sea states 36
7.2 Compensated motion time histories in different sea conditions . . . 41
7.3 Normalized swash angle for Hs = 8.5 m, Tp = 14 sec . . . . . . . . 41
7.4 Ability of the controllers to track offset for Hs = 4 m, Tp = 9 sec . . 42
7.5 Spectrum of compensated motions (a) without disturbance and (b) with
disturbance in moderate sea state (Classical Control) . . . . . . . . 43

7.6 Spectrum of (a) disturbance and PM spectra and (b) compensated motions in moderate sea state (RL Based Control) . . . . . . . . . . 44
7.7 Spectrum of compensated motions for low noise (SNR = 28.6 dB)
(Classical Control) . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.8 Spectrum of compensated motions for low noise (SNR = 34.53 dB)
(RL Based Control) . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.9 Spectrum of compensated motions for high noise (SNR = 1.81 dB )
(Classical Control) . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.10 Spectrum of compensated motions for high noise (SNR = 4.56 dB) (RL
Based Control) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

ABBREVIATIONS

AHC Active Heave Compensation


PHC Passive Heave Compensation
PM Pierson Moskowitz
FFT Fast Fourier Transform
IFFT Inverse Fast Fourier Transform
KCS KRISO Container Ship
RAO Response Amplitude Operator
PD Proportional-Derivative
MPC Model Predictive Control
LQI Linear Quadratic Integral
SMC Sliding Mode Control
LSTM Long Short Term Memory
RNN Recurrent Neural Network
FC Fully Connected
MSE Mean Square Error
RMSE Root Mean Square Error
SGD Stochastic Gradient Descent
RL Reinforcement Learning
DQN Deep Q Network
DDPG Deep Deterministic Policy Gradient
TD Temporal Difference
RMS Root Mean Square
SNR Signal to Noise Ratio

NOTATION

Lpp Length between Perpendiculars


LWL Length Waterline
B Breadth of the Ship
D Depth of the Ship
T Draft of the Ship
CB Block Coefficient
Fn Froude Number
kxx Roll Radius of Gyration about CG
kyy Pitch Radius of Gyration about CG
kzz Yaw Radius of Gyration about CG
Hs Significant Wave Height
Tp Peak Time Period
ωnyq Nyquist Frequency
ξ3 Heave RAO
ξ4 Roll RAO
ξ5 Pitch RAO
β Wave Incident Angle
βs Slew Angle
lcrane Horizontal Extent of the Crane
d Disturbance
g Acceleration due to Gravity
Koil Bulk Modulus of Hydraulic Fluid
Vc Volume of Hydraulic Lines
Dp Maximum Pump Displacement
Dm Fixed Displacement of Motor
ωp Rotation Rate of Pump
kleak Total Leakage Constant
ηm Efficiency of Motor
Tw Time Constant
rw Radius of Winch
Jw Inertia of Winch
b Viscous Friction of the Winch
m Mass of the Payload
up Control Input
xp Swash Angle of Pump
∆p Change in Pressure
żw Velocity of the Reeled Rope
zw Position of the Reeled Rope
n Measurement Noise
P Plant Transfer Function
C Controller Transfer Function

L Open Loop Transfer Function
KP Proportional Gain
KD Derivative Gain
Tf Time Constant for Noise Filter
K Optimal Feedback Gain Matrix
Np Prediction Horizon
Nc Control Horizon
S Sliding Surface
kp Controlled Gain
π∗ Optimal Policy
τ Network Transition Gain
γ Discount Factor
D Replay Buffer
N Ornstein–Uhlenbeck action noise
Rt Discounted Reward
Q Q Value
µ Deterministic Policy Function

CHAPTER 1

INTRODUCTION

1.1 Background and Motivation

With ever increasing marine exploration and the gradual depletion of land resources,
mankind has started to look to the oceans, which are rich in minerals and other
natural resources. To extract these resources, various processes like
deep sea mining and offshore drilling are used (Ni et al., 2009; Korde, 1998; Woodacre,
2015; Woodacre et al., 2015). Deep sea mining is the process of retrieving mineral
deposits from the ocean floor. National Institute of Ocean Technology (NIOT) in India
(Atmanand and Ramadass, 2017) is currently planning to extract such minerals from
the Central Indian Ocean Basin (CIOB) located at a depth of 6000 meters. NIOT is
currently building a crawler based mining machine that can collect, crush, and pump the
mineral nodules to the mother ship (Atmanand and Ramadass, 2017). This operation is
generally executed by suspending a mine excavator from a surface vehicle. For effective
mining, the heave motion of the vessel needs to be compensated.

Various external disturbances like waves, winds, and currents can cause the vessel
to experience strong dynamic motion in both seakeeping (vertical plane motion) and
maneuvering modes (horizontal plane motions). The maneuvering motions are slower
(of the order of a few minutes) and the seakeeping modes are faster (of the order of a few
seconds). The maneuvering modes of motion can generally be controlled effectively
using dynamic positioning systems (Fossen (2011); Fossen and Strand (2001); Sørensen
(2005); Grimble et al. (1980)). However, the seakeeping motions of the vessel need
effective compensation to achieve regulation of the payload position.

The handling of equipment in the ocean can be an arduous task, especially in higher
sea states. Heave compensation methods are generally adopted to attenuate the
transmission of the vessel's heave motion to the load. They can be broadly categorized as either passive heave
compensation (PHC) or active heave compensation (AHC) depending on the method
of compensation employed. A PHC device functions as a vibration absorber
tuned to absorb energy over a certain range of frequencies. PHC systems generally do
not require a power source for their operation; instead, they employ a vibration damping
element along the tow line which attenuates the towed body motion. However, the
compensation performance of PHC systems is typically limited to around 80% (Hatleskog
and Dunnigan, 2007). Fig. 1.1 shows a schematic of a PHC system attempting to maintain
the depth of its submerged load through states 1 to 3.

Figure 1.1: Passive Heave Compensation System (Calnan, 2016)

On the other hand, an AHC system is an active feedback system that provides the
payload displacement as continuous feedback to the controller at every time instant so
that better compensation is achieved (Woodacre et al., 2015). This type of feedback
based compensation requires a significant power source for its functioning. However,
one advantage of an active system is that the feedback variable is not
limited to the vessel's heave motion but can also be based on more complex variables like
the separation between two ships for payload transfer. Fig. 1.2 shows a schematic of an
AHC system attempting to maintain the depth of its submerged load through states 1 to 3.

Figure 1.2: Active Heave Compensation System (Calnan, 2016)

AHC systems are composed of controllers that use the feedback signal to provide
an actuation signal through actuators. Actuators are mechanical devices driven by elec-
tric motors and usually suffer from saturation. This means that there is a limit on the
maximum actuation that can be achieved from an actuator. A lot of research is cur-
rently underway on handling a payload when actuators are subjected to saturation and
disturbances (Galuppini et al., 2018).

1.2 Objective of the Thesis

The thesis aims to do a comparative study of different control strategies ranging from
classical control to machine learning based control for active heave compensation. The
effectiveness of each of the controllers will be determined with respect to heave com-
pensation performance, offset tracking, disturbance rejection and measurement noise
attenuation. The ability of the controllers in handling the saturation of the actuators
will also be analysed.

1.3 Organisation of the Thesis

Each of the subsequent chapters contains a brief introduction which also serves as a
literature review for the particular topic covered in that chapter.

Chapter 2 presents the modeling of ship motion which is used to generate the net
heave time history of the KCS vessel in different sea conditions. The opposite of this net
heave time history is given as a reference signal to the controller that drives an actuator.
The actuator then controls the unwound rope of the winch to achieve a stabilized load.

Chapter 3 explains the dynamic model of the winch drive and also provides the
parameter values assumed in this study.

Chapter 4 gives an overview of each of the classical controllers implemented and


also provides the approaches by which each of the controller gains were tuned.

Chapter 5 introduces a data driven control methodology for AHC utilizing LSTM
networks. The ability of this type of controller in regulating the payload is analysed.

Chapter 6 discusses the RL based control methodology using deep deterministic
policy gradient algorithm (DDPG). The efficacy of using this technique is assessed in a
simulated environment.

Chapter 7 discusses the results. The efficacy of all controllers except data driven
control is analysed with respect to

1. Heave compensation performance

2. Ability to handle saturation of the actuators

3. Tracking an offset

4. Rejecting disturbance

5. Attenuating measurement noise

Chapter 8 summarizes the conclusions of this study.

CHAPTER 2

SHIP MOTION MODELING

In this project, the KCS vessel is chosen as the vessel undergoing dynamic motion in the
ocean. Table 2.1 shows the particulars of the KCS.

Table 2.1: KCS Particulars

Particulars Value
Length between perpendiculars Lpp 230 m
Length waterline LW L 232.5 m
Breadth B 32 m
Depth D 19 m
Draft T 10.8 m
Displacement 52030 m3
Block Coefficient CB 0.65
LCB from midship (fwd +) -3.404 m
LCB from AP (fwd +) 111.596 m
VCG from WL 3.551 m
VCG from keel 14.351 m
GM 0.6 m
Design Forward Speed U 24 knots
Analysis Speed (In this study) 0 knots
Froude Number Fn 0.26
Roll Radius of gyration about CG kxx 12.88 m
Pitch/Yaw radius of gyration about CG kyy /kzz 57.5 m

The Pierson Moskowitz spectrum corresponding to a significant wave height Hs and


peak period Tp for a range of frequencies is defined by

S(ω) = (0.3125/(2π)) Tp Hs² (ωTp/(2π))⁻⁵ exp(−(5/4)(ωTp/(2π))⁻⁴).   (2.1)

For a simulation duration of T seconds, the frequency increment is given by ∆ω =


2π/T . At each discrete frequency ωn = n∆ω, the amplitude of the nth wave component
is given by
An = √(2 S(ωn) ∆ω).   (2.2)
The initial phase φn of each wave component is sampled from a uniform distribution
between −π and π. This method of computing the wave elevation time history from a
spectrum is known as the random phase method (Somayajula, 2017). After calculating An,
ωn is revised by randomizing it within a ∆ω interval to prevent the signal from repeating
itself. The random frequencies are chosen as shown below

(ωn)new = (ωn)old + ∆ω X,   (2.3)

where X is a random variable following a uniform distribution between −0.5 and 0.5.
The resultant irregular wave elevation time history is given by

η(t) = Σ_{n=1}^{N} An cos(ωn t + φn),   (2.4)

where N is the number of wave components. In this project, N is taken as 100001 to


simulate a 10000 s time history with a sampling time of 0.1 s.
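The random phase construction of Eqs. (2.1)–(2.4) can be sketched in a few lines. This is an illustrative implementation, not the thesis code: the record duration, component count and seed are arbitrary choices made here to keep the example light.

```python
import numpy as np

def pm_spectrum(omega, Hs, Tp):
    """Pierson-Moskowitz spectrum of Eq. (2.1); omega in rad/s."""
    x = omega * Tp / (2.0 * np.pi)
    return 0.3125 / (2.0 * np.pi) * Tp * Hs**2 * x**-5 * np.exp(-1.25 * x**-4)

def wave_elevation(Hs, Tp, duration=600.0, dt=0.1, n_comp=600, seed=0):
    """Random phase method, Eqs. (2.2)-(2.4)."""
    rng = np.random.default_rng(seed)
    d_omega = 2.0 * np.pi / duration
    omega = np.arange(1, n_comp + 1) * d_omega
    amp = np.sqrt(2.0 * pm_spectrum(omega, Hs, Tp) * d_omega)   # Eq. (2.2)
    omega = omega + d_omega * rng.uniform(-0.5, 0.5, n_comp)    # Eq. (2.3): jitter
    phase = rng.uniform(-np.pi, np.pi, n_comp)                  # random phases
    t = np.arange(0.0, duration, dt)
    # Eq. (2.4): superpose all components at once (n_comp x n_time array)
    eta = (amp[:, None] * np.cos(np.outer(omega, t) + phase[:, None])).sum(axis=0)
    return t, eta

t, eta = wave_elevation(Hs=1.5, Tp=6.0)
```

A quick sanity check: since the variance of the elevation equals the area under the spectrum, the standard deviation of the record should be close to Hs/4 ≈ 0.375 m for the slight sea state.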

The random wave elevation time history as per Eq. (2.4) was calculated for the
following four sea states:

1. Slight sea state: Hs = 1.5 m, Tp = 6 s

2. Moderate sea state: Hs = 4 m, Tp = 9 s

3. Rough sea state: Hs = 6 m, Tp = 12 s

4. Very Rough sea state: Hs = 8.5 m, Tp = 14 s

Figure 2.1: PM Spectrum for 4 sea states
The plot of the PM spectrum for these four sea states is shown in Fig. 2.1. The
response amplitude operator (RAO) of the KCS container ship is obtained for the heave,
roll and pitch modes using MDLHydroD (Guha, 2016; Somayajula et al., 2014; Guha
et al., 2016), a frequency-domain 3D panel-method tool for the analysis of
wave-structure interaction. Although the frequency domain program is
capable of including forward speed effects, forward speed was not considered in this
study for AHC. The magnitude of heave, roll and pitch RAOs of the KCS container
ship for a wave incident angle of 135 degrees are shown in Fig. 2.2.

Figure 2.2: RAOs obtained using MDLHydroD ((a) heave, (b) roll, (c) pitch)

It can be seen that the RAO of roll exhibits a sharp peak, which corresponds to a
low potential damping at the roll natural frequency of the vessel. In such cases, viscous
damping plays a significant role. This is usually estimated from physical model tests
(Somayajula and Falzarano, 2016, 2017a,b) or from empirical relationships (Falzarano
et al. (2015)). However, in this project, the viscous roll damping has been deliberately
excluded in order to enhance the motion of the vessel and thus observe and compare
the effectiveness of the various controllers in keeping the load suspended from an on-board
winch stabilized.

After the RAO is obtained, the response spectrum of the roll, pitch and heave is
calculated as shown below

Sresponse (ω) = |H(ω)|² S(ω), (2.5)

where Sresponse is the response spectrum, H(ω) is the response RAO and S(ω) is the
wave spectrum. Now, the input wave elevation time history was decomposed into its
frequency components by taking a fast Fourier transform (FFT). The amplitude of the
response at a frequency ωk = (k − 1)∆ω is then obtained by taking the product of
response RAO at that frequency and the FFT of input wave elevation time history at the
same frequency. Although a time history may contain several frequencies, we can only
distinguish frequencies between 0 and the Nyquist frequency. The Nyquist frequency
is half the sampling rate of the discrete signal and is given by ωnyq = π/∆t where
∆t = 0.1 sec. When a signal is sampled such that the frequency content in the signal is
beyond the Nyquist frequency, the period of the sampled signal is observed to be larger
than the period of the original signal. Therefore, when computing the time history of
heave, roll and pitch motions, we only consider the frequencies less than the Nyquist
frequency. The values at higher frequencies are taken to be mirrored conjugates of
the values before the Nyquist frequency and the mirroring happens about the Nyquist
frequency. Finally inverse fast Fourier transform (IFFT) is used to convert these three
degrees of freedom back into time domain.
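The FFT–RAO–IFFT pipeline described above can be sketched as follows. The RAO here is a made-up low-pass curve standing in for the MDLHydroD output, and the input is a toy two-component record; the point is that applying a real function of |ω| to the FFT preserves the mirrored-conjugate structure about the Nyquist frequency, so the IFFT returns a real time history.

```python
import numpy as np

dt = 0.1
t = np.arange(0.0, 200.0, dt)
eta = np.cos(0.7 * t) + 0.3 * np.cos(1.1 * t + 0.4)   # toy wave elevation

omega = 2 * np.pi * np.fft.fftfreq(len(t), d=dt)      # signed frequencies, rad/s
eta_hat = np.fft.fft(eta)

def heave_rao(w):
    """Hypothetical low-pass RAO; a real (zero-phase) curve is assumed for brevity."""
    return 1.0 / (1.0 + np.abs(w) ** 4)

# Using |omega| gives H(-w) = conj(H(w)) for this real RAO, i.e. the
# mirrored-conjugate structure about the Nyquist frequency noted in the text.
response = np.fft.ifft(eta_hat * heave_rao(omega))
heave = response.real
```

Because the conjugate symmetry is preserved, the imaginary part of the IFFT is at round-off level and the recovered motion is attenuated relative to the input, as the low-pass RAO dictates.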

In this project, the origin of the ship fixed coordinate system was assumed to be at
the intersection of the waterline, centerline, and midship. Assuming that the crane is
placed at (xcrane , ycrane ) in the vessel’s body fixed coordinate system with a slewing gear
angle βs and the wave incident angle of β, the net heave response time history of the
winch placed on board the KCS container ship is calculated in terms of the combined
roll, heave, and pitch motion caused due to the wave excitation. Fig. 2.3 shows a
schematic diagram of the ship with a crane installed on it.

Figure 2.3: Ship with Installed Crane (O is the intersection point of the waterline, centerline and midship)

Assuming small amplitude motions consistent with linear hydrodynamic theory, the
net heave motion time history is given by

zwinch = η3 (β) + (ycrane + lcrane sin(βs ))η4 (β) − (xcrane + lcrane cos(βs ))η5 (β), (2.6)

where η3 (β), η4 (β), and η5 (β) are the heave, roll and pitch time histories respectively,
which depend on the incident wave angle β.

In this project, the coordinate of the crane with respect to vessel’s body frame was
assumed to be at (−1.5 m, 2 m) with a slewing gear angle of 30 degrees, and the
horizontal extent of the crane (lcrane) is assumed to be 3 m. A 100 second snippet of
the net heave motion time history in the 4 sea states, with waves incident at an angle
of 135 degrees, is shown in Fig. 2.4.
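With the crane position, slew angle and extent assumed above, Eq. (2.6) reduces to a fixed linear combination of the three motion channels; a minimal helper with these constants hard-coded might look like:

```python
import numpy as np

# Crane geometry assumed in the text (body-fixed frame).
X_CRANE, Y_CRANE = -1.5, 2.0        # m
BETA_S = np.deg2rad(30.0)           # slewing gear angle
L_CRANE = 3.0                       # m, horizontal extent of the crane

def net_heave(eta3, eta4, eta5):
    """Net heave at the winch, Eq. (2.6); roll/pitch angles in radians."""
    lever_roll = Y_CRANE + L_CRANE * np.sin(BETA_S)    # roll lever arm
    lever_pitch = X_CRANE + L_CRANE * np.cos(BETA_S)   # pitch lever arm
    return eta3 + lever_roll * eta4 - lever_pitch * eta5
```

With these numbers the roll lever arm is ycrane + lcrane sin βs = 3.5 m, so a roll of 0.01 rad alone shifts the winch vertically by 3.5 cm.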

Figure 2.4: Net heave time history at the winch generated from a PM spectrum: (a) Hs = 1.5 m, Tp = 6 s; (b) Hs = 4 m, Tp = 9 s; (c) Hs = 6 m, Tp = 12 s; (d) Hs = 8.5 m, Tp = 14 s

CHAPTER 3

STATE SPACE MODEL OF THE WINCH DRIVE

3.1 Dynamic Model of the Winch Drive

The state space model for hydraulic drive of the winch in this section is taken from
Richter et al. (2017) for the implementation of different control approaches. A hydraulic
winch is the most commonly used type of winch for active heave compensation of
cranes, due to its high power-to-weight ratio compared to other actuators such as
pneumatic motors, AC motors, DC motors, and SMAs. A schematic representation of the
hydraulic drive is illustrated in Fig. 3.1.

Figure 3.1: Schematic diagram of hydraulic driven winch

An electric motor drives the hydraulic pump with a constant angular speed ωp that
in turn drives the hydraulic motor with a variable angular speed ωm depending on the
normalized swash angle xp of the pump. The motor here is assumed to be a fixed
displacement motor with a displacement of Dm whereas the pump is assumed to be
a variable displacement pump with a maximum displacement Dp . The displacement
of the pump is controlled by a normalized swash angle (−1 < xp < 1). Thus the
variation of normalized swash angle will govern the flow generated by the pump. A
linear relationship is assumed between the normalized swash angle and the flow rate of
the pump. The input to the plant is a control signal that translates to the normalized
swash angle through a first order system. A saturation of normalized swash angle is
assumed beyond xp = ±1. The connection between the pump and the motor is through
hydraulic lines with volume Vc . The fixed displacement motor drives the winch to which
the load is attached. Between the motor and the winch, a mechanical gear with a gear
ratio k is placed. Proper oil supply is ensured with the help of the feed pressure pfeed. The
pressure relief valves ensure that the pressure does not exceed safety limits. Since these
limits are not usually reached in practice, the oil supply does not significantly affect the
governing dynamics of the winch, and hence the oil leakage kleak is not modeled in
this project.

The compressibility of the hydraulic fluid is governed by the bulk modulus Koil. The
pressures on either side of the closed circuit are governed by

ṗ1w = (Koil/Vc)(Dp ωp xp − Dm ωm − kleak (p1w − p1f ))
ṗ1f = (Koil/Vc)(Dm ωm − Dp ωp xp − kleak (p1f − p1w )),   (3.1)

where kleak = k1,p + k1,m and the pressures in the top and bottom hydraulic lines are
denoted by p1w and p1f respectively. Defining the change in pressure ∆p = p1f − p1w,
we have

∆ṗ = (2Koil/Vc)(Dm ωm − Dp ωp xp − kleak ∆p).   (3.2)
A positive constant swash angle xp would yield a negative pressure ∆p and positive
flow rate q, and vice versa. The dynamics of the winch are governed by

(Jw + m rw²) ω̇w = −Dm k ηm ∆p − b ωw + m g rw,   (3.3)

where ηm denotes the mechanical efficiency of the motor, b denotes the viscous friction
of the winch and ωw denotes the rotational velocity of the winch with a radius rw and
inertia Jw . The relative position of the load with respect to the winch is denoted by zw
and the relative velocity is given by żw = rw ωw . The torque due to the dynamics of
the lifting rope is neglected and the total torque is approximated by a constant payload
torque mgrw , where m is the mass of the payload and g is the acceleration due to
gravity. The control input up and the normalized swash angle xp are related through a
first order system as shown below

ẋp = −(1/Tw)(xp − up),   (3.4)

where Tw is the time constant of the normalized swash plate. The state space model of
the winch is given by

ẋ = A x + B up + [0 0 d 0]ᵀ,  y = zw = C x,  x(0) = x0,   (3.5)

with the state


x = [xp ∆p żw zw ]T , (3.6)

the system matrix


 1 
− 0 0 0
 Tw 
 
 2Koil Dp ωp 2Koil kleak 2Koil Dm k
 
− −

0
 Vc Vc rw Vc 
A=

,
 (3.7)
 rw Dm kηm b 
 0 − 2
− 0

 Jw + mrw Jw + mrw2 

 
0 0 1 0

the input matrix


B = [ 1/Tw  0  0  0 ]ᵀ,   (3.8)

the output matrix

C = [ 0  0  0  1 ],   (3.9)

and the disturbance given by

d = rw² m g/(Jw + m rw²) + d̃,   (3.10)

where d̃ is the disturbance caused by unmodelled dynamics, nonlinear friction, vibrations
and parameter uncertainty. Throughout this thesis, it is assumed that the system is in
strict feedback form without an observer (i.e., all states are measured and available
for feedback) and that the cable does not lose tension throughout the operation.
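The matrices of Eqs. (3.5)–(3.9) can be assembled directly from the parameter values of Table 3.1. One caveat: the table lists the pump rotation rate as 45 Hz, and converting it to rad/s here is my assumption; the constant gravity torque enters through the disturbance d of Eq. (3.10), not through A.

```python
import numpy as np

# Parameter values from Table 3.1 (SI units).
K_oil, V_c = 1.8e9, 2e-3            # bulk modulus, line volume
D_p, D_m = 40e-6, 4e-6              # pump / motor displacement
omega_p = 2 * np.pi * 45.0          # pump speed; Hz -> rad/s is my assumption
k_leak, eta_m, k_gear = 0.0, 0.65, 200.0
T_w, r_w, J_w, b, m_pl = 1.0, 0.5, 150.0, 1e4, 1000.0

J_eff = J_w + m_pl * r_w**2         # effective inertia (Jw + m rw^2)
A = np.array([
    [-1 / T_w, 0.0, 0.0, 0.0],
    [-2 * K_oil * D_p * omega_p / V_c, -2 * K_oil * k_leak / V_c,
     2 * K_oil * D_m * k_gear / (r_w * V_c), 0.0],
    [0.0, -r_w * D_m * k_gear * eta_m / J_eff, -b / J_eff, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
B = np.array([[1 / T_w], [0.0], [0.0], [0.0]])
C = np.array([[0.0, 0.0, 0.0, 1.0]])
```

A useful check on the assembly is that A has one eigenvalue at the origin (the position integrator, since zw feeds back into nothing) while the remaining eigenvalues have negative real parts, so the open-loop winch is marginally stable.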

3.2 Data Values for the Winch Drive

The parameters for the hydraulic drive of the winch are shown in Table 3.1.

Table 3.1: Data Values for Winch Drive

Parameter Symbol Value


Acceleration due to gravity g 9.8 m/s2
Bulk modulus of hydraulic fluid Koil 1.8 × 109 N/m2
Volume of hydraulic lines Vc 2 × 10−3 m3
Maximum pump displacement Dp 40 × 10−6 m3
Fixed displacement of motor Dm 4 × 10−6 m3
Rotation rate of pump ωp 45 Hz
Leakage constant of pump k1,p 0
Leakage constant of motor k1,m 0
Efficiency of motor ηm 0.65
Gear transmission ratio k 200
Time constant Tw 1s
Radius of winch rw 0.5 m
Inertia of the winch Jw 150 kgm2
Viscous friction of the winch b 1 × 104 kgm2 /s
Mass of the payload m 1000 kg

CHAPTER 4

CLASSICAL CONTROL

4.1 Introduction

Over the years, various studies have examined different types of classical controllers
for heave compensation (Li et al., 2019; Woodacre et al., 2018; Do and Pan,
2008). This chapter implements some of the classical control approaches available
in the literature and analyzes which controllers best handle the various
aspects of active heave compensation. The primary objective of the controller is
to make the rope-suspended payload track a desired reference trajectory in an earth-fixed
frame without being influenced by the heave motion of the vessel.

Figure 4.1: Schematic diagram of closed loop winch control system

In heave compensation, as the surface vessel heaves up, the winch must let out a length
of line equal to the heave displacement to effectively cancel the motion. Similarly,
as the surface vessel lowers, the winch must reel in line equal to the displacement.
The block diagram of the winch and the controller is shown in Fig. 4.1. The reference
signal is taken as the opposite of the net heave motion time history, so that the controller
generates a control signal up such that the output of the plant y (i.e., zw) tracks
the opposite of the net heave time history r (i.e., −zwinch). This ensures that the
absolute motion of the payload, which is the sum of the vessel's net heave response
and the unwound rope length, remains close to zero. The winch motion y is measured and
fed back to the controller by a feedback sensor such as a winch encoder, which
measures the actual length of rope that has been reeled in or out. The measured signal
ym is then obtained by adding measurement noise to the winch motion y. It is assumed
that the winch system is in strict feedback form without an observer (i.e., all states are
measured and available for feedback) and that the cable does not lose tension throughout
the operation.

4.2 Proportional-Derivative Control

PD Controller is one of the most common controllers used in several applications. Its
advantages include fast rise time, proper reference tracking, and quick reaction to distur-
bances. In the marine industry, this controller is not only used in adjusting the heading
angle of ships (Fang et al., 2012) and USVs (Sonnenburg and Woolsey, 2013) but also
has various uses in heave compensation (Li et al., 2018). In this thesis, a PD controller
was preferred over the PID controller because the plant already has an integrator which
enables tracking without a steady-state error. Fig. 4.2 shows the schematic diagram of
a PD controller with a noise filter.

Figure 4.2: Schematic of proportional-derivative control for a plant

Given a reference (r), disturbance (d), and measurement noise (n), the objective is
to analyze the closed loop performance of the system. Based on Fig. 4.1, the Laplace
transform of the governing differential equation of y and e can be written as

Y (s) = (1 + P C)^{-1} P C R(s) + (1 + P C)^{-1} D(s) − (1 + P C)^{-1} P C N (s) (4.1)

E(s) = (1 + P C)^{-1} R(s) − (1 + P C)^{-1} D(s) + (1 + P C)^{-1} P C N (s), (4.2)

where Y (s), R(s), D(s), N (s), and E(s) are the Laplace transforms of y(t), r(t),
d(t), n(t), and e(t) respectively. P (s) and C(s) represent the plant and controller
transfer functions in the Laplace domain. In practical applications, the disturbance
is usually composed of low frequency components while the noise is composed of
high frequency components. Let S be the sensitivity function, the transfer function
from the disturbance to the output, and let T be the complementary sensitivity function,
the transfer function from the noise to the output. These two sensitivities are
complementary in that they add up to unity. Thus the governing equations can be
expressed as

Y (s) = T R(s) + SD(s) − T N (s) (4.3)

E(s) = SR(s) − SD(s) + T N (s), (4.4)

where T and S are given as

S = 1 / (1 + P C) (4.5)

T = P C / (1 + P C). (4.6)
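The complementarity of S and T can be checked numerically. The sketch below assumes, purely for illustration, an integrator plant P(s) = 1/s with a proportional controller C(s) = Kp (neither is the thesis' winch model), evaluates both sensitivities along the imaginary axis, and verifies that they sum to one while S is small at low frequency and T is small at high frequency:

```python
# Sketch: verify S + T = 1 for an assumed open loop L(s) = P(s)C(s).
# P(s) = 1/s and C(s) = Kp are illustrative stand-ins, not the winch model.

def sensitivities(s, Kp=5.0):
    """Return (S, T) evaluated at the complex frequency s."""
    L = (1.0 / s) * Kp          # open-loop transfer function P*C
    S = 1.0 / (1.0 + L)         # sensitivity: disturbance -> output
    T = L / (1.0 + L)           # complementary sensitivity: noise -> output
    return S, T

for w in (0.1, 1.0, 10.0):      # frequencies in rad/s
    S, T = sensitivities(1j * w)
    assert abs(S + T - 1.0) < 1e-12   # complementarity holds at every frequency
    print(f"w={w:5.1f}  |S|={abs(S):.3f}  |T|={abs(T):.3f}")
```

The printout shows the trade-off discussed above: |S| grows and |T| shrinks as frequency increases, so disturbance rejection and noise attenuation cannot both be perfect at the same frequency.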

The goal of this closed loop feedback system is to provide stability, disturbance rejec-
tion, and noise attenuation. At low frequencies, the sensitivity function should be small
as we want a good disturbance rejection and a proper reference tracking. In the high
frequency region, the complementary sensitivity function should be small as noise (pre-
dominantly composed of high frequency components) has to be attenuated. The point at
which the open loop transfer function (L = P C) crosses the 0 dB line is the crossover
frequency where we consider a trade-off between proper reference tracking, disturbance
rejection, and noise attenuation. The looptune() function from MATLAB was used
to tune the PD controller. To design the feedback system, the open-loop transfer function
(L) should resemble an integrator with a constant slope, positioned by the assumed
crossover frequency. A crossover frequency of 8 rad/s was used in looptune() to
tune the controller gains, since it provided considerable compensation performance
along with good disturbance rejection and noise attenuation. Fig. 4.3 shows the stability
margins and the plot of S and T along with the loop transfer function L.

Figure 4.3: Loop Shaping for tuning PD controller

The dotted line in Fig. 4.3 represents the target loop shape whereas the blue line
indicates the actual loop transfer function when the PD controller gains are tuned. Gain
margin indicates how much gain it would take to make a 0 dB gain at the −180 degree
phase frequency whereas phase margin indicates how much phase lag it would take to
make a −180 degree phase at the 0 dB gain frequency. Thus, for the closed loop system
to be stable, the system should be designed such that the crossover frequency is far
away from the −180 degree phase frequency. As per Fig. 4.3, the gain margin and the
phase margin are positive and never reach the 0 dB and 0 degree lines, hence the system
can be considered reasonably robust to process variations.

While tuning the parameters, a low-pass filtered derivative signal was used rather
than a pure derivative, as a pure derivative amplifies measurement noise. The tuned
gains of the PD controller are shown below:

Kp = 5.86, Kd = 5.46, Tf = 0.03, (4.7)

where Kp , Kd are the controller gains and Tf is the time constant for the noise filter.
The control input for this control law in the Laplace domain is given by

U (s) = ( Kp + Kd s / (1 + Tf s) ) E(s), (4.8)

where U (s) and E(s) are the Laplace transform of up (t) and e(t) ≡ −(zw (t)+zwinch (t))
respectively.
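In discrete time, this control law can be sketched as follows. The gains are taken from Eq. (4.7); the sample time and the backward-Euler discretisation of the filtered derivative are illustrative assumptions, not the thesis' implementation:

```python
# Sketch: discrete PD controller with a first-order low-pass filtered
# derivative, mirroring U(s) = (Kp + Kd*s/(1 + Tf*s)) E(s).
# Gains from Eq. (4.7); the sample time dt is an assumed value.

Kp, Kd, Tf = 5.86, 5.46, 0.03
dt = 0.01   # assumed sample time in seconds

class FilteredPD:
    def __init__(self):
        self.d = 0.0        # filtered derivative state
        self.e_prev = 0.0   # previous error sample

    def step(self, e):
        # backward-Euler discretisation of s/(1 + Tf*s)
        a = Tf / (Tf + dt)
        self.d = a * self.d + (1 - a) * (e - self.e_prev) / dt
        self.e_prev = e
        return Kp * e + Kd * self.d

pd = FilteredPD()
u = [pd.step(0.1) for _ in range(5)]   # response to a constant 0.1 m error
```

Under a constant error the derivative "kick" in `u` decays geometrically and the output settles to the proportional term Kp·e, which is exactly the noise-filtering behaviour the Tf term is meant to provide.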

4.3 Linear Quadratic Integral Control

LQI/LQR control (Kwakernaak and Sivan, 1972) is a type of optimal control that
minimizes a cost function penalizing the states and the control input, providing an
optimal feedback gain matrix K that trades off set-point tracking against controller
effort as per the performance requirement. LQI control has the added benefit over
LQR control of proper tracking of a varying set point due to its
integral action. It was preferred over pole placement control because pole placement
requires specifying pole locations to find the gain matrix K, which becomes unintuitive
as the order of the system increases, whereas LQI finds the optimal feedback matrix K
based on the desired closed-loop performance. Fig. 4.4 shows the schematic diagram
of an LQI controller.

Figure 4.4: Schematic of linear quadratic integral control for a plant

The quadratic cost function J for the optimal control problem is given by

J = ∫_0^∞ (x^T Q x + up^T R up ) dt, (4.9)

where Q and R are the tuning matrices chosen according to the performance requirement.
The tuned values of Q and R are shown below:

Q = diag(1, 10^{-7}, 1, 1, 10^{12}), R = [10]. (4.10)

The strict feedback control input up which minimizes the above cost function is given
by

up = −K[xp ∆p żw zw xe ]T = −Kx, (4.11)

where K is a 1 × 5 matrix and xe is the 5th state, defined as the integral of the difference
between the desired reference and the state space output. A high weight of 10^{12} was
assigned to the 5th state relative to the other states to emphasize a low steady state error
and hence a low net heave of the payload. The feedback matrix K is given by

K = R−1 B T P (t), (4.12)

where P (t) is a positive definite solution of the Riccati equation given by

AT P (t) + P (t)A − P (t)BR−1 B T P (t) + Q = −Ṗ (t). (4.13)

To tune the K matrix, the lqi() function from MATLAB was used to find the optimal
gain matrix K for good tracking, leading to the value

K = [814.2 − 1.02 × 10−4 1.26 × 103 6.124 × 104 − 3.16 × 105 ]. (4.14)
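The same computation can be sketched in Python, with scipy's Riccati solver playing the role of MATLAB's lqi(). The two-state double-integrator plant and the weights below are illustrative assumptions standing in for the winch's augmented state-space model:

```python
# Sketch: compute the LQR gain K = R^{-1} B^T P from the algebraic Riccati
# equation. The 2-state plant below is an assumed stand-in, not the winch model.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # double integrator (assumed plant)
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])         # state weights (assumed)
R = np.array([[1.0]])           # input weight (assumed)

# solves A^T P + P A - P B R^{-1} B^T P + Q = 0 (steady-state Riccati equation)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)        # K = R^{-1} B^T P

# closed-loop matrix A - BK must be Hurwitz (all eigenvalues in the LHP)
eigs = np.linalg.eigvals(A - B @ K)
print(K, eigs.real)
```

For this plant the first gain comes out to sqrt(Q11/R) = 1 exactly, and the closed-loop eigenvalues lie strictly in the left half-plane, confirming the stabilising property of the Riccati solution.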

4.4 Model Predictive Control

Model predictive control (Woodacre et al., 2018) is a discrete control algorithm that
determines the optimal controller output required to reach a particular set-point by
minimizing a quadratic cost function. Fig. 4.5 shows the schematic diagram of a model
predictive controller.

Figure 4.5: Schematic of model predictive control for a plant

The cost function J is given by

J = Σ_{i=0}^{Np} (y + zwinch )^T Q (y + zwinch ) + Σ_{i=0}^{Nc} up^T P up + Σ_{i=0}^{Nc} ∆up^T R ∆up (4.15)

subject to umin ≤ up ≤ umax and Nc ≤ Np ,

where Q, P , and R are the weighting parameters (can be matrices in case of multiple
inputs or outputs) for the winch output tracking error, control input, and the rate of
change of control input. Np is the prediction horizon over which the controller allows
the model to evolve, and Nc is the control horizon, the number of time steps over which
the control action is evaluated. In this thesis, a 50 millisecond time step was used.
The choice of Nc and Np depends on the sampling time and the time scale of the
system’s dynamics. For a given Np , a larger control horizon results in more computational
complexity, whereas for a given Nc , a larger prediction horizon results in less
control action. Hence, Np and Nc need to be tuned depending on
the requirement. The MPC toolbox from SIMULINK was used to tune the controller
which uses its internal dynamic model of the process, a cost function J over the receding
(moving) horizon, and an optimization algorithm in order to minimize the cost function
J by altering the control input.

The main difference between model predictive control (MPC) and LQI is that MPC
performs the cost optimization over a receding time window, whereas LQI optimizes
the cost over a fixed time window. Thus, MPC may obtain a suboptimal solution since
it solves the optimization over a smaller time window than LQI. However, MPC has
the benefit that it can handle hard constraints on the states, which is the main demerit
of LQI. In this study, a hard constraint on one of the states xp is assumed. The tuned
parameters for MPC are shown in Table 4.1. A higher weight of 7 was assigned to
the compensated heave at the winch in order to make the controller more aggressive
towards the closed loop performance.

Table 4.1: Tuned Values for MPC

Parameter Np Nc Q P R
Value 20 4 7 0.1 0.01
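The receding-horizon idea can be sketched on a toy plant. The scalar first-order system, input grid, and constraint below are illustrative assumptions chosen to keep the example tiny; a practical MPC, like the SIMULINK toolbox used here, would solve a quadratic program at each step rather than the brute-force search shown:

```python
# Sketch: a crude receding-horizon controller for x_{k+1} = x_k + dt*u_k.
# At each step the first control move is chosen by brute-force search over a
# grid of admissible inputs, each held constant over the prediction horizon Np.
# Plant, weights, and horizons are assumed illustrative values.
import numpy as np

dt, Np = 0.05, 20
Q, R = 7.0, 0.01                  # tracking and input weights (cf. Table 4.1)
u_min, u_max = -1.0, 1.0          # hard input constraint, enforced by the grid

def mpc_step(x, ref):
    best_u, best_cost = 0.0, float("inf")
    for u in np.linspace(u_min, u_max, 201):   # candidate first moves
        xp, cost = x, 0.0
        for _ in range(Np):                    # roll the model forward
            xp = xp + dt * u
            cost += Q * (xp - ref) ** 2 + R * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u                              # apply only the first move

x, ref = 0.0, 0.5
for _ in range(200):                           # receding-horizon loop
    x += dt * mpc_step(x, ref)
print(round(x, 3))
```

The state is driven to the set point while the input stays inside its bounds, illustrating the constraint-handling advantage over LQI described above.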

4.5 Sliding Mode Control

Sliding mode control (Khalil and Grizzle, 2002) is a nonlinear control strategy that
changes the dynamics of a nonlinear system with the help of a discontinuous control
signal that forces the system to slide along a cross-section of its normal behaviour.
Fig. 4.6 shows the schematic diagram of a sliding mode controller.

The most prominent feature of sliding mode control (SMC) is that it is insensitive
to parametric uncertainty and external disturbances during the sliding mode. A sliding
mode controller can stabilize the trajectory of a system. There are two steps in the
SMC design. The first step is designing a sliding surface so that the plant restricted
to the sliding surface has a desired system response. This means the state variables of

Figure 4.6: Schematic of conventional sliding mode control for a plant

the plant dynamics are constrained to satisfy another set of equations that define the
so-called sliding surface. The second step is constructing the switching feedback gains
necessary to drive the plant’s state trajectory to the sliding surface. These constructions
are built on the generalized Lyapunov stability theory (Khalil and Grizzle, 2002). The
sliding mode surface S is expressed as

S = σ = (d/dt + k)^{n−1} e, (4.16)

where n is the number of states of the system, e = zw −r is the error between the output
of the winch and the desired reference, and k is a positive parameter which is chosen
based on the performance we desire. The equivalent control input ueq is obtained by
setting the derivative of S to zero.

Ṡ = 0. (4.17)

The switching control is given by

us = kp sgn(S) + αS, (4.18)

where kp is the switching gain, α is a constant positive gain for the exponential reaching
law, and sgn(·) is the signum function. An exponential reaching law was used so that
the system reaches the sliding surface faster. Therefore, the overall control law
up = ueq + us is obtained by setting

(d/dt) (d/dt + k)^{n−1} e = −kp sgn(S) − αS, (4.19)

to solve for the control input up . In our model n = 4, which yields S to be

S = d^3 e/dt^3 + 3k ë + 3k² ė + k³ e. (4.20)

Now equating

Ṡ = −kp sgn(S) − αS, (4.21)

we have,

up = kp sgn(S) + αS + d⁴r/dt⁴ + k3 (ṙ − żw ) + k2 [ r̈ + (żw b + Dm ηm kr∆p)/(mr² + Jw ) ]
+ (Dm b ηm kr)/(mr² + Jw )² [ (2Dp koil ωp xp )/Vc − (2Dm k koil żw )/(Vc r) ] + (2Dm Dp ηm k koil r ωp xp )/(Tp Vc (mr² + Jw ))
+ [ b²/(mr² + Jw )² − (2Dm² ηm² k² koil )/(Vc (mr² + Jw )) ] [ (żw b)/(mr² + Jw ) + (Dm ηm kr∆p)/(mr² + Jw ) ]
− k1 [ (b(żw b + Dm ηm kr∆p))/(mr² + Jw )² − d³r/dt³ + (Dm ηm kr(2rDp koil ωp xp − 2Dm k koil żw ))/(Vc r(mr² + Jw )) ], (4.22)

where k1 = 3k, k2 = 3k², and k3 = k³. The parameters of the sliding mode control
law should be chosen to enhance the performance of the controlled system. Here, k,
kp , and α were tuned, and k = 27, kp = 70, and α = 1 were found to yield good
control. SMC however has the drawback that not all derivatives of r can be practically
measured. Moreover, numerical differentiation of the reference introduces numerical
errors that can have a detrimental effect on the control performance. Therefore, even
though SMC is mathematically elegant, it can be hard to implement practically.
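The reaching-law mechanics can be sketched on a toy plant. The double integrator below has n = 2 (so S = ė + k·e) instead of the winch model's n = 4, and the gains and time step are illustrative assumptions:

```python
# Sketch: sliding mode control with an exponential reaching law on a toy
# double-integrator plant (n = 2 here, versus n = 4 for the winch model).
# S = de/dt + k*e; choosing u = -k*de/dt - kp*sgn(S) - alpha*S gives
# dS/dt = -kp*sgn(S) - alpha*S, the reaching law of Eq. (4.18).
import math

k, kp, alpha = 2.0, 0.5, 1.0       # assumed gains for the toy problem
dt, r = 0.001, 1.0                 # Euler time step and constant reference

x, xdot = 0.0, 0.0
for _ in range(20000):             # 20 s of simulation
    e, edot = x - r, xdot          # tracking error and its rate
    S = edot + k * e               # sliding surface
    u = -k * xdot - kp * math.copysign(1.0, S) - alpha * S
    xdot += u * dt                 # plant dynamics: xddot = u
    x += xdot * dt
print(round(x, 2))
```

The state first reaches S = 0 in finite time under the reaching law and then slides along ė = −k·e to the reference, with the characteristic small chattering caused by the sgn(·) term.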

CHAPTER 5

DATA DRIVEN CONTROL

5.1 Introduction

Data-driven control systems are a broad family of control systems in which the
identification of the process model and/or the design of the controller are based entirely
on experimental data collected from the plant. In many control applications, writing
a mathematical model of the plant is a hard task. This problem can be overcome by
data-driven methods, which fit an appropriate system model to the experimental data
that is collected. The control engineer can then exploit this model to design an effective
controller for the system.

In this chapter, Long Short Term Memory networks (LSTMs), a type of Recurrent
Neural Network (RNN), are used to predict the control input signal when a particular
net heave time history of the ship is provided as an input to the deep neural
network. This predicted input signal is then provided to the hydraulic-driven winch
model to determine the compensation performance of the payload. This is a feasibility
study to check if ML models are capable of learning the control required to achieve
desired output.

5.2 Long short-term memory (LSTM)

Long short term memory (LSTM) is a type of Recurrent Neural Network (RNN)
used for predicting sequential data (Hochreiter and Schmidhuber, 1997). One of the
major limitations of an RNN is that it cannot capture long term dependencies in the
data, owing to vanishing gradients. LSTM, with its use of a memory cell, solves this
issue. Because of its ability to learn the temporal features of the data, it is widely used
on time-series data. It addresses the vanishing gradient and divergence problems that
occur in conventional RNN structures. It is mainly composed of a cell with a number
of gates attached. By opening and closing the gates, LSTM can truncate gradients in
the network. Eq. (5.1) shows the mathematical formulation of the LSTM.

it = σ(Wxi xt + Whi ht−1 + bii ) (Input gate)

ft = σ(Wxf xt + Whf ht−1 + bif ) (Forget gate)

zt = tanh(Wxc xt + Whc ht−1 + bc ) (Block input) (5.1)

ct = ft ∗ ct−1 + it ∗ zt (Cell state)

ot = σ(Wxo xt + Who ht−1 + bio ) (Output gate)

ht = ot ∗ tanh(ct ) (Block output)

where xt is the input vector at a time step t, W(.) and b(.) represent the weight matrix
and the bias vector for linear transformation, σ(.) refers to the element-wise activation
function (generally sigmoid function) and ∗ denotes the point-wise vector products. The
information in the network is controlled by opening and closing the input, output, and
forget gate. Fig. 5.1 shows the schematic architecture of LSTM.

Figure 5.1: Schematic architecture of the LSTM method
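The gate equations of Eq. (5.1) can be sketched directly in numpy. The input and hidden sizes are arbitrary illustrative choices, and the weights are randomly initialised here purely for demonstration; in practice they are learned from data:

```python
# Sketch: one forward step of an LSTM cell implementing Eq. (5.1).
# Sizes and random weights are illustrative; real weights are trained.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 3, 4                              # input and hidden sizes (assumed)

def mat(r, c): return rng.normal(0, 0.1, (r, c))
Wxi, Whi, bi = mat(n_h, n_in), mat(n_h, n_h), np.zeros(n_h)
Wxf, Whf, bf = mat(n_h, n_in), mat(n_h, n_h), np.zeros(n_h)
Wxc, Whc, bc = mat(n_h, n_in), mat(n_h, n_h), np.zeros(n_h)
Wxo, Who, bo = mat(n_h, n_in), mat(n_h, n_h), np.zeros(n_h)

def sigmoid(a): return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c):
    i = sigmoid(Wxi @ x + Whi @ h + bi)       # input gate
    f = sigmoid(Wxf @ x + Whf @ h + bf)       # forget gate
    z = np.tanh(Wxc @ x + Whc @ h + bc)       # block input
    c = f * c + i * z                         # cell state update
    o = sigmoid(Wxo @ x + Who @ h + bo)       # output gate
    h = o * np.tanh(c)                        # block output
    return h, c

h = c = np.zeros(n_h)
for x in rng.normal(size=(25, n_in)):         # a 25-step input sequence
    h, c = lstm_step(x, h, c)
print(h.shape, np.all(np.abs(h) < 1.0))
```

Since the block output is a sigmoid-gated tanh, every component of h is bounded in (−1, 1) regardless of the input sequence, while the additive cell-state update c is what lets gradients flow over long horizons.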

5.3 LSTM for dynamic model identification

The network architecture used is shown in Table 5.1. It consists of 2 LSTM layers with
64 units each followed by a fully connected (FC) layer of size 1. This network takes
the opposite of the vessel’s net heave time history for a particular sea state as an input
and predicts the control input signal up required to track this time history. The use of
LSTM network ensures that a sequence of observations, prior to the current time instant,
is used to predict the control input at the next time step. The training of this neural
network was done using pairs of input-output data, where the input of the network is
the output of the hydraulic-driven winch (i.e. zw ) and the output of the network is the
control input up to the winch when a PD controller is implemented. The reference
provided to the PD controller was the opposite of the vessel’s net heave time history
for different sea states. The primary idea of this study is to see if an LSTM network
can emulate PD control, given sufficient data.

Table 5.1: Details of Assumed Architecture

Layer Type Size Activation


1 LSTM 64 Tanh
2 LSTM 64 Tanh
3 FC 1 Linear

The training data for the LSTM network was a 20000 sec time history formed by
concatenating the input-output data for four sea states, namely the slight, moderate,
rough, and very rough sea states, with 5000 sec of data for each sea state.

Since this problem is constructed as regression with a single output unit, the mean
squared error (MSE) loss function was used during training. The control input signal
to the winch is a continuous variable predicted at each time step over the sequential
data, and the evaluation metric used in this study is the root mean squared error
(RMSE).

Figure 5.2: Accuracy Vs Learning Rate

Simulation studies presented in this chapter were carried out using Keras with
a Tensorflow backend. The training was done on an Nvidia GeForce GTX 1060
16GB GPU. The optimizer used was stochastic gradient descent (SGD) for all the
simulations. Learning rates between 5 × 10^{-2} and 10^{-3} were tested, and the highest
learning rate giving good model convergence was found to be 10^{-1}, as can be seen
from Fig. 5.2. The batch size and the window size were taken to be 32 and 25 seconds
respectively.

Fig. 5.3 shows the plot of the loss function vs epochs for the training data set. As per
Fig. 5.3, the MSE loss decreases within a few epochs and remains almost constant after
60 epochs.

Figure 5.3: MSE Loss Vs Epochs for the training data set

5.4 Results

When the neural network was tested on a new time history that it was not exposed to
in training, there was a constant offset error between the control input predicted by the
LSTM network and the PD control input. Due to this, when the predicted control input
was applied to the winch model, the reeled length increased linearly as time progressed.
This effect can be seen in Fig. 5.4. Training data including various offsets was also
tried, so that the LSTM network could learn the behaviour of tracking a particular
offset, but the network still failed to learn to compensate the payload.

This study concluded that the LSTM was unable to track low frequency inputs well.
Therefore, LSTM was dropped as a candidate for controlling the net heave of a payload.

Figure 5.4: Reeled length comparison between LSTM and PD controller

CHAPTER 6

REINFORCEMENT LEARNING BASED CONTROL

6.1 Introduction

Lately, model-free deep reinforcement learning has made significant progress in solving
a variety of complex tasks. The first successful application of this technique to
learning a control policy was a deep Q-network (DQN) for playing Atari games
(Mnih et al., 2013; Silver et al., 2017), which integrates Q-learning with deep neural
networks. However, DQN can only be used to solve problems that have a discrete action
space. Since many control tasks in the real world have continuous action space, several
advanced reinforcement learning algorithms aiming to solve continuous control prob-
lems have also been developed (Lillicrap et al., 2015; Mnih et al., 2016; Levine et al.,
2016).

This chapter introduces an RL-based control methodology using the deep deterministic
policy gradient (DDPG) algorithm for active heave compensation. The efficacy and
the potential of using this technique in real-life applications are assessed in a simulated
environment.

6.2 Deep Deterministic Policy Gradient Algorithm

In reinforcement learning (RL), the agent interacts with the environment by taking ac-
tions and observing the state and the reward without any knowledge of the dynamics of
the environment. The goal of the RL algorithm is to find the optimal policy π∗ for
selecting the control action u∗t for a system currently in state st :

u∗t = π∗(st ) = arg max_π Q^π(st , ut ), (6.1)

which maximises the total discounted reward Rt over the episode, given by

Rt = Σ_{i=0}^{∞} γ^i rt+i+1 , (6.2)

where γ denotes the discount factor. The action-value function Q is defined as:

Qπ (st , ut ) = Eπ [Rt |st , ut ], (6.3)

which is the expected discounted reward when an action ut is taken in state st . The
optimal policy π ∗ is found by interacting with the environment and learning from the
feedback. By this the agent generates experience which is then used to improve the pol-
icy. The action-value function written in a recursive format also known as the Bellman
equation is given by:

Qπ (st , ut ) = Ert ,st+1 ∼E [r(st , ut ) + γEut+1 ∼π [Qπ (st+1 , ut+1 )]], (6.4)

where the state st+1 is observed from environment E and action ut+1 is sampled from
the policy π. Since DDPG uses a deterministic policy the above equation then becomes

Qµ (st , ut ) = Ert ,st+1 ∼E [r(st , ut ) + γQµ (st+1 , µ(st+1 ))], (6.5)

where µ represents the deterministic policy function.

DDPG adopts an actor-critic framework where both the policy and action-value
functions are learned using neural networks. The actor and critic network are two ar-
tificial neural networks that are used to obtain actions and to estimate action values
respectively. The goal of the critic network is to minimise the mean square temporal
difference (TD) error:

L = (1/N) Σ_{i=1}^{N} (Q(si , ui |θ^Q ) − yi )², (6.6)

where θQ represents the weights and biases of the critic network, N is the sample batch
size, and yi is the temporal difference (TD) target given by:

yi = r(si , ui ) + γQ(si+1 , µ(si+1 )), (6.7)

The TD error is defined as the difference between the evaluated Q value and the TD
target yi . If yi is calculated using the same network as the critic, the training may be
very hard to converge. So a target critic Q′(s, u|θ^{Q′}) and a target actor µ′(s|θ^{µ′})
are introduced. The parameters θ^{Q′} and θ^{µ′} are updated using a soft method given by

θ^{Q′} ← τ θ^Q + (1 − τ )θ^{Q′}
θ^{µ′} ← τ θ^µ + (1 − τ )θ^{µ′}, (6.8)

where τ ≪ 1. So, the TD target yi can be expressed as

yi = r(si , ui ) + γ Q′(si+1 , µ′(si+1 |θ^{µ′})|θ^{Q′}). (6.9)
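The soft (Polyak) update of Eq. (6.8) can be sketched on plain parameter arrays; the vectors below stand in for flattened network weights, and their size is an illustrative assumption:

```python
# Sketch: "soft" (Polyak) update of target-network parameters, Eq. (6.8).
# theta and theta_target stand in for flattened network weight vectors.
import numpy as np

tau = 0.005                       # tau << 1: the target tracks the main net slowly

theta = np.ones(4)                # main-network parameters (illustrative)
theta_target = np.zeros(4)        # target-network parameters, initially different

for _ in range(1000):             # one soft update per training step
    theta_target = tau * theta + (1 - tau) * theta_target

print(theta_target[0])
```

After many updates the target parameters approach the main parameters exponentially, at a rate set by τ; this slow tracking is what stabilises the bootstrapped TD target of Eq. (6.9).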

The aim of the actor network is to maximise the expected accumulated reward J
whose gradient is given by:

∇_{θ^µ} J ≈ (1/N) Σ_{i=1}^{N} ∇_u Q(s, u|θ^Q )|_{s=si ,u=µ(si )} ∇_{θ^µ} µ(s|θ^µ )|_{si}, (6.10)

where N is the sample batch size and si is sampled from the replay buffer D. The
pseudo code of the DDPG algorithm is shown in Algorithm 1.

6.3 Training and Hyperparameter Tuning

The RL agent is programmed using Keras with a Tensorflow backend. The training was
done on an Nvidia GeForce GTX 1060 16GB GPU. The state space s ∈ S chosen for the
application of RL is defined as

S = {zw , żw , zwinch , żwinch }, (6.11)

where zw is the length of the reeled rope and zwinch is the net heave at the winch due to
the motion of the vessel in waves. The action space u ∈ A is defined as

A = {up }, (6.12)

where up is the control input provided to the winch model. The target is to achieve
zw = −zwinch by adjusting up .

Algorithm 1 DDPG algorithm
1: Randomly initialise critic network Q(s, u|θ^Q ) and actor µ(s|θ^µ ) with weights θ^Q and θ^µ
2: Set target parameters equal to main parameters: θ^{Q′} ← θ^Q , θ^{µ′} ← θ^µ
3: Empty replay buffer D
4: for episode = 1, M do
5:   Initialise a random noise N for action exploration
6:   Receive initial observation state s0
7:   for t = 1, T do
8:     Select action ut = clip(µ(st |θ^µ ) + Nt , alow , ahigh )
9:     Execute action ut in the environment E
10:    Observe new state st+1 and reward rt
11:    Store (st , ut , rt , st+1 ) in D
12:    Sample a random minibatch of N transitions (si , ui , ri , si+1 ) from D
13:    Compute the target yi = ri + γ Q′(si+1 , µ′(si+1 |θ^{µ′})|θ^{Q′})
14:    Update the critic network by minimising the loss L = (1/N) Σ_{i=1}^{N} (Q(si , ui |θ^Q ) − yi )²
15:    Update the actor policy by applying the sampled policy gradient ∇_{θ^µ} J ≈ (1/N) Σ_{i=1}^{N} ∇_u Q(s, u|θ^Q )|_{si ,u=µ(si )} ∇_{θ^µ} µ(s|θ^µ )|_{si}
16:    Update the target networks: θ^{Q′} ← τ θ^Q + (1 − τ )θ^{Q′} , θ^{µ′} ← τ θ^µ + (1 − τ )θ^{µ′}
17:  end for
18: end for

Figure 6.1: DDPG Architecture

Fig. 6.1 shows a schematic diagram of the structure of the DDPG architecture used.
The actor network is composed of two fully connected layers with 256 neurons each,
whereas the critic network is composed of fully connected layers with 64 and 32
neurons for the inputs s and u respectively, followed by a fully connected layer of 512
neurons, as shown in Fig. 6.1. A rectified linear unit was used as the activation function
for each neuron.

The reward r is defined as follows:

r = { 1 − 20ez − ėz ,  ez ≤ 0.05 m
  { −10ez − 2ėz ,   ez > 0.05 m, (6.13)

where ez = |zw + zwinch | and ėz = |żw + żwinch | respectively.
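The piecewise reward of Eq. (6.13) can be sketched as a plain function; the argument names are illustrative:

```python
# Sketch: the piecewise reward of Eq. (6.13).
def reward(z_w, z_winch, zdot_w, zdot_winch):
    e_z = abs(z_w + z_winch)            # compensation position error
    edot_z = abs(zdot_w + zdot_winch)   # compensation velocity error
    if e_z <= 0.05:                     # within 5 cm: +1 bonus region
        return 1.0 - 20.0 * e_z - edot_z
    return -10.0 * e_z - 2.0 * edot_z

# perfect compensation (z_w = -z_winch, zdot_w = -zdot_winch) earns the maximum
print(reward(-0.3, 0.3, -0.1, 0.1))     # -> 1.0
```

The +1 bonus inside the 0.05 m band gives the agent a strong incentive to stay near perfect compensation, while the harsher penalties outside the band push it back quickly.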

The RL-based controller was trained for 150 episodes of 3000 training steps each, with a
sample time of 0.1 sec. The input data used for training was a 300 sec net heave time
history at the winch in moderate sea state. During the simulation, the initial conditions

of the states were randomly selected. The sample batch size was set to 128 with a
replay buffer capacity of 50000. For training the network, an Adam optimiser was used
for both the actor and the critic network. The learning rate of the actor and the critic
network were set to 0.001 and 0.002 respectively. The target network transition gain τ
was set as 0.005 and the discount factor γ was set as 0.998. For better exploration, an
Ornstein–Uhlenbeck action noise with θ = 0.15, µ = 0, and σ = 0.0005 was used. This
action noise was added to the action up in the training phase to boost the performance.
Fig. 6.2 shows the learning curve achieved as a result of the training.
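The Ornstein–Uhlenbeck exploration noise with the stated parameters can be sketched as follows; the Euler–Maruyama discretisation and the use of the 0.1 s sample time as the integration step are assumptions of this sketch:

```python
# Sketch: Ornstein-Uhlenbeck exploration noise with the stated parameters,
# discretised with the Euler-Maruyama scheme. dt matches the 0.1 s sample time.
import numpy as np

theta, mu, sigma, dt = 0.15, 0.0, 0.0005, 0.1
rng = np.random.default_rng(42)

def ou_steps(n, x0=0.0):
    x, out = x0, []
    for _ in range(n):
        # dx = theta*(mu - x)*dt + sigma*sqrt(dt)*dW
        x += theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.normal()
        out.append(x)
    return np.array(out)

noise = ou_steps(3000)
print(noise.std())   # small, temporally correlated perturbation around mu = 0
```

Unlike white noise, consecutive samples are correlated and mean-reverting, which produces smoother exploratory excursions of the swash-plate command than independent random perturbations would.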

Figure 6.2: Learning curve of the RL agent

CHAPTER 7

SIMULATION RESULTS

In this chapter, five control approaches, i.e. PD, MPC, LQI, SMC, and RL-based
control, are compared to understand the advantages and limitations of each approach.
The following four cases were examined to test the efficacy of each controller.

1. Heave compensation with no disturbance and no measurement noise

2. Heave compensation with an offset

3. Heave compensation with disturbance but no measurement noise

4. Heave compensation with measurement noise but no disturbance

Figure 7.1: Uncompensated motion time history at the winch for four sea states
Fig. 7.1 shows the plot of the uncompensated motion time history for these four sea
states. The RMS values of the uncompensated net heave at the winch for the slight,
moderate, rough, and very rough sea states, evaluated from a 1000-second time history,
were found to be 0.0331 m, 0.2759 m, 0.7390 m, and 1.3203 m respectively.

7.1 Case 1: With no disturbance and no measurement noise

Tables 7.1, 7.2, 7.3, and 7.4 show the analysis of the compensation performance for
all the controllers. In this thesis, the heave compensation performance was defined as

Performance = RMS(η_uc (t) − η_c (t)) / RMS(η_uc (t)) × 100% (7.1)

where η_uc and η_c are the uncompensated and compensated net heave at the winch
respectively.
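Eq. (7.1) can be sketched directly; the sinusoidal signals below are synthetic illustrations (a pure tone scaled to the moderate-sea RMS), not the thesis' simulated time histories:

```python
# Sketch: heave compensation performance metric of Eq. (7.1).
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def performance(eta_uc, eta_c):
    """Percentage metric of Eq. (7.1) from uncompensated/compensated heave."""
    return rms(eta_uc - eta_c) / rms(eta_uc) * 100.0

t = np.linspace(0, 1000, 10001)
# synthetic "uncompensated" heave with 0.2759 m RMS (illustrative)
eta_uc = 0.2759 * np.sqrt(2) * np.sin(2 * np.pi * t / 9)
eta_c = 0.1 * eta_uc                 # a controller leaving 10% residual motion
print(round(performance(eta_uc, eta_c), 1))   # -> 90.0
```

Perfect compensation (η_c = 0) scores 100%, and a controller that removes 90% of the motion scores 90%, matching how the tabulated percentages should be read.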

PD control showed intermediate compensation for all four sea states in comparison
to the other controllers.

MPC was found to outperform the other classical strategies in terms of compensation
for the moderate and rough sea states. It was also found that MPC required the least
control input to achieve this compensation. However, MPC has the disadvantage that it
performs an online optimization and hence requires considerable online computation
power for real-time applications.

Table 7.1: Slight Sea State: Hs = 1.5 m, Tp = 6 sec

Controller rms(Compensated Motion) Heave Compensation Performance


PD 0.0044 m 86.64%
MPC 0.0033 m 90.07%
LQI 0.0057 m 82.82%
SMC 0.0011 m 96.56%
RL 0.001 m 96.9%

Table 7.2: Moderate Sea State: Hs = 4 m, Tp = 9 sec

Controller rms(Compensated Motion) Heave Compensation Performance


PD 0.0282 m 89.77%
MPC 0.0210 m 92.38%
LQI 0.0356 m 87.09%
SMC 0.0259 m 90.63%
RL 0.0019 m 99.3%

Table 7.3: Rough Sea State: Hs = 6 m, Tp = 12 sec

Controller rms(Compensated Motion) Heave Compensation Performance


PD 0.0584 m 92.1%
MPC 0.0437 m 94.09%
LQI 0.0740 m 89.98%
SMC 0.0552 m 92.53%
RL 0.0140 m 98.1%

Table 7.4: Very Rough Sea State: Hs = 8.5 m, Tp = 14 sec

Controller rms(Compensated Motion) Heave Compensation Performance


PD 0.2034 m 84.59%
MPC 0.2061 m 84.39%
LQI 2.2274 m 0%
SMC 0.1158 m 91.23%
RL 0.0590 m 95.53%

LQI control demonstrated the least compensation in all four sea states. Since LQI
is based on a linear model of the plant, it could not compensate the heave motion
when the normalized swash angle saturated. This scenario occurred in the very rough
sea state at about t = 75 seconds, when the uncompensated net heave motion at the
winch experienced a relatively sharp rise (Fig. 7.1), resulting in normalized swash
angle saturation as seen in Fig. 7.3. The poor compensation of heave motion during
this period is evident from the compensated motion plot shown in Fig. 7.2(d). The
controller could not recover until t = 210 seconds, when the uncompensated heave
motion magnitude reduced. A similarly large uncompensated heave motion was also
experienced at about t = 300 seconds, and in this case too the controller was not able
to recover until t = 500 seconds, when the uncompensated heave motion magnitude
reduced. In the case of a sharp jump, the controller attempts to compensate by increasing
the control input sharply, resulting in saturation. However, when the increase is gradual,
the control input also increases gradually and saturation is avoided.

SMC performed better in terms of compensation in the slight sea state. However, in
the moderate and rough sea states, its performance was found to be intermediate.

The RL controller provided the best compensation performance in all four sea
conditions, as shown in Tables 7.1, 7.2, 7.3, and 7.4. It also outperformed the other
strategies in handling the saturation of the swash angle, as shown in Fig. 7.2(d) and
Fig. 7.3.

(a) Hs = 1.5 m, Tp = 6 sec
(b) Hs = 4 m, Tp = 9 sec
(c) Hs = 6 m, Tp = 12 sec
(d) Hs = 8.5 m, Tp = 14 sec

Figure 7.2: Compensated motion time histories in different sea conditions

Figure 7.3: Normalized swash angle for Hs = 8.5 m, Tp = 14 sec

7.2 Case 2: With an offset

Figure 7.4: Ability of the controllers to track offset for Hs = 4 m, Tp = 9 sec

Fig. 7.4 shows the ability of each strategy to track an offset applied for a period of
200 seconds in a moderate sea state with a wave incidence angle of 135 degrees. All
the controllers tracked the offset reasonably well except LQI, which exhibited large
overshoots when the offset was introduced and removed. This can again be attributed
to saturation of the swash angle when a sudden offset of 1 m is applied or removed.

7.3 Case 3: With disturbance but no measurement noise

In this case, the ability of all the controllers to reject a disturbance is analysed. The
disturbance was assumed to enter the system only through the third state (i.e., the
reeled rope velocity żw). Since the ability of the controllers to reject disturbances is
independent of sea state, the simulation results were analysed only for the moderate
sea state.

7.3.1 Ability of classical controllers in rejecting disturbance

In order to analyse the ability of the classical controllers to reject a disturbance, the
disturbance time history was generated from a constant spectrum of spectral density
10⁻³ m²·s between cut-in and cut-off frequencies of 0 rad/s and 0.5 rad/s respec-
tively.
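A band-limited time history like this can be realized by superposing many cosine components with amplitudes set by the spectral density and uniformly random phases. A minimal sketch of that standard approach (function name, component count, and seed are illustrative, not from the thesis code):

```python
import numpy as np

def time_history_from_spectrum(S0, w_lo, w_hi, duration, dt, n_comp=200, seed=0):
    """Realize a zero-mean time history from a constant, band-limited spectrum
    S(w) = S0 for w_lo <= w < w_hi, by superposing cosines with random phases.

    Each component has amplitude a = sqrt(2 * S0 * dw), the usual
    discretization of a one-sided spectrum.
    """
    rng = np.random.default_rng(seed)
    w = np.linspace(w_lo, w_hi, n_comp, endpoint=False)  # component frequencies
    dw = (w_hi - w_lo) / n_comp
    amp = np.sqrt(2.0 * S0 * dw)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)
    t = np.arange(0.0, duration, dt)
    x = amp * np.sum(np.cos(np.outer(t, w) + phases), axis=1)
    return t, x

# Disturbance used for the classical controllers: S0 = 1e-3 m^2.s,
# band-limited between 0 and 0.5 rad/s.
t, d = time_history_from_spectrum(1e-3, 0.0, 0.5, duration=500.0, dt=0.1)

# The sample variance should be close to S0 * (w_hi - w_lo) = 5e-4 m^2.
print(np.var(d))
```

The variance check is the usual sanity test for such a realization: the area under the constant spectrum equals the variance of the process.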

Figure 7.5: Spectrum of compensated motions (a) without disturbance and (b) with disturbance in moderate sea state (Classical Control)

Figs. 7.5(a) and 7.5(b) show the power spectral density of the compensated motions
for the classical control strategies in the moderate sea state without and with the
disturbance respectively. It can be seen from Fig. 7.5 that SMC was the most sensitive
to the disturbance; all the other controllers rejected it reasonably well.
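The spectra compared here are PSD estimates of the simulated time histories. A sketch of how such a spectrum can be estimated with Welch's method, assuming scipy is available (the conversion keeps the density on a rad/s axis so its area still equals the variance; the test signal is illustrative):

```python
import numpy as np
from scipy.signal import welch

def psd_rad(x, dt):
    """One-sided PSD of a sampled time history, returned on a rad/s axis.

    scipy's welch works on a Hz axis; converting to rad/s means multiplying
    the frequencies by 2*pi and dividing the density by 2*pi, which preserves
    the area (i.e., the variance) under the curve.
    """
    f, pxx = welch(x, fs=1.0 / dt, nperseg=1024)
    return 2.0 * np.pi * f, pxx / (2.0 * np.pi)

# Illustrative compensated-motion-like signal: a 0.7 rad/s tone plus weak noise.
dt = 0.1
t = np.arange(0.0, 500.0, dt)
x = 0.05 * np.sin(0.7 * t) + 0.01 * np.random.default_rng(0).standard_normal(t.size)

w, S = psd_rad(x, dt)
print(w[np.argmax(S)])  # peak should sit near 0.7 rad/s
```

Plotting S against w reproduces spectra of the kind shown in Fig. 7.5.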

7.3.2 Ability of RL controller in rejecting disturbance

In order to analyse the ability of the RL-based controller to reject a disturbance, the
disturbance time history was generated from a constant spectrum of spectral density
value 10 with cut-in and cut-off frequencies of 0 rad/s and 0.5 rad/s respectively.
A higher spectral density value was considered because the RL-based controller was
already quite good at rejecting low-magnitude disturbances.

A PD controller was used as the baseline for assessing the efficacy of the RL controller
in rejecting disturbances. Fig. 7.6 shows the spectra of the disturbance and of the
compensated motions for both controllers (i.e., the RL and PD controllers) in the
presence and absence of the disturbance. It can be observed that both controllers
rejected even this high-magnitude disturbance reasonably well.

Figure 7.6: Spectrum of (a) disturbance and PM spectra and (b) compensated motions in moderate sea state (RL Based Control)

7.4 Case 4: With measurement noise but no disturbance

In order to analyse the ability of the controllers to attenuate measurement noise, two
cases were studied: one with low measurement noise and the other with high mea-
surement noise. In both, the noise was added only to the reeled length of the rope
(i.e., zw) of the winch model and relayed back to the controller. The signal-to-noise
ratio (SNR) is usually defined in decibels as

SNR = 10 log₁₀(σsignal² / σnoise²) = 20 log₁₀(σsignal / σnoise)        (7.2)

In this study, σsignal is taken as the RMS value of the uncompensated net heave
motion at the winch, and σnoise as the RMS value of the noise.
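Eq. (7.2) can be evaluated directly from the two time histories. A small sketch (the test signals are illustrative; the 0.0372 noise amplitude is chosen only to reproduce an SNR near the 28.6 dB low-noise case used later):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in decibels from the RMS values of signal and noise, as in Eq. (7.2)."""
    rms_signal = np.sqrt(np.mean(np.square(signal)))
    rms_noise = np.sqrt(np.mean(np.square(noise)))
    return 20.0 * np.log10(rms_signal / rms_noise)

# Illustrative signals: a noise RMS about 27x smaller than the signal RMS.
t = np.linspace(0.0, 500.0, 5001)
signal = np.sin(0.7 * t)            # RMS ~ 1/sqrt(2)
noise = 0.0372 * np.sin(10.0 * t)   # RMS ~ 0.0372/sqrt(2)

print(round(snr_db(signal, noise), 1))  # ~ 28.6 dB
```

Because both definitions in Eq. (7.2) use the same ratio, working with RMS values and the factor 20 is equivalent to working with variances and the factor 10.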

7.4.1 With low measurement noise

Ability of Classical Controllers in attenuating noise

The noise was generated from a constant spectral density of 10⁻⁶ m²·s with cut-in and
cut-off frequencies of 3.14 rad/s and 50 rad/s respectively, corresponding to an SNR
of 28.6 dB. Fig. 7.7 shows the power spectral density of the noise and of the compen-
sated motion time history for each classical strategy with the low measurement noise
included. As per Fig. 7.7, SMC was entirely insensitive to the measurement noise.
The PD controller attenuated noise better than LQI below 8 rad/s, while LQI outper-
formed PD above 8 rad/s; at higher frequencies, optimal control therefore had an edge
over classical control in terms of noise attenuation. MPC showed an odd behaviour in
which the measurement noise was amplified, with attenuation beginning only above
20 rad/s. The peak of the MPC response at 20 rad/s can be reduced and shifted to
the left by lowering the weight on the compensated motion at the winch to Q = 1, but
this degrades the heave compensation performance by about 60%.

Figure 7.7: Spectrum of compensated motions for low noise (SNR = 28.6 dB): (a) between 0 and 3 rad/s; (b) between 0 and 50 rad/s (Classical Control)

Ability of RL controller in attenuating noise

The noise was generated from a constant spectral density of 10⁻⁶ m²·s with cut-in
and cut-off frequencies of 3.14 rad/s and 30 rad/s respectively. A lower cut-off fre-
quency was used in this case to avoid aliasing, since the RL-based controller's sam-
pling time of 0.1 sec corresponds to a Nyquist frequency of 31.4 rad/s. The SNR
corresponding to this noise was 34.53 dB. Fig. 7.8 shows the power spectral density
of the noise and compensated motions for both strategies (i.e., the RL and PD con-
trollers) with the low noise included. From Fig. 7.8, it can be seen that in this high-
SNR case the PD controller attenuated the noise better at higher frequencies. The
noise had no visible effect on the compensation ability of either controller, as per
Fig. 7.6(b) and the zoomed plot inside Fig. 7.8.
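The cut-off choice above follows directly from the sampling theorem: any noise content above the Nyquist frequency π/Δt would alias into the band below it. A quick check (the helper name is illustrative):

```python
import math

def nyquist_rad(dt):
    """Nyquist frequency in rad/s for a sampling interval dt: pi / dt."""
    return math.pi / dt

# RL controller sampling time of 0.1 sec -> Nyquist frequency of about
# 31.4 rad/s, so the noise band is cut off at 30 rad/s to stay below it.
print(round(nyquist_rad(0.1), 1))  # -> 31.4
```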

Figure 7.8: Spectrum of compensated motions for low noise (SNR = 34.53 dB) (RL Based Control)

7.4.2 With high measurement noise

Ability of Classical Controllers in attenuating noise

The noise was generated from a constant spectral density of 10⁻³ m²·s with cut-in and
cut-off frequencies of 3.14 rad/s and 50 rad/s respectively, corresponding to an SNR
of 1.81 dB. Fig. 7.9 shows the power spectral density of the noise and of the compen-
sated motion time history for each strategy with the high measurement noise included.
Saturation of the normalized swash angle was observed for the MPC and PD con-
trollers when the high measurement noise was added. As per Fig. 7.9(b), all the con-
trollers attenuated the measurement noise reasonably well. However, due to the satu-
ration of the normalized swash angle, the MPC and PD controllers showed a high peak
in the spectrum between 0 and 1.5 rad/s, meaning that their compensation is signifi-
cantly degraded when a large noise is present in the system. For LQI and SMC, in
contrast, a strong noise has very little effect on performance or on their ability to
attenuate noise, even when high measurement noise is present in the feedback signal.

Figure 7.9: Spectrum of compensated motions for high noise (SNR = 1.81 dB): (a) between 0 and 3 rad/s; (b) between 0 and 50 rad/s (Classical Control)

Ability of RL controller in attenuating noise

The noise was generated from a constant spectral density of 10⁻³ m²·s with cut-in
and cut-off frequencies of 3.14 rad/s and 30 rad/s respectively. As before, a lower
cut-off frequency was used to avoid aliasing, since the RL-based controller's sampling
time of 0.1 sec corresponds to a Nyquist frequency of 31.4 rad/s. The SNR corre-
sponding to this noise was 4.56 dB.

Figure 7.10: Spectrum of compensated motions for high noise (SNR = 4.56 dB) (RL Based Control)

Fig. 7.10 shows the power spectral density of the noise and compensated motions for
both strategies (i.e., the RL and PD controllers) with the high noise included. Satura-
tion of the normalized swash angle was observed for both controllers in this case, and
the compensation ability of both was significantly affected. As per Fig. 7.6(b) and
the zoomed plot inside Fig. 7.10, the PD controller was affected by the noise more
than the RL-based controller. In attenuating this high-magnitude noise, the RL-based
controller outperformed the PD controller.

CHAPTER 8

CONCLUSION

In this study, five control strategies - PD (Proportional Derivative) control, LQI (Lin-
ear Quadratic Integral) control, MPC (Model Predictive Control), SMC (Sliding Mode
Control) and RL (Reinforcement Learning) based control - were implemented to achieve
active heave compensation. The five strategies are described in detail, and their per-
formance in keeping a payload regulated while rejecting the net heave at the winch
arising from ship motions at sea is compared and analysed.

When no noise or disturbance is present, the RL controller showed the best heave
compensation performance in all four sea states. It was also reasonably good at re-
jecting disturbances and attenuating measurement noise. SMC provided good heave
compensation and noise attenuation but failed to reject disturbances completely. The
LQI controller showed the poorest compensation performance of all the controllers
across the four sea states; it failed to compensate the motion in the severe sea state
and was affected considerably by saturation dynamics, although its disturbance re-
jection and noise attenuation were decent. MPC, on the other hand, outperformed
the other classical strategies in terms of compensation for the moderate and rough
sea states but failed to attenuate low-magnitude noise. The PD controller, tuned by
loop tuning, exhibited intermediate compensation across all four sea states in com-
parison to the other controllers.
LIST OF PAPERS BASED ON THESIS

1. Zinage, S. and A. Somayajula (2020). A comparative study of different active heave compensation approaches. Ocean Systems Engineering, 10(4), 373.

2. Zinage, S. and A. Somayajula (2021). Deep reinforcement learning based controller for active heave compensation. IFAC-PapersOnLine, 54(16), 161-167.