Power Management MultiObjective Deep RL

832 IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, VOL. 6, NO.
2, JUNE 2020
Reliable Power Scheduling of an Emission-Free Ship: Multiobjective Deep Reinforcement Learning
Saeed Hasanvand, Mehdi Rafiei, Meysam Gheisarnejad, and Mohammad-Hassan Khooban, Senior Member, IEEE
Abstract: Environmental pollutants, as a global concern, have led to a general increase in the utilization of renewable energy resources instead of fossil fuels. Accordingly, the penetration of these resources in all-electric ships, as well as power grids, has increased in recent years. In this article, in order to have a zero-emission and cost-effective energy management in an all-electric ferry boat, a new reliable and optimal power scheduling is presented that uses fuel cell and battery energy storage systems. Furthermore, real information, including the load profile and paths, is considered for the case study to assess the feasibility and superiority of the proposed approach. In addition to the cost of energy management, to have a reliable combination of the proposed resources, the loss of load expectation (LOLE) as a reliability index is considered in the energy management context and the problem is solved by deep reinforcement learning in a multiobjective manner. The results of the consideration of two common standards, including DNVGL-ST-0033 and DNVGL-ST-0373, demonstrate that the proposed energy management method is applicable in industrial applications. Finally, real-time simulation-based hardware-in-the-loop (HIL) is conducted to validate the performance and efficacy of the suggested power scheduling for emission-free ships.

Index Terms: Deep reinforcement learning (RL), energy management, fuel cell, hardware-in-the-loop (HIL), loss of load expectation (LOLE), zero-emission ships.

Manuscript received November 7, 2019; revised December 31, 2019 and February 24, 2020; accepted March 18, 2020. Date of publication March 25, 2020; date of current version June 19, 2020. (Corresponding author: Mohammad-Hassan Khooban.)
Saeed Hasanvand is with the Department of Electrical Engineering, Firouzabad Institute of Higher Education, Firouzabad 74718, Iran (e-mail: saeedhasanvand@gmail.com).
Mehdi Rafiei and Mohammad-Hassan Khooban are with the Department of Engineering, Aarhus University, 8200 Aarhus, Denmark (e-mail: m_rafiei@ymail.com; khooban@ieee.org).
Meysam Gheisarnejad is with the Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Isfahan 1477893855, Iran.
Digital Object Identifier 10.1109/TTE.2020.2983247
2332-7782 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

NOMENCLATURE
$c_f$   Cost constant coefficient of fuel cells.
$E_f^i$   Transferred power from fuel cells to the load and batteries in hour i.
$\eta_{FC}$   Efficiency of fuel cells.
$\alpha_m$   Minimum coefficient of the SOC.
$P_{\mathrm{Battery}}$   Nominal power of batteries.
$\alpha_M$   Maximum coefficient of the SOC.
$\mathrm{SOC}_i$   State of charge in hour i.
$P_{\mathrm{Charge}}^{i-1}$   Transferred power from fuel cells and cold ironing to the batteries in hour i − 1.
$P_{\mathrm{Discharge}}^{i-1}$   Value of the supplied loads by batteries in hour i − 1.
$\mathrm{cost}_{FC}$   Fuel cell operation cost.
$\mathrm{cost}_{\mathrm{cold\text{-}ironing}}$   Cold-ironing operation cost.
$E_c^i$   Transferred power from cold ironing to the load and batteries in hour i.
$P_{B-L}$   Transferred power from batteries to the loads.
$P_{\mathrm{Charge}}^{\max}$   Allowed maximum charging of batteries in an hour.
$P_{C-B}$   Transferred power from cold ironing to charge the batteries.
$P_{C-L}$   Transferred power from cold ironing to loads.
$P_{\mathrm{Discharge}}^{\max}$   Allowed maximum discharging of batteries in an hour.
$PF_{H2}$   H2 tank constraint penalty function.
$P_{F-B}$   Transferred power from fuel cells to charge the batteries.
$P_{FC}^{\min}$, $P_{FC}^{\max}$   Bounds that keep the output power of fuel cells in its allowed range.
$P_{F-L}$   Transferred power from fuel cells to loads.
$\rho_{\mathrm{mid}}$   Price of cold-ironing energy in mid-peak.
$\rho_{\mathrm{OFF}}$   Price of cold-ironing energy in OFF-peak.
$\rho_{\mathrm{ON}}$   Price of cold-ironing energy in ON-peak.
Cycles   Operation time in which the battery is charged and discharged.
$\mathrm{DIC}_{\mathrm{Battery}}$   Battery daily investment cost.
$\mathrm{DIC}_{FC}$   Fuel cell daily investment cost.
OT   Operation time.
$t_k$   Duration of failure state.
$p_k$   Individual probability of the outage capacity.
$\eta_c$   Charging efficiency of batteries.
$\eta_d$   Discharging efficiency of batteries.
$(s_t, a_t)$   State–action pair.
$Q'(s, a, \theta^{Q'})$   Output of target network.
$Q^{\pi}(s_t, a_t)$   Action-value function.
$\theta^{Q'}$   Weight coefficient of the target network.
$\theta^{Q}$   Weight coefficient of the neural network.
$A$   Action space.
$L(\theta^{Q})$   Loss function.
$R$   Reward.
$S$   State space.
$Q(s, a, \theta^{Q})$   Output of the neural network.
β   Percentage of the loads which is supplied by batteries or cold ironing.
γ   Discount factor.
$\pi(a_t, s_t)$   Policy function.

I. INTRODUCTION

A. Literature Survey and Motivation
NOWADAYS, using fossil fuels to generate electricity has led to some major concerns in energy communities. For instance, fossil fuel reserves will run out in the future and using these sources of energy will fall by one-third by 2040. Pollution is also a major drawback for fossil fuels because they emit carbon dioxide and other harmful air pollutants when burning. To address the concerns related to the conventional resources, clean renewable energy sources are increasingly being utilized around the world to replace fossil fuels. Recently, the power distribution systems and nonfossil fuels have been widely adopted in the shipping industry because this industry is responsible for about 3–5 percent of the global CO2 emissions [1]. With the progress in nonfossil power resources that are located near consumers and load sites, a new concept named distributed generation (DG) plants has been introduced in the form of microgrids (MGs) [2]. Renewable and clean energies can be applied as the main source of marine power systems to reduce emissions. Hence, applying various kinds of green energies with different electrical loads in marine power systems can be regarded as mobile MGs [3]–[5].
Up to now, a lot of efforts have been made to achieve optimum performance for the ship's electrical power system in the context of MGs [6]. Nevertheless, due to the presence of dynamic loads and various operating scenarios, the power management and control of ship MGs have become more complex [7]. In the literature related to this topic, some advanced power generation, control, and optimization methodologies have been developed to address the current challenges in the MGs. Many efforts have been made to extend the use of the MGs from several viewpoints such as protection issues [8], [9], designing advanced control strategies [10], reactive power planning [11], and distributed energy resource management [12], [13]. In the context of ship MGs energy management, numerous research studies have been devoted to reducing the ship's pollution effectively. For instance, in [14] and [15], the energy storage systems with diesel generators have been accommodated to improve the energy efficiency and reduce the emission. In [16], the optimal management of the maritime photovoltaic/battery/diesel/cold-ironing hybrid plant is developed to study solar energy and to reduce the ship's electricity cost. In order to facilitate real-time implementation, a new configuration of the battery/flywheel hybrid energy storage system is used in an all-electric ship [17]. Another applicable option to obtain a zero-emission ship is to employ fuel cells that are highly efficient power resources without causing any pollution. In addition, they can be operated as a dispatchable power supply and adopted in a hybrid system with other clean energy resources such as photovoltaic, energy storage device, and cold ironing [18].
After the integration of battery and other clean energies, an important challenge to the ship's power supply is its power management and control strategy [19]–[21]. In modern energy management, the reliability assessment is a key design in order to continuously supply the consumers. Interruptions as a consequence of any failure in MG affect the system reliability and reduce the overall performance and efficiency of a hybrid energy system [22]. The reliability issue in power systems has been investigated extensively in the literature [23], [24], but it has been rarely considered in power management for ships. Hence, this article provides a multiobjective (MO) optimization model to consider the reliability index in the presence of DGs and storage systems. For the energy management problem, the suggested MO model involves two conflicting objectives which are, namely, the loss of load expectation (LOLE) and the cost of the generated power.
Literature survey [25], [26] reveals that to optimize the operation of the energy management problem, some global optimization-based methodologies have been developed in the control scheme design, like particle swarm optimization (PSO), dynamic programming (DP), and sequential quadratic programming (SQP), to name a few. Based on an accurate model and complete knowledge of the future driving cycle, these algorithms can derive the global optimal energy management for a particular driving cycle while their solutions are not optimal for other cycles. A possible solution for addressing the limitations of the optimization strategies is to adopt reinforcement learning (RL) in which the agent can capture the optimal (or near-optimal) control policies by interacting with its environment. Since the RL schemes try to improve the strategy using raw observation information and the reward signal emitted from the environment, it takes less computational burden in its implementation for the energy management problems [27]. However, the need to discretize the action and state pairs in this tabular scheme may lead to the problem of dimensionality [28], and make it difficult to perform well in high-dimensional contexts, like the energy management in an all-electric ferry boat. More recently, deep Q network (DQN) has been developed to address the limitations of conventional RL schemes by replacing the Q-table in the procedure of Q-learning with deep neural networks (NNs). As a result, the DQN algorithm has been successfully applied to different fields of control problems for the betterment of the existing results [29], [30], such as ramp metering, quadrotor helicopter, and hybrid electric vehicles. Literature survey reveals that the majority of the studies in the context of the DQN are focused on the single-objective optimization [29]–[32], while MODQN [33], [34] is preferred to optimize the environments with conflicting objective functions. Specifically, an MODQN algorithm is adopted in this article that can assign the optimal energy of the energy management system in a model-free framework.
The motivating factors for this article on the integration of fuel cell and energy storage are supported by the issues related to power management for zero-emission and cost-effective ship. These power resources are gaining immense importance in the recent power system to have a reliable and clean power generation. Accordingly, this article proposes the DQN algorithm to solve the emission-free power management problem in the all-electric ship by considering the total cost and reliability to meet the power supply availability levels. In order to achieve a better simulation of a real-world problem with conflicting objective functions, an MO version of DQN was formulated to solve the MO problem.

TABLE I. BOAT'S SPECIFICATIONS
B. Contributions
In this article, the investment, operation costs (OCs), and reliability assessment of a zero-emission energy management system based on fuel cells, batteries, and cold ironing have been conducted for the all-electric ferry ship. To overcome the limitations of the fuel cell, including its slow dynamics, batteries were used in the system. On the other hand, in an attempt to correct the mass and volume limitations of ships [1], cold ironing was employed so as to reduce the required capacity of H2 tanks for fuel cells and batteries. As a result, the combination of these energy resources leads to an efficient hybrid energy system with a high potential to follow the load changes and acceptable mass and volume. Many constraints have been identified for the energy management of ships, such as time and location-based limitations of refueling, location-based differences in power generation profile, the constraint of batteries, ramp up of the fuel cell, as well as the H2 tanks constraint. In this problem, the optimization parameters are the hourly transferred powers from all energy resources to loads, as well as the size and capacity of these resources. In addition to the cost of energy management, to evaluate the reliability of the proposed power system, the LOLE index is considered in energy management and the problem is solved by the MODQN algorithm. In order to demonstrate the applicability of the proposed method for industrial purposes, the results of the analysis of two different standards, including DNVGL-ST-0033 and DNVGL-ST-0373, have been presented. Finally, real-time simulation-based hardware-in-the-loop (HIL) has been performed to validate the performance and effectiveness of the proposed power scheduling of the emission-free ships.
To summarize, the main contributions of this article to the research area could be expressed as follows.
1) The LOLE, as a power generation reliability metric, is adopted in the energy management problem of a ferry boat.
2) The DNVGL-ST-0033 and DNVGL-ST-0373 standards are implemented to assess the applicability of the proposed energy management scheme in industrial applications.
3) Because the formulated problem is complex, in the sense that traditional optimization algorithms, e.g., PSO, DP, and so on seem to be inapplicable, the new deep RL scheme is adopted to solve the energy management problem.
4) Simulation-based HIL is employed to validate the efficiency and usefulness of the proposed method from the real-time point of view.

C. Organization of This Article
The rest of this article is organized as follows. In Section II, the boat, paths specification, and loads profile are presented. The proposed energy system and the required considerations of the energy resources and their energy costs are explained in Section III. Section IV includes the required information about the necessary standard approval for the proposed method. The proposed energy management strategy, including constraints and cost functions, is presented in Section V. The MODQN method is introduced in Section VI. The experimental results are discussed in Section VII. The future scope and potentials of the current research are elaborated in Section VIII. Finally, Section IX concludes the study.

II. PROBLEM SPECIFICATIONS
Here, the specifications of the boat, paths, and energy profile are presented.

A. Boat Specifications
In this article, a bay tour ferry boat is considered as the case study, where the fuel cell and battery systems have been properly planned to reach a zero-emission and cost-effective ship. It should be noted that this boat transfers travelers for a one-day tour. The detailed specifications are presented in Table I. In addition to fuel cell and battery, the boat is connected to the onshore power network which is known as cold ironing. Although cold ironing may be costly for boats with emission, this is done to reduce pollution. The reason for using cold ironing in this article is not only for the pollution aspect but also that the zero-emission boat is connected to the onshore power network based on energy requirements and costs. The boat is shown in two states "on sail" and "at anchor" in Fig. 1(a) and (b), respectively. As can be seen in Fig. 1, the power generated by the fuel cells and the incoming power of the cold ironing are used to supply the loads and charge the batteries. However, batteries only supply loads. The power transfer details are as follows.
$P_{F-L}$: transferred power from fuel cells to supply the loads;
$P_{F-B}$: transferred power from fuel cells to charge the batteries;
$P_{B-L}$: transferred power from batteries to supply the loads;
$P_{C-L}$: transferred power from cold ironing to supply the loads; and
$P_{C-B}$: transferred power from cold ironing to charge the batteries.

Fig. 1. Schematic of power paths. (a) On sail. (b) At anchor.
Fig. 2. Path specifications.
Fig. 3. Load profile for 24 h.
Fig. 4. Proposed circuit of the energy system.
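The five power flows listed above can be collected in one record per hour. The following Python sketch is illustrative only: the class name `HourlyDispatch`, its field names, and the validity rule are assumptions introduced here and are not part of the article. It encodes two statements from the text, namely that batteries only supply loads and that cold ironing is usable only at anchor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyDispatch:
    """Hourly-average power flows in kW (hypothetical container)."""

    p_f_l: float = 0.0  # fuel cells -> loads
    p_f_b: float = 0.0  # fuel cells -> batteries
    p_b_l: float = 0.0  # batteries -> loads (batteries never charge anything)
    p_c_l: float = 0.0  # cold ironing -> loads
    p_c_b: float = 0.0  # cold ironing -> batteries

    def load_supplied(self) -> float:
        """Total power delivered to the loads in this hour."""
        return self.p_f_l + self.p_b_l + self.p_c_l

    def battery_charge(self) -> float:
        """Total charging power sent to the batteries."""
        return self.p_f_b + self.p_c_b

    def fuel_cell_output(self) -> float:
        """Fuel-cell power sent to loads and batteries (E_f in the nomenclature)."""
        return self.p_f_l + self.p_f_b

    def cold_ironing_import(self) -> float:
        """Shore power sent to loads and batteries (E_c in the nomenclature)."""
        return self.p_c_l + self.p_c_b

    def is_valid(self, at_anchor: bool) -> bool:
        """Assumed rule: no negative flows, and no shore power while on sail."""
        flows = (self.p_f_l, self.p_f_b, self.p_b_l, self.p_c_l, self.p_c_b)
        if any(p < 0.0 for p in flows):
            return False
        return at_anchor or self.cold_ironing_import() == 0.0


# Arbitrary example values, not taken from the case study.
on_sail = HourlyDispatch(p_f_l=200.0, p_f_b=20.0, p_b_l=50.0)
print(on_sail.load_supplied(), on_sail.is_valid(at_anchor=False))
```

Keeping the flows as separate fields, rather than one net battery power, mirrors the article's notation and makes the later charge and discharge limits straightforward to check.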
B. Path Specifications
In order to evaluate the proposed power management approach for 24 h, a path with three stopping places has been considered, as shown in Fig. 2.
In this tour, both the "disconnected from power network" and "cold-ironing" scenarios have been evaluated. The travel starts at 6 A.M. from point "A" and the boat arrives at point "B" at 10 A.M. Afterward, the boat stays for three hours at point "B" and then departs for point "C" at 1 P.M. The boat arrives at point "C" at 6 P.M. After 2 h, the boat leaves point "C" at 8 P.M. toward point "A" and then spends 5 h to start the next travel. It should be noted that in the first and second journeys, the boat uses only one engine but on the third journey, it uses both engines. The cold ironing is only available at points "A" and "B" and the H2 tanks can be refilled only at points "A" and "C."

C. Load Profile
According to the boat's path, the power management problem is solved for 24 h. The loads include the engines and usual consumption loads, which are shown in Fig. 3. The power consumed by the two engines at anchor is equal to 0. The power consumption of each engine is equal to 250 kW. Because of the boat's casting-off and docking maneuvers, the required engine power in starts and stops is higher than in usual sailing. Thus, the average power required by the engines in the first and last hours of each journey is higher than in the other hours of the journey. On the other hand, the lighting loads have different values based on the time of day, but the necessary loads (such as radars) are constantly in use.

III. CONSIDERATION OF THE ENERGY SYSTEM
As mentioned earlier, the proposed energy system consists of the fuel cell, battery, and cold ironing. The schematic of the energy system is shown in Fig. 4. The energy system is proposed and verified in [35]. It can be seen that all the energy sources and loads are connected to a 400-V dc bus. In this system, there are some fuel cells and batteries whose number and size will be obtained optimally in this article. The cold ironing is also connected to the dc bus. According to Fig. 4, the loads consist of two waterjets as engines as well as dc loads (e.g., radars and sonar), which are also connected to the dc bus. The electronic converters are adopted to connect the components to the dc link while the control system is responsible for ensuring their reliable operation. In Section III-A, the details and considerations of each power source are presented.

A. Fuel Cell
The efficiency versus output power diagram of FCveloCity-HD-100 and FCveloCity-XD-200, which are adopted from Ballard company, is shown in Fig. 5. Since fuel cells have a slow dynamic, they cannot follow the rapid changes in loads. Therefore, in order to satisfy the fast dynamic of load variations, the batteries are used to amortize the load and compensate for the fuel cell's slow dynamics. The power management of a boat is studied in the form of hourly averages, so it is not possible to see the fast dynamic of the loads. Nevertheless, in order to consider the load's fast dynamic, the coefficient β is defined as the factor of the high dynamic part of the loads. Therefore, it is assumed that β × 100 percent of the loads in each hour (equal to fast dynamic changes) must be supplied by batteries or cold ironing. This assumption, underlying the case study, is modeled as the constraint in the MO optimization problem.

Fig. 5. Efficiency versus output power of the FCveloCity-HD-100/200.
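As a minimal illustration of the β assumption above, the hourly load can be split into the part the fuel cells may cover and the fast-dynamic part reserved for batteries or cold ironing. The function name `split_load` and the numeric values are hypothetical; the article does not state the value of β in this section.

```python
from typing import Tuple


def split_load(p_load_kw: float, beta: float) -> Tuple[float, float]:
    """Return (fuel-cell ceiling, fast-dynamic floor) for one hour, in kW.

    The fast-dynamic floor is beta * load and must come from batteries or
    cold ironing; the remainder, (1 - beta) * load, is the most the fuel
    cells may supply directly to the loads.
    """
    if p_load_kw < 0.0:
        raise ValueError("load must be non-negative")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    fast_part = beta * p_load_kw
    return p_load_kw - fast_part, fast_part


# Arbitrary example: a 500 kW hourly load with beta = 0.25.
fc_ceiling, fast_floor = split_load(500.0, 0.25)
print(fc_ceiling, fast_floor)
```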
Due to the dependency of the fuel cell's efficiency on its output power (Fig. 5), its cost function cannot be defined with respect to its output power. Hence, we should use the input power (amount of hydrogen consumed), which is equal to the output power divided by the efficiency, to calculate the cost of the fuel cell. The following equation is used for this purpose:

$$\mathrm{cost}^{i}_{\mathrm{fuel\ cell}} = c_f \ast \left( \frac{E_f^i}{\eta_{FC}(E_f^i)} \right). \tag{1}$$

In power management modeling of the zero-emission boat, the limitation of H2 is considered as a penalty function.

Fig. 6. Load duration curve (LDC) in percent.

B. Battery
The batteries adopted for this article are Corvus Dolphin Energy (66 kWh) and Corvus Orca Energy (125 kWh) from Corvus Energy Company. In order to increase the batteries' lifetime as an important parameter, some limitations must be considered. The first limitation is defined in the form of the maximum charging and discharging of batteries in a specific period of time. Too much charging or discharging in a period of time causes heat, which has negative effects on the battery's lifetime. Therefore, two parameters $P_{\mathrm{Charge}}^{\max}$ and $P_{\mathrm{Discharge}}^{\max}$ are defined here with the concept of the maximum allowed charge and discharge power of batteries in 1 h.
The other limitation is the maximum and minimum energy stored in batteries. In order to increase the battery's lifetime, it is better to keep the battery's SOC in a specific range. The situations in which the SOC is higher and lower than the suggested range are called overcharge and overdischarge, respectively. Accordingly, by defining $\alpha_M$ and $\alpha_m$ as the maximum and minimum coefficients of the SOC, respectively, the following equation must be satisfied:

$$\alpha_m \ast P_{\mathrm{Battery}} \le \mathrm{SOC} \le \alpha_M \ast P_{\mathrm{Battery}}. \tag{2}$$

The other consideration in the modeling of the batteries is about the charge and discharge losses. These losses are modeled by charge and discharge coefficients ($\eta_c$ and $\eta_d$). Therefore, the SOC of the batteries at the beginning of hour i is calculated as follows:

$$\mathrm{SOC}_i = \mathrm{SOC}_{i-1} + \eta_c P_{\mathrm{Charge}}^{i-1} - \frac{1}{\eta_d} P_{\mathrm{Discharge}}^{i-1} \tag{3}$$

where $P_{\mathrm{Charge}}^{i-1}$ is equal to the summation of the transferred power from fuel cells and cold ironing to the batteries in hour i − 1. In addition, $P_{\mathrm{Discharge}}^{i-1}$ represents the value of the supplied loads by batteries in hour i − 1.

C. Cold Ironing
As mentioned earlier, the connection to the onshore power network is optional for a zero-emission boat. This decision must be taken based on the boat's power requirements and the power cost of the onshore network. The cost of the cold ironing is related to the time of connection to the onshore network and is presented as follows:

$$\mathrm{cost}^{i}_{\mathrm{cold\text{-}ironing}} = \begin{cases} \rho_{\mathrm{ON}} \ast E_c^i, & t \in [7, 10) \cup [18, 20) \\ \rho_{\mathrm{mid}} \ast E_c^i, & t \in [6, 7) \cup [10, 18) \cup [20, 22) \\ \rho_{\mathrm{OFF}} \ast E_c^i, & t \in [22, 6) \end{cases} \tag{4}$$

where $\rho_{\mathrm{ON}}$, $\rho_{\mathrm{mid}}$, and $\rho_{\mathrm{OFF}}$ are equal to the price of cold-ironing energy in ON-peak, mid-peak, and OFF-peak periods, respectively. Furthermore, $E_c^i$ is equal to the transferred power from cold ironing to the load and batteries in hour i.

IV. STANDARD APPROVAL
In order to implement the proposed energy management method in industrial applications, the required standards must be clearly satisfied. The DNV GL company is a Norwegian company that has dedicated a research department to develop the standards, services, and rules for oil and gas, maritime, and energy industries since 1864. Based on the concept of the proposed method and the utilized hardware and also the DNV GL information, the proposed energy management method must meet two standards to find its way in the marine industry. These two standards are 1) DNVGL-ST-0033 (Maritime simulator systems) [36] and 2) DNVGL-ST-0373 (HIL) [37].
Here, some considerations of these standards for the proposed method are presented.

TABLE II. SIMULATOR CLASSES FOR THE FUNCTIONAL AREA OPERATION [37]

A. DNVGL-ST-0033
This standard proposes one way of carrying out the approval of the relevant maritime administration for the maritime simulator systems that are used for mandatory simulator-based training to demonstrate competence or to demonstrate the continued proficiency required by the STCW (standards of training, certification, and watchkeeping) convention. Based on the definitions related to the standard in Table II, the proposed method is categorized as a class C simulator. According to the standard, the simulator shall be capable of simulating a realistic environment for the entire applicable STCW competency requirements referred to as the DNVGL-ST-0033 text.
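Returning briefly to the cost models of Section III, the fuel cell cost in (1) and the time-of-use cold-ironing cost in (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the efficiency curve of Fig. 5 is not reproduced in this text and is therefore passed in as a caller-supplied function, and the prices and coefficients in the example are arbitrary placeholders rather than values from the article.

```python
from typing import Callable


def fuel_cell_cost(e_f_kw: float, c_f: float,
                   efficiency: Callable[[float], float]) -> float:
    """Eq. (1): c_f times the input power, i.e. output power / efficiency.

    `efficiency` maps output power (kW) to a value in (0, 1]; it stands in
    for the curve of Fig. 5, which is an assumption of this sketch.
    """
    if e_f_kw == 0.0:
        return 0.0  # no output, no hydrogen consumed
    return c_f * e_f_kw / efficiency(e_f_kw)


def cold_ironing_price(hour: int, rho_on: float, rho_mid: float,
                       rho_off: float) -> float:
    """Eq. (4): select the tariff that applies to an hour of day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 7 <= hour < 10 or 18 <= hour < 20:
        return rho_on
    if 6 <= hour < 7 or 10 <= hour < 18 or 20 <= hour < 22:
        return rho_mid
    return rho_off  # the interval [22, 6) wraps around midnight


def cold_ironing_cost(hour: int, e_c_kw: float, rho_on: float,
                      rho_mid: float, rho_off: float) -> float:
    """Eq. (4): tariff for the hour times the shore power drawn."""
    return cold_ironing_price(hour, rho_on, rho_mid, rho_off) * e_c_kw


# Placeholder numbers only: flat 50% efficiency and made-up prices.
print(fuel_cell_cost(100.0, 2.0, lambda p: 0.5))
print(cold_ironing_cost(8, 40.0, rho_on=3.0, rho_mid=2.0, rho_off=1.0))
```

Passing the efficiency curve as a function keeps the sketch independent of any particular fuel cell data sheet.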
By considering the propulsion type of the engines (electric propulsion motors) and the class of the simulator (class C), to get the approval of the standard, first, the interactions between the subsystems and the dynamic behavior of the machinery systems and its essential parameters must be replicated in the simulation model. Subsequently, the engine room components with their processes and their controller systems (sensors, controllers, actuators, and valves) must be simulated. Finally, the simulation design shall include facilities that are needed for the injection and resetting of faults during service at proper times.

B. DNVGL-ST-0373
The purpose of the HIL analysis is to evaluate the target system in order to provide objective evidence of appropriate performance (during ordinary, unusual, or deteriorated conditions) in accordance with the relevant functional requirements. The aim of this standard is to offer a standard for third-party certification of the HIL testing. A certificate supplied according to these standard documents is in compliance with the requirements stated in this standard.
The documentation that shall be submitted as the requirements for approval is 1) HIL test package documentation (a set of documents as described in the DNVGL-ST-0373 text) and 2) HIL test package report (a set of documents as described in the DNVGL-ST-0373 text).

V. POWER MANAGEMENT
In this article, the system OC in the form of an hourly power dispatch, the system investment cost (IC), and the system reliability are optimized as a power management problem. The aim is to determine the best amount of generated or stored power of each energy source in each hour to satisfy the loads under the power supply constraints. Therefore, the power management problem has been formulated into deep RL with an MO framework to find the optimal solution. The transferred power, optimal number, and size of the resources are the optimization parameters. The total costs, as well as the reliability index, are considered as objective functions. Numerous indices exist for measuring the reliability of the power generation system; these include loss of load probability (LOLP), LOLE, loss of energy probability (LOEP), and loss of energy expectation (LOEE), to name a few. Among the reliability indices, the LOLE index is often preferred in engineering problems [38], which is estimated based on the expected number of days or hours in a specified period in which a load loss will occur. LOLE of one day in 10 years, or 0.1 days/year, is commonly used in the power systems affiliated to the North American Electric Reliability Corporation (NERC) [39].

A. Constraints
One of the important differences between ship energy management and other situations is the high complexity of the energy management's constraints. Such complexities arise mainly from the time and location-based limitations of refueling, and the location-based differences in power generation profile.

TABLE III. CONSTRAINTS EQUATIONS

All the constraints that govern this problem are listed in Table III [(5)–(13)]. The explanations related to these constraints are presented in the following sections.
1) Power Supply Constraint: Based on the power supply constraint, the value of the loads and the summation of the generated powers in each hour must be equal. This constraint is analyzed in two different states. In the "on sail" state, the loads are supplied only by fuel cells and batteries. At first, a part of the loads is supplied by fuel cells ($P_{F-L}^i$) within its acceptable range determined by (5). Subsequently, the remaining loads are supplied by batteries ($P_{B-L}^i$) according to (6). In the "at anchor" state, in addition to the fuel cells and batteries, the cold ironing can also be used to supply the loads. Therefore, the parts of the loads are supplied by cold ironing ($P_{C-L}^i$) and batteries ($P_{B-L}^i$) in their acceptable ranges specified according to (9) and (10). After that, the power of the fuel cells ($P_{F-L}^i$) is used to supply the remaining loads following (11).
2) Batteries Maximum Charge and Discharge Constraints: Batteries have limitations in their amount of charge and discharge in a period of time. As mentioned earlier, these limitations are modeled by $P_{\mathrm{Charge}}^{\max}$ and $P_{\mathrm{Discharge}}^{\max}$. The term $P_{\mathrm{Charge}}^{\max}$ in (7), (12), and (13) shows the constraints for the maximum charge of batteries using fuel cells and cold ironing. Moreover, the term $P_{\mathrm{Discharge}}^{\max}$ in (5) and (10) represents the constraints for the maximum discharge of batteries to the loads.
3) Batteries Overcharge and Overdischarge Constraints: In order to avoid overcharge and overdischarge of the batteries, two coefficients ($\alpha_M$ and $\alpha_m$) are defined, respectively. Therefore, the term $\alpha_m \ast P_{\mathrm{Battery}}$ in (7), (12), and (13) guarantees that the batteries will not get overdischarged. On the other hand, the overcharge phenomenon is prevented by the term $\alpha_M \ast P_{\mathrm{Battery}}$ in (7), (12), and (13).
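The battery-related limits described in items 2) and 3) can be checked for a single hour with a short Python sketch. Because the equations of Table III, (5)–(13), are not reproduced in this text, the sketch does not implement them; it only combines the SOC window (2), the SOC update (3), and the hourly limits $P_{\mathrm{Charge}}^{\max}$ and $P_{\mathrm{Discharge}}^{\max}$. The function names and all numeric values are hypothetical.

```python
def soc_next(soc: float, p_charge: float, p_discharge: float,
             eta_c: float, eta_d: float) -> float:
    """Eq. (3): SOC at the start of the next hour."""
    return soc + eta_c * p_charge - p_discharge / eta_d


def battery_step_feasible(soc: float, p_charge: float, p_discharge: float,
                          *, p_battery: float, alpha_m: float,
                          alpha_max: float, p_charge_max: float,
                          p_discharge_max: float,
                          eta_c: float, eta_d: float) -> bool:
    """Check one hourly battery decision against the assumed limits.

    alpha_max plays the role of alpha_M in the article; it is renamed here
    only to avoid two arguments that differ by letter case.
    """
    # Hourly charge and discharge limits (item 2).
    if not 0.0 <= p_charge <= p_charge_max:
        return False
    if not 0.0 <= p_discharge <= p_discharge_max:
        return False
    # SOC window of Eq. (2), applied to the SOC predicted by Eq. (3) (item 3).
    nxt = soc_next(soc, p_charge, p_discharge, eta_c, eta_d)
    return alpha_m * p_battery <= nxt <= alpha_max * p_battery


# Arbitrary parameters for illustration only.
limits = dict(p_battery=100.0, alpha_m=0.2, alpha_max=0.8,
              p_charge_max=50.0, p_discharge_max=50.0,
              eta_c=0.9, eta_d=0.9)
print(battery_step_feasible(50.0, 10.0, 0.0, **limits))
```

A scheduler could use such a check to mask infeasible actions before they are evaluated, although the article itself does not state how infeasible actions are handled.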
4) Fuel Cells Maximum and Minimum Power Constraints: TABLE IV

min COPT
In order to apply these constraints, two parameters PFC and
max
PFC are defined here. Hence, these two parameters in (5),
(7), and (12) certify that the power outage of fuel cells stays
in their allowed range.
5) Constraint of Different Dynamics of Fuel Cells and
Loads: It has been said that one of the reasons for using
batteries with fuel cells is to fix the problem of fuel cells’ slow
dynamic response. For this purpose, the parameter β has been
defined as the percentage of loads supplied by batteries or cold
ironing. Therefore, the term (1 − β) ∗ PLoad i
in (5) and (10)
guarantees that at least β percentage of loads remains for
batteries or cold ironing.
6) Constraint of H2 Tank: This constraint is modeled as a penalty function in the cost function of the optimization problem.

B. Objective Functions

The power management problem has been solved as an MO problem. The objective functions are the summation of energy resource costs, which consist of OC, IC, and the penalty function of the H2 tank constraint, and LOLE.

Operation Costs: Because the available energy resources differ between the “on sail” and “at anchor” states, an independent OC function is defined for each state. As previously stated, the H2 tank constraint is formulated as a penalty function (\(PF_{H_2}\)) in the cost function. Therefore, the OC function is presented as follows:

\[
OC = \begin{cases}
cost_{FC} + PF_{H_2}, & \text{on sail} \\
cost_{FC} + cost_{cold\text{-}ironing} + PF_{H_2}, & \text{at anchor.}
\end{cases} \tag{14}
\]

Investment Cost: ICs for battery and fuel cell are calculated as follows. Since the energy management problem is solved for 24 h, the costs are considered for each day. For the fuel cell, the IC and lifetime are 40 $/kW and 40 000 h, respectively, so the fuel cell’s daily IC is

\[
DIC_{FC} = \frac{(P_{FC} * 40) * OT}{40000}. \tag{15}
\]

\(DIC_{FC}\) is the fuel cell’s daily IC and OT is the operation time, which is the fuel cell’s working hours in the energy management problem.

The IC for the battery is 17.8 $/kWh, and the shelf life is 4 years. Therefore, if the battery is charged and discharged once a day, the shelf life is 1460 cycles, and the daily IC is

\[
DIC_{Battery} = \frac{(E_{BA} * 17.8) * Cycles}{1460} \tag{16}
\]

where \(DIC_{Battery}\) is the battery daily IC and Cycles denotes the operation time in which the battery is charged and discharged in the energy management problem.

Reliability Index: In addition to the OC and IC, reliability is another objective function, which is formulated here as LOLE.

LOLE is calculated using the probability of the capacity outage and the time of the failure state, as given in the following equation:

\[
LOLE = \sum_{k=1}^{n} p_k t_k \tag{17}
\]

where \(p_k\) is the individual probability of the capacity outage of \(O_k\) MW, n is the number of states of the capacity outage probability table (COPT), and \(t_k\) is the duration of the failure state, which can be calculated using the LDC according to Fig. 6. Finally, LOLE is calculated using Table IV. To calculate this index, the forced outage rates for the fuel cell and the battery are 1% and 0.8%, respectively [40], [41]. It should be noted that the peak load is 670 kW and that, in the procedure of calculating LOLE, all states are considered in which the combined output power of the fuel cells and 30% of the batteries (using more than 30% reduces battery lifetime) lies between 670 and 900 kW. Therefore, the total objective functions can be presented as follows:

\[
\begin{cases}
F_1 = Cost = IC + OC \\
\quad\; = DIC_{FC} + DIC_{Battery} +
\begin{cases}
cost_{FC} + PF_{H_2}, & \text{on sail} \\
cost_{FC} + cost_{cold\text{-}ironing} + PF_{H_2}, & \text{at anchor}
\end{cases} \\
F_2 = Reliability = LOLE.
\end{cases} \tag{18}
\]

VI. DEEP REINFORCEMENT LEARNING ENERGY MANAGEMENT STRATEGY

A. Formulating the Energy Management Model to the Markov Decision Process

The main goal of RL is to find optimal decisions by trial and error, using a large number of observations as well as the analysis of the system behavior to improve the system performance. By setting up simulations in which an intelligent agent interacts with the environment over time, RL addresses the problems associated with sequential decision-making. Here, the energy management problem for the ship system is converted into an appropriate form for RL. To achieve this goal, the RL model is formulated for this problem as a Markov Decision Process (MDP) [29] that contains the following components.

State Space S: A collection of all possible states \(s_t\) that the environment (i.e., the energy management system) can assume. The selected state parameters should reflect environmental information that is easily obtainable. In this case, the batteries’ SOC, the remaining H2 in the hydrogen tank (\(r_{H_2}\)), the LOLE, and the total power outputs of fuel cells, batteries, and cold ironing are selected as the state vector, i.e.,

\[
S_t = [SOC, r_{H_2}, LOLE, P_{FC}, P_{Battery}, P_{Cold\text{-}ironing}]. \tag{19}
\]
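Since the cost and reliability objectives in (14)-(18) are what the learning agent is ultimately scored on, it may help to see them evaluated numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes that \(P_{FC}\) in (15) is the installed fuel cell power in kW and that the caller supplies the operating cost already obtained from (14). All function and parameter names are hypothetical, and the sample probabilities and durations in the usage note are illustrative, not values from Table IV.

```python
from typing import Sequence, Tuple

# Constants quoted in the text.
FC_IC_PER_KW = 40.0           # fuel cell investment cost, $/kW
FC_LIFETIME_H = 40_000.0      # fuel cell lifetime, h
BATTERY_IC_PER_KWH = 17.8     # battery investment cost, $/kWh
BATTERY_LIFE_CYCLES = 1460.0  # 4 years at one cycle per day


def daily_ic_fuel_cell(p_fc_kw: float, operation_hours: float) -> float:
    """Eq. (15): daily investment cost of the fuel cells."""
    return (p_fc_kw * FC_IC_PER_KW) * operation_hours / FC_LIFETIME_H


def daily_ic_battery(e_ba_kwh: float, cycles: float) -> float:
    """Eq. (16): daily investment cost of the batteries."""
    return (e_ba_kwh * BATTERY_IC_PER_KWH) * cycles / BATTERY_LIFE_CYCLES


def operation_cost(cost_fc: float, pf_h2: float,
                   cost_cold_ironing: float = 0.0,
                   at_anchor: bool = False) -> float:
    """Eq. (14): cold ironing is only available (and paid for) at anchor."""
    if at_anchor:
        return cost_fc + cost_cold_ironing + pf_h2
    return cost_fc + pf_h2


def lole(outage_probabilities: Sequence[float],
         failure_durations: Sequence[float]) -> float:
    """Eq. (17): sum of p_k * t_k over the COPT states."""
    if len(outage_probabilities) != len(failure_durations):
        raise ValueError("one duration is required per outage probability")
    return sum(p * t for p, t in zip(outage_probabilities, failure_durations))


def objectives(p_fc_kw: float, operation_hours: float,
               e_ba_kwh: float, cycles: float, oc: float,
               outage_probabilities: Sequence[float],
               failure_durations: Sequence[float]) -> Tuple[float, float]:
    """Eq. (18): returns (F1, F2) = (IC + OC, LOLE)."""
    f1 = (daily_ic_fuel_cell(p_fc_kw, operation_hours)
          + daily_ic_battery(e_ba_kwh, cycles) + oc)
    f2 = lole(outage_probabilities, failure_durations)
    return f1, f2
```

For example, `daily_ic_fuel_cell(500.0, 24.0)` evaluates (15) for a hypothetical 500-kW installation running all day, and `lole([0.01, 0.008], [2.0, 5.0])` evaluates (17) for two made-up COPT states.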
Action Space A: A finite collection of possible actions that the RL agent can apply, each leading to a transition from state \(s_t\) at step t to state \(s_{t+1}\) at step t + 1. Each action \(a_t\) of the energy management agent releases power for energy management purposes and puts the system in a new state. In this article, the action vector is defined by the power flowing from fuel cells to loads and batteries, from batteries to loads, and from cold ironing to loads and batteries, given as

\[
A_t = [P_{F-L}^{i}, P_{F-B}^{i}, P_{B-L}^{i}, P_{C-L}^{i}, P_{C-B}^{i}]. \tag{20}
\]

Reward R: A reward signal \(r_t\) is emitted from the energy management environment to evaluate the action performed in a certain state. The control goal of the operation scheme is to minimize the cost of the hybrid energy system, while the RL agent aims at choosing actions that maximize the accumulated reward. Therefore, the reward signal \(r_t\) is defined as the opposite of the system operating cost, i.e., the negative of the cost function (18) is used to derive the reward signal for the MDP problem.

Agent: The controller of the energy management system, which provides a mapping from \(s_t\) to \(a_t\) and updates the policy so as to obtain a higher reward from the energy sources.

Policy: The policy π describes the agent’s behavior in a specific environment, i.e., it makes a mapping from \(s_t\) to \(a_t\).

State Transition Function: After applying action \(a_t\), the current state \(s_t\) moves to a new state \(s'\), which is simulated by the dynamics of the energy management environment.

Cost Minimization: For the concerned problem, the goal is to maximize the rewards by exploring the optimal control policy for the energy management problem. Mathematically, the goal is

\[
\max_{\pi} \sum_{t'=0}^{\infty} \gamma^{t'} r_{t+t'} \tag{21}
\]

where 0 ≤ γ < 1 is a discount factor.

Action-Value Function: This function represents the expected long-term reward and is adopted to measure the quality of executing an action \(a_t\) under state \(s_t\), i.e., \(Q_{\pi}(s_t, a_t) = E[R_t | s_t, a_t]\). The optimal value \(Q_{\pi}^{*}\) returns the maximum \(R_t\) considering all policies, and it is obtainable using the Bellman equation

\[
Q_{\pi}^{*}(s_t, a_t) = E_{r_t, s_{t+1} \sim E}\left[ r_{t+1} + \gamma \max_{a_{t+1}} Q_{\pi}^{*}(s_{t+1}, a_{t+1}) \,\middle|\, s_t, a_t \right]. \tag{22}
\]

B. DQN-Based Strategy

As one of the leading technologies in the design of intelligent systems, deep RL, and in particular the deep Q-network (DQN), has become a hotspot in the area of machine learning. The main idea behind DQN is to implement the deep NN as a function approximator to find the optimal value function.

In this application, a DQN algorithm is developed for the concerned energy management problem as shown in Fig. 7 [42], which integrates the nonlinear function approximator (e.g., deep NN) and conventional RL.

Fig. 7. Illustration of the DQN-based energy management system.

In particular, this algorithm utilizes the deep NN to estimate the Q-value by (22), described as the Bellman equation, and explores the actions with proper value outcome. The NN of DQN is trained with samples from a replay buffer R to minimize the correlation between sequential data. During the training procedure of the DQN, a minibatch of tuples \((s_t, a_t, r_t, s_{t+1})\) is randomly selected from the buffer R to update the weight coefficients of the NNs. To avoid the risk of divergence resulting from the direct implementation of the NN, a separate target network is also adopted to compute the target values. To preserve the right balance between exploration and exploitation, the actions are chosen by an ε-greedy policy, i.e., a random action is chosen with probability ε, while the action with the maximum Q-value is chosen with probability 1 − ε. The parameters of the action-value network are trained by minimizing the loss function \(L(\theta^{Q})\) in the following equation [30], [31]:

\[
L(\theta^{Q}) = E\left[ \left( r + \gamma \max_{a_{t+1}} Q'(s_{t+1}, a_{t+1}, \theta^{Q'}) - Q(s_t, a_t, \theta^{Q}) \right)^{2} \right] \tag{23}
\]

where \(Q(s_t, a_t, \theta^{Q})\) is the output of the main NN with the weight coefficients \(\theta^{Q}\), and \(r + \gamma \max_{a_{t+1}} Q'(s_{t+1}, a_{t+1}, \theta^{Q'})\) is the target Q-value approximated by the target NN with the weight coefficients \(\theta^{Q'}\).

The pseudocode of the standard DQN used for the energy management problem is depicted in Algorithm 1 and the training loop of the algorithm is shown in Fig. 8 [43].

C. Design of the Q-Network

Conventionally, the table-based RL algorithms (TBRLAs; e.g., Q-learning, SARSA, and so on) are adopted to approximate the optimal action-value function. Although the TBRLAs can deal with MDP problems in which both the state and action spaces are discrete, it is difficult for them to learn from the energy management of an all-electric ferry with a
high-dimensional state space. To overcome this difficulty, a deep feedforward NN is designed in this article as the Q-network to approximate the optimal action-value function. The architecture of the designed Q-network is shown in Fig. 9. According to Fig. 9, the input of the Q-network is the state s of the energy management problem at time step t, and the output of the network approximates the action value for each relevant action a. In order to diminish the gradient-explosion problem, a rectified linear unit (ReLU) is employed as the activation function for the hidden layers, given as

\[
\phi(x) = \begin{cases}
x, & \text{if } x > 0 \\
0, & \text{otherwise.}
\end{cases} \tag{24}
\]

Fig. 9. Structural diagram of the Q-network.

Algorithm 1 Pseudocode of the Standard DQN Algorithm
Initialize replay buffer R to capacity N
Initialize action-value network with random weights \(\theta^{Q}\)
Initialize target action-value network with weights \(\theta^{Q'} \leftarrow \theta^{Q}\)
for episode = 1 to MaxEpisode do
    for t = 1 to time length of the driving cycle T do
        Select a random action with probability ε
        Otherwise select \(a_t = \arg\max_{a} Q(s_t, a | \theta^{Q})\)
        Apply action \(a_t\) to the environment
        Observe reward \(r_t\) and next state \(s_{t+1}\)
        Store transition \((s_t, a_t, r_t, s_{t+1})\) into replay buffer R
        Sample random minibatch of \((s_i, a_i, r_i, s_{i+1})\) from R
        If the episode terminates at step i + 1, set \(y_i = r_i\); else set \(y_i = r_i + \gamma \max_{a_{i+1}} Q'(s_{i+1}, a_{i+1} | \theta^{Q'})\)
        Perform a gradient descent step on \((y_i - Q(s_i, a_i | \theta^{Q}))^{2}\)
    end
end

Fig. 8. Flowchart of energy management of an all-electric ferry boat based on deep RL.

D. Multiobjective MDPs

MOMDP is an extension of the standard MDP, where the environment returns a vector of m rewards, i.e., \(\vec{r}_t = [r_{1,t}, \ldots, r_{m,t}]^{T}\). In MOMDP, a vector of discounted returns is used to evaluate a policy π (i.e., a solution). Thus,

\[
\left[ E\left( \sum_{t'=0}^{\infty} \gamma^{t'} r_{1,t+t'} \right), \ldots, E\left( \sum_{t'=0}^{\infty} \gamma^{t'} r_{m,t+t'} \right) \right]. \tag{25}
\]

The goal in MO problems is met by discovering a set of Pareto solutions, or one or more solutions that fulfill the decision-maker’s priority. Current strategies in MOMDP often adopt scalarization functions to reduce the dimensionality of the underlying MO environment to a single scalar value [33], [44].

In this approach, a scalarization function f converts the MO value \(V^{\pi}\) of a policy π to a scalar value. In this article, a linear scalarization is used, i.e., \(f(V^{\pi}, w) = w \cdot V^{\pi}\), where w is the weight vector that determines the importance of each objective.

VII. REAL-TIME VERIFICATIONS

In this section, deep RL with the MO framework is implemented for the energy management problem and examined on the experimental testbed shown in Fig. 10. The dSPACE MicroLabBox with a DS1202 PowerPC Dual Core 2-GHz processor board and a DS1302 I/O board is utilized to verify the learning capability of the MODQN algorithm from a real-time perspective. In the energy management context, the dSPACE testbed is implemented to offer a power HIL (PHIL) simulation by interfacing the DQN algorithm section to the energy management part.

Fig. 10. Photograph of PHIL platform for testing the suggested energy management scheme.
TABLE V. SIZE AND NUMBER OF EACH RESOURCE
TABLE VI. VALUE OF OBJECTIVE FUNCTIONS (LOLE AND COSTS)
Fig. 11. Value of optimization parameters.
Fig. 12. Power dispatch of each resource.

As stated earlier, fuel cells and batteries are included in the proposed energy system to reach a zero-emission and cost-effective ship. The design parameters in the energy management optimization problem are the nominal power of the fuel cells and batteries (discrete parameters) and all hourly transferred powers (continuous parameters). Considering that the highest hourly load is about 670 kW, the nominal power of the fuel cells and batteries is selected so that the sum of the maximum fuel cell power and 30% of the maximum battery capacity is more than 670 kW and less than 900 kW. The design space of the other parameters is defined based on the selected nominal power of the fuel cells and batteries. The final result of the power management problem, which is the power transfer of the resources for each hour, is shown in Fig. 11. These values are the transferred power from fuel cells to supply the loads (\(P_{F-L}\)), transferred power from fuel cells to charge the batteries (\(P_{F-B}\)), transferred power from batteries to supply the loads (\(P_{B-L}\)), transferred power from cold ironing to supply the loads (\(P_{C-L}\)), and transferred power from cold ironing to charge the batteries (\(P_{C-B}\)).
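The sizing rule stated above (maximum fuel cell power plus 30% of the maximum battery capacity must lie strictly between 670 and 900 kW) can be checked mechanically. The following Python sketch is only an illustration of that rule under stated assumptions. It follows the text literally in adding the fuel cell power in kW to 30% of the battery figure in kWh, and the function names and candidate sizes are hypothetical rather than taken from Table V.

```python
from itertools import product
from typing import Iterable, List, Tuple

PEAK_LOAD_KW = 670.0            # highest hourly load quoted in the text
UPPER_BOUND_KW = 900.0          # upper limit quoted in the text
USABLE_BATTERY_FRACTION = 0.30  # share of the battery counted toward supply


def is_feasible(fc_total_kw: float, battery_total_kwh: float) -> bool:
    """True if fuel cell power plus 30% of the battery capacity lies
    strictly between the peak load and the upper bound."""
    available = fc_total_kw + USABLE_BATTERY_FRACTION * battery_total_kwh
    return PEAK_LOAD_KW < available < UPPER_BOUND_KW


def feasible_sizings(fc_options_kw: Iterable[float],
                     battery_options_kwh: Iterable[float]
                     ) -> List[Tuple[float, float]]:
    """Enumerate every (fuel cell, battery) pair that passes the rule."""
    return [(fc, bat)
            for fc, bat in product(fc_options_kw, battery_options_kwh)
            if is_feasible(fc, bat)]
```

Calling `feasible_sizings([400, 500, 800], [600, 712])` with these made-up candidates keeps only the pairs whose combined figure falls inside the window, which is how the discrete design space could be pruned before the hourly power flows are optimized.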
Fig. 13. Power output of fuel cells and remained H2 in 24 h.
Fig. 14. Power output and SOC of batteries in 24 h.

The results of the proposed energy system have been compared with those of another system, named the “current system,” whose energy system is similar in structure to the proposed one. The current system is part of a project in the Energy Technology Development and Demonstration Program [35]. For the current system, a rule-based method, which applies several rules based on the system’s situation, is employed to solve the power management problem. The method considered here is an SOC-based power management strategy that uses the system’s SOC and the loads in each hour to distribute the required power among the available energy resources. The required information related to this method is available in [45]. As can be seen from Table V, the proposed method uses a larger number of units of each resource than the current system, each of a smaller size. LOLE therefore improves, because the system is built from smaller resources. The values of the objective functions for both the current and proposed systems, i.e., LOLE and costs, are tabulated in Table VI, as obtained by the two power management methods (MODQN and rule-based). Table VI shows that the MODQN method gives better results than the rule-based method for the total cost. As can be seen, not only is the total cost of the proposed system lower than that of the current system (1035 versus 1047 $/day), but the LOLE index of the proposed method is also significantly improved (0.1 versus 3.4 day/year).
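As a quick check on this comparison, the two reported objective vectors can be tested for Pareto dominance and compared under the linear scalarization \(f(V, w) = w \cdot V\) used in this article. The sketch below is illustrative only. The weight vectors are hypothetical, since the weights actually used are not reported here, and both objectives are treated as quantities to be minimized.

```python
from typing import Sequence


def linear_scalarization(values: Sequence[float],
                         weights: Sequence[float]) -> float:
    """f(V, w) = w . V for two vectors of equal length."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every minimized objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# (total cost in $/day, LOLE in day/year) as quoted in the text.
PROPOSED = (1035.0, 0.1)
CURRENT = (1047.0, 3.4)

# Hypothetical weight vectors, chosen only to illustrate the comparison.
for weights in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    proposed_score = linear_scalarization(PROPOSED, weights)
    current_score = linear_scalarization(CURRENT, weights)
    print(weights, proposed_score < current_score)
```

Because the proposed system is better in both objectives, it dominates the current system, so its scalarized value is lower for any weight vector with nonnegative entries that are not all zero.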
The power management results of the proposed energy system for each resource and the total loads in each hour are shown in Fig. 12. Fig. 12 shows that all loads in each hour are completely supplied and that the optimization method correctly satisfies the power supply constraint. The output power of the fuel cells over 24 h is shown in Fig. 13, which gives the hourly power transferred from the fuel cells to the loads and batteries. The remaining percentage of H2 after each hour is also shown in Fig. 13. It can be seen that the optimization problem accurately exploits the fuel cells’ power to avoid a lack of H2 at the time of sailing.

The charging of the batteries by fuel cells and cold ironing, the discharging of the batteries to the loads, and the SOC of the batteries are shown in Fig. 14. The right side of Fig. 14 shows that the SOC is always more than 400 kWh, which is more than 30% of the total battery capacity. This constraint protects the battery from overdischarging.

In order to analyze the applicability of the proposed energy management method to other types of hybrid energy systems, and also to compare it with a common energy system, a real-world energy system type that has been widely used in the literature [46]–[48] is considered here. The system includes diesel generators, batteries, and cold ironing, and its main goals are to reduce the operating hours of the diesel engines, decrease the total cost, and maintain the SOC at a certain level. The proposed method is applied to a system with a diesel generator and several battery packs, in which the sizes of the selected components are: 1) a 600-kW diesel generator and 2) a set of 712-kWh batteries (two 125 kWh and seven 66 kWh). The comparison results, including LOLE, total cost, and the amounts of pollution, are presented in Table VII. From the reliability viewpoint, LOLE is larger for the common system because of the diesel generator and its high forced outage rate. It can be seen that the total cost of the common system is slightly less than that of the emission-free system, owing to the lower energy prices of the diesel generators. It should be noted that the diesel generators have a lower IC than fuel cells, but their daily IC is higher because of their shorter lifetime. Finally, Table VII shows the high pollution rates of the common system, which can cause monetary penalties in some places. Consequently, although the proposed system is applicable to different hybrid energy systems, including the traditional diesel power systems, its potential is best revealed when used in zero-emission ships, given the specific features of these systems such as their use of renewable energy sources and variant loads.

TABLE VII. COMPARISON OF THE HYBRID ENERGY SYSTEMS

VIII. FUTURE WORKS AND POTENTIAL INDUSTRIAL APPLICATIONS OF THE PROPOSED STRATEGY

Given the growing interest in the use of green energy to reduce air pollution, energy and emission management systems are increasingly making use of renewable energy technology. Therefore, future studies could continue this line of research by investigating the feasibility of using other sources of green energy, such as photovoltaic and wind energy, as power supplies in the systems. The possibility of using a heat recovery method to convert the waste heat of the fuel cells to electric power can be another topic for future research to explore. Moreover, it is known that increasing the number of inverter-based renewable generations and batteries (instead of using conventional rotating generators) can lead to a reduction in the total inertia of the systems, and this correspondingly affects the frequency-related issues and stability of the systems. Therefore, future research can explore the possibility of using virtual inertia when designing control methods to solve the stability issues related to marine systems.

On the other hand, future research can further use the proposed strategy to solve various energy management problems. For instance, in vehicle electrification design, the battery is the most crucial component in an electric vehicle and requires energy management. Likewise, solving energy management problems in modern distribution power systems that include conventional resources, renewables, and electric vehicles needs a powerful optimization approach to meet several constraints and minimize the system’s operational costs and emissions. Another application for the proposed strategy can be in-home energy management. A home energy management system controls the electrical loads and power resources, such as air conditioners, electric vehicles, energy storage systems, and renewable energy resources, in order to match their operation with the behavior of the residential consumption when the electricity price changes.

IX. CONCLUSION

This article explored the feasibility of managing the energy sources of zero-emission marine systems while assessing the reliability, specifically for a ferry boat. In the hybrid energy system, fuel cells, batteries, and cold ironing were appropriately planned in an MG form for ship systems. In the energy management problem, the optimization parameters included the powers hourly transferred from all energy resources to the loads, as well as the size and capacity of these resources. The objective functions were the cost of the generated power and a reliability metric, namely LOLE, which is the expected number of days in a specified period in which the load exceeds the available capacity. Since the problem is characterized by two conflicting objectives, it was dealt with through the DQN technique in an MO manner. The results revealed that the proposed energy management method correctly satisfied all constraints while minimizing the OC as well as reaching a standard criterion for LOLE. In addition, a consideration of the two standards, i.e., DNVGL-ST-0033 and DNVGL-ST-0373, shows that the proposed method can be used for industrial applications as well. Finally, real-time simulations were conducted to show the performance and efficacy of the proposed method for emission-free ships.

REFERENCES

[1] J. J. Minnehan and J. W. Pratt, “Practical application limits of fuel cells and batteries for zero emission vessels,” Sandia Nat. Lab. (SNL-NM), Albuquerque, NM, USA, Tech. Rep. SAND2017-12665, 2017.
[2] S. Hasanvand, M. Nayeripour, E. Waffenschmidt, and H. Fallahzadeh-Abarghouei, “A new approach to transform an existing distribution network into a set of micro-grids for enhancing reliability and sustainability,” Appl. Soft Comput., vol. 52, pp. 120–134, Mar. 2017.
[3] M.-H. Khooban et al., “Robust frequency regulation in mobile microgrids: HIL implementation,” IEEE Syst. J., vol. 13, no. 4, pp. 4281–4291, Dec. 2019.
[4] T. V. Vu, D. Gonsoulin, F. Diaz, C. S. Edrington, and T. El-Mezyani, “Predictive control for energy management in ship power systems under high-power ramp rate loads,” IEEE Trans. Energy Convers., vol. 32, no. 2, pp. 788–797, Jun. 2017.
[5] R. Heydari, M. Gheisarnejad, M. H. Khooban, T. Dragicevic, and F. Blaabjerg, “Robust and fast voltage-source-converter (VSC) control for naval shipboard microgrids,” IEEE Trans. Power Electron., vol. 34, no. 9, pp. 8299–8303, Sep. 2019.
[6] S. Mashayekh and K. L. Butler-Purry, “An integrated security-constrained model-based dynamic power management approach for isolated microgrids in all-electric ships,” IEEE Trans. Power Syst., vol. 30, no. 6, pp. 2934–2945, Nov. 2015.
[7] J. S. Chalfant and C. Chryssostomidis, “Analysis of various all-electric-ship electrical distribution system topologies,” in Proc. IEEE Electr. Ship Technol. Symp., Apr. 2011, pp. 72–77.
[8] M. Amin Zamani, T. S. Sidhu, and A. Yazdani, “Investigations into the control and protection of an existing distribution network to operate as a microgrid: A case study,” IEEE Trans. Ind. Electron., vol. 61, no. 4, pp. 1904–1915, Apr. 2014.
[9] A. Kahrobaeian and Y. A.-R.-I. Mohamed, “Interactive distributed generation interface for flexible micro-grid operation in smart distribution systems,” IEEE Trans. Sustain. Energy, vol. 3, no. 2, pp. 295–305, Apr. 2012.
[10] J. M. Guerrero, M. Chandorkar, T.-L. Lee, and P. C. Loh, “Advanced control architectures for intelligent microgrids—Part I: Decentralized and hierarchical control,” IEEE Trans. Ind. Electron., vol. 60, no. 4, pp. 1254–1262, Apr. 2013.
[11] R. Majumder, “Reactive power compensation in single-phase operation of microgrid,” IEEE Trans. Ind. Electron., vol. 60, no. 4, pp. 1403–1416, Apr. 2013.
[12] H. S. V. S. K. Nunna and S. Doolla, “Multiagent-based distributed-energy-resource management for intelligent microgrids,” IEEE Trans. Ind. Electron., vol. 60, no. 4, pp. 1678–1687, Apr. 2013.
[13] A. Chaouachi, R. M. Kamel, R. Andoulsi, and K. Nagasaka, “Multiobjective intelligent energy management for a microgrid,” IEEE Trans. Ind. Electron., vol. 60, no. 4, pp. 1688–1699, Apr. 2013.
[14] S. Faddel, A. A. Saad, T. Youssef, and O. Mohammed, “Decentralized control algorithm for the hybrid energy storage of shipboard power system,” IEEE J. Emerg. Sel. Topics Power Electron., vol. 8, no. 1, pp. 720–731, Mar. 2020.
[15] W. Lhomme and J. P. Trovão, “Zero-emission casting-off and docking maneuvers for series hybrid excursion ships,” Energy Convers. Manage., vol. 184, pp. 427–435, Mar. 2019.
[16] R. Tang, X. Li, and J. Lai, “A novel optimal energy-management strategy for a maritime hybrid energy system based on large-scale global optimization,” Appl. Energy, vol. 228, pp. 254–264, Oct. 2018.
[17] J. Hou, J. Sun, and H. Hofmann, “Control development and performance evaluation for battery/flywheel hybrid energy storage solutions to mitigate load fluctuations in all-electric ship propulsion systems,” Appl. Energy, vol. 212, pp. 919–930, Feb. 2018.
[18] N. Bigdeli, “Optimal management of hybrid PV/fuel cell/battery power system: A comparison of optimal hybrid approaches,” Renew. Sustain. Energy Rev., vol. 42, pp. 377–393, Feb. 2015.
[19] M. Banaei, M. Rafiei, J. Boudjadar, and M.-H. Khooban, “A comparative analysis of optimal operation scenarios in hybrid emission-free ferry ships,” IEEE Trans. Transp. Electrific., vol. 6, no. 1, pp. 318–333, Mar. 2020.
[20] S. Fang, Y. Xu, Z. Li, T. Zhao, and H. Wang, “Two-step multi-objective management of hybrid energy storage system in all-electric ship microgrids,” IEEE Trans. Veh. Technol., vol. 68, no. 4, pp. 3361–3373, Apr. 2019.
[21] A. Letafat et al., “Simultaneous energy management and optimal components sizing of a zero-emission ferry boat,” J. Energy Storage, vol. 28, Apr. 2020, Art. no. 101215.
[22] A. I. Sarwat, A. Domijan, M. H. Amini, A. Damnjanovic, and A. Moghadasi, “Smart grid reliability assessment utilizing Boolean driven Markov process and variable weather conditions,” in Proc. North Amer. Power Symp. (NAPS), Charlotte, NC, USA, Oct. 2015, pp. 4–6.
[23] S. Hasanvand, M. Nayeripour, S. A. Arefifar, and H. Fallahzadeh-Abarghouei, “Spectral clustering for designing robust and reliable multi-microgrid smart distribution systems,” IET Gener., Transmiss. Distrib., vol. 12, no. 6, pp. 1359–1365, 2018.
[24] M. Nayeripour, H. Fallahzadeh-Abarghouei, S. Hasanvand, and M.-E. Hassanzadeh, “Interactive fuzzy binary shuffled frog leaping algorithm for multi-objective reliable economic power distribution system expansion planning,” J. Intell. Fuzzy Syst., vol. 29, no. 1, pp. 351–363, Sep. 2015.
[25] T. Mesbahi, N. Rizoug, P. Bartholomeüs, R. Sadoun, F. Khenfri, and P. Le Moigne, “Optimal energy management for a li-ion battery/supercapacitor hybrid energy storage system based on a particle swarm optimization incorporating Nelder–Mead simplex approach,” IEEE Trans. Intell. Vehicles, vol. 2, no. 2, pp. 99–110, Jun. 2017.
[26] Q. Wei, D. Liu, F. L. Lewis, Y. Liu, and J. Zhang, “Mixed iterative adaptive dynamic programming for optimal battery energy control in smart residential microgrids,” IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4110–4120, May 2017.
[27] R. Xiong, J. Cao, and Q. Yu, “Reinforcement learning-based real-time power management for hybrid energy storage system in the plug-in hybrid electric vehicle,” Appl. Energy, vol. 211, pp. 538–548, Feb. 2018.
[28] T. Liu, X. Hu, S. E. Li, and D. Cao, “Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle,” IEEE/ASME Trans. Mechatronics, vol. 22, no. 4, pp. 1497–1507, Aug. 2017.
[29] C. Wang, J. Wang, Y. Shen, and X. Zhang, “Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach,” IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 2124–2136, Mar. 2019.
[30] Y. Wu, H. Tan, J. Peng, H. Zhang, and H. He, “Deep reinforcement learning of energy management with continuous control strategy and traffic information for a series-parallel plug-in hybrid electric bus,” Appl. Energy, vol. 247, pp. 454–466, Aug. 2019.
[31] H. Hua, Y. Qin, C. Hao, and J. Cao, “Optimal energy management strategies for energy Internet via deep reinforcement learning approach,” Appl. Energy, vol. 239, pp. 598–609, Apr. 2019.
[32] M. Gheisarnejad, J. Boudjadar, and M.-H. Khooban, “A new adaptive type-II fuzzy-based deep reinforcement learning control: Fuel cell air-feed sensors control,” IEEE Sensors J., vol. 19, no. 20, pp. 9081–9089, Oct. 2019.
[33] A. Abels, D. M. Roijers, T. Lenaerts, A. Nowé, and D. Steckelmacher, “Dynamic weights in multi-objective deep reinforcement learning,” 2018, arXiv:1809.07803. [Online]. Available: http://arxiv.org/abs/1809.07803
[34] M. M. Hasan, K. Lwin, M. Imani, A. Shabut, L. F. Bittencourt, and M. A. Hossain, “Dynamic multi-objective optimisation using deep reinforcement learning: Benchmark, algorithm and an application to identify vulnerable zones based on water quality,” Eng. Appl. Artif. Intell., vol. 86, pp. 107–135, Nov. 2019.
[35] M. H. Khooban, M. Gheisarnejad, H. Farsizadeh, A. Masoudian, and J. Boudjadar, “A new intelligent hybrid control approach for DC–DC converters in zero-emission ferry ships,” IEEE Trans. Power Electron., vol. 35, no. 6, pp. 5832–5841, Jun. 2020.
[36] Det Norske Veritas and Germanischer Lloyd. Accessed: May 2019. [Online]. Available: http://rules.dnvgl.com/docs/pdf/DNVGL/ST/2019-05/DNVGL-ST-0033.pdf
[37] Det Norske Veritas and Germanischer Lloyd. Accessed: May 2016. [Online]. Available: http://rules.dnvgl.com/docs/pdf/DNVGL/ST/2016-05/DNVGL-ST-0373.pdf
[38] R. Billington and R. N. Allan, “Reliability evaluation of power systems,” 1984.
[39] A. P. Sanghvi, N. J. Balu, and M. G. Lauby, “Power system reliability planning practices in North America,” IEEE Trans. Power Syst., vol. 6, no. 4, pp. 1485–1492, Nov. 1991.
[40] M. A. Fotouhi Ghazvini, H. Morais, and Z. Vale, “Coordination between mid-term maintenance outage decisions and short-term security-constrained scheduling in smart distribution systems,” Appl. Energy, vol. 96, pp. 281–291, Aug. 2012.
[41] S. R. U. Electric Utility Commission. Accessed: Sep. 16, 2016. [Online]. Available: https://austinenergy.com
[42] Y. Hu, W. Li, K. Xu, T. Zahid, F. Qin, and C. Li, “Energy management strategy for a hybrid electric vehicle based on deep reinforcement learning,” Appl. Sci., vol. 8, no. 2, p. 187, Jan. 2018.
[43] X. Han, H. He, J. Wu, J. Peng, and Y. Li, “Energy management based on reinforcement learning with double deep Q-learning for a hybrid electric tracked vehicle,” Appl. Energy, vol. 254, Nov. 2019, Art. no. 113708.
[44] H. Mossalam, Y. M. Assael, D. M. Roijers, and S. Whiteson, “Multi-objective deep reinforcement learning,” 2016, arXiv:1610.02707. [Online]. Available: http://arxiv.org/abs/1610.02707
[45] J. Han, J.-F. Charpentier, and T. Tang, “An energy management system of a fuel cell/battery hybrid boat,” Energies, vol. 7, no. 5, pp. 2799–2820, 2014.
[46] M. Kalikatzarakis, R. D. Geertsma, E. J. Boonen, K. Visser, and R. R. Negenborn, “Ship energy management for hybrid propulsion and power supply with shore charging,” Control Eng. Pract., vol. 76, pp. 133–154, Jul. 2018.
[47] M. D. A. Al-Falahi, K. S. Nimma, S. D. G. Jayasinghe, H. Enshaei, and J. M. Guerrero, “Power management optimization of hybrid power systems in electric ferries,” Energy Convers. Manage., vol. 172, pp. 50–66, Sep. 2018.
[48] E. Skjong, T. A. Johansen, M. Molinas, and A. J. Sørensen, “Approaches to economic energy management in diesel–electric marine vessels,” IEEE Trans. Transp. Electrific., vol. 3, no. 1, pp. 22–35, Mar. 2017.
