Keywords: Deep reinforcement learning; Energy management strategy; Fuel cell; Hybrid electric vehicle; TD3

Abstract: Vehicles using a single fuel cell as a power source often have problems such as slow response and inability to recover braking energy; therefore, the current automobile market is mainly dominated by fuel cell hybrid vehicles. In this study, a fuel cell hybrid commercial vehicle is taken as the research object, a fuel cell/battery/supercapacitor energy topology is proposed, and an energy management strategy based on the twin-delayed deep deterministic policy gradient (TD3) is designed for this topology. The strategy takes fuel cell hydrogen consumption, fuel cell life loss, and battery life loss as the optimization goals, in which the supercapacitor plays the role of coordinating the power output of the fuel cell and the battery, providing more optimization range for the fuel cell and the battery. Compared with the deep deterministic policy gradient (DDPG) strategy and a nonlinear programming strategy, the proposed strategy reduces the hydrogen consumption level, fuel cell loss level, and battery loss level, which greatly improves the economy and service life of the power system. The proposed EMS is based on the TD3 algorithm in deep reinforcement learning and optimizes several indicators simultaneously, which helps prolong the service life of the power system.
* Corresponding author.
E-mail addresses: akitaw@foxmail.com (J. Wang), zhoujianhao@nuaa.edu.cn (J. Zhou), zwz@nuaa.edu.cn (W. Zhao).
https://doi.org/10.1016/j.geits.2022.100028
Received 14 February 2022; Received in revised form 8 May 2022; Accepted 4 August 2022
Available online 19 September 2022
2773-1537/© 2022 The Authors. Published by Elsevier Ltd on behalf of Beijing Institute of Technology Press Co., Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
J. Wang et al. Green Energy and Intelligent Transportation 1 (2022) 100028
1. Introduction

Fuel cell hybrid vehicles generally contain more than one power source, and their energy management is more complicated than that of single-power-sourced fuel cell electric vehicles. In the automotive field, the existing energy management strategies (EMS) for multiple power sources generally take fuel economy and travelling endurance as control goals. EMS can generally be divided into rule-based control strategies and optimization-based control strategies. With the development of artificial intelligence (AI) technology, many scholars have begun to apply various AI algorithms to EMS, so AI-based EMS have gradually emerged.

Rule-based EMS are easy to implement and apply, but require sufficient experience and on-line calibration. Generally, such an EMS is formulated by the designer, based on experience, from the current road conditions and an understanding of the hybrid power system; the main variants include state machine/operation-mode control, power-following control, power decoupling, and fuzzy logic control (FLC) [1–4]. Wang et al. [5] proposed an EMS for hybrid electric vehicles (HEV) with fuel cell/battery/supercapacitor power sources, in which the power distribution among the three power sources was performed in real time with the help of power demand prediction; its hydrogen consumption and SOC maintenance were verified under real driving conditions. Gao et al. [6] proposed an EMS for a fuel cell hybrid bus, in which FLC was applied to determine the power allocation of the fuel cell based on the required power and regenerative braking power; experiments verified that the strategy can follow the demand power well.

Optimization-based EMS can be divided into global optimization strategies and real-time optimization strategies. In general, a global optimization algorithm is performed offline under the premise of known driving conditions or power requirements; examples include the dynamic programming algorithm (DP), the genetic algorithm (GA), and particle swarm optimization. In reality, road conditions change dynamically, which makes implementation in the Vehicle Control Unit (VCU) impractical, and the high computational cost also limits application in energy management. However, since its result is a global optimum, it can provide a data basis for other online and real-time control strategies. Xu et al. [7] proposed a DP-based EMS to optimize the driving cost of fuel cell and lithium battery.

Different from global optimization, a real-time optimization algorithm allocates energy by minimizing the instantaneous cost function of the system at a greatly reduced computational cost; examples include Maximum Power Point Tracking (MPPT), the Equivalent Consumption Minimization Strategy (ECMS), and Model Predictive Control (MPC). Han et al. [8] proposed an ECMS based on an adaptive equivalence factor: during the driving of a fuel cell hybrid vehicle in a fixed operating condition, the ECMS automatically selects the optimal equivalence factor according to the operating conditions. Shen et al. [9] used a fuzzy modeling framework to build a robust MPC controller, with the Linear Matrix Inequality (LMI) technique used to express the constraints of the optimization problem; its control effect was verified, but the lack of consideration of fuel cell and battery efficiency and loss is likely to cause unnecessary cost.

With the increasing application of AI-based algorithms, many researchers have begun to adopt deep reinforcement learning (DRL) based EMS. DRL combines the perception ability of deep learning with the decision-making ability of reinforcement learning. In terms of EMS development, deep learning mainly determines the parameters of the control algorithm, perceives the current state of the environment, or predicts the state of the next moment for further control and analysis, while reinforcement learning can control and make decisions based on real-time feedback. Therefore, unlike traditional control schemes, DRL algorithms are able to learn control actions through continuous trial-and-error interactions with the environment under appropriate reward and punishment mechanisms [10,11]. Moreover, DRL does not require a detailed physical model and can continuously learn and optimize control actions, so it is very suitable for complex dynamic systems and can optimize multiple objectives simultaneously. Liu et al. [12] proposed a Q-learning-based EMS to allocate engine torque, which revealed near-optimal performance in comparison to a DP-based EMS. Yuan et al. [13] proposed a Q-learning-based EMS for plug-in FCHEVs that optimizes fuel cell start-stop, which has a certain effect on suppressing fuel cell aging. Reddy et al. [14] proposed a DRL-based EMS to reduce battery loss and fuel consumption and improve economy while maintaining battery SOC. Li et al. [15] used deep reinforcement learning to develop an EMS for series hybrid electric vehicles, integrating historically accumulated trip information to achieve more effective control of the state of charge in DRL-based EMS. Li et al. [16] proposed a DRL-based EMS for an electric vehicle hybrid battery system, formulated according to the electrical and thermal characteristics of the battery and aiming to reduce energy losses and improve the electrical and thermal safety of the entire system. Han et al. [17] proposed an EMS for dual-motor-driven hybrid crawler vehicles based on a double deep Q-learning algorithm, which prevents the training process from falling into an over-optimistic estimate of the policy value, and highlighted its significant advantages in convergence rate and optimization performance. In conclusion, the above scholars have successfully applied the reinforcement learning idea to the energy management of fuel cell hybrid electric vehicles, providing ideas and references for this research. However, the EMS proposed in the above research are not well suited to energy topologies with multiple power sources, such as fuel cell/battery/supercapacitor.

A fuel cell and lithium battery hybrid energy source (HES) solves the slow-response problem of single fuel cell powered vehicles. However, this power topology also has certain defects, such as rapid depletion of battery health. Therefore, fuel cell/lithium battery/supercapacitor based HES were proposed to improve the health and longevity of the battery. Liu et al. [18] proposed a fuel cell/battery/supercapacitor based EMS and used ADVISOR to carry out a systematic simulation analysis of a hybrid vehicle. Compared with fuel cell/battery vehicles, the supercapacitor provides more optimization range for the fuel cell and battery pack, improving the economy and service life of the power system; the disadvantage is that it increases the complexity of the system and requires more optimization objectives to be considered. Most of the existing EMS for three power sources only optimize hydrogen consumption and lack consideration of fuel cell degradation and battery longevity. In this case, the EMS obtained by training and verification cannot achieve good optimization results in other complex systems, and the generalization ability is limited. Cai et al. [19] proposed a decentralized EMS based on a hybrid virtual-impedance-droop fuel cell/battery/supercapacitor hybrid system and verified the reliability of the strategy by numerical simulation. This strategy integrates various evaluation indicators of dynamic systems and achieves good optimization results, but has poor migration ability and
Within the above-mentioned mathematical model, the simulation accuracy of the stack characteristics depends on a group of parametric coefficients. These parameters were identified using a GA method based on the experimental results provided by the manufacturer [21].

Regarding the fuel cell system, the net power P_fc denotes the difference between the gross power P_stack and the auxiliary power P_aux, which can be computed by:

P_stack = U_fc I_st,  P_fc = P_stack − P_aux    (6)

The electric air compressor system and cooling system are the main auxiliary equipment of the fuel cell system [22]. The power demand of the compressor can be depicted as:

P_aux ≈ P_cp = (C_p T_air) / (η_mech η_mot) · [ (p_out / p_in)^((κ−1)/κ) − 1 ] · q_air    (7)

where P_cp refers to the compressor power; C_p and T_air denote the heat capacity and temperature of air, respectively; κ denotes the adiabatic coefficient; p_in and p_out are the inlet and outlet pressures of air, respectively; q_air is the mass flow rate of air; η_mech and η_mot stand for the efficiency of the compressor and of its drive motor, respectively.

The hydrogen consumption rate can be computed by [23]:

ṁ_H2 = (N M_H2 / (n F)) I_st    (8)

where N is the cell number of the stack, M_H2 is the molar mass of hydrogen, n is the number of transferred charges, and F is the Faraday constant.

Meanwhile, for a given hydrogen consumption rate, the efficiency of the fuel cell can be defined as the ratio of the output power of the fuel cell to the power generated by the hydrogen:

η_fc = P_fc / (ṁ_H2 · LHV)    (9)

It can be known from the equivalent circuit that the battery output voltage V_b can be expressed as follows:

V_b = V_oc − I_b R_int    (10)

The output power P_b of the lithium battery is as follows:

P_b = V_b I_b    (11)

Assuming that the output power of the lithium battery is known, the current of the lithium battery can be calculated according to the following formula:

I_b = (V_oc − sqrt(V_oc² − 4 R_int P_b)) / (2 R_int)    (12)

The characteristic curve of a single cell of the lithium battery used in the experiment is shown in Fig. 4.

Fig. 4. Lithium battery characteristic curve.
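The battery relations in Eqs. (10)–(12) can be sketched in Python as follows. The numeric cell parameters below are illustrative placeholders, not the identified pack parameters from the paper:

```python
import math

def battery_current(p_b, v_oc, r_int):
    """Eq. (12): current drawn from the open-circuit model V_b = V_oc - I_b*R_int
    for a requested terminal power P_b = V_b * I_b (discharge positive)."""
    disc = v_oc ** 2 - 4.0 * r_int * p_b
    if disc < 0.0:
        raise ValueError("requested power exceeds what the cell can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_int)

def battery_terminal_voltage(i_b, v_oc, r_int):
    """Eq. (10): terminal voltage under load."""
    return v_oc - i_b * r_int

# Illustrative single-cell values (placeholders): 3.6 V open circuit, 2 mOhm.
i = battery_current(p_b=50.0, v_oc=3.6, r_int=0.002)
v = battery_terminal_voltage(i, v_oc=3.6, r_int=0.002)
assert abs(v * i - 50.0) < 1e-6  # Eq. (11): P_b = V_b * I_b is recovered
```

Taking the smaller root of the quadratic V_oc I_b − R_int I_b² = P_b corresponds to the physically stable operating point with the lower current.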
2.4. Supercapacitor
across the ideal capacitor and the terminal voltage of the supercapacitor, respectively, and I_c is the output current of the supercapacitor.

Fig. 5. Supercapacitor RC circuit.

It can be seen from the RC circuit that the output power P_sc of the supercapacitor can be expressed as follows:

P_sc = V_c I_c = (V_c_oc − I_c R_c) I_c    (14)

Similar to the lithium battery, at time k the SOC of the supercapacitor can be expressed as follows:

SOC_sc(k) = SOC_sc(k−1) − I_c(k) / Q_c    (15)

where Q_c is the maximum charge of the supercapacitor.

In addition, the voltage of the supercapacitor is closely related to its state of charge. In general, the ideal open-circuit voltage at the current moment can be expressed as follows:

V_c_oc(k) = SOC_sc(k) · V_c_oc_max    (16)

where V_c_oc_max is the maximum value of the supercapacitor voltage.

2.5. DC/DC converter

A DC-DC converter implementing bidirectional boost and buck operation is necessary. In this work, two DC-DC converters in parallel are required for the DFC, as shown in Fig. 1. In order to facilitate energy analysis and cost calculation, the efficiency of the DC-DC converters is regarded as a fixed value, so that the DC-DC converter can be modeled as follows:

where U_req and I_req represent the requested voltage and current from the DC bus, respectively; η_DCDC refers to the efficiency of the DC-DC converter.

2.6. Motor

The motor used in this paper is directly connected to the end of the drive shaft; it can work as a motor to provide traction torque, or as a generator to absorb braking torque for regenerative braking. The motor efficiency η_m depends on the working mode and is related to the motor torque T_mot and speed ω_mot; the motor power is calculated as follows:

P_mot = P_req = { sgn(P_req) T_mot ω_mot / η_m,   (motor mode)
                  sgn(P_req) η_m T_mot ω_mot,     (generator mode) }    (18)

where P_mot is the output power of the motor and P_req is the required power of the vehicle.

The motor efficiency map used in this paper is shown in Fig. 6. With the speed and torque known, the motor efficiency can be obtained by table lookup.

2.7. Longitudinal dynamics model of logistics truck

The research object of this paper is a fuel cell hybrid logistics truck, whose main parameters are shown in Table 1. According to the power balance relationship during driving, the required power is obtained as follows:

P_req = (v / 1000) (m g f cos α + 0.5 ρ C_D A v² + m a + m g sin α)    (19)
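Equation (19) can be sketched as follows. The parameter values used in the example are illustrative placeholders, not the Table 1 truck data:

```python
import math

def demand_power_kw(v, a, m, f=0.01, c_d=0.6, area=6.0,
                    rho=1.225, grade=0.0, g=9.81):
    """Eq. (19): required power in kW at speed v (m/s) and acceleration a (m/s^2)
    for a vehicle of mass m (kg), rolling resistance coefficient f, drag
    coefficient c_d, frontal area (m^2), air density rho (kg/m^3),
    and road angle grade (rad)."""
    force = (m * g * f * math.cos(grade)        # rolling resistance
             + 0.5 * rho * c_d * area * v ** 2  # aerodynamic drag
             + m * a                            # inertial force
             + m * g * math.sin(grade))         # grade resistance
    return v * force / 1000.0                   # W -> kW

# Illustrative: a 10 t truck cruising at 20 m/s on a level road.
p = demand_power_kw(v=20.0, a=0.0, m=10000.0)  # -> 37.26 kW with these placeholders
```

This P_req is the quantity the three power sources must jointly cover through the constraint P_fc + P_b + P_sc = P_req used later in the optimization.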
y = min(y_1, y_2)    (21)

The use of two Critics for training can effectively suppress the deviation caused by over-estimation of the Q value, but each network update step generates a small error, and after multiple updates the error is amplified; ultimately, the inaccurate Q value leads to a high-variance problem. In order to reduce the variance, a twin-delayed update manner is adopted: the current Actor and the target networks are not updated immediately after each Critic update. The other networks are updated only after the Critic has been updated N times, and the update of the target networks continues the soft-update method of the traditional DDPG algorithm.

After addressing the high-variance problem, the value function needs to be estimated more accurately with respect to the error itself [19], so a certain noise μ is added to the target Q value as follows:

y_i = r + γ Q′(s′, a′ + μ; ω′)    (22)

The pseudo-code of the TD3 algorithm is shown in Table 3.

Table 3
Pseudo-code of TD3.

Pseudo-code for offline training of the TD3 algorithm
1: Initialize the network parameters ω_1, ω_2 of the two Critic networks and the Actor's network parameters θ
2: Initialize the Critic and Actor networks in the target network: ω_1′ ← ω_1, ω_2′ ← ω_2, θ′ ← θ
3: Empty the experience replay pool R
4: For episode = 1 to M do
5:   Begin with an Ornstein-Uhlenbeck (OU) noise N for exploration
6:   Observe initial state s_1
7:   For t = 1 to T do
8:     Select action a_t = π_θ(s_t) + N by the current policy and exploration noise
9:     Execute action a_t
10:    Observe the reward r_t fed back by the system
11:    Observe the system state at the next moment s_{t+1}
12:    Store the transition (s_t, a_t, r_t, s_{t+1}) into R
13:    Sample a random mini-batch of m transitions from R
14:    Add noise to the target Actor output: ã ← π_θ′(s′) + μ, μ ← clip(N(0, σ̃), −c, c)
15:    Calculate the target Q value using the smaller of the target Critic outputs: y_t = r_t(s_t, a_t) + γ min_{i=1,2} Q′(s_{t+1}, ã; ω_i′)
16:    Minimize the loss function to update the Critic networks: J(ω_i) = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, a_j; ω_i))²
17:    If t mod N then
18:      Update the Actor network by the policy gradient: ∇J(θ) = (1/m) Σ_{j=1}^{m} ∇_a Q(s_j, a; ω)|_{a=π_θ(s_j)} · ∇_θ π_θ(s)|_{s=s_j}
19:      Soft-update the target networks: ω_i′ ← τ ω_i + (1 − τ) ω_i′, θ′ ← τ θ + (1 − τ) θ′
20:    End if
21:  End for
22: End for

3.2. Design of reward function

The block diagram of the TD3-based EMS is shown in Fig. 8. The input states consist of vehicle speed, acceleration, battery SOC and supercapacitor SOC, and the control variables are the fuel cell power and battery power. The reward function is an important factor for the offline training of the TD3-based EMS. The control goal of this study is to optimize hydrogen consumption and the longevity of the fuel cell and battery, while maintaining the SOC of the battery and supercapacitor in an appropriate range. The TD3-based EMS is employed to coordinate the power output of the fuel cell and the battery, so that the control target can be better achieved.

The reward function is, in effect, the optimization goal of the offline training of the TD3 algorithm. The control objectives of this study are to reduce hydrogen consumption and reduce the lifetime loss of the fuel cell and battery, while maintaining the battery SOC, thereby reducing the overall cost of energy system operation.

A large number of studies have shown that the degradation of fuel cells is mainly caused by low load, high load, and frequent load changes, so the fuel cell life loss Ċ_fc can be expressed as follows:

Ċ_fc = C_low + C_high + C_change    (23)

In the above formula, C_low is the life loss of the fuel cell at low load; C_high is the life loss at high load; C_change is the life loss under frequent load changes.

The life loss of the fuel cell during low-load operation can be expressed as a function of the low-load operation time T_low of the fuel cell [29]:

C_low = λ_low T_low M_fc / Ṽ_fc    (24)

where λ_low is the decay rate and M_fc is the cost of the fuel cell. According to literature [25], the average cost of the fuel cell is 593.95 ¥/kW, and the maximum output power of the fuel cell studied in this paper is 76 kW, so the cost of the fuel cell is 76 × 593.95 = 45,140.49 ¥; Ṽ_fc is the voltage drop when the fuel cell is scrapped (generally, the voltage of the fuel cell drops by 10% at rated power).

The degradation cost of the fuel cell under high-load operation can likewise be expressed as a function of the high-load operation time T_high of the fuel cell [29]:

C_high = λ_high T_high M_fc / Ṽ_fc    (25)

where λ_high is the decay rate.

The degradation cost of the fuel cell under load changes can be expressed as a function of the fuel cell power change rate Ṗ_fc as follows [29]:

C_change = ∫ λ_change M_fc |Ṗ_fc| / (1000 n_fc Ṽ_fc) dt    (26)

where λ_change is the decay rate and n_fc is the number of fuel cells.

In addition to the life loss of the fuel cell, this paper also considers the life loss of the battery. Generally, when the battery capacity loss reaches 20%, the battery can no longer be used. Therefore, the battery life cost can be calculated as follows [31]:

Ċ_bat = ΔQ_loss M_bat / 20%    (27)

In the formula, ΔQ_loss is the capacity loss and M_bat is the battery cost. According to literature [25], the battery cost is generally 1,139.43 ¥/kWh. The maximum output power of the lithium battery pack used in this paper is 35 kW, so the battery cost is 1,139.43 × 35 = 39,880.17 ¥.

To sum up, the reward function r_t is set as:

r_t = { −(Ċ_h + Ċ_fc + Ċ_bat + Ċ_bat_soc),      0.1 < SOC_sc < 0.9
        −(Ċ_h + Ċ_fc + Ċ_bat + Ċ_bat_soc + ζ),  otherwise }    (28)

where Ċ_h is the hydrogen consumption of the fuel cell, converted into a price according to the 2020 unit price of hydrogen [29] (25.55 ¥/kg); Ċ_fc is the life loss of the fuel cell, mainly composed of the low-load loss C_low, the high-load loss C_high and the frequent-load-change loss C_change; Ċ_bat is the battery pack life loss; ζ is the penalty factor applied when the supercapacitor exceeds the given SOC range, whose specific value depends on the operating conditions and vehicle model; Ċ_bat_soc is the battery SOC adjustment term that maintains the battery SOC, and its expression is as follows:
Ċ_bat_soc = λ_soc SOĊ    (29)

where SOĊ is the SOC change rate, whose expression has been given in Section 2; λ_soc is the discount coefficient, valued according to the specific vehicle model and operating conditions.

3.3. Formulation of nonlinear programming-based EMS

In order to validate the performance of the TD3-based EMS, this study takes a nonlinear programming-based EMS (NEMS) with the same objective function as the TD3 [22]. The optimization problem is solved by the sequential quadratic programming (SQP) method as follows [30]:

min J = { Ċ_h + Ċ_fc + Ċ_bat + Ċ_bat_soc,      0.1 < SOC_sc < 0.9
          Ċ_h + Ċ_fc + Ċ_bat + Ċ_bat_soc + ζ,  otherwise }

s.t.  P_fc + P_b + P_sc = P_req
      I_st,min ≤ I_st ≤ I_st,max                                    (30)
      |I_st(t) − I_st(t−1)| ≤ ΔI_st,max
      I_charge_lim(SOC) ≤ I_bat ≤ I_discharge_lim(SOC)
      I_sc_charge_lim(SOC_sc) ≤ I_sc ≤ I_sc_discharge_lim(SOC_sc)

where I_st,min and I_st,max are the current limits of the fuel cell; ΔI_st,max is the maximum allowable current change rate of the fuel cell; I_bat is the battery current; I_charge_lim(SOC) and I_discharge_lim(SOC) are the maximum charging and discharging currents of the battery pack; I_sc is the supercapacitor current; I_sc_charge_lim(SOC_sc) and I_sc_discharge_lim(SOC_sc) are the maximum charging and discharging currents of the supercapacitor.

4. Results and discussions

In this research, the proposed EMS was tested and verified under WLTP class 3 conditions. The working condition is shown in Fig. 9. The cycle time of this working condition is 1800 s and the maximum speed is 36.47 m/s. The load requirement of this working condition is more complicated than that of the NEDC, and the speed and acceleration fluctuate greatly, which makes it a very suitable case for verifying the performance of the strategy.

4.1. Analysis of offline training performance

The TD3 algorithm performs the steps in Table 3 for offline training, with the parameter settings listed in Table 4. In order to characterize the offline training process of deep reinforcement learning, the cumulative reward obtained in each step of training is normalized, and the data obtained after taking the positive value, mean_reward, is used to represent the offline training process.

In order to reflect the improvement of the TD3 algorithm over the DDPG algorithm, this study applies the previously mentioned reward function to DDPG, which participates in the energy management of the fuel cell/battery/supercapacitor system in this chapter with the corresponding network settings kept the same. After offline training, the two deep reinforcement learning-based strategies obtained the convergence graphs shown in Fig. 10.

It can be seen from Fig. 10 that at the beginning of training, TD3 explores the environment. Since the Critic network's evaluation of actions was not yet accurate enough, the accumulated rewards in the early stage changed greatly, reflecting that TD3 gathers as much environmental information under different actions as possible so as to better evaluate the output actions. After 35 episodes, the cumulative reward fluctuation decreases and becomes stable, reflecting that the Critic network's evaluation of the actions has become relatively accurate and the TD3 training is complete. The convergence diagram of DDPG in Fig. 10 shows a similar trend, with training completed around episode 31, not much different from TD3. However, the early-stage fluctuation of DDPG is much larger than that of TD3, which may be because the twin-delayed update of TD3 reduces the variance of the Q value in the early stage. It can also be seen that between episodes 38 and 48, DDPG may fall into a local optimum, resulting in an increase in mean_reward, whereas no such problem appears after TD3 converges. This is also because TD3 uses two Critics for training, which effectively suppresses the deviation caused by over-estimation of the Q value and avoids falling into a local optimum.
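The clipped double-Q target with target-policy smoothing (Eqs. (21)–(22); Table 3, steps 14–15) can be sketched as follows. The actor and critic functions here are illustrative stand-ins for the trained target networks, not the networks used in the paper:

```python
import random

def td3_target(r, s_next, target_actor, target_critics,
               gamma=0.99, sigma=0.2, clip_c=0.5, done=False):
    """y = r + gamma * min_i Q_i'(s', a~), with a~ = pi'(s') + clipped noise."""
    noise = max(-clip_c, min(clip_c, random.gauss(0.0, sigma)))  # clip(N(0, sigma), -c, c)
    a_smoothed = target_actor(s_next) + noise
    if done:
        return r
    q_values = [q(s_next, a_smoothed) for q in target_critics]
    return r + gamma * min(q_values)  # Eq. (21): take the smaller critic estimate

# Illustrative stand-ins for the target networks:
actor = lambda s: 0.5 * s
critic1 = lambda s, a: s + a
critic2 = lambda s, a: s + a + 1.0   # deliberately over-estimates; min() discards it
y = td3_target(r=1.0, s_next=2.0, target_actor=actor,
               target_critics=[critic1, critic2])
# min() picks critic1, so y = 1 + 0.99 * (2 + 1 + noise), noise in [-0.5, 0.5]
```

In the full algorithm, the Actor and the target networks are then updated only every N critic updates (step 17), via the soft update ω′ ← τω + (1 − τ)ω′, which is the delayed part of the twin-delayed scheme.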
Fig. 9. Schematic diagram of speed and acceleration under WLTP class 3 operating conditions.
4.2. Analysis of power distribution of HES

Figs. 11–13 are the power allocation diagrams of the TD3-based EMS (TD3), the DDPG-based EMS (DDPG), and the nonlinear programming-based EMS (NEMS) under the same objective function.

Table 4
Parameter settings for offline training of deep reinforcement learning.

Preset parameter              Value
Discount factor               0.99
Actor learning rate           0.001
Critic learning rate          0.01
Experience pool capacity      10,000
Number of replay samples      64
Soft update discount factor   0.01

Fig. 11 shows the output power of the fuel cell under the different EMS. It can be clearly seen that the transient power of the fuel cell under the TD3-based EMS is smaller than under DDPG and NEMS, and the overall output demand on the fuel cell is also smaller. Therefore, it can be inferred that when the load in the power system changes suddenly, the fuel cell loss under the TD3-based strategy will be smaller, and the hydrogen consumption of the fuel cell will also be smaller.

Fig. 12 shows the output power of the battery under the three strategies. From the power curves, since TD3 reduces the power demand on the fuel cell, the power demand on the battery increases accordingly in order to meet the power demand of the entire vehicle, so the battery power changes more frequently. And because the optimization goal of the TD3 strategy includes the battery-life term, and battery life is closely related to transient operating conditions, the transient power of the battery under the TD3 strategy is smaller than under the other two strategies.
Fig. 13 shows the output power of the supercapacitor. Obviously, the supercapacitor under the TD3 strategy has the highest utilization rate and makes full use of the supercapacitor's regulating effect; that is, on the premise of optimizing the fuel cell and battery as much as possible, the supercapacitor makes up the remaining power demand.

Fig. 14 reflects the fuel cell power distribution of the three control strategies under WLTP class 3 conditions. In general, when a fuel cell operates in a more efficient operating region, not only can its average system efficiency be improved, but fuel cell degradation caused by low/high loads is also effectively reduced. Obviously, because low/high-load life loss is included in the optimization objective, the working state of the fuel cell under the TD3 strategy is more stable, and the frequency of operation in the high-efficiency range is also higher. Combining the results in Fig. 14, it can be seen that when TD3 performs power distribution, the operating efficiency matched by the fuel cell output power is kept as high as possible, which reduces the life loss caused by fuel cell start/stop to a certain extent, and the increase in average efficiency also reduces the hydrogen consumption of the fuel cell.

4.3. SOC trajectories of battery and supercapacitor

The battery SOC and supercapacitor SOC trends of the three strategies are shown in Fig. 15. It can be seen from Fig. 15 that TD3 and NEMS constrain the SOC of the lithium battery more strongly than DDPG, which shows that DDPG tends to use the lithium battery in the power distribution process. From the supercapacitor SOC changes reflected in Fig. 15, TD3 is more inclined to use the supercapacitor to compensate the required power in power distribution. Among the three strategies, TD3 can best utilize the supercapacitor to enlarge the optimization space. In addition, TD3 and DDPG are global optimization strategies. Different from the real-time optimization strategy NEMS, the battery SOC or supercapacitor SOC under NEMS is in a lower state than under the other two strategies. This also confirms, from the side, that in order to meet the multi-objective optimization of hydrogen consumption and fuel cell/
battery loss, TD3 and DDPG are more inclined to use supercapacitors to subsequent training does not appear as a rising process like DDPG, which
compensate the required power during power distribution.

4.4. Cost analysis

Hydrogen consumption is one of the optimization indicators of TD3, DDPG and NEMS. This section converts the mass of hydrogen consumed into a price through the unit price of hydrogen. Fig. 16 shows the optimization results of the three strategies for hydrogen consumption. Combined with the SOC curve in Fig. 15, the hydrogen consumption curve and SOC curve of the TD3-based strategy show a relatively stable upward or downward trend over the whole working process, with little fluctuation in slope, which is a typical global optimization trend [29]. By contrast, the slopes of the hydrogen consumption and SOC curves of the other two strategies change greatly, and the curves fluctuate continuously. As for the final result, the hydrogen consumption of TD3 is 36.40% lower than that of DDPG and 50.87% lower than that of NEMS. Although TD3 and DDPG converge at a similar episode, TD3 achieves the better result because of its unique twin-critic and delayed-update mechanism, which makes its estimate of the Q value more accurate and effectively avoids overfitting and falling into a local optimum.

In addition to hydrogen consumption, the optimization objectives of TD3, DDPG and NEMS also include fuel cell life loss and battery life loss. Fuel cell life loss mainly comprises low-load loss, high-load loss and transient-load loss. Fig. 17 compares each loss term of the three strategies. As can be seen from Fig. 17, NEMS tends to optimize low-load loss, TD3 tends to optimize high-load loss and power battery loss, and DDPG tends to optimize transient-load loss and power battery loss. Although the other two strategies outperform the TD3 strategy on low-load loss, TD3 is better than or equal to them in terms of high-load loss, power battery loss and transient-load loss. On the whole, the TD3 strategy achieves the best optimization effect among the three.

Fig. 18 shows the distribution of overall loss and hydrogen consumption for the three strategies. Compared with DDPG and NEMS, the total cost of TD3 is reduced by 17.36% and 26.83%, respectively.
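The percentage reductions quoted above follow directly from the per-strategy costs reported in Table 5 (values in RMB as tabulated); a quick plain-Python check:

```python
# Costs per strategy taken from Table 5 (RMB).
total = {"TD3": 12.38, "DDPG": 14.98, "NEMS": 16.92}
fc_loss = {"TD3": 4.33, "DDPG": 3.85, "NEMS": 5.29}

def reduction(ours: float, baseline: float) -> float:
    """Relative cost reduction of `ours` versus `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline

# Total cost of TD3 versus the two baselines.
print(round(reduction(total["TD3"], total["DDPG"]), 2))    # 17.36
print(round(reduction(total["TD3"], total["NEMS"]), 2))    # 26.83
# Overall fuel cell loss versus NEMS.
print(round(reduction(fc_loss["TD3"], fc_loss["NEMS"]), 2))   # 18.15
print(round(reduction(fc_loss["DDPG"], fc_loss["NEMS"]), 2))  # 27.22
```

Note that the tabulated component costs are rounded, so summing them does not exactly reproduce the printed totals; the percentages above are computed from the totals as given.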
J. Wang et al. Green Energy and Intelligent Transportation 1 (2022) 100028
Fig. 18. Comparison of the overall loss and the sum of hydrogen consumption of the three strategies.
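The twin-critic, delayed-update mechanism credited above for TD3's more accurate Q-value estimates can be sketched as follows. This is a minimal NumPy illustration of the standard TD3 target computation (Fujimoto et al.'s clipped double Q-learning with target policy smoothing), not the paper's implementation; all names and values are illustrative:

```python
import numpy as np

def td3_target(q1, q2, actor, s_next, r, done, noise,
               gamma=0.99, noise_clip=0.5, a_max=1.0):
    """Clipped double-Q target of TD3: evaluate BOTH target critics at a
    smoothed target action and back up the smaller value, which curbs the
    overestimation bias that a single critic (as in DDPG) suffers from."""
    # Target policy smoothing: perturb the target action with clipped noise.
    a_next = np.clip(actor(s_next) + np.clip(noise, -noise_clip, noise_clip),
                     -a_max, a_max)
    # Take the pessimistic (minimum) of the two critic estimates.
    q_min = np.minimum(q1(s_next, a_next), q2(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min

# Toy check: an over-optimistic second critic is discarded by the min operator.
q1 = lambda s, a: s + a            # plain critic
q2 = lambda s, a: s + a + 0.3      # systematically over-estimates
actor = lambda s: 0.5 * s          # deterministic target policy
y = td3_target(q1, q2, actor,
               s_next=np.array([1.0]), r=np.array([1.0]),
               done=np.array([0.0]), noise=np.array([0.0]))
# y = 1 + 0.99 * min(1.5, 1.8) = 2.485
```

The "delayed" part of the mechanism is that the actor and the slowly tracking target networks are updated only every few critic steps, which further stabilizes the Q-value estimates the policy is trained against.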
For TD3, it can be clearly seen that hydrogen consumption, fuel cell loss (including the three loss terms) and battery loss show a uniform distribution trend. Given that the high-load loss is small enough to be neglected, the fuel cell loss of TD3 is reduced by 18.15% compared with NEMS over one cycle, and the fuel cell loss of DDPG is reduced by 27.22% compared with NEMS.

Compared with DDPG and NEMS, the battery loss of TD3 is reduced by 2.16% and 15.62%, respectively. Although the fuel cell loss of TD3 is higher than that of DDPG, the overall cost of TD3 is lower than that of DDPG because of its reduced hydrogen consumption and power battery loss.

The specific values of the compared indicators are given in Table 5. The comparison with DDPG and NEMS verifies the performance of TD3 in reducing hydrogen consumption, reducing fuel cell loss and prolonging battery life.

Table 5
Summary table of various indicators.

                                        TD3       DDPG      NEMS
Hydrogen consumption cost (RMB)         3.97      6.25      8.08
Low load loss cost of fuel cell (RMB)   2.69      2.45      2.28
Fuel cell high load loss cost (RMB)     0.0067    0.0095    0.015
Fuel cell transient loss cost (RMB)     1.63      1.30      2.39
Overall loss cost of fuel cell (RMB)    4.33      3.85      5.29
Battery loss cost (RMB)                 4.07      4.93      4.16
Total cost (RMB)                        12.38     14.98     16.92

5. Conclusion

In this study, a novel EMS is formulated based on the TD3 algorithm in deep reinforcement learning and used to solve the cost optimization problem of long-distance logistics trucks with a battery/fuel cell/supercapacitor power structure. For the fuel cell/battery/supercapacitor hybrid system proposed in this paper, a multi-objective optimal EMS is established based on the TD3 algorithm, simultaneously considering the hydrogen consumption level, the fuel cell life loss level and the battery life loss level. The method effectively suppresses the aging of core components, optimizes the hydrogen consumption of the fuel cell, and improves the operating life and economy of the entire power system. Compared with the strategy based on the nonlinear programming algorithm and the strategy based on DDPG, the TD3 strategy is more effective in optimizing the evaluation indices: the optimized power system works more stably and operates in the high-efficiency range more often, thereby reducing the losses of the power system and achieving a lower overall cost, which greatly improves the economy of the power system.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China [Grant No. 51805254]. Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsors. The authors would like to thank the reviewers for their helpful corrections and insightful suggestions.

References

[1] Xie C, Quan S, Du C. Research on energy management system of fuel cell electric vehicle[J]. Automotive Engineering 2007;29(9):758–60.
[2] Wang Q, Xiao Y, Qi W. Research on vehicle energy management of fuel cell hybrid electric vehicle[J]. Power Technology 2012;(10):1459–62.
[3] Zhang S. Research on electric vehicle charging strategy and control technology based on dual energy sources[D]. Chongqing University of Technology; 2019.
[4] Zhang C, Dong J, Liu J, et al. Control strategy of battery and supercapacitor hybrid energy storage system[J]. Journal of Electrotechnical Technology 2014;(4):334–40.
[5] Wang Y, Sun Z, Chen Z. Rule-based energy management strategy of a lithium-ion battery, supercapacitor and PEM fuel cell system[J]. Energy Procedia 2019;158:2555–60.
[6] Gao D, Jin Z, Lu Q. Energy management strategy based on fuzzy logic for a fuel cell hybrid bus[J]. Journal of Power Sources 2008;185(1):311–7.
[7] Xu L, Ouyang M, Li J, et al. Dynamic programming algorithm for minimizing operating cost of a PEM fuel cell vehicle[C]. In: 2012 IEEE International Symposium on Industrial Electronics; 2012. p. 1490–5.
[8] Han J, Park Y, Kum D. Optimal adaptation of equivalent factor of equivalent consumption minimization strategy for fuel cell hybrid electric vehicles under active state inequality constraints[J]. Journal of Power Sources 2014;267:491–502.
[9] Shen D, Lim CC, Shi P. Robust fuzzy model predictive control for energy management systems in fuel cell vehicles[J]. Control Engineering Practice 2020;98:104364.
[10] Vazquez-Canteli JR, Nagy Z. Reinforcement learning for demand response: a review of algorithms and modeling techniques[J]. Applied Energy 2019;235:1072–89.
[11] Hu Y, Li W, Xu K, et al. Energy management strategy for a hybrid electric vehicle based on deep reinforcement learning[J]. Applied Sciences 2018;8(2):187.
[12] Liu T, Zou Y, Liu D, et al. Reinforcement learning of adaptive energy management with transition probability for a hybrid electric tracked vehicle[J]. IEEE Transactions on Industrial Electronics 2015;62(12):7837–46.
[13] Yuan J, Yang L, Chen Q. Intelligent energy management strategy based on hierarchical approximate global optimization for plug-in fuel cell hybrid electric vehicles[J]. International Journal of Hydrogen Energy 2018;43(16):8063–78.
[14] Reddy NP, Pasdeloup D, Zadeh MK, et al. An intelligent power and energy management system for fuel cell/battery hybrid electric vehicle using reinforcement learning[C]. In: 2019 IEEE Transportation Electrification Conference and Expo (ITEC); 2019. p. 1–6.
[15] Li Y, He H, et al. Deep reinforcement learning-based energy management for a series hybrid electric vehicle enabled by history cumulative trip information[J]. IEEE Transactions on Vehicular Technology 2019;68(8):7416–30.
[16] Li W, Han C, et al. Deep reinforcement learning-based energy management of hybrid battery systems in electric vehicles[J]. Journal of Energy Storage 2021;36:102355.
[17] Han X, He H, et al. Energy management based on reinforcement learning with double deep Q-learning for a hybrid electric tracked vehicle[J]. Applied Energy 2019;254:113708.
[18] Liu L. Simulation analysis and control of fuel cell hybrid electric vehicle multi-energy system[D]. Jilin University; 2007.
[19] Cai K, Chen J, Song Q. Decentralized energy management strategy of fuel cell/battery/supercapacitor hybrid electric vehicle[C]. In: Proceedings of the 2020 China Automation Conference (CAC2020). China Society of Automation; 2020. p. 1–5.
[20] Larminie J. Fuel cell systems explained. 2nd ed.[M]. John Wiley; 2003.
[21] Amphlett JC, Baumert RM, Mann RF, et al. Performance modeling of the Ballard Mark IV solid polymer electrolyte fuel cell II. Empirical model development[J]. Journal of The Electrochemical Society 1995;142(1):9–15.
[22] Hu X, Zou C, Tang X, et al. Cost-optimal energy management of hybrid electric vehicles using fuel cell/battery health-aware predictive control[J]. IEEE Transactions on Power Electronics 2019;35(1):382–92.
[23] Sarioglu L, Klein OP, Schroder H, et al. Energy management for fuel-cell hybrid vehicles based on specific fuel consumption due to load shifting[J]. IEEE Transactions on Intelligent Transportation Systems 2012;13(4):1772–81.
[24] Zhang C, Allafi W, Dinh Q, et al. Online estimation of battery equivalent circuit model parameters and state of charge using decoupled least squares technique[J]. Energy 2018;142:413–20.
[25] Hua L. Development of supercapacitor and new energy vehicle[J]. Automobile & Parts 2009;(4):26–32.
[26] Liu C. Supercapacitor and its application in new energy vehicles[J]. Chinese Journal of Power Sources 2010;34(12):1223–5.
[27] Li H, Zhao B, et al. Battery/supercapacitor energy management for streetcars[J]. Battery Bimonthly 2022;51(1):48–52.
[28] Zhang F, Li J, Li Z. A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment[J]. Neurocomputing 2020;411:206–15.
[29] Sun Z, Wang Y, Chen Z, et al. Min-max game based energy management strategy for fuel cell/supercapacitor hybrid electric vehicles[J]. Applied Energy 2020;267:115086.
[30] Schittkowski K. NLPQL: a FORTRAN subroutine solving constrained nonlinear programming problems[J]. Annals of Operations Research 1985;5:485–500.
[31] Li Y, He H, et al. Deep reinforcement learning-based energy management for a series hybrid electric vehicle enabled by history cumulative trip information[J]. IEEE Transactions on Vehicular Technology 2019;68(8):7416–30.