
Applied Energy 286 (2021) 116541


Early and robust remaining useful life prediction of supercapacitors using BOHB optimized Deep Belief Network
Muhammad Haris ∗, Muhammad Noman Hasan, Shiyin Qin
School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

ARTICLE INFO

Keywords:
Supercapacitor
RUL estimation
Deep Belief Network
BOHB optimization

ABSTRACT

The lifespan, power density, and transient response make supercapacitors a component of choice for the electric vehicle and renewable energy industries. Supercapacitors' long lifecycle often makes it difficult for designers to assess a system's reliability over the complete product cycle. In the existing literature, remaining useful life (RUL) estimations utilize up to 50% of the state of health (SOH) degradation data to successfully predict the RUL of supercapacitors with reasonable accuracy, making them impractical in terms of the time and resources required to collect the data. The time needed to acquire data imposes restrictions on developing a data-driven RUL prediction model for supercapacitors. The objective of this study is to reliably predict the SOH degradation curve of supercapacitors with less than 10% of the degradation data available, avoiding time- and cost-consuming lifecycle testing. This study presents a novel combination of a deep learning algorithm, the Deep Belief Network (DBN), with Bayesian Optimization and HyperBand (BOHB) to predict the RUL of supercapacitors in the early phases of degradation. The proposed method successfully predicts the degradation curve using the data of the initial 15 thousand cycles (less than 6% of the data for training in most cases), which is very promising since the supercapacitor has yet to show much degradation at this stage, thus reducing the time to develop the RUL prediction model by up to 54%. The proposed model shows good accuracy, with percent error and root mean squared error (RMSE) ranging from 0.05% to 2.2% and 0.8851 to 1.6326, respectively. The robustness of the model is also tested by injecting noise into the training data during training.

∗ Corresponding author.
E-mail address: haris@buaa.edu.cn (M. Haris).

https://doi.org/10.1016/j.apenergy.2021.116541
Received 13 June 2020; Received in revised form 1 January 2021; Accepted 19 January 2021
Available online 28 January 2021
0306-2619/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Supercapacitors have high power density, a wide operating temperature range, and a longer service life than conventional batteries [1]. Their low internal resistance, high charge/discharge rate, and efficiency make them ideal for modern energy storage systems [2]. Supercapacitors find application in energy storage [3], microgrids [4], renewable energy [5], and hybrid vehicles [6]. They are widely used in electric vehicles, since they can provide and absorb power at a much higher rate than batteries during acceleration and deceleration of the vehicle [7]. In [8], the authors used supercapacitors to harvest energy from a vehicle's regenerative shock absorber.

Many factors influence the lifetime of supercapacitors, including charge and discharge rate, temperature, and charge voltage [9]. Supercapacitors exhibit a change in Equivalent Series Resistance (ESR) and capacitance due to aging [10]. These parameters affect the State of Health (SOH) of supercapacitors. Since supercapacitors are used as short-term, high-power absorbing and delivering storage devices, the monitoring and prediction of their SOH are critical for the system's health. The prognostics and health management (PHM) of supercapacitors prevents premature failure and increases the reliability of the system. Early prediction of the Remaining Useful Life (RUL) of supercapacitors will open new areas for their use and optimization. Manufacturers can analyze a device's aging behavior using initial degradation data, which allows them to perform an early test for end-of-life estimation. The RUL prediction of supercapacitors is challenging due to the non-linear degradation and the dependence on a wide variety of operating conditions [11].

There are mainly two methods to predict the RUL of a device: the physics of failure (PoF) based approach and the data-driven approach. PoF-based prediction uses a physical or mathematical model to precisely describe the behavior and aging mechanism of the device [12]. In [11], an aging model of the supercapacitor is proposed based on Eyring's law. It considers the pre-exponential factor as a non-linear function of the aging temperature. This method reduces the RMSE to 2.79%, compared to the classical Eyring model (9.7%) and the modified Eyring model (3.39%). A particle filter-based approach is used in [13] to estimate supercapacitors' RUL based on capacitance and internal resistance thresholds.
Nomenclature

μ      Mean of Gaussian noise
σ      Standard deviation of Gaussian noise
ANN    Artificial Neural Network
BO     Bayesian Optimization
BOHB   Bayesian Optimization and HyperBand
CNN    Convolutional Neural Network
DBN    Deep Belief Network
EI     Expected Improvement
ESR    Equivalent Series Resistance
F      Farads
HB     HyperBand
KDE    Kernel Density Estimator
LSTM   Long–Short Term Memory
MAE    Mean Absolute Error
MAPE   Mean Absolute Percentage Error
ML     Machine Learning
PHM    Prognostics and Health Management
RBM    Restricted Boltzmann Machine
RMSE   Root Mean Squared Error
RNN    Recurrent Neural Network
RUL    Remaining Useful Life
SOH    State of Health
SVM    Support Vector Machine
TPE    Tree Parzen Estimator
V_R    Rated voltage of the supercapacitor

Unlike model-based predictions, data-driven predictions rely on large amounts of data and statistical inferences instead of complex mathematical models [14]. Machine learning (ML) methods such as the Artificial Neural Network (ANN), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and DBN are now widely used in data-driven RUL and EOL estimation of complex systems, owing to advances in computational speed. The problem with these algorithms is finding the optimal values of the architectural parameters (hyperparameters), such as the number of layers, batch size, and learning rates. The prediction performance of a machine-learning-based RUL estimation model depends heavily on these parameters. It is therefore necessary to find optimal parameter values without consuming excessive computation time. The supercapacitor's RUL estimation model requires a considerable amount of testing before an accurate data-driven model can be developed, since supercapacitors have a typical lifetime of 500,000 cycles. According to [15], the testing time required for the expected cycle lifetime of a 5 Farad (F) supercapacitor is 2011 days (5.5 years) at 25 °C and 961 days at 64 °C. Furthermore, due to continuous advancement in supercapacitor technology, capacities keep increasing, which further increases the time and resources needed to collect data for RUL estimation. This imposes limitations on developing a data-driven model, because a sufficient amount of training data is needed to create a model with low prediction errors. In this regard, the development of a RUL estimation model for supercapacitors poses two major challenges:

1. Early prediction: The measurement of supercapacitor degradation data requires years of testing on the real system, creating difficulties for designers and manufacturers in validating the system's performance. Early prediction means that the RUL estimation model should deliver precise results using a smaller amount of degradation data. Early predictions will help manufacturers reliably detect any abnormality in supercapacitors' degradation in the early stages, reducing development costs.

2. Robustness: Early predictions may result in low prediction accuracy because a smaller amount of data is available for training. The resulting RUL estimation model should be robust enough to provide reasonable prediction accuracy for unobserved data. It should also perform well on new datasets while maintaining the same network architecture and hyperparameters.

Significance and contribution of the study:

This paper presents a deep learning method to provide early and robust estimations of RUL for supercapacitors. This reduces the need to develop a physics-based model for estimating the degradation curve, since the nature of electrochemical degradation in supercapacitors is difficult to model. For this purpose, a Deep Belief Network (DBN) based prediction model is developed for the early RUL estimation of supercapacitors. The reason for choosing the DBN is its extensive learning capability, owing to its multi-layer structure and back-propagation learning architecture. As mentioned earlier, the prediction performance of the DBN is highly dependent on its hyperparameters. To provide robust performance on large prediction data, we used Bayesian Optimization and HyperBand (BOHB) to find the optimal hyperparameters that give the best prediction accuracy. We also demonstrate that our proposed model performs better than a Bayesian optimized DBN (BO-DBN) and a HyperBand optimized DBN (HB-DBN) in terms of finding better optimized parameters faster, hence showing reasonable efficiency. The proposed BOHB-DBN model uses the initial 15 000 cycles' data, which accounts for less than 6% of the data, to predict the remaining 94% of the degradation curve with good accuracy. To show the universality of the developed model, we trained the model on different degradation profiles using the same hyperparameter configuration. The model shows excellent results on the other supercapacitor degradation profiles, which demonstrates that the algorithm can be used to estimate the RUL of supercapacitors with good precision. We also illustrate that the proposed model performs well when trained on noisy data, which shows its robustness and adaptability to new data. Table 1 summarizes previous works related to data-driven RUL estimation of supercapacitors and the amount of measurement data needed for good prediction accuracy. As shown in Table 1, the RUL estimation techniques in the previous literature require large amounts of data to train the model, which requires more time for the measurement and collection of data. In this regard, there is a need to devise a method that predicts degradation trends and estimates End of Life (EOL) using a smaller amount of degradation data. Previous works on data-driven RUL estimation use at least 50% of the degradation data to develop the supercapacitor's RUL prediction model.

Table 1
Previous works on RUL estimation of supercapacitors.

Reference   RUL estimation model                                          Maximum data used for predictions
[16]        Genetic algorithm with Sequential Quadratic Programming       60%
            optimized Long–short term memory (LSTM)
            Recurrent Neural Network (RNN)
[17]        LSTM RNN                                                      70%
[18]        Neo-fuzzy neuron model                                        50%
[13]        Particle filter                                               70%
This study  BOHB optimized Deep Belief Network                            6%

2. Proposed framework

2.1. Deep Belief Network (DBN)

The DBN is made up of multiple stacked Restricted Boltzmann Machine (RBM) layers, which are the key building blocks of the network [19], as illustrated in Fig. 1. An RBM consists of two layers of neurons: a hidden layer and a visible layer. Every node of the visible layer and hidden layer is connected bi-directionally, in which training is
carried out through unsupervised learning of each RBM, called the pre-training stage. The main purpose of RBM training is to choose initial parameters (weights and biases) that maximize the likelihood estimation for the rebuilding of the training samples. The architecture of an RBM with M visible neurons (v) and N hidden neurons (h) is shown in Fig. 2.

Fig. 1. An illustration of a Deep Belief Network consisting of three layers of RBM.

Fig. 2. An RBM model having M visible neurons (v) and N hidden neurons (h).

The hidden and visible neurons have binary stochastic values such that h ∈ {0, 1}^N and v ∈ {0, 1}^M. The energy function of the joint hidden–visible layer structure is given by Eqs. (1) and (2):

E(v, h) = -\alpha^T v - \beta^T h - v^T w h    (1)

E(v, h) = -\sum_{i=1}^{M} \alpha_i v_i - \sum_{j=1}^{N} \beta_j h_j - \sum_{i=1}^{M} \sum_{j=1}^{N} v_i w_{ij} h_j    (2)

where α_i and β_j represent the biases of the visible and hidden neurons, respectively, and w_ij is the weight between the visible neuron (v_i) and the hidden neuron (h_j). The probability distribution of the joint structure (v, h) can be calculated by Eq. (3):

P(v, h) = \frac{e^{-E(v,h)}}{\sum_{(v,h)} e^{-E(v,h)}}    (3)

The denominator in Eq. (3) is the partition function, or normalization factor, over all configurations of the visible neurons v and hidden neurons h. The conditional probability distributions of the hidden neurons h and visible neurons v, given the state vectors v_i of the visible layer and h_j of the hidden layer, are given by Eq. (4) and Eq. (5), respectively:

p(h_j = 1 \mid v) = \sigma\left(\beta_j + \sum_{i=1}^{M} w_{ij} v_i\right)    (4)

p(v_i = 1 \mid h) = \sigma\left(\alpha_i + \sum_{j=1}^{N} w_{ij} h_j\right)    (5)

where σ(.) is the sigmoid function, expressed by Eq. (6):

\sigma(x) = \frac{1}{1 + e^{-x}}    (6)

As discussed above, the task of the RBM is to set the initial parameters of the weights (w) and biases (α and β). These initial parameters can be evaluated by maximizing the log-likelihood function, represented by Eq. (7):

\log L(\theta) = \sum_{n} \log p(V_n)    (7)

where θ = {α, β, w} denotes the model parameters of the RBM and n is the number of training samples. The gradient of the log-likelihood function with respect to the model parameters is computed as follows:

\frac{\partial \log p(v)}{\partial \alpha_i} = v_i - \sum_{v} p(v)\, v_i = E[v_i]_{real} - E[v_i]_{model}    (8)

\frac{\partial \log p(v)}{\partial \beta_j} = p(h_j = 1 \mid v) - \sum_{v} p(v)\, p(h_j = 1 \mid v) = E[h_j]_{real} - E[h_j]_{model}    (9)

\frac{\partial \log p(v)}{\partial w_{ij}} = p(h_j = 1 \mid v)\, v_i - \sum_{v} p(v)\, p(h_j = 1 \mid v)\, v_i = E[v_i h_j]_{real} - E[v_i h_j]_{model}    (10)

where E[.]_{real} and E[.]_{model} represent the probabilities of the actual data and the reconstructed data, respectively. Given the values of v and h, E[v_i h_j]_{real} can be calculated by Eq. (11):

E[v_i h_j]_{real} = p(h \mid v)\, v^T    (11)

Given the training dataset χ, E[v_i h_j]_{real} can easily be computed from Eq. (11) by simply setting v ← χ. However, calculating E[v_i h_j]_{model} is complex due to the presence of the partition function in Eq. (3). One possible solution to this problem is proposed in [20]. In this method, Gibbs sampling is performed on the visible neurons, starting from any random state, until certain convergence criteria are achieved. Gibbs sampling consists of obtaining the hidden states h_j by presenting training samples to the visible neuron layer v_i, using Eq. (4). The visible states can then be reconstructed from the hidden states using Eq. (5). The learning rules for the three parameters can be represented by Eqs. (12)–(14):

\Delta \alpha_i \leftarrow \varphi\, \Delta \alpha_i + \delta\, (E[v_i]_{real} - E[v_i]_{model})    (12)

\Delta \beta_j \leftarrow \varphi\, \Delta \beta_j + \delta\, (E[h_j]_{real} - E[h_j]_{model})    (13)

\Delta w_{ij} \leftarrow \varphi\, \Delta w_{ij} + \delta\, (E[v_i h_j]_{real} - E[v_i h_j]_{model})    (14)

where φ and δ are the momentum and learning rate, respectively. In the final stage, the DBN is trained layer by layer, starting from the pre-training stage's initial parameters, by a supervised back-propagation algorithm that fine-tunes these parameters more efficiently. This fine-tuning aims to minimize the error or cost function, considering the outcome of an additional layer on top of the DBN after the training of each RBM. The error in the output layer for the training data set can be calculated by Eq. (15):

Err = \frac{1}{2} \sum_{n=1}^{N} \left( y_n - h_n \right)^2    (15)

where y_n and h_n are the actual and modeled values at the nth output node, respectively.
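To make the pre-training procedure concrete, the following is a minimal NumPy sketch of one contrastive divergence (CD-1) step for a single RBM, combining the sampling rules of Eqs. (4)–(6) with the momentum learning rules of Eqs. (12)–(14). The batch size, learning rate, and momentum values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Eq. (6)

def cd1_step(v0, w, alpha, beta, vel, momentum=0.5, lr=0.005, rng=None):
    """One CD-1 update for an RBM; v0 has shape (batch, M)."""
    rng = rng or np.random.default_rng()
    # Positive phase: sample hidden states from the data, Eq. (4)
    ph0 = sigmoid(beta + v0 @ w)                      # p(h = 1 | v)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible layer, Eq. (5), then re-infer h
    pv1 = sigmoid(alpha + h0 @ w.T)                   # p(v = 1 | h)
    ph1 = sigmoid(beta + pv1 @ w)
    # Gradient estimates E[.]_real - E[.]_model, Eqs. (8)-(10)
    dw = (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    da = (v0 - pv1).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    # Momentum learning rules, Eqs. (12)-(14): new step = phi*previous + delta*grad
    vel["w"] = momentum * vel["w"] + lr * dw
    vel["a"] = momentum * vel["a"] + lr * da
    vel["b"] = momentum * vel["b"] + lr * db
    return w + vel["w"], alpha + vel["a"], beta + vel["b"]

# Toy usage: an RBM with M = 6 visible and N = 4 hidden units on random binary data
rng = np.random.default_rng(0)
M, N = 6, 4
w, alpha, beta = 0.01 * rng.standard_normal((M, N)), np.zeros(M), np.zeros(N)
vel = {"w": np.zeros((M, N)), "a": np.zeros(M), "b": np.zeros(N)}
data = (rng.random((32, M)) < 0.5).astype(float)
for _ in range(100):
    w, alpha, beta = cd1_step(data, w, alpha, beta, vel, rng=rng)
```

Pre-training stacks such RBMs greedily, with each layer's hidden activations serving as the next layer's visible input, after which the supervised back-propagation pass fine-tunes the whole network against the cost of Eq. (15).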
2.2. Bayesian Optimization and HyperBand (BOHB)

The quality of the predictive capability of the DBN model is strongly related to its hyperparameter configuration. These hyperparameters affect the architecture, learning rates, activation functions, and the number of iterations used to update the weights and biases. To optimize these parameters, researchers often rely on conventional optimization techniques such as random search [21] and grid search algorithms. The machine learning algorithm's performance can be modeled by a function (f : χ → R) of its hyperparameters x ∈ χ. Mathematically, the
optimizer aims to find a global maximizer (or minimizer) of the function, represented by Eq. (16):

x^{\star} = \arg\min_{x \in \chi} f(x)    (16)

We cannot observe f(x) directly due to the inherent randomness in the machine learning algorithm, so we define a function y(x) such that:

y(x) = f(x) + \epsilon    (17)

where ε ∼ N(0, σ²_noise). BOHB combines Bayesian Optimization and HyperBand for hyperparameter optimization, as proposed in [22]. Bayesian Optimization is an iteration-based algorithm that consists of two main components: a probabilistic surrogate model of the objective function, and an acquisition function, which explores new areas of the sample space and exploits areas already known to give better results. Bayesian Optimization uses a probabilistic regression model that defines a predictive distribution p(f | D) based on the already computed data samples D = {(x_0, y_0), (x_1, y_1), …, (x_{n-1}, y_{n-1})}. The acquisition function used in this article is the Expected Improvement (EI) criterion. The Expected Improvement criterion defines the non-negative expected value at location x over the best objective value previously observed, given by Eq. (18):

EI_{y^{*}}(x) = \int \max\left(0,\, y^{*} - f(x)\right)\, dp(f \mid D)    (18)

where y* = min{y_0, y_1, …, y_n}. We used the Tree Parzen Estimator (TPE) as the Bayesian Optimization method [23,24] in this study. Instead of modeling the function f directly by p(f | D), TPE uses kernel density estimators to model the densities over the input configurations, expressed by Eqs. (19) and (20):

l(x) = p(y < y^{*} \mid x, D)    (19)

g(x) = p(y \geq y^{*} \mid x, D)    (20)

To select a new candidate for optimization, TPE maximizes the ratio l(x)/g(x), which is equivalent to optimizing the EI function, as mentioned in [23].
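As an illustration of how Eqs. (19) and (20) translate into a candidate-selection rule, the sketch below splits the observed losses at a percentile, fits a Gaussian KDE to the good and bad configurations, and returns the candidate that maximizes l(x)/g(x). The one-dimensional search space and the 15% split are assumptions made for the example, not values from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(observed_x, observed_y, n_candidates=64, gamma=0.15):
    """Suggest the next hyperparameter value by maximizing l(x)/g(x)."""
    x, y = np.asarray(observed_x), np.asarray(observed_y)
    y_star = np.quantile(y, gamma)             # threshold separating good from bad
    l = gaussian_kde(x[y <= y_star])           # density of good configs, Eq. (19)
    g = gaussian_kde(x[y > y_star])            # density of bad configs, Eq. (20)
    cands = l.resample(n_candidates).ravel()   # candidates are drawn from l(x)
    ratio = l(cands) / np.maximum(g(cands), 1e-12)
    return float(cands[np.argmax(ratio)])      # equivalent to maximizing EI [23]

# Toy usage: minimize f(x) = (x - 0.3)^2 from 20 random starts plus 30 TPE steps
rng = np.random.default_rng(1)
xs = list(rng.random(20))
ys = [(v - 0.3) ** 2 for v in xs]
for _ in range(30):
    x_next = float(np.clip(tpe_suggest(xs, ys), 0.0, 1.0))
    xs.append(x_next)
    ys.append((x_next - 0.3) ** 2)
print(f"best x = {xs[np.argmin(ys)]:.3f}")
```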
HyperBand uses the Successive Halving algorithm proposed in [25] for the optimization of hyperparameters. The Successive Halving algorithm initially allocates a budget to a set of configurations and evaluates the performance of all of them. Based on the performance, the algorithm discards the worst-performing configurations and repeats this until only one configuration is left. Successive Halving suffers from the "budget vs. number of configurations" problem: given a finite budget B and a number of configurations n, it is not clear beforehand whether to assign a smaller budget to each of many configurations or a larger budget to a smaller number of configurations. HyperBand resolves this problem by performing a grid search over several possible randomly sampled values of n configurations. HyperBand allocates the smallest budget to a larger number of configurations, while running some configurations on the maximum budget only.

The selection of configurations by random sampling at the beginning of HyperBand creates difficulty in finding acceptable configurations. According to [26], an arbitrary configuration may result in a low-quality model; thus, the algorithm will evaluate many configurations before finding a decent setting.
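A compact sketch of the Successive Halving loop described above: each round evaluates all surviving configurations on the current budget, keeps the best 1/η of them, and multiplies the budget by η. The survival fraction and geometric budget schedule follow [25], while the toy objective is purely illustrative.

```python
def successive_halving(configs, evaluate, min_budget=1, max_budget=81, eta=3):
    """configs: list of hyperparameter settings; evaluate(config, budget) -> loss."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1 and budget <= max_budget:
        losses = [evaluate(c, budget) for c in survivors]   # run all on this budget
        ranked = sorted(range(len(survivors)), key=lambda i: losses[i])
        keep = max(1, len(survivors) // eta)                # keep the best 1/eta
        survivors = [survivors[i] for i in ranked[:keep]]   # discard the rest
        budget *= eta                                       # raise the budget
    return survivors[0]

# Toy usage: a fake loss that depends on the learning rate and improves with budget
import math
grid = [{"lr": 10.0 ** -k} for k in range(1, 6)]
best = successive_halving(grid, lambda c, b: abs(math.log10(c["lr"]) + 2.5) + 1.0 / b)
print(best)
```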
The Bayesian Optimization and HyperBand (BOHB) algorithm relies on model-based search at the beginning of each HyperBand iteration. After the desired number of configurations is selected, the Successive Halving procedure is performed on these configurations. BOHB optimization follows the same algorithm as HyperBand with Successive Halving, but it uses Bayesian Optimization to select new configurations based on previous trials. The Bayesian Optimization component used in BOHB resembles the TPE (Tree Parzen Estimator) [23], but it uses a single multidimensional Kernel Density Estimator (KDE). In BOHB, a minimum number of configurations N_min is required to fit a useful KDE, which is set to d + 1 evaluations performed, where d is the number of hyperparameters. The algorithm chooses the best and worst configurations after the number of observations D for budget B satisfies |D_B| ≥ N_min + 2. Eqs. (21) and (22) give the number of observations used to model the densities of the best and worst configurations, respectively:

N_{B,l} = \max\left(N_{min},\, q \cdot N_B\right)    (21)

N_{B,g} = \max\left(N_{min},\, N_B - N_{B,l}\right)    (22)

where N_B is the number of data points for budget B and q is the percentile for N_B. In BOHB, EI is optimized by drawing samples S from l(x), Eq. (19). The only difference is that all the bandwidths of the densities in l(x) are multiplied by a factor b, to find and explore more promising configurations.

2.3. BOHB-DBN

In order to get the best performance from the DBN prediction model, we propose to use the BOHB algorithm to optimize the hyperparameters of the model. Fig. 3 illustrates the proposed framework used in this study to predict the RUL of the supercapacitors. The main goal is to minimize the overall RMSE in order to predict the test data with maximum accuracy.

Fig. 3. Proposed framework for RUL prediction of supercapacitors. The DBN is optimized by the BOHB algorithm as per the minimum RMSE criterion.

As illustrated in Fig. 3, the hyperparameter search space is passed to the BOHB algorithm. This search space is user-generated, which allows the BOHB algorithm to optimize the DBN model within the defined hyperparameter configurations. Hence, to ensure optimum performance and reliability, the number of layers, batch size, number of iterations, and the pre-tuning and fine-tuning learning rates were optimized through BOHB. Table 2 shows the hyperparameter search space used to optimize our DBN model.

Table 2
Hyperparameters and their optimization search space.

Hyperparameter                   Distribution         Range
Pre-tuning (RBM) learning rate   Uniform              [1e-4, 1e-1]
Fine-tuning learning rate        Uniform              [1e-4, 1e-1]
Batch size                       Quantized uniform    [1, 8]
Number of hidden layers          Quantized uniform    [5, 20]
Epochs — pre-tuning phase        Quantized uniform    [10, 200]
Epochs — fine-tuning phase       Quantized uniform    [100, 3000]

The BOHB algorithm searches for the best configuration of hyperparameters by evaluating them on the DBN model iteratively, within a given number of evaluations (1800 in our case). After the optimization, the DBN model is trained with the best-evaluated hyperparameters on randomly selected degradation profiles to give the most accurate results. We selected the data of the initial 15 000 cycles as the training data to predict the RUL of the supercapacitors.
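As a concrete counterpart to Table 2, the sketch below defines a comparable search space with the ConfigSpace package used by the reference BOHB implementation (hpbandster). The hyperparameter names are our own, and since Table 2 only labels the learning-rate distributions as "Uniform", plain uniform ranges are used here rather than log-uniform ones.

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

cs = CS.ConfigurationSpace(seed=0)
cs.add_hyperparameters([
    # Continuous learning rates, uniform over [1e-4, 1e-1] as in Table 2
    CSH.UniformFloatHyperparameter("rbm_lr", lower=1e-4, upper=1e-1),
    CSH.UniformFloatHyperparameter("finetune_lr", lower=1e-4, upper=1e-1),
    # Integer-valued ("quantized uniform") architectural parameters
    CSH.UniformIntegerHyperparameter("batch_size", lower=1, upper=8),
    CSH.UniformIntegerHyperparameter("n_hidden_layers", lower=5, upper=20),
    CSH.UniformIntegerHyperparameter("pretrain_epochs", lower=10, upper=200),
    CSH.UniformIntegerHyperparameter("finetune_epochs", lower=100, upper=3000),
])
print(cs.sample_configuration())  # one random draw from the search space
```

BOHB then draws configurations from this space, evaluates them on increasing budgets, and feeds the observed losses back into its KDE model.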
3. Performance evaluation metrics

We select the Root-Mean-Squared Error (RMSE) as the loss function to define the proposed model's performance criteria. RMSE is a standard criterion to measure the performance of a model in predicting quantitative data, given by Eq. (23):

RMSE = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( Ypred_n - Yreal_n \right)^2}    (23)

where N is the number of observations, and Ypred_n and Yreal_n are the nth predicted and actual values, respectively. To further evaluate the prediction results, we also calculate the Mean Absolute Error (MAE), which is the mean of the absolute differences between the true and predicted values over all instances, and R², which is the percentage of the response variable variation that can be explained by a regression model. MAE and R² can be expressed by Eqs. (24) and (25):

MAE = \frac{1}{N} \sum_{n=1}^{N} \left| Ypred_n - Yreal_n \right|    (24)

R^2 = 1 - \frac{\sum_{n=1}^{N} \left( Ypred_n - Yreal_n \right)^2}{\sum_{n=1}^{N} \left( Yreal_n - \overline{Yreal} \right)^2}    (25)

where \overline{Yreal} is the mean of the actual values. To further elaborate and better visualize the model's performance, we also calculate the percent error at all predicted points to show the deviation of the predicted values from the actual values, represented by Eq. (26):

Percent\ error\ (\%) = \frac{\left| Ypred_n - Yreal_n \right|}{Yreal_n} \times 100    (26)
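For completeness, Eqs. (23)–(26) transcribe directly into a few lines of NumPy; only the function name below is our own.

```python
import numpy as np

def evaluation_metrics(y_pred, y_real):
    """Return RMSE, MAE, R^2, and per-point percent error (Eqs. (23)-(26))."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_real = np.asarray(y_real, dtype=float)
    err = y_pred - y_real
    rmse = np.sqrt(np.mean(err ** 2))                                    # Eq. (23)
    mae = np.mean(np.abs(err))                                           # Eq. (24)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_real - y_real.mean()) ** 2)  # Eq. (25)
    percent_error = 100.0 * np.abs(err) / y_real                         # Eq. (26)
    return rmse, mae, r2, percent_error
```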
after 200 iterations, the BOHB found a hyperparameter configuration
4. Results and discussions that gives less than 1 RMSE value, while BO requires 900 iterations.
On the other hand, the minimum value of RMSE found by HyperBand
4.1. Degradation dataset of supercapacitor optimization is 1.0068 after 1800 iterations. Therefore, the BOHB
algorithm is more efficient and effectively gives better results than the
4.2. DBN optimization

The DBN model is optimized by the BOHB algorithm using the degradation data of the initial 15 000 cycles of supercapacitor C1, which is less than 6% of the data, and the results are assessed based on the loss function, as per the procedure described in Section 2.2. Fig. 5(a) shows the result of each evaluated configuration of hyperparameters on the DBN model by BOHB. The BOHB performed 1800 evaluations on different configurations of hyperparameters, as defined in Table 2. The initial evaluations result in higher values of the loss function (RMSE) due to the initial runs on the lower budgets. Over time, the value of the loss function (RMSE) decreases and the prediction accuracy increases, due to the model's training on higher budgets. Table 3 shows the optimal hyperparameter configuration that gives the lowest value of the loss function (overall RMSE) during the 1800 evaluations of the DBN model by BOHB optimization.

Table 3
Best hyperparameter configuration after the BOHB optimization.

Hyperparameter                   Optimized value
Pre-tuning (RBM) learning rate   0.0050
Fine-tuning learning rate        0.0095
Batch size                       2
Number of hidden layers          18
Epochs — pre-tuning phase        88
Epochs — fine-tuning phase       1895

To prove the proposed BOHB-DBN model's robustness and effectiveness, we performed benchmarking against the Bayesian Optimization and HyperBand algorithms separately. The optimization was performed using the same search space and training data. Fig. 5(b) shows the value of the loss function computed at every iteration by the three algorithms. The BOHB algorithm found the lowest value of the loss function, compared to the Bayesian Optimization and HyperBand algorithms. Fig. 5(c) shows the best loss function value found by the three algorithms over the number of evaluations performed. Initially, Bayesian Optimization performs better than the BOHB and HyperBand algorithms, but after some iterations BOHB outperforms both. As can be seen from Fig. 5(c), after 200 iterations the BOHB found a hyperparameter configuration that gives an RMSE value of less than 1, while BO requires 900 iterations. On the other hand, the minimum value of RMSE found by HyperBand optimization is 1.0068 after 1800 iterations. Therefore, the BOHB algorithm is more efficient and gives better results than the other two algorithms. The BOHB also has the highest density of loss function values in the lowest range, compared to Bayesian Optimization and HyperBand, as shown in Fig. A.1. Table 4 shows the best loss function found by the three algorithms during the 1800 optimization iterations.
Table 4
The lowest loss function found by BOHB, Bayesian Optimization and HyperBand during 1800 evaluations.

Algorithm              Best loss value
BOHB                   0.9389
Bayesian Optimization  0.9431
HyperBand              1.0068

Fig. 5. Optimization of the DBN model by the BOHB algorithm. (a) The result of every configuration tested by the BOHB algorithm. (b) Loss function found by Bayesian Optimization, HyperBand and BOHB Optimization on every evaluation. (c) Performance comparison between the BOHB, Bayesian and HyperBand optimization.

4.3. Prediction of RUL

The optimization of the prediction model was performed using less than 6% of the data for training, which gives results with good prediction accuracy, as discussed in Section 4.2. In order to show the usefulness and adaptability of the proposed model even with comparatively small training data, we analyzed the model on different percentages of the training data, using the same network architecture presented in Table 3, for C1. Table 5 shows the proposed model's performance when trained on 30%, 50%, and 70% of the training data of C1. The RMSE decreases as the percentage of training data increases, since the model learns more of the changes in the degradation pattern of C1.

Table 5
Performance of the BOHB-DBN model trained over different percentages of the training data.

Training data  RMSE    R²      MAE
30%            0.9507  0.9585  0.6900
50%            0.8291  0.9684  0.4506
70%            0.7786  0.9721  0.3902

To prove the robustness and versatility of the BOHB-DBN model, we randomly selected the degradation profiles of four other supercapacitors and predicted their RUL. The model is trained with the data of the initial 15 000 cycles, comprising less than 6% of the total degradation data (except for C8), and the prediction performance is analyzed. This time the model's hyperparameters are not re-optimized; the identical training architecture computed in Section 4.2 is used, to show the adaptability and robustness to new data. The model predicts the remaining 94% of the curve with good accuracy, which shows that it can be used to estimate the degradation curve of the supercapacitors in the early stages, as shown in Fig. 6.

Fig. 6. RUL predictions by the proposed BOHB-DBN model. (a) C1: 2.7 V, 65 °C. (b) C3: 2.9 V, 50 °C. (c) C6: 2.7 V, 25 °C.
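The paper does not detail how the trained network rolls the degradation curve forward; one common scheme, sketched below under that assumption, is a sliding-window recursive forecast in which each predicted SOH value is fed back as input for the next step. The window length and the `model.predict` interface are assumed for illustration, not taken from the paper.

```python
import numpy as np

def recursive_forecast(model, history, n_steps, window=50):
    """Roll a one-step-ahead model forward, feeding predictions back as inputs."""
    series = list(history[-window:])           # last observed SOH values
    preds = []
    for _ in range(n_steps):
        x = np.asarray(series[-window:], dtype=float).reshape(1, -1)
        y_next = float(model.predict(x)[0])    # one-step-ahead SOH estimate
        preds.append(y_next)
        series.append(y_next)                  # recursive feedback
    return np.asarray(preds)

# Toy usage with a stand-in 'model' that slightly decays the window mean
class StubModel:
    def predict(self, x):
        return [x.mean() * 0.9999]

print(recursive_forecast(StubModel(), np.linspace(100, 98, 200), n_steps=5))
```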
Table 6 shows the performance of our proposed method, considering RMSE as the standard of performance evaluation for the model. The overall RMSE of C3 is 1.1670, of C6 is 1.0523, of C9 is 0.8851, and of C8 is 1.6326, showing the usefulness of the proposed method for early predictions.

Table 6
Early RUL prediction accuracy of the BOHB-DBN model on five randomly selected supercapacitors.

Supercapacitor  RMSE    R²      MAE
C1              0.9389  0.9595  0.6927
C3              1.1670  0.8914  0.9660
C6              1.0523  0.8934  0.8365
C8              1.6326  0.8902  1.0070
C9              0.8851  0.8538  0.6883

The comparatively large value of RMSE for C8 is due to the fact that supercapacitor C8 shows premature failure, since its operating profile is beyond its maximum ratings. Therefore, the degradation of C8 is faster than that of the other supercapacitors. Nevertheless, the model gives acceptable results, even though the degradation profile over which the network was optimized differed from C8.

To further illustrate the effectiveness of the proposed model, we have also calculated the percentage error at each point of prediction. As illustrated in Fig. 7, we observed that the maximum absolute percentage error over all the predicted values did not exceed 2.2% in any of the selected degradation profiles. In comparison, [16] gives a prediction error of 1.61% with 60% training data, and [13] gives an 8.37% error with 70% training data. Furthermore, [17] produces a 0.22% MAPE with 70% training data, while the proposed model gives a 1.05% MAPE with 6% training data. This shows that the point-to-point as well as the overall prediction accuracy of the proposed model is very good, considering the long prediction horizon.

Fig. 7. Absolute percentage error at all predicted points on selected degradation profiles.

4.3.1. Model's training on noisy data

The robustness of the model is one of the primary focuses in the development of the model. To further explore the model's performance on unforeseen data, we superimposed Gaussian noise (μ = 0 and
σ = 0.5) on the training data and analyzed the prediction accuracy. This noise in the training data serves two purposes: first, it is used to test the performance on new data with the same degradation patterns; second, it can account for errors in the measurement data during operation. The model shows excellent stability and good prediction accuracy, as shown in Fig. 8. The zoomed inset shows the effect of adding noise to the training data. It can be seen from the figure that the proposed BOHB-DBN model follows the same pattern for prediction, even with a change in the data. This shows that the proposed model is robust and adaptable to new degradation data, even when noise is present, and can be used to estimate the RUL of any supercapacitor during the early stages with good precision. Table 7 shows the prediction results of the model trained with noise in the training data.

Fig. 8. Prediction results of the model trained on noisy data. The inset graph in the figure shows the effect of adding noise on the training data. (a) C9. (b) C6. (c) C3.

Table 7
Performance of the BOHB-DBN model trained on noisy data.

Supercapacitor  RMSE    MAE
C1              2.4237  1.9993
C3              1.6602  1.3624
C6              1.1619  0.9258
C8              2.0385  1.4253
C9              1.2484  1.0928
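A sketch of the noise-injection step described above: zero-mean Gaussian noise with standard deviation 0.5 (the μ and σ stated in the text) is superimposed on the training series before retraining. The placeholder series and the random seed are our own.

```python
import numpy as np

rng = np.random.default_rng(42)               # fixed seed for reproducibility
train_soh = np.linspace(100.0, 97.0, 15_000)  # placeholder SOH training series (%)
# loc and scale correspond to mu = 0 and sigma = 0.5 from Section 4.3.1
noisy_train = train_soh + rng.normal(loc=0.0, scale=0.5, size=train_soh.shape)
```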

5. Conclusion

This study proposed a novel and robust algorithm, based on a Deep Belief Network optimized with Bayesian Optimization and HyperBand (BOHB), to estimate the RUL of supercapacitors in the early stages. The optimization of the DBN with BOHB provided an efficient way to search for hyperparameters, finding optimal solutions with the lowest prediction error values faster. The optimization is performed using less than 6% of the data, compared to previous studies, which utilize up to 70% of the data for training. The results of the optimization were individually compared with Bayesian optimization as well as HyperBand optimization; the BOHB algorithm is 77% faster than the two other algorithms for our prediction network. We analyzed the performance of the developed prediction model with new degradation data of four other capacitors, using the same hyperparameters and 6% of the training data (except for C8), to show the universality of the model. Considering that the prediction data comprised 94% of the total degradation data, the model predicted the degradation curve of the supercapacitors with a maximum error of 2.2%.

In comparison to previous studies, the proposed model saves up to 54% of the development time for establishing a data-driven model, which will save the time required to collect and measure the cycling data of supercapacitors. We have also tested the robustness and stability of the proposed model by adding noise to the training data. The addition of noise to the training data represents the deviation of degradation patterns due to operating conditions and measurement errors. The BOHB-DBN method showed good prediction accuracy when trained on the noisy data, with a minimum RMSE of 1.16. To the authors' knowledge, the proposed model predicts more cycles than any other previous work related to the RUL estimation of supercapacitors, with low errors in the predicted values. The ability to predict with such a small percentage of training data means that the proposed method can be used as a tool to predict the SOH of any supercapacitor well before any other algorithm. Thus, it helps designers and manufacturers to validate and verify performance in the early phases of operation and to replace supercapacitors that show poor performance, saving time and manufacturing costs.
CRediT authorship contribution statement

Muhammad Haris: Investigation, Conceptualization, Methodology, Writing - original draft. Muhammad Noman Hasan: Investigation, Data curation, Software. Shiyin Qin: Methodology, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

A.1. Loss function distributions of BOHB, Bayesian Optimization and HyperBand

To further illustrate the performance of the BOHB optimization on the DBN model, we have plotted the distribution of the loss function over the complete optimization run. Fig. A.1 shows the distribution graphs of the three algorithms. As seen from Fig. A.1(a), the BOHB optimization evaluates the largest number of configurations with the lowest loss function values, compared to Bayesian Optimization and HyperBand.

Fig. A.1. Loss function distributions to show the loss function density. (a) BOHB optimization. (b) Bayesian optimization. (c) HyperBand optimization.

References

[1] Liu S, Wei L, Wang H. Review on reliability of supercapacitors in energy storage applications. Appl Energy 2020;278:115436. http://dx.doi.org/10.1016/j.apenergy.2020.115436.
[2] Chia YY, Lee LH, Shafiabady N, Isa D. A load predictive energy management system for supercapacitor-battery hybrid energy storage system in solar application using the Support Vector Machine. Appl Energy 2015;137:588–602. http://dx.doi.org/10.1016/j.apenergy.2014.09.026.
[3] Ma T, Yang H, Lu L. Development of hybrid battery-supercapacitor energy storage for remote area renewable energy systems. Appl Energy 2015;153:56–62. http://dx.doi.org/10.1016/j.apenergy.2014.12.008.
[4] Serban I. A control strategy for microgrids: Seamless transfer based on a leading inverter with supercapacitor energy storage system. Appl Energy 2018;221:490–507. http://dx.doi.org/10.1016/j.apenergy.2018.03.122.
[5] Jaszczur M, Hassan Q. An optimisation and sizing of photovoltaic system with supercapacitor for improving self-consumption. Appl Energy 2020;279:115776. http://dx.doi.org/10.1016/j.apenergy.2020.115776.
[6] Feroldi D, Carignano M. Sizing for fuel cell/supercapacitor hybrid vehicles based on stochastic driving cycles. Appl Energy 2016;183:645–58. http://dx.doi.org/10.1016/j.apenergy.2016.09.008.
[7] Castaings A, Lhomme W, Trigui R, Bouscayrol A. Comparison of energy management strategies of a battery/supercapacitors system for electric vehicle under real-time constraints. Appl Energy 2016;163:190–200. http://dx.doi.org/10.1016/j.apenergy.2015.11.020.
[8] Zhang Z, Zhang X, Chen W, Rasim Y, Salman W, Pan H, et al. A high-efficiency energy regenerative shock absorber using supercapacitors for renewable energy applications in range extended electric vehicle. Appl Energy 2016;178:177–88. http://dx.doi.org/10.1016/j.apenergy.2016.06.054.
[9] Hammar A, Venet P, Lallemand R, Coquery G, Rojat G. Study of accelerated aging of supercapacitors for transport applications. IEEE Trans Ind Electron 2010;57:3972–9. http://dx.doi.org/10.1109/TIE.2010.2048832.
[10] Gualous H, Gallay R, Alcicek G, Tala-Ighil B, Oukaour A, Boudart B, et al. Supercapacitor ageing at constant temperature and constant voltage and thermal shock. Microelectron Reliab 2010;50:1783–8. http://dx.doi.org/10.1016/j.microrel.2010.07.144.
[11] El Mejdoubi A, Gualous H, Oukaour A, Slamani Y, Sabor J. Supercapacitors state-of-health diagnosis for electric vehicle applications. In: EVS 2016 - 29th international electric vehicle symposium; 2016. p. 379–87.
[12] Kovaltchouk T, Multon B, Ben Ahmed H, Aubry J, Venet P. Enhanced aging model for supercapacitors taking into account power cycling: Application to the sizing of an energy storage system in a direct wave energy converter. IEEE Trans Ind Appl 2015;51:2405–14. http://dx.doi.org/10.1109/TIA.2014.2369817.
[13] El Mejdoubi A, Chaoui H, Sabor J, Gualous H. Remaining useful life prognosis of supercapacitors under temperature and voltage aging conditions. IEEE Trans Ind Electron 2018;65:4357–67. http://dx.doi.org/10.1109/TIE.2017.2767550.
[14] Li Y, Liu K, Foley AM, Zülke M, Nanini-Maury E, Van Mierlo HE. Data-driven health estimation and lifetime prediction of lithium-ion batteries: A review. Renew Sustain Energy Rev 2019;113:109254. http://dx.doi.org/10.1016/j.rser.2019.109254.
[15] Murray DB, Hayes JG. Cycle testing of supercapacitors for long-life robust applications. IEEE Trans Power Electron 2015;30:2505–16. http://dx.doi.org/10.1109/TPEL.2014.2373368.
[16] Zhou Y, Huang M, Pecht M. Remaining useful life estimation of lithium-ion cells based on k-nearest neighbor regression with differential evolution optimization. J Clean Prod 2020;249:119409. http://dx.doi.org/10.1016/j.jclepro.2019.119409.
[17] Zhou Y, Huang Y, Pang J, Wang K. Remaining useful life prediction for supercapacitor based on long short-term memory neural network. J Power Sources 2019;440:227149. http://dx.doi.org/10.1016/j.jpowsour.2019.227149.
[18] Soualhi A, Sari A, Razik H, Venet P, Clerc G, German R, et al. Supercapacitors ageing prediction by neural networks. In: IECON proceedings (industrial electronics conference); 2013. p. 6812–8. http://dx.doi.org/10.1109/IECON.2013.6700260.
[19] Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput 2006;18:1527–54.
[20] Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Comput 2002;14:1771–800. http://dx.doi.org/10.1162/089976602760128018.
[21] Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13:281–305.
[22] Falkner S, Klein A, Hutter F. BOHB: Robust and efficient hyperparameter optimization at scale. In: 35th international conference on machine learning (ICML 2018); 2018. p. 2323–41. arXiv:1807.01774.
[23] Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems 24: 25th annual conference on neural information processing systems 2011; 2011. p. 1–9.
[24] Bergstra J, Yamins D, Cox DD. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Technical report, 2013.
[25] Jamieson K, Talwalkar A. Non-stochastic best arm identification and hyperparameter optimization. In: Proceedings of the 19th international conference on artificial intelligence and statistics (AISTATS 2016); 2016. p. 240–8. arXiv:1502.07943.
[26] Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 2018;18:1–52. arXiv:1603.06560.
[27] Zhou Y, Wang Y, Wang K, Kang L, Peng F, Wang L, et al. Hybrid genetic algorithm method for efficient and robust evaluation of remaining useful life of supercapacitors. Appl Energy 2020;260:114169. http://dx.doi.org/10.1016/j.apenergy.2019.114169.
[28] Maxwell Technologies. Application note: Maxwell Technologies® BOOSTCAP® energy storage modules life duration estimation. Technical report, 2007.
