PAPER
1. Introduction
Recently, regulations on fossil fuel vehicles in the automotive industry have been strengthened owing to
emerging environmental issues, and the xEV (electric vehicles, hydrogen fuel vehicles, hybrid vehicles, etc)
market has been expanding. Batteries, one of the core components of xEVs, can affect the efficiency and
safety of the system depending on their condition and management methods. As a result, the need to
monitor the battery’s condition through metrics such as the state of health (SOH) and the state of charge
(SOC) to ensure its safety and reliability has arisen.
SOH is defined as the percentage of the maximum capacity that a fully charged battery can discharge at a
specific point in time relative to its rated capacity. Because SOH is an important indicator of the aging state
of the battery, SOH estimation methods are being actively researched. The methods for SOH estimation
are broadly divided into model-based estimation methods [1–3] and data-based estimation methods [4–6].
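As a minimal numerical illustration of the SOH definition, the sketch below computes SOH as a percentage, assuming that the reference capacity is the rated capacity of the cell. The function name and the capacity values are hypothetical and are not taken from the dataset used in this study.

```python
def state_of_health(max_capacity_ah: float, rated_capacity_ah: float) -> float:
    """Return SOH in percent: present maximum dischargeable capacity over rated capacity."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return 100.0 * max_capacity_ah / rated_capacity_ah


# Hypothetical values: an aged cell delivering 1.68 Ah against a 2.10 Ah rating.
soh = state_of_health(max_capacity_ah=1.68, rated_capacity_ah=2.10)
print(f"SOH = {soh:.1f}%")
```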
Model-based SOH estimation analyzes the physical/chemical principles of batteries and mathematically
models them, applying estimation filters based on the model. The SOH of a battery has been estimated using
a Kalman filter (KF) based on the Thevenin battery model [1], the unscented particle filter algorithm [2],
and dual extended KF [3]. However, model-based SOH estimation has limitations in linearizing and
modeling complex nonlinear internal state changes.
Data-based SOH estimation studies aim to estimate SOH by reflecting the characteristics of the data
without considering the complex state changes inside the battery [7–9]. Cui and Joe used a dynamic
spatiotemporal attention-based gated recurrent unit (GRU) model [4], Gu et al combined a
convolutional neural network (CNN) and a transformer [5], and Li et al used active state tracking long
short-term memory neural network (AST-LSTM NN) for estimating SOH and predicting remaining useful
life [6]. Shen et al used the extreme learning machine and voltage estimated through the whale optimization
algorithm to assess SOH [8]. Avkhimenia used reinforced training for battery action [10]. Lee et al used five
types of CNN models for SOH estimation [11], whereas Lin et al used semi-supervised learning for the same
purpose [12]. Various data-based SOH estimation algorithms are being researched to improve the accuracy
of SOH estimation, and high-quality datasets containing large quantities of data are imperative to leverage
the full potential of these methods.
As datasets become more important, research on data generation algorithms, such as generative
adversarial networks (GANs) [13], is actively being conducted. In the image domain, algorithms such as deep
convolutional GANs (DCGANs) [14], Wasserstein GANs (WGANs) [15], and least squares GANs (LSGANs)
[16] have been developed, and algorithms such as continuous RNN GANs (C-RNN-GANs) [17], recurrent
conditional GANs (RCGANs) [18], and time-series GANs (TimeGANs) [19] are being researched in the time
series domain. C-RNN-GAN [17] directly applies the GAN architecture to sequence data, RCGAN [18]
introduces minor architecture differences to reduce dependency on previous outputs while maintaining
similarity, and TimeGAN [19] generalizes time series data, reduces data dimensions through latent space,
and facilitates learning.
Research is being conducted to improve the performance of estimation and classification algorithms by
generating a large amount of high-quality data using TimeGAN. Shangguan et al [20] increased prediction
accuracy by generating hard-to-collect train wheel degradation data, and Zhang et al [21] improved prediction
accuracy by supplementing heating system data whose characteristics differ depending on the conditions.
Additionally, Li et al [22] resolved the imbalance of bearing failure data using data generation algorithms to
improve failure classification accuracy. Therefore, research on improving estimation and classification
accuracy by augmenting open-source datasets of limited size, such as battery datasets, is necessary.
This study aimed to improve battery SOH estimation accuracy by using TimeGAN to supplement the dataset,
a crucial factor affecting learning. Because battery data are difficult to collect, we secured a high-quality
dataset containing a large quantity of data through TimeGAN and compared the SOH estimation accuracy
improvement rates obtained with LSTM [23] and GRU [24], both of which are based on recurrent neural
networks. In this way, we aimed to verify the usefulness of the proposed technique, which improves SOH
estimation accuracy by securing datasets.
2. Methods
2.3. TimeGAN
GAN is a representative algorithm for data augmentation [13], based on which various algorithms have been
derived in the image domain (DCGAN [14], WGAN [15], LSGAN [16], etc) and the time series domain
Mach. Learn.: Sci. Technol. 4 (2023) 045007 S Seol et al
(C-RNN-GAN [17], RCGAN [18], TimeGAN [19], etc). The basic GAN model consists of a generator and a
discriminator, as shown in figure 3. The generator uses random noise as input to create synthetic data, and
the discriminator classifies original data and synthetic data. Equation (2) is based on the role of the generator
and discriminator, and the GAN algorithm aims to generate synthetic data similar to the original data. Here,
D and G represent the discriminator and generator, respectively, V represents the value function of the
discriminator and generator, and E represents the expected value,
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]. \tag{2}$$
When the discriminator D classifies the original data x as true and the synthetic data G(z) as false, the value
function V(D, G) has its maximum value. The discriminator learns in the direction that maximizes the value
function, whereas the generator learns in the direction that minimizes it.
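The behavior of the value function in equation (2) can be illustrated with a small Monte Carlo estimate. The sketch below is not the training code of this study; the function name and the discriminator outputs are hypothetical, and the expectations are replaced by sample means.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) in equation (2).

    d_real: discriminator outputs D(x) on original samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on synthetic samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs, for illustration only.
confident = gan_value(d_real=[0.90, 0.95], d_fake=[0.10, 0.05])  # D separates the sets well
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])         # D cannot tell them apart

print(f"V when D is confident: {confident:.3f}")
print(f"V when D is fooled:    {fooled:.3f}")
```

The value is higher when the discriminator separates the two sets and drops to 2 log 0.5 when it outputs 0.5 everywhere, which is the situation the generator tries to reach.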
Based on this, the TimeGAN model was proposed to generate time series data reflecting temporal
characteristics. The TimeGAN structure, as shown in figure 4, adds an autoencoder architecture to the GAN
algorithm, enabling the learning of temporal dynamics in smaller dimensions. In the autoencoder, data are
reduced and restored through embedding and recovery functions, and in GAN, data are generated and
distinguished in reduced dimensions to create data similar to the original data.
$$h_s = e_s(s) \tag{3}$$
$$h_t = e_x(h_s, h_{t-1}, x_t). \tag{4}$$
The recovery function restores the reduced data to the original dimension for data learning. In detail, the
static feature space ($h_s$) and the temporal feature space ($h_t$) expressed in the latent space are restored to the
original dimension’s static feature space ($s$) and temporal feature space ($x$), respectively. The recovery
function is shown in equations (5) and (6),
$$\tilde{s} = r_s(h_s) \tag{5}$$
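The data flow through the embedding and recovery functions can be sketched as follows. This is only an illustration under stated assumptions: the embedding $e_x$ is written as a single tanh recurrent cell, the temporal recovery is assumed to act on each latent vector separately, and all weights, dimensions, and input values are hypothetical and untrained. In TimeGAN these functions are learned networks.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def embed_sequence(xs, h_s, W_x, W_h, W_s):
    """Sketch of h_t = e_x(h_s, h_{t-1}, x_t) as a tanh recurrent cell."""
    h_prev = [0.0] * len(W_h)  # initial latent state h_0
    latents = []
    for x_t in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_x, x_t),
                                             matvec(W_h, h_prev),
                                             matvec(W_s, h_s))]
        h_prev = [math.tanh(p) for p in pre]
        latents.append(h_prev)
    return latents


def recover_sequence(latents, W_r):
    """Sketch of the recovery step: map each latent h_t back to feature space."""
    return [matvec(W_r, h_t) for h_t in latents]


# Hypothetical, untrained weights: 3 input features -> 2 latent dimensions.
W_x = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
W_s = [[0.2], [-0.2]]
W_r = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

h_s = [0.7]  # latent static feature, standing in for h_s = e_s(s)
xs = [[3.9, 1.0, 25.0], [3.8, 1.0, 25.5], [3.7, 1.0, 26.0], [3.6, 1.0, 26.5]]

latents = embed_sequence(xs, h_s, W_x, W_h, W_s)
recovered = recover_sequence(latents, W_r)
print(len(xs[0]), "->", len(latents[0]), "->", len(recovered[0]))  # prints: 3 -> 2 -> 3
```

With untrained weights the recovered values do not match the inputs; the autoencoder training in TimeGAN is what makes the recovery approximate the original data.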
The discriminator function, which classifies synthetic and original data, is shown in equations (9)
and (10),
$$\tilde{y}_s = d_s(\tilde{h}_s) \tag{9}$$
$$\tilde{y}_t = d_x(\overleftarrow{u}_t, \overrightarrow{u}_t). \tag{10}$$
2.4.1. LSTM
LSTM is an algorithm designed to solve the vanishing gradient problem of recurrent neural networks, with a
structure shown in figure 5 [23]. First, it uses the cell state to transmit information input before the current
time point and assigns weights to the data through three gates (forget gate, input gate, and output gate). The
forget gate determines the importance of past data and assigns weights. The input gate decides how much of
the current input and the previous hidden state to reflect. The output gate determines the weight when
transmitting to the hidden state.
As shown in equation (11), the forget gate assigns appropriate weights to the values of $h_{t-1}$ and $x_t$ and
delivers them to cell state $C_{t-1}$. Meaningful data receive weights close to 1, while unimportant data receive
weights close to 0,
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right). \tag{11}$$
In the input gate, the new cell state $C_t$ is created by updating the cell state through $i_t$ and $\tilde{C}_t$, as shown in
equations (12)–(14),
In the output gate, the output of Ct is determined, as shown in equations (15) and (16),
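The gate computations described above can be sketched as one LSTM time step. The sketch assumes the commonly used LSTM formulation with a forget gate, which is consistent with equation (11); the parameter names, dimensions, and values are hypothetical and do not correspond to the trained model of this study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Compute W · v + b for a matrix W (list of rows), bias b, and vector v."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p holds weights W_* and biases b_* for gates f, i, c, o."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], z)]        # forget gate, eq. (11)
    i_t = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], z)]        # input gate
    c_tilde = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], z)]  # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], z)]        # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical parameters: hidden size 2, input size 1, all weights zero.
# A large forget bias and a very negative input bias make the cell keep its memory.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {
    "W_f": zeros, "b_f": [10.0, 10.0],
    "W_i": zeros, "b_i": [-10.0, -10.0],
    "W_c": zeros, "b_c": [0.0, 0.0],
    "W_o": zeros, "b_o": [0.0, 0.0],
}
c_prev = [0.5, -0.3]
h_t, c_t = lstm_step(x_t=[1.0], h_prev=[0.0, 0.0], c_prev=c_prev, p=params)
print("c_prev:", c_prev, "c_t:", c_t)
```

With forget weights close to 1 and input weights close to 0, the new cell state stays almost equal to the previous one, which is the "meaningful data receive weights close to 1" case described for the forget gate.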
The LSTM model configuration used for SOH estimation is shown in figure 6 and consists of an input
layer, LSTM layer, dropout, and dense layer.
2.4.2. GRU
GRU is an algorithm proposed to simplify LSTM [24]. It combines LSTM’s cell state and hidden state and
integrates the forget gate and input gate into a single update gate, as shown in figure 7. GRU therefore
consists of only an update gate and a reset gate, and its characteristic feature is a reduced
computational load compared to LSTM.
The reset gate determines the weight of the previous information, and the update gate decides how much
information to reflect through the tanh function.
In the reset gate, the weights of the previous hidden state $h_{t-1}$ and the current input $x_t$ are determined, as
shown in equations (17) and (18),
The update gate determines the ratio in which past and current information are reflected and assigns
weights accordingly, as shown in equations (19) and (20),
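The reset and update gates can be sketched as one GRU time step. The sketch assumes a commonly used GRU formulation in which the update gate weights the candidate state; some references use the opposite convention. Parameter names, dimensions, and values are hypothetical and do not correspond to the trained model of this study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Compute W · v + b for a matrix W (list of rows), bias b, and vector v."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def gru_step(x_t, h_prev, p):
    """One GRU time step with reset gate r_t and update gate z_t."""
    concat = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    r_t = [sigmoid(v) for v in affine(p["W_r"], p["b_r"], concat)]  # reset gate
    z_t = [sigmoid(v) for v in affine(p["W_z"], p["b_z"], concat)]  # update gate
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t              # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(v) for v in affine(p["W_h"], p["b_h"], gated)]
    # Assumed convention: z_t close to 0 keeps the past, z_t close to 1 takes the candidate.
    return [(1.0 - z) * h + z * g for z, h, g in zip(z_t, h_prev, h_tilde)]


# Hypothetical parameters: hidden size 2, input size 1, all weights zero.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
keep_past = {"W_r": zeros, "b_r": [0.0, 0.0],
             "W_z": zeros, "b_z": [-10.0, -10.0],
             "W_h": zeros, "b_h": [0.0, 0.0]}
take_new = dict(keep_past, b_z=[10.0, 10.0])

h_prev = [0.4, -0.2]
h_keep = gru_step([1.0], h_prev, keep_past)
h_new = gru_step([1.0], h_prev, take_new)
print("keep past:", h_keep)
print("take new: ", h_new)
```

Compared with the LSTM step, there is no separate cell state and one fewer gate, which is where the reduced computational load comes from.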
The GRU model configuration used for SOH estimation is shown in figure 8 and consists of an input
layer, GRU layer, dropout, and dense layer.
battery dataset and compared the SOH estimation accuracy improvement rate using the synthetic battery
dataset similar to the original one.
First, to quantitatively evaluate the similarity of the generated time series data features, the original and
synthetic data were represented using t-distributed stochastic neighbor embedding (t-SNE). To compare the
similarity of datasets in t-SNE, we applied the rate of change in the correlation coefficient of linear regression
and the silhouette coefficient [30], which are quantitative indicators.
Next, to compare the SOH estimation accuracy improvement rate, we used RMSE. RMSE is a
representative indicator for estimation accuracy evaluation and is shown in equation (21). Here, $y_i$ and $\hat{y}_i$
represent the actual and predicted values, respectively, and $n$ denotes the number of data points,
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}. \tag{21}$$
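Equation (21) can be computed directly as in the following sketch. The function name and the SOH values are hypothetical and serve only to illustrate the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values, equation (21)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical SOH values in percent, for illustration only.
soh_true = [100.0, 95.0, 90.0, 85.0]
soh_pred = [99.0, 96.0, 88.0, 87.0]
error = rmse(soh_true, soh_pred)
print(f"RMSE = {error:.4f}")
```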
We evaluated the proposed SOH estimation accuracy improvement method through TimeGAN in two
aspects. First, we applied the rate of change in the correlation coefficient and the silhouette coefficient to
evaluate the similarity between the synthetic dataset generated through TimeGAN and the original dataset.
Then, we estimated SOH using LSTM and GRU algorithms with the synthetic battery dataset to evaluate the
improvement rate of SOH estimation accuracy.
Next, we evaluated the similarity of the data using the silhouette coefficient. The closer the silhouette
coefficient is to 0, the more the synthetic data overlap with, and thus resemble, the original data; the closer
the value is to 1, the more distinct the two sets of data are. The silhouette coefficient of the original battery
dataset and synthetic battery dataset was close to 0, demonstrating that the latter reflects the characteristics of
the original dataset.
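The interpretation of the silhouette coefficient can be illustrated with a small sketch that labels points as original (0) or synthetic (1) and computes the mean silhouette value [30]. The two-dimensional points stand in for t-SNE coordinates and are hypothetical; the exact evaluation pipeline of this study is not reproduced here.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points, using Euclidean distance."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not same:  # a cluster with a single point scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D embeddings: label 0 = original, label 1 = synthetic.
original = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
interleaved = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
far_away = [(x + 100.0, y + 100.0) for x, y in interleaved]
labels = [0] * len(original) + [1] * len(interleaved)

mixed = silhouette_coefficient(original + interleaved, labels)
separated = silhouette_coefficient(original + far_away, labels)
print(f"interleaved sets: {mixed:.3f}")
print(f"separated sets:   {separated:.3f}")
```

When the synthetic points are interleaved with the original ones, the coefficient stays near 0, whereas shifting the synthetic points far away drives it toward 1.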
We confirmed the similarity between the synthetic and original battery datasets through two evaluation
indicators, as shown in table 1, and verified its applicability in battery SOH estimation algorithms.
was 18.11% for RW12 data, and the lowest rate was 5.22% for RW10 data. Additionally, the IQR either
improved or remained similar, confirming its influence on the improvement of learning stability.
As shown in table 2, with the dataset secured through TimeGAN, the estimation accuracy improved for all
cells and algorithms. The accuracy improvement rate varied with the characteristics of the original dataset,
and the cells with the highest or lowest accuracy differed depending on the estimation algorithm and learning
environment. This confirms that, although the most suitable estimation algorithm may differ depending on
the data characteristics, SOH estimation accuracy can be improved by securing datasets through data
augmentation algorithms. In addition, the SOH estimation accuracy improved in all cells when the amount of
synthetic data was increased significantly, and the stability of the SOH estimation results was also affected.
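The two quantities discussed here, the accuracy improvement rate and the IQR, can be computed as in the sketch below. The definition of the improvement rate as the relative reduction in RMSE, the use of the median over repeated runs, and all numerical values are assumptions made for illustration; they are not the results of table 2.

```python
import statistics


def improvement_rate(rmse_original, rmse_augmented):
    """Relative RMSE reduction in percent (assumed definition)."""
    return 100.0 * (rmse_original - rmse_augmented) / rmse_original


def iqr(values):
    """Interquartile range: spread between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical RMSE values from repeated training runs, for illustration only.
rmse_runs_original = [2.0, 2.4, 1.7, 2.9, 2.2]
rmse_runs_augmented = [1.8, 1.9, 1.7, 2.0, 1.85]

rate = improvement_rate(statistics.median(rmse_runs_original),
                        statistics.median(rmse_runs_augmented))
print(f"improvement rate: {rate:.2f}%")
print(f"IQR original:  {iqr(rmse_runs_original):.3f}")
print(f"IQR augmented: {iqr(rmse_runs_augmented):.3f}")
```

A smaller IQR across repeated runs indicates that the estimation results vary less from run to run, which is the sense in which the IQR relates to learning stability.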
4. Conclusion
This study investigated a method to improve SOH estimation accuracy by securing a high-quality dataset
containing a large quantity of data through TimeGAN. To augment the battery dataset, we applied the
TimeGAN model and then verified the similarity between the generated synthetic dataset and the original
dataset to ensure high quality. When the confirmed high-quality synthetic battery dataset was added to the
original dataset for SOH estimation, the battery SOH estimation accuracy improved for all cells.
(1) A synthetic battery dataset with similar characteristics to the original battery dataset can be generated
using TimeGAN.
(2) The addition of a synthetic battery dataset can improve SOH estimation accuracy.
It was confirmed that utilizing high-quality synthetic data in the SOH estimation algorithm learning
improved the SOH estimation accuracy.
(3) The dataset expanded through synthetic data can improve the stability of SOH estimation algorithm
learning.
It was confirmed that utilizing a large amount of synthetic data for SOH estimation algorithm learning
resolved the problems of overfitting or underfitting, resulting in stable learning outcomes.
The data that support the findings of this study are openly available at the following URL/DOI:
https://data.nasa.gov/Raw-Data/Randomized-Battery-Usage-1-Random-Walk/ugxu-9kjx.
Acknowledgments
This work was supported by ‘Building an open platform ecosystem for future technology innovation in the
automotive industry’ funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea) (Project
Number: P0018434) and the Industrial Strategic Technology Development Program (20010132,
Development of the systematization technology of e-powertrain core parts development platform for
expending the industry of xEV parts) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
References
[1] Topan P A, Ramadan M N, Fathoni G, Cahyadi A I and Wahyunggoro O 2016 State of charge (SOC) and state of health (SOH)
estimation on lithium polymer battery via Kalman filter 2nd Int. Conf. on Science and Technology-Computer (ICST) (IEEE
Publications)
[2] Liu D, Yin X, Song Y, Liu W and Peng Y 2018 An on-line state of health estimation of lithium-ion battery using unscented particle
filter IEEE Access 6 40990
[3] Dai H, Wei X, Sun Z, Wang J and Gu W 2012 Online cell SOC estimation of Li-ion battery packs using a dual time-scale Kalman
filtering for EV applications Appl. Energy 95 227
[4] Cui S and Joe I 2021 A dynamic spatial-temporal attention-based GRU model with healthy features for state-of-health estimation
of lithium-ion batteries IEEE Access 9 27374
[5] Gu X, See K W, Li P, Shan K, Wang Y, Zhao L, Lim K C and Zhang N 2023 A novel state-of-health estimation for the lithium-ion
battery using a convolutional neural network and transformer model Energy 262 125501
[6] Li P, Zhang Z, Xiong Q, Ding B, Hou J, Luo D, Rong Y and Li S 2020 State-of-health estimation and remaining useful life
prediction for the lithium-ion battery based on a variant long short-term memory neural network J. Power Sources 459 228069
[7] Obregon J, Han Y R, Ho C W, Mouraliraman D, Lee C W and Jung J Y 2023 Convolutional autoencoder-based SOH estimation of
lithium-ion batteries using electrochemical impedance spectroscopy J. Energy Storage 60 106680
[8] Shen J, Ma W, Shu X, Shen S, Chen Z and Liu Y 2023 Accurate state of health estimation for lithium-ion batteries under random
charging scenarios Energy 279 128092
[9] Li Y, Liu K, Foley A M, Zülke A, Berecibar M, Nanini-Maury E, van Mierlo J and Hoster H E 2019 Data-driven health estimation
and lifetime prediction of lithium-ion batteries: a review Renew. Sustain. Energy Rev. 113 109254
[10] Avkhimenia V 2023 Sizing, operation, and evaluation of battery energy storage with dynamic line rating and deep learning MSc
Thesis University of Alberta
[11] Lee G, Kwon D and Lee C 2023 A convolutional neural network model for SOH estimation of Li-ion batteries with physical
interpretability Mech. Syst. Signal Process. 188 110004
[12] Lin C, Xu J and Mei X 2023 Improving state-of-health estimation for lithium-ion batteries via unlabeled charging data Energy
Storage Mater. 54 85–97
[13] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair Z, Courville A and Bengio Y 2014 Generative adversarial
nets Advances in Neural Information Processing Systems vol 27 pp 2672–80
[14] Radford A, Metz L and Chintala S 2015 Unsupervised representation learning with deep convolutional generative adversarial
networks (arXiv:1511.06434)
[15] Arjovsky M, Chintala S and Bottou L 2017 Wasserstein generative adversarial networks Int. Conf. on Machine Learning (PMLR)
[16] Mao X, Li Q, Xie H, Lau R Y K, Wang Z and Smolley S P 2017 Least squares generative adversarial networks Proc. IEEE Int. Conf. on
Computer Vision p 2813
[17] Mogren O 2016 C-RNN-GAN: continuous recurrent neural networks with adversarial training (arXiv:1611.09904)
[18] Esteban C, Hyland S L and Rätsch G 2017 Real-valued (medical) time series generation with recurrent conditional GANs
(arXiv:1706.02633)
[19] Yoon J, Jarrett D and van der Schaar M 2019 Time-series generative adversarial networks Advances in Neural Information Processing
Systems vol 32
[20] Shangguan A, Xie G, Fei R, Mu L and Hei X 2023 Train wheel degradation generation and prediction based on the time series
generation adversarial network Reliab. Eng. Syst. Saf. 229 108816
[21] Zhang Y, Zhou Z, Liu J and Yuan J 2022 Data augmentation for improving heating load prediction of heating substation based on
TimeGAN Energy 260 124919
[22] Li J, Liu Y and Qijie L 2022 Generative adversarial network and transfer-learning-based fault detection for rotating machinery with
imbalanced data condition Meas. Sci. Technol. 33 045103
[23] Hochreiter S and Schmidhuber J 1997 Long short-term memory Neural Comput. 9 1735
[24] Chung J, Gulcehre C, Cho K H and Bengio Y 2014 Empirical evaluation of gated recurrent neural networks on sequence modeling
(arXiv:1412.3555)
[25] Bole B, Kulkarni C S and Daigle M 2014 Adaptation of an electrochemistry-based Li-ion battery model to account for deterioration
observed under randomized use Annual Conf. PHM Society vol 6
[26] Agudelo B O, Zamboni W and Monmasson E 2021 Application domain extension of incremental capacity-based battery SoH
indicators Energy 234 121224
[27] Hong S, Kang M, Jeong H and Baek J 2020 State of health estimation for lithium-ion batteries using long-term recurrent
convolutional network IECON 2020 the 46th Annual Conf. IEEE Industrial Electronics Society
[28] Chemali E, Kollmeyer P J, Preindl M, Fahmy Y and Emadi A A 2022 A convolutional neural network approach for estimation of
Li-ion battery state of health from charge profiles Energies 15 1185
[29] Bockrath S, Lorentz V and Pruckner M 2023 State of health estimation of lithium-ion batteries with a temporal convolutional
neural network using partial load profiles Appl. Energy 329 120307
[30] Rousseeuw P J 1987 Silhouettes: a graphical aid to the interpretation and validation of cluster analysis J. Comp. Appl. Math. 20 53