
2021 The 3rd Asia Energy and Electrical Engineering Symposium

Day-Ahead Photovoltaic Power Forecasting Using Convolutional-LSTM Networks


Yuanyuan Wang*
Dongying Power Supply Company, State Grid Shandong Electric Power Company, Dongying, China
e-mail: fwork2020@163.com

Yaobang Chen
School of Physical Science and Technology, University of Jinan, Jinan, China
e-mail: datacyb@gmail.com

Hanghang Liu
Dongying Power Supply Company, State Grid Shandong Electric Power Company, Dongying, China
e-mail: caic617618@163.com

Xiaowei Ma
Dongying Power Supply Company, State Grid Shandong Electric Power Company, Dongying, China
e-mail: cai617618@163.com

Xiaoxiang Su
Dongying Power Supply Company, State Grid Shandong Electric Power Company, Dongying, China
e-mail: sxxiang@163.com

Qi Liu
Dongying Power Supply Company, State Grid Shandong Electric Power Company, Dongying, China
e-mail: rose@163.com

2021 3rd Asia Energy and Electrical Engineering Symposium (AEEES) | 978-1-6654-2551-3/21/$31.00 ©2021 IEEE | DOI: 10.1109/AEEES51875.2021.9403023

Abstract-Efficient and reliable forecasting of power generation from PV plants is important for grid management optimization and power dispatch allocation. In this paper, we propose a hybrid model for short-term photovoltaic power forecasting (PVPF), Convolutional-LSTM (Conv-LSTM), and extend it with an attention mechanism architecture and a shortcut connection architecture, called Conv-LSTM-A and Conv-LSTM-S, respectively. The input data are one year of hourly data from commercial PV plants and contain only four features: historical active power, solar radiation, panel temperature and current hour. The effects of different historical data lengths on the forecasting results are compared. The hyperparameters of the models were tuned using Bayesian optimization to find the best hyperparameter configuration. The experimental results show that the RMSE of the Conv-LSTM-S model is 11.13% and the MAPE of the Conv-LSTM-A model is 7.0030 when using 7 days of historical data for day-ahead PV power forecasting.

Keywords-photovoltaic power; convolutional-LSTM; attention mechanism; shortcut connection; Bayesian optimization

I. INTRODUCTION
In response to the depletion of fossil fuels and global warming, renewable energy (RE) is becoming an increasingly important part of the energy mix. Among renewable sources, photovoltaic power generation is one of the most promising technologies [1]. The Global Future Report 2013 of the Renewable Energy Policy Network for the 21st Century (REN21) predicts that by 2050 the global solar photovoltaic (PV) capacity has the potential to reach 8000 GW [2]. However, intermittent and fluctuating solar radiation results in high variability in power generation, which poses a serious challenge for integrating PV power into the grid. Efficient and accurate forecasting can mitigate these problems [3].
PV power forecasting (PVPF) can be classified into short-term forecasting and long-term forecasting according to the forecast duration. Forecasting methods are divided into four main categories: statistical models, machine learning models, hybrid models, and deep learning models. Statistical models mainly rely on the historical data of the predicted variables, for example probabilistic models [4] and autoregressive (AR) models [5]. Machine learning models are mostly used for recursive forecasting, mainly support vector machines (SVM) [6] and regression models [7]. Hybrid models are mainly a mixture of the above models [8,9]. Deep learning models can achieve multi-step direct forecasting and can cope well with overfitting and vanishing gradient problems. CNN [10], RNN [11] and hybrid models have been shown to be applicable to PV power generation forecasting. Among them, the hybrid model of CNN and RNN is one of the most commonly used methods [12].
Among hybrid models based on deep learning, [13] used a CNN-LSTM model to predict solar radiation and obtained good short-term forecasting results. [14] used an LSTM-RNN model for day-ahead PVPF, which improved the generalization ability of the model. [15] demonstrated that the hybrid LSTM-Convolutional model can be applied to PVPF. [16] used an attention mechanism model for short-term PVPF and demonstrated the effectiveness of the LSTM-Attention model. In [17], the Convolutional LSTM Network, which can extract spatio-temporal features, was proposed for the first time and used for precipitation forecasting. In view of the advantages of hybrid models, this paper proposes a new hybrid forecasting model for PVPF, Convolutional-LSTM (Conv-LSTM), and extends it with attention mechanism and shortcut connection models, called Conv-LSTM-A and Conv-LSTM-S.
The main contributions of this paper are as follows: 1) A hybrid PVPF network with better extraction of

spatio-temporal features is proposed, using a 2D-CNN to extract spatial features and combining the attention mechanism and the shortcut connection to design two further models that improve the forecasting capability. 2) Bayesian optimization is used to fine-tune the hyperparameters, so that a model configuration with good generalization ability is found efficiently. 3) The model performance is compared when using 5-day and 7-day historical data for forecasting. The remainder of this paper is structured as follows: Section 2 introduces the proposed model architectures. Section 3 introduces the dataset used in this paper. Section 4 presents the evaluation metrics and experimental results. Section 5 gives the conclusions.

II. METHODOLOGY

A. Conv-LSTM
To solve the vanishing or exploding gradient problem of RNNs and to learn long-range dependencies, [18] proposed the Long Short-Term Memory network (LSTM), which augments the hidden state with a nonlinear mechanism called the cell state. The LSTM uses simple gating functions to control the modification, update or reset of the state, including an input gate, an output gate and a forget gate. The update equations are shown in (1):

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\tag{1}
$$

where $\sigma$ denotes the sigmoid activation function, $W$ denotes the gate weight, $b$ denotes the bias, $x_t$ denotes the input vector, $h_t$ denotes the hidden state, $c_t$ denotes the cell state vector ($\tilde{c}_t$ being the candidate cell state), $f_t$ denotes the forget gate vector, $i_t$ denotes the input gate vector, $o_t$ denotes the output gate vector, and $\odot$ denotes the Hadamard product.
CNN models have various architectures; the CNN part of the model in this paper is designed based on the classical VGG model [19]. The proposed Conv-LSTM model architecture is shown in Figure 1(a).

B. Conv-LSTM with Attention
The attention mechanism comes from the field of natural language processing, where it enables better translation of long sentences [20]. In the attention mechanism model, the context vector $c_t$ (distinct from the cell state in (1)) is a weighted sum of the hidden states $h_1, \dots, h_T$ and represents the information relevant to the current time step, as shown in Equation (2):

$$
c_t = \sum_{j=1}^{T} \alpha_{t,j} h_j
\tag{2}
$$

where $\alpha_{t,j}$ is the attention weight assigned to hidden state $h_j$. Then $c_t$ is integrated with the previous hidden state unit $s_{t-1}$ and the previous target output $y_{t-1}$ to predict the current hidden state $s_t$.
The proposed Conv-LSTM-A model architecture is shown in Figure 1(b).

C. Conv-LSTM with Shortcut
The concept of the shortcut connection, or residual connection, comes from ResNet [21]. It effectively alleviates the vanishing gradient problem and lays the foundation for building deep networks. Networks with shortcut connections are easier to optimize and perform better than plain networks.
Inspired by this, this paper adds a layer with a single LSTM as a shortcut connection to the Conv-LSTM model, and then concatenates the features learned from the two paths. The Conv-LSTM-S model architecture is shown in Figure 1(c).

D. Error Metrics
To evaluate the model performance from multiple perspectives, this paper uses RMSE and MAPE as evaluation metrics. The RMSE is also used as the loss function to train and optimize the model. It is given in Equation (3):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}
\tag{3}
$$

where $\hat{y}_i$ and $y_i$ denote the predicted power and actual power, respectively, and $N$ is the number of samples.

E. Day-Ahead Forecasting
There are two types of forecasting methods in the PVPF task: direct forecasting and recursive forecasting. Direct forecasting can be further divided into single-step forecasting and multi-step forecasting. In this paper, we use multi-step forecasting, i.e., we use the previous feature series data to predict the power for the following day, as shown in Figure 2.
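As an illustration of the multi-step direct forecasting setup described in Section II.E, the following minimal sketch builds (history window, next-day power) pairs from hourly records. The function name `make_day_ahead_samples`, the one-day stride between windows, and the plain-list data layout are assumptions made for this sketch and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

HOURS_PER_DAY = 24


def make_day_ahead_samples(
    features: Sequence[Sequence[float]],
    power: Sequence[float],
    history_days: int = 7,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Build (history window, next-day power) pairs from hourly records.

    features[t] is the feature vector at hour t and power[t] is the active
    power at hour t. The window advances one day at a time, which is an
    assumption of this sketch, not something stated in the paper.
    """
    if len(features) != len(power):
        raise ValueError("features and power must have the same length")
    history = history_days * HOURS_PER_DAY
    inputs: List[List[List[float]]] = []
    targets: List[List[float]] = []
    # Each sample needs `history` past hours plus 24 future hours.
    last_start = len(power) - history - HOURS_PER_DAY
    for start in range(0, last_start + 1, HOURS_PER_DAY):
        end = start + history
        inputs.append([list(row) for row in features[start:end]])
        targets.append(list(power[end:end + HOURS_PER_DAY]))
    return inputs, targets
```

With `history_days=7` and four features per hour, each input window has 24 × 7 time steps of four values and each target holds the 24 hourly power values of the following day.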

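The error metrics of Section II.D can be sketched in a few lines. The RMSE follows the root-mean-square definition of Equation (3). The paper does not give its MAPE formula, so the version below uses the common definition in percent and, as an assumption of this sketch, skips samples whose actual power is zero (for example at night), where the percentage error is undefined.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and actual power."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mape(predicted: Sequence[float], actual: Sequence[float],
         eps: float = 1e-8) -> float:
    """Mean absolute percentage error in percent.

    Samples with |actual| <= eps are skipped (assumption of this sketch),
    because the percentage error is undefined for zero actual power.
    """
    if len(predicted) != len(actual):
        raise ValueError("inputs must have equal length")
    terms = [abs((a - p) / a) for p, a in zip(predicted, actual) if abs(a) > eps]
    if not terms:
        raise ValueError("no sample with non-zero actual value")
    return 100.0 * sum(terms) / len(terms)
```

How zero-power hours are handled can change the MAPE noticeably, so any comparison with reported values should use the same convention.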
Figure 1. Model Architecture: (a) Conv-LSTM, (b) Conv-LSTM-A, (c) Conv-LSTM-S

Figure 2. Multi-step direct forecasting illustration

III. DATA CONSTRUCTION
Original data processing. The raw data used in this paper come from one year of data from commercial PV plants and contain a large number of outliers. In the experiments of this paper, a simple outlier treatment is used, i.e., negative values of power and solar radiation are set to zero.
Data resampling. The time interval of the raw data is 15 minutes. Because the power varies little within an hour, the raw data are resampled to a time interval of one hour using the hourly average. A final dataset of 8737 records was obtained. The box plots of power and solar radiation for each month are shown in Figure 3. The power and solar radiation obtained by resampling to the daily average are shown in Figure 4.
Train and test data split. The original data are divided into a training set, a validation set and a test set with proportions of 70%, 20% and 10%, respectively. The data are then normalized using the z-score method, and the final data are in the range of [0, 1].
Feature selection. Considering the generalization capability and forecasting robustness of the model, this paper follows the general approach of selecting historical power, solar irradiance, panel temperature, and current hour as the input features for the model from the 11 features of the original data.
Sample data processing. In a time series forecasting task, the raw data need to be processed into sample data before being fed into a deep learning model. This is usually done using a sliding window, and the final sample dataset has the shape [samples, history length, features]. Section 4 of this paper compares the effect of using 5-day and 7-day historical series data on the model performance. Therefore, the data shapes used in the two experiments are [None, 24 × 5, 4] and [None, 24 × 7, 4], respectively.

Figure 3. Box plots of the Solar Radiation and Active Power features

Figure 4. Solar Radiation and Active Power features resampled to daily data

IV. EXPERIMENTS

A. Data Preprocessing
The models designed in this paper require 4D input, so the sample dataset needs to be reshaped. We slice the historical time series into 4 sub-series to convert the 3D sample dataset into a 4D sample dataset.

B. Hyperparameter Fine-Tuning
In this paper, we use Bayesian optimization [22] to fine-tune the hyperparameters of the models. Compared with grid search and random search, Bayesian optimization can find a good parameter configuration more efficiently. Taking the LSTM model as an example, the hyperparameter search space and the selected configuration are shown in Table I. The selected learning rate is 0.0004, the optimizer is Nadam, the batch size is 224, the number of LSTM layers is 2, the activation function is tanh, and the number of neurons per layer is 32.
All models are trained with early stopping: training is stopped when the loss on the validation set no longer changes within 30 iterations, and the maximum number of training epochs is set to 150. Before forecasting, the models are retrained. Finally, the forecasting results are inverse-normalized.

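The 3D-to-4D reshaping of Section IV.A can be illustrated with a short sketch. The paper states only that the history is sliced into 4 sub-series; splitting it into contiguous, equally long chunks and the resulting axis order [samples, sub-series, steps per sub-series, features] are assumptions of this sketch, as is the function name `split_into_subseries`.

```python
from typing import List


def split_into_subseries(
    samples: List[List[List[float]]],
    n_subseries: int = 4,
) -> List[List[List[List[float]]]]:
    """Reshape [samples, history, features] into
    [samples, n_subseries, history // n_subseries, features].

    The history of each sample is cut into contiguous chunks of equal
    length (an assumption; the paper does not specify the slicing).
    """
    reshaped = []
    for sample in samples:
        history = len(sample)
        if history % n_subseries != 0:
            raise ValueError("history length must be divisible by n_subseries")
        step = history // n_subseries
        reshaped.append(
            [sample[k * step:(k + 1) * step] for k in range(n_subseries)]
        )
    return reshaped
```

Under these assumptions a 7-day history of 24 × 7 = 168 steps becomes 4 sub-series of 42 steps, and a 5-day history of 120 steps becomes 4 sub-series of 30 steps. With NumPy arrays the same contiguous split corresponds to a call such as `x.reshape(-1, 4, 42, 4)`.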
TABLE I. PARAMETER CONFIGURATION OF LSTM AFTER FINE-TUNING

Hyperparameters   Search space             Selected
Learning rate     [1e-4, 1e-1]             0.0004
Optimizer         [Adagrad, Adam, Nadam]   Nadam
Batch size        [32, 224]                224
Activation        [sigmoid, tanh]          tanh
Layers            [1, 3]                   2
Units             [16, 128]                32, 32
Epochs            Early stopping           --

C. Day-Ahead Forecasting
This section compares the effects of different historical series lengths on model forecasting performance.
One-day-ahead PV power forecasting was performed using 5-day and 7-day historical series data, respectively. Each model was retrained using its best hyperparameter configuration, and the metrics after training are shown in Table II.

TABLE II. EVALUATION METRICS FOR EACH MODEL (5-DAY / 7-DAY HISTORICAL DATA)

Models         RMSE            MAPE
LSTM           0.1259/0.1122   7.9240/7.7617
Conv-LSTM      0.1285/0.1283   7.8996/7.8645
Conv-LSTM-A    0.1294/0.1176   7.7284/7.0030
Conv-LSTM-S    0.1200/0.1113   7.6585/7.2253

Figure 5. Forecast results using 5-day historical data

Figure 6. Forecast results using 7-day historical data

As can be seen, using 5-day historical data for training, the RMSE of the models ranges from 12.00% to 12.94% and the MAPE ranges from 7.6585 to 7.9240, with Conv-LSTM-S obtaining the best performance. Using 7 days of historical data for training, the RMSE of the models ranges from 11.13% to 12.83% and the MAPE ranges from 7.0030 to 7.8645, with Conv-LSTM-S obtaining the best RMSE and Conv-LSTM-A obtaining the best MAPE.
After increasing the length of the historical data, the Conv-LSTM-A model obtained the largest performance improvement, with a 9.12% improvement in RMSE and a 9.39% improvement in MAPE. The Conv-LSTM-S model also obtained a large improvement, with a 7.25% improvement in RMSE and a 5.66% improvement in MAPE. From Figure 3 and Figure 4, it can be seen that the power generation in the data used in this paper ranges from 0 to 20 MW, and that there are some extreme values and outliers. The MAPE of the Conv-LSTM-A and Conv-LSTM-S models is smaller, which indicates their better robustness to fluctuations in peak power.
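The improvement percentages quoted above appear to be relative reductions of each metric when moving from 5-day to 7-day history. The following sketch recomputes them from the Table II values for the two models discussed; the dictionary layout and function name are illustrative only.

```python
# (5-day value, 7-day value) pairs copied from Table II.
TABLE_II = {
    "Conv-LSTM-A": {"RMSE": (0.1294, 0.1176), "MAPE": (7.7284, 7.0030)},
    "Conv-LSTM-S": {"RMSE": (0.1200, 0.1113), "MAPE": (7.6585, 7.2253)},
}


def relative_improvement(before: float, after: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (before - after) / before


for model, metrics in TABLE_II.items():
    for name, (five_day, seven_day) in metrics.items():
        gain = relative_improvement(five_day, seven_day)
        print(f"{model} {name}: {gain:.2f}% improvement")
```

For example, the Conv-LSTM-A RMSE gives 100 × (0.1294 − 0.1176) / 0.1294 ≈ 9.12%, matching the figure stated in the text.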

In Fig. 5, it can be seen that the solar radiation is highly correlated with the power. The model is prone to high errors when predicting periods in which the power fluctuates strongly within a short time. When predicting nighttime power, the model predicts negative values, for two reasons. First, the nighttime power was negative in the actual collected data but was set to zero during preprocessing, which changes the trend of the data. Second, the tanh activation function gives the output values a range of [-1, 1], and the large difference between the maximum and minimum values of power results in a large volatility of the data.
In Fig. 6, it can be seen that the model predicts the peak power more accurately when a longer historical data series is used. The LSTM model shows greater fluctuations in predicting the nighttime power, which leads to larger errors. The Conv-LSTM and Conv-LSTM-A models are biased towards more conservative forecasting and thus obtain higher errors, while the Conv-LSTM-S model is biased towards more aggressive forecasting. All three proposed models outperform the LSTM model.

V. CONCLUSIONS
In this study, we explored four models using two different historical data lengths for day-ahead PV generation forecasting: 1) LSTM, 2) Conv-LSTM, 3) Conv-LSTM-A, and 4) Conv-LSTM-S. Using only four features, the proposed models all obtained encouraging results. The Conv-LSTM-A model achieves a maximum improvement of 9.12% in RMSE and 9.39% in MAPE when increasing the historical data from 5 to 7 days. The experimental results show that the CNN and LSTM models can model the PV power prediction problem and achieve comparable performance. In a one-day-ahead PV power prediction task using only one year of hourly data, at least 5 days of data are required to achieve better results.
The best configuration of the model can be found faster using Bayesian optimization. All three proposed models outperform the LSTM model. The Conv-LSTM-A model, combined with the attention mechanism, makes better use of historical information, and the Conv-LSTM-S model, combined with the shortcut connection, extracts richer features and is easier to train. Both models improve the information flow, and therefore the forecasting results are better. The proposed models adapt well to the maximum and minimum values of PV power.

ACKNOWLEDGEMENT
This work was supported by the National Key R&D Program of China (Intergovernmental Special Projects, 2019YFE0118400) and a project of the Shandong Province Higher Educational Youth Innovation Science and Technology Program (2019KJN029).

REFERENCES
[1] Wang F, Zhen Z, Liu C, Mi Z, Hodge B-M, Shafie-khah M, et al. Image phase shift invariance based cloud motion displacement vector calculation method for ultrashort-term solar PV power forecasting. Energy Conversion and Management, 2018, 157: 123-135.
[2] Ismail AM, Ramirez-Iniguez R, Asif M, et al. Progress of solar photovoltaic in ASEAN countries: a review. Renewable and Sustainable Energy Reviews, 2015, 48: 399-412.
[3] Blaga R, Sabadus A, Stefu N, et al. A current perspective on the accuracy of incoming solar energy forecasting. Progress in Energy and Combustion Science, 2019, 70: 119-144.
[4] Agoua XG, Girard R, Kariniotakis G. Short-term spatio-temporal forecasting of photovoltaic power production. IEEE Transactions on Sustainable Energy, 2018, 9(2): 538-546.
[5] Agoua XG, Girard R, Kariniotakis G. Probabilistic Models for Spatio-Temporal Photovoltaic Power Forecasting. IEEE Transactions on Sustainable Energy, 2019, 10(2): 780-789.
[6] Barman M, Dev Choudhury N B. Season specific approach for short-term load forecasting based on hybrid FA-SVM and similarity concept. Energy, 2019, 174: 886-896.
[7] Li Y, He Y, Su Y, et al. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines. Applied Energy, 2016, 180: 392-401.
[8] Thorey J, Chaussin C, Mallet V. Ensemble forecast of photovoltaic power with online CRPS learning. International Journal of Forecasting, 2018, 34(4): 762-773.
[9] Cervone G, Clemente-Harding L, Alessandrini S, et al. Short-term photovoltaic power forecasting using Artificial Neural Networks and an Analog Ensemble. Renewable Energy, 2017, 108: 274-286.
[10] Huang C, Kuo P. Multiple-Input Deep Convolutional Neural Network Model for Short-Term Photovoltaic Power Forecasting. IEEE Access, 2019, 7: 74822-74834.
[11] Zhou S, Zhou L, Mao M, et al. Transfer Learning for Photovoltaic Power Forecasting with Long Short-Term Memory Neural Network. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), 2020: 125-132.
[12] Sobri S, Koohi-Kamali S, Rahim N A. Solar photovoltaic generation forecasting methods: A review. Energy Conversion and Management, 2018, 156: 459-497.
[13] Zang H, Liu L, Sun L, et al. Short-term global horizontal irradiance forecasting based on a hybrid CNN-LSTM model with spatiotemporal correlations. Renewable Energy, 2020, 160: 26-41.
[14] Wang F, Xuan Z, Zhen Z, et al. A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Conversion and Management, 2020, 212: 112766.
[15] Wang K, Qi X, Liu H. Photovoltaic power forecasting based LSTM-Convolutional Network. Energy, 2019, 189: 116225.
[16] Zhou H, Zhang Y, Yang L, et al. Short-Term Photovoltaic Power Forecasting Based on Long Short Term Memory Neural Network and Attention Mechanism. IEEE Access, 2019, 7: 78063-78074.
[17] Shi X, Chen Z, Wang H, et al. Convolutional LSTM Network: a machine learning approach for precipitation nowcasting. Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, 2015: 802-810.
[18] Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation, 1997, 9: 1735-1780.
[19] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2014.
[20] Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473, 2015.
[21] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[22] Snoek J, Larochelle H, Adams R. Practical Bayesian Optimization of Machine Learning Algorithms. NIPS, 2012.