
Wind power prediction using ML and DL methodologies

Arun Kumar M, Caroline Dorathy Esther J, Sabareshwaran M T, Dr. M K Kavithadevi, Dr. Sriharipriya KC
Abstract
Wind power prediction is crucial for efficient utilization of renewable energy sources and for grid management. In this study, various machine learning (ML) and deep learning (DL) methodologies are explored to improve the accuracy of wind power forecasts. The approach encompasses techniques such as Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and dense models to predict wind power generation for the next 15 days, and model performance is assessed with the coefficient of determination (R2 score). By leveraging these methodologies, the aim is to push the accuracy of wind power forecasting beyond current standards, facilitating better integration of renewable energy into the grid infrastructure.

Introduction

With the global population expanding rapidly and fossil fuel reserves dwindling, the imperative for clean, sustainable energy sources has never been more pressing. In this context, wind energy emerges as a viable solution, offering a renewable and environmentally friendly alternative to traditional energy sources. Harnessing the power of wind has been integral to human civilization for centuries, driving windmills to pump water and grind grain. However, despite its potential, wind energy has historically faced economic challenges, often being deemed costly and unpredictable compared to conventional fuels like petroleum.

In response to these challenges, the field of wind prediction has garnered significant attention in research circles, aiming to enhance the accuracy and reliability of wind power forecasts. Traditionally, two main approaches have been employed for wind prediction: Numerical Simulation and Meteorological Information Perception, and Historical Data Processing with Artificial Intelligence techniques. The former utilizes real-time meteorological data to construct simulation models, while the latter leverages historical data and AI algorithms for prediction.

In recent years, the advent of deep learning has revolutionized wind power prediction, enabling the development of sophisticated models capable of capturing the complex relationships inherent in wind data. These deep learning models, including Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Units (GRU), Bidirectional GRU (BiGRU), Deep Belief Networks (DBN), and Autoencoders (AE), have demonstrated remarkable efficacy in solving nonlinear problems and improving wind power prediction accuracy.
This study delves into machine learning (ML) and deep learning (DL) methodologies for wind power prediction, aiming to contribute to the advancement of renewable energy integration and grid management. By exploring various modeling techniques and evaluation metrics, it endeavours to enhance the reliability and efficiency of wind power forecasting, thereby facilitating the transition towards a more sustainable energy future.

Literature review

LSTM networks can capture the complex temporal dependencies inherent in wind speed data. They can remember past information for an extended period, which allows them to exploit these dependencies and make accurate short-term predictions, and they handle non-linearity and complex patterns in data very well. Recent studies delve into utilizing LSTMs for wind prediction, frequently implementing hybrid models with other techniques to enhance accuracy. Concurrently, researchers strive to refine the optimization process and enhance data preprocessing, both contributing to improved wind prediction accuracy. Reference [1] presents a new optimization technique based on stochastic fractal search and particle swarm optimization (SFS-PSO) to optimize the parameters of the LSTM network. This sequential hybrid architecture utilizes LSTM for initial predictions, followed by SFS-PSO optimization to refine the results. The algorithm specifically incorporates SFS to enhance the exploitation phase, leading to improved performance on evaluation metrics such as mean absolute error (MAE), Nash-Sutcliffe Efficiency (NSE), mean square error (MSE), coefficient of determination (R2), and root mean squared error (RMSE). The experimental results illustrated that the proposed optimization of LSTM using the SFS-PSO model achieved the best results, with R2 equaling 99.99% in predicting the wind power values.

Reference [2] aims to surpass the accuracy of standalone LSTM and TCN models by incorporating the CEEMDAN algorithm. CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) decomposes the original wind energy data into multiple Intrinsic Mode Functions (IMFs) representing different time scales and frequency components, cutting down the volatility and random variation in the raw series. In the training process of the LSTM and TCN, LM is applied to find suitable hyperparameters, which improves the accuracy of the model and reduces its training time. TCN excels at long-term prediction due to its ability to capture long-range dependencies. The combination of TCN and LSTM allows the model to capture both short-term and long-term trends in the wind energy data, potentially leading to more accurate predictions. In this model, LSTM predicts the short-term components (IMF1 and IMF2) extracted by CEEMDAN, and TCN predicts the long-term components (IMF3 to IMFn). CEEMDAN-TCN and CEEMDAN-LSTM achieve close accuracies of 91.2% and 92.5% respectively, while the accuracy of CEEMDAN-LSTM-TCN is 99.8%.

Longer input data offers more information but includes both valuable and misleading elements. If the learning rate is too small, the model converges too slowly, the training time becomes too long, or the model falls into a local minimum; if the learning rate is too large, the loss function may oscillate during training, making it difficult to converge. Therefore, it is necessary to seek suitable hyperparameters that balance these constraints to achieve better prediction results [3].
While the grid search method is a common optimization algorithm, it has high time complexity, so a multi-step grid search is proposed in [3] to find good parameters. It is divided into two steps. In the first step, relatively good parameters are found over a wide range, with a larger update step for each parameter. In the second step, the search range is narrowed around the good values obtained in the first stage and the update step is made smaller, reducing the total search time without missing too many possibilities.
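As a rough illustration of this coarse-to-fine idea (the parameter names, ranges, and toy objective below are invented for the sketch, not taken from [3]):

```python
import numpy as np

def val_error(lr, units):
    """Stand-in for training a model and returning its validation error."""
    return (np.log10(lr) + 3) ** 2 + (units - 96) ** 2 / 1e3  # toy loss surface

# Step 1: coarse grid over a wide range with large steps.
coarse = [(val_error(lr, u), lr, u)
          for lr in (1e-4, 1e-3, 1e-2, 1e-1) for u in (32, 64, 128, 256)]
_, lr0, u0 = min(coarse)

# Step 2: finer grid in a narrow window around the stage-1 optimum.
fine = [(val_error(lr, u), lr, u)
        for lr in (lr0 / 2, lr0, lr0 * 2)
        for u in range(max(8, u0 - 32), u0 + 33, 8)]
best_err, best_lr, best_units = min(fine)
print(best_lr, best_units, best_err)
```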
The proposed hybrid CNN-LSTM model beats traditional single models like RNN, LSTM, and CNN in accuracy; the MAE, RMSE, CC, and R2 of CNN-LSTM are 0.4783, 0.6480, 0.9528, and 0.9070 respectively. While LSTMs excel at capturing temporal dependencies, they may struggle with long-term predictions (days or weeks) due to the vanishing gradient problem.

Reference [4] proposes a prediction model based on a Gated Transformer for medium- to long-term wind power prediction. In this paper, Numerical Weather Prediction (NWP) data is introduced as auxiliary information, improving data correlation and ensuring the effectiveness of data feature extraction. The inclusion of NWP data significantly improved the average accuracy of all prediction models by 117% for medium- to long-term forecasts, as demonstrated in this study. Dilated convolution serves as an effective tool for extracting features from lengthy sequential data; in the model's Encoder component, it handles historical wind power and NWP data. Leveraging multi-head attention and a gating mechanism, the Encoder adeptly captures crucial features and produces an attention weight map, which provides insights into the distribution of information within the data. The Decoder predicts future wind power values by taking the output of the Encoder and short historical wind power data as inputs, using cross-attention to leverage information from the Encoder. The hyperparameter settings in this paper were as follows: batch size = 128; learning rate = 1 × 10−4; Kaiming weight initialization; epochs = 400; dropout = 0.1. The Gated Transformer model outperformed all other models in terms of prediction accuracy, achieving an improvement of approximately 8% for short-term forecasts and 11% for medium- to long-term forecasts.

Reference [5] proposes an EEMD and Transformer hybrid forecasting system for multistep short-term wind speed forecasting based only on wind speed history data. The EEMD decomposition strategy is employed in the data preprocessing stage to effectively reduce noise pollution in the original data; the proposed model is trained on a very large-scale wind speed dataset (19 years of data), and performance evaluation is carried out on wind speed data spanning a full year. The paper sets a new benchmark for wind speed forecasting by achieving significantly lower prediction errors than previous methods: at 3, 6, 12, and 24 hours, the model's mean absolute error (MAE) is 0.243, 0.290, 0.362, and 0.453, respectively, representing a substantial improvement over the state of the art. This EEMD-Transformer model outperforms the EEMD-gated recurrent unit (GRU) model in MAE, root mean square error (RMSE), and R: MAE decreases by 3.5%, RMSE decreases by 4.7%, and R improves by 0.0018, while mean absolute percentage error (MAPE) increases by 0.91.
Table 1: Summary of the surveyed studies

1. Wind Power Prediction Based on Machine Learning and Deep Learning Models (Zahraa Tarek, Mahmoud Shams, Ahmed Elshewey, El-Sayed El-kenawy, Abdelhameed Ibrahim, Abdelaziz Abdelhamid, Mohamed El-dosuky)
Methodology: LSTM optimized with SFS-PSO (Stochastic Fractal Search and Particle Swarm Optimization).
Hyperparameters: ω, C1, C2, r1, r2, β, β'; batch size = 64; learning rate = 0.0001; epochs = 50; optimizer = Adam; output activation = linear; hidden activation (5 layers) = ReLU. KNN regressor: n_neighbors = 2, weights = distance. Bagging regressor: n_estimators = 10, max_samples = 1. GB regressor: n_estimators = 200, learning_rate = 0.1.
Strengths and weaknesses: the LSTM layer could learn and exploit the temporal patterns within the wind data to improve prediction accuracy; SFS can help select relevant features from the wind dataset, potentially reducing noise and boosting training efficiency; PSO can improve robustness by optimizing hyperparameters.
Results: MAE = 0.000002; NSE = 1.2×10−7; MBE = 0.00001; R2 = 99.99%; RMSE = 0.00002.

2. Prediction of ultra-short-term wind power based on CEEMDAN-LSTM-TCN (Chenjia Hu, Yan Zhao, He Jiang, Mingkun Jiang, Fucai You, Qian Liu)
Methodology: LSTM and Temporal Convolutional Network (TCN) with Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN).
Hyperparameters: not mentioned.
Strengths and weaknesses: CEEMDAN decreases the influence of non-stationary and random fluctuations in the original data on the prediction; the combination of LSTMs and TCNs accurately captures both short-term and long-term dependencies.
Results: MAE = 0.136; MSE = 0.059; R2 = 0.998.

3. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network (Zhipeng Shen, Xuechun Fan, Liangyu Zhang, Haomiao Yu)
Methodology: hybrid LSTM-CNN model tuned by grid search with early stopping.
Hyperparameters: learning rate = 0.011; input length = 26; three convolutional layers with 16 filters each, of sizes (1, 3), (1, 3), and (1, 2), stride (1, 1); dropout = 0.3; two LSTM layers with 10 cells each; activation = ReLU; optimizer = Adam; epochs = 30; batch size = 128.
Strengths and weaknesses: the hybrid LSTM-CNN combined with grid search and early stopping addresses both the temporal and spatial aspects of the data, finds the configuration that minimizes prediction error, and improves efficiency by stopping unnecessary evaluations.
Results: MAE = 0.9870; R2 = 94.44%; RMSE = 1.3718; CC = 0.9723.

4. Research on Wind Power Prediction Based on a Gated Transformer (Qiyue Huang, Yapeng Wang, Xu Yang, Sio-Kei Im)
Methodology: Gated Transformer.
Hyperparameters: batch size = 128; learning rate = 1 × 10−4; Kaiming weight initialization; epochs = 400; dropout = 0.1.
Strengths and weaknesses: efficiently processes the long historical and future NWP sequences relevant to wind power prediction, and the use of NWP data improves data correlation and feature extraction; however, Transformer models can be computationally expensive, and the gating mechanism might further increase this complexity.
Results: with NWP, MSE = 0.0555 and MAE = 0.1376; without NWP, MSE = 0.0631 and MAE = 0.1408.

5. Multistep short-term wind speed forecasting using transformer (Huijuan Wu, Keqilao Meng, Daoerji Fan, Zhanqiang Zhang, Qing Liu)
Methodology: Transformer with EEMD decomposition and specific adjustments for wind speed prediction.
Hyperparameters: EEMD mode number = 16; learning rate = 0.003; optimizer = Adam; batch size = 512; dropout = 0.1; window size of X = 18; sequence length of X (l_x) = 24; model dimension = 128; feedforward dimension (d_e) = 256; attention heads = 2; encoder layers = 1; decoder layers = 1; window size of Y = 18.
Strengths and weaknesses: removing the start/end tags (SOS/EOS) makes the model more suitable for fixed-length prediction; EEMD potentially reduces noise, improves generalizability, and extracts features at different time scales; the Transformer captures long-range dependencies and is well suited to sequential data.
Results: MAE = 0.167; RMSE = 0.221; MAPE = 22.40; R = 0.9717.

6. Wind Power Forecasting with Deep Learning Networks: Time-Series Forecasting (Wen-Hui Lin, Ping Wang, Kuo-Ming Chao, Hsiao-Chung Li, Zong-Yu Yang, Yu-Huang Lai)
Methodology: Temporal Convolutional Network (TCN) deep learning networks.
Hyperparameters: number of filters = 32; kernel size = 10; dilations = [1, 2, 4, 8, 16]; epochs = 50; optimizer = Adam; number of stacks = 4; NP = 10, F = 0.5, CR = 0.3, G = 30.
Strengths and weaknesses: effectively solves the long-distance dependency problem, as demonstrated with large amounts of temporal-spatial series data such as one year of wind power data.
Results: MAPE = 5.13%.
Methodology

Wind power prediction is a pivotal area of research aimed at enhancing the efficiency and reliability of renewable energy sources. In this project, a methodology that integrates traditional Machine Learning (ML) algorithms and Deep Learning (DL) models for wind power prediction is proposed, leveraging the TataPower dataset. This approach combines the strength of ML algorithms in capturing intricate patterns with the capability of DL models in handling temporal dependencies. By amalgamating these techniques, the aim is to surpass conventional methods and achieve more accurate wind power predictions.

The methodology begins by collecting the TataPower dataset, encompassing multivariate time-series data related to wind speed, wind direction, and power generation, among other parameters. Preliminary preprocessing steps involved handling missing values and ensuring uniformity in data formats. Date and time information was extracted and formatted to facilitate temporal analysis. Furthermore, numerical features were scaled to normalize the data distribution, easing subsequent model training.

To gain insights into the dataset's characteristics, an extensive Exploratory Data Analysis (EDA) phase was conducted. Descriptive statistics and visualization techniques, including pair plots and correlation heatmaps, were employed to uncover underlying patterns and relationships between variables. Additionally, a wind rose plot was constructed to visualize the relationship between wind direction and wind speed, providing valuable insights into directional wind patterns.

Figure 1: EDA

Feature engineering played a pivotal role in enhancing the predictive capabilities of the model. Date and time information from the timestamp column was extracted and engineered to capture temporal patterns inherent in the data; features such as year, month, day, hour, and minute were derived to provide the model with additional temporal context. Domain-specific knowledge was leveraged to engineer features that may impact wind power generation, and features related to weather conditions, geographical location, and turbine specifications were incorporated to enrich the dataset and improve predictive performance. Features that did not significantly influence the target variable (power generation) were identified and pruned from the dataset to reduce noise and improve model efficiency; this involved careful analysis and domain expertise to retain only the most relevant features for prediction. Numerical features were normalized and scaled to ensure uniformity in data distribution and to prevent features with larger magnitudes from dominating model training, an essential step for stabilizing training and improving convergence. Finally, techniques such as principal component analysis (PCA) and feature importance ranking were employed to select the most informative features and reduce the dimensionality of the dataset. By focusing on the most relevant features, the aim is to streamline model training and enhance predictive accuracy.
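A condensed sketch of these preprocessing steps follows; the file name, the timestamp format, and the choice of MinMaxScaler are assumptions, while the column names match the dataset:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("T1.csv")  # placeholder file name for the turbine data
# Timestamp format assumed; adjust to the actual "Date/Time" layout.
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%d %m %Y %H:%M")

# Derive temporal-context features from the timestamp.
df["Year"] = df["Date/Time"].dt.year
df["Month"] = df["Date/Time"].dt.month
df["Day"] = df["Date/Time"].dt.day
df["Hour"] = df["Date/Time"].dt.hour
df["Minute"] = df["Date/Time"].dt.minute

# Scale numerical features to a common range so no feature dominates training.
num_cols = ["LV ActivePower (kW)", "Wind Speed (m/s)",
            "Theoretical_Power_Curve (KWh)", "Wind Direction (°)"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```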
A scatter matrix was plotted to visualize pairwise relationships between the numerical features: wind speed, theoretical power curve, wind direction, and LV ActivePower (kW). From the scatter plot between wind speed and the Theoretical Power Curve, it is seen that wind turbine systems cannot generate any power when the wind speed is below 4 m/s. Between 4 m/s and 11 m/s the relationship is roughly linear, meaning that increasing wind speed allows the turbines to generate more power; once the wind speed passes 11 m/s, the power generated saturates at 3600 kWh.

Figure 2: Colinear relation

The correlation coefficient between Wind Speed and the Theoretical Power Curve is very close to 1, which indicates a very strong positive correlation: as wind speed increases, the theoretical power curve prediction also increases. The correlation between LV ActivePower and Theoretical_Power_Curve is also positive and relatively strong, but not as strong as the correlation between wind speed and the theoretical power curve. This suggests that the theoretical power curve provides a decent estimate of actual power output, but other factors besides wind speed also influence LV Active Power generation.

To extract more meaningful features, new attributes like Week, Month, Season, Day, and Hour are created from the existing "Date/Time" column. Finally, categorical features such as "Seasons" are encoded using a dictionary so that the machine learning models can understand them better, as sketched below.
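Continuing the preprocessing sketch above, the calendar attributes and the dictionary encoding might look like this (the month-to-season mapping and the numeric codes are assumptions):

```python
# Derive calendar attributes from the parsed "Date/Time" column.
df["Week"] = df["Date/Time"].dt.isocalendar().week
df["Day"] = df["Date/Time"].dt.day_name()
month_to_season = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}
df["Season"] = df["Date/Time"].dt.month.map(month_to_season)

# Dictionary encoding so the ML models receive numeric season values.
df["Season"] = df["Season"].map({"Winter": 0, "Spring": 1,
                                 "Summer": 2, "Autumn": 3})
```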
Numerical features (power, wind speed, Theoretical Power Curve, and LV ActivePower) are plotted against categorical features (week, month, season) using bar charts. This helps identify trends in power generation based on these factors.

Figure 3: Numerical columns over the weeks

Figure 4: Numerical columns over the months

Figure 5: Numerical columns over the seasons

Ensemble regression models are chosen for their ability to capture complex relationships within the dataset. The models selected for this study include Gradient Boosting Regressor, Support Vector Regressor (SVR), Random Forest Regressor, Linear Regression, Extra Trees Regressor, AdaBoost Regressor, Decision Tree Regressor, XGBoost Regressor, and XGBoost with Random Forest (XGBRF) Regressor. These models offer diverse approaches to capturing the underlying patterns in the data and mitigating overfitting.

Each model was chosen for its unique strengths in capturing different aspects of the dataset. Gradient Boosting Regressor offers sequential model fitting, reducing bias and variance in predictions. Support Vector Regressor (SVR) effectively handles high-dimensional data and captures complex nonlinear relationships. Random Forest Regressor is robust to overfitting and computationally efficient, suitable for large datasets. Linear Regression provides simplicity and interpretability, serving as a baseline for comparison. Extra Trees Regressor utilizes ensemble learning for improved robustness and generalization. AdaBoost Regressor focuses on improving weak learners' performance through sequential training. Decision Tree Regressor captures nonlinear relationships in data through hierarchical partitioning. XGBoost Regressor offers scalability, speed, and regularization techniques for handling diverse datasets. XGBoost with Random Forest (XGBRF) Regressor combines their strengths for enhanced performance and robustness. Each model contributes to a comprehensive approach to wind power prediction, leveraging its respective strengths to capture the dataset's complexity effectively.

Several regression algorithms were considered for predicting wind turbine power generation:

1. Gradient Boosting Regressor (GBR): An ensemble learning technique that builds sequential trees, each minimizing the residual errors of the previous tree. The predicted output is the sum of the predictions of multiple weak learners:

\[ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \]

where \( \hat{y}_i \) is the predicted output for the \( i \)-th instance, \( K \) is the number of weak learners (trees), and \( f_k(x_i) \) is the prediction of the \( k \)-th weak learner for the \( i \)-th instance.

2. Support Vector Regressor (SVR): Utilizes support vectors for regression, finding the hyperplane that best fits the data points while maximizing the margin:

\[ \min \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*}) \]

where \( w \) are the weights of the hyperplane, \( C \) is the regularization parameter, \( n \) is the number of data points, and \( \xi_i, \xi_i^{*} \) are the slack variables.

3. Random Forest Regressor (RFR): An ensemble learning method using decision trees. It builds multiple decision trees during training and outputs the mean prediction of the individual trees:

\[ \hat{y}_i = \frac{1}{N} \sum_{j=1}^{N} f_j(x_i) \]

where \( \hat{y}_i \) is the predicted output for the \( i \)-th instance, \( N \) is the total number of trees, and \( f_j(x_i) \) is the prediction of the \( j \)-th tree for the \( i \)-th instance.

4. Linear Regression: A linear modeling technique that assumes a linear relationship between the input features and the target variable:

\[ \hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} \]

where \( \hat{y}_i \) is the predicted output for the \( i \)-th instance, \( \beta_0 \) is the intercept, and \( \beta_1, \beta_2, \dots, \beta_p \) are the coefficients of the input features \( x_{i1}, x_{i2}, \dots, x_{ip} \) respectively.

5. Extra Trees Regressor (ETR): An ensemble learning algorithm similar to Random Forest, but with random splits:

\[ \hat{y}_i = \frac{1}{N} \sum_{j=1}^{N} f_j(x_i) \]

with the same notation as for the Random Forest Regressor.

6. AdaBoost Regressor: A boosting algorithm that builds a strong model by combining multiple weak learners, iteratively adjusting the weights of poorly predicted samples to focus on the harder cases:

\[ F_m(x) = F_{m-1}(x) + \alpha_m h_m(x) \]

where \( F_m(x) \) is the ensemble prediction after \( m \) iterations, \( \alpha_m \) is the weight of the \( m \)-th weak learner, and \( h_m(x) \) is the prediction of the \( m \)-th weak learner.

7. Decision Tree Regressor: A non-parametric model that predicts the value of the target variable based on decision rules learned from the data features:

\[ \hat{y}_i = \mathrm{predict}(x_i) \]

where \( \mathrm{predict}(x_i) \) is the prediction of the decision tree for the \( i \)-th instance.

8. XGBoost Regressor: A gradient boosting algorithm that optimizes the mean squared error objective function by adding weak learners:

\[ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \]

with the same notation as for the Gradient Boosting Regressor.

9. XGBRF Regressor: A variant of XGBoost designed for random forests, which applies a similar boosting approach but utilizes randomization in the tree-building process:

\[ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \]

again with \( K \) trees \( f_k \).
Model training and evaluation are conducted iteratively for each ensemble regression model. The training data (`X_1`, `y_train`) are used to fit the model, while the testing data (`X_1_test`, `y_test`) are employed to assess model performance. Evaluation metrics such as the coefficient of determination (R² score) and Root Mean Squared Error (RMSE) are calculated to gauge each model's goodness of fit and prediction accuracy.
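A minimal sketch of this loop (the model list is abbreviated; `X_1`, `y_train`, `X_1_test`, and `y_test` are the splits named above and are assumed to exist):

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              ExtraTreesRegressor)
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

models = {
    "GradientBoostingRegressor": GradientBoostingRegressor(),
    "RandomForestRegressor": RandomForestRegressor(),
    "ExtraTreesRegressor": ExtraTreesRegressor(),
    "XGBRegressor": XGBRegressor(),
}

for name, model in models.items():
    model.fit(X_1, y_train)                           # fit on the training split
    preds = model.predict(X_1_test)                   # predict on the held-out split
    r2 = r2_score(y_test, preds) * 100                # R2 reported as a percentage
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: R2 = {r2:.4f}%, RMSE = {rmse:.4f}")
```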
Hyperparameter tuning is essential for optimizing the performance of each regression model. A hyperparameter search space is defined for each model, including parameters such as `n_estimators`, `max_depth`, `learning_rate`, `min_child_weight`, and `base_score`. Randomized Search Cross-Validation is employed to efficiently search over the parameter grid and find the optimal hyperparameters that maximize the R² score.
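A sketch of this step with scikit-learn's `RandomizedSearchCV`, shown here for the XGBoost regressor (the distributions below are illustrative, not the paper's exact search space):

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

param_distributions = {
    "n_estimators": randint(100, 1000),
    "max_depth": randint(3, 12),
    "learning_rate": uniform(0.01, 0.3),
    "min_child_weight": randint(1, 10),
    "base_score": uniform(0.25, 0.5),
}

search = RandomizedSearchCV(
    XGBRegressor(),
    param_distributions=param_distributions,
    n_iter=50,
    scoring="r2",   # maximize the R² score
    cv=5,
    n_jobs=-1,
)
search.fit(X_1, y_train)  # X_1 / y_train as above
print(search.best_params_, search.best_score_)
```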
After hyperparameter tuning, the best-performing model is selected based on the highest R² score obtained. The selected model's performance is further evaluated using k-fold cross-validation to assess its generalization capability. Visualization techniques are utilized to compare the cross-validation scores of different models, facilitating the identification of the most suitable model for wind power prediction.
To facilitate the training process of the models, a function `compile_and_fit(model, window, patience=3)` is crafted. This function streamlines the compilation and training steps. It incorporates early stopping via `tf.keras.callbacks.EarlyStopping` to mitigate overfitting. The models are compiled using Mean Absolute Error (MAE) as the loss function and the Adam optimizer with a learning rate of lr = 0.01.
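From that description, the helper presumably resembles the following sketch: the signature, loss, optimizer, and learning rate come from the text, while the body, the `max_epochs` default, and the `window.train`/`window.val` dataset attributes are assumptions in the style of the TensorFlow time-series tutorial:

```python
import tensorflow as tf

def compile_and_fit(model, window, patience=3, max_epochs=20):
    """Compile with MAE loss and Adam(lr=0.01), then train with early stopping."""
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, mode="min")
    model.compile(
        loss=tf.keras.losses.MeanAbsoluteError(),
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
        metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model.fit(window.train, epochs=max_epochs,
                     validation_data=window.val,
                     callbacks=[early_stopping])
```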
For multi-step prediction, two distinct strategies are explored:

1. Single-Shot Predictions: This approach entails predicting the entire timeseries in one step. It is suitable for scenarios where a sequence of future values needs to be forecasted collectively.

2. AutoRegressive Model: In this paradigm, the model makes predictions iteratively, with each output being fed back as input for the subsequent prediction step. This method is adept at capturing dynamic temporal dependencies within the data.

A `WindowGenerator` object is instantiated to facilitate the generation of data slices from the dataset, catering to the requirements of both approaches; a stand-in is sketched below.
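The `WindowGenerator` implementation itself is not listed in the paper (the name matches the helper class from the TensorFlow time-series tutorial); a compact stand-in built on `tf.keras.utils.timeseries_dataset_from_array` could look like this:

```python
import numpy as np
import tensorflow as tf

def make_windows(series: np.ndarray, input_width: int, label_width: int,
                 batch_size: int = 32) -> tf.data.Dataset:
    """Slice a (time, features) array into (inputs, labels) windows:
    'input_width' past steps are paired with the next 'label_width' steps."""
    total_width = input_width + label_width
    ds = tf.keras.utils.timeseries_dataset_from_array(
        data=series, targets=None, sequence_length=total_width,
        batch_size=batch_size, shuffle=True)
    return ds.map(lambda window: (window[:, :input_width, :],
                                  window[:, input_width:, :]))
```

With `input_width` and `label_width` both set to the 15-day horizon, this yields the slices that both prediction strategies consume.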
Two baseline models are devised to establish a performance benchmark against which the more complex models can be evaluated:

Last Baseline: This model simply repeats the last input timestep for the required number of output timesteps. It serves as a straightforward yet intuitive baseline for comparison purposes.

Repeat Baseline: Here, the previous 15 days' data is replicated, assuming that the subsequent 15 days will exhibit similar patterns. This simplistic model offers a basic estimation strategy.

These baseline models, sketched below, provide initial insights into the predictive capacity of the subsequent models.
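A minimal Keras rendering of the two baselines, reconstructed from their descriptions (`OUT_STEPS`, the number of forecast timesteps, is an assumed constant):

```python
import tensorflow as tf

OUT_STEPS = 15  # assumed: one prediction step per day over the 15-day horizon

class LastBaseline(tf.keras.Model):
    """Repeat the last observed timestep for every future timestep."""
    def call(self, inputs):
        return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

class RepeatBaseline(tf.keras.Model):
    """Assume the next window simply repeats the previous window."""
    def call(self, inputs):
        return inputs  # valid when input_width == label_width == OUT_STEPS
```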
Several single-shot models are developed, each with its unique architecture and capabilities.

Linear Model: This model predicts the entire sequence in one step, leveraging linear layers. It reshapes the output to conform to the desired output shape.

\[ y_{\text{linear}} = W_{\text{linear}} x + b_{\text{linear}} \]

where \( W_{\text{linear}} \) represents the weight matrix, \( x \) denotes the input, and \( b_{\text{linear}} \) is the bias vector.

Dense Model: Adding dense layers between the input and output enhances the model's capacity to capture intricate patterns in the data. This architecture introduces nonlinear transformations to the prediction process, potentially improving performance.

\[ y_{\text{dense}} = \sigma(W_{\text{dense}} x + b_{\text{dense}}) \]

where \( \sigma \) represents the activation function, \( W_{\text{dense}} \) denotes the weight matrix, \( x \) is the input, and \( b_{\text{dense}} \) is the bias vector.

CNN Model: By incorporating a convolutional layer, this model is adept at capturing local patterns and dependencies within the data. The convolutional operation enables the model to extract spatial features, making it particularly suitable for sequential data analysis.

\[ y_{\text{conv}} = \sigma(W_{\text{conv}} * x + b_{\text{conv}}) \]

where \( * \) denotes the convolution operation, \( \sigma \) represents the activation function, \( W_{\text{conv}} \) is the convolutional kernel, \( x \) is the input, and \( b_{\text{conv}} \) is the bias vector.

RNN Model: Leveraging LSTM layers, this model focuses on capturing temporal dependencies within the data. The recurrent nature of the LSTM enables it to retain information over time, facilitating the prediction of multi-step sequences.

\[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \]
\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]
\[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \]
\[ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \]
\[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]
\[ h_t = o_t \odot \tanh(C_t) \]

where \( i_t \), \( f_t \), \( o_t \), \( \tilde{C}_t \), \( C_t \), and \( h_t \) represent the input gate, forget gate, output gate, cell input, cell state, and hidden state at time \( t \) respectively; \( W_i \), \( W_f \), \( W_o \), \( W_C \) denote the weight matrices; \( x_t \) is the input at time \( t \); and \( b_i \), \( b_f \), \( b_o \), \( b_C \) are the bias vectors.
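For concreteness, one of these single-shot architectures, the Dense model, might be assembled as in this sketch (layer sizes and the feature count are illustrative, with `OUT_STEPS` as assumed above):

```python
import tensorflow as tf

OUT_STEPS = 15     # forecast horizon in timesteps, as assumed earlier
num_features = 4   # e.g. power, wind speed, theoretical power, wind direction

multi_dense_model = tf.keras.Sequential([
    # Collapse the time axis by keeping only the last input timestep.
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    tf.keras.layers.Dense(512, activation="relu"),   # nonlinear transformation
    # Predict every future timestep in one shot, then reshape.
    tf.keras.layers.Dense(OUT_STEPS * num_features),
    tf.keras.layers.Reshape([OUT_STEPS, num_features]),
])
```

Swapping the Lambda and first Dense layer for `tf.keras.layers.LSTM(32)` would give the corresponding single-shot RNN variant.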
These models are meticulously trained and evaluated to gauge their effectiveness in predicting multi-step sequences. Hyperparameter tuning and iterative refinement are employed to optimize model performance further.

Results

Dataset Description

The dataset provides a comprehensive overview of wind turbine performance and environmental variables. It comprises four key features: "LV ActivePower (kW)," "Wind Speed (m/s)," "Theoretical_Power_Curve (KWh)," and "Wind Direction (°)." The "LV ActivePower (kW)" column denotes the actual power output generated by a wind turbine, measured in kilowatts (kW). Its values range from a minimum of -2.471405 kW to a maximum of 3618.732910 kW, with a mean of approximately 1307.68 kW and a standard deviation of 1312.46 kW. The "Wind Speed (m/s)" column quantifies the velocity of wind at the turbine location, expressed in meters per second (m/s). Wind speeds vary from 0 m/s to 25.206011 m/s, with a mean of approximately 7.56 m/s and a standard deviation of 4.23 m/s. The "Theoretical_Power_Curve (KWh)" column denotes the projected power output of the turbine under optimal conditions, measured in kilowatt-hours (KWh). Its values span from 0 KWh to 3600 KWh, with a mean of approximately 1492.18 KWh and a standard deviation of 1368.02 KWh. Lastly, the "Wind Direction (°)" column provides insights into the prevailing wind direction at the turbine site, indicated in degrees (°). Wind direction values range from 0° to 359.997589°, with a mean direction of approximately 123.69° and a standard deviation of 93.44°.
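These summary statistics can be reproduced directly with pandas; the file name below is a placeholder:

```python
import pandas as pd

df = pd.read_csv("T1.csv")  # placeholder file name for the turbine dataset
cols = ["LV ActivePower (kW)", "Wind Speed (m/s)",
        "Theoretical_Power_Curve (KWh)", "Wind Direction (°)"]
print(df[cols].describe())  # min, max, mean, std as reported above
```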

Figure 6: Scatter matrix

Figure 7: Correlation matrix

The correlation matrix offers insights into the relationships among the variables "LV ActivePower (kW)," "Wind Speed (m/s)," "Theoretical_Power_Curve (KWh)," and "Wind Direction (°)." Notably, "LV ActivePower (kW)" demonstrates strong positive correlations with both "Wind Speed (m/s)" (approximately 0.91) and "Theoretical_Power_Curve (KWh)" (approximately 0.95), indicating that higher wind speeds and theoretical power output correspond to increased active power generation. Additionally, "Wind Speed (m/s)" and "Theoretical_Power_Curve (KWh)" exhibit a strong positive correlation (approximately 0.94), suggesting that higher wind speeds lead to greater theoretical power output. Conversely, "Wind Direction (°)" shows weak correlations with the other variables, implying minimal linear relationship with active power, wind speed, and theoretical power output.
Table 2: ML model evaluation metrics

Model Name | R2 score (%) | RMSE
GradientBoostingRegressor | 94.646846 | 302.227414
RandomForestRegressor | 97.319413 | 213.867095
ExtraTreesRegressor | 97.667454 | 199.500571
DecisionTreeRegressor | 95.247386 | 284.770665
XGBRegressor | 97.965704 | 186.309943
XGBRFRegressor | 94.158605 | 315.709240

Among the models assessed, the RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor stand out for their notable strengths. These models demonstrate high R2 scores, indicating their ability to effectively capture the variance in wind power. Moreover, their low RMSE values signify a high level of accuracy in predicting wind power output. This suggests that these models could be valuable assets in accurately forecasting wind power, a critical aspect of renewable energy management. While other models such as GradientBoostingRegressor and DecisionTreeRegressor also exhibit respectable performance, they fall short in accuracy compared to the top performers. Further optimization through techniques like cross-validation and hyperparameter tuning can enhance the reliability and robustness of the selected models, ensuring their effectiveness in real-world wind power prediction scenarios.


Figure 8: Wind turbine power production prediction

DL Methodology

In evaluating the performance of the wind power forecasting models, several key metrics were considered, including Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The baseline models, Last Baseline and Repeat Baseline, provided important reference points. The Repeat Baseline outperformed the Last Baseline, which is expected as it leverages previous data to make predictions. However, both baseline models were outperformed by the single-shot models.

Among the single-shot models, including the Linear Model, Dense Model, CNN Model, and LSTM Model, the Dense Model exhibited the most promising performance. With additional dense layers for more complex transformations, the Dense Model achieved the lowest MAE and RMSE compared to the other single-shot models. This indicates that the Dense Model was able to capture more nuanced patterns and dependencies in the data, leading to more accurate predictions.

While the Linear Model and CNN Model performed relatively close to each other, and the LSTM Model slightly underperformed compared to the others, they still provided competitive results. Each model architecture has its own strengths and weaknesses, with the CNN Model focusing on spatial patterns and the LSTM Model specializing in capturing temporal dependencies.

Overall, the Dense Model emerged as the top performer in terms of MAE and RMSE. However, further analysis is warranted to explore additional factors such as computational efficiency, interpretability, and generalization performance on unseen data. Additionally, it would be valuable to conduct more extensive experimentation and fine-tuning to optimize the models further and potentially uncover new insights into wind power forecasting.

Figure 9: Repeat baseline model

Figure 10: Dense model
