
Nanyang Research Programme

Building Energy Prediction and Optimization using AI


(CEE02)
Li Kaijie

Hwa Chong Institution

Asst Prof Zhang Limao


Nanyang Technological University
School of Civil and Environmental Engineering

Abstract

This project focuses on using machine learning to obtain accurate predictions and applying them in energy engineering. The dataset contains four years of electricity consumption, generation, pricing, and weather data for Spain. Consumption and generation data were retrieved from ENTSO-E, a public portal for Transmission Service Operator (TSO) data [1]. Settlement prices were obtained from the Spanish TSO Red Eléctrica de España [2]. Weather data was acquired from the Open Weather API for the five largest cities in Spain by an Internet user and made public on Kaggle [3]. The aim is to use machine learning techniques to predict electricity prices accurately based on the relevant data available. After comparative trials of different models, CatBoost, a high-performance open-source library for gradient boosting on decision trees, was found to perform well. It is able to produce predictions of decent accuracy when forecasting the electricity price in Spain.

Data processing is the first step for machine learning, where irrelevant and redundant information is removed from the two datasets. The distribution of each feature is individually visualized with plotting tools to provide a simple yet informative presentation of the dataset. Several machine learning models were tested in search of the best estimation, and grid search is used in this procedure to tune each model to its best performance. By cross-referencing these models and the results they produced, CatBoost turned out to be the best-performing one, and a coefficient of determination of 0.935 was achieved in the end. Moreover, the different features are categorized into groups by their intrinsic attributes, and their importance in determining the actual electricity price is analyzed. Therefore, it is possible to investigate the top factors influencing the price. The real-life application lies in the fact that these factors of great significance can be taken special care of by the government to secure a stable electricity price.

1 INTRODUCTION

Energy is an indispensable part of all human activities in this heavily industrialised modern world. Polluting fossil fuels, such as petrol and gas, will remain the main source of energy and contribute to 90% of the increase in energy demand in the future [4]. Furthermore, the store of natural resources on Earth is limited and will one day be depleted, so the future energy supply may not be able to meet humans' demand [5]. Fortunately, it is possible to use machine learning to protect the environment, as it would be a strong tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. Machine learning can collaborate with other fields to tackle challenges together, such as smart grids and disaster management [6].

Similarly, machine learning can also be applied to improve energy usage efficiency, cut energy consumption, and help the transition to a renewable-energy-based infrastructure.

It would be effective to start by analysing the energy consumption in a city, as cities are a basic unit of the modern economy and their sea of information makes data analysis feasible. Also, urban buildings contribute a considerable proportion of global energy consumption, 40% in Europe, for example [7]. The most commonly and widely used forms are electricity and petroleum, the former for many technological products, the latter for transportation and industrial production. If we can identify the factors with the strongest relationship with energy, we will understand the mechanics of energy consumption more deeply, and hence manage and improve energy consumption more effectively. Moreover, it is also possible to construct a predictive model using machine learning, forecasting the demand for energy given related input information, or vice versa, such as predicting fuel prices based on the demand and supply relationship.

Machine learning is widely used for its powerful ability to find linear and non-linear relationships between multi-dimensional variables and produce reliable predictions with high accuracy, and it is commonly applied across scientific domains, such as natural language processing (NLP) and push notification [8]. Machine learning can help to construct models to predict, for example, energy output levels, or to find the influencing factors behind the demand for energy and all the correlations. Such results can give valuable insights that are hard to obtain using other methods, so it would be meaningful to apply machine learning in the field of energy.

In this modern era, energy must be taken into consideration for the conservation of nature and sustainable development, such as by maximising the efficiency of energy use, decreasing wastage and changing to renewable energy [9]. Yet searching for optimal models solely by high-accuracy simulation would be extremely time consuming, and different simulation software is likely to give different results, which makes decision making and model selection much harder [10]. Thus, machine learning methods are commonly used instead to evaluate the parameters of different types of buildings. For instance, the decision tree method has achieved high accuracy for predictions of building energy demand with less time complexity [11].

Gradient boosting is a machine learning mechanism that uses two particular steps to minimise the error of predictions. It exploits least squares to simplify the loss function and then solves it by tuning single variables [12]. CatBoost, an open-source software library developed by Yandex, is a machine learning algorithm that uses gradient boosting on decision trees [13]. The combination of algorithmic techniques presented in CatBoost results in great performance on a variety of datasets. Two important features introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and a novel algorithm for processing categorical data. These two methods were designed to deal with the prediction shift caused by a specific kind of target leakage present in most implementations of gradient boosting algorithms.

The aims of this research are: (1) to select a better machine learning model for the prediction of energy generation and energy prices; (2) to find out how to achieve an accurate prediction of the energy price given a dataset of correlated variables; (3) to examine the strength of the influences of different parameters on the energy price; (4) to explain the correlations in a real-life context; (5) to give possible suggestions regarding how to manage energy prices based on the results of the above research questions. To answer these questions, this paper tests several machine learning models to find the most suitable algorithm, evaluates the significance of separate variables and interprets the contribution from each of them. This research contributes to identifying the factors that influence the electricity price most strongly, so that they can be given special attention in keeping the price stable.

The rest of the paper is organized as follows. Section 2 reviews some existing research about applying machine learning to the study of energy. Section 3 presents the gradient boosting algorithm of CatBoost with detailed step-by-step procedures. Section 4 shows the feasibility and effectiveness of the proposed method in a case study of Spain's electricity price from 2015 to 2018 and discusses the merits of CatBoost over other algorithms. Section 5 evaluates the correlations of the different variables with the prediction output and identifies the most important parameters.

2 AIMS & OBJECTIVES

The aim of this project is to use AI to predict the energy consumption of buildings, to find a machine learning algorithm that produces accurate predictions, and to analyse the results.

3 METHODOLOGY

3.1 Data Preprocessing

The first step is to get sufficient information about the field under study, which often takes the form of datasets. There are several means of collecting data, which can be categorized into two types: primary research and secondary research. Primary data is usually collected via experiments and quasi-experimental designs, surveys and questionnaires, interviews and observation; in the fields of engineering specifically, computer simulation is also frequently used. Secondary data refers to materials previously obtained, shared or made public by researchers and organisations [14].

After attaining the raw data for analysis purposes, processing of the raw data is necessary. This is because the original data is likely to be imperfect due to various factors such as human bias and random errors, hence it conveys some incorrect information and noise which will carry errors into the final prediction. Moreover, removing redundant data from the dataset is also favourable, as it can reduce the calculation needed to build the prediction model and make the program run faster. After obtaining the raw data, several strategies can be adopted to make the dataset more succinct and effective for training purposes.

The first strategy is to look at the completeness of the data. A single array of data usually has multiple dimensions, so it actually carries a set of information. Missing data brings great inconvenience for machine learning, as most algorithms are unable to deal with null values and will report errors. Due to the unfavourable nature of missing data, records containing missing information are usually discarded. Another common treatment is to replace all missing values with the same base entry, such as '0' or 'yes', depending on the nature of that specific data group.

A second strategy is to get rid of redundant information. In the raw dataset, it is highly likely that not all of the data is relevant and useful for prediction, and its presence can harm the effectiveness of machine learning. To recognize what kind of data is redundant, one way is to check its relevance with respect to the prediction objective. It is also efficient to find redundant information by looking for repeated information. In principle, repeated information will not only make the algorithm run slower, but also cause overfitting by providing excessive information.

Another method to make the raw data more accessible is to regulate its data types. The original dataset can contain information in various formats, such as floats, objects or strings. Usually, it is most convenient and effective to convert all data into numerical values for the machine learning algorithms to learn from. However, there do exist algorithms that allow researchers to study data in non-numerical formats, so the transformation of data types is flexible and depends on the algorithm one is using.

Last but not least, it is essential to discard extreme values, as they can greatly undermine the precision and accuracy of the machine learning model.
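For illustration, the minimal pandas sketch below applies the last two strategies, type regulation and extreme-value removal, to a generic dataset; the file name, the coercion of every column to numeric values, and the simple three-standard-deviation rule are assumptions made for this example rather than the exact steps taken in this project.

```python
import pandas as pd

# Load a hypothetical raw dataset (file name is an assumption for illustration).
df = pd.read_csv("raw_data.csv")

# Regulate data types: coerce every column to numeric where possible;
# values that cannot be parsed become NaN and are then dropped row-wise.
df = df.apply(pd.to_numeric, errors="coerce")
df = df.dropna(axis=0, how="any")

# Discard extreme values: keep only rows within 3 standard deviations
# of each column mean (a common, simple outlier rule).
z_scores = (df - df.mean()) / df.std()
df = df[(z_scores.abs() <= 3).all(axis=1)]

print(df.shape)
```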

3.2 CatBoost

Gradient boosting is a strong machine learning algorithm that produces state-of-the-art prediction results, especially for learning tasks with heterogeneous features, noisy data, and complex dependencies. It is supported by the theory that strong predictors can be constructed by an iterative combination of weaker predictors via a greedy procedure that corresponds to gradient descent in a function space [15]. A classic gradient boosting step is constructed as follows: (1) assume a parameterized form and perform parameter optimization; (2) iterate, choosing a parameterized class that is almost parallel to the unconstrained negative gradient; (3) perform the line search and update the approximation [16]. CatBoost is a gradient boosting model that shows good performance and is available as open source. It has both CPU and GPU implementations for the training and scoring of models respectively.

Variables with a set of discrete values are called categorical features. Compared to the common practice of one-hot encoding or assigning label values, CatBoost adopts a more effective strategy that greatly reduces overfitting. It performs random permutations of the dataset and computes the average label value for examples with the same category value. Let σ = (σ1, . . . , σn) be the permutation; then x_{σp,k} is substituted with

\[
\frac{\sum_{n=1}^{p-1} \left[ x_{\sigma_n,k} = x_{\sigma_p,k} \right] Y_{\sigma_n} + a \cdot P}{\sum_{n=1}^{p-1} \left[ x_{\sigma_n,k} = x_{\sigma_p,k} \right] + a} \quad (1)
\]

A prior value P and a parameter a > 0, which is the weight of the prior, are added to reduce the noise from low-frequency categories. Moreover, CatBoost calculates leaf values using an innovative scheme that avoids the overfitting caused by the permutations mentioned above. Combining a few categorical variables can reduce the information lost when converting to numerical values, yet the total number of possible combinations grows exponentially. Therefore, CatBoost chooses combinations in a greedy manner: no combinations are considered for the first split of the decision tree; for the next splits, CatBoost combines all combinations and categorical features present in the current tree with all categorical features in the dataset. CatBoost also creates combinations by treating splits as a single categorical variable. Last but not least, it uses the simple yet useful technique of substituting categories with their frequencies of appearance.

Target statistics (TS), applied in many gradient boosting algorithms to estimate the expected target value in each category, lead to target leakage and a conditional shift, which means the prediction distribution differs between training and testing data. To improve its performance in the presence of the target leakage caused by greedy gradient boosting, CatBoost proposed an algorithm that is not troubled by prediction shift: ordered boosting [17]. At the start, CatBoost generates s + 1 independent random permutations, of which σ1, . . . , σs are used for the evaluation of tree splits, while σ0 serves to determine the leaf values b_j of the obtained trees. After that, the leaf values of the final model F are calculated by gradient boosting, and TS is calculated using permutation σ0. CatBoost is also able to reduce its complexity by only updating the values M_{r,j}(i) := M_{r,2^j}(i) for j = 1, . . . , ⌈log2 n⌉, for all i with σr(i) ≤ 2^{j+1}. In this way, the complexity is reduced from O(sn²) to O(sn).

Furthermore, CatBoost also adopts the method of subsampling and uses 5-fold cross-validation to tune the parameters.

Figure 1: 5-fold cross-validation

CatBoost also records leaf indices as binary vectors whose length equals the tree depth, and converts the used float features, statistics and one-hot encoded features into binary values, storing them in a continuous vector B. This treatment makes it a fast scorer.
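As a concrete illustration of how CatBoost can be applied with ordered boosting and native categorical handling, the sketch below trains a regressor on a small synthetic stand-in for the dataset; the column names, the synthetic values and the hyperparameter settings are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor, Pool

# Tiny synthetic stand-in for the real dataset (names and values are illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "generation_fossil_gas": rng.uniform(0, 10000, 500),
    "total_load_actual": rng.uniform(20000, 40000, 500),
    "weather_id": rng.choice(["clear", "clouds", "rain"], 500),  # categorical feature
})
y = 0.002 * df["generation_fossil_gas"] + 0.001 * df["total_load_actual"] + rng.normal(0, 1, 500)

train_pool = Pool(df, y, cat_features=["weather_id"])

# Ordered boosting is the permutation-driven scheme described above;
# the hyperparameter values here are placeholders, not the tuned ones.
model = CatBoostRegressor(
    iterations=500,
    depth=6,
    learning_rate=0.1,
    boosting_type="Ordered",
    loss_function="RMSE",
    verbose=False,
)
model.fit(train_pool)
print(model.predict(Pool(df.head(), cat_features=["weather_id"])))
```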

3.3 Parameter tuning

A machine learning model has many parameters that can be changed, and different combinations of parameter values produce different predictions. The process of finding the optimal values of these parameters is called parameter tuning. The official websites of many machine learning algorithms explain all the parameters in great detail and sometimes highlight those that have a greater impact on the prediction outcome, so it is possible to change the parameters accordingly, one at a time and manually, to find the optimal combination.

However, tuning parameters one by one is of very low efficiency and may not produce the best results, thus the method of grid search is used for hyperparameter optimization. Grid search performs a complete search over a given hyperparameter space of limited range and selects the best combination within it. Although grid search requires a lot of calculation and is extremely time-consuming, its computation can easily be parallelized.

To examine the model performance in a mathematical way, three statistical indexes are calculated and referenced, namely the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute error (MAE). R² measures the percentage of the variance in the output that can be explained by the variance in the input.
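Assuming the standard definitions of these indexes, with y_i the observed value, ŷ_i the predicted value, ȳ the mean of the observed values and n the number of samples:

\[
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right| .
\]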
3.4 Sensitivity Analyses

It is important to estimate and understand the influence of different features on the prediction output and eventually explain them. We decided to use a new and effective method, SHAP (SHapley Additive exPlanations). SHAP is a Python library and a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.

4 CASE STUDY

A dataset that contains four years of electricity consumption, generation, pricing, and weather data for five major cities in Spain is used for the case study. This segment consists of several parts, including data processing, comparison among different machine learning models, training and forecasting using the CatBoost method, and analysis of the results with suggestions. The detailed description of these parts is covered in Sections 4.1, 4.2, 4.3 and 4.4.

4.1 Processing of Data

In a dataset named 'hourly energy demand generation and weather', information about the energy generation, weather conditions and settlement prices is available as a complete collection. The data is presented as comma-separated values (CSV) files and can be downloaded for free. Data about the generation and consumption of the different types of energy is obtained from the official website of the European Network of Transmission System Operators for Electricity (ENTSO-E); prices for electricity are attained from Red Eléctrica de España (REE), the Spanish corporation in charge of Spain's domestic power transmission system; and the weather data was purchased and made public by a Kaggle user named 'Nicholas Jhana'. All the weather data is from five major Spanish cities, namely Valencia, Madrid, Bilbao, Barcelona and Seville. All in all, the available data consists of information on energy generation and weather conditions in tens of thousands of rows, and to achieve good performance for the later prediction using machine learning techniques, preprocessing of these data is necessary.

Firstly, the two CSV files are to be integrated into one for convenience. However, directly combining the two files would lead to incorrect information: although both data files contain hourly data sequenced in time from 2015 to 2018, there may exist missing and repeated rows which would inevitably misalign the main array of information. Moreover, the weather dataset contains data for the five major cities in Spain, which gives it approximately five times the rows of the energy dataset. Hence, a crucial step before merging the data is to ensure that both files have exactly one set of data for any time point. Therefore, rows with the same time index are removed so that each time value becomes unique. After that, a brief check shows that the datasets have 35064 and 175320 = 5 × 35064 rows respectively, which equals the total number of hours from 2015 to 2018. This implies that every hour is included in the datasets and there are no missing rows. Additionally, because of the chronological sequencing in the weather file and the previously established completeness of the time values, it is easy to identify the range of rows for each city, which are 1–35064, 35065–70128, 70129–105192, 105193–140256 and 140257–175320 to be specific.

Knowing this, the information in the weather file can be reorganized in such a way that all information is conserved while the number of rows is reduced by a factor of five, by putting the data of the cities side by side in columns instead of stacking them vertically. The names of the cities are then regarded as irrelevant information and dropped.

Figure 2: Transformation of the energy generation file
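A minimal pandas sketch of this deduplication, reshaping and merging step is given below; the file names, the timestamp and city column names ('time', 'dt_iso', 'city_name') and the inner join are assumptions modelled on the description above rather than the exact code used.

```python
import pandas as pd

# File and column names are assumptions for illustration.
energy = pd.read_csv("energy_dataset.csv")
weather = pd.read_csv("weather_features.csv")

# Keep exactly one row per timestamp (and per city in the weather file).
energy = energy.drop_duplicates(subset="time")
weather = weather.drop_duplicates(subset=["dt_iso", "city_name"])

# Put the five cities side by side: one row per timestamp,
# one block of weather columns per city.
weather_wide = weather.pivot(index="dt_iso", columns="city_name")
weather_wide.columns = [f"{col}_{city}" for col, city in weather_wide.columns]

# Merge energy and weather on the timestamp.
merged = energy.merge(weather_wide, left_on="time", right_index=True, how="inner")
print(merged.shape)
```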
Extracting time information. The column containing time data in the raw dataset is in string format and contains special signs such as '-' and ':' which cannot be treated as numbers. Therefore, it is necessary to extract information from it. For instance, the first slot of the time column holds the string '2015-01-01 00:00:00+01:00', so to record in which year this set of information was obtained, we take the '2015' from the original complete string and add it to a newly created column, and similarly for months and hours. However, for days the methodology is different, as we are less interested in the specific date than in the weekday; hence the 'pd.to_datetime' function is applied to parse the timestamps and produce the weekday information. Additionally, all the newly obtained values are converted into numerical types, which completes the extraction of time information. Furthermore, for the convenience of subsequent manipulations of the data, the 'time' column is replaced by index numbers ranging from 1 to 35064.
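A small sketch of this time-feature extraction is shown below, assuming the timestamp column is named 'time' and that the weekday is derived with pandas' dayofweek accessor; the one-row frame only stands in for the real 35064-row dataset.

```python
import pandas as pd

# One-row stand-in for the merged dataset; values are illustrative only.
merged = pd.DataFrame({"time": ["2015-01-01 00:00:00+01:00"], "price_actual": [65.41]})

merged["time"] = pd.to_datetime(merged["time"], utc=True)
merged["year"] = merged["time"].dt.year
merged["month"] = merged["time"].dt.month
merged["hour"] = merged["time"].dt.hour
merged["weekday"] = merged["time"].dt.dayofweek  # 0 = Monday, ..., 6 = Sunday

# Replace the string timestamp with a simple integer index 1..n.
merged = merged.drop(columns="time").reset_index(drop=True)
merged.index = merged.index + 1
print(merged)
```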
Another crucial step is to handle the categorical variables. The 'LabelEncoder.fit_transform' function is applied here to encode target labels with values between 0 and n_classes − 1. For instance, after transformation, '800' in 'weather_id' becomes '22', while the '0' in 'clouds_all' remains '0'.

The presence of missing values is also taken into consideration. The number of missing values in each column is examined using the 'isnull().sum()' function, and the result is shown in the figure. The three columns 'generation hydro pumped storage aggregated', 'forecast wind offshore day ahead' and 'NAN' have most of their values missing, hence these columns are deleted completely. Yet many columns contain only a few hundred missing values, which is a small proportion but requires dropping as well, as missing values will certainly lead to an error when training a machine learning model. In this case, the dropping is performed on rows rather than columns, by applying the 'dropna' function with 'axis=0', which makes the dropping happen row-wise, together with 'how="any"', which discards a row as long as one of its slots has a missing value. In this way, all of the missing values are removed while useful data is preserved as much as possible.

The next step is to get rid of redundant information. For temperature, the columns 'temp_min' (minimum temperature) and 'temp_max' are discarded, as the variation of temperature within one hour is negligible and hence they contain information similar to the 'temp' column. Also, the columns 'weather_description', 'weather_main' and 'weather_icon' all contain very similar information and are less representative than the 'weather_id' column, which indicates the type of weather at that time. However, very specific and micro-level information about the weather, such as precipitation and cloudiness, is conserved for now, given the possibility that it may hold a strong correlation with the prediction. Additionally, by inspecting the data types of all columns with 'dtypes', it is found that although many of the data appear to be numbers, they are actually of the 'object' type, which cannot be directly processed by machine learning algorithms. Therefore, 'to_numeric' is applied to transform everything in the data into either floats or integers.
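Continuing from the sketches above, the following lines illustrate how these cleaning steps could look in pandas and scikit-learn; the column names are modelled on the ones quoted in this section and 'merged' refers to the combined frame from the earlier sketches, so this is a sketch under those assumptions rather than the project's actual code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Count missing values per column to identify almost-empty columns.
missing_counts = merged.isnull().sum()
print(missing_counts.sort_values(ascending=False).head())

# Columns that are almost entirely empty are removed outright.
merged = merged.drop(columns=[
    "generation hydro pumped storage aggregated",
    "forecast wind offshore day ahead",
], errors="ignore")

# Remaining rows with any missing value are dropped.
merged = merged.dropna(axis=0, how="any")

# Encode the categorical weather identifier as integer labels 0..n_classes-1.
merged["weather_id"] = LabelEncoder().fit_transform(merged["weather_id"])

# Force every remaining column to a numeric dtype.
merged = merged.apply(pd.to_numeric, errors="coerce").dropna(axis=0, how="any")
```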

Furthermore, the correlation between the different variables is also examined in search of insignificant features. A coloured correlation map can clearly demonstrate the connection between every pair of variables. As shown in Figure 4, there are several white streaks in the graph, which suggests that their corresponding variables are unrelated and should be dropped. These variables include 'snow_3h', 'generation fossil coal-derived gas', 'generation fossil oil shale', 'generation fossil peat', 'generation geothermal', 'generation marine' and 'generation wind offshore'.

Moreover, parameters with a very low correlation to the desired output should also be discarded to reduce overfitting. The dropped parameters are 'temp', 'rain_3h', 'weekday', 'weather_id1', 'weather_id2', 'weather_id3', 'weather_id4', 'clouds_all1' and 'clouds_all4'.

After all the steps of data cleansing mentioned above, the final distribution of the data values is shown in the table and graph below.

Figure 3: Spread of the data after primary processing
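A possible way to produce such a correlation map and drop the unrelated columns is sketched below with seaborn; the figure size, the colour map and the reuse of the 'merged' frame from the earlier sketches are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation map of the cleaned dataset.
corr = merged.corr()
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.tight_layout()
plt.show()

# Drop features that show (almost) no correlation with the rest of the data;
# the list mirrors the one given in the text.
low_corr_columns = [
    "snow_3h", "generation fossil coal-derived gas", "generation fossil oil shale",
    "generation fossil peat", "generation geothermal", "generation marine",
    "generation wind offshore",
]
merged = merged.drop(columns=low_corr_columns, errors="ignore")
```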
4.2 Parameter tuning

A machine learning model has many parameters that are variable and can affect the prediction accuracy significantly. Thus it is of great importance to find a good set of parameter values in order to obtain an optimal result, while avoiding overfitting as much as possible at the same time. One way to search for combinations of parameter values is to change the input manually and observe how the accuracy changes, yet this is very ineffective and calculation-demanding. Therefore, grid search is used to tune the parameters. GridSearchCV is a function that tries all the combinations of the values passed in a constructed dictionary and evaluates the model for each combination using cross-validation. Hence, after using this function we get the accuracy/loss for every combination of hyperparameters and we can choose the one with the best performance. Some algorithms' websites also give qualitative suggestions about how to tune the parameters more effectively. For instance, CatBoost's official documentation tells users that underfitting and overfitting can be avoided by setting the number of iterations to a large value, using the overfitting detector parameters, and turning the use-best-model option on. Thus, we follow this advice when constructing the dictionaries for CatBoost's parameters. Also, to reduce the calculation complexity, only important parameters are put under GridSearchCV, while parameters that have little impact on the results are neglected.

Parameter | Description | Values tried
Depth | Number of layers for the decision trees' growth. | 2, 4, 6
Learning rate | Determines the step size at each iteration while moving toward a minimum of the loss function. | 0.05, 0.1, 0.15, 0.2, 0.25, 0.3
Random strength | The amount of randomness to use for scoring splits when the tree structure is selected. | 25, 50, 100, 200

The results of GridSearchCV show that 'depth=8, learning_rate=0.1, random_strength=50' is the set of parameters that produces the best prediction.
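A sketch of this search, assuming scikit-learn's GridSearchCV wrapped around a CatBoost regressor with the grid taken from the table above and 5-fold cross-validation, could look as follows; the fixed iteration count, the R² scoring choice and the X_train/y_train names from the preprocessing steps are assumptions.

```python
from catboost import CatBoostRegressor
from sklearn.model_selection import GridSearchCV

# The grid mirrors the values listed in the table above.
param_grid = {
    "depth": [2, 4, 6],
    "learning_rate": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
    "random_strength": [25, 50, 100, 200],
}

estimator = CatBoostRegressor(
    iterations=1000,        # large iteration count, relying on the overfitting detector
    loss_function="RMSE",
    verbose=False,
)

search = GridSearchCV(estimator, param_grid, cv=5, scoring="r2", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```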
4.3 Performance evaluation

Several models are tuned and applied to predict the desired outcome, the electricity price, and CatBoost turns out to be the best algorithm among them.

To examine the model performance in a mathematical way, four statistical indexes are calculated and referenced, namely the coefficient of determination (R²), the mean square error (MSE), the root mean square error (RMSE) and the mean absolute error (MAE).

R² measures the percentage of the variance in the output that can be explained by the variance in the input.

The mean square error evaluates how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line and squaring them.

The root mean square error is the standard deviation of the residuals (which are the prediction errors in this case), and it is a measure of how spread out these residuals are.

The mean absolute error measures the average magnitude of the errors in a set of predictions, without considering their direction.

The values for each model are obtained by taking the average of five runs to minimize random errors. Moreover, we compare the results on both the training and testing datasets to the prediction included in the original dataset. The results show that our prediction is significantly better than the official prediction and presents only moderate overfitting.
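Assuming the held-out predictions are available as arrays, the four indexes can be computed with scikit-learn as sketched below; the variable names y_test and y_pred are placeholders for the actual prices and the model's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# y_test: actual electricity prices of the held-out set; y_pred: model predictions.
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

print(f"R2={r2:.4f}  MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```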

Metric | Training | Testing | Official
R² | 0.9689 | 0.9230 | 0.1308
MSE | 5.882 | 13.44 | 175.0
RMSE | 2.425 | 3.667 | 13.23
MAE | 1.726 | 2.513 | 10.47

Table 1: Performance of CatBoost predicting the electricity price

Metric | Training | Testing | Official
R² | 0.9971 | 0.9950 | 0.9902
MSE | 60220 | 103700 | 204800
RMSE | 245.4 | 322.1 | 452.6
MAE | 187.4 | 239.1 | 316.1

Table 2: Performance of CatBoost predicting the total energy load

Figure 4: Prediction for the total energy load

Figure 5: Prediction for the electricity price

4.4 Sensitivity Analysis

SHAP can help to demonstrate the exact contribution of all variables to the final prediction. The SHAP values of a subsample of 1000 data points are shown below, where red parts are positive contributions and blue parts are negative contributions.
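A sketch of such a SHAP analysis, assuming the fitted CatBoost model and the test features from the steps above, is shown below; the 1000-row subsample mirrors the size mentioned here, while the use of TreeExplainer and summary_plot is an illustrative choice rather than the project's confirmed code.

```python
import shap

# 'model' is the fitted CatBoost regressor and X_test the held-out features.
explainer = shap.TreeExplainer(model)
sample = X_test.sample(1000, random_state=0)   # readable subsample, as in the text
shap_values = explainer.shap_values(sample)

# Beeswarm summary: each point is one sample's contribution for one feature.
shap.summary_plot(shap_values, sample)
```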
Figure 6: SHAP values for a subsample

Conducting a SHAP analysis on the prediction of the electricity price, the impacts of all features on the electricity price prediction are shown (Fig. 6). The most important influencers are 'month', 'year' and 'generation fossil gas'. We can see that the majority of the features are positively correlated with the prediction output, with the exception of 'generation hydro pumped storage consumption', 'generation hydro run-of-river and poundage', 'clouds_all2', 'generation waste' and 'wind_deg'.

Although forecasting from the energy data generally performs better than forecasting from the weather data, it is still worthwhile to consider multivariate inputs for a more accurate prediction.

These results can be interpreted in the real-life context that increasing energy generation implies greater consumer demand for electricity, and higher demand leads to higher prices in the market.

The negative impacts of 'generation hydro pumped storage consumption' and 'generation hydro run-of-river and poundage' can be explained by the fact that both are renewable energy sources that take little variable cost to produce; hence their output is associated with supply, whose increase can lower the general price level. For the time factors, we can conclude that the price increases greatly as time passes, which is very reasonable because of the world's ever-increasing desire for energy due to industrialization. Comparing the two main categories, energy generation influences the electricity price much more strongly than the weather features; hence the Spanish government and related organizations can aim to reduce citizens' daily energy consumption to lower and stabilise the price of electricity.

Figure 7: SHAP values for the electricity price prediction

Figure 8: SHAP values for the load prediction

We further analyse how the total energy load is influenced by the different features. As shown in Fig. 7, the weather features and time features all have very little impact on the total energy load. All the energy generation features contribute clearly to the total energy load, which is indeed self-explanatory, as the total energy load should be approximately equal to the total energy supplied, with the single exception of 'generation hydro pumped storage consumption'. The top four features have the greatest variance and hence the greatest impact on the total energy load, which suggests that the total generation can go through large fluctuations from time to time. Thus, the government can consider stabilising the energy output for both renewable and fossil fuel energy.

4.5 Comparison with other models

To evaluate the performance of CatBoost in studying this dataset, other machine learning algorithms are also tested and the results are compared. Several frequently used algorithms are selected, namely LightGBM, XGBoost and Random Forest (RF).

Metric | CatBoost | LightGBM | XGBoost | RF
R² | 0.9230 | 0.8034 | 0.8995 | 0.8920
MSE | 13.44 | 30.94 | 17.36 | 17.66
RMSE | 3.667 | 5.562 | 4.167 | 4.202
MAE | 2.513 | 3.762 | 2.821 | 2.453

Table 3: Performance for predicting the electricity price

Metric | CatBoost | LightGBM | XGBoost | RF
R² | 0.9943 | 0.9915 | 0.9943 | 0.9916
MSE | 118400 | 175600 | 118300 | 174100
RMSE | 344.1 | 419.1 | 344.0 | 417.3
MAE | 243.1 | 294.8 | 239.6 | 290.8

Table 4: Performance for predicting the total energy load

We can see that all four algorithms generate predictions that are far more accurate than the original 'price day ahead' prediction, and among them CatBoost gives the results with the greatest accuracy and the smallest errors. Little improvement is shown for predicting the total energy load, as the original 'total load forecast' prediction already has an R² value higher than 99 percent. It can be speculated that CatBoost and XGBoost perform slightly better than the other two models, but the difference is marginal and indecisive. Therefore, to conclude, CatBoost gives the best performance among the four chosen algorithms and is very suitable for this case study.
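A sketch of how the four models could be fitted and scored side by side is given below; the default hyperparameters and the X_train/X_test/y_train/y_test names from the earlier steps are placeholders, since each model would in practice be tuned separately as described above.

```python
import numpy as np
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# The four candidate models, left at default settings for illustration.
models = {
    "CatBoost": CatBoostRegressor(verbose=False),
    "LightGBM": LGBMRegressor(),
    "XGBoost": XGBRegressor(),
    "RandomForest": RandomForestRegressor(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    print(f"{name}: R2={r2_score(y_test, pred):.4f}  MSE={mse:.2f}  "
          f"RMSE={np.sqrt(mse):.3f}  MAE={mean_absolute_error(y_test, pred):.3f}")
```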

5 CONCLUSION

Machine learning methods can be used to predict energy consumption as well as prices, and their effectiveness is validated in a case study regarding five cities in Spain. CatBoost is found to produce the best results with the smallest errors.

A limitation of this project is that the data used for the case study may not be the most representative, as different countries and regions can have very different energy industries. Therefore, a future extension of this project could be to gather data from more places all over the world to apply machine learning, hence finding a more general result.

6 ACKNOWLEDGEMENT

I would like to thank my mentor, Asst Prof Zhang Limao, for guiding me throughout this entire NRP journey and sharing with me his expertise in research, machine learning and data analytics. I would also like to thank my teacher mentor, Mr. Kay Siang Low, for his unwavering support.
7 REFERENCES

[1] Central collection and publication of electricity generation, transportation and consumption data and information for the pan-European market. ENTSO-E. (n.d.). https://transparency.entsoe.eu/dashboard/show

[2] Energy final price | ESIOS electricity · data · transparency. (n.d.). https://www.esios.ree.es/en/market-and-prices

[3] Jhana, N. (2019, October 10). Hourly energy demand generation and weather. Kaggle. Retrieved January 16, 2022, from https://www.kaggle.com/nicholasjhana/energy-consumption-generation-prices-and-weather

[4] Omer, A. M. (2008). Energy, environment and sustainable development. Renewable and Sustainable Energy Reviews, 12(9), 2265–2300. https://doi.org/10.1016/j.rser.2007.05.001

[5] Bentley, R. W. (2002). Global oil & gas depletion: an overview. Energy Policy, 30(3), 189–205. https://doi.org/10.1016/s0301-4215(01)00144-6

[6] Rolnick, D., Donti, P. L., Kaack, L. H., Kochanski, K., Lacoste, A., Sankaran, K., Ross, A. S., Milojevic-Dupont, N., Jaques, N., Waldman-Brown, A., Luccioni, A., Maharaj, T., Sherwin, E. D., Mukkavilli, S. K., Kording, K. P., Gomes, C., Ng, A. Y., Hassabis, D., Platt, J. C., … Bengio, Y. (2019, November 5). Tackling climate change with machine learning. arXiv. https://arxiv.org/abs/1906.05433

[7] European Parliament and Council. Directive 2010/31/EU of the European Parliament and of the Council of 19 May 2010 on the energy performance of buildings. Official Journal of the European Union 2010; L153:13–35.

[8] Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415

[9] Dincer, I., & Rosen, M. A. (1999). Energy, environment and sustainable development. Applied Energy, 64(1-4), 427–440. https://doi.org/10.1016/s0306-2619(99)00111-7

[10] Yezioro, A., Dong, B., & Leite, F. (2008). An applied artificial intelligence approach towards assessing building performance simulation tools. Energy and Buildings, 40, 612–620.

[11] Yu, Z., Haghighat, F., Fung, B. C. M., & Yoshino, H. (2010). A decision tree method for building energy demand modelling. Energy and Buildings, 42(10), 1637–1646. https://doi.org/10.1016/j.enbuild.2010.04.006

[12] Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. https://doi.org/10.1016/s0167-9473(01)00065-2

[13] Overview of CatBoost. CatBoost documentation. (n.d.). https://catboost.ai/docs/concepts/about.html

[14] Hox, J. J., & Boeije, H. R. (2005). Data collection, primary versus secondary.

[15] Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363.

[16] Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 1189–1232.

[17] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2017). CatBoost: unbiased boosting with categorical features. arXiv preprint arXiv:1706.09516.
