2017 2nd International Conference on Artificial Intelligence and Engineering Applications (AIEA 2017)
ISBN: 978-1-60595-485-1
Electricity Consumption Prediction Using XGBoost Based on Discrete Wavelet Transform
WEIZENG WANG, YULIANG SHI, GAOFAN LYU and WANGHUA DENG
ABSTRACT
The purpose of this paper is to predict the daily electricity consumption of the next
month, a problem of considerable practical importance. Although few articles address
electricity consumption prediction directly, numerous papers treat similar topics, such
as rainfall forecasting, wind speed prediction and water flow forecasting, and a number
of techniques and algorithms have been employed on those problems with good results.
However, their accuracy on long-term time series prediction is still not sufficient. In
this paper, we propose a hybrid model, DWT-XGBoost, which integrates the discrete
wavelet transform and XGBoost to forecast electricity consumption time series over the
long term. The discrete wavelet transform decomposes the original time series into an
approximate series and detail series, and these decomposed series are input as features
into the XGBoost prediction model. The parameters of XGBoost are obtained by grid
search. The performance of the proposed model is compared with that of other hybrid
models, namely the discrete wavelet transform combined with support vector regression
and the discrete wavelet transform combined with artificial neural networks, and with
XGBoost alone. The comparison shows that DWT-XGBoost outperforms the other models
and is a novel method for long-term time series prediction.
KEYWORDS
Wavelet Transform, Discrete Wavelet Transform (DWT), XGBoost, DWT-XGBoost,
Extracting Features.
INTRODUCTION
At present, people could not live well without electricity, so it is essential to
distribute electricity effectively and to secure the power supply. This requires data
scientists to rely on effective analysis methods and accurate forecasting models of
electricity demand, and to exploit historical electricity records in order to predict
consumption in the next period accurately and to explore the patterns of electricity
use in the real world. The record of electricity consumption here is a time series.
_________________________________________
Corresponding Author: Weizeng Wang,13021093200@163.com, Beijing University of
Technology, People’s Republic of China.
Yuliang Shi, shiyl@bjut.edu.cn, Gaofan Lyu, 17801125181@163.com and Wanghua
Deng, dengwanghua2016@163.com, Beijing University of Technology, People’s Republic
of China.
For time series prediction, the most popular method was at first the statistical ARMA
model, followed by the emergence of machine learning and deep learning.
In machine learning, researchers have developed many single and hybrid models for
time series prediction and validated them on real data. Ye Ren et al. used a novel
hybrid model integrating EMD (empirical mode decomposition) and SVR to predict wind
speed, compared it with a variety of hybrid models, and found that the proposed model
was more accurate [1]. In a study of failure time series prediction, Xin Wang, Ji Wu
and colleagues proposed a hybrid model of singular spectrum analysis and SVR, compared
it with models such as ARMA and multiple linear regression, and found that it performed
better than those models [2]. Adamowski and Sun compared the relative performance of
coupled wavelet-neural network models (WA–ANN) and regular artificial neural networks
(ANN) for flow forecasting at lead times of 1 and 3 days for two different non-perennial
rivers in semi-arid watersheds of Cyprus, and found that the former performed better [3].
Similarly, Venkata Ramana and others combined the wavelet transform with ANN to obtain
a hybrid model named WNN; both WNN and regular ANN were then applied to monthly
rainfall prediction, and the prediction accuracy of WNN was found to be better than that
of regular ANN [4]. Before that, an approach based on nonlinear SVM with PSO (particle
swarm optimization) had also been applied to rainfall forecasting [5]. Zhiyong Liu and
others investigated a hybrid model combining the discrete wavelet transform and support
vector regression (the DWT–SVR model) for daily and monthly streamflow forecasting and
found that it outperformed regular SVR [6]. In addition, other hybrid and single models
for forecasting have been proposed, such as AGA-SSVR, which hybridizes the SVR model
with an adaptive genetic algorithm (AGA) and seasonal index adjustment [7], recurrent
neural networks (RNN) with grammatical inference [8], least squares support vector
regression [9,10], SVR or SVM [11,12,13,14], and neural networks [15].
Most of the above were developed only for short-term time series prediction. Other
investigators have obtained results in long-term time series prediction. Alexander
Grigorievskiy and others applied OP-ELM to the problem of long-term time series
prediction [16]. Multiple-output support vector regression (M-SVR) was employed for
multi-step-ahead time series prediction by Yukun Bao and others [17]. The least squares
support vector machine and the multivariate adaptive regression spline model were
applied to the long-term prediction of river water pollution by Ozgur Kisi et al. [10].
From the literature review above, there is little research on electricity consumption
prediction, although such prediction is quite important to production and daily life.
We also learn that hybrid models tend to outperform the corresponding single models, so
we intend to use a hybrid model for electricity consumption prediction. After
investigating the relevant literature, we developed a novel hybrid model integrating the
discrete wavelet transform (DWT) and XGBoost. In previous studies, the wavelet transform
has been successfully applied to time series prediction. In this study, data
preprocessing is carried out using the DWT, and XGBoost is used as the base prediction
model: the results of the DWT decomposition are input into the XGBoost model as feature
data. Compared with other hybrid models, such as DWT-ANN [3, 4] and DWT-SVR [6], the
hybrid model proposed in this paper performs better and predicts future electricity
consumption more accurately. The forecasting results can provide decision support for
the power sector, enabling it to allocate power more effectively and avoid uneven
distribution. The remainder of this paper is organized as follows: 1) related work is
introduced, 2) the proposed hybrid model is described, 3) the experimental procedure is
described, 4) the results are discussed, and 5) conclusions are drawn.
RELATED WORK
This section provides brief introductions to the related techniques including extreme
gradient boosting (XGBoost), discrete wavelet transform (DWT), and grid search used
in the hybrid model. In this paper, XGBoost is used to model the fluctuation, while
DWT is used to decompose the electricity consumption time series.
EXTREME GRADIENT BOOSTING (XGBOOST)
XGBoost is short for “Extreme Gradient Boosting”, where the term “Gradient Boosting”
was proposed by Friedman in the paper Greedy Function Approximation: A Gradient
Boosting Machine [19]. XGBoost is based on this original model. It is a machine learning
method used by many data researchers, especially in data and machine learning
competitions, where it has shown performance superior to other methods. XGBoost is
applicable to both classification and regression, as validated in many practical cases
such as store sales prediction, customer behavior prediction, ad click-through rate
prediction, hazard risk prediction, web text classification and malware classification [18].
XGBoost is used for supervised learning problems, where the training data x_i (with
multiple features) are used to predict a target variable y_i. As described by Chen and
Guestrin [18], XGBoost is an ensemble of K Classification and Regression Trees (CART)
{T_1(x_i, y_i), ..., T_K(x_i, y_i)}, where x_i is the feature vector of the i-th training
sample and y_i is the corresponding target. Since a CART assigns a real score to each
leaf, the prediction scores of the individual trees are summed to obtain the final
score, which is expressed through K additive functions, as shown in Eq. 1:
$$ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{1} $$
where K is the number of trees, f_k is a function in the functional space F, and F is
the space of all CART. The objective function contains two parts, training loss and
regularization, as shown in Eq. 2:
$$ \mathrm{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{2} $$
where l is a differentiable loss function which measures the difference between the
prediction ŷ_i and the target y_i, and Ω is a regularization term which penalizes the
complexity of the model to avoid over-fitting.
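For reference, the regularization term in Chen and Guestrin [18] takes the form below.
T (the number of leaves of a tree) and w (its vector of leaf scores) are symbols
introduced here for illustration and are not used elsewhere in this paper; γ and λ are
the same constants that appear in Eq. 5.

$$ \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2} $$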
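Since additive training is used, the prediction ŷ_i^(t) at step t is expressed as
$$ \hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i). \tag{3} $$
Applying this to Eq. 2, the objective at step t can be written as
$$ \mathrm{obj}(\theta)^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t). \tag{4} $$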
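After further derivation, Eq. 5 is obtained, which is used to score a candidate split of
a leaf node:
$$ \mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \tag{5} $$
where G_L and H_L (G_R and H_R) are the sums of the first- and second-order derivatives
of the loss over the samples in the left (right) child, as defined in [18], and λ is a
regularization constant. The first, second and third terms in the brackets are the
scores of the left leaf, the right leaf and the original leaf respectively, and the
final term, γ, is the regularization on the additional leaf.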
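As a minimal sketch of how Eq. 5 is evaluated, the function below computes the gain of
one candidate split from the gradient sums of the two children. The function name, the
default values of lam and gamma, and the toy residuals are assumptions made for
illustration; this is not the XGBoost implementation itself.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into a left and a right child (Eq. 5)."""
    def score(g, h):
        # Structure score of a single leaf: G^2 / (H + lambda).
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# For the squared-error loss l = (y - y_hat)^2 / 2, each sample contributes
# g_i = y_hat_i - y_i and h_i = 1, so G is a sum of residuals and H a count.
residuals_left = [-2.0, -1.0]   # made-up residuals of samples sent left
residuals_right = [1.0, 2.0]    # made-up residuals of samples sent right
gain = split_gain(sum(residuals_left), len(residuals_left),
                  sum(residuals_right), len(residuals_right),
                  lam=1.0, gamma=0.5)
print(gain)
```

A split is worth making only when the gain is positive, which is how γ acts as a
penalty on each additional leaf.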
The most important factor behind the success of XGBoost is its scalability in all
scenarios, which is due to several important systems and algorithmic optimizations.
These innovations include a new tree learning algorithm for handling sparse data and a
theoretically justified weighted quantile sketch procedure that handles instance weights
in approximate tree learning. Parallel and distributed computing makes learning faster,
enabling quicker model exploration [18]. Although XGBoost is based on the classification
and regression tree (CART) and on gradient boosting, it performs better than either
because it integrates the advantages of both.
DISCRETE WAVELET TRANSFORM (DWT)
The discrete wavelet transform (DWT) is one form of wavelet analysis. Here, a wavelet is
a mathematical function used to represent a time series at different scales, and the
relationships between those scales, in order to analyze time series that contain
non-stationarities. One advantage of wavelet analysis is that it allows the use of long
time intervals for low-frequency information and shorter intervals for high-frequency
information, and is capable of revealing aspects of data like trends, breakdown points
and discontinuities that other signal analysis techniques might miss. Another is that
the mother wavelet can be flexibly selected according to the characteristics of the time
series investigated. Wavelet analysis comprises two approaches: the continuous wavelet
transform (CWT) and the discrete wavelet transform (DWT).
The continuous wavelet transform is defined as follows:
$$ CWT_x^{\Psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} x(t)\, \Psi^{*}\left(\frac{t-\tau}{s}\right) dt \tag{6} $$
where s is the scale parameter, τ is the translation parameter, ‘*’ denotes the complex
conjugate, and the mother wavelet Ψ(t) is the transforming function. It should be noted
that the CWT calculations require a lot of time and resources [3]. In addition, the CWT
generates a large amount of data containing much redundant information, which makes
data analysis more difficult [6]. The DWT, by contrast, is simpler to implement than the
CWT and requires less computation time, while its analysis is still very effective and
accurate [3, 6]. The DWT scales and positions are usually based on integer powers of two
(dyadic scales and positions) [6]. This is achieved by modifying the wavelet
representation to
$$ \Psi_{j,k}\left(\frac{t-\gamma}{s}\right) = \frac{1}{s_0^{j/2}}\, \Psi\left(\frac{t - k\gamma_0 s_0^{j}}{s_0^{j}}\right) \tag{7} $$
Where Ψ is the wavelet function, t is the time and γ is the translation factor (time
step) of the wavelet over the time series, s indicates the scale (scale factor), j is an integer
that determines the dilation (dilation factor), k is an integer that determines the
translation, s0 is a specified fixed dilation step greater than 1, and γ0 denotes the location
parameter which must be greater than 0.
DWT decomposes the original waveform into two waveforms using two complementary filters
(a low-pass and a high-pass filter): the approximate waveform (A) and the detail
waveform (D). The approximate waveform is the high-scale, low-frequency component; the
detail waveform is the low-scale, high-frequency component. It is generally believed
that the low-frequency approximate waveform represents the overall shape of the original
waveform, while the high-frequency detail waveforms represent its subtle changes, so
both are indispensable. The DWT process is an iterative decomposition: if the number of
decomposition layers is greater than 1, then from the second layer onward it is the
approximate waveform of the previous layer that is decomposed. An original waveform
decomposed into n layers therefore yields n high-frequency detail waveforms but only one
low-frequency approximate waveform.
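The iterative decomposition described above can be sketched with the Haar wavelet, which
is used here only because its two filters reduce to a scaled pairwise sum and
difference; it is not the db2 wavelet used later in this paper. The function names and
the four-point signal are assumptions made for illustration.

```python
import math


def haar_step(signal):
    """One DWT layer with the Haar wavelet: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("this sketch expects an even-length signal")
    root2 = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / root2 for a, b in pairs]   # low-pass output
    detail = [(a - b) / root2 for a, b in pairs]   # high-pass output
    return approx, detail


def haar_decompose(signal, levels):
    """Return [A_n, D_n, ..., D_1]: one approximation and `levels` details."""
    approx, details = list(signal), []
    for _ in range(levels):
        # Only the approximation of the previous layer is decomposed again.
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


coeffs = haar_decompose([4.0, 6.0, 10.0, 12.0], levels=2)
for name, values in zip(["A2", "D2", "D1"], coeffs):
    print(name, values)
```

Two layers produce two detail series (D1, D2) and a single approximation (A2), matching
the count given in the text.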
Performance Criteria
The performance of each model is verified using statistical measures. The measures used
in this study are the mean square error (MSE) and the correlation coefficient (R). The
mean square error is defined as follows:
$$ MSE = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \tag{8} $$
where N is the number of data points, y_i is the observed value, and ŷ_i is the
calculated value. The correlation coefficient is defined as follows:
$$ R = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2 \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} \tag{9} $$
where N is the number of data points, y_i is the observed value, ŷ_i is the calculated
value, ȳ is the mean of the observed values, and the bar over ŷ denotes the mean of the
calculated values.
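The two criteria can be computed directly from Eq. 8 and Eq. 9, as in the sketch below.
The function names and the sample values are made up for illustration. The example is
chosen so that the predictions are offset from the observations by a constant, which
shows that R can equal 1 while the MSE is still non-zero.

```python
import math


def mse(observed, predicted):
    """Mean square error (Eq. 8)."""
    n = len(observed)
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n


def correlation(observed, predicted):
    """Correlation coefficient R (Eq. 9)."""
    n = len(observed)
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    var_y = sum((y - mean_y) ** 2 for y in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_y * var_p)


observed = [1.0, 2.0, 3.0, 4.0]       # made-up observations
predicted = [1.5, 2.5, 3.5, 4.5]      # same shape, shifted by 0.5
print(mse(observed, predicted), correlation(observed, predicted))
```

This is why the two measures are reported together: R captures whether the shape of the
series is followed, while MSE captures the size of the errors.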
THE DWT-XGBOOST BASED HYBRID MODEL
The preceding section briefly introduced DWT and XGBoost. This section describes the
hybrid model used in this study, which is called DWT-XGBoost because it integrates DWT
with XGBoost.
To our knowledge, no papers have yet been published on this combination. The main idea
is to use DWT to decompose the original time series into a set of sub-series, which are
then input as features to the XGBoost model that outputs the final results. The
construction of the DWT-XGBoost model is shown in Figure 1. Some researchers hold that a
reliable selection method should be used to choose the more appropriate sub-series as
features [6], while others indicate that all sub-series should be used as features and
are equally important, because every sub-series is derived from the original time
series [3]. This study adopts the latter view. The hybrid model can be expressed as
Eq. 10:
$$ Y_i = \mathrm{XGBoost}\left(D_{i,1}, D_{i,2}, \cdots, D_{i,n}, A_{i,n}, C_i\right) \tag{10} $$
where D_{i,n} represents the detail waveform of the n-th decomposition layer, A_{i,n}
represents the approximate waveform of the n-th decomposition layer, and C_i represents
other factors.
In this study, a month spans at most five weeks, so five models are trained, one for
each week. The five models are progressive: the weekly predictions obtained from the
earlier models are used to construct the feature sets of the later ones. This is defined
as follows:
$$ F_i = f_0 + \sum_{j=0}^{i-1} g_j, \quad i \in \{1, 2, 3, 4, 5\} \tag{11} $$
where F_i represents the feature set of the i-th model, f_0 represents the initial
influencing factors (i.e., the feature set before any predicted values have been added),
g_j represents the influencing factor derived from the predicted values of the j-th
model, and g_0 is 0.
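The progressive relationship in Eq. 11 can be sketched as a loop in which each model
receives the initial features plus the predictions of all earlier models. The names
progressive_forecast and mean_model are hypothetical, and mean_model is only a stand-in
for a trained XGBoost model so that the sketch runs without any external library.

```python
def progressive_forecast(base_features, fit_predict, n_models=5):
    """Run n_models models in sequence; model i sees f0 plus earlier predictions."""
    features = list(base_features)          # f0: the initial feature set
    predictions, feature_counts = [], []
    for i in range(1, n_models + 1):
        feature_counts.append(len(features))
        g_i = fit_predict(i, features)      # prediction of the i-th weekly model
        predictions.append(g_i)
        features = features + [g_i]         # later models also see g_i
    return predictions, feature_counts


def mean_model(i, features):
    # Hypothetical placeholder: predicts the mean of whatever features it receives.
    return sum(features) / len(features)


preds, counts = progressive_forecast([300.0, 320.0, 310.0], mean_model)
print(counts)   # size of the feature set seen by each of the five models
print(preds)
```

The feature set grows by one entry per week, so an error made by an early model is
passed on to every later one, which is one plausible reason for accuracy to decay over
the forecast horizon.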
Figure 1. DWT-XGBoost model construction. (The features D(i,1), D(i,2), D(i,3), ...,
D(i,n), A(i,n) and C(i) are input to XGBoost for training and prediction, which outputs
the result Y(i).)
EXPERIMENTS
Dataset
The data used in this study are the historical power consumption records of all
enterprises in the Yangzhong High-tech Industrial Development Zone of Jiangsu Province,
China, provided by the Tianchi Big Data Competition hosted by Ali Cloud. Table 1 shows
the historical data format. The forecasting target is the total daily electricity
consumption of all enterprises in the region for the next month.
Preliminary statistical analysis shows that the total daily electricity consumption is
periodic with a period of one week, as shown in Fig. 2. Because of legal holidays in
China there are also some specific peak and trough segments, but the overall trend
follows a weekly pattern. Although forecasting the total daily electricity consumption
is the final objective, the overall trend is reflected in the daily electricity
consumption of each enterprise. The historical power consumption of each enterprise is
therefore processed and used for training and forecasting, and the per-enterprise
predictions are then summed to obtain the total daily electricity consumption of the
next month. Because of the periodicity and the long prediction horizon, the original
time series is divided into sub-series according to the day of the week, so that each
enterprise corresponds to seven sub-series. The original prediction goal is thus
transformed into forecasting, for each enterprise, the electricity consumption on the
next several occurrences of the same day of the week. Before doing so, missing values
need to be filled: for some reason, some enterprises have no electricity record on
certain days. Because the amount of missing data is small, setting the missing values to
0 is considered reasonable. After this initial processing, the new data format shown in
Table 2 is obtained. It contains an attribute, day_of_week (with values from 1 to 7),
which is derived from the date (e.g., 4 (Thursday) corresponds to 2015/1/1 (January 1,
2015)).
TABLE 1. THE HISTORICAL DATA FORMAT.

record_date  user_id  power_consumption
2015/1/1     1        1135
2015/1/2     1        570
2015/1/1     2        24
2015/1/2     2        22

TABLE 2. ADDING DAY_OF_WEEK AFTER PROCESSING RECORD_DATE.

record_date  user_id  power_consumption  day_of_week
2015/1/1     1        1135               4
2015/1/2     1        570                5
2015/1/1     2        24                 4
2015/1/2     2        22                 5
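The preprocessing that leads from Table 1 to Table 2 could be sketched as follows, using
only the Python standard library. The function names are hypothetical, and the sample
deliberately omits one record (user 2 on 2015/1/2) to show the zero-filling step; the
mapping of day_of_week to ISO weekday numbers (Monday = 1) is an assumption that agrees
with the example of 2015/1/1 being day 4.

```python
from datetime import date, timedelta


def fill_missing_with_zero(records, start, end):
    """Insert a 0 reading for every (date, user) pair absent from records."""
    seen = {(d, u): p for d, u, p in records}
    users = sorted({u for _, u, _ in records})
    filled = []
    day = start
    while day <= end:
        for u in users:
            filled.append((day, u, seen.get((day, u), 0)))
        day += timedelta(days=1)
    return filled


def add_day_of_week(records):
    """Append day_of_week (1 = Monday ... 7 = Sunday) to each record."""
    return [(d, u, p, d.isoweekday()) for d, u, p in records]


raw = [
    (date(2015, 1, 1), 1, 1135),
    (date(2015, 1, 2), 1, 570),
    (date(2015, 1, 1), 2, 24),
    # user 2 has no record on 2015-01-02 in this made-up variant
]
table2 = add_day_of_week(
    fill_missing_with_zero(raw, date(2015, 1, 1), date(2015, 1, 2))
)
for row in table2:
    print(row)
```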
Figure 2. The total daily electricity consumption from 2015-01-01 to 2016-08-30.
Features Extraction
This section describes two methods of feature extraction: basic statistics and the
wavelet transform. The features extracted by these methods are used in the prediction
stage.
Features Extraction Based on Basic Statistics
This section describes how to use basic statistics to extract features. Here we use the
mean and standard deviation to construct features from the time series. After the data
are divided into training sets and test sets by time period, the statistics are
calculated on the part of each set used to construct features. Ignoring the day_of_week
attribute, the mean and standard deviation of the electricity consumption of each
enterprise are obtained over the entire range of that set. Grouping strictly by
day_of_week, the mean and standard deviation are also calculated over the records that
fall on the same day of the week within the entire range of that set. The result is
shown in Table 3, where user_id denotes the enterprise ID, DOW_power_mean and
DOW_power_std are the mean and standard deviation of the electricity consumption grouped
by day_of_week, and power_mean and power_std are the mean and standard deviation
computed without regard to day_of_week. In addition, all records in the same period
carry potential information in their fluctuations, so they are also used as features, as
shown in Table 4, where 1_week denotes the first day of the same period, and the
remaining columns are defined similarly.
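A minimal sketch of these statistical features is given below. The column names follow
Table 3, while the function name and the sample rows are made up. The text does not say
whether the population or the sample standard deviation is used; the population form
(pstdev) is assumed here.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def basic_statistics(rows):
    """rows: (user_id, day_of_week, power_consumption) tuples.

    Returns one feature dict per (user_id, day_of_week) pair.
    """
    by_user = defaultdict(list)        # all readings of an enterprise
    by_user_dow = defaultdict(list)    # readings on one day of the week
    for user_id, dow, power in rows:
        by_user[user_id].append(power)
        by_user_dow[(user_id, dow)].append(power)

    features = {}
    for (user_id, dow), values in by_user_dow.items():
        features[(user_id, dow)] = {
            "DOW_power_mean": mean(values),
            "DOW_power_std": pstdev(values),            # assumption: population std
            "power_mean": mean(by_user[user_id]),
            "power_std": pstdev(by_user[user_id]),
        }
    return features


rows = [(1, 1, 100), (1, 1, 300), (1, 2, 200), (1, 2, 400)]   # made-up readings
feats = basic_statistics(rows)
print(feats[(1, 1)])
```

As in Table 3, power_mean and power_std are identical for every day_of_week of the same
enterprise, while the DOW_ columns vary with the day.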
Features Extraction Based On Wavelet Transform
In some previous studies, researchers have used the wavelet transform to extract
features. As in the previous section, the original data are divided into training sets
and test sets according to time period. Then, grouping strictly by the day_of_week
attribute, each sub-series is decomposed with the discrete wavelet transform into one
approximation series and multiple detail series. Several configuration parameters of the
DWT need to be set: the mother wavelet is db2 from the Daubechies family; given the
length of the subsets, the number of decomposition layers is 2; and the edge extension
mode is zero-padding. The DWT is applied with this configuration, and the resulting
sub-wavelets of the time series are used as features, as shown in Table 5, where w0
denotes one value of one of the sub-wavelets, and the remaining columns are defined
similarly. Moreover, it is very important to keep all the means and standard deviations
from the previous section as features.
Note: the attributes user_id and day_of_week are not used for training.
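The configuration above (db2, two layers, zero-padding) can be sketched in plain Python
as below. This is an illustration written from the standard db2 filter coefficients, not
the code used in this study; the function names and the sample series are assumptions,
and the ordering and sign conventions of the coefficients may differ from those of a
wavelet library, so the output is not expected to reproduce Table 5.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# db2 scaling (low-pass) coefficients; the high-pass filter is its quadrature mirror.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
HIGH = [((-1) ** k) * LOW[len(LOW) - 1 - k] for k in range(len(LOW))]


def filter_and_downsample(signal, taps):
    """Full convolution with implicit zero-padding, keeping every second output."""
    n_out = len(signal) + len(taps) - 1
    full = [
        sum(signal[m - k] * taps[k]
            for k in range(len(taps)) if 0 <= m - k < len(signal))
        for m in range(n_out)
    ]
    return full[1::2]


def dwt_db2(signal):
    """One decomposition layer: returns (approximation, detail)."""
    return filter_and_downsample(signal, LOW), filter_and_downsample(signal, HIGH)


def wavelet_features(series, levels=2):
    """Concatenate [A_n, D_n, ..., D_1] into one flat feature row (w0, w1, ...)."""
    approx, details = list(series), []
    for _ in range(levels):
        approx, detail = dwt_db2(approx)
        details.append(detail)
    row = list(approx)
    for detail in reversed(details):
        row.extend(detail)
    return row


series = [393.0, 354.0, 328.0, 341.0, 462.0, 380.0, 410.0, 365.0]  # illustrative values
row = wavelet_features(series, levels=2)
print(len(row), row[:3])
```

With zero-padding an 8-point sub-series yields 13 coefficients (4 + 4 + 5), so the
number of wavelet feature columns depends on the length of the sub-series.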
TABLE 3. THE FEATURES OF MEAN AND STANDARD DEVIATIONS.

user_id  day_of_week  DOW_power_mean  DOW_power_std  power_std  power_mean
1        1            312.9167        84.59149       98.75397   307.0595
1        2            316.9167        102.2523       98.75397   307.0595
TABLE 4. ALL RECORD DATA OF THE SAME PERIOD.

user_id day_of_week 1_week 2_week 3_week ···
1 1 393 354 328 ···
1 2 35 341 462 ···
TABLE 5. THE FEATURES OF SUB-WAVELETS BY DECOMPOSITION OF THE DISCRETE WAVELET
TRANSFORM.

user_id  day_of_week  w0            w1           w2           ···
1        1            -60.52715344  480.229347   633.6039154  ···
1        2            -55.22629175  341.5180818  708.8183392  ···
TABLE 6. CANDIDATE PARAMETERS CORRESPONDING TO THE FEATURE SET OBTAINED BY WAVELET
TRANSFORM FEATURE EXTRACTION, OPTIMAL PARAMETER SETS AND BEST SCORES FOR GENERAL-XGBOOST.

Best Parameters
n_estimators  objective    learning_rate  max_delta_step  max_depth  Best Score
150           reg:tweedie  0.15           1               3          0.97671962
150           reg:tweedie  0.15           1               5          0.95426311
200           reg:tweedie  0.10           1               2          0.87063653
200           reg:gamma    0.10           1               3          0.85799997
200           reg:tweedie  0.10           1               2          0.83683809

Candidate Parameters
n_estimators    {50, 100, 150, 200}
objective       {reg:linear, reg:gamma, reg:tweedie}
learning_rate   {0.05, 0.10, 0.15}
max_delta_step  {1}
max_depth       {2, 3, 4, 5, 6}
TABLE 7. CANDIDATE PARAMETERS CORRESPONDING TO THE FEATURE SET OBTAINED BY WAVELET
TRANSFORM FEATURE EXTRACTION, OPTIMAL PARAMETER SETS AND BEST SCORES FOR DWT-XGBOOST.

Best Parameters
n_estimators  objective    learning_rate  max_delta_step  max_depth  Best Score
200           reg:tweedie  0.15           1               5          0.99062983
200           reg:tweedie  0.15           1               5          0.98932463
150           reg:tweedie  0.15           1               2          0.95832692
150           reg:tweedie  0.15           1               2          0.94573229
150           reg:tweedie  0.1            1               3          0.95506845

Candidate Parameters
n_estimators    {50, 100, 150, 200}
objective       {reg:linear, reg:gamma, reg:tweedie}
learning_rate   {0.05, 0.10, 0.15}
max_delta_step  {1}
max_depth       {2, 3, 4, 5, 6}
Modeling Process
Once the feature sets are constructed, the models are built. The prediction model used
in this study is XGBoost, whose parameters need to be set manually. In the training
phase, GridSearchCV (cross-validated grid search) from the scikit-learn toolkit of
Python is applied to select these parameters [2]. To some extent this prevents the loss
of accuracy that results from training models with parameters set by subjective
assumption [6]. The selection method finds the optimal set of parameters among the
candidate parameters. It is important to note that the time spent in this selection
process increases with the amount of training data. In this study, each feature dataset
is used to select an optimal parameter set and train a prediction model, and repeating
the procedure yields all the models. Only the modeling processes of regular XGBoost
(called General-XGBoost) and DWT-XGBoost are shown here, in order to highlight the
advantage of XGBoost in the training stage. They are shown in Tables 6 and 7
respectively, where n_estimators is the number of boosted trees to fit, objective is the
learning task and the corresponding learning objective, learning_rate is the learning
rate, max_delta_step is the maximum delta step, and max_depth is the maximum tree depth.
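The size of the search can be illustrated by enumerating the candidate parameters of
Tables 6 and 7 as a Cartesian product, which is the set of combinations an exhaustive
grid search has to evaluate. The helper names and toy_score are hypothetical; toy_score
is an arbitrary stand-in for a cross-validated model score and does not train anything.

```python
from itertools import product

candidate_parameters = {
    "n_estimators": [50, 100, 150, 200],
    "objective": ["reg:linear", "reg:gamma", "reg:tweedie"],
    "learning_rate": [0.05, 0.10, 0.15],
    "max_delta_step": [1],
    "max_depth": [2, 3, 4, 5, 6],
}


def grid(parameters):
    """Yield every combination of the candidate values as a dict."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(parameters, score):
    """Return (best_params, best_score) for a caller-supplied scoring function."""
    return max(((p, score(p)) for p in grid(parameters)), key=lambda item: item[1])


def toy_score(params):
    # Hypothetical stand-in for a cross-validated score; higher is better.
    return (-abs(params["max_depth"] - 5)
            - abs(params["learning_rate"] - 0.15)
            - abs(params["n_estimators"] - 200) / 1000)


n_combinations = sum(1 for _ in grid(candidate_parameters))
best_params, best_score = grid_search(candidate_parameters, toy_score)
print(n_combinations, best_params)
```

Each combination is fitted once per cross-validation fold and once per weekly model,
which is why the selection time grows quickly with both the grid and the data size.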
RESULTS AND DISCUSSIONS
In this section, the prediction results and model performance are presented and
analyzed. We evaluate not only General-XGBoost and DWT-XGBoost but also DWT-SVR and
DWT-ANN. Comparing them with each other shows that the model proposed in this paper
performs better.
Model Performance
Before the discussion, Tables 6 and 7 need to be explained. The Best Score columns of
the two tables show that the training score of DWT-XGBoost is higher than that of
General-XGBoost. For both models, the scores of successive sub-models show a decreasing
trend, which reflects the attenuation of accuracy in long-term prediction and also
appears in the following analysis.
In the training phase, two statistical metrics, mean square error (MSE) and correlation
coefficient (R), are employed to validate all models. Table 8 contains the MSE and R
values for each kind of model. To show the differences between the models clearly, the
MSE and R values are plotted as parallel coordinate plots in Fig. 3 and Fig. 4
respectively. As can be seen from Fig. 3, the MSE of the DWT-XGBoost model is markedly
smaller than that of the other three models. The exception is the second sub-model in
Table 8, where the MSE of DWT-XGBoost is slightly larger than that of General-XGBoost;
this may be because the data used for validation correspond to a special period in
practice, and it does not affect the generalization of the model. As can be seen from
Fig. 4, the R value of the DWT-XGBoost model is also larger than that of the other three
models and is close to 1. DWT-SVR achieves good R values, but Fig. 3 shows that its MSE
is the largest of all models. In addition, the R value of General-XGBoost is the lowest
in Fig. 4.
In the forecasting phase, the same two statistical metrics are used. As shown in
Table 9, the MSE and R values of the forecasting phase are calculated, for each model,
from its predicted values and the observed values provided by the competition. As can be
seen from Table 9, the MSE and R of DWT-XGBoost are clearly better than those of the
other models listed. The MSE and R of DWT-SVR are not presented, because its results
differ too greatly from those of the other three models.
Model Prediction Results
All models are applied to predict the daily electricity consumption of each company for
the next month, and the results are then aggregated into the total daily electricity
consumption for the month. As can be seen from Fig. 5, the predicted values of DWT-SVR
differ greatly from those of the other three models and from the observed values. Fig. 6
therefore shows the predicted values of the other three models, excluding DWT-SVR,
together with the observed values. The predicted values of DWT-XGBoost are visibly
closer to the observed values than those of the other two models. This suggests that
DWT-XGBoost not only outperforms the other three models (DWT-ANN, DWT-SVR and
General-XGBoost) in accuracy but also generalizes better.
TABLE 8. MSE AND R OF THE FOUR MODELS DURING THE TRAINING PHASE.

Model            Criteria  SubModel 1  SubModel 2  SubModel 3  SubModel 4  SubModel 5
General-XGBoost  MSE       218339190   315666194   246385724   125793522   83220813
General-XGBoost  R         0.91558032  0.69777717  0.75198702  0.85912064  0.87620513
DWT-XGBoost      MSE       5183695     327922997   51241364    13446978    5422614
DWT-XGBoost      R         0.99835577  0.89885178  0.97154417  0.99095495  0.99578839
DWT-ANN          MSE       8626116     395772414   195854911   180721893   233393853
DWT-ANN          R         0.99715978  0.88573545  0.94420191  0.94545379  0.92705372
DWT-SVR          MSE       1515791813  1504069525  3169728380  247040421   302535316
DWT-SVR          R         0.99751818  0.87997544  0.93519323  0.91243047  0.91813163
TABLE 9. MSE AND R OF THREE MODELS DURING THE FORECASTING PHASE.

Model            MSE           R
General-XGBoost  771482693043  -0.43225086
DWT-XGBoost      113920387382  0.32561234
DWT-ANN          782352782056  0.29163273
Figure 3. MSE values in training phase.
Figure 4. R values in forecasting phase.
Figure 5. Comparison of predicted values and observed values (four models).
CONCLUSION
Among the four models compared above, DWT-XGBoost shows superiority in time series
prediction. This study applies the model to electricity consumption forecasting. In the
future, we hope to apply it to other fields and to continue improving it. Because of
time constraints, some aspects still need improvement. For example, on the data side,
weather data and regional economic data could be added, and cross-validation could be
used to split the dataset. On the algorithm side, more candidate parameter combinations
could be used; of course, the more parameter combinations there are, the longer the
training takes.
Figure 6. Comparison of predicted values and observed values (three models).
ACKNOWLEDGEMENTS
Corresponding Author: Weizeng Wang, Beijing University of Technology, People’s Republic
of China. 13021093200@163.com.
REFERENCES
1. Ren, Y., Suganthan, P.N., & Srikanth, N. A novel empirical mode decomposition with support vector
regression for wind speed forecasting. IEEE transactions on neural networks and learning systems,
27(8) (2016) 1793-1798.
2. Wang, X., Wu, J., Liu, C., Wang, S., & Niu, W. A Hybrid Model Based on Singular Spectrum
Analysis and Support Vector Machines Regression for Failure Time Series Prediction. Quality and
Reliability Engineering International, 32(8) (2016) 2717-2738.
3. Adamowski, J., & Sun, K. Development of a coupled wavelet transform and neural network method
for flow forecasting of non-perennial rivers in semi-arid watersheds. Journal of Hydrology, 390(1)
(2010) 85-91.
4. Ramana, R.V., Krishna, B., Kumar, S.R., & Pandey, N.G. Monthly rainfall prediction using wavelet
neural network analysis. Water resources management, 27(10) (2013) 3697-3711.
5. Vikram, P., & Veer, P.R. Rainfall forecasting using nonlinear svm based on pso. (IJCSIT)
International Journal of Computer Science and Information Technologies, 2 (2011) 2309.
6. Liu, Z., Zhou, P., Chen, G., & Guo, L. Evaluating a coupled discrete wavelet transform and support
vector regression for daily and monthly streamflow forecasting. Journal of hydrology, 519(2014)
2822-2831.
7. Chen, R., Liang, C.Y., Hong, W.C., & Gu, D.X. Forecasting holiday daily tourist flow based on
seasonal support vector regression with adaptive genetic algorithm. Applied Soft Computing, 26
(2015) 435-443.
8. Giles, C.L., Lawrence, S., & Tsoi, A.C. Noisy time series prediction using recurrent neural networks
and grammatical inference. Machine learning, 44(1) (2001) 161-183.
9. Yuan, C., Xu, S., & Zhang, X. Prediction of water quality base on kernel clustering least squares
support vector regression. In Control, Automation, Robotics and Vision (ICARCV), 2016 14th
International Conference on (pp. 1-5). IEEE.
10. Kisi, O., & Parmar, K.S. Application of least square support vector machine and multivariate adaptive
regression spline models in long term prediction of river water pollution. Journal of Hydrology,
534(2016) 104-112.
11. Wu, C.H., Ho, J.M., & Lee, D.T. Travel-time prediction with support vector regression. IEEE
transactions on intelligent transportation systems, 5(4) (2004) 276-281.
12. Sapankevych, N.I., & Sankar, R. Time series prediction using support vector machines: a survey.
IEEE Computational Intelligence Magazine, 4(2) (2009).
13. Yang, H., Chan, L., & King, I. Support vector machine regression for volatile stock market prediction.
Intelligent Data Engineering and Automated Learning—IDEAL 2002(2002) 143-152.
14. Müller, K.R., Smola, A., Rätsch, G., Schölkopf, B., Kohlmorgen, J., & Vapnik, V. Using support
vector machines for time series prediction. Advances in kernel methods—support vector learning,
(1999) 243-254.
15. Frank, R.J., Davey, N., & Hunt, S.P. Time series prediction and neural networks. Journal of Intelligent
& Robotic Systems, 31(1) (2001) 91-103.
16. Grigorievskiy, A., Miche, Y., Ventelä, A.M., Séverin, E., & Lendasse, A. Long-term time series
prediction using OP-ELM. Neural Networks, 51(2014) 50-56.
17. Bao, Y., Xiong, T., & Hu, Z. Multi-step-ahead time series prediction using multiple-output support
vector regression. Neurocomputing, 129(2014) 482-493.
18. Chen, T., & Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm
sigkdd international conference on knowledge discovery and data mining (pp. 785-794), (2016,
August). ACM.
19. Friedman, J.H. Greedy function approximation: a gradient boosting machine. Annals of statistics,
(2001) 1189-1232.