
Energy 253 (2022) 124179
Load forecasting of district heating system based on Informer

Mingju Gong a, *, Yin Zhao a, Jiawang Sun a, Cuitian Han a, Guannan Sun b, Bo Yan c
a School of Integrated Circuit Science and Engineering, Tianjin University of Technology, Tianjin, 300380, China
b Tianjin Tianda Qiushi Electric Power High Technology Co., Ltd., Tianjin, 300000, China
c Tianjin Sanyuan Electric Power Group Co., Ltd., Tianjin, 300000, China
Article info

Article history:
Received 13 December 2021
Received in revised form 25 April 2022
Accepted 1 May 2022
Available online 4 May 2022

Keywords:
District heating system
Heating load forecasting
Informer
Relative position encoding algorithm

Abstract

Accurate load forecasting of district heating systems (DHSs) is an essential guide to guaranteeing effective energy production, distribution, and rational utilization. Artificial neural networks have been extensively applied to heating energy prediction in DHS. Recently, a new time series prediction model namely Informer was proposed. This study proposes an Informer-based framework for DHS heating load forecasting. To explore the performance of Informer in heating load forecasting tasks, four forecasting models namely Autoregressive Integrated Moving Average model, Multilayer Perceptron, Recurrent Neural Network and Long Short-Term Memory network are established for comparison. The historical heating load, outdoor temperature, relative humidity, wind speed and air quality index of a DHS in Tianjin are used as the input characteristics to comprehensively assess the performance of these five forecasting strategies. The prediction results of the models are evaluated and visualized. The experimental results show that the Informer-based forecasting model can achieve the most accurate and stable predictions. Furthermore, a relative position encoding algorithm is introduced to enhance its generalization and robustness. Overall, the Informer-based framework can report satisfactory testing results. The prediction curve is fitted to the trend of temperature change which can play an excellent guiding role in heating dispatching.

© 2022 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: gmj790@163.com (M. Gong).
https://doi.org/10.1016/j.energy.2022.124179
0360-5442/© 2022 Elsevier Ltd. All rights reserved.

Abbreviations

DHS      district heating system
4GDHS    fourth-generation district heating system
CV       computer vision
NLP      natural language processing
ANN      artificial neural network
LSTM     Long Short-Term Memory network
SVM      support vector machine
AQI      air quality index
ARIMA    Autoregressive Integrated Moving Average model
MLP      Multilayer Perceptron
RNN      Recurrent Neural Network
MAE      mean absolute error
MSE      mean square error
MAPE     mean absolute percentage error

1. Introduction

As one of the essential energy services in densely populated urban areas, district heating transports a heating medium from energy sources to customers to satisfy their heating requirements. More than 80,000 DHSs are in operation worldwide. DHS, the most important energy system in China, is an effective strategy for heat production and distribution to meet residential and commercial heating needs. The input energy of DHS comes from combined heat and power (CHP) plants [1], district boilers, industrial waste heat [2] and various natural heat sources [3,4]. The heat source in DHS needs to be converted into the required heating medium (hot water or steam) through equipment in the energy station. Compared with local boilers, DHS provides higher efficiency and less pollution [5].

With increasing global environmental problems, various countries have formulated corresponding policies and measures. As the largest energy-consuming economy and carbon emitter in the world [6], China has received extensive attention from all social circles. China has therefore put forward the goals of carbon peak and carbon neutrality, which make the full utilization of energy an urgent problem. The fourth-generation district heating system (4GDHS) is the key to implementing the sustainable development strategy, and the introduction of intelligent control is its most significant feature [7]. To meet the actual heating demand of users without excess heat loss, heating load forecasting has become an indispensable element of DHS. Accurate heating load forecasting provides guidance for energy saving and emission reduction, so that the district heating operator can correctly control the operation of DHS and make effective decisions for the heating system.

In the past, heating load forecasting was mainly based on physical and mathematical methods [8], which need to establish the functional relationship between heating load and its influencing characteristics. However, due to the uncertainty of influencing factors [9], such as holidays and extreme weather, the performance of mathematical modeling methods decreases considerably. Benefitting from the rise of machine learning and deep learning, data-driven prediction methods now occupy a dominant position. Machine learning models are adept at learning latent relationships from data, and their performance is more flexible and accurate. They have been widely used in computer vision (CV), natural language processing (NLP), pattern recognition and other fields. In recent years, the development of cloud systems has brought convenient data storage and access, and data-driven forecasting has become popular in heating load forecasting tasks.

Both the heating demand prediction of individual heating systems and the performance of prediction models are the subject of many studies. Most of the models used in DHS load forecasting are traditional regression models [10], support vector machines (SVM) [11] and artificial neural networks (ANN) [12,13], among others.

The regression model is a classical forecasting method. In Ref. [14], short-term prediction of building load is carried out with a traditional linear regression model. A seasonal autoregressive integrated moving average model with high prediction accuracy is proposed in Ref. [15]. Ref. [16] proposes an autoregressive moving average model that has been applied in a practical system. SVM usually achieves satisfactory prediction accuracy. Ref. [17] proposes a nonlinear SVM prediction model based on particle swarm optimization. Similarly, Ref. [18] optimizes the support vector regression model with particle swarm optimization and effectively predicts power, cooling and heating load. In addition, ANN is widely used in load forecasting. Ref. [19] evaluates the calculation time and error of several popular ANN models. Ref. [20] applies the Long Short-Term Memory network (LSTM) to a power plant system model in Shandong, China, and reports satisfactory test results.

Besides, many scholars believe that combining multiple algorithms is also a valid way to improve prediction accuracy [21,22]. Ref. [23] demonstrates the superiority of the integrated model through different methods for building energy evaluation. The expert system in Ref. [24] consists of the following data-driven methods: linear regression, random tree regression, feed-forward neural network, and SVM, which are compared with a single model. In Ref. [25], a prediction model of building heating and cooling loads is proposed, which is based on an improved PSO algorithm and the LSTM model. Ref. [26] proposes a hybrid algorithm that combines similar day selection, empirical mode decomposition and LSTM.

Due to the diversity of DHSs and data features, the prediction performance of machine learning models needs to be analyzed for each specific situation [27]. Many popular prediction methods are compared in Ref. [28], which provides guidance for the effective selection of energy management prediction models. The SVM model proposed in Ref. [29] incorporates the similar day method to meet local climate requirements.

As a deep learning model, Transformer [30] has flourished in NLP and CV. Some scholars believe that it has the potential to improve prediction ability. However, because of its high time complexity, high memory usage and sudden drop in prediction speed, it cannot be directly applied to time series prediction. Numerous scholars have contributed methods to address these challenges. Ref. [31] considers the sparse factorization of the attention matrix. In Ref. [32], the LogSparse Transformer is proposed, which introduces a convolutional self-attention mechanism. Ref. [33] introduces Longformer, whose attention mechanism scales linearly with sequence length. A new self-attention mechanism in Ref. [34] reduces the time and space complexity of full self-attention. In Ref. [35], Informer, a variant of Transformer, is established; it has been tested on four large-scale data sets and performs excellently. It provides a different solution to the problem of time series prediction.

As mentioned previously, Informer is a novel time series prediction model, and research on and application of this model are still scarce. In order to explore the performance of Informer in heating load forecasting, this study uses the hourly heating load data of a DHS in Tianjin to establish a forecasting framework based on several forecasting models. The performance of Informer is evaluated through comparison with the other models. The novelty and main contributions of this study are as follows:

(1) Informer is applied to heating load forecasting. The date, outdoor temperature, outdoor humidity, wind speed, air quality index (AQI) and historical heating load are taken as its input characteristics.
(2) Another four forecasting models are established, namely Autoregressive Integrated Moving Average model (ARIMA), Multilayer Perceptron (MLP), Recurrent Neural Network (RNN) and LSTM, which are compared with Informer. Mean absolute error (MAE), mean square error (MSE) and mean absolute percentage error (MAPE) are used as evaluation metrics. The comparison indicates that Informer predicts better than the others.
(3) A relative position encoding method is introduced and compared with the basic Informer model with absolute position encoding. The results show that the error of Informer with relative position encoding is further reduced and the prediction curve is smoother.

The rest of the paper is organized as follows: The second section introduces the system background. The third section contains the methodology of Informer and the derivation of related formulas. The fourth section describes the experimental preparation and framework. In the fifth section, we compare and analyze the experimental results and error evaluation of the models. The sixth section is the summary and prospect of this study.

2. System background

2.1. District heating system

DHS plays a positive role in promoting China's economic development and urban construction. Directly related to social and public interests, it is a key factor for people's quality of life and sustainable social development. The first-generation DHS originated in the United States in the 1880s, which uses steam as the
medium and concrete as the pipeline. The second-generation DHS lasted from the 1930s to the 1970s and began to use 100 °C hot water as the carrier. After the 1970s, the third-generation DHS occupied a dominant position. Its hot-water medium is usually below 100 °C and is transported in underground insulated pipes. With the development of information technology, the fourth-generation DHS is the future trend. It is characterized by reduced energy consumption, sustainable energy and integration of intelligent systems [7].

As shown in Fig. 1, DHS is mainly composed of heating sources, district heating networks, heat exchange stations and customers. Hot water produced by a heating source is sent to the heat exchange station through the primary network. The heat is then transferred to the secondary pipe network through the heat exchange station. Finally, the hot water in the secondary pipe network is transmitted to users. A sufficient heating source supply is of great value to the normal operation of DHS. Consequently, this study focuses on the total heating load of heating source production.

Fig. 1. Diagrammatic sketch of DHS.

2.2. Data set

The influential factors of heating load can be mainly divided into four categories: time variables, weather parameters, DHS operation inertia and user social behavior [21]. Time variables include year, month, day, hour, minute, etc. Time variables can reflect the social behaviors of users, which result in weekly and daily heating regularities. For the weekly regularity, the heat demand of residential buildings usually stays lower on weekdays and increases on weekends. For the daily regularity, as most users need to work, the heat demand is relatively small during the day and increases when the users get home at night. For large DHSs, the regularities can be more obvious. Weather parameters, the decisive factors of heating load, include outdoor temperature, outdoor humidity, wind speed and AQI. Smog changes the amount of radiation reaching the surface and thus the temperature, indirectly affecting the heating load. Considering the consecutive days of severe smog occurring in Tianjin, we select the AQI as the air condition indicator in our work. The operational characteristics of DHS are also a critical factor, including supply and return water temperature, flow rate, DHS regulation signal, etc. Since the heating load of DHS is a function of mass flow rate and supply and return water temperature, the time series of historical heating load can be regarded as a comprehensive variable that implicitly reflects the regulation process and thermal inertia in DHS, which leads to the correlation between historical heating load and subsequent heating load. This study takes a variety of influencing factors as the input characteristics of the model.

In Tianjin, China, we have established a data acquisition platform to collect actual DHS operation data and relevant weather data. This study is based on the data set from February 17 to March 22, 2018. The operation data of DHS include the 24 h-ahead heating load, the 1 h-ahead heating load and the actual heating load. Due to the importance of meteorological conditions, outdoor temperature, outdoor humidity, wind speed and AQI are taken into consideration. Fig. 2 shows the actual heating load curve of the DHS.

Fig. 2. The actual heating load curve.

3. Methodology

3.1. Transformer

With continuous evolution and development, Transformer has been applied to various research fields. Like most competitive neural sequence transduction models, Transformer has an encoder-decoder structure. In the encoder, the input sequence $X = (x_1, x_2, \ldots, x_n)$ is mapped to the continuous representation sequence $Z = (z_1, z_2, \ldots, z_n)$. Consuming the previously generated symbols as an additional input, the decoder generates an output sequence $Y = (y_1, y_2, \ldots, y_n)$ one element at a time. The structure of Transformer is shown in Fig. 3.

Fig. 3. The structure of Transformer.

Encoder: The encoder consists of a stack of n = 6 identical layers. Each layer contains two sub-layers, a multi-head self-attention mechanism and a fully connected feed-forward network. With Sublayer(x) denoting the function implemented by the sub-layer, the output of each sub-layer is LayerNorm(x + Sublayer(x)). The output dimension is 512.

Decoder: The decoder also consists of a stack of n = 6 identical layers. Compared with the encoder, the decoder adds a third sub-layer, which performs multi-head attention on the output of the encoder. As in the encoder, a residual connection is adopted for each sub-layer, followed by layer normalization. The purpose of the mask in the self-attention mechanism is to avoid access to future information during decoding.

Attention: With the standard self-attention mechanism, the continuous representation sequence is calculated as follows:

$$ Z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \tag{1} $$

3.2. Informer: a new model for time series prediction

(1) The probsparse self-attention is proposed. The original self-attention is written in probability form:

$$ A(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)} v_j = \mathbb{E}_{p(k_j \mid q_i)}\left[v_j\right] \tag{2} $$

where $p(k_j \mid q_i) = \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}$ and $k(q_i, k_j)$ selects the asymmetric exponential kernel $\exp\left(\frac{q_i k_j^{T}}{\sqrt{d}}\right)$.

Sparsity criterion of query: the KL divergence between the attention probability $p(k_j \mid q_i)$ and the uniform distribution $q(k_j \mid q_i) = \frac{1}{L_K}$:

$$ KL(q \,\|\, p) = \sum_{j=1}^{L_K} \frac{1}{L_K} \ln \frac{1/L_K}{k(q_i, k_j) / \sum_{l} k(q_i, k_l)} = \ln \sum_{l=1}^{L_K} \exp\left(\frac{q_i k_l^{T}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{T}}{\sqrt{d}} - \ln L_K \tag{3} $$

Dropping the constant, the sparsity measurement of the i-th query is defined as:

$$ M(q_i, K) = \ln \sum_{j=1}^{L_K} \exp\left(\frac{q_i k_j^{T}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{T}}{\sqrt{d}} \tag{4} $$

The number of dominant queries is $u = c \cdot \ln L_K$. Therefore, the probsparse self-attention is defined as:

$$ A(Q, K, V) = \mathrm{softmax}\left(\frac{\overline{Q} K^{T}}{\sqrt{d}}\right) V \tag{5} $$

where $\overline{Q}$ contains only the u dominant queries under the sparsity measurement. Finally, by sampling $L_K \ln L_K$ dot-product pairs, the computational complexity of attention in each layer is reduced to $O(L_K \ln L_K)$. The details of the formula derivation can be found in Ref. [35].

(2) Distilling is introduced into the encoder to halve the computational complexity $O(L_K \ln L_K)$ of each layer. This settles the obstacle that the input is too long to stack layers. The distilling process from layer j to layer j+1 is as follows:

$$ X_{j+1}^{t} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left(\left[X_{j}^{t}\right]_{AB}\right)\right)\right) \tag{6} $$

(3) Dynamic decoding produces its output too slowly for long sequences to allow a rapid prediction. The key is to extract a shorter sequence from the input sequence and feed it to the decoder. For example, the known data of the preceding 7 days are used as the label length to predict the following 7 days, which form the prediction length.

The structure of Informer is shown in Fig. 4.

Fig. 4. The structure of Informer.
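The query selection behind the probsparse self-attention of Section 3.2 can be illustrated with a short NumPy sketch. It evaluates the sparsity measurement of Eq. (4) for every query, keeps the u = c ln L_K highest-scoring queries and applies softmax attention only to them, as in Eq. (5). Several choices are assumptions made for illustration and are not stated in this paper: the function names, the default c = 5, and the fallback that gives every non-selected query the mean of V. The sketch also computes the full score matrix for readability, so it does not reach the O(L_K ln L_K) cost that the sampling in Ref. [35] provides.

```python
import numpy as np

def sparsity_measure(Q, K):
    """Eq. (4): log-sum-exp of the scaled scores minus their mean, per query."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # shape (L_Q, L_K)
    row_max = scores.max(axis=1, keepdims=True)        # for a stable log-sum-exp
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return lse - scores.mean(axis=1)

def probsparse_attention(Q, K, V, c=5.0):
    """Attention restricted to the u dominant queries (illustrative sketch)."""
    L_Q, d = Q.shape
    L_K = K.shape[0]
    u = min(L_Q, max(1, int(np.ceil(c * np.log(L_K)))))  # u = c * ln(L_K)
    top = np.argsort(-sparsity_measure(Q, K))[:u]         # indices of dominant queries
    # Assumption: queries that are not selected receive the mean of V.
    out = np.tile(V.mean(axis=0), (L_Q, 1))
    scores = Q[top] @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top
```

A query whose attention is uniform attains the smallest possible value of the measurement, ln L_K, so it is the last to be selected; sharply peaked queries score highest.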
3.3. Relative position encoding method

The position encoding formula of Transformer is as follows [30]:

$$ PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{7} $$

$$ PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{8} $$

where pos is the position of the input data and i indexes the dimension of the data.

The query, key, value and attention score are calculated as follows:

$$ q_i = W_q \left(E_{x_i} + P_i\right) \tag{9} $$

$$ k_j = W_k \left(E_{x_j} + P_j\right) \tag{10} $$

$$ v_j = W_v \left(E_{x_j} + P_j\right) \tag{11} $$

$$ S_{i,j} = q_i k_j^{T} = W_q \left(E_{x_i} + P_i\right) \left(W_k \left(E_{x_j} + P_j\right)\right)^{T} \tag{12} $$

$$ a_{i,j} = \mathrm{softmax}\left(S_{i,j}\right) \tag{13} $$

$$ A_{i} = \sum_{j} a_{i,j} v_j \tag{14} $$

$W_q$, $W_k$ and $W_v$ are the query, key and value parameters added to each head by multi-head attention. $E_{x_i}$ and $E_{x_j}$ are the data embedding vectors of $x_i$ and $x_j$. $P_i$ and $P_j$ are the position embeddings of the i-th and the j-th position. Expanding (12) results in (15):

$$ S_{i,j} = E_{x_i} W_q W_k^{T} E_{x_j}^{T} + E_{x_i} W_q W_k^{T} P_j^{T} + P_i W_q W_k^{T} E_{x_j}^{T} + P_i W_q W_k^{T} P_j^{T} \tag{15} $$

Ref. [36] proved that after an unknown linear transformation is added, the Transformer loses the relative position information.

After the introduction of the relative position encoding method [37], the model can adapt to sequence lengths that have not been seen before, and (15) is converted to the following formula:

$$ S_{i,j} = E_{x_i} W_q W_{k,E}^{T} E_{x_j}^{T} + E_{x_i} W_q W_{k,R}^{T} R_{ij}^{T} + u W_{k,E}^{T} E_{x_j}^{T} + v W_{k,R}^{T} R_{ij}^{T} \tag{16} $$

$P_j^{T}$ is replaced by $R_{ij}^{T}$, which represents the relative position. $P_i W_q$ is replaced by the trainable parameters $u \in \mathbb{R}^{d}$ in the third term and $v \in \mathbb{R}^{d}$ in the fourth term, which means the attentive bias should remain unchanged regardless of the query position. Finally, the two weight matrices $W_{k,E}^{T}$ and $W_{k,R}^{T}$ are separated to generate content-based and location-based key vectors respectively.

The errors of Informer with the relative position encoding method (Informer_rel) and of the basic model (Informer_bas) are tested and analyzed in Section 5.2.

4. Experimental preparation and framework

4.1. Experimental preparation

In this paper, Informer is applied to heating load forecasting. We measured the environmental variables of a DHS in Tianjin, China, every hour for 34 days. Eight features are selected as the input of Informer. The actual heating load is taken as the prediction target column, as shown in Table 1. The parameter settings of Informer are listed in Appendix Table 6.

Table 1
Inputs of Informer.

Input          Variable name
x1             Date
x2             Outdoor temperature
x3             Outdoor humidity
x4             Wind speed
x5             AQI
x6             24 h-ahead heating load
x7             1 h-ahead heating load
x8 (Target)    Actual heating load

4.2. Experimental framework

The basic framework of the heating load forecasting model includes data collection, data preprocessing, feature engineering, data splitting, model training, model evaluation and results analysis, as shown in Fig. 5.

Fig. 5. Framework of the experiment.
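Two of the framework steps, filling missing values by linear interpolation and splitting the data at the 6:2:2 ratio used in this study, can be sketched in a few lines of NumPy. The function names are hypothetical, and two details are assumptions that the paper does not state: the split is taken chronologically (earliest samples for training, latest for testing), and gaps at the very start or end of a series are filled with the nearest observed value.

```python
import numpy as np

def fill_missing_linear(values):
    """Fill NaN entries of a 1-D series by linear interpolation over the index."""
    x = np.asarray(values, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(x.size)
    # np.interp interpolates between known neighbours; outside the known range
    # it repeats the nearest known value (assumption for leading/trailing gaps).
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

def split_622(n_samples):
    """Return index slices for a chronological 6:2:2 train/validation/test split."""
    n_train = int(n_samples * 0.6)
    n_val = int(n_samples * 0.2)
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n_samples)   # remainder goes to the test set
    return train, val, test
```

For 34 days of hourly records (34 × 24 = 816 samples), `split_622(816)` assigns 489 samples to training, 163 to validation and 164 to testing.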
Data collection: Our data acquisition platform collects hourly internal data (historical operational data) and external variables (weather data). The available data set ranges from February 17, 2018 to March 22, 2018.

Data preprocessing: The historical production data and the weather forecast data of DHS come from different sources. When they are integrated for training and testing, all abnormal values of the actual operation data are removed, and linear interpolation is used to fill in the missing values.

Feature engineering: After data preprocessing, eight characteristic variables are obtained: date, outdoor temperature, outdoor humidity, wind speed, AQI, 24 h-ahead heating load, 1 h-ahead heating load and the actual heating load. The date column is input into the target model as time information.

Data splitting: In our work, we divide the data into training set, validation set and test set according to the ratio of 6:2:2, which is a traditional data division method.

Model training: In our experiment, the basic Informer is compared with ARIMA, MLP, RNN and LSTM to identify the best-performing model.

Model evaluation: MAE, MSE and MAPE are the most commonly used error evaluation methods. MAE is the average absolute error between the predicted data and the actual data and depends only on the scale of the data. MSE squares each error term, so every term is non-negative and differentiable. MAPE is expressed as a percentage and can be used to compare predictions on different scales.

$$ MSE = \frac{1}{N} \sum_{n=1}^{N} \left(P_n - \hat{P}_n\right)^{2} \tag{17} $$

$$ MAPE = \frac{100\%}{N} \sum_{n=1}^{N} \left| \frac{\hat{P}_n - P_n}{P_n} \right| \tag{18} $$

$$ MAE = \frac{1}{N} \sum_{n=1}^{N} \left| P_n - \hat{P}_n \right| \tag{19} $$

where $P_n$ is the original load value, $\hat{P}_n$ is the predicted load value and N is the total number of predicted points, namely pred_length.

Results analysis: The prediction results are visualized so that the performance of the models can be observed and analyzed more intuitively.

5. Experiment and analysis

5.1. Experiment 1

We compare Informer with the other four time series prediction models on the same data set. In this experiment, the pred_length is 24 h and 168 h respectively. The evaluation results of all models on the same data set are summarized in Tables 2 and 3; for every metric, lower values are better. The corresponding bar chart is shown in Fig. 6.

Table 2
Evaluation results of all models on the same data set (pred_length = 24 h).

        Informer   ARIMA   MLP     RNN     LSTM
MAE     0.154      0.168   0.178   0.185   0.284
MSE     0.040      0.057   0.055   0.065   0.140
MAPE    0.076      0.138   0.105   0.122   0.202

Table 3
Evaluation results of all models on the same data set (pred_length = 168 h).

        Informer   ARIMA   MLP     RNN     LSTM
MAE     0.148      0.129   0.195   0.197   0.185
MSE     0.035      0.042   0.054   0.056   0.064
MAPE    0.071      0.122   0.109   0.118   0.122

Fig. 6. Evaluation results of all models. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
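The three error measures of Eqs. (17)–(19) translate directly into NumPy. The sketch below is illustrative only: the function names are hypothetical, and `mape` returns a percentage as written in Eq. (18), which assumes that no actual load value is zero. Whether the tabulated values were computed on normalized or raw loads is not stated here, so the sketch makes no attempt to reproduce them.

```python
import numpy as np

def mse(actual, predicted):
    """Eq. (17): mean of the squared errors."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean((actual - predicted) ** 2))

def mape(actual, predicted):
    """Eq. (18): mean absolute percentage error, in percent (actual must be non-zero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((predicted - actual) / actual)))

def mae(actual, predicted):
    """Eq. (19): mean of the absolute errors."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))
```

For the actual loads [2, 4] and the predictions [1, 5], the absolute errors are both 1, so MAE = 1 and MSE = 1, while MAPE = 100 × (0.5 + 0.25) / 2 = 37.5.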
The error value of each model indicates the performance of the corresponding model. When pred_length = 24 h, Informer shows the minimum value in all three error evaluation indexes. When pred_length = 168 h, Informer has the minimum value in two of the three error evaluation indexes, while ARIMA has one and MLP, RNN and LSTM have none.

As can be seen from Figs. 7–11, the four common prediction models show good prediction performance. As a relatively well-developed and accurate algorithm, ARIMA produces a prediction curve that fits the trend of the actual data curve very well. However, its prominent problem is the time delay, which leads to high errors. MLP also has an obvious time delay, but its more prominent problem is that the forecasting results are not ideal at higher load levels. Specifically, in Fig. 8 (b), the prediction curve of MLP at 0–75 h and the later peaks are significantly lower than the original curve. The delay problem also exists in RNN and LSTM. For RNN, the problem of abrupt changes in the curve is more serious. For LSTM, contrary to MLP, the forecasting results at lower load levels are not satisfactory. Specifically, in Fig. 10 (b), the predicted curve of LSTM at about 80 h, 100 h and 125 h is significantly higher than the original curve. Informer successfully predicts the trend of the actual data without time delay. Its more stable curve is its prominent advantage and can provide sound guidance for heating dispatching. Its deficiency is that the curve does not fit the original data perfectly.
Fig. 7. Result of ARIMA. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
Fig. 8. Result of MLP. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
Fig. 9. Result of RNN. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
Fig. 10. Result of LSTM. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
Fig. 11. Result of Informer. (a) Pred_length = 24 h. (b) Pred_length = 168 h.
5.2. Experiment 2

The error evaluation results of Informer_rel and Informer_bas are compared in Table 4 and Table 5; lower values are better. As expected, the MAE and MSE of Informer_rel are lower than those of Informer_bas, although the MAPE is not. Fig. 12 visualizes the prediction curves and indicates that Informer with the relative position encoding method performs better. The prediction curves of Informer_rel fit the historical curve better, with fewer abrupt changes. This indicates that Informer with the relative position encoding method can further learn the relationships in the data and shows satisfactory results.
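The difference between the two variants compared in this experiment lies in the attention score of Eq. (16). The NumPy sketch below evaluates its four terms for all pairs (i, j) at once. It is a minimal illustration under stated assumptions: the relative position vector R_ij is indexed by the offset i - j and built with the sinusoidal form of Eqs. (7)–(8), neither of which is specified in the text, and all function and variable names are hypothetical. Embeddings are treated as row vectors, matching the notation of Eqs. (15) and (16), and d must be even.

```python
import numpy as np

def sinusoid(positions, d_model):
    """Eqs. (7)-(8) evaluated at arbitrary (possibly negative) positions."""
    pos = np.asarray(positions, dtype=float)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, two_i / d_model)
    table = np.zeros((pos.shape[0], d_model))
    table[:, 0::2] = np.sin(angle)
    table[:, 1::2] = np.cos(angle)
    return table

def relative_scores(E, W_q, W_kE, W_kR, u, v):
    """Eq. (16) for row-vector embeddings E of shape (L, d)."""
    L, d = E.shape
    R = sinusoid(np.arange(-(L - 1), L), d)          # one row per offset i - j
    offset = np.arange(L)[:, None] - np.arange(L)[None, :] + (L - 1)  # row index into R
    q = E @ W_q                                      # E_xi W_q
    key_content = E @ W_kE                           # rows are (W_kE^T E_xj^T)^T
    key_position = R @ W_kR                          # rows are (W_kR^T R^T)^T
    term_a = q @ key_content.T                               # content-content
    term_b = np.take_along_axis(q @ key_position.T, offset, axis=1)  # content-position
    term_c = (key_content @ u)[None, :]                      # global content bias
    term_d = (key_position @ v)[offset]                      # global position bias
    return term_a + term_b + term_c + term_d
```

Because the position terms depend only on the offset i - j, shifting both positions by the same amount leaves them unchanged, which is what allows the scores to be evaluated for sequence lengths not seen during training.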
Table 4 6. Conclusion and prospect

Error evaluation of Informer_bas and Informer_rel. (Pred_length ¼ 24).
Informer_bas Informer_rel In this paper, Informer, a new time series forecasting model, is
MAE 0.154 0.151
applied to the field of DHS heating load forecasting. Based on the
MSE 0.040 0.038 actual operation data of a DHS in Tianjin, three error evaluation
MAPE 0.076 0.076 indexes are used to evaluate the five prediction models. In addition,
a relative position encoding strategy is introduced and compared
with the basic method. The following conclusions can be drawn
Table 5 from the experimental results:
Error evaluation of Informer_bas and Informer_rel. (Pred_length ¼ 168).
Informer_bas Informer_rel (1) Informer performs excellently in heating load forecasting

with strong forecasting potential.
MAE 0.148 0.129
MSE 0.035 0.029 (2) Compared with the traditional prediction models, the pre-
MAPE 0.071 0.076 diction curve of Informer is more stable and consistent with
Fig. 12. Results of comparison. (a) Pred_length ¼ 24 h. (b) Pred_length ¼ 168 h.
12
the characteristics of temperature change, i.e. smoothness, Appendix

which meets the requirements of actual heating dispatching.
(3) Positional encoding method has a very important impact on
the prediction ability of the Informer. The introduction of Table 6
Parameters setting of Informer.
relative positional encoding algorithm can make the model
better learn the sequence relationship between the data, and Parameters Description Value
effectively reduce the prediction mutation. enc_in encoder input size 7
dec_in decoder input size 7
Since Informer has shown excellent performance in the field of dec_out decoder output size 1
enc_layers layers of encoder 2
heating load forecasting, we will start our future work from the
dec_layers layers of decoder 1
following directions: n_heads numbers of heads 8
batch_size batch size 32
(1) In DHS load forecasting, there are many factors affecting heat seq_length sequence length 168
load. It is necessary to collect more data parameters and lab_length label length 24e168
pre_length prediction length 24e168
expand the data set of the model so that informer can d_model dimension of the model 512
analyze the specific situation. earlystop_patience early stopping patience 3
(2) The ultimate purpose of heat load forecasting is energy learning_rate learning rate 0.05
saving. We will conduct more in-depth research on the train_epochs train epochs 600
gpu GPU cuda0
model and study the energy-saving rate of district heating.
Dropout dropout 0.05
(3) The DHS studied here is in Tianjin, China. Due to the
outstanding performance of Informer, we will try to apply
Informer to specific DHS for online test, so as to make prac-
tical contributions to energy conservation and emission
reduction.
References
[1] Liao C, Ertesvåg IS, Zhao J. Energetic and exergetic efficiencies of coal-fired
CHP (combined heat and power) plants used in district heating systems of
Credit authorship contribution statement China. Energy 2013;57:671e81. https://doi.org/10.1016/j.energy.2013.05.055.
[2] Guo X, Hendel M. Urban water networks as an alternative source for district
heating and emergency heat-wave cooling. Energy 2018;145:79e87. https://
Mingju Gong: Project management, Supervision, Writing - re-
doi.org/10.1016/j.energy.2017.12.108.
view. Yin Zhao: Writing e original draft, Software programming. [3] Alkan MA, Keçebaş A, Yamankaradeniz N. Exergoeconomic analysis of a dis-
Jiawang Sun: Methodology, Conceptualization. Cuitian Han: trict heating system for geothermal energy using specific exergy cost method.
Investigation. Guannan Sun: Data management. Bo Yan: Energy 2013;60:426e34. https://doi.org/10.1016/j.energy.2013.08.017.
[4] Guo X, Goumba AP, Wang C. Comparison of direct and indirect active thermal
Visualization. energy storage strategies for large-scale solar heating systems. Energies
2019;12(10):1948. https://doi.org/10.3390/en12101948.
[5] Rezaie B, Rosen MA. District heating and cooling: review of technology and
potential enhancements. Appl Energy 2012;93:2e10. https://doi.org/10.1016/
Declaration of competing interest j.apenergy.2011.04.020.
[6] Gong M, Bai Y, Qin J, Wang J, Yang P, Wang S. Gradient boosting machine for
The authors declare that they have no known competing predicting return temperature of district heating system: a case study for
residential buildings in Tianjin. J Build Eng 2020;27:100950. https://doi.org/
financial interests or personal relationships that could have 10.1016/j.jobe.2019.100950.
appeared to influence the work reported in this paper. [7] Lund H, Werner S, Wiltshire R, Svendsen S, Thorsen JE, Hvelplund F, et al. 4th
generation district heating (4GDH). Energy 2014;68:1e11. https://doi.org/
10.1016/j.energy.2014.02.089.
[8] Idowu S, Saguna S, Åhlund C, Schele n O. Applied machine learning: fore-
Acknowledgements casting heat load in district heating system. Energy Build 2016;133:478e88.
https://doi.org/10.1016/j.enbuild.2016.09.068.
This study is supported by Tianjin Technical Expert Project [9] Karimi M, Karami H, Gholami M, Khatibzadehazad H, Moslemi N. Priority
index considering temperature and date proximity for selection of similar
(19JCTPJC55700) and the research project on district heating days in knowledge-based short term load forecasting method. Energy
energy-saving technology based on big data and deep learning. 2018;144:928e40. https://doi.org/10.1016/j.energy.2017.12.083.
[10] Chakhchoukh Y, Panciatici P, Mili L. Electric load forecasting based on statistical robust methods. IEEE Trans Power Syst 2011;26(3):982–91. https://doi.org/10.1109/tpwrs.2010.2080325.
[11] Izadyar N, Ghadamian H, Ong HC, Moghadam Z, Tong CW, Shamshirband S. Appraisal of the support vector machine to forecast residential heating demand for the District Heating System based on the monthly overall natural gas consumption. Energy 2015;93:1558–67. https://doi.org/10.1016/j.energy.2015.10.015.
[12] Kurek T, Bielecki A, Świrski K, Wojdan K, Guzek M, Białek J, et al. Heat demand forecasting algorithm for a Warsaw district heating network. Energy 2021;217. https://doi.org/10.1016/j.energy.2020.119347.
[13] Liu E, Wang Y, Huang Y. Short-term Forecast of Multi-load of Electrical Heating and Cooling in Regional Integrated Energy System Based on Deep LSTM RNN. In: IEEE 4th Conference on Energy Internet and Energy System Integration (EI2). IEEE; 2020. p. 2994–8. https://doi.org/10.1109/ei250167.2020.9347300.
[14] Iwafune Y, Yagita Y, Ikegami T, Ogimoto K. Short-term forecasting of residential building load for distributed energy management. In: IEEE international energy conference (ENERGYCON). IEEE; 2014. p. 1197–204. https://doi.org/10.1109/ENERGYCON.2014.6850575.
[15] Fang T, Lahdelma R. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system. Appl Energy 2016;179:544–52. https://doi.org/10.1016/j.apenergy.2016.06.133.
[16] Shyh-Jier H, Kuang-Rong S. Short-term load forecasting via ARMA model identification including non-Gaussian process considerations. IEEE Trans Power Syst 2003;18(2):673–9. https://doi.org/10.1109/tpwrs.2003.811010.
[17] Zhang J, Wang S. Thermal Load Forecasting Based on PSO-SVR. In: IEEE 4th International Conference on Computer and Communications (ICCC). IEEE; 2018. p. 2676–80. https://doi.org/10.1109/CompComm.2018.8780847.
[18] Yan Y, Zhang Z. Cooling, heating and electrical load forecasting method for integrated energy system based on SVR model. In: 2021 6th Asia conference on power and electrical engineering (ACPEE). IEEE; 2021. p. 1753–8. https://doi.org/10.1109/acpee51499.2021.9436990.
[19] Luo XJ, Oyedele LO, Ajayi AO, Akinade OO. Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustain Cities Soc 2020;61:102283. https://doi.org/10.1016/j.scs.2020.102283.
[20] Liu J, Wang X, Zhao Y, Dong B, Lu K, Wang R. Heating load forecasting for combined heat and power plants via strand-based LSTM. IEEE Access 2020;8:33360–9. https://doi.org/10.1109/access.2020.2972303.
[21] Xue P, Jiang Y, Zhou Z, Chen X, Fang X, Liu J. Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms. Energy 2019;188:116085. https://doi.org/10.1016/j.energy.2019.116085.
[22] Ghofrani M, Ghayekhloo M, Arabali A, Ghayekhloo A. A hybrid short-term load forecasting with a new input selection framework. Energy 2015;81(119):777–86. https://doi.org/10.1016/j.energy.2015.01.028.
[23] Chou J-S, Bui D-K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building design. Energy Build 2014;82:437–46. https://doi.org/10.1016/j.enbuild.2014.07.036.
[24] Geysen D, De Somer O, Johansson C, Brage J, Vanhoudt D. Operational thermal load forecasting in district heating networks using machine learning and expert advice. Energy Build 2018;162:144–53. https://doi.org/10.1016/j.enbuild.2017.12.042.
[25] Xudong L, Shuo L, Qingwu F. Prediction of building heating and cooling load based on IPSO-LSTM neural network. Chinese Automation Congress (CAC). IEEE; 2020. p. 1085–90. https://doi.org/10.1109/cac51589.2020.9327849.
[26] Zheng H, Yuan J, Chen L. Short-term load forecasting using EMD-LSTM neural networks with a xgboost algorithm for feature importance evaluation. Energies 2017;10(8). https://doi.org/10.3390/en10081168.
[27] Ma Z, Song J, Zhang J. Energy consumption prediction of air-conditioning systems in buildings by selecting similar days based on combined weights. Energy Build 2017;151:157–66. https://doi.org/10.1016/j.enbuild.2017.06.053.
[28] Wang R, Lu S, Li Q. Multi-criteria comprehensive study on predictive algorithm of hourly heating energy consumption for residential buildings. Sustain Cities Soc 2019;49. https://doi.org/10.1016/j.scs.2019.101623.
[29] Barman M, Dev Choudhury NB, Sutradhar S. A regional hybrid Goa-SVM model based on similar day approach for short-term load forecasting in Assam, India. Energy 2018;145:710–20. https://doi.org/10.1016/j.energy.2017.12.156.
[30] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems, vol. 30. Curran Associates, Inc. https://doi.org/10.48550/arXiv.1706.03762.
[31] Child R, Gray S, Radford A, Sutskever I. Generating long sequences with sparse transformers. 2019. https://doi.org/10.48550/arXiv.1904.10509. arXiv preprint arXiv:1904.10509.
[32] Li S, Jin X, Xuan Y, Zhou X, Chen W, Wang Y-X, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. arXiv preprint arXiv:1907.00235. https://doi.org/10.48550/arXiv.1907.00235.
[33] Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. 2020. https://doi.org/10.48550/arXiv.2004.05150. arXiv preprint arXiv:2004.05150.
[34] Wang S, Li BZ, Khabsa M, Fang H, Ma H. Linformer: self-attention with linear complexity. 2020. https://doi.org/10.48550/arXiv.2006.04768. arXiv preprint arXiv:2006.04768.
[35] Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. 2020. https://doi.org/10.48550/arXiv.2012.07436. arXiv preprint arXiv:2012.07436.
[36] Yan H, Deng B, Li X, Qiu X. TENER: adapting transformer encoder for named entity recognition. 2019. https://doi.org/10.48550/arXiv.1911.04474. arXiv preprint arXiv:1911.04474.
[37] Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R. Transformer-XL: attentive language models beyond a fixed-length context. 2019. https://doi.org/10.48550/arXiv.1901.02860. arXiv preprint arXiv:1901.02860.
