Energy
journal homepage: www.elsevier.com/locate/energy
Article info

Article history:
Received 13 December 2021
Received in revised form 25 April 2022
Accepted 1 May 2022
Available online 4 May 2022

Keywords:
District heating system
Heating load forecasting
Informer
Relative position encoding algorithm

Abstract

Accurate load forecasting of district heating systems (DHSs) is an essential guide to guaranteeing effective energy production, distribution, and rational utilization. Artificial neural networks have been extensively applied to heating energy prediction in DHS. Recently, a new time series prediction model namely Informer was proposed. This study proposes an Informer-based framework for DHS heating load forecasting. To explore the performance of Informer in heating load forecasting tasks, four forecasting models namely Autoregressive Integrated Moving Average model, Multilayer Perceptron, Recurrent Neural Network and Long Short-Term Memory network are established for comparison. The historical heating load, outdoor temperature, relative humidity, wind speed and air quality index of a DHS in Tianjin are used as the input characteristics to comprehensively assess the performance of these five forecasting strategies. The prediction results of the models are evaluated and visualized. The experimental results show that the Informer-based forecasting model can achieve the most accurate and stable predictions. Furthermore, a relative position encoding algorithm is introduced to enhance its generalization and robustness. Overall, the Informer-based framework can report satisfactory testing results. The prediction curve is fitted to the trend of temperature change which can play an excellent guiding role in heating dispatching.

© 2022 Elsevier Ltd. All rights reserved.
1. Introduction

As one of the essential energy services in densely populated urban areas, district heating transports the heating medium from energy resources to customers to satisfy the heating requirement. More than 80,000 sets of DHSs have been on stream worldwide. DHS, the most important energy system in China, is an effective strategy for heating production and distribution to meet residential and commercial heating needs. The input energy of DHS comes from combined heating and power (CHP) plants [1], district boilers, industrial waste heating [2] and various natural heating sources [3,4]. The heat source in DHS needs to be converted into the required heating medium (hot water or steam) through equipment in the energy station. Compared with local boilers, DHS provides higher efficiency and less pollution [5].

With the increasing global environmental problems, various countries have formulated corresponding policies and measures. As the largest energy-consuming economy and carbon emitter in the world [6], China has received extensive attention from all social circles. Therefore, China has put forward the concept of carbon peak and carbon neutralization, which means that the full utilization of energy has become an urgent problem that needs to be solved. The fourth-generation district heating system (4GDHS) is the key to the implementation of the sustainable development strategy. The introduction of intelligent control is the most significant feature of 4GDHS [7]. In order to meet the actual heating demand of users without excess heating loss, heating load forecasting has become an indispensable element of DHS. The accurate heating load forecasting can provide sufficient guidance for energy saving and emission reduction so that the district heating operator can correctly control the operation of DHS and make effective decisions for the heating system.

In the past, heating load forecasting methods were mainly based on physical and mathematical methods [8], which need to establish the functional relationship between heating load and its influence characteristics. However, due to the uncertainty of influencing factors [9], such as holidays and extreme weather, the performance of mathematical modeling methods will be enormously decreased. Benefitting from the rise of machine learning and deep learning,

* Corresponding author.
E-mail address: gmj790@163.com (M. Gong).
https://doi.org/10.1016/j.energy.2022.124179
0360-5442/© 2022 Elsevier Ltd. All rights reserved.
M. Gong, Y. Zhao, J. Sun et al. Energy 253 (2022) 124179
medium and concrete as the pipeline. The second-generation DHS lasted from the 1930s to the 1970s and began to use 100 °C hot water as the carrier. After the 1970s, the third-generation DHS occupied a dominant position. Hot water as the medium is usually less than 100 °C and is transported in underground insulated pipes. With the development of information technology, the fourth-generation DHS is the future trend. It is characterized by reduced energy consumption, sustainable energy and integration of intelligent systems [7].

As shown in Fig. 1, DHS is mainly composed of heating sources, district heating networks, heat exchange stations and customers. Produced from a heating source, hot water is sent to the heat exchange station through the primary network. Then the heat is transferred to the secondary pipe network through the heat exchange station. Finally, the hot water in the secondary pipe network is transmitted to users. Sufficient heating source supply is of great value to the normal operation of DHS. Consequently, this study focuses on the total heating load of heating source production.

2.2. Data set

The influential factors of heating load can be mainly divided into four categories: time variables, weather parameters, DHS operation inertia and user social behavior [21]. Time variables include year, month, day, hour, minute, etc. Time variables can reflect the social behaviors of users, which will result in weekly and daily heating regularities. For the weekly regularity, the heat of residential buildings usually stays lower on weekdays and increases on weekends. For the daily regularity, as most users need to work, the heat demand is relatively small during the day, while it will increase when the users get home at night. For the large DHSs, the regularities can be more obvious. Weather parameters, the decisive factors of heating load, contain outdoor temperature, outdoor humidity, wind speed and AQI. Smog can change the amount of radiation reaching the surface and thereby the temperature, thus indirectly affecting the heating load. Considering the consecutive days of severe smog occurring in Tianjin, we select the AQI to represent the air condition in our work. The operational characteristics of DHS are also a critical factor, including supply and return water temperature, flow rate, DHS regulation signal, etc. Since the heating load of DHS is a function of mass flow rate and supply and return water temperature, the time series of historical heating load can be regarded as a comprehensive variable which implicitly reflects the regulation process and thermal inertia in DHS. This could lead to the correlation between historical heating load and subsequent heating load. This study takes a variety of influencing factors as the input characteristics of the model.

In Tianjin, China, we have established a data acquisition platform to collect DHS actual operation data and relevant weather data. This study is based on the data set from February 17 to March 22, 2018. The operation data of DHS include the 24 h-ahead heating load, 1 h-ahead heating load and the actual heating load. Due to the importance of meteorological conditions, outdoor temperature, outdoor humidity, wind speed and AQI are taken into consideration. Fig. 2 shows the actual heating load curve of the DHS.

3. Methodology

3.1. Transformer

With continuous evolution and development, Transformer has been applied to various research fields. Like most competitive neural sequence transduction models, Transformer has an encoder-decoder structure. In the encoder, the input sequence $X = (x_1, x_2, \ldots, x_n)$ is mapped to the continuous representation sequence $Z = (z_1, z_2, \ldots, z_n)$. Consuming the previously generated symbols as an additional input, the decoder generates an output sequence $Y = (y_1, y_2, \ldots, y_n)$ one element at a time. The structure of Transformer is shown in Fig. 3.

Encoder: The encoder consists of a stack of n = 6 identical layers. Each layer contains two sub-layers, a multi-head self-attention mechanism and a fully connected feed-forward network. With Sublayer(x) denoting the function implemented by the sub-layer, the output of each sub-layer is LayerNorm(x + Sublayer(x)). The output dimension is 512.

Decoder: The decoder also consists of a stack of n = 6 identical layers. Compared with the encoder, the decoder adds a third sub-layer, which performs a multi-head attention mechanism on the output of the encoder. Identical to the encoder, residual connection is adopted for each layer, followed by layer normalization. The purpose of the mask in the self-attention mechanism is to avoid getting future information during decoding.

Attention: With the standard self-attention mechanism, the continuous representation sequence is calculated as follows:

$$Z = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad (1)$$

3.2. Informer: a new model for time series prediction

(1) The probsparse self-attention is proposed. The original self-attention is written in probability form:

$$A(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}\, v_j = E_{p(k_j \mid q_i)}\left[v_j\right] \qquad (2)$$

where $p(k_j \mid q_i) = \dfrac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}$ and $k(q_i, k_j)$ selects the asymmetric exponential kernel $\exp\left(\dfrac{q_i k_j^{T}}{\sqrt{d}}\right)$.

Sparsity criterion of query: KL divergence of attention probability $p(k_j \mid q_i)$ and uniform distribution $q(k_j \mid q_i) = \dfrac{1}{L_K}$.

$$KL(q \,\|\, p) = \sum_{j=1}^{L_K} \frac{1}{L_K} \ln \frac{1/L_K}{k(q_i, k_j) / \sum_{l} k(q_i, k_l)} = \ln \sum_{l=1}^{L_K} \exp\left(\frac{q_i k_l^{T}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{T}}{\sqrt{d}} - \ln L_K \qquad (3)$$
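As a minimal sketch of Eqs. (1) and (3), the snippet below computes scaled dot-product attention and the per-query KL divergence between the uniform distribution and the attention distribution. It uses NumPy rather than the authors' implementation; the function names, the random test matrices and their shapes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V. Also returns the scaled scores."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V, scores

def kl_uniform_vs_attention(scores):
    """Eq. (3) for every query row: ln sum_l exp(s_l) - mean_j(s_j) - ln L_K."""
    L_K = scores.shape[-1]
    row_max = scores.max(axis=-1, keepdims=True)
    log_sum_exp = (row_max + np.log(np.exp(scores - row_max).sum(axis=-1, keepdims=True))).squeeze(-1)
    return log_sum_exp - scores.mean(axis=-1) - np.log(L_K)

# Hypothetical sizes: 4 queries, 6 keys, model width 8, value width 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 3))

Z, scores = attention(Q, K, V)
kl = kl_uniform_vs_attention(scores)
print(Z.shape, kl.shape)
```

A query whose attention is close to uniform gives a KL value near zero, while a query dominated by a few keys gives a larger value, which is what makes the quantity usable as a sparsity criterion.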
$$M(q_i, K) = \ln \sum_{j=1}^{L_K} \exp\left(\frac{q_i k_j^{T}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{T}}{\sqrt{d}} \qquad (4)$$

$$A(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V \qquad (5)$$

$$X_{j+1}^{t} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left(\left[X_{j}^{t}\right]_{AB}\right)\right)\right) \qquad (6)$$

(3) Because dynamic decoding makes long outputs too slow for rapid prediction, the key is to extract a shorter sequence from the input sequence. For example, the known 7-day data are used as the label length to predict the final 7 days, which make up the prediction length.
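The label-length idea in item (3) can be sketched as follows. This is an illustration only: the zero placeholder for the horizon follows the generative decoder described in the Informer paper [35], while the function name `build_decoder_input`, the 14-day history and the 8-feature width are hypothetical choices made for the example.

```python
import numpy as np

def build_decoder_input(encoder_input, label_len, pred_len):
    """Concatenate the last `label_len` known steps with a zero placeholder
    of `pred_len` steps, so the whole horizon is predicted in one forward pass."""
    if label_len > encoder_input.shape[0]:
        raise ValueError("label_len exceeds the available history")
    start_token = encoder_input[-label_len:]
    placeholder = np.zeros((pred_len, encoder_input.shape[1]))
    return np.concatenate([start_token, placeholder], axis=0)

hours_per_week = 7 * 24
# Hypothetical history: 14 days of hourly samples with 8 features.
history = np.arange(2 * hours_per_week * 8, dtype=float).reshape(2 * hours_per_week, 8)

decoder_input = build_decoder_input(history, label_len=hours_per_week, pred_len=hours_per_week)
print(decoder_input.shape)
```

The first 168 rows carry known data and the last 168 rows are placeholders that the decoder fills in together, instead of generating one step at a time.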
3.3. Relative position encoding method

The location encoding formula of Transformer is as follows [30]:

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right) \qquad (7)$$

$$PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right) \qquad (8)$$

pos is the location of the input data and i is the dimension of the data.

The attention score of query, key and value is calculated as follows:

$$q_i = W_q\left(E_{x_i} + P_i\right) \qquad (9)$$

$$k_j = W_k\left(E_{x_j} + P_j\right) \qquad (10)$$

$$A_{i,j} = \sum_{j} S_{i,j}\, v_j \qquad (14)$$

$$S_{i,j} = E_{x_i} W_q W_{k,E}^{T} E_{x_j}^{T} + E_{x_i} W_q W_{k,R}^{T} R_{i-j}^{T} + u W_{k,E}^{T} E_{x_j}^{T} + v W_{k,R}^{T} R_{i-j}^{T} \qquad (16)$$

every hour for 34 days. Eight features are selected as the input of Informer. The actual heating load is taken as the prediction target column, as shown in Table 1. The parameter settings in Informer are sketched in Appendix Table 6.

Table 1
Inputs of Informer.

Input         Variable name
x1            Date
x2            Outdoor temperature
x3            Outdoor humidity
x4            Wind speed
x5            AQI
x6            24 h-ahead heating load
x7            1 h-ahead heating load
x8 (Target)   Actual heating load
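The sinusoidal encoding of Eqs. (7) and (8) can be sketched in a few lines of NumPy. The function name and the chosen sizes (a 168-step sequence and a model width of 512, the output dimension quoted for the Transformer) are assumptions made for illustration, and an even model width is assumed so that sine and cosine columns pair up.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    """Eqs. (7)-(8): sine on even feature indices, cosine on odd ones."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    pos = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # shape (1, d_model / 2)
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)              # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)              # PE(pos, 2i + 1)
    return pe

pe = sinusoidal_position_encoding(seq_len=168, d_model=512)
print(pe.shape)
```

Each row depends only on the absolute position, which is the property the relative scheme of Eq. (16) replaces with a dependence on the offset i - j.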
Table 3
Evaluation results of all models in the same dataset. (Pred_length = 168 h).

        Informer   ARIMA   MLP     RNN     LSTM
MAE     0.148      0.129   0.195   0.197   0.185
MSE     0.035      0.042   0.054   0.056   0.064
MAPE    0.071      0.122   0.109   0.118   0.122

Data collection: In our data acquisition platform, hourly internal data (historical operational data) and external variables (weather data) are collected. The available data sets range from February 17, 2018 to March 22, 2018.

Data preprocessing: The sources of historical production data and weather forecast information data of DHS are different. When they are integrated for training and testing, all abnormal values of actual operation data are removed, and the linear interpolation method is utilized to fill in the missing values.
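The preprocessing step can be sketched as follows, assuming NumPy. The text does not say how abnormal values are detected, so the fixed `lower` and `upper` bounds below are purely hypothetical, as are the function names and the sample values.

```python
import numpy as np

def mask_outliers(values, lower, upper):
    """Mark samples outside [lower, upper] as missing (NaN).
    The bounds are a hypothetical rule, not the authors' criterion."""
    y = np.asarray(values, dtype=float).copy()
    outside = (y > upper) | (lower > y)
    y[outside] = np.nan
    return y

def fill_missing_linear(values):
    """Fill NaN gaps by linear interpolation between the nearest valid samples."""
    y = np.asarray(values, dtype=float).copy()
    missing = np.isnan(y)
    if missing.all():
        raise ValueError("series has no valid samples to interpolate from")
    idx = np.arange(y.size)
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
    return y

# Made-up hourly load: one abnormal spike (500.0) and one missing reading.
load = [10.0, 12.0, 500.0, float("nan"), 18.0]
cleaned = fill_missing_linear(mask_outliers(load, lower=0.0, upper=100.0))
print(cleaned)  # [10. 12. 14. 16. 18.]
```

Removing the abnormal value first and interpolating afterwards means the spike does not distort the values used to fill the neighbouring gap.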
Feature engineering: After data preprocessing, eight characteristic variables are obtained, which are dates, outdoor temperature, outdoor humidity, wind speed, AQI, 24 h-ahead heating load, 1 h-ahead heating load and the actual heating load. The date column will be input into the target model as time information.

Data splitting: In our work, we divide the data into training, validation and test sets according to the ratio of 6:2:2, which is a traditional data division method.
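A minimal sketch of the 6:2:2 split is given below. It assumes the split is chronological (the text does not state whether samples are shuffled) and uses the 34 days of hourly data mentioned for this data set; the function name and the rounding rule are illustrative choices.

```python
def split_chronological(n_samples, ratios=(0.6, 0.2, 0.2)):
    """Return index ranges for train, validation and test sets in time order.
    Rounding down is an assumption; leftover samples go to the test set."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n_train = int(n_samples * ratios[0])
    n_val = int(n_samples * ratios[1])
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_samples)
    return train, val, test

n_hours = 34 * 24  # hourly samples over 34 days
train, val, test = split_chronological(n_hours)
print(len(train), len(val), len(test))  # 489 163 164
```

Keeping the three blocks contiguous in time avoids evaluating the model on hours that precede its training data.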
Model training: In our experiment, the basic Informer will be compared with ARIMA, MLP, RNN and LSTM to identify the best-performing model.

Model evaluation: MAE, MSE and MAPE are the most commonly used error evaluation methods. MAE reflects the actual error between the predicted data and the actual data and depends only on the scale of the data. MSE ensures that each term is positive and differentiable. MAPE is expressed as a percentage and can be used to compare predictions across different scales.

$$MSE = \frac{1}{N} \sum_{n=1}^{N} \left(P_n - \hat{P}_n\right)^2 \qquad (17)$$

$$MAPE = \frac{100\%}{N} \sum_{n=1}^{N} \left|\frac{P_n - \hat{P}_n}{P_n}\right| \qquad (18)$$

$$MAE = \frac{1}{N} \sum_{n=1}^{N} \left|P_n - \hat{P}_n\right| \qquad (19)$$
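Eqs. (17) to (19) translate directly into NumPy. The sketch takes P_n as the actual load and \hat{P}_n as the prediction, which is an assumption because the surviving text does not define the symbols; the sample arrays are made up for illustration.

```python
import numpy as np

def mse(actual, predicted):
    """Eq. (17): mean squared error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean((a - p) ** 2))

def mape(actual, predicted):
    """Eq. (18): mean absolute percentage error, in percent.
    Undefined when any actual value is zero."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    if np.any(a == 0):
        raise ValueError("MAPE is undefined for zero actual values")
    return float(100.0 * np.mean(np.abs((a - p) / a)))

def mae(actual, predicted):
    """Eq. (19): mean absolute error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))

actual = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.0]
print(mse(actual, predicted), mape(actual, predicted), mae(actual, predicted))
```

For these made-up values the errors are 1, -1 and 0, so MSE and MAE both equal 2/3 and MAPE equals 25%.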
5.1. Experiment 1
Table 2
Evaluation results of all models in the same dataset. (Pred_length = 24 h).
The error value of each model indicates the performance of the corresponding model. When pred_length = 24 h in advance, Informer shows the minimum values in all the error evaluation indexes. When pred_length = 168 h in advance, there are 2 minimum values of Informer in the three error evaluation indexes, while ARIMA, MLP, RNN and LSTM account for 1, 0, 0 and 0 respectively.

As can be seen from Figs. 7–11, the four common prediction models show good prediction performance. As a relatively well-developed and accurate algorithm, the prediction curve of ARIMA fits the trend of the actual data curve very well. However, the prominent problem is the time delay, which leads to high errors. MLP also has an obvious time delay. But the more prominent problem is that the forecasting results are not ideal at the higher-level load. Specifically, in Fig. 8 (b), the prediction curve of MLP at 0–75 h and the later peaks are significantly lower than the original curve. The delay problem also exists in RNN and LSTM. For RNN, the problem of curve mutation is more serious. For LSTM, contrary to MLP, the forecasting results of lower level load are not satisfactory. Specifically, in Fig. 10 (b), the predicted curve of LSTM at about 80 h, 100 h and 125 h is significantly higher than the original curve. Informer has successfully predicted the trend of actual data, and there is no impact of time delay. The more stable curve is the prominent advantage, which can play a healthy guiding role for heating dispatching. The deficiency is that the curve does not fit the original data perfectly.
        Informer_bas   Informer_rel
MAE     0.154          0.151
MSE     0.040          0.038
MAPE    0.076          0.076

Table 5
Error evaluation of Informer_bas and Informer_rel. (Pred_length = 168).

In this paper, Informer, a new time series forecasting model, is applied to the field of DHS heating load forecasting. Based on the actual operation data of a DHS in Tianjin, three error evaluation indexes are used to evaluate the five prediction models. In addition, a relative position encoding strategy is introduced and compared with the basic method. The following conclusions can be drawn from the experimental results:
Credit authorship contribution statement

Mingju Gong: Project management, Supervision, Writing - review. Yin Zhao: Writing - original draft, Software programming. Jiawang Sun: Methodology, Conceptualization. Cuitian Han: Investigation. Guannan Sun: Data management. Bo Yan: Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This study is supported by Tianjin Technical Expert Project (19JCTPJC55700) and the research project on district heating energy-saving technology based on big data and deep learning.

References

[1] Liao C, Ertesvåg IS, Zhao J. Energetic and exergetic efficiencies of coal-fired CHP (combined heat and power) plants used in district heating systems of China. Energy 2013;57:671–81. https://doi.org/10.1016/j.energy.2013.05.055.
[2] Guo X, Hendel M. Urban water networks as an alternative source for district heating and emergency heat-wave cooling. Energy 2018;145:79–87. https://doi.org/10.1016/j.energy.2017.12.108.
[3] Alkan MA, Keçebaş A, Yamankaradeniz N. Exergoeconomic analysis of a district heating system for geothermal energy using specific exergy cost method. Energy 2013;60:426–34. https://doi.org/10.1016/j.energy.2013.08.017.
[4] Guo X, Goumba AP, Wang C. Comparison of direct and indirect active thermal energy storage strategies for large-scale solar heating systems. Energies 2019;12(10):1948. https://doi.org/10.3390/en12101948.
[5] Rezaie B, Rosen MA. District heating and cooling: review of technology and potential enhancements. Appl Energy 2012;93:2–10. https://doi.org/10.1016/j.apenergy.2011.04.020.
[6] Gong M, Bai Y, Qin J, Wang J, Yang P, Wang S. Gradient boosting machine for predicting return temperature of district heating system: a case study for residential buildings in Tianjin. J Build Eng 2020;27:100950. https://doi.org/10.1016/j.jobe.2019.100950.
[7] Lund H, Werner S, Wiltshire R, Svendsen S, Thorsen JE, Hvelplund F, et al. 4th generation district heating (4GDH). Energy 2014;68:1–11. https://doi.org/10.1016/j.energy.2014.02.089.
[8] Idowu S, Saguna S, Åhlund C, Schelén O. Applied machine learning: forecasting heat load in district heating system. Energy Build 2016;133:478–88. https://doi.org/10.1016/j.enbuild.2016.09.068.
[9] Karimi M, Karami H, Gholami M, Khatibzadehazad H, Moslemi N. Priority index considering temperature and date proximity for selection of similar days in knowledge-based short term load forecasting method. Energy 2018;144:928–40. https://doi.org/10.1016/j.energy.2017.12.083.
[10] Chakhchoukh Y, Panciatici P, Mili L. Electric load forecasting based on statistical robust methods. IEEE Trans Power Syst 2011;26(3):982–91. https://doi.org/10.1109/tpwrs.2010.2080325.
[11] Izadyar N, Ghadamian H, Ong HC, moghadam Z, Tong CW, Shamshirband S. Appraisal of the support vector machine to forecast residential heating demand for the District Heating System based on the monthly overall natural gas consumption. Energy 2015;93:1558–67. https://doi.org/10.1016/j.energy.2015.10.015.
[12] Kurek T, Bielecki A, Swirski K, Wojdan K, Guzek M, Białek J, et al. Heat demand forecasting algorithm for a Warsaw district heating network. Energy 2021;217. https://doi.org/10.1016/j.energy.2020.119347.
[13] Liu E, Wang Y, Huang Y. Short-term Forecast of Multi-load of Electrical Heating and Cooling in Regional Integrated Energy System Based on Deep LSTM RNN. In: IEEE 4th Conference on Energy Internet and Energy System Integration (EI2). IEEE; 2020. p. 2994–8. https://doi.org/10.1109/ei250167.2020.9347300.
[14] Iwafune Y, Yagita Y, Ikegami T, Ogimoto K. Short-term forecasting of residential building load for distributed energy management. In: IEEE international energy conference (ENERGYCON). IEEE; 2014. p. 1197–204. https://doi.org/10.1109/ENERGYCON.2014.6850575.
[15] Fang T, Lahdelma R. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system. Appl Energy 2016;179:544–52. https://doi.org/10.1016/j.apenergy.2016.06.133.
[16] Shyh-Jier H, Kuang-Rong S. Short-term load forecasting via ARMA model identification including non-Gaussian process considerations. IEEE Trans Power Syst 2003;18(2):673–9. https://doi.org/10.1109/tpwrs.2003.811010.
[17] Zhang J, Wang S. Thermal Load Forecasting Based on PSO-SVR. In: IEEE 4th International Conference on Computer and Communications (ICCC). IEEE; 2018. p. 2676–80. https://doi.org/10.1109/CompComm.2018.8780847.
[18] Yan Y, Zhang Z. Cooling, heating and electrical load forecasting method for integrated energy system based on SVR model. In: 2021 6th Asia conference on power and electrical engineering (ACPEE). IEEE; 2021. p. 1753–8. https://doi.org/10.1109/acpee51499.2021.9436990. Published.
[19] Luo XJ, Oyedele LO, Ajayi AO, Akinade OO. Comparative study of machine learning-based multi-objective prediction framework for multiple building energy loads. Sustain Cities Soc 2020;61:102283. https://doi.org/10.1016/j.scs.2020.102283.
[20] Liu J, Wang X, Zhao Y, Dong B, Lu K, Wang R. Heating load forecasting for combined heat and power plants via strand-based LSTM. IEEE Access 2020;8:33360–9. https://doi.org/10.1109/access.2020.2972303.
[21] Xue P, Jiang Y, Zhou Z, Chen X, Fang X, Liu J. Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms. Energy 2019;188:116085. https://doi.org/10.1016/j.energy.2019.116085.
[22] Ghofrani M, Ghayekhloo M, Arabali A, Ghayekhloo A. A hybrid short-term load forecasting with a new input selection framework. Energy 2015;81(119):777–86. https://doi.org/10.1016/j.energy.2015.01.028.
[23] Chou J-S, Bui D-K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building design. Energy Build 2014;82:437–46. https://doi.org/10.1016/j.enbuild.2014.07.036.
[24] Geysen D, De Somer O, Johansson C, Brage J, Vanhoudt D. Operational thermal load forecasting in district heating networks using machine learning and expert advice. Energy Build 2018;162:144–53. https://doi.org/10.1016/j.enbuild.2017.12.042.
[25] Xudong L, Shuo L, Qingwu F. Prediction of building heating and cooling load based on IPSO-LSTM neural network. Chinese Automation Congress (CAC). IEEE; 2020. p. 1085–90. https://doi.org/10.1109/cac51589.2020.9327849.
[26] Zheng H, Yuan J, Chen L. Short-term load forecasting using EMD-LSTM neural networks with a xgboost algorithm for feature importance evaluation. Energies 2017;10(8). https://doi.org/10.3390/en10081168.
[27] Ma Z, Song J, Zhang J. Energy consumption prediction of air-conditioning systems in buildings by selecting similar days based on combined weights. Energy Build 2017;151:157–66. https://doi.org/10.1016/j.enbuild.2017.06.053.
[28] Wang R, Lu S, Li Q. Multi-criteria comprehensive study on predictive algorithm of hourly heating energy consumption for residential buildings. Sustain Cities Soc 2019;49. https://doi.org/10.1016/j.scs.2019.101623.
[29] Barman M, Dev Choudhury NB, Sutradhar S. A regional hybrid Goa-SVM model based on similar day approach for short-term load forecasting in Assam, India. Energy 2018;145:710–20. https://doi.org/10.1016/j.energy.2017.12.156.
[30] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems, vol. 30. Curran Associates, Inc. https://doi.org/10.48550/arXiv.1706.03762.
[31] Child R, Gray S, Radford A, Sutskever I. Generating long sequences with sparse transformers. 2019. https://doi.org/10.48550/arXiv.1904.10509. arXiv preprint arXiv:1904.10509.
[32] Li S, Jin X, Xuan Y, Zhou X, Chen W, Wang Y-X, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc., p. arXiv preprint arXiv:1907.00235. https://doi.org/10.48550/arXiv.1907.00235.
[33] Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. 2020. https://doi.org/10.48550/arXiv.2004.05150. arXiv preprint arXiv:2004.05150.
[34] Wang S, Li BZ, Khabsa M, Fang H, Ma H. Linformer: self-attention with linear complexity. 2020. https://doi.org/10.48550/arXiv.2006.04768. arXiv preprint arXiv:2006.04768.
[35] Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. 2020. https://doi.org/10.48550/arXiv.2012.07436. arXiv preprint arXiv:2012.07436.
[36] Yan H, Deng B, Li X, Qiu X. TENER: adapting transformer encoder for named entity recognition. 2019. https://doi.org/10.48550/arXiv.1911.04474. arXiv preprint arXiv:1911.04474.
[37] Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R. Transformer-XL: attentive language models beyond a fixed-length context. 2019. https://doi.org/10.48550/arXiv.1901.02860. arXiv preprint arXiv:1901.02860.