Professional Documents
Culture Documents
a r t i c l e i n f o a b s t r a c t
Article history: Occupant behaviour has significant impacts on the performance of machine learning algorithms when
Received 20 September 2022 predicting building energy consumption. Due to a variety of reasons (e.g., underperforming building
Revised 6 January 2023 energy management systems or restrictions due to privacy policies), the availability of occupational data
Accepted 9 January 2023
has long been an obstacle that hinders the performance of machine learning algorithms in predicting
Available online 12 January 2023
building energy consumption. Therefore, this study proposed an agent-based machine learning model
whereby agent-based modelling was employed to generate simulated occupational data as input features
Keywords:
for machine learning algorithms for building energy consumption prediction. Boruta feature selection
Building energy consumption
Prediction
was also introduced in this study to select all relevant features. The results indicated that the perfor-
Machine learning mances of machine learning algorithms in predicting building energy consumption were significantly
Agent-based modelling improved when using simulated occupational data, with even greater improvements after conducting
Occupant behaviour Boruta feature selection.
Ó 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.enbuild.2023.112797
0378-7788/Ó 2023 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 1. Relationship between occupants, indoor environmental quality and energy [6].
using physical methods can lead to an inaccurate prediction by up desire for a comfortable indoor environment has a great impact on
to 40 %. energy consumption. However, when it comes to public buildings
The underlying principle of AI methods is based on the mathe- where a large number of people are gathered closely, the move-
matical creation of a matrix that is able to map input features with ment or the route of individuals (which will be influenced by
output energy consumption correspondingly [19]. Compared with crowd) is less discussed, despite the impact of occupant movement
physical-based methods, AI methods do not require detailed infor- on building energy consumption. For instance, the operational
mation about every aspect related to building energy usage to con- schedules of some electric appliances (including motion-sensitive
figure the model [20]. In terms of predicting occupant behaviours, lightings, automated gates and lifts, etc.,) depend on the movement
the most frequently used method is regression analysis which of occupants.
attempts to map the probability of certain behaviours such as Based on the above premise, this study attempts to develop a
window-opening with energy consumption [21]. However, as comprehensive hybrid framework that combines AI methods with
data-driven methods, data availability and quality significantly occupant behaviours modelling (i.e., ABM) to predict building
determine the performance of AI methods. Due to privacy con- energy consumption. Social force modelling (SFM) is also inte-
cerns, occupational data are often the most difficult to obtain, grated into the proposed hybrid framework to simulate occupant
which in turn impedes the application of AI methods in predicting movement. The remaining sections of the paper are organised as
occupancy and occupant behaviour [22]. Besides oversimplifica- follows; Section 2 briefly reviews the existing knowledge regarding
tion or data availability dilemma, when considering combining building energy consumption and occupant behaviour modelling,
occupant behaviour, occupant presence was the most frequently while Section 3 provides an overview of the research methodology
used feature for both physical-based and AI methods, but less far as well as a description of the selected case studies. The detailed
emphasis has been placed on adaptive behaviours/occupant beha- results obtained from the study as well as their implications were
viours (e.g., adjusting shades and windows, switching on/off lights, presented in Section 4. Finally, Section 5 summarised the conclu-
setting the thermostat, etc.), which would also have a significant sion of the entire study and showed potential future works.
impact on the predicted energy consumption [23]. Adaptive beha-
viour is defined as the conceptual, social and practical skills that an
individual requires to cope with everyday life [24]. When it relates 2. Literature review
to the built environment, the adaptive principle of thermal comfort
states that an individual interacts with the environment (particu- A widely-recognised and most commonly observed character-
larly when uncomfortable) to restore his/her comfort [25] such istic of occupant behaviour is its stochasticity and in order to
as adjusting shades and windows, switching on/off lights, setting more accurately describe the uncertainty of occupant behaviour,
the thermostat, etc., [26–29]. Considering the heterogeneity of stochastic process and agent-based modelling have been imple-
individual preference towards thermal comfort, agent-based mod- mented in predicting occupant behaviour [4,32,33]. The core idea
elling (ABM) has been extensively employed to simulate the com- of a stochastic process modelling is to estimate the state of occu-
plex adaptive behaviour of occupants. For instance, Zhang et al. pancy and building energy-related appliances (e.g., opening/clo-
[30] proposed an ABM approach to simulate the clothing and activ- sure of windows and doors, on/off of light and presence/
ity behaviour of occupants in buildings, based on the comfort level absence of occupants). Some of the most popular methods include
of the occupants. Similarly, Jiang et al. [31] implemented ABM to logistic regression, Markov chain, passion process and survival
simulate the interaction between occupants and heating, ventila- analysis. A Markov chain is a discrete random process that the
tion and air-conditioning (HAVC) system that aimed to develop state of the next event or process is determined by the current
an energy-efficient building energy management system (BEMS). ones [32]. Haldi and Robinson [34] simulated the occupant beha-
It is noticed that attention to adaptive behaviour has skewed viour on shading devices using a Markov process. A 6-year mea-
towards indoor thermal comfort which is reasonable as occupants’ surement and a field survey were conducted to identify the
2
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
impact of the occupancy, thermal and visual parameters on shad- levels will influence the status of door, window, shades, lights
ing behaviour. The results suggested a significant improvement as well as heating, ventilation and air conditioning.
compared to the deterministic methods. Similarly, Graeme Flett Despite ABM showing the flexibility and capability to simulate
and Nick Kelly [35] employed a Markov chain-based method to complex and stochastic occupant behaviours, it is worth noting
generate realistic occupancy profiles for residential buildings. that like traditional physical-based methods, in order to predict
Transition probability matrices were determined with regards to the overall energy consumption of buildings, extensive details of
the probability of active occupancy based on time use survey energy components need to be manually considered when imple-
data, which generated a promising set of results. However, menting ABM for simulating building energy consumption. Other-
despite Markov chain-based methods taking the stochasticity of wise, a discrepancy in energy consumption will occur [42,43].
occupant behaviour into consideration, it is often criticised for Unlike ABM which has been frequently applied in simulating
the dependence between events. Also, it remains challenging for occupant energy-related behaviour, SFM is however more often
Markov chain methods to simulate multiple energy-related beha- employed in the research of pedestrians traffic other than occupant
viours simultaneously. Most importantly, Markov chain ignores behaviour. In terms of applying SFM for building energy simula-
the interaction between occupants as well as between occupants tion, Zhu et al. [44] investigated the energy-saving scheduling of
and their environments. campus classrooms by simulating students’ traffic in classrooms
In order to address the aforementioned issues, ABM has been based on students’ preferences for seats and times. A modified
introduced to simulate occupational energy behaviours in build- SFM was also implemented for traffic simulation. The main pur-
ings [36–38]. ABM is a computational model for the simulation pose of SFM in the study by Zhu et al. [44] is still to accurately
of the action and interactions of autonomous agents so as to describe occupant movement, but it should be emphasised that
understand the behaviour of a system. The individual agent is an accurate prediction of occupant traffic can facilitate the under-
able to assess its situation and make decisions based on certain standing of the operational reegimes of electric appliances (e.g.,
rules [39]. As a ‘bottom-up’ method, the global behaviour com- motion-sensitive lighting) depending on occupant traffic.
prises a summary of many individual agents interacting with each Based on this premise, this study proposed a hybrid framework
other and with their environment [1]. The agent can either be an that combines ABM and SFM with a machine learning method to
occupant or an electronic appliance. The flexibility and capability predict the electricity consumption of a high-volume self-
of ABM facilitate the simulation of complex behavioural aspects learning higher educational institution (HEI) building.
of occupants. Zhang et al. [3] simulated the energy consumption
of an office building with occupant behaviour taken into consid- 3. Methodology
eration using ABM. The state of agents (especially occupant, light-
ing, computers, etc.) and interactions between agents were An agent-based machine learning method was proposed in this
established respectively to simulate the energy consumption pro- study to improve the accuracy of building energy consumption
file of lighting and computers. A semblable energy consumption prediction by taking occupancy and occupant behaviour into con-
pattern was observed between the simulated and actual energy sideration. The schematic outline of the research is depicted in
consumption. Apart from simulating occupational energy con- Fig. 2, which consists of two components- an occupant behaviour
sumption, a variety of energy control strategies were set up for simulation module and an energy consumption prediction module.
validating the efficiency of each strategy accordingly, so as to ABM was first implemented to simulate the occupancy and occu-
benefit decision-making for energy management departments. pant behaviour inside an HEI self-learning hub, and the simulation
Similar research was conducted by Irman et al. [40] that estab- outcomes include the number of students, in-use/standby comput-
lished a hierarchical multi-resolution ABM to simulate the elec- ers, lighting, and charging personal electronic devices which served
tricity demand profile of 264 residential buildings. Besides the as input data for building energy consumption prediction with
state of agents and their interactions initially built by Zhang machine learning algorithms. Weather data and time information
et al. [3], Irman et al. [40] also introduced the heterogeneity into were also employed as input data. A detailed description of the
occupancy patterns and house profiles in terms of number of beds proposed method was demonstrated during the rest of this section.
and floor area. In addition, weather information was also
employed in their model, considering the significant impact of 3.1. Description of the case study building
weather on energy-related occupational behaviour of residential
buildings. The model exhibited a mean absolute percentage error Alan Gilbert learning commons, a 24 h a day and seven days a
of 17–29 % across 264 residential buildings, which implies the week (24–7) HEI self-learning hub or library built in 2014 was
robustness and generalisation capability of ABM in simulating selected for case study as shown in Fig. 3. With the principal objec-
the complexity of building energy consumption. Besides simulat- tive of minimising CO2 emissions, Alan Gilbert learning commons
ing the direct interaction between occupants and electricity- was equipped with considerable energy-efficient facilities, includ-
related appliances, efforts have been focused on investigating ing photovoltaic roof tiles and solar thermal systems and extensive
the indirect influence of occupants as well. For instance, Azar use of glass curtain walls.
and Menassa [1] implemented ABM to determine different occu- The electricity consumption in an office building according to
pants’ energy use patterns to represent the different and changing Zhang et al. [3] is comprised of two components, namely base con-
occupants’ energy characteristics over time in an office building. sumption and flexible consumption. Base consumption represents
Rebound effect and energy conservation interventions (word-of- the electric equipment and appliances that have to be switched on
mouth effect) were introduced into the model. Occupants were all the time (e.g., security cameras, information display, computer
classified into medium, high and low energy consumers whereby servers, refrigerators, etc.). Flexible consumption denotes the
medium energy consumers’ behaviour were influenced by the energy consumed by the kind of electric equipment and appliances
remaining two. Different energy usage scenarios were also simu- that can be switched on/off at any time by users. The electricity
lated to evaluate the potential in energy saving. Psychological fac- consumption can be mathematically expressed as shown in Equa-
tors were also studied by Barakat and Khoury [41] whereby they tion (1).
developed an ABM framework to study occupant multi-comfort
EC total ¼ EC base þ EC flexible ð1Þ
level in office buildings. Visual, thermal and acoustic comfort
3
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Where EC total is the total electricity consumption, EC base is the base ABM was employed to simulate occupant and occupant beha-
electricity consumption and EC flexible is the flexible electricity viour in Alan Gilbert learning commons in terms of electricity con-
consumption. sumption. The modelling mainly consists of 3 components namely,
4
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
environment, agent, and agent behaviours (interactions). The envi- environments, but rather, a representation of the internal drives
ronment defines the physical boundary for agents to move within, of the agents to execute specific actions related to their move-
move around and interact with. Considering the electricity con- ments around predefined areas. The physical force vectors that
sumption characteristic of Alan Gilbert learning commons, occu- drive such movements are referred to as social forces. The concepts
pant, computer and light are determined as the 3 types of agents. and representations of social forces are well-established but in
order to enhance the understanding of this report without neces-
3.2.1. Environment sarily consulting additional literature, a summary of the three force
The ground and first floor of Alan Gilbert learning commons !0
vectors is also shown here as the driving force f i , inter-agent force
were selected for case study and the building plan is shown in ! !
Fig. 4. The justification of this selection ground and first floors is f ij and boundary force f iw . According to Newton’s second law of
based on the premise that it provides the most holistic representa- motion, the corresponding expression of each agent i is shown in
tion of the entire building upon which any such modelling activi- Equation (3) and the diagram is shown in Fig. 5:
ties can be based in the future. For instance, the ground floor is !
d v i ðt Þ !0 X ! X !
the only different floor in the building because it comprises of mi ¼ fi þ jð–iÞ
f ij þ f iw ð3Þ
the main entrance hall, security checkpoints and coffee bar. All dt w
!
the other floors are very similar in design, thereby implying that Where mi is the mass of agent i, and v i ðt Þ is the walking velocity at
the model of one can be easily adopted for others when needed. time step t.
The green boxes in Fig. 4 represent public areas which are freely a) Driving force
accessible to any user and the lights within these areas are sensible !0
lights. The red boxes indicate meeting rooms which require prior The driving force f i indicates the intention of the agent to
booking and the lights are manually controlled. Both public areas reach a target, based on the desired speed v 0i and desired direction
and meeting rooms are equipped with desktop PCs. The total num- !0
e i . The driving force is represented in Equation (4):
ber of lights and desktops are 411 and 330, respectively.
v 0 ðtÞ! !
e i v i ðt Þ
0
!0
f i ¼ mi i ; ð4Þ
3.2.2. Behaviour of occupant agents si
The occupant behaviour in this study mainly consists of two
!
parts where the first part focuses on occupant movement and the where v i ðt Þ is the agent velocity at time step t, and si is a charac-
second part defines the energy consumption behaviour of occu- teristic time scale that reflects the reaction time.
pants. In terms of occupant movement, the social force model b) Inter-agent force
(SFM) was embedded in ABM to guide the movement of agents !s
Inter-agent force is comprised of socio-psychological force f ij
against obstacles such as walls and other people as well as reach- !p
ing target destinations within the shortest possible distances. The and physical force f ij . The socio-psychological force describes
concept of SFM was first proposed by Helbing and Molnar [45] to the psychological tendency of two agents to keep a certain safe dis-
represent the motion of agents. SFM indicates that the movement tance between each other, while the physical force indicates the
of agents can be represented as if they were experiencing certain physical contact between agents within crowded environments.
‘‘social forces” which are not necessarily caused by their personal The corresponding expressions are shown in Equations (5) and (6):
Fig. 4. The Floor plan of Alan Gilbert learning commons (a) 2D Ground floor; (b) 2D First Floor; and (c) 3D view of the building.
5
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 6. The hourly record of occupant entering Alan Gilbert learning commons.
Fig. 7. The average time occupant spent in Alan Gilbert learning commons.
19 pandemic. Hence, the data sets would provide a good represen- scale. During the simulation, the value of each outcome will be
tation of normal school activities, with most of the students on recorded at every time step and then these minute-by-minute out-
campus. The time granularity (time step) is set as 1 min basis. comes will be resampled into 1-hourly basis, with the average
The outcomes of this study are ‘‘Number of students”, ‘‘Number value corresponding to the hourly-based energy consumption
of computers in use”, ‘‘Number of computers on standby”, ‘‘Num- and weather data. Considering the randomness of occupant beha-
ber of lighting” and ‘‘Number of charging devices”. The power of viour, the simulation will be implemented 5 times and each time
each electric appliance was not taken into consideration mainly before executing the model, the random seed which controls the
because the generated data sets will be normalised into a unit randomness of the model will be manually changed to different
7
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Table 2
Questionnaire for electricity consumption behaviour in Alan Gilbert learning commons.
Q1. Which part of Alan Gilbert learning commons are you most likely to use?
Public area Neutral Meeting room
Q2. How likely are you to make use of a computer when studying/working in Alan Gilbert learning commons?
Extremely unlikely Somewhat unlikely Neither likely nor unlikely Somewhat likely Extremely likely
Q3. How likely are you to charge your personal electronic device(s) while studying/working in Alan Gilbert learning commons?
Extremely unlikely Somewhat unlikely Neither likely nor unlikely Somewhat likely Extremely likely
Q4. How likely are you to turn off the computer when leaving Alan Gilbert learning commons?
Extremely unlikely Somewhat unlikely Neither likely nor unlikely Somewhat likely Extremely likely
Q5. How likely are you to switch off the lights when leaving Alan Gilbert learning commons?
(Display this question if Q1 answer is Meeting room)
Extremely unlikely Somewhat unlikely Neither likely nor unlikely Somewhat likely Extremely likely
3.3.1. Data collection and pre-processing Where X is a data point, X min is the minimum value, X max is the max-
Apart from SOD, 6 months (from 17th June 2019 to 14th Dec imum value and X norm is the normalized value.
2019) of Original data (e.g., building energy consumption, building In order to explore the impact of data length on prediction per-
access record and meteorological data) was collected from UoM’s formance, different lengths (1 month, November only), 3 months
building energy management system, building access system as (September to November) and 5 months (July to November)) of
well as UoM’s meteorological observatory respectively. The granu- training data were set and December data was used as testing data.
larity of the acquired data was based on hourly basis. The meteo-
rological data consists of a total of 9 features, including wind 3.3.2. Boruta feature selection (BFS)
speed, wind direction, global solar radiation, indirect solar radia- While the dimensionality of the data has increased, the previ-
tion, seconds sunshine, temperature, relative humidity, apparent ous study [46] has shown that increasing the dimensionality of
temperature, and pressure. the data does not always improve the predictive performance of
With regards to time information, a total of 10 features have machine learning methods. Repetitive, redundant, and irrelevant
been extracted, including ’Hour of day’ (0–23), ’Day of week’ (0– data can reduce the performance of machine learning. Therefore,
6: Monday-Sunday), ’Day in month’ (1–30), ’Day of year’ (1–365), before feeding the data into the machine learning algorithms
’Week’ (28–50), ’Month’ (7–12), ’Weekday’ (0, 1), ‘Holidays’ (0, 1), (i.e., SGD, KNN, RF and SVM that are employed in this research
‘Exam Period’ (0, 1) (for ‘‘Holidays” and ‘‘Exam Period”, 0 means which function as regressors of prediction task), implementing fea-
8
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Xn
ture engineering, especially feature selection method to determine 1
min kwk2 þ C f
i¼1 i
ð11Þ
all relevant feature sets is essential and would have the greatest 2
impact on the prediction performance of MLs.
BFS is a wrapper feature selection algorithm that embeds a ran- where w represents the normal vector, C is the cost constant and f
dom forest (RF) for determining all features that are relevant to the represents the relaxation factor.
outcome labels and discarding irrelevant features which can occa- c) Random Forest (RF)
sionally make themselves important due to randomness [47]. It is RF is an ensemble method [50] that consists of several indepen-
more sophisticated and difficult than common feature selection dent decision trees (DT). When conducting prediction task, the
algorithms which depend on the performance of prediction as training period entails that each DT generated in a RF algorithm
the essential criterion for selecting the features. The main disad- is trained based on a different random subset of the training set
vantage of those algorithms is the loss of some relevant features and the categories with the majority of votes from individual pre-
[48]. Adding more randomness to the feature set is the core idea dictions of all trees are regarded as finial prediction results. The
of BFS, by randomly shuffling data of original feature set and then accuracy and stability of RFs are significantly enhanced when com-
merging the shuffled with then original data to form an extended pared to single DT, thereby making them more suitable for tackling
feature set. BFS is then implemented to assess the importance of a wider range of prediction challenges.
the features based on the extended feature set and only original d) K-nearest neighbours (KNN)
features whose importance is higher than that of shuffled features KNN regression is a non-parametric algorithm and the principle
are considered important. A detailed procedure for BFS is as follow: of KNN regression is to find a predetermined number of points in
the training sample that are closest in distance to the new point,
i. Add randomness to the feature set by creating shuffled and to predict the label from these points [51,52]. Assuming data
copies (shadow features) of all features and then merge set ðx1 ; y1 Þ; ; ðxn ; yn Þ is the training set with distance metrics d,
the shadow features with original features to form an where xi ¼ ðxi1 ; xi2 ; ; xim Þ is independent input variables. When
extended feature set. given a new instance x, KNN calculates the distance di between x
ii. Implement an RF model on the extended feature set. Mea- and each xi and then ranks the distances di (the corresponding i
sure the average reduced accuracy (Z value), whereby a th nearest neighbour NN i ðxÞ) by its values and its output is noted
higher Z value implies a more important feature. The largest as yi ðxÞ. The predicted output is the mean of the outputs of its
P
Z value of the shadow feature is denoted as Z max . KNN in regression as b y ¼ 1k ki¼1 yi ðxÞ.
iii. During each iteration, the features whose Z value is higher
than Z max will be kept, while those lower than Z max are con- 3.3.4. Evaluation metrics
sidered highly unimportant and will be eliminated from the In order to evaluate the model performance in terms of building
feature set. energy consumption prediction, the following evaluation metrics
iv. Repeat the above process until all features are confirmed (or are included: root mean square error (RMSE), coefficient of deter-
rejected) or reach the maximum number of iterations mination (R2), mean absolute error (MAE) and mean absolute per-
centage error (MAPE)
3.3.3. Machine learning algorithms vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u n
a) Stochastic gradient descent regression (SGD). u1 X
SGD regression is a linear regression that employs a stochastic RMSE ¼ t ðy pi Þ2 ð12Þ
n i¼1 i
gradient descent algorithm as a hyperparameter optimiser to
determine the best model parameters (e.g., the coefficients b of lin- P Pn Pn
ear regression) and the mathematical description of GSD is as n ni¼1 yi pi i¼1 yi Þ i¼1 pi
R2 ¼ rhffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P ffi ð13Þ
follows: n Pn 2 ih Pn Pn 2 i
n y 2Þ y n p 2Þ p
Assuming a linear regression f ðxÞ ¼ xT x þ b with coefficient i¼1 i i¼1 i i¼1 i i¼1 i
10
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 10. The graphic user interface of occupant behaviour simulation module (2D and 3D animation).
coefficient efficient of features in the red box is at least 2 times lar- profile in Fig. 10, when using larger size of training data, the uni-
ger than that of features without the red box). As shown in the Fig- versity opening week, examination, dissertation and summary
ure, building access record indicated the strongest correlation, are included into training data. The machine learning algorithms
followed by some meteorological features including solar radia- will learn the information about this information and adjust their
tion, relative humidity, and temperature. Hour of day, period, hol- hyperparameters accordingly, which will in turn impede their per-
iday and examination were of time information that showed a formance in predicting building energy consumption during nor-
strong correlation with energy consumption. mal teaching period.
The results of energy consumption prediction using 4 machine Considering the possible impact of day type (e.g., weekday and
learning algorithms and based on different datasets (e.g., Original weekend) on prediction performance of machine learning algo-
data, Original + SOD1, Original + SOD2, . . ., Original + SOD5 and rithms, a prediction approach that was based on just weekday data
Original + SOD1-5) and different lengths of data (e.g., 1 month, was conducted, using the same procedure. The results obtained are
3 months and 5 months) are listed in Fig. 16 and Table 5. Due to detailed in Table 6. The results indicate that there is no difference
the comparative similarity of evaluation metrics in terms of predic- in prediction performance when using weekday data. As a self-
tion results, only RMSE result is illustrated. As shown in Fig. 16, all learning hub or a library, the number of students and the time they
machine learning algorithms (except RF), under all scenarios spend in the building is relatively stable throughout the whole
including SOD can significantly improve the prediction perfor- week, which may explain why there is no obvious difference in
mance. For instance, when using SGD based on 1 month training performance when only using weekday data compared with using
data, the prediction accuracy improved from18.6 % to 35.8 %, com- all data.
pared with using only Original data. For all algorithms, the best Although combing Original data and SOD1-5 provided the best
performance was achieved when using Original + SOD1-5. In terms prediction performance, it also led to a significant increase in input
of training data size, it was noticed that by extending the data size, data dimension which could hinder machine learning algorithms
the prediction accuracy correspondingly decreased. The main rea- from delivering the best performance. On the one hand, a higher
son for decrease in performance can be attributed to the complex- dimension will undoubtedly require more computer resource and
ity of training data. As seen in the historical energy consumption learning time. On the other hand, it could lead to unmanageable
11
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 11. The graphic user interface of occupant behaviour simulation module (a) chart output and (b) the logic flowchart.
12
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 13. Comparison between actual and simulated building access records.
Fig. 14. The correlation coefficient between occupant behaviours and energy consumption.
number of dimensions (i.e., when the number of dimensions of agent-based modelling and machine learning methods, there are
increases, the volume of the space increases too quickly and thus still some limitations that need to be addressed through further
the available data becomes sparse). With regards to this study, research. For instance, occupant behaviours are extremely stochas-
the number of features increased from 22 (in the Original dataset) tic, and a variety of factors may contribute to such stochasticity
to 47 (Original + SOD1-5). It is also worth noting the accuracy level including but not limited to environmental factors (especially the
or representativeness of the simulated occupant behaviour, the surrounding environment), human characteristics, social-
higher the number of features, and the model will need to be exe- economic factors and peer pressure. In this study, those factors
cuted more times to obtain various possible results. In order to bet- were not taken into consideration due to the specificity of Alan Gil-
ter optimise the computational effectiveness of the feature bert learning commons (library), whereby occupant electricity-
selection process, without necessarily compromising on the qual- related behaviours are relatively simple and occupants have lim-
ity of the analysis, Boruta feature selection approach was imple- ited interactions with each other. Also, the procedures for estab-
mented to identify and eventually select all relevant features as lishing ABM to simulate occupant behaviours varies from
listed in Table 7. building to building, and it is believed that significant efforts
The results of using the selected features for building energy may be required when simulating a building with more types of
consumption are listed in Table 8, and a comparison among Origi- occupants and more complicated occupant behaviour patterns,
nal, Selected and Original + SOD1-5 is illustrated in Fig. 17 as well. which may affect the generalisation capability of the proposed
The results suggest that using Selected data slightly improved the hybrid of agent-based modelling and machine learning method.
performance of machine learning algorithms in predicting building It should also be emphasised that there could be potential overfit-
energy consumption, despite the exclusion of some features. ting problems due to the assumptions made in the proposed
Despite the significant improvements that have been achieved method and the complexity of the structure of the proposed
in predicting building energy consumption via the proposed hybrid method.
13
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Fig. 15. The correlation coefficient between meteorological, time information and energy consumption.
Fig. 16. The machine learning performance (RMSE) in predicting building energy consumption based on different datasets.
14
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Table 5
The machine learning performance in predicting building energy consumption based on different dataset.
Table 6
The machine learning performance in predicting building energy consumption based on different dataset (weekday data).
15
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
Table 7 5. Conclusion
Summary of Boruta feature selection.
Fig. 17. The machine learning performance (RMSE) in predicting building energy consumption based on Original, Selected and Original + SOD1-5.
Table 8
The machine learning performance in predicting building energy consumption based on selected dataset.
16
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
work. The number of students, in-use/standby computers, lighting [11] R.K. Strand, Incorporating two-dimensional conduction modeling techniques
into an energy simulation program: the EnergyPlus radiant system example,
and charging of devices was the output data generated from ABM.
Energy Build. 274 (2022), https://doi.org/10.1016/j.enbuild.2022.112405.
The model was implemented 5 times in order to generate repre- [12] R. Meldem, F. Winkelmann, Comparison of DOE-2 with temperature
sentative possibilities. The simulated occupational data together measurements in the Pala test houses, Energy Build. 27 (1) (1998) 69–81,
with original data (meteorological and time information) were https://doi.org/10.1016/S0378-7788(97)00027-3.
[13] S. Wang, X. Liu, S. Gates, An introduction of new features for conventional and
used as input features. 4 popular machine learning algorithms, hybrid GSHP simulations in eQUEST 3.7, Energy Build. 105 (2015) 368–376,
namely, stochastic gradient descent regress, support vector https://doi.org/10.1016/j.enbuild.2015.07.041.
machine, K nearest neighbours and random forest were included [14] C. Peng, L. Wang, X. Zhang, DeST-based dynamic simulation and energy
efficiency retrofit analysis of commercial buildings in the hot summer/cold
to test the generalisation capability of ABM. winter zone of China: a case in Nanjing, Energy Build. 78 (2014) 123–131,
The results suggest that combining original data with single https://doi.org/10.1016/j.enbuild.2014.04.023.
simulated data can significantly improve the performances of [15] Y. Chen, X. Liang, T. Hong, X. Luo, Simulation and visualization of energy-
related occupant behavior in office buildings, Build. Simul. 10 (6) (2017) 785–
machine learning methods in predicting building energy consump- 798, https://doi.org/10.1007/s12273-017-0355-2.
tion. The best performances were achieved when combining orig- [16] S. Norouziasl, A. Jafari, Y. Zhu, Modeling and simulation of energy-related
inal data with all simulated occupant data (Original + SOD1-5). human-building interaction: a systematic review, J. Build. Eng. 44 (2021),
https://doi.org/10.1016/j.jobe.2021.102928.
Meanwhile, the impact of day type (weekday or weekend) was also [17] Y. Zhu, S. Saeidi, T. Rizzuto, A. Roetzel, R. Kooima, Potential and challenges of
explored, and the results implied that for library-type buildings immersive virtual environments for occupant energy behavior modeling and
whereby the daily number of students is relatively stable through- validation: a literature review, J. Build. Eng. 19 (2018) 302–319, https://doi.
org/10.1016/j.jobe.2018.05.017.
out the week, day type merely has any impact on the performance
[18] C.M. Clevenger, J. Haymaker, The impact of the building occupant on energy
of building energy consumption prediction processes. The perfor- modeling simulations, Jt. Int. Conf. Comput. Decis. Mak. Civ. Build. Eng. (2006)
mance of prediction was further improved slightly by implement- 1–10.
ing Boruta Feature Selection which excluded the irrelevant/ [19] S.O. Abioye et al., Artificial intelligence in the construction industry: a review
of present status, opportunities and future challenges, J. Build. Eng. 44 (2021),
redundant input features. It should also be emphasised that the https://doi.org/10.1016/j.jobe.2021.103299.
extension of training data length in this study led to a correspond- [20] Q. Qiao, A. Yunusa-Kaltungo, R. Edwards, Predicting building energy
ing decrease in the performance of machine learning algorithms consumption based on meteorological data, IEEE PES/IAS PowerAfrica 2020
(2020) 1–5, https://doi.org/10.1109/PowerAfrica49420.2020.9219909.
during building energy consumption prediction, due to the com- [21] H.C. Putra, C.J. Andrews, J.A. Senick, An agent-based model of building
plexity associated with the historical patterns of energy consump- occupant behavior during load shedding, Build. Simul. 10 (6) (2017) 845–
tion, which further highlights the need to intensify research on the 859, https://doi.org/10.1007/s12273-017-0384-x.
[22] Q. Qiao, A. Yunusa-Kaltungo, R. Edwards, Hybrid method for building energy
optimisation of data selection approaches in the near future. consumption prediction based on limited data, IEEE PES/IAS PowerAfrica 2020
(2020) 1–5, https://doi.org/10.1109/PowerAfrica49420.2020.9219915.
[23] R.F. Rupp, R.K. Andersen, J. Toftum, E. Ghisi, Occupant behaviour in mixed-
Data availability mode office buildings in a subtropical climate: beyond typical models of
adaptive actions, Build. Environ. 190 (December) (2020) 2021, https://doi.org/
Data will be made available on request. 10.1016/j.buildenv.2020.107541.
[24] S. Zheng, K. LeWinn, T. Ceja, M. Hanna-Attisha, L. O’Connell, S. Bishop, Adaptive
behavior as an alternative outcome to intelligence quotient in studies of
Declaration of Competing Interest children at risk: a study of preschool-aged children in flint, MI, USA, Front.
Psychol. 12 (August) (2021) 1–13, https://doi.org/10.3389/fpsyg.2021.692330.
[25] M.S. Mustapa, S.A. Zaki, H.B. Rijal, A. Hagishima, M.S.M. Ali, Thermal comfort
The authors declare that they have no known competing finan- and occupant adaptive behaviour in Japanese university buildings with free
cial interests or personal relationships that could have appeared running and cooling mode offices during summer, Build. Environ. 105 (2016)
to influence the work reported in this paper. 332–342, https://doi.org/10.1016/j.buildenv.2016.06.014.
[26] A. Mahdavi, F. Tahmasebi, B. Gunay, W.O’Brien, S. D’Oca, Technical Report :
Occupant Behavior Modeling Approaches and Evaluation, no. November. 2017.
References [27] Y. Liu et al., Thermal preference prediction based on occupants’ adaptive
behavior in indoor environments- A study of an air-conditioned multi-
occupancy office in China, Build. Environ. 206 (2021), https://doi.org/
[1] E. Azar, C.C. Menassa, ‘‘A decision framework for energy use reduction
10.1016/j.buildenv.2021.108355.
initiatives in commercial buildings,” Proc. - Winter Simul. Conf., pp. 816–827,
[28] P. Zheng et al., Thermal adaptive behavior and thermal comfort for occupants
2011, 10.1109/WSC.2011.6147808.
in multi-person offices with air-conditioning systems, Build. Environ. 207
[2] D.K. Abideen, A. Yunusa-Kaltungo, P. Manu, C. Cheung, A systematic review of
(2022), https://doi.org/10.1016/j.buildenv.2021.108432.
the extent to which BIM is integrated into operation and maintenance,
[29] S. Kumar, M.K. Singh, Seasonal comfort temperature and occupant’s adaptive
Sustainability 14 (14) (2022), https://doi.org/10.3390/su14148692.
behaviour in a naturally ventilated university workshop building under the
[3] T. Zhang, P.-O. Siebers, U. Aickelin, Modelling office energy consumption: an
composite climate of India, J. Build. Eng. 40 (2021), https://doi.org/10.1016/j.
agent based approach, SSRN Electron. J. (2018) 1–15, https://doi.org/10.2139/
jobe.2021.102701.
ssrn.2829316.
[30] A. Zhang, Q. Huang, Y. Du, Q. Zhen, Q. Zhang, Agent-based modelling of
[4] J. Mahecha Zambrano, U. Filippi Oberegger, G. Salvalai, Towards integrating
occupants’ clothing and activity behaviour and their impact on thermal
occupant behaviour modelling in simulation-aided building design: reasons,
comfort in buildings, IOP Conf. Ser.: Earth Environ. Sci. 329 (1) (2019) 012022.
challenges and solutions, Energy Build. 253 (2021), https://doi.org/10.1016/j.
[31] L. Jiang, R. Yao, K. Liu, R. McCrindle, An Epistemic-Deontic-Axiologic (EDA)
enbuild.2021.111498.
agent-based energy management system in office buildings, Appl. Energy 205
[5] S. Carlucci, F. Causone, S. Biandrate, M. Ferrando, A. Moazami, S. Erba, On the
(August) (2017) 440–452, https://doi.org/10.1016/j.apenergy.2017.07.081.
impact of stochastic modeling of occupant behavior on the energy use of office
[32] W. Parys, D. Saelens, H. Hens, Coupling of dynamic building simulation with
buildings, Energy Build. 246 (2021), https://doi.org/10.1016/j.
stochastic modelling of occupant behaviour in offices - a review-based
enbuild.2021.111049.
integrated methodology, J. Build. Perform. Simul. 4 (4) (2011) 339–358,
[6] E. Azar et al., Simulation-aided occupant-centric building design: a critical
https://doi.org/10.1080/19401493.2010.524711.
review of tools, methods, and applications, Energy Build. 224 (2020), https://
[33] T. Hong, Y. Chen, Z. Belafi, S. D’Oca, Occupant behavior models: a critical
doi.org/10.1016/j.enbuild.2020.110292.
review of implementation and representation approaches in building
[7] R.H. Socolow, The twin rivers program on energy conservation in housing:
performance simulation programs, Build. Simul. 11 (1) (2018) 1–14, https://
highlights and conclusions, Energy Build. 1 (3) (1978) 207–242, https://doi.
doi.org/10.1007/s12273-017-0396-6.
org/10.1016/0378-7788(78)90003-8.
[34] F. Haldi, D. Robinson, Adaptive actions on shading devices in response to local
[8] Y. Sun, F. Haghighat, B.C.M. Fung, A review of the -state-of-the-art in data -
visual stimuli, J. Build. Perform. Simul. 3 (2) (2010) 135–153, https://doi.org/
driven approaches for building energy prediction, Energy Build. 221 (2020)
10.1080/19401490903580759.
110022, https://doi.org/10.1016/j.enbuild.2020.110022.
[35] G. Flett, N. Kelly, An occupant-differentiated, higher-order Markov Chain
[9] Q. Qiao, A. Yunusa-Kaltungo, R.E. Edwards, Towards developing a systematic
method for prediction of domestic occupancy, Energy Build. 125 (2016) 219–
knowledge trend for building energy consumption prediction, J. Build. Eng. 35
230, https://doi.org/10.1016/j.enbuild.2016.05.015.
(October 2020) (2021) 101967, https://doi.org/10.1016/j.jobe.2020.101967.
[36] S. Tian, Y. Lu, X. Ge, Y. Zheng, An agent-based modeling approach combined
[10] Y. Jin, D. Yan, A. Chong, B. Dong, J. An, Building occupancy forecasting: a
with deep learning method in simulating household energy consumption, J.
systematical and critical review, Energy Build. 251 (2021), https://doi.org/
Build. Eng. 43 (2021), https://doi.org/10.1016/j.jobe.2021.103210.
10.1016/j.enbuild.2021.111345.
17
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797
[37] D.H. Dorrah, M. Marzouk, Integrated multi-objective optimization and agent- [44] J.Z. Qing-Hua Zhu, Y.Q. Yan Hou, The energy- saving scheduling of campus
based building occupancy modeling for space layout planning, J. Build. Eng. 34 classrooms: a simulation model, IEEE Syst. Man, Cybern. Mag. (2021), https://
(2021), https://doi.org/10.1016/j.jobe.2020.101902. doi.org/10.1109/MSMC.2020.3046236.
[38] Z. Wang, Y. Zhao, C. Zhang, P. Ma, X. Liu, A general multi agent-based [45] D. Helbing, P. Molnár, Social force model for pedestrian dynamics, Phys. Rev. E
distributed framework for optimal control of building HVAC systems, J. Build. 51 (5) (1995) 4282–4286, https://doi.org/10.1103/PhysRevE.51.4282.
Eng. 52 (2022), https://doi.org/10.1016/j.jobe.2022.104498. [46] Q. Qiao, A. Yunusa-Kaltungo, R.E. Edwards, Feature selection strategy for
[39] E. Bonabeau, Agent-based modeling: Methods and techniques for simulating machine learning methods in building energy consumption prediction, Energy
human systems, Proc. Natl. Acad. Sci. USA 99 (SUPPL. 3) (2002) 7280–7287, Rep. 8 (2022) 13621–13654, https://doi.org/10.1016/j.egyr.2022.10.125.
https://doi.org/10.1073/pnas.082080899. [47] M.B. Kursa, W.R. Rudnicki, Feature selection with the boruta package, J. Stat.
[40] I. Mahmood, H.A. Quair-tul-ain, F.J. Nasir, J.A. Aguado, A hierarchical multi- Softw. 36 (11) (2010) 1–13, https://doi.org/10.18637/jss.v036.i11.
resolution agent-based modeling and simulation framework for household [48] R. Tang, X. Zhang. ‘‘CART Decision Tree Combined with Boruta Feature
electricity demand profile, Simulation 96 (8) (2020) 655–678, https://doi.org/ Selection for Medical Data Classification,” 2020 5th IEEE Int. Conf. Big Data Anal.
10.1177/0037549720923401. ICBDA 2020, pp. 80–84, 2020, 10.1109/ICBDA49040.2020.9101199.
[41] M. Barakat, H. Khoury, An agent-based framework to study occupant multi- [49] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. (1995), https://doi.
comfort level in office buildings, Winter Simulation Conference (WSC) 2016 org/10.1023/A:1022627411411.
(2016) 1328–1339, https://doi.org/10.1109/WSC.2016.7822187. [50] M.W. Ahmad, M. Mourshed, Y. Rezgui, Trees vs Neurons: comparison between
[42] J. Malik et al., Ten questions concerning agent-based modeling of occupant random forest and ANN for high-resolution prediction of building energy
behavior for energy and environmental performance of buildings, Build. consumption, Energy Build. 147 (2017) 77–89, https://doi.org/10.1016/j.
Environ. 217 (2022), https://doi.org/10.1016/j.buildenv.2022.109016. enbuild.2017.04.038.
[43] C. Berger, A. Mahdavi, Review of current trends in agent-based modeling of [51] A.A. Salah, Machine learning for biometrics, Mach. Learn. (2011) 704–723,
building occupants for energy and indoor-environmental performance https://doi.org/10.4018/978-1-60960-818-7.ch402.
analysis, Build. Environ. 173 (2020), https://doi.org/10.1016/j. [52] Y. Song, J. Liang, J. Lu, X. Zhao, An efficient instance selection algorithm for k
buildenv.2020.106726. nearest neighbor regression, Neurocomputing 251 (2017) 26–34, https://doi.
org/10.1016/j.neucom.2017.04.018.
18