
Energy & Buildings 283 (2023) 112797

Contents lists available at ScienceDirect

Energy & Buildings


journal homepage: www.elsevier.com/locate/enb

A hybrid agent-based machine learning method for human-centred energy consumption prediction
Qingyao Qiao, Akilu Yunusa-Kaltungo ⇑
Department of Mechanical, Aerospace and Civil Engineering (MACE), University of Manchester M13 9PL, UK

Article info

Article history:
Received 20 September 2022
Revised 6 January 2023
Accepted 9 January 2023
Available online 12 January 2023

Keywords:
Building energy consumption
Prediction
Machine learning
Agent-based modelling
Occupant behaviour

Abstract

Occupant behaviour has significant impacts on the performance of machine learning algorithms when predicting building energy consumption. Due to a variety of reasons (e.g., underperforming building energy management systems or restrictions due to privacy policies), the availability of occupational data has long been an obstacle that hinders the performance of machine learning algorithms in predicting building energy consumption. Therefore, this study proposed an agent-based machine learning model whereby agent-based modelling was employed to generate simulated occupational data as input features for machine learning algorithms for building energy consumption prediction. Boruta feature selection was also introduced in this study to select all relevant features. The results indicated that the performances of machine learning algorithms in predicting building energy consumption were significantly improved when using simulated occupational data, with even greater improvements after conducting Boruta feature selection.

© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

As climate change and energy shortage problems are worsening, all countries are embarking on mitigating measures to curb the menace. The building sector accounts for around 30 %–40 % of global energy usage, and more than 80 % of the energy consumed throughout a typical building’s life cycle can be attributed to its service period [1,2]. In the UK, the service sector (e.g., shops, offices and factories) accounted for about 15 % of overall energy consumption in 2020, with most of this energy expended on heating, lighting, computing, catering and hot water generation [3]. The energy-intensiveness of the service sector indicates a significant potential for energy saving and reduction. In order to achieve this goal, a comprehensive understanding of the energy consumed by the building is essential, as it provides the scientific reference for decision-makers or stakeholders to conduct energy conservation activities.

Building electricity consumption is closely related to occupancy and occupant behaviours, especially common activities such as moving inside the building, interacting with electrical appliances and opening/closing of windows/doors [4,5]. A review conducted by Elie Azar et al. [6] revealed the complex relationship between occupants, indoor environmental quality and energy as shown in Fig. 1. The complexity in occupancy and occupant behaviour contributes the most to the uncertainty of building energy usage. An experiment conducted on 248 dwellings by Socolow [7] revealed that occupants’ individual behaviour led to 71 % of the energy demand variation. Therefore, accurately simulating occupant behaviours is an essential prerequisite to achieving reliable building energy consumption prediction.

The approaches for building energy consumption prediction can in general be classified into physical-based, artificial intelligence-based (AI) and hybrid methods [8–10]. Physical methods are mainly based on the application of physical laws to simulate building energy consumption. EnergyPlus [11], DOE-2 [12], eQuest [13] and DeST [14] are some of the most popular commercial platforms for building energy simulation. Despite occupant behaviour being incorporated into the platforms, incorrectly setting or oversimplifying occupant parameters usually impedes the performance of physical-based methods. For instance, the methods rely on deterministic and fixed schedules to model occupancy [15], and energy consumption related to occupancy and occupant behaviour is empirically extracted as a fixed value or energy intensity, based on the room’s function and size. These homogeneous occupant settings fail to represent the diversity of occupants and inevitably introduce a performance gap between predicted and actual energy consumption [16,17]. Research conducted by Clevenger and Haymaker [18] proved that the difference in occupant settings
⇑ Corresponding author.
E-mail address: akilu.kaltungo@manchester.ac.uk (A. Yunusa-Kaltungo).

https://doi.org/10.1016/j.enbuild.2023.112797
0378-7788/© 2023 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Q. Qiao and A. Yunusa-Kaltungo Energy & Buildings 283 (2023) 112797

Fig. 1. Relationship between occupants, indoor environmental quality and energy [6].

using physical methods can lead to an inaccurate prediction by up to 40 %.

The underlying principle of AI methods is based on the mathematical creation of a matrix that is able to map input features to the corresponding output energy consumption [19]. Compared with physical-based methods, AI methods do not require detailed information about every aspect related to building energy usage to configure the model [20]. In terms of predicting occupant behaviours, the most frequently used method is regression analysis, which attempts to map the probability of certain behaviours such as window-opening to energy consumption [21]. However, as these are data-driven methods, data availability and quality significantly determine the performance of AI methods. Due to privacy concerns, occupational data are often the most difficult to obtain, which in turn impedes the application of AI methods in predicting occupancy and occupant behaviour [22]. Besides the oversimplification or data availability dilemma, when considering combining occupant behaviour, occupant presence was the most frequently used feature for both physical-based and AI methods, but far less emphasis has been placed on adaptive behaviours/occupant behaviours (e.g., adjusting shades and windows, switching on/off lights, setting the thermostat, etc.), which would also have a significant impact on the predicted energy consumption [23]. Adaptive behaviour is defined as the conceptual, social and practical skills that an individual requires to cope with everyday life [24]. When it relates to the built environment, the adaptive principle of thermal comfort states that an individual interacts with the environment (particularly when uncomfortable) to restore his/her comfort [25], such as adjusting shades and windows, switching on/off lights, setting the thermostat, etc. [26–29]. Considering the heterogeneity of individual preference towards thermal comfort, agent-based modelling (ABM) has been extensively employed to simulate the complex adaptive behaviour of occupants. For instance, Zhang et al. [30] proposed an ABM approach to simulate the clothing and activity behaviour of occupants in buildings, based on the comfort level of the occupants. Similarly, Jiang et al. [31] implemented ABM to simulate the interaction between occupants and the heating, ventilation and air-conditioning (HVAC) system, with the aim of developing an energy-efficient building energy management system (BEMS). It is noticed that attention to adaptive behaviour has skewed towards indoor thermal comfort, which is reasonable as occupants’ desire for a comfortable indoor environment has a great impact on energy consumption. However, when it comes to public buildings where a large number of people are gathered closely, the movement or the route of individuals (which will be influenced by the crowd) is less discussed, despite the impact of occupant movement on building energy consumption. For instance, the operational schedules of some electric appliances (including motion-sensitive lighting, automated gates and lifts, etc.) depend on the movement of occupants.

Based on the above premise, this study attempts to develop a comprehensive hybrid framework that combines AI methods with occupant behaviour modelling (i.e., ABM) to predict building energy consumption. Social force modelling (SFM) is also integrated into the proposed hybrid framework to simulate occupant movement. The remaining sections of the paper are organised as follows: Section 2 briefly reviews the existing knowledge regarding building energy consumption and occupant behaviour modelling, while Section 3 provides an overview of the research methodology as well as a description of the selected case studies. The detailed results obtained from the study as well as their implications are presented in Section 4. Finally, Section 5 summarises the conclusions of the entire study and outlines potential future work.

2. Literature review

A widely-recognised and most commonly observed characteristic of occupant behaviour is its stochasticity and, in order to more accurately describe the uncertainty of occupant behaviour, stochastic process and agent-based modelling have been implemented in predicting occupant behaviour [4,32,33]. The core idea of stochastic process modelling is to estimate the state of occupancy and building energy-related appliances (e.g., opening/closure of windows and doors, on/off of lights and presence/absence of occupants). Some of the most popular methods include logistic regression, Markov chain, Poisson process and survival analysis. A Markov chain is a discrete random process in which the state of the next event or process is determined by the current one [32]. Haldi and Robinson [34] simulated the occupant behaviour on shading devices using a Markov process. A 6-year measurement and a field survey were conducted to identify the

impact of the occupancy, thermal and visual parameters on shading behaviour. The results suggested a significant improvement compared to the deterministic methods. Similarly, Graeme Flett and Nick Kelly [35] employed a Markov chain-based method to generate realistic occupancy profiles for residential buildings. Transition probability matrices were determined with regards to the probability of active occupancy based on time use survey data, which generated a promising set of results. However, despite Markov chain-based methods taking the stochasticity of occupant behaviour into consideration, they are often criticised for the dependence between events. Also, it remains challenging for Markov chain methods to simulate multiple energy-related behaviours simultaneously. Most importantly, the Markov chain ignores the interaction between occupants as well as between occupants and their environments.

In order to address the aforementioned issues, ABM has been introduced to simulate occupational energy behaviours in buildings [36–38]. ABM is a computational model for the simulation of the actions and interactions of autonomous agents so as to understand the behaviour of a system. The individual agent is able to assess its situation and make decisions based on certain rules [39]. As a ‘bottom-up’ method, the global behaviour comprises a summary of many individual agents interacting with each other and with their environment [1]. The agent can either be an occupant or an electronic appliance. The flexibility and capability of ABM facilitate the simulation of complex behavioural aspects of occupants. Zhang et al. [3] simulated the energy consumption of an office building with occupant behaviour taken into consideration using ABM. The state of agents (especially occupant, lighting, computers, etc.) and the interactions between agents were established respectively to simulate the energy consumption profile of lighting and computers. A similar energy consumption pattern was observed between the simulated and actual energy consumption. Apart from simulating occupational energy consumption, a variety of energy control strategies were set up for validating the efficiency of each strategy accordingly, so as to benefit decision-making for energy management departments. Similar research was conducted by Irman et al. [40], who established a hierarchical multi-resolution ABM to simulate the electricity demand profile of 264 residential buildings. Besides the state of agents and their interactions initially built by Zhang et al. [3], Irman et al. [40] also introduced heterogeneity into occupancy patterns and house profiles in terms of number of beds and floor area. In addition, weather information was also employed in their model, considering the significant impact of weather on energy-related occupational behaviour of residential buildings. The model exhibited a mean absolute percentage error of 17–29 % across 264 residential buildings, which implies the robustness and generalisation capability of ABM in simulating the complexity of building energy consumption. Besides simulating the direct interaction between occupants and electricity-related appliances, efforts have been focused on investigating the indirect influence of occupants as well. For instance, Azar and Menassa [1] implemented ABM to determine different occupants’ energy use patterns to represent the different and changing occupants’ energy characteristics over time in an office building. Rebound effect and energy conservation interventions (word-of-mouth effect) were introduced into the model. Occupants were classified into medium, high and low energy consumers, whereby medium energy consumers’ behaviour was influenced by the remaining two. Different energy usage scenarios were also simulated to evaluate the potential in energy saving. Psychological factors were also studied by Barakat and Khoury [41], whereby they developed an ABM framework to study occupant multi-comfort level in office buildings. Visual, thermal and acoustic comfort levels will influence the status of doors, windows, shades and lights as well as heating, ventilation and air conditioning.

Despite ABM showing the flexibility and capability to simulate complex and stochastic occupant behaviours, it is worth noting that, like traditional physical-based methods, in order to predict the overall energy consumption of buildings, extensive details of energy components need to be manually considered when implementing ABM for simulating building energy consumption. Otherwise, a discrepancy in energy consumption will occur [42,43].

Unlike ABM, which has been frequently applied in simulating occupant energy-related behaviour, SFM is more often employed in the research of pedestrian traffic rather than occupant behaviour. In terms of applying SFM for building energy simulation, Zhu et al. [44] investigated the energy-saving scheduling of campus classrooms by simulating students’ traffic in classrooms based on students’ preferences for seats and times. A modified SFM was also implemented for traffic simulation. The main purpose of SFM in the study by Zhu et al. [44] is still to accurately describe occupant movement, but it should be emphasised that an accurate prediction of occupant traffic can facilitate the understanding of the operational regimes of electric appliances (e.g., motion-sensitive lighting) that depend on occupant traffic.

Based on this premise, this study proposed a hybrid framework that combines ABM and SFM with a machine learning method to predict the electricity consumption of a high-volume self-learning higher educational institution (HEI) building.

3. Methodology

An agent-based machine learning method was proposed in this study to improve the accuracy of building energy consumption prediction by taking occupancy and occupant behaviour into consideration. The schematic outline of the research is depicted in Fig. 2, which consists of two components: an occupant behaviour simulation module and an energy consumption prediction module. ABM was first implemented to simulate the occupancy and occupant behaviour inside an HEI self-learning hub, and the simulation outcomes include the number of students, in-use/standby computers, lighting, and charging personal electronic devices, which served as input data for building energy consumption prediction with machine learning algorithms. Weather data and time information were also employed as input data. A detailed description of the proposed method is given in the rest of this section.

3.1. Description of the case study building

Alan Gilbert learning commons, a 24 h a day and seven days a week (24/7) HEI self-learning hub or library built in 2014, was selected for the case study as shown in Fig. 3. With the principal objective of minimising CO2 emissions, Alan Gilbert learning commons was equipped with considerable energy-efficient facilities, including photovoltaic roof tiles, solar thermal systems and extensive use of glass curtain walls.

The electricity consumption in an office building according to Zhang et al. [3] is comprised of two components, namely base consumption and flexible consumption. Base consumption represents the electric equipment and appliances that have to be switched on all the time (e.g., security cameras, information display, computer servers, refrigerators, etc.). Flexible consumption denotes the energy consumed by the kind of electric equipment and appliances that can be switched on/off at any time by users. The electricity consumption can be mathematically expressed as shown in Equation (1).

$$EC_{total} = EC_{base} + EC_{flexible} \tag{1}$$


Fig. 2. The schematic outline of the agent-based machine learning method.
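As a rough illustration of the base-plus-flexible decomposition used in this subsection, the sketch below adds an always-on base load to the loads of those flexible appliances that an occupant has switched on. The function name, argument names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Sequence


def total_consumption(ec_base: float,
                      ec_flexible: Sequence[float],
                      switched_on: Sequence[int]) -> float:
    """Return base consumption plus the consumption of switched-on appliances.

    ec_flexible[i] is the consumption of flexible appliance i and
    switched_on[i] is its on/off flag (1 = on, 0 = off).
    """
    if len(ec_flexible) != len(switched_on):
        raise ValueError("one on/off flag is needed per flexible appliance")
    if any(flag not in (0, 1) for flag in switched_on):
        raise ValueError("flags must be 0 (off) or 1 (on)")
    flexible = sum(flag * ec for flag, ec in zip(switched_on, ec_flexible))
    return ec_base + flexible


# Hypothetical hourly figures in kWh (illustrative assumptions only).
example_total = total_consumption(
    ec_base=10.0,
    ec_flexible=[0.5, 0.25, 1.0],
    switched_on=[1, 0, 1],
)
```

With these made-up inputs the second appliance is off, so only the first and third contribute to the flexible part.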

In Equation (1), $EC_{total}$ is the total electricity consumption, $EC_{base}$ is the base electricity consumption and $EC_{flexible}$ is the flexible electricity consumption.

Fig. 3. Exterior view of Alan Gilbert learning commons.

The flexible electricity consumption can be further decomposed into Equation (2) when considering the interaction between individual users and flexible electricity consumption:

$$EC_{flexible} = \beta_1 EC_{f1} + \beta_2 EC_{f2} + \cdots + \beta_n EC_{fn} \tag{2}$$

where $EC_{f1}$, $EC_{f2}$, ..., $EC_{fn}$ are the electricity consumption of each flexible appliance, and $n$ is the total number of flexible appliances. $\beta_1$, $\beta_2$, ..., $\beta_n$ are binary parameters (0 or 1) reflecting occupant behaviour (1 represents switching on a flexible appliance and 0 represents switching off a flexible appliance).

The Alan Gilbert learning commons building has a sophisticated design with a well-developed building energy management system, which implies that little manual interaction with the building is required. The limited interaction between occupants and the electrical appliances of Alan Gilbert learning commons includes switching computers on/off, switching lights on/off in meeting rooms and plugging in personal electronic devices.

3.2. Occupant behaviour simulation module

ABM was employed to simulate occupancy and occupant behaviour in Alan Gilbert learning commons in terms of electricity consumption. The modelling mainly consists of 3 components namely,

environment, agent, and agent behaviours (interactions). The environment defines the physical boundary for agents to move within, move around and interact with. Considering the electricity consumption characteristic of Alan Gilbert learning commons, occupant, computer and light are determined as the 3 types of agents.

3.2.1. Environment

The ground and first floors of Alan Gilbert learning commons were selected for the case study and the building plan is shown in Fig. 4. The selection of the ground and first floors is based on the premise that they provide the most holistic representation of the entire building upon which any such modelling activities can be based in the future. For instance, the ground floor is the only different floor in the building because it comprises the main entrance hall, security checkpoints and coffee bar. All the other floors are very similar in design, thereby implying that the model of one can be easily adopted for others when needed. The green boxes in Fig. 4 represent public areas which are freely accessible to any user and the lights within these areas are sensor-controlled lights. The red boxes indicate meeting rooms which require prior booking and the lights are manually controlled. Both public areas and meeting rooms are equipped with desktop PCs. The total numbers of lights and desktops are 411 and 330, respectively.

3.2.2. Behaviour of occupant agents

The occupant behaviour in this study mainly consists of two parts, where the first part focuses on occupant movement and the second part defines the energy consumption behaviour of occupants. In terms of occupant movement, the social force model (SFM) was embedded in ABM to guide the movement of agents around obstacles such as walls and other people as well as towards target destinations within the shortest possible distances. The concept of SFM was first proposed by Helbing and Molnar [45] to represent the motion of agents. SFM indicates that the movement of agents can be represented as if they were experiencing certain “social forces” which are not necessarily caused by their personal environments, but rather, a representation of the internal drives of the agents to execute specific actions related to their movements around predefined areas. The physical force vectors that drive such movements are referred to as social forces. The concepts and representations of social forces are well-established but, in order to enhance the understanding of this report without necessarily consulting additional literature, a summary of the three force vectors is also shown here as the driving force $\vec{f}_i^{\,0}$, inter-agent force $\vec{f}_{ij}$ and boundary force $\vec{f}_{iw}$. According to Newton’s second law of motion, the corresponding expression for each agent $i$ is shown in Equation (3) and the diagram is shown in Fig. 5:

$$m_i \frac{d\vec{v}_i(t)}{dt} = \vec{f}_i^{\,0} + \sum_{j(\neq i)} \vec{f}_{ij} + \sum_{w} \vec{f}_{iw} \tag{3}$$

where $m_i$ is the mass of agent $i$, and $\vec{v}_i(t)$ is the walking velocity at time step $t$.

a) Driving force

The driving force $\vec{f}_i^{\,0}$ indicates the intention of the agent to reach a target, based on the desired speed $v_i^0$ and desired direction $\vec{e}_i^{\,0}$. The driving force is represented in Equation (4):

$$\vec{f}_i^{\,0} = m_i \frac{v_i^0(t)\,\vec{e}_i^{\,0} - \vec{v}_i(t)}{\tau_i}, \tag{4}$$

where $\vec{v}_i(t)$ is the agent velocity at time step $t$, and $\tau_i$ is a characteristic time scale that reflects the reaction time.

b) Inter-agent force

Inter-agent force is comprised of socio-psychological force $\vec{f}_{ij}^{\,s}$ and physical force $\vec{f}_{ij}^{\,p}$. The socio-psychological force describes the psychological tendency of two agents to keep a certain safe distance between each other, while the physical force indicates the physical contact between agents within crowded environments. The corresponding expressions are given in Equations (5) and (6).

Fig. 4. The Floor plan of Alan Gilbert learning commons (a) 2D Ground floor; (b) 2D First Floor; and (c) 3D view of the building.
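To make the role of the driving force concrete, the following sketch evaluates the driving-force expression of Equation (4) for a single agent and advances Newton's second law, Equation (3), with only that force acting. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the agent mass, the desired speed, the integration time step and the choice of a semi-implicit Euler scheme are all hypothetical, and only the reaction time of 0.5 s is taken from Table 1.

```python
import numpy as np


def driving_force(mass, desired_speed, desired_dir, velocity, tau):
    """Driving force of Eq. (4): m * (v0 * e0 - v) / tau."""
    e0 = np.asarray(desired_dir, dtype=float)
    e0 = e0 / np.linalg.norm(e0)  # normalise to a unit direction vector
    v = np.asarray(velocity, dtype=float)
    return mass * (desired_speed * e0 - v) / tau


def step(position, velocity, force, mass, dt):
    """Semi-implicit Euler update of Eq. (3): velocity first, then position."""
    new_velocity = velocity + (force / mass) * dt
    new_position = position + new_velocity * dt
    return new_position, new_velocity


# Illustrative values; only TAU comes from Table 1, the rest are assumptions.
MASS = 70.0          # kg (hypothetical)
DESIRED_SPEED = 1.3  # m/s (hypothetical)
TAU = 0.5            # s, agent reaction time
DT = 0.1             # s (hypothetical integration step)

pos = np.zeros(2)
vel = np.zeros(2)
first_force = driving_force(MASS, DESIRED_SPEED, [1.0, 0.0], vel, TAU)
for _ in range(100):
    force = driving_force(MASS, DESIRED_SPEED, [1.0, 0.0], vel, TAU)
    pos, vel = step(pos, vel, force, MASS, DT)
```

Starting from rest, the agent accelerates along the desired direction and its speed relaxes towards the desired speed on the time scale set by the reaction time. In a full model the inter-agent and boundary forces would be added to the driving force before each update.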


$$\vec{f}_{ij}^{\,s} = A_i \exp\left(\frac{r_{ij} - d_{ij}}{B_i}\right) \vec{n}_{ij}, \tag{5}$$

$$\vec{f}_{ij}^{\,p} = k\,g(r_{ij} - d_{ij})\,\vec{n}_{ij} + \kappa\,g(r_{ij} - d_{ij})\,\Delta v_{ji}^{t}\,\vec{t}_{ij}, \tag{6}$$

where $A_i$, $B_i$, $k$, $\kappa$ are constant parameters. $\vec{n}_{ij}$ is the unit vector pointing from agent $j$ to agent $i$. $\vec{t}_{ij}$ is the unit tangential vector orthogonal to $\vec{n}_{ij}$, and $\Delta v_{ji}^{t} = (\vec{v}_j - \vec{v}_i) \cdot \vec{t}_{ij}$ is the tangential velocity difference.

Fig. 5. Diagram of the social force model.

c) Boundary force

The boundary force is similar to the physical force of inter-agent and the mathematical expression is shown in Equation (7):

$$\vec{f}_{iw} = A_i \exp\left(\frac{r_i - d_{iw}}{B_i}\right) \vec{n}_{iw} + k\,g(r_i - d_{iw})\,\vec{n}_{iw} + \kappa\,g(r_i - d_{iw})\,\Delta v_{wi}^{t}\,\vec{t}_{iw}, \tag{7}$$

where $d_{iw}$ is the distance between the centre of agent $i$ and the surface of walls.

The specific parameters of the SFM considered in this study are detailed in Table 1.

Table 1
Parameters of the social force model.

| Parameter | Symbol | Value |
| --- | --- | --- |
| Agent radius | r | 0.25 m |
| Strength of social repulsive force | A | 2000 N |
| Characteristic distance of social repulsive force | B | 0.08 m |
| Coefficient of sliding friction | k | 240000 kg m⁻¹ s⁻¹ |
| Body compression coefficient | κ | 120000 kg s⁻² |
| Agent reaction time | τ | 0.5 s |

In addition to agent movement, 3 years of hourly entry/exit records from Alan Gilbert learning commons were used to generate the hourly occupancy profile and the duration as shown in Figs. 6 and 7. The hourly entry record was observed to generally correspond to a normal distribution and the maximum entrance is 70.04 at 13:00. The time that the occupants spend in the building generally follows an exponential distribution. The majority of occupants spend less than 10 h in the building.

With regards to the energy consumption behaviour of occupants, a combination of observation and questionnaire was conducted to understand the interaction between occupants and electric appliances. By empirical observation, the general routes for an occupant in the building can be summarised as depicted in Fig. 8. The first step of an occupant when he/she arrives at the building is to decide where to work (e.g., public areas or meeting rooms). During the time in the building, the occupant will turn on the computer and charge personal electronic device(s) based on certain probabilities. When leaving the building, the occupant will make a final decision on whether to turn off the computer or not. A meeting room user needs to make an extra decision on whether or not to switch off the lights when leaving the room. The lights outside the meeting rooms are sensor-controlled, therefore the occupants do not need to manually operate them.

A completely anonymous questionnaire was designed in order to quantify the probabilities of the above occupant behaviours and the questions of the questionnaire are listed in Table 2. The questionnaire was deployed online to 864 postgraduate research (PGR) students within the core engineering departments (i.e., Departments of Chemical Engineering (ChemEng), Electrical and Electronic Engineering (EEE), as well as Mechanical, Aerospace and Civil Engineering (MACE)) at the University of Manchester from 1st June to 31st August 2022.

3.2.3. Behaviour of light agents

Two types of lights are included in the occupant behaviour simulation module, namely sensor-controlled lights and manually controlled lights. Sensor-controlled lights are installed in all public areas, while manually controlled lights are installed in meeting rooms. Both types of lights have two states, “on” and “off”, and the state of the lights is a passive reaction to the behaviour of occupants. Sensor-controlled lights switch on once they detect the presence of occupants within a range of 4 m and switch off when there has been no occupant within that range for 15 min. The manually controlled lights in the meeting rooms are directly associated with the control of occupants. When an occupant enters a meeting room and finds the lights in the room in the “off” position, the occupant will always switch on the lights. An occupant will leave the room with the lights on if there are other people in the room and, when the last occupant leaves the room, he/she will switch off the lights based on the probability listed in Table 3.

3.2.4. Behaviour of computer and personal electronic device agents

Computers are directly controlled by occupants as well. The states of a computer are “on”, “off” and “standby”. First, an occupant is assigned a computer (which could be in any state) after determining which area to work in. The occupant then decides whether to use the computer based on the probability. When leaving, the occupant will either log off the account to let the computer transfer into standby mode or directly turn off the computer. During the time in the building, an occupant will also have a certain probability of charging his/her personal electronic device(s). For the purpose of simplicity and computational effectiveness, this study assumed that an occupant would only charge one device and keep charging it until he/she leaves the building. Another assumption in this study is that the power rating of each electric appliance (e.g., lights, computers and personal electronic devices) is the same according to the observational results.

3.2.5. Simulation

In this study, the model simulation duration is 6 months (from 17th June 2019 to 14th Dec 2019). This period was chosen because it was before any interruptions and/or restrictions due to COVID-

Fig. 6. The hourly record of occupant entering Alan Gilbert learning commons.

Fig. 7. The average time occupant spent in Alan Gilbert learning commons.

Fig. 8. The general route of an occupant in the building.

19 pandemic. Hence, the data sets would provide a good representation of normal school activities, with most of the students on campus. The time granularity (time step) is set to 1 min. The outcomes of this study are “Number of students”, “Number of computers in use”, “Number of computers on standby”, “Number of lighting” and “Number of charging devices”. The power of each electric appliance was not taken into consideration mainly because the generated data sets will be normalised into a unit scale. During the simulation, the value of each outcome is recorded at every time step and these minute-by-minute outcomes are then resampled to an hourly basis, with the average value corresponding to the hourly-based energy consumption and weather data. Considering the randomness of occupant behaviour, the simulation is run 5 times and, each time before executing the model, the random seed which controls the randomness of the model is manually changed to different


Table 2
Questionnaire for electricity consumption behaviour in Alan Gilbert learning commons.

| Question | Response options |
| --- | --- |
| Q1. Which part of Alan Gilbert learning commons are you most likely to use? | Public area; Neutral; Meeting room |
| Q2. How likely are you to make use of a computer when studying/working in Alan Gilbert learning commons? | Extremely unlikely; Somewhat unlikely; Neither likely nor unlikely; Somewhat likely; Extremely likely |
| Q3. How likely are you to charge your personal electronic device(s) while studying/working in Alan Gilbert learning commons? | Extremely unlikely; Somewhat unlikely; Neither likely nor unlikely; Somewhat likely; Extremely likely |
| Q4. How likely are you to turn off the computer when leaving Alan Gilbert learning commons? | Extremely unlikely; Somewhat unlikely; Neither likely nor unlikely; Somewhat likely; Extremely likely |
| Q5. How likely are you to switch off the lights when leaving Alan Gilbert learning commons? (Display this question if Q1 answer is Meeting room) | Extremely unlikely; Somewhat unlikely; Neither likely nor unlikely; Somewhat likely; Extremely likely |
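The occupant decisions that this questionnaire is meant to quantify can be pictured as a series of independent yes/no draws made during one visit. The sketch below is only an illustration of that idea: the class, the function and the probability values are hypothetical placeholders, and how the survey responses are converted into probabilities is not assumed here. For simplicity, the visitor to a meeting room is also treated as the last person to leave it.

```python
import random
from dataclasses import dataclass


@dataclass
class BehaviourProbabilities:
    """Placeholder probabilities, one per decision covered by Q1 to Q5."""
    use_meeting_room: float
    use_computer: float
    charge_device: float
    turn_off_computer: float
    switch_off_lights: float


def simulate_visit(p: BehaviourProbabilities, rng: random.Random) -> dict:
    """Draw the decisions of one occupant visit as Bernoulli trials."""
    in_meeting_room = rng.random() < p.use_meeting_room
    uses_computer = rng.random() < p.use_computer
    charges_device = rng.random() < p.charge_device

    # On leaving, a used computer is either turned off or left on standby.
    if uses_computer:
        turned_off = rng.random() < p.turn_off_computer
        computer_after = "off" if turned_off else "standby"
    else:
        computer_after = "unchanged"

    # Only meeting-room users decide about the manually controlled lights.
    lights_switched_off = None
    if in_meeting_room:
        lights_switched_off = rng.random() < p.switch_off_lights

    return {
        "in_meeting_room": in_meeting_room,
        "uses_computer": uses_computer,
        "charges_device": charges_device,
        "computer_after": computer_after,
        "lights_switched_off": lights_switched_off,
    }


# Hypothetical probabilities, not values reported by the study.
placeholder = BehaviourProbabilities(0.2, 0.7, 0.8, 0.6, 0.75)
visit = simulate_visit(placeholder, random.Random(1))
```

Passing a seeded `random.Random` instance keeps each run reproducible, which mirrors the practice of changing the seed between simulation runs to obtain different outcomes.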

Table 3
Summary of questionnaire response.

Question  Content  Mean  Std
Q1  Which part of Alan Gilbert learning commons are you most likely to use?  1.43  0.75
Q2  How likely are you to make use of a computer when studying/working in Alan Gilbert learning commons?  4.04  1.18
Q3  How likely are you to charge your personal electronic device(s) while studying/working in Alan Gilbert learning commons?  4.26  1.02
Q4  How likely are you to turn off the computer when leaving Alan Gilbert learning commons?  3.82  1.30
Q5  How likely are you to turn off the lights when leaving Alan Gilbert learning commons?  4.2  1.16

values, so as to guarantee that the outcomes of each simulation are different. The outcomes of the 5 simulation runs were named from simulated occupant data_1 (SOD1) to simulated occupant data_5 (SOD5), respectively. Each SOD includes the hourly “Number of students”, “Number of computers in use”, “Number of computers on standby”, “Number of lighting” and “Number of charging devices”.

3.3. Energy consumption prediction module

Machine learning methods were employed to predict the energy consumption of Alan Gilbert learning commons. In order to evaluate the feasibility and generalisation capability of ABM in improving the prediction performance of machine learning, 4 widely used machine learning methods were employed as candidate methods for the energy consumption prediction tasks, namely, stochastic gradient descent regression (SGD), support vector machine (SVM), random forest (RF) and K-nearest neighbours (KNN).

3.3.1. Data collection and pre-processing

Apart from SOD, 6 months (from 17th June 2019 to 14th Dec 2019) of Original data (e.g., building energy consumption, building access record and meteorological data) were collected from UoM’s building energy management system, building access system and meteorological observatory, respectively. The acquired data have an hourly granularity. The meteorological data consist of a total of 9 features, including wind speed, wind direction, global solar radiation, indirect solar radiation, seconds sunshine, temperature, relative humidity, apparent temperature, and pressure.

With regards to time information, a total of 10 features were extracted, including ‘Hour of day’ (0–23), ‘Day of week’ (0–6: Monday-Sunday), ‘Day in month’ (1–30), ‘Day of year’ (1–365), ‘Week’ (28–50), ‘Month’ (7–12), ‘Weekday’ (0, 1), ‘Holidays’ (0, 1), ‘Exam Period’ (0, 1) (for ‘Holidays’ and ‘Exam Period’, 0 means it is not holidays/exams and 1 means it is holidays/exams) and ‘Period’ (1–5). ‘Period’ is an artificial feature that represents the different periods of the day. In this study, the day was divided into last night (11:00 pm to 6:00 am), morning (6:00 am to 12:00 pm), afternoon (12:00 pm to 5:00 pm), evening (5:00 pm to 9:00 pm) and night (9:00 pm to 11:00 pm). ‘Period’ was then encoded numerically as 1–5 (i.e., last night: 1, morning: 2, afternoon: 3, evening: 4 and night: 5), which is a format acceptable to the machine learning methods. The holiday and examination dates were extracted from the UoM calendar. The holidays considered for this study run from 10-06-2019 to 15-09-2019, while the examination dates considered run from 19-08-2019 to 01-09-2019.

The building access record of Alan Gilbert learning commons was extracted from the University’s building management system, including “User ID”, “Entry time” and “Exit time”. Based on the record, the hourly numbers of users entering and leaving the building were generated and named “Enter” and “Exit”, respectively. Furthermore, Figs. 6 and 7 respectively show the occupancy loading for the case study building and the average hourly duration of occupants for the study period.

Original data was then combined with SOD to generate the final data sets for the building energy consumption prediction using machine learning algorithms. As depicted in Fig. 2, a total of 7 datasets were created: ‘Original + SOD1’, ‘Original + SOD2’ through to ‘Original + SOD5’, ‘Original + SOD1-5’ (combination of original and all simulated occupational data) and ‘Original’.

Data normalisation was implemented in order to rescale each feature into the unit interval [0, 1] and thereby eliminate the effect of differences in scale between feature dimensions. Data normalisation is a critical step for machine learning algorithms that are based on Euclidean distance, such as K-nearest neighbours (KNN). The mathematical expression of data normalisation is shown in Equation (8):

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{8}$$

where $X$ is a data point, $X_{min}$ is the minimum value, $X_{max}$ is the maximum value and $X_{norm}$ is the normalised value.

In order to explore the impact of data length on prediction performance, different lengths of training data were set (1 month (November only), 3 months (September to November) and 5 months (July to November)), and December data was used as testing data.

3.3.2. Boruta feature selection (BFS)

While the dimensionality of the data has increased, a previous study [46] has shown that increasing the dimensionality of the data does not always improve the predictive performance of machine learning methods. Repetitive, redundant, and irrelevant data can reduce the performance of machine learning. Therefore, before feeding the data into the machine learning algorithms (i.e., SGD, KNN, RF and SVM, which are employed in this research as regressors for the prediction task), implementing feature engineering, especially a feature selection method to determine all relevant feature sets, is essential and would have the greatest impact on the prediction performance of the machine learning methods.

BFS is a wrapper feature selection algorithm that embeds a random forest (RF) for determining all features that are relevant to the outcome labels and discarding irrelevant features which can occasionally appear important due to randomness [47]. It is more sophisticated and difficult than common feature selection algorithms, which depend on the performance of prediction as the essential criterion for selecting the features. The main disadvantage of those algorithms is the loss of some relevant features [48]. Adding more randomness to the feature set is the core idea of BFS: the data of the original feature set are randomly shuffled, and the shuffled data are then merged with the original data to form an extended feature set. BFS is then implemented to assess the importance of the features based on the extended feature set, and only original features whose importance is higher than that of the shuffled features are considered important. A detailed procedure for BFS is as follows:

i. Add randomness to the feature set by creating shuffled copies (shadow features) of all features and then merge the shadow features with the original features to form an extended feature set.
ii. Implement an RF model on the extended feature set. Measure the average decrease in accuracy (Z value), whereby a higher Z value implies a more important feature. The largest Z value among the shadow features is denoted as $Z_{max}$.
iii. During each iteration, the features whose Z value is higher than $Z_{max}$ will be kept, while those lower than $Z_{max}$ are considered highly unimportant and will be eliminated from the feature set.
iv. Repeat the above process until all features are confirmed (or rejected) or the maximum number of iterations is reached.

3.3.3. Machine learning algorithms

a) Stochastic gradient descent regression (SGD)

SGD regression is a linear regression that employs a stochastic gradient descent algorithm as the optimiser to determine the best model parameters (e.g., the coefficients β of linear regression). The mathematical description of SGD is as follows:

Assume a linear regression $f(x) = \omega^{T}x + b$ with coefficient $\omega \in \mathbb{R}^{m}$ and intercept $b \in \mathbb{R}$. The objective of SGD is to determine a coefficient and intercept that minimise the loss function $E(\omega, b)$, measured as the sum of squared errors between the predictions on the training set and the real labels. This process can be mathematically represented as shown in Equation (9):

$$E(\omega, b) = \frac{1}{n}\sum_{i=1}^{n} L\left(y_i, f(x_i)\right) + \alpha R(\omega) \tag{9}$$

where $L$ is a loss function that measures model fit, with least-squares chosen as the loss function, $L(y_i, f(x_i)) = \frac{1}{2}\left(y_i - f(x_i)\right)^{2}$; $R$ is a regularisation term that penalises model complexity; and $\alpha > 0$ is a non-negative hyperparameter that controls the regularisation strength.

b) Support vector machine (SVM)

SVM was first proposed by Vapnik [49] and has been widely utilised ever since. SVM is a binary classification model whose basic model is a linear classifier defined by maximising the geometric interval on the feature space. The mathematical description of SVM is as follows:

$$y = \omega\,\varphi(x) + b \tag{10}$$

where $y$ is the predicted value, $b$ and $\omega$ are adjustable coefficients, and $\varphi$ represents the hyperplane. The purpose of the SVM method is to minimise the empirical risk as given in Equation (11):

$$\min\left(\frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i\right) \tag{11}$$

where $w$ represents the normal vector, $C$ is the cost constant and $\xi$ represents the relaxation factor.

c) Random forest (RF)

RF is an ensemble method [50] that consists of several independent decision trees (DT). During training, each DT generated in an RF algorithm is trained on a different random subset of the training set, and the category with the majority of votes from the individual predictions of all trees (or, for regression, the average of the individual predictions) is regarded as the final prediction result. The accuracy and stability of RFs are significantly enhanced when compared to a single DT, thereby making them more suitable for tackling a wider range of prediction challenges.

d) K-nearest neighbours (KNN)

KNN regression is a non-parametric algorithm whose principle is to find a predetermined number of points in the training sample that are closest in distance to the new point, and to predict the label from these points [51,52]. Assume the data set $(x_1, y_1), \ldots, (x_n, y_n)$ is the training set with distance metric $d$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ are the independent input variables. When given a new instance $x$, KNN calculates the distance $d_i$ between $x$ and each $x_i$ and then ranks the distances $d_i$ by value; the $i$-th nearest neighbour is denoted $NN_i(x)$ and its output is denoted $y_i(x)$. The predicted output is the mean of the outputs of the $k$ nearest neighbours, $\hat{y} = \frac{1}{k}\sum_{i=1}^{k} y_i(x)$.

3.3.4. Evaluation metrics

In order to evaluate the model performance in terms of building energy consumption prediction, the following evaluation metrics are included: root mean square error (RMSE), coefficient of determination (R2), mean absolute error (MAE) and mean absolute percentage error (MAPE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - p_i\right)^{2}} \tag{12}$$

$$R^{2} = \frac{n\sum_{i=1}^{n} y_i p_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} p_i}{\sqrt{\left[n\sum_{i=1}^{n} y_i^{2} - \left(\sum_{i=1}^{n} y_i\right)^{2}\right]\left[n\sum_{i=1}^{n} p_i^{2} - \left(\sum_{i=1}^{n} p_i\right)^{2}\right]}} \tag{13}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - p_i\right| \tag{14}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - p_i}{y_i}\right| \tag{15}$$

where $y_i$ is the actual energy consumption and $p_i$ is the predicted energy consumption.

4. Results and discussion

The feedback from the online questionnaire, which was deployed for 3 months, indicates a 25.35 % response rate (219 postgraduate research students responded to the questionnaire out of a population of 864). Among the 219 responses, 4 responses were discarded because they were incomplete. The respondents assessed each question on a 5-point scale (with the exception of Question 1, which was based on a 3-point scale), from ‘Extremely unlikely’ (score = 1) to ‘Extremely likely’ (score = 5). The results of the questionnaire are summarised in Table 3 and Fig. 9.

Based on the results of the questionnaire, the probabilities of occupant behaviours are summarised in Table 4.
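As a minimal sketch of the evaluation metrics defined in Section 3.3.4, the following Python functions implement Equations (12) to (15) term by term. The function names and the sample values are illustrative assumptions and are not data or code from this study. Equation (13), as printed, takes the form of a Pearson correlation coefficient between actual and predicted values, and the sketch follows the printed form.

```python
import math


def rmse(y, p):
    """Root mean square error, Equation (12)."""
    n = len(y)
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / n)


def r2_eq13(y, p):
    """Equation (13) as printed (correlation-style expression)."""
    n = len(y)
    sum_y, sum_p = sum(y), sum(p)
    sum_yp = sum(yi * pi for yi, pi in zip(y, p))
    sum_yy = sum(yi * yi for yi in y)
    sum_pp = sum(pi * pi for pi in p)
    numerator = n * sum_yp - sum_y * sum_p
    denominator = math.sqrt(
        (n * sum_yy - sum_y ** 2) * (n * sum_pp - sum_p ** 2)
    )
    return numerator / denominator


def mae(y, p):
    """Mean absolute error, Equation (14)."""
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / len(y)


def mape(y, p):
    """Mean absolute percentage error as a fraction, Equation (15)."""
    return sum(abs((yi - pi) / yi) for yi, pi in zip(y, p)) / len(y)


# Hypothetical hourly consumption values (kWh), for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
p_pred = [12.0, 18.0, 33.0, 37.0]

scores = {
    "RMSE": rmse(y_true, p_pred),
    "R2": r2_eq13(y_true, p_pred),
    "MAE": mae(y_true, p_pred),
    "MAPE": mape(y_true, p_pred),
}
```

MAPE is returned as a fraction (e.g., 0.05 rather than 5 %), which matches the scale of the MAPE columns reported in Tables 5, 6 and 8.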

Fig. 9. Summary of questionnaire response.

Table 4
Summary of probabilities of occupant behaviour.

Question  Content  Probability
Q1  Use public area of Alan Gilbert learning commons  79 %
Q2  Make use of a computer when studying/working in Alan Gilbert learning commons  46 %
Q3  Charge personal electronic device(s) while studying/working in Alan Gilbert learning commons  52 %
Q4  Turn off computer when leaving Alan Gilbert learning commons  40 %
Q5  Turn off lights when leaving  61 %

The occupant behaviour simulation module was implemented using the Anylogic software (academic version 8.7.12) at the University of Manchester (UK). Fig. 10 provides a screenshot of typical interfaces that emerge during module operation, which include 2D and 3D views of the animations. The chart output and the logic flowchart of the simulation are also presented in Fig. 11.

Hourly energy consumption of Alan Gilbert learning commons is presented in Fig. 12. Despite what appears to be a relatively stable trend of energy consumption throughout the data sets, the university schedule still introduced significant fluctuations to the energy consumption patterns. For instance, Segment 1 in Fig. 12 depicts the energy consumption during examination and dissertation periods for undergraduate and postgraduate (taught) students respectively, followed by the summer holiday (Segment 2) when students are required to return home. Segment 3 then depicts a sudden increase around August, due to postgraduate (taught) dissertation activities and re-sit examinations (make-up examinations), and another slight decrease in early September (Segment 4) as a result of “Welcome Week”, whereby students are predominantly occupied by school registrations, familiarisation with new study environments and general networking, which are mostly done outdoors or within designated administrative offices. Regular teaching activities fully commence in late September, thereby explaining the relatively stable pattern of energy consumption in Segment 5. The variations imposed by these very dynamic university time patterns also cause immense differences in energy consumption, which in turn impede the ability of machine learning methods to understand the patterns of building energy consumption.

A comparison between the actual building access record and the simulated occupancy profiles is depicted in Fig. 13. Although the simulated data shared a similar pattern with the actual data, the inherent randomness within the model led to an obvious lag within some of the simulated data, which in turn generated an overall larger value compared with the actual data.

The correlation between occupant behaviours and energy consumption was also analysed to explore the feasibility of using the simulated occupational features as input data for building energy consumption prediction. As shown in Fig. 14, a significant difference in feature correlation between the rounds of simulation can be detected, which further underlines the impact of randomness on occupant behaviour and the necessity to implement the occupant behaviour simulation multiple times. Despite the differences, each occupant behaviour feature showed a strong correlation with energy consumption.

In addition, the correlation between other input features and energy consumption was analysed, as depicted in Fig. 15. Features showing a strong correlation were labelled in red boxes (i.e., the


Fig. 10. The graphic user interface of occupant behaviour simulation module (2D and 3D animation).

correlation coefficient of features in the red box is at least 2 times larger than that of features without the red box). As shown in the figure, the building access record indicated the strongest correlation, followed by some meteorological features including solar radiation, relative humidity, and temperature. Hour of day, period, holiday and examination were the time information features that showed a strong correlation with energy consumption.

The results of energy consumption prediction using 4 machine learning algorithms, based on different datasets (e.g., Original data, Original + SOD1, Original + SOD2, . . ., Original + SOD5 and Original + SOD1-5) and different lengths of data (e.g., 1 month, 3 months and 5 months), are listed in Fig. 16 and Table 5. Due to the comparative similarity of the evaluation metrics in terms of prediction results, only the RMSE result is illustrated. As shown in Fig. 16, for all machine learning algorithms except RF, including SOD significantly improved the prediction performance under all scenarios. For instance, when using SGD based on 1 month of training data, the RMSE decreased by between 18.6 % and 35.8 %, depending on the SOD dataset used, compared with using only Original data. For all algorithms, the best performance was achieved when using Original + SOD1-5. In terms of training data size, it was noticed that extending the data size correspondingly decreased the prediction accuracy. The main reason for the decrease in performance can be attributed to the complexity of the training data. As seen in the historical energy consumption profile in Fig. 12, when using a larger training data set, the university opening week, examination, dissertation and summer holiday periods are included in the training data. The machine learning algorithms will learn the patterns of these periods and adjust their parameters accordingly, which will in turn impede their performance in predicting building energy consumption during the normal teaching period.

Considering the possible impact of day type (e.g., weekday and weekend) on the prediction performance of machine learning algorithms, a prediction approach based on just weekday data was conducted, using the same procedure. The results obtained are detailed in Table 6. The results indicate that there is no obvious difference in prediction performance when using weekday data. As the building is a self-learning hub or library, the number of students and the time they spend in the building are relatively stable throughout the whole week, which may explain why there is no obvious difference in performance when only using weekday data compared with using all data.

Although combining Original data and SOD1-5 provided the best prediction performance, it also led to a significant increase in input data dimension, which could hinder machine learning algorithms from delivering the best performance. On the one hand, a higher dimension will undoubtedly require more computing resources and learning time. On the other hand, it could lead to unmanageable

Fig. 11. The graphic user interface of occupant behaviour simulation module (a) chart output and (b) the logic flowchart.

Fig. 12. Hourly energy consumption of Alan Gilbert learning commons.


Fig. 13. Comparison between actual and simulated building access records.

Fig. 14. The correlation coefficient between occupant behaviours and energy consumption.

number of dimensions (i.e., when the number of dimensions increases, the volume of the space increases too quickly and thus the available data becomes sparse). With regards to this study, the number of features increased from 22 (in the Original dataset) to 47 (Original + SOD1-5). It is also worth noting that the higher the required accuracy level or representativeness of the simulated occupant behaviour, the higher the number of features, since the model will need to be executed more times to obtain various possible results. In order to better optimise the computational effectiveness of the feature selection process, without necessarily compromising on the quality of the analysis, the Boruta feature selection approach was implemented to identify and eventually select all relevant features, as listed in Table 7.

The results of using the selected features for building energy consumption prediction are listed in Table 8, and a comparison among Original, Selected and Original + SOD1-5 is illustrated in Fig. 17. The results suggest that using Selected data slightly improved the performance of machine learning algorithms in predicting building energy consumption, despite the exclusion of some features.

Despite the significant improvements that have been achieved in predicting building energy consumption via the proposed hybrid of agent-based modelling and machine learning methods, there are still some limitations that need to be addressed through further research. For instance, occupant behaviours are extremely stochastic, and a variety of factors may contribute to such stochasticity, including but not limited to environmental factors (especially the surrounding environment), human characteristics, socio-economic factors and peer pressure. In this study, those factors were not taken into consideration due to the specificity of Alan Gilbert learning commons (library), whereby occupant electricity-related behaviours are relatively simple and occupants have limited interactions with each other. Also, the procedures for establishing ABM to simulate occupant behaviours vary from building to building, and it is believed that significant efforts may be required when simulating a building with more types of occupants and more complicated occupant behaviour patterns, which may affect the generalisation capability of the proposed hybrid of agent-based modelling and machine learning method. It should also be emphasised that there could be potential overfitting problems due to the assumptions made in the proposed method and the complexity of its structure.


Fig. 15. The correlation coefficient between meteorological, time information and energy consumption.

Fig. 16. The machine learning performance (RMSE) in predicting building energy consumption based on different datasets.


Table 5
The machine learning performance in predicting building energy consumption based on different dataset.

Dataset SGD KNN SVM RF


RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE
Original (1 month) 13.2 0.43 11.07 0.1 11.29 0.58 8.32 0.08 13.59 0.4 10.48 0.1 6.62 0.86 5.29 0.05
Original (3 months) 14.68 0.3 12.2 0.11 12.12 0.52 9.2 0.09 15.38 0.23 11.82 0.11 10.11 0.67 7.86 0.07
Original (5 months) 23.04 0.74 17.98 0.17 12.18 0.51 9.36 0.09 23.42 0.79 17.96 0.17 11.85 0.54 9.48 0.09
Original + SOD1 (1 month) 9.68 0.69 7.73 0.07 9.26 0.72 6.93 0.06 9.78 0.69 7.87 0.07 7.1 0.84 5.52 0.05
Original + SOD1 (3 months) 12.43 0.49 10.15 0.09 11.36 0.58 8.81 0.08 12.08 0.52 9.84 0.09 10.33 0.65 8.09 0.07
Original + SOD1 (5 months) 20.48 0.37 15.7 0.14 11.37 0.58 8.83 0.08 19.24 0.21 14.54 0.13 11.75 0.55 9.22 0.08
Original + SOD 2 (1 month) 9.22 0.82 7.13 0.07 8.89 0.74 6.42 0.06 9.1 0.73 6.94 0.07 7.0 0.84 5.64 0.05
Original + SOD 2 (3 months) 11.95 0.53 9.3 0.09 11.56 0.56 9.07 0.08 11.51 0.57 9.02 0.09 9.95 0.68 7.88 0.07
Original + SOD 2 (5 months) 20.64 0.39 15.74 0.14 12.13 0.52 9.49 0.09 20.72 0.4 15.88 0.15 11.27 0.58 9.08 0.08
Original + SOD 3 (1 month) 9.55 0.7 7.34 0.07 8.95 0.74 6.7 0.06 9.86 0.68 7.64 0.07 6.82 0.85 5.5 0.05
Original + SOD 3 (3 months) 12.54 0.49 9.9 0.09 11.56 0.56 8.89 0.08 12.31 0.5 9.76 0.09 10.76 0.62 8.41 0.08
Original + SOD 3 (5 months) 20.7 0.4 15.45 0.14 11.31 0.58 8.88 0.08 19.57 0.25 14.6 0.13 12.06 0.52 9.69 0.09
Original + SOD 4 (1 month) 10.75 0.62 8.91 0.08 9.89 0.68 7.13 0.07 10.44 0.64 8.3 0.08 6.62 0.86 5.29 0.05
Original + SOD 4 (3 months) 12.6 0.48 10.09 0.1 11.21 0.59 8.85 0.08 12.44 0.49 9.88 0.09 10.23 0.66 8.01 0.07
Original + SOD 4 (5 months) 20.75 0.41 16.38 0.15 11.68 0.55 9.13 0.08 19.74 0.27 15.52 0.14 11.99 0.53 9.45 0.09
Original + SOD 5 (1 month) 10.69 0.63 8.88 0.08 9.24 0.72 6.97 0.07 10.38 0.65 8.37 0.08 6.69 0.85 5.36 0.05
Original + SOD 5 (3 months) 12.49 0.49 10.15 0.1 10.49 0.64 8.25 0.08 12.48 0.49 10.11 0.1 10.2 0.66 8.04 0.07
Original + SOD 5 (5 months) 20.09 0.32 16.18 0.15 11.28 0.58 8.86 0.08 20.5 0.37 16.45 0.1 11.22 0.59 8.99 0.08
Original + SOD1-5 (1 month) 8.47 0.77 6.6 0.06 7.19 0.83 5.49 0.05 8.29 0.78 6.39 0.06 6.82 0.85 5.42 0.05
Original + SOD1-5 (3 months) 11.38 0.58 9.04 0.08 10.57 0.63 8.45 0.08 11.41 0.57 9.14 0.09 10.72 0.62 8.44 0.08
Original + SOD1-5 (5 months) 18.99 0.18 14.72 0.14 10.9 0.61 8.73 0.08 18.55 0.12 14.37 0.13 11.87 0.54 9.18 0.08

Note: the unit for RMSE and MAE is kWh.
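The improvement range of 18.6 % to 35.8 % quoted in the discussion for SGD with 1 month of training data can be reproduced from the RMSE column of Table 5. The short sketch below computes the relative RMSE reduction against the Original dataset; the variable names are illustrative, while the numbers are taken directly from the table.

```python
# SGD, 1 month of training data: RMSE values (kWh) from Table 5.
rmse_original = 13.2
rmse_with_sod = {
    "Original + SOD1": 9.68,
    "Original + SOD2": 9.22,
    "Original + SOD3": 9.55,
    "Original + SOD4": 10.75,
    "Original + SOD5": 10.69,
    "Original + SOD1-5": 8.47,
}

# Relative RMSE reduction (%) of each SOD dataset versus Original only.
reduction = {
    name: 100 * (rmse_original - value) / rmse_original
    for name, value in rmse_with_sod.items()
}

smallest = min(reduction, key=reduction.get)
largest = max(reduction, key=reduction.get)

for name, pct in reduction.items():
    print(f"{name}: {pct:.1f} % lower RMSE")
```

The smallest reduction comes from Original + SOD4 and the largest from Original + SOD1-5, which is consistent with the observation that combining all five simulation runs gives the best performance.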

Table 6
The machine learning performance in predicting building energy consumption based on different dataset (weekday data).

Dataset SGD KNN SVM RF


RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE
Original (1 month) 13.34 0.39 11.32 0.1 12.34 0.48 9.35 0.09 13.35 0.39 10.77 0.1 6.72 0.85 5.51 0.05
Original (3 months) 14.15 0.32 11.83 0.11 13.05 0.42 10.06 0.09 14.36 0.3 11.55 0.11 8.49 0.75 6.81 0.06
Original (5 months) 22.74 0.77 17.75 0.17 12.21 0.49 9.73 0.09 21.77 0.62 16.66 0.16 11.38 0.56 9.03 0.08
Original + SOD1 (1 month) 9.58 0.69 7.82 0.07 9.45 0.69 7.29 0.07 9.64 0.68 7.93 0.07 7.5 0.81 5.69 0.05
Original + SOD1 (3 months) 11.63 0.54 9.49 0.13 11.62 0.54 9.41 0.09 11.71 0.53 9.41 0.09 9.2 0.71 7.27 0.07
Original + SOD1 (5 months) 18.4 0.16 14.02 0.13 11.52 0.55 9.17 0.08 17.88 0.09 13.5 0.12 11.54 0.54 8.87 0.08
Original + SOD 2 (1 month) 8.89 0.73 6.84 0.07 8.53 0.75 6.59 0.06 9.0 0.72 6.8 0.07 7.02 0.83 5.74 0.05
Original + SOD 2 (3 months) 11.58 0.54 8.95 0.08 11.51 0.55 9.21 0.09 11.37 0.56 8.97 0.08 9.12 0.72 7.29 0.07
Original + SOD 2 (5 months) 19.58 0.35 15.19 0.14 11.58 0.54 9.38 0.09 20.23 0.4 15.59 0.14 11.06 0.58 8.71 0.08
Original + SOD 3 (1 month) 9.91 0.66 7.63 0.07 9.03 0.72 6.92 0.06 9.93 0.66 7.64 0.07 6.59 0.85 5.36 0.05
Original + SOD 3 (3 months) 12.76 0.44 10.02 0.09 11.08 0.58 8.88 0.08 12.84 0.44 10.2 0.09 8.87 0.73 7.02 0.06
Original + SOD 3 (5 months) 20.29 0.41 15.06 0.14 11.08 0.58 8.89 0.08 19.83 0.34 14.8 0.13 11.23 0.57 8.88 0.08
Original + SOD 4 (1 month) 10.63 0.61 8.88 0.08 10.57 0.62 8.04 0.07 10.43 0.63 8.57 0.08 6.47 0.86 5.22 0.05
Original + SOD 4 (3 months) 12.29 0.48 9.71 0.14 11.65 0.54 9.38 0.09 12.21 0.49 9.58 0.09 8.37 0.76 6.74 0.06
Original + SOD 4 (5 months) 18.69 0.19 14.74 0.14 11.97 0.51 9.67 0.09 18.93 0.22 14.87 0.14 12.09 0.5 9.27 0.08
Original + SOD 5 (1 month) 11.13 0.58 9.45 0.09 9.4 0.7 7.34 0.07 10.55 0.62 8.63 0.08 6.54 0.85 5.33 0.05
Original + SOD 5 (3 months) 12.65 0.45 10.21 0.1 11.39 0.56 8.97 0.08 12.69 0.45 10.13 0.1 8.61 0.75 6.93 0.06
Original + SOD 5 (5 months) 19.44 0.29 15.79 0.15 11.7 0.53 9.5 0.09 19.36 0.28 15.63 0.15 11.08 0.58 8.52 0.08
Original + SOD1-5 (1 month) 8.32 0.76 6.35 0.06 6.91 0.84 5.56 0.05 8.18 0.77 6.22 0.06 6.5 0.86 5.04 0.05
Original + SOD1-5 (3 months) 11.5 0.55 9.11 0.09 10.45 0.63 8.33 0.08 11.68 0.53 9.24 0.09 8.88 0.73 6.88 0.06
Original + SOD1-5 (5 months) 18.23 0.13 14.22 0.13 10.81 0.6 8.67 0.08 18.39 0.16 14.37 0.13 10.68 0.61 7.83 0.07

Note: the unit for RMSE and MAE is kWh.


Table 7
Summary of Boruta feature selection.

Content  Feature
Selected data  ‘Wind direction’, ‘Global solar radiation’, ‘Indirect solar radiation’, ‘Seconds sunshine’, ‘Temperature’, ‘Relative humidity’, ‘Apparent temperature’, ‘Pressure’, ‘Enter’, ‘Exit’, ‘Hour of day’, ‘Period’, ‘Week’, ‘Month’, ‘Holiday’, ‘Number of lighting_1’, ‘Number of student_1’, ‘Number of in use computers_2’, ‘Number of standby computer_2’, ‘Number of lighting_2’, ‘Number of personal charging devices_2’, ‘Number of student_2’, ‘Number of in use computers_3’, ‘Number of standby computer_3’, ‘Number of lighting_3’, ‘Number of personal charging devices_2’, ‘Number of student_3’, ‘Number of in use computers_4’, ‘Number of standby computer_4’, ‘Number of personal charging devices_4’, ‘Number of in use computers_5’, ‘Number of standby computer_5’, ‘Number of lighting_5’, ‘Number of personal charging devices_5’, ‘Number of student_5’.
Rejected data  ‘Wind speed’, ‘Day in month’, ‘Day of year’, ‘Day of week’, ‘Weekday’, ‘Season’, ‘Number of standby computer_1’, ‘Number of in use computers_1’, ‘Number of personal charging devices_1’, ‘Number of lighting_4’, ‘Number of student_4’

5. Conclusion

Occupant behaviour, as a primary factor that contributes to the uncertainty of building energy consumption, has to a great extent hindered the performance of machine learning in predicting building energy consumption. For most existing buildings, it is extremely challenging or impossible to obtain occupant behaviour data for a variety of reasons, including privacy concerns and unreliable building energy management systems. Therefore, this paper presented an agent-based modelling (ABM) machine learning model that is integrated with social force modelling (SFM) in order to predict electrical energy consumption. The fundamental aim of ABM in this study is to simulate occupant behaviour, while SFM is applied to accurately simulate occupant movement, after which the resultant occupational data serve as input features to machine learning algorithms. Alan Gilbert learning commons, a self-learning hub at the University of Manchester, was used as a case study. The occupant behaviours in Alan Gilbert learning commons included the areas students choose to study/work, the probability of students turning off computers/lighting when leaving, and the probability of charging personal electronic devices during study/-

Fig. 17. The machine learning performance (RMSE) in predicting building energy consumption based on Original, Selected and Original + SOD1-5.

Table 8
The machine learning performance in predicting building energy consumption based on selected dataset.

Dataset SGD KNN SVM RF


RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE RMSE R2 MAE MAPE
Selected data (1 month) 8.22 0.75 6.23 0.06 6.71 0.85 5.19 0.05 8.13 0.77 6.09 0.06 6.53 0.85 5.01 0.05
Selected data (3 months) 11.31 0.58 9.04 0.08 10.0 0.66 7.86 0.07 11.69 0.53 9.21 0.09 9.44 0.7 7.33 0.07
Selected data (5 months) 18.13 0.11 14.15 0.12 11.49 0.55 9.1 0.08 19.83 0.34 15.37 0.14 10.62 0.61 7.83 0.07

Note: the unit for RMSE and MAE is kWh.


work. The number of students, in-use/standby computers, lighting and charging of devices were the output data generated from the ABM. The model was run 5 times in order to generate representative possibilities. The simulated occupational data, together with the original data (meteorological and time information), were used as input features. Four popular machine learning algorithms, namely stochastic gradient descent regression, support vector machine, K nearest neighbours and random forest, were included to test the generalisation capability of the ABM.

The results suggest that combining the original data with a single set of simulated data can significantly improve the performance of machine learning methods in predicting building energy consumption. The best performance was achieved when combining the original data with all simulated occupant data (Original + SOD1-5). Meanwhile, the impact of day type (weekday or weekend) was also explored, and the results implied that for library-type buildings, where the daily number of students is relatively stable throughout the week, day type has hardly any impact on the performance of building energy consumption prediction. Prediction performance was further improved slightly by implementing Boruta feature selection, which excluded the irrelevant/redundant input features. It should also be emphasised that extending the training data length in this study led to a corresponding decrease in the performance of machine learning algorithms during building energy consumption prediction, due to the complexity associated with the historical patterns of energy consumption, which further highlights the need to intensify research on the optimisation of data selection approaches in the near future.
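
The comparison summarised above contrasts three kinds of input: the original features alone, the original features plus one simulated run, and the original features plus all simulated runs. A minimal sketch of how such feature sets could be assembled is given below. The function name `build_feature_sets`, the column layout and all numeric values are hypothetical and are not taken from the study's implementation.

```python
from typing import Dict, List, Sequence

Row = Sequence[float]


def build_feature_sets(
    original: Sequence[Row],
    simulated_runs: Sequence[Sequence[Row]],
) -> Dict[str, List[List[float]]]:
    """Combine original features with simulated occupant data (SOD) column-wise."""
    n_rows = len(original)
    for run in simulated_runs:
        if len(run) != n_rows:
            raise ValueError("every simulated run must cover the same time steps")

    # Baseline: original (meteorological and time) features only.
    feature_sets = {"Original": [list(row) for row in original]}

    # Original features plus a single simulated run.
    for k, run in enumerate(simulated_runs, start=1):
        feature_sets[f"Original + SOD{k}"] = [
            list(o) + list(s) for o, s in zip(original, run)
        ]

    # Original features plus the columns of every simulated run.
    combined = []
    for i, o in enumerate(original):
        row = list(o)
        for run in simulated_runs:
            row.extend(run[i])
        combined.append(row)
    feature_sets[f"Original + SOD1-{len(simulated_runs)}"] = combined
    return feature_sets


# Hypothetical inputs: [outdoor temperature in degC, hour of day] per time step.
original = [[12.5, 9.0], [13.1, 10.0], [13.8, 11.0]]
# Five hypothetical ABM runs, each giving [students, computers in use] per time step.
simulated_runs = [
    [[40.0 + k, 25.0 + k], [55.0 + k, 31.0 + k], [61.0 + k, 36.0 + k]]
    for k in range(5)
]

feature_sets = build_feature_sets(original, simulated_runs)
for name, rows in feature_sets.items():
    print(name, "->", len(rows[0]), "features")
```

Each resulting feature set can then be passed to the same regressors and scored with the same metrics, so that any difference in error is attributable to the added simulated columns rather than to a change in the evaluation procedure.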

Data availability

Data will be made available on request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
[26] A. Mahdavi, F. Tahmasebi, B. Gunay, W.O’Brien, S. D’Oca, Technical Report :
Occupant Behavior Modeling Approaches and Evaluation, no. November. 2017.
References [27] Y. Liu et al., Thermal preference prediction based on occupants’ adaptive
behavior in indoor environments- A study of an air-conditioned multi-
occupancy office in China, Build. Environ. 206 (2021), https://doi.org/
[1] E. Azar, C.C. Menassa, ‘‘A decision framework for energy use reduction
10.1016/j.buildenv.2021.108355.
initiatives in commercial buildings,” Proc. - Winter Simul. Conf., pp. 816–827,
[28] P. Zheng et al., Thermal adaptive behavior and thermal comfort for occupants
2011, 10.1109/WSC.2011.6147808.
in multi-person offices with air-conditioning systems, Build. Environ. 207
[2] D.K. Abideen, A. Yunusa-Kaltungo, P. Manu, C. Cheung, A systematic review of
(2022), https://doi.org/10.1016/j.buildenv.2021.108432.
the extent to which BIM is integrated into operation and maintenance,
[29] S. Kumar, M.K. Singh, Seasonal comfort temperature and occupant’s adaptive
Sustainability 14 (14) (2022), https://doi.org/10.3390/su14148692.
behaviour in a naturally ventilated university workshop building under the
[3] T. Zhang, P.-O. Siebers, U. Aickelin, Modelling office energy consumption: an
composite climate of India, J. Build. Eng. 40 (2021), https://doi.org/10.1016/j.
agent based approach, SSRN Electron. J. (2018) 1–15, https://doi.org/10.2139/
jobe.2021.102701.
ssrn.2829316.
[30] A. Zhang, Q. Huang, Y. Du, Q. Zhen, Q. Zhang, Agent-based modelling of
[4] J. Mahecha Zambrano, U. Filippi Oberegger, G. Salvalai, Towards integrating
occupants’ clothing and activity behaviour and their impact on thermal
occupant behaviour modelling in simulation-aided building design: reasons,
comfort in buildings, IOP Conf. Ser.: Earth Environ. Sci. 329 (1) (2019) 012022.
challenges and solutions, Energy Build. 253 (2021), https://doi.org/10.1016/j.
[31] L. Jiang, R. Yao, K. Liu, R. McCrindle, An Epistemic-Deontic-Axiologic (EDA)
enbuild.2021.111498.
agent-based energy management system in office buildings, Appl. Energy 205
[5] S. Carlucci, F. Causone, S. Biandrate, M. Ferrando, A. Moazami, S. Erba, On the
(August) (2017) 440–452, https://doi.org/10.1016/j.apenergy.2017.07.081.
impact of stochastic modeling of occupant behavior on the energy use of office
[32] W. Parys, D. Saelens, H. Hens, Coupling of dynamic building simulation with
buildings, Energy Build. 246 (2021), https://doi.org/10.1016/j.
stochastic modelling of occupant behaviour in offices - a review-based
enbuild.2021.111049.
integrated methodology, J. Build. Perform. Simul. 4 (4) (2011) 339–358,
[6] E. Azar et al., Simulation-aided occupant-centric building design: a critical
https://doi.org/10.1080/19401493.2010.524711.
review of tools, methods, and applications, Energy Build. 224 (2020), https://
[33] T. Hong, Y. Chen, Z. Belafi, S. D’Oca, Occupant behavior models: a critical
doi.org/10.1016/j.enbuild.2020.110292.
review of implementation and representation approaches in building
[7] R.H. Socolow, The twin rivers program on energy conservation in housing:
performance simulation programs, Build. Simul. 11 (1) (2018) 1–14, https://
highlights and conclusions, Energy Build. 1 (3) (1978) 207–242, https://doi.
doi.org/10.1007/s12273-017-0396-6.
org/10.1016/0378-7788(78)90003-8.
[34] F. Haldi, D. Robinson, Adaptive actions on shading devices in response to local
[8] Y. Sun, F. Haghighat, B.C.M. Fung, A review of the -state-of-the-art in data -
visual stimuli, J. Build. Perform. Simul. 3 (2) (2010) 135–153, https://doi.org/
driven approaches for building energy prediction, Energy Build. 221 (2020)
10.1080/19401490903580759.
110022, https://doi.org/10.1016/j.enbuild.2020.110022.
[35] G. Flett, N. Kelly, An occupant-differentiated, higher-order Markov Chain
[9] Q. Qiao, A. Yunusa-Kaltungo, R.E. Edwards, Towards developing a systematic
method for prediction of domestic occupancy, Energy Build. 125 (2016) 219–
knowledge trend for building energy consumption prediction, J. Build. Eng. 35
230, https://doi.org/10.1016/j.enbuild.2016.05.015.
(October 2020) (2021) 101967, https://doi.org/10.1016/j.jobe.2020.101967.
[36] S. Tian, Y. Lu, X. Ge, Y. Zheng, An agent-based modeling approach combined
[10] Y. Jin, D. Yan, A. Chong, B. Dong, J. An, Building occupancy forecasting: a
with deep learning method in simulating household energy consumption, J.
systematical and critical review, Energy Build. 251 (2021), https://doi.org/
Build. Eng. 43 (2021), https://doi.org/10.1016/j.jobe.2021.103210.
10.1016/j.enbuild.2021.111345.


[37] D.H. Dorrah, M. Marzouk, Integrated multi-objective optimization and agent- [44] J.Z. Qing-Hua Zhu, Y.Q. Yan Hou, The energy- saving scheduling of campus
based building occupancy modeling for space layout planning, J. Build. Eng. 34 classrooms: a simulation model, IEEE Syst. Man, Cybern. Mag. (2021), https://
(2021), https://doi.org/10.1016/j.jobe.2020.101902. doi.org/10.1109/MSMC.2020.3046236.
[38] Z. Wang, Y. Zhao, C. Zhang, P. Ma, X. Liu, A general multi agent-based [45] D. Helbing, P. Molnár, Social force model for pedestrian dynamics, Phys. Rev. E
distributed framework for optimal control of building HVAC systems, J. Build. 51 (5) (1995) 4282–4286, https://doi.org/10.1103/PhysRevE.51.4282.
Eng. 52 (2022), https://doi.org/10.1016/j.jobe.2022.104498. [46] Q. Qiao, A. Yunusa-Kaltungo, R.E. Edwards, Feature selection strategy for
[39] E. Bonabeau, Agent-based modeling: Methods and techniques for simulating machine learning methods in building energy consumption prediction, Energy
human systems, Proc. Natl. Acad. Sci. USA 99 (SUPPL. 3) (2002) 7280–7287, Rep. 8 (2022) 13621–13654, https://doi.org/10.1016/j.egyr.2022.10.125.
https://doi.org/10.1073/pnas.082080899. [47] M.B. Kursa, W.R. Rudnicki, Feature selection with the boruta package, J. Stat.
[40] I. Mahmood, H.A. Quair-tul-ain, F.J. Nasir, J.A. Aguado, A hierarchical multi- Softw. 36 (11) (2010) 1–13, https://doi.org/10.18637/jss.v036.i11.
resolution agent-based modeling and simulation framework for household [48] R. Tang, X. Zhang. ‘‘CART Decision Tree Combined with Boruta Feature
electricity demand profile, Simulation 96 (8) (2020) 655–678, https://doi.org/ Selection for Medical Data Classification,” 2020 5th IEEE Int. Conf. Big Data Anal.
10.1177/0037549720923401. ICBDA 2020, pp. 80–84, 2020, 10.1109/ICBDA49040.2020.9101199.
[41] M. Barakat, H. Khoury, An agent-based framework to study occupant multi- [49] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. (1995), https://doi.
comfort level in office buildings, Winter Simulation Conference (WSC) 2016 org/10.1023/A:1022627411411.
(2016) 1328–1339, https://doi.org/10.1109/WSC.2016.7822187. [50] M.W. Ahmad, M. Mourshed, Y. Rezgui, Trees vs Neurons: comparison between
[42] J. Malik et al., Ten questions concerning agent-based modeling of occupant random forest and ANN for high-resolution prediction of building energy
behavior for energy and environmental performance of buildings, Build. consumption, Energy Build. 147 (2017) 77–89, https://doi.org/10.1016/j.
Environ. 217 (2022), https://doi.org/10.1016/j.buildenv.2022.109016. enbuild.2017.04.038.
[43] C. Berger, A. Mahdavi, Review of current trends in agent-based modeling of [51] A.A. Salah, Machine learning for biometrics, Mach. Learn. (2011) 704–723,
building occupants for energy and indoor-environmental performance https://doi.org/10.4018/978-1-60960-818-7.ch402.
analysis, Build. Environ. 173 (2020), https://doi.org/10.1016/j. [52] Y. Song, J. Liang, J. Lu, X. Zhao, An efficient instance selection algorithm for k
buildenv.2020.106726. nearest neighbor regression, Neurocomputing 251 (2017) 26–34, https://doi.
org/10.1016/j.neucom.2017.04.018.
