
Building and Environment 236 (2023) 110259


A machine learning led investigation to understand individual difference and the human-environment interactive effect on classroom thermal comfort

Haifeng Lan a, Huiying (Cynthia) Hou a,*, Zhonghua Gou b
a Department of Building Environment and Energy Engineering, The Hong Kong Polytechnic University, Hong Kong, 999077, China
b School of Urban Design, Wuhan University, Wuhan, 430072, China

ARTICLE INFO

Keywords: Classroom; Thermal comfort; ASHRAE global database; Machine learning; Individual difference; Interactive effects; SHAP value

ABSTRACT

The availability of the global thermal open database means that machine learning models have been increasingly applied in thermal comfort studies in order to understand the factors and mechanisms that affect human thermal sensation. Previous global database analyses focused less on classroom thermal comfort, however, and more on model accuracy, while model interpretation was usually ignored, and individual differences and interaction effects are particularly poorly explained. This study screened 4527 related records about classrooms from the ASHRAE Global Thermal Comfort Database II, and used the cleaned data to train a hybrid model of extreme gradient boosting (XGBoost) and Bayesian optimisation (BO). SHAP values were used to interpret the machine learning model. The results identified ten key influencing factors that are associated with thermal comfort, although their importance varies among individuals. The effects of the factors can also be divided into main effects (80%) and interactive effects (20%), and some interactive effects are more potent than the main effect. Three typical types of interactive effects are concluded: two-way interaction, one-way interaction, and cross-interaction. This study was based on a comprehensive global database and an innovative machine learning method, and will lead to a more robust personal comfort model (PCM) that guides HVAC design and regulation development in order to meet thermal environment and energy-saving requirements.

1. Introduction

Students have recognised that thermal comfort in the classroom environment is closely associated with their learning performance, health, and well-being [1,2], and have raised their expectations of receiving a thermally comfortable indoor environment in the classroom setting [3,4]. Although 50–60% of total energy consumption in a classroom is consumed by the heating, ventilation, and air conditioning (HVAC) system in order to create a thermally comfortable environment, student satisfaction levels with indoor thermal comfort are low [5,6]. The ASHRAE Global Thermal Comfort Database II [7] shows the big picture regarding overall classroom thermal comfort satisfaction level according to satisfaction feedback from individual classroom users: only 52.8% of classroom users (total sample size: 10,903) were satisfied with the thermal comfort of their environment, whereas the indoor environment thermal comfort acceptance level set by ASHRAE Standard 55 was 80% [8]. The database also suggests that student dissatisfaction with the classroom environment is commonly found in both developed and developing countries [9–11]. This study is based on the internationally authorised open research database on thermal comfort, and explores the possible factors and mechanisms that affect individual perceptions of thermal comfort in the classroom environment using a machine learning approach and an innovative model interpretation method.

Providing a high-quality thermal environment and promoting student satisfaction with thermal comfort are challenges according to studies of thermal comfort in classroom environments, as variation in the thermal perceptions of individual students results in diverse thermal comfort requirements [7,12,13]. In the past, the PMV-PPD model [8,14] and the adaptive thermal comfort model [15,16] were commonly adopted to predict users’ thermal perceptions in various types of indoor environments; however, individual differences are insufficiently accounted for in the models [12,17]. These models therefore need to be improved, in order to create high levels of thermal comfort in a classroom environment, and to meet students’ increased demands for

* Corresponding author.
E-mail address: cynthia.hou@polyu.edu.hk (H.(C.) Hou).

https://doi.org/10.1016/j.buildenv.2023.110259
Received 21 December 2022; Received in revised form 19 February 2023; Accepted 29 March 2023
Available online 2 April 2023
0360-1323/© 2023 Elsevier Ltd. All rights reserved.

Abbreviations

ACTN3 α-actinin-3
ASHRAE American Society of Heating, Refrigerating, and Air-Conditioning Engineers
AUC Area Under the Receiver Operating Characteristic
BMI Body Mass Index
BNN Bayesian Neural Network
BO Bayesian Optimisation
Clo Clothing Thermal Resistance
DT Decision Tree
GBDT Gradient Boosting Decision Tree
GMM Gaussian Mixture Model
HVAC Heating, Ventilation, and Air Conditioning
KNN K-Nearest Neighbour
LOWESS Locally Weighted Scatterplot Smoothing
LR Logistic Regression
Met Basal Metabolic Rate
PCM Personal Comfort Model
PMV-PPD Predicted Mean Vote-Predicted Percentage of Dissatisfied
RF Random Forest
ROC Receiver Operating Characteristic
SET Standard Effective Temperature
SHAP SHapley Additive exPlanations
SVM Support Vector Machine
TAM Thermal Adaptive Model
TPV Thermal Preference Vote
TSV Thermal Sensation Vote
XGBoost eXtreme Gradient Boosting

thermal comfort.

A growing number of studies on personal comfort models (PCMs) have been developed in recent years to provide a more accurate predictive description of the thermal comfort characteristics of individual classroom users, and to meet the diverse individual thermal comfort requirements in a classroom environment [15,16]. Field tests and questionnaire surveys are the two main methods used in these studies to collect objective and subjective data to support PCM development. Statistical methods, such as linear regression, t-tests, ANOVA, and correlation, have been used for data analysis [2,9,18–20]. Machine learning algorithms have gradually been adopted in PCM research in more recent years [21–23]. Their prediction accuracy has proven to be better, due to their capacity to model nonlinear connections and interactions between factors [24,25].

Even though PCMs can provide a higher level of prediction accuracy regarding the thermal comfort of individual classroom users, the requirements for sample size are still a big challenge, as the process of collecting data is often time-consuming, labour-intensive, and expensive [15,26]. The scope of these studies is also usually confined to a certain room, a certain building, or a certain school. These studies also focus on increasing model accuracy rather than model explanation. Although previous studies have investigated the relationship between different factors that affect thermal comfort, such as human factors (e.g. BMI, metabolic rate) and environmental factors (e.g. air temperature, humidity), they have not thoroughly explored how these factors interact with each other [27,28]. Although an increasing number of studies have pointed out that the influencing factors of thermal comfort affect thermal comfort alone, and also interact with other factors to affect thermal comfort jointly [7,9,29,30], there is little detailed qualitative and quantitative analysis of the interaction effects [31–33].

In order to address the limitations of previous classroom thermal comfort studies, such as the relatively small sample size, the location-constrained scope, and lack of interpretative explanations regarding how models make decisions, this study conducted analyses based on the ASHRAE Global Thermal Comfort Database II [34], an open-source database of indoor climatic observations with accompanying “right-here-right-now” subjective evaluations by the building occupants who were exposed to the conditions. The analyses include using more advanced machine learning algorithms, parameter optimisation strategies, and model interpretation methods, to examine the influencing factors (built physical environment factors, personality factors, climate factors, geographical factors), and their mechanisms (individual differences and interactive effects). This study contributes to the future updating of thermal comfort standards in order to meet the needs of individuals for thermal comfort, and to facilitate the optimisation of classroom environmental controls to achieve energy efficiency in classrooms and education buildings [10,19,21].

One of the main objectives of this study is to develop a machine-learning model to investigate human-classroom environment relationships based on the ASHRAE Global Thermal Comfort Database II. Another objective is to improve the transparency and interpretability of the machine learning model, making its results more understandable. This study thus involves the following specific research questions.

(1) What are the key influencing factors in classroom thermal comfort, and how is their importance ranked?
(2) What influencing factors reflect individual differences in thermal comfort, and how do these individual differences affect the thermal perceptions of classroom users?
(3) What are the intensity, proportion, and types of interactive effects between key influencing factors in thermal comfort?

2. Literature review

2.1. Thermal comfort study in educational buildings

Thermal comfort is a well-researched topic, and there have been numerous studies on the influencing factors on thermal comfort in educational buildings [9,11,35–39]. Zomorodian et al. [9] reviewed 50 years of thermal comfort field studies in educational buildings and found that students’ thermal preferences were not within the comfort range provided in the standards. This finding highlights the importance of conducting in-depth thermal comfort studies to gain a better understanding of the specific factors that influence the thermal comfort of occupants in educational buildings. Manoj [37] reviewed 93 research articles selected from the Scopus database. It was found that students across different educational stages were dissatisfied with indoor thermal environments and preferred cooler temperatures, supporting the need for separate guidelines or standards based on age and education stage.

A field study led by Zhang [35] investigated the thermal comfort of university classrooms in a hot and humid climate. The study found that indoor temperature and air velocity were the most critical factors for thermal comfort in classrooms. As educational buildings are often occupied by people who are sitting still for long periods, such as in classrooms or lecture halls, temperature and air flow within these spaces must be carefully controlled to avoid discomfort and distraction, particularly in extreme weather conditions. Similarly, a study by Liu et al. [11] evaluated thermal comfort in a primary school in China. The study found that classroom air temperature, air velocity, and relative humidity all had a significant impact on thermal comfort, with better performance on tests in thermally comfortable classrooms. Other factors such as activity level, clothing, and personal factors (gender, age, BMI) should also be considered in the design and operation of educational buildings for the comfort and health of occupants, according to other


studies [36,38,39].

Overall, these studies demonstrate the importance of thermal comfort in educational buildings and the need for more research on the topic. They highlight the factors that influence thermal comfort, the impact of design strategies, and the relationship between thermal comfort and student performance.

2.2. Individual differences

Individual differences in thermal comfort describe the phenomenon wherein subjects might have different perceptions of thermal comfort even if they are exposed to the same thermal environment [30,40]. Addressing individual differences and satisfying every student in the same classroom environment is difficult, however, because what suits one group or type of occupant may be unacceptable to others [41].

It is necessary to understand such individual differences in the thermal comfort of a classroom in order to provide a more satisfying classroom environment for students [9,33,42]. By understanding these individual differences, steps can be taken to enhance the flexibility of HVAC systems, so that they can cater to the varying needs of different users and enable more people to feel comfortable and satisfied.

Fanger [43] had already noticed in the 1970s that individuals are different as regards thermal comfort, and that they are affected by age, gender, adaptation, and other factors. Many researchers were inspired by Fanger’s studies, and started investigating individual differences in thermal comfort. Those studies point out that there are differences in thermal comfort between males and females. Females may feel less comfortable at lower temperatures than males, which is more evident when people are working, due to their lower metabolic rate and body fat percentage [16,33]. The thermal comfort of both males and females changes with age, and older people will prefer a warmer thermal environment, due to a lower metabolic rate, reduced blood flow, and thinner skin [44,45]. Overweight people have a lower thermoneutral temperature due to higher thermal insulation and metabolic rate [46,47]. Humphreys and Nicol [48] concluded that individual differences arise from phenomenological differences, including (1) inter-individual differences in thermoneutrality, (2) inter-individual differences in the interpretation of semantic scale categories, and (3) intra-individual changes in semantic judgments over time. Rupp et al. [34] classified the sources of individual differences into two categories: (1) physiological sources, that is, metabolic thermal differences between individuals and age groups, and (2) cultural and behavioural differences expressed through clothing insulation.

2.3. Interactive effects

Interaction effects exist when the effects of two or more features together are not equal to the sum of their separate effects [49]. For example, heavy activities in a high air temperature environment, which increase metabolic rate and potentially increase body temperature, could result in a higher thermal sensation [50]. Similarly, slight activities in a low air temperature environment could increase airflow around the body, and may lead to lower thermal sensations [51]. The interactive effects of these factors on thermal comfort depend on the level of activity undertaken, the respondent’s metabolic rate, and the air temperature of the environment.

In a classroom context, the thermal comfort of students in an indoor environment is a comprehensive reflection of the interaction of various environmental and non-environmental factors [33]. Past indoor environment studies have examined the interactive effects of various driving factors on thermal comfort, including personal factors, such as gender, age, and BMI; contextual factors, such as building design, building typology, season, and climate; environmental interactions, such as lighting, acoustics, and indoor air quality; and cognition factors, such as attitudes, preferences and expectations [7,32,33,52].

2.4. Machine learning in the thermal comfort open dataset

With the advancement of data science, an increasing number of researchers are focusing on the use of machine learning algorithms in developing data-driven models for predicting the thermal sensations of occupants within buildings [22,23,53,54]. Researchers recognised the value of open-source research databases in advancing the science and technology of HVAC, and started to apply machine learning algorithms to explore the ASHRAE Global Thermal Comfort Database II [21,31,53,55–57]. Table 1 summarises recent studies using machine learning approaches to analyse the Global Thermal Comfort Database II. The table outlines the datasets (data source, building types, samples, and location), methodologies (algorithms, cross-validation, and hyperparameter optimisation), input features (influencing factors and thermal comfort indicators), and results (optimal algorithm and accuracy) of previous studies.

As shown in Table 1, the influencing factors can be divided into four categories: indoor environmental factors (indoor air temperature, indoor relative humidity, indoor radiation temperature, etc.), outdoor environmental factors (outdoor temperature, season, climate, etc.), human factors (Clo, Met, age, sex), and building information factors (location, operation, etc.). The studies usually adopted the 3-point TSV or 3-point TPV as the predicted thermal comfort index, for the sake of practical applicability and model accuracy. It is simpler and easier to improve machine learning accuracy by using the 3-point indicator. Machine learning algorithms such as the support vector machine (SVM), decision trees (DT), random forest (RF), and K-nearest neighbour (KNN) were extensively employed for model training. Five-fold and ten-fold cross-validation, and manual tuning or grid search, are also commonly utilized to optimise the hyperparameters of models. In summary, these studies use a large number of samples and factors for machine learning model training, which can provide higher prediction accuracy regarding thermal comfort sensation, and a broader perspective from which to understand human thermal comfort.

2.5. Limitations in previous studies

Previous thermal comfort studies have recognised individual differences in, and interactive effects on, thermal comfort, and an increasing number of studies take them into account in developing machine learning models. Previous studies are limited, however. Firstly, differences in individual perception and interactive effects need to be further explored. Individual differences need to be explored from a wider perspective. Although there have been some studies on the effects of individual differences on thermal comfort, more data and wider research are necessary to confirm the generalisability and reliability of these results. Further use of larger datasets and broader research scope is needed to confirm the impact of individual differences on thermal comfort. Additionally, the intensity, proportion, and types of interactive effects between influencing factors are still uncertain. While some researchers have recognised the importance of machine learning algorithms and hyperparameters in modelling performance when developing thermal comfort models, more advanced machine learning algorithms and hyperparameter optimisation techniques are less commonly used for classroom thermal comfort studies [22,54]. Based on the findings and limitations of previous studies, this paper proposes the following hypotheses.

H0. A combination of sophisticated machine learning algorithms and advanced hyperparameter optimisation methods can lead to better performance and machine learning results.

H1. The impact of individual differences and interactive effects on thermal comfort can be recognised using a machine learning model.

H2. Individual differences and interactive effects recognised by machine learning models can be clearly interpreted using domain

Table 1
Related works on indoor thermal comfort using machine learning approaches.
Column groups: Studies (Reference, Purpose); Dataset (Data source, Building type, Data processing, Samples); Methodology (Algorithms, Cross-validation, Hyperparameter optimisation); Input features (Influencing factors, Thermal comfort indicators); Result (Optimal algorithm, Accuracy).

| Reference | Purpose | Data source | Building type | Data processing | Samples | Algorithms | Cross-validation | Hyperparameter optimisation | Influencing factors | Thermal comfort indicators | Optimal algorithm | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wang et al. (2020) [53] | Testing the effectiveness and accuracy of subjective thermal metrics on occupants’ thermal experience | ASHRAE Global Thermal Comfort Database II | Classroom, office, multifamily housing, senior centre | Remove missing value; numeric encoding | 16,795 records | LR, SVM | 10-fold | Manual tuning | Subjective thermal comfort vote (TPV, TSV, PMV, TAV, AMV, AMP) | 2-point TCV; 3-point TPV | LR | Accuracy = 0.87 (2-point TCV); Accuracy = 0.64 (3-point TPV) |
| Luo et al. (2020) [31] | Comparing machine learning algorithms in predicting thermal sensation | ASHRAE Global Thermal Comfort Database II | Classroom, office, multifamily housing, senior centre | Remove missing value; numeric encoding | 10,618 records | LR, NB, ANN, KNN, AB, DT, GBM, RF, SVM | 20-fold | Grid search | Tair, Vair, RH, SET, CLO, MET, Age, Sex, Tout, Season, Operation mode, building type, etc. | 3-point TSV; 7-point TSV | RF | Accuracy = 66.3% (3-point TSV); Accuracy = 61.1% (7-point TSV) |
| Farhan et al. (2015) [56] | Predicting individual thermal comfort in office | ASHRAE RP884 | Office | Remove missing value; normalized; numeric encoding | 12,000 records | AdaBoost, RF, SVM | No | Manual tuning | Tair, Vair, RH, SET, CLO, MET, Age, Sex, Tout, Season, etc. | 3-point TCV | SVM | Accuracy = 0.757 |
| Lu et al. (2019) [21] | Developing adaptive thermal comfort models | ASHRAE RP884 | Office | Feature selection using tree-based estimator; numeric encoding | 5576 records | SVM, RF, KNN | 5-fold | Grid search | Tair, Tra, Tout, RH, RHout, Vair, CLO, MET, etc. | 7-point TSV | KNN | Recall = 49.30% |
| Ma et al. (2021) [55] | Building a predictive model for occupant thermal preference | ASHRAE Global Thermal Comfort Database II | Office, classroom, multifamily housing | Replace the missing value with median value; numeric encoding | 78,113 records | BNN | 5-fold | Bayesian optimisation | Tair, Tra, Tout, Top, RH, RHout, Vair, CLO, MET, Weight, Age, Operation mode, etc. | 3-point TPV | BNN | Accuracy = 0.703; Precision = 0.693; AUC = 0.838 |
| Lala et al. (2022) [57] | Validating the multi-task thermal comfort prediction model | ASHRAE Global Thermal Comfort Database II | Classroom (students under age of 14) | Remove missing value; numeric encoding | 1894 records | RF, DT, KNN, AdaBoost, DNN, Deep comfort | 5-fold | Grid search | Tair, Tra, Tout, Top, RH, RHout, Vair, CLO, MET, Weight, Age, Operation mode, climate and weather, etc. | 3-point TSV; 3-point TPV; 3-point TCV | Deep comfort | Precision = 0.9; Recall = 0.9; F1-score = 0.9 |

knowledge combined with new machine learning interpretation techniques.

3. Methodology

The study method can be divided into three main sections: data cleaning, model training, and model explanation. Fig. 1 is a flow chart illustrating the research framework for this study. Firstly, the indicators and influencing factors related to classroom thermal comfort were selected from the ASHRAE Global Thermal Comfort Database II, according to the research objectives of this paper and the literature review. The cleaned data was then used to train the hybrid model of extreme gradient boosting (XGBoost) and Bayesian optimisation (BO) to obtain the best performance model. Finally, the SHAP value method was used to explain the optimal performance model in order to understand how individuals respond to influencing factors and how the interaction between influencing factors affects thermal comfort.

3.1. Data source and processing

3.1.1. Thermal comfort database

The data source for this paper is the ASHRAE Global Thermal Comfort Database II. It is an open-source database launched under the leadership of the University of California at Berkeley’s Center for the Built Environment, and the University of Sydney’s Indoor Environmental Quality (IEQ) Laboratory [54]. The dataset was generated on the basis of field experiments. Both instrumental (indoor climatic) and subjective (questionnaire) data were required, and were recorded in the same space simultaneously. After the quality-assurance process, there was a total of 83,316 rows of data of paired subjective comfort votes and objective instrumental measurements of thermal comfort parameters forming the latest version of the database, which was then combined with an additional 25,617 rows of data from the original ASHRAE RP-884 database, bringing the total number of entries to 109,033 [58].

3.1.2. Variable selection

The variable selection was based on a thorough analysis of the previous literature and the availability of database data. The dependent variable is student satisfaction with the thermal environment in the classroom. The literature review reveals that subjective indicators such as thermal sensation vote (TSV) and thermal preference vote (TPV) are commonly used to assess thermal comfort. Conversely, objective indicators, including standard effective temperature (SET) and predicted mean vote (PMV), are more frequently utilized for the evaluation of

Fig. 1. The research framework of the study.


indoor thermal comfort [21,53,55,59]. Therefore, the four thermal comfort indicators (dependent variables) were initially chosen (Table 2). Additionally, twenty influencing factors were chosen as independent variables. These factors can be divided into four categories: personality, climate, operation, and indoor environment. Table 3 contains the detailed classification and description of these influencing factors [21,55].

Table 3
Characteristics related to thermal comfort.

| Categories | Factors | Description |
|---|---|---|
| Personality | Age | Age of subject [years] |
| | Gender | Gender of subject [female = 1, male = 2] |
| | Clo | Intrinsic clothing ensemble insulation of the subject [clo] |
| | Met | Average metabolic rate of the subject [met] |
| | Height | Height of subject [m] |
| | BMI | Underweight (BMI < 18.5), normal (18.5 ≤ BMI < 24), overweight (24 ≤ BMI < 29) and obese (BMI ≥ 29) |
| Climate | Season | Spring = 1, Summer = 2, Autumn = 3, Winter = 4 |
| | Köppen climate classification | Tropical = 1, Dry = 2, Temperate = 3, Continental = 4 |
| | Outdoor air temperature | Outdoor air temperature from original dataset [°C] |
| | Outdoor air humidity | Outdoor relative humidity from original dataset [%] |
| Operation | Curtain | State of internal blinds or curtains [0 = on, 1 = off] |
| | Window | State of window [0 = on, 1 = off] |
| | Door | State of door [0 = on, 1 = off] |
| | Fan | State of fan [0 = on, 1 = off] |
| | Cooling type | Mixed mode = 1, Air conditioned = 2, Naturally ventilated = 3 |
| Indoor environment | Indoor air temperature | Air temperature measured in the occupied zone [°C] |
| | Indoor radiant temperature | Radiant temperature measured in the occupied zone [°C] |
| | Globe temperature | Globe temperature measured in the occupied zone [°C] |
| | Indoor air humidity | Relative humidity [%] |
| | Indoor air speed | Air speed measured in the occupied zone [m/s] |

3.1.3. Data cleaning

After combining the measurement data with the metadata, a raw dataset of 109,033 objects and 55 variables is obtained. However, the ASHRAE Global Thermal Comfort Database II is a compilation of datasets contributed by scientists and researchers from around the world. The contributed datasets are not uniform, because some studies recorded certain measurement variables while others did not. As a result, missing values are common in the database (data marked as N/A is considered missing), and a data cleaning procedure is necessary to obtain a more uniform dataset.

Fig. 2a illustrates the data cleaning process, which involves four steps. Firstly, only the records with the classroom building type were selected, because the study mainly focuses on the thermal comfort of the classroom environment. Secondly, the selected variables (including independent and dependent variables) were retained, and column variables that were duplicated in meaning or irrelevant to the research were removed. Thirdly, the rows containing missing values (marked as N/A) were deleted. Finally, an encoding process was applied to convert categorical data (textual data) to numeric data. The final cleaned dataset consisted of 4527 records and 24 variables (Fig. 2b), spread across 12 cities on three continents (Fig. 2c).
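The four cleaning steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the column names (`building_type`, `season`, `air_temp`, `tsv`), the toy records, and the helper `clean` are hypothetical and do not reflect the real database schema. Only the order of the steps, the treatment of N/A as missing, and the season codes of Table 3 follow the text.

```python
# Hypothetical column names and toy records; not the real database schema.
KEEP = ["season", "air_temp", "tsv"]  # selected variables (assumed for the example)
SEASON_CODES = {"Spring": 1, "Summer": 2, "Autumn": 3, "Winter": 4}  # codes as in Table 3


def clean(records):
    """Apply the four cleaning steps to a list of dict records."""
    # Step 1: keep only records from classroom buildings.
    rows = [r for r in records if r.get("building_type") == "Classroom"]
    # Step 2: retain the selected variables and drop every other column.
    rows = [{k: r.get(k) for k in KEEP} for r in rows]
    # Step 3: delete rows that contain a missing value (None or "N/A").
    rows = [r for r in rows if all(v not in (None, "N/A") for v in r.values())]
    # Step 4: encode categorical (textual) data as numbers.
    for r in rows:
        r["season"] = SEASON_CODES[r["season"]]
    return rows


raw = [
    {"building_type": "Classroom", "season": "Summer", "air_temp": 27.5, "tsv": 1, "notes": "x"},
    {"building_type": "Office", "season": "Winter", "air_temp": 21.0, "tsv": 0, "notes": "y"},
    {"building_type": "Classroom", "season": "Winter", "air_temp": "N/A", "tsv": -1, "notes": "z"},
]
cleaned = clean(raw)
```

In this toy input, the office record is removed in step 1 and the classroom record with an N/A air temperature is removed in step 3, so a single encoded record remains.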
Table 2
Thermal comfort indicators.

| Indicator type | Thermal comfort indicator | Description | Value range |
|---|---|---|---|
| Subjective indicators (questionnaire) | TSV | Thermal sensation vote | From −3 (cold) to 3 (hot) |
| | TPV | Thermal preference vote | −1 (prefer warmer), 0 (no change), 1 (prefer cooler) |
| Objective indicators (calculation from measurement) | PMV | Predicted mean vote | From −3 (cold) to 3 (hot) |
| | SET | Standard effective temperature | 10 to 40 °C |
| | OPT | Operative temperature | 10 to 40 °C |

3.2. Machine learning model training

After cleaning the data, model training proceeded. This study adopts the XGBoost algorithm based on Bayesian optimisation to explore the cleaned data generated from the ASHRAE Global Thermal Comfort Database II. The following subsections introduce the XGBoost algorithm and the Bayesian optimisation process.

3.2.1. eXtreme gradient boosting algorithm

The eXtreme Gradient Boosting (XGBoost) algorithm, a tree-based algorithm with strong interpretability, was selected to provide an accurate interpretation of individual variation in human thermal perception [60,61]. It has advantages over traditional tree algorithms, such as RF and GBDT. XGBoost adds regularisation terms to the objective function to control model complexity, and supports column sampling, so as to prevent model overfitting and reduce computation time. It also builds all the subtrees that can be built from top to bottom, and then prunes in reverse from the bottom to the top, which helps the model avoid falling into local optimal solutions. This algorithm has been applied successfully in many engineering fields, and produced outstanding performance [61]. As demonstrated by Léo Grinsztajn [62] in multiple studies on medium-sized (1–10k samples) tabular datasets, XGBoost has been shown to have superior performance in processing tabular data and even to outperform some deep learning models.

The objective function of XGBoost consists of a loss function and a regularisation term, which can be described by Equation (1).

$$\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k) \tag{1}$$

where $l(y_i, \hat{y}_i)$ is the loss function; $\sum_{k=1}^{t} \Omega(f_k)$ is the regularisation term; $\hat{y}_i$ is the predicted value; $y_i$ is the true value; $x_i$ is the feature vector of sample $i$; and $f_k$ is the $k$-th tree model ($k = 1, \dots, t$).

The minimum value of the objective function can also be briefly written as Equation (2). The specific formula derivation process is shown in Appendix 1.

$$\widehat{\mathrm{Obj}} \cong -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda} + \gamma T \tag{2}$$

$G_j$ is the sum of the first partial derivatives of the samples contained in leaf node $j$ of the tree model, and $H_j$ is the sum of the second partial derivatives of the samples contained in leaf node $j$. $T$ is the number of leaf nodes, $\gamma$ is the parameter that penalises the number of leaf nodes, and $\lambda$ controls the regularisation intensity.

Equation (2) makes the model’s splitting criterion easier to understand. XGBoost calculates the Gain as the splitting criterion of a node (as shown in Equation (3)). The Gain uses the objective function to assess the change in model performance once a certain node in the decision tree is split. If the Gain shows an improvement over the unsplit node, the split will be used; otherwise, the split will be discontinued.
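The splitting rule can be made concrete with a small numerical sketch for a single candidate split. The helper names `leaf_score` and `split_gain` and the toy gradient and Hessian sums are assumptions introduced for illustration. The formula is the leaf score G²/(H + λ) from Equation (2) with the per-leaf penalty γ, and the code is not the XGBoost library's implementation.

```python
def leaf_score(G, H, lam):
    """Score G^2 / (H + lambda) of a leaf with gradient sum G and Hessian sum H."""
    return G * G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into a left and a right child.

    0.5 * [score(left) + score(right) - score(parent)] - gamma,
    where the parent holds the combined sums GL + GR and HL + HR.
    """
    parent = leaf_score(GL + GR, HL + HR, lam)
    children = leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
    return 0.5 * (children - parent) - gamma


# Toy sums (assumed values): the children have gradients of opposite sign,
# so separating them lowers the objective and the gain is positive.
gain_good = split_gain(GL=-4.0, HL=3.0, GR=4.0, HR=3.0, lam=1.0, gamma=0.5)

# Children with identical statistics: the split does not pay for the extra
# leaf, so the gain is negative and the split would be rejected.
gain_bad = split_gain(GL=2.0, HL=3.0, GR=2.0, HR=3.0, lam=1.0, gamma=0.5)
```

With these toy values, `gain_good` evaluates to 3.5 and `gain_bad` is negative, which mirrors the rule in the text: the first split would be kept and the second discontinued. Raising `gamma` makes every split more expensive, which is how the penalty on the number of leaves limits tree growth.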


Fig. 2. The process of data cleaning.

$$\mathrm{Gain} = \max \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right] - \gamma \tag{3}$$

where $G_L^{2}/(H_L+\lambda)$ and $G_R^{2}/(H_R+\lambda)$ refer to the loss values of the left and right leaf nodes after splitting, and $(G_L+G_R)^{2}/(H_L+H_R+\lambda)$ refers to the loss value without splitting.

3.2.2. Hyperparameters optimisation
One drawback of the XGBoost algorithm is that it requires considerable parameter tuning, which frequently entails assessing tens of thousands of hyperparameter combinations. Grid search and random search are common approaches to parameter tuning. Grid search iterates across the entire search space, making it thorough but extremely slow. Random search is fast, as it samples the search space at random, but it could easily miss the most important points in the search space. Bayesian optimisation for parameter tuning involves determining the best set of hyperparameters within several iterations (see Appendix 2 for the detailed principle). It uses observed historical information (prior knowledge) for subsequent optimisation [63]. In other words, Bayesian optimisation is designed to minimise the number of evaluations required to find the optimal solution, making it more data-efficient than other methods. Furthermore, it is designed to search for the global optimum rather than settling on a local optimum, making it a good choice for complex problems with many local optima [64]. Because this study focuses on both accuracy and speed in hyperparameter optimisation, Bayesian optimisation is a clearly superior choice compared with grid search and random search.
This study combines Bayesian optimisation with k-fold cross-validation to optimise the XGBoost algorithm's hyperparameters. The entire hyperparameter optimisation process is shown in Fig. 3. In the first step, the dataset was split into the training-test set (80%) and an independent validation set (20%). The training-test set (Fig. 3a) includes the inner loop for hyperparameter tuning, using Bayesian optimisation based on k-fold cross-validation, and the outer loop to test how well the best-performing models can generalise on test sets. During the process, n models are created by setting the number of iterations (n). The top 10 models with the best performance were then chosen after ranking the n models according to the defined assessment criterion. The second step was examining the prediction accuracy of the top 10 models using the independent validation set (Fig. 3b). In this step, the model with the best capacity for generalisation is retained, and overtrained models are disqualified. In the last step, the optimised parameters were used to re-train the model on the entire dataset (Fig. 3c).

3.3. SHAP value for model interpretation

Machine learning models are often criticised as black boxes because it is hard to understand how they make predictions [65,66]. Even though a feature importance plot or a partial dependence diagram can present the influence of factors in the model, they fail to show how the features relate to individual differences and to interactions between factors [66]. The SHAP (SHapley Additive exPlanations) value method is based on cooperative game theory, and can be used to increase the transparency and interpretability of machine learning models [66]. SHAP values can explain the importance of the factors, how individuals respond to changes in influencing factors, and how the interactions between influencing factors affect the outcomes [66,67].


Fig. 3. The process of hyperparameter optimisation.
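The three steps summarised in Fig. 3 can be sketched in a few lines of standard-library Python. This is only a structural sketch under loud assumptions: a 1-D k-nearest-neighbour classifier stands in for XGBoost, uniform random sampling stands in for the Bayesian optimiser's proposals, accuracy stands in for AUC, and all function names and the toy data are hypothetical. The separate outer test loop of Fig. 3a is omitted for brevity.

```python
import random
from statistics import mean


def make_data(n, rng):
    """Toy 1-D two-class data (hypothetical stand-in for the cleaned records)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (stand-in model)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def cv_score(data, k, folds=5):
    """Mean accuracy over `folds` cross-validation folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        scores.append(accuracy(train, test, k))
    return mean(scores)


def tune(data, n_iter=30, top=10, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    # Step 1: 80% training-test set, 20% independent validation set.
    train_test, validation = shuffled[:cut], shuffled[cut:]

    # Inner loop: a real run would let a Bayesian optimiser propose each
    # candidate from the scores seen so far; random sampling stands in here.
    trials = []
    for _ in range(n_iter):
        k = rng.randint(1, 25)
        trials.append((cv_score(train_test, k), k))
    top_models = sorted(trials, reverse=True)[:top]

    # Step 2: keep the candidate that generalises best to the validation set.
    best_k = max((k for _, k in top_models),
                 key=lambda k: accuracy(train_test, validation, k))

    # Step 3: refit on the entire dataset with the chosen setting.
    final_model = lambda x: knn_predict(data, x, best_k)
    return best_k, top_models, final_model


data = make_data(200, random.Random(1))
best_k, top_models, final_model = tune(data)
print(best_k, final_model(5.0), final_model(-3.0))
```

Ranking by cross-validated score and then re-checking the shortlist on held-out data is what guards against choosing a setting that only looks good on the folds it was tuned on.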

The principle of the SHAP value method involves interpreting the prediction results of the model by calculating the contribution of each factor or combination of factors [67]. The formulas for the SHAP value and the SHAP interaction value are shown in Equation (4) and Equation (5).

$$g(z') = \varphi_0 + \sum_{i=1}^{M} \varphi_i z'_i \tag{4}$$

$$g(z') = \varphi_0 + \sum_{i=1}^{M}\sum_{j=1}^{M} \varphi_{ij}\, z'_i z'_j \tag{5}$$

where $g(z')$ is the explanatory model; $\varphi_0$ is the average value of the label; $\varphi_i$ is the SHAP value; $\varphi_{ij}$ is the SHAP interaction value; and $z' \in \{0, 1\}^M$ is the simplified input whose entries $z'_i$ and $z'_j$ indicate the factors: an entry of 1 means that the factor is present and an entry of 0 means that the factor is absent.
The SHAP value can be calculated using Equation (6).

$$\varphi_i(f, x) = \sum_{z' \subseteq x'} \frac{|z'|!\,(M - |z'| - 1)!}{M!}\left[f_x(z') - f_x(z' \setminus i)\right] \tag{6}$$

where $\varphi_i$ is the SHAP value for feature $i$; $f$ is the black-box model; $x$ is the input data point; $x'$ is the simplified data input; $z'$ is a subset of $x'$; $|z'|$ is the number of features in $z'$; $M$ is the total number of features; $|z'|!\,(M - |z'| - 1)!/M!$ can be regarded as the weighting; and $f_x(z') - f_x(z' \setminus i)$ is the contribution of feature $i$.

4. Results

4.1. Comparison of the indexes of thermal comfort

The correlation analysis for thermal comfort indexes was carried out to select the appropriate thermal comfort index as the label of the training set. Fig. 4a shows the correlation analyses between SET, 7-point TSV, 3-point TPV, and PMV. The graphs show that when SET is equal to 25 ± 1.5 °C, classroom users are in the thermal neutral state, but it can also be seen that the correlation between PMV and SET (0.7902) is much higher than that between TSV and SET (0.1487) and between TPV and SET (0.1523).
The results of the correlation analysis among TSV, TPV, and PMV (Fig. 4b) show that the correlation between TSV and TPV is 0.531, which is higher than the correlation between TPV and PMV (0.1887) and the correlation between TSV and PMV (0.1722). SET and PMV, derived from objective data, have a strong correlation. Similarly, TSV and TPV, which are subjective vote data, have a strong correlation. The correlation between objective and subjective data is weak, however, which means there are gaps between the thermal comfort indexes calculated based on the objective data and the subjective thermal comfort indicated by classroom users. In other words, the thermal comfort index calculated based on objective data cannot fully reflect the thermal comfort perception of classroom users.
Fig. 4c shows that the distributions of 3-point TPV and 7-point TSV are consistent. Classroom users who prefer cooler conditions have TSVs ranging from 1 to 3. Classroom users who prefer no change in thermal conditions have TSVs of almost 0. Classroom users who prefer warmer conditions have TSVs between −3 and −1. This paper therefore adopts the 3-point TPV as the label of the training set, because TPV not only has a high correlation with TSV but also has fewer classes, which is more conducive to improving the classification accuracy of the XGBoost model.

4.2. Bayesian optimisation

In order to select the most robust model, this study conducted multiple trials and ultimately set 250 iterations. Each iteration was tested using 5-fold cross-validation to ensure its stability. The tuning result of the hyperparameters based on the Bayesian optimisation is shown in Fig. 5a. The black dashed lines depict the specific parameter values that are used across the top 10 best-performing models. The green dashed line depicts the best-performing model without the cross-validation approach, and the red dashed line depicts the best-performing model


Fig. 4. Comparison of subjective and objective thermal comfort indicators.

Fig. 5. Tuned hyperparameters during the 250 iterations for the XGBoost model.


with cross-validation. It is clear that the distribution range of the parameters becomes narrower as the performance of the model improves. Taking the parameter colsample_bytree as an example, the values are selected in the range from 0 up to 1. The range of the top 10 best-performing models is from 0.7 to 1, and the value of the best-performing model is 0.74. The change of range for all nine parameters is summarised in Table 4. To summarise, compared with the full search range, the distribution ranges of the hyperparameters in the top 10 models tend to be more centralised. Additionally, even though models without cross-validation have similarities with those that have undergone cross-validation, there are still noticeable differences, particularly in the hyperparameters learning_rate and reg_alpha.
Fig. 5b shows the performance of all evaluated models. In the scatter plots, each dot is one of the 250 models sorted on the iteration number. The horizontal axis of the plot is the evaluation criterion for model performance (AUC). It can be seen that as the number of iterations increases, the performance of each model tends to be better. The top 10 models are mostly clustered between 140 and 160 iterations, and when the number of iterations is 156, the model reaches its best performance. It is notable that there are differences between the best-performing model with cross-validation and the best-performing model without cross-validation. The curves in Fig. 5c show the performance of the optimal model on the training data and the testing data. In both curves, the logarithmic loss (logloss) on the testing data is a little higher than on the training data. As the number of epochs increases, the logloss decreases in both datasets, but the gap between them becomes wider. After about 120 epochs, the gap between the two datasets becomes stable. The accuracy of the optimised model on the testing set is slightly lower than that on the training set, but the overall performance is basically consistent, which means the optimal model has good generalisability.

Table 4
The hyperparameters of XGBoost that need to be optimised.

| No. | XGBoost hyperparameter | Description | Search range | Range of top 10 models | Optimal value |
|---|---|---|---|---|---|
| 1 | colsample_bytree | The subsample ratio of columns for each tree. | (0,1] | 0.7–1 | 0.74 |
| 2 | gamma (min_split_loss) | Minimum loss reduction required to make a further partition on a leaf node of the tree. | [0,10] | 0.3–1.7 | 0.5 |
| 3 | eta (learning_rate) | Step size shrinkage used in update to prevent overfitting. | [0,1] | 0–0.5 | 0.374 |
| 4 | max_depth | Maximum depth of a tree. | [0,40] | 12–29 | 29 |
| 5 | min_child_weight | Minimum sum of instance weight (hessian) needed in a child. | [0,10] | 2–4 | 3 |
| 6 | n_estimators | The number of weak estimators integrated. | [0,250] | 40–200 | 140 |
| 7 | alpha (reg_alpha) | L1 regularisation term on weights. | (0,1] | 0.3–0.6 | 0.44 |
| 8 | lambda (reg_lambda) | L2 regularisation term on weights. | (0,100] | 0–15 | 3 |
| 9 | subsample | Subsample ratio of the training instances. | (0,1] | 0.75–1 | 0.83 |

4.3. Accuracy of the model

The model is compared with XGBoost using the default hyperparameters to determine whether its performance is improved after Bayesian optimisation. This study compares the confusion matrix and the area under the ROC curve (AUC) to evaluate the performance of the two models. The confusion matrix shows each class in the evaluation data and the number of correct and incorrect predictions. The performance metrics can be compared, including accuracy, precision, sensitivity, specificity, and F1-score values (the definitions of the metrics are shown in Appendix 3).
A general stratified sampling method was employed to select 20% of the cleaned dataset as the test set. The test set and the training set were designed to have the same distribution. Among the students in the dataset, 13.3% preferred a warmer environment, 52.9% preferred no change, and 32.8% preferred a cooler environment. This resulted in a test set of 129 students who preferred a warmer condition, 479 who preferred no change, and 298 who preferred a cooler condition. In the default XGBoost model (Fig. 6a), the number of samples agreeing with the prediction was 23 (prefer warmer), 395 (prefer no change), and 163 (prefer cooler), resulting in recall rates of 17.8%, 82.4%, and 54.7%, respectively. In contrast, the predictions of XGBoost based on Bayesian optimisation (Fig. 6b) are consistent with the actual samples for 98 (prefer warmer), 445 (prefer no change), and 265 (prefer cooler) students, and the recall rates were 75.9%, 92.9%, and 88.9%, respectively. Similarly, the default XGBoost model predicts that 44 students prefer warmer, 608 prefer no change, and 254 prefer cooler conditions. The XGBoost model based on Bayesian optimisation predicts that 111 students prefer warmer, 499 prefer no change, and 296 prefer cooler conditions. Precision is the number of samples consistent with the prediction divided by the total number of predicted samples in each class. The precision for the three classes (prefer warmer, prefer no change, prefer cooler) in the default model is thus 52%, 65%, and 64%, whereas in the Bayesian optimised model it is 88%, 89%, and 90%, respectively. Other metrics of model performance are summarised in Table 5.
In general, when compared with the default XGBoost model, the Bayesian optimisation-based XGBoost model has higher precision, recall, F1 score (the harmonic mean of recall and precision), and accuracy (the overall proportion of correct predictions). In Fig. 6c, the ROC curve of the XGBoost model based on Bayesian optimisation is closer to the top-left corner (a higher AUC value), also indicating a better performance.

4.4. Model explanation

4.4.1. Understanding the model's decisions
SHAP values were used to explain the optimised XGBoost model. The top 10 factors that strongly influence thermal preference are shown in Fig. 7a. The overall importance and ranking of all 20 factors examined can be found in Figure A3. Apart from the six factors already considered in the PMV-PPD model (indoor air temperature, indoor radiation temperature, indoor air humidity, indoor air speed, met and clo), four other features (BMI, climate, gender and cooling type) are also identified as key factors related to classroom thermal comfort.
Fig. 7b provides an overview of how those factors affect the thermal preferences of students. It can be seen that when students prefer cooler temperatures, or feel hot (f(x) = 1), more factors are marked in red (positive SHAP value) than in blue (negative SHAP value), which means more influencing factors play a positive role than a negative role. The opposite is true when students prefer warmer temperatures, or feel cold (f(x) = −1). When students prefer no change (f(x) = 0), there are almost as many factors marked in red as in blue, which means the features that play positive and negative roles are almost balanced.
In Fig. 7c, the extracted examples further demonstrate how those features affect individual thermal preference. For example, in Fig. 7c-Sample 1, the value at the bottom, E[f(X)] = 0.14, is the average prediction of the model over the test set. The value at the top, f(x) = 1 (prefer cooler or feel hot), is the prediction of the model for the specific sample. The increment or decrement caused by each feature is shown in the plot, illustrating how the model goes from the average prediction of 0.14 to the specific prediction of 1 for this sample. When the


Fig. 6. Accuracy assessment of the model.
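The per-class precision, recall and F1 values discussed in Section 4.3 can be recomputed from the counts quoted in the text, as in the sketch below. The helper name is hypothetical. One assumption is flagged loudly: the number of students predicted as "prefer cooler" by the optimised model is taken as 296, the value implied by the 906-sample test set (906 − 111 − 499), because only that count is consistent with the reported precision of 0.90.

```python
# Actual class sizes in the test set: prefer warmer, no change, prefer cooler.
support = [129, 479, 298]


def per_class_metrics(correct, predicted, actual):
    """Return (precision, recall, F1) per class, rounded to two decimals."""
    rows = []
    for c, p, a in zip(correct, predicted, actual):
        precision = c / p          # correct / all samples predicted as this class
        recall = c / a             # correct / all samples truly in this class
        f1 = 2 * precision * recall / (precision + recall)
        rows.append((round(precision, 2), round(recall, 2), round(f1, 2)))
    return rows


# Default XGBoost: correctly predicted counts and predicted totals per class.
default = per_class_metrics([23, 395, 163], [44, 608, 254], support)

# XGBoost + Bayesian optimisation; 296 is the assumed "prefer cooler" total.
tuned = per_class_metrics([98, 445, 265], [111, 499, 296], support)

# Accuracy is the overall share of correct predictions.
default_accuracy = round((23 + 395 + 163) / sum(support), 2)
tuned_accuracy = round((98 + 445 + 265) / sum(support), 2)

print(default, default_accuracy)
print(tuned, tuned_accuracy)
```

Under that assumption the recomputed values agree with the per-class rows and the accuracy reported in Table 5.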

Table 5
Evaluation indexes for performance of machine learning models.

| Model | Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| XGBoost (Default) | Prefer warmer | 0.52 | 0.18 | 0.27 | 129 |
| | Prefer no change | 0.65 | 0.82 | 0.73 | 479 |
| | Prefer cooler | 0.64 | 0.55 | 0.59 | 298 |
| | Accuracy | | | 0.64 | 906 |
| | Macro average | 0.60 | 0.52 | 0.53 | 906 |
| | Weighted average | 0.63 | 0.64 | 0.62 | 906 |
| XGBoost + BO | Prefer warmer | 0.88 | 0.76 | 0.82 | 129 |
| | Prefer no change | 0.89 | 0.93 | 0.91 | 479 |
| | Prefer cooler | 0.90 | 0.89 | 0.89 | 298 |
| | Accuracy | | | 0.89 | 906 |
| | Macro average | 0.89 | 0.86 | 0.87 | 906 |
| | Weighted average | 0.89 | 0.89 | 0.89 | 906 |

indoor air temperature is 31.7 °C, there is a 0.33 increment of the prediction. When indoor air humidity is 58%, a 0.21 increment is added to the prediction, and when the Met is 1.24, a 0.09 increment is added to the prediction. The rest can be read in the same way (Cooling type = 2, contributing a 0.14 increment; Clo = 0.60, contributing a 0.08 increment; Climate = 1, contributing a 0.04 increment; Indoor radiation temperature = 31.7, contributing a 0.04 increment; BMI = 25.76, contributing a 0.04 decrement; Indoor air speed = 0.2, contributing a 0.02 decrement; Outdoor air humidity = 52, contributing a 0.02 increment; 10 other features contributing a 0.02 decrement). When looking at more samples, it is worth noting that even though f(x) is the same for different individuals, the contributions of the factors vary.
In summary, thermal sensation results from the combined effect of various key influencing factors on thermal comfort. A thermoneutral condition can be achieved when the positive and negative effects of factors on thermal sensation are balanced. When the positive effects of key influencing factors are greater than the negative effects, people feel hot, and vice versa.

4.4.2. Interpretation of differential effect
As the six factors in the PMV-PPD model have been widely studied and recognised, this study concentrates on the four other main factors identified in this study that affect classroom thermal comfort: BMI, gender, climate, and cooling type.

4.4.2.1. BMI and gender. BMI and gender were recognised as two key personal variables that affect thermal comfort among classroom users in this study. As illustrated in Fig. 8, locally weighted scatterplot smoothing (LOWESS) between thermal preference and indoor temperature was performed depending on BMI and gender. The curves are quite different among BMI groups. In general, overweight and obese students preferred lower temperatures than normal-weight and underweight students. Students who were overweight or obese have a relatively lower thermoneutral temperature, around 25 °C. The thermoneutral temperature of a student with a healthy BMI is around 26 °C, and the thermoneutral temperature of underweight students is a little higher than 26 °C. Gender differences in thermal preference are relatively small. Females have a higher thermoneutral temperature, and prefer a warmer environment than males.

4.4.2.2. Climate. Fig. 8 depicts the variation in thermal preference across four Koppen climate zones (dry, tropical, continental, and


Fig. 7. Effect of global and local SHAP value on the model.
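The local explanation for Sample 1 in Fig. 7c relies on the additive property of SHAP values: the prediction equals the average prediction plus the sum of the feature contributions. The sketch below simply adds up the rounded contributions quoted in the text; the dictionary keys are descriptive labels chosen here, not identifiers from the authors' code.

```python
base_value = 0.14  # E[f(X)], the average prediction over the test set

# Rounded SHAP contributions for Sample 1 as quoted in the text.
contributions = {
    "indoor air temperature = 31.7": 0.33,
    "indoor air humidity = 58": 0.21,
    "cooling type = 2": 0.14,
    "met = 1.24": 0.09,
    "clo = 0.60": 0.08,
    "climate = 1": 0.04,
    "indoor radiation temperature = 31.7": 0.04,
    "BMI = 25.76": -0.04,
    "indoor air speed = 0.2": -0.02,
    "outdoor air humidity = 52": 0.02,
    "10 other features": -0.02,
}

# Additivity: f(x) = E[f(X)] + sum of SHAP values.
prediction = base_value + sum(contributions.values())
print(round(prediction, 2))
```

Because each quoted contribution is rounded to two decimals, the sum comes out at 1.01 rather than exactly 1, which is within rounding of the reported f(x) = 1.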

temperate). The LOWESS lines of the dry, temperate, and continental climate zones had a similar slope and tendency, indicating that the sensitivity of classroom users to indoor air temperature change is essentially consistent in the three climate zones, but students in dry climate zones prefer a higher temperature, with a higher thermoneutral temperature of about 26–27 °C. Temperate and continental climate zones present a relatively low thermoneutral temperature, approximately 24–25 °C. Additionally, although it is difficult to determine a student's thermoneutral temperature in a tropical climate zone from the plot, it is clear that students in this climate zone are commonly dissatisfied with the thermal environment and prefer a cooler condition.

4.4.2.3. Cooling type. The thermal-neutral temperatures for users in naturally ventilated classrooms are more flexible than those in air-conditioned classrooms, with a range from 22 to 26 °C. The trend of the LOWESS curve in mixed mode classrooms is similar to that of naturally ventilated classrooms, but the curve has shifted slightly to the lower left. In mixed mode classrooms the thermal-neutral temperature is about 25–26 °C. In air-conditioned classrooms, the thermal-neutral temperature is lower, around 24 °C, and users of naturally ventilated classrooms are more sensitive to changes in indoor air temperature.

4.4.3. The interactive effect on thermal comfort
The effect of key influencing factors is divided into main effects and interactive effects by calculating the SHAP interaction value (the original result is presented in Appendix 4-Figure A3). To more easily examine the results of the main and interactive effects of influencing factors, all relationships among the key influencing factors are shown in Fig. 9a. The central graph is a comprehensive representation of the relationship network, while the surrounding smaller graphs are the connections between the four key influencing factors and other influencing factors. It can be seen that indoor air temperature, BMI, Met, and indoor radiation temperature have a greater effect on thermal comfort. The main effect and interaction effect of these four factors are greater than those of other factors. Fig. 9b and 9c list the top 10 main effects and interactive effects on the model. In general, the main effects are greater than the interactive effects, and the influence of the main effect and interactive effect on thermal comfort is 81.82% and 18.18%, respectively.
Fig. 10a highlights the top 15 important main and interactive effects. It clearly shows that there is a large average main effect for indoor air temperature, BMI, indoor radiant temperature, indoor air humidity, and indoor air speed. These factors tend to have a larger effect on the model's predictions. It can also be seen, however, that some interactive effects, such as the interactive effect between indoor air temperature and BMI, and the interactive effect between indoor air temperature and Met, are even larger than the main effects of climate, gender, and other factors.
Furthermore, looking at the examples in Fig. 10b, three types of interactive effect are identified: two-way interaction, one-way interaction, and cross-interaction. In the two-way interaction shown in Fig. 10b–1, for example, when indoor air temperature and indoor radiation temperature are both low, the interactive SHAP value is more negative, and vice versa, which indicates students will feel colder when both indoor air temperature and indoor radiation temperature are low, and hotter when both indoor air temperature and indoor radiation temperature are high.


Fig. 8. Comparison of the thermal preference differences among BMI, gender, climate and cooling type.

In one-way interaction, as shown in Fig. 10b–2, variations in BMI did not significantly change the SHAP interaction value, but changes in indoor air temperature have a more noticeable effect on the SHAP interaction value. In cross-interaction (Fig. 10b–3), as Met increases, the SHAP interaction values between BMI and Met show opposite trends in the high BMI group and the low BMI group.

5. Discussion

Applying Bayesian optimisation simplified the hyperparameter tuning process of the XGBoost algorithm [54,68], and the hybrid model of XGBoost and Bayesian optimisation can achieve better performance. This paper also introduces SHAP values as a model interpretation tool, which can make the complex machine learning model easier to understand. Interpretability is the degree to which a model can be understood in human terms [53,67]. Some previous models are of higher accuracy, but are too complicated to be understood without the help of additional model explanation techniques [65,67]. By contrast, the more easily interpretable models are limited in exploring more complicated relationships [54,69,70]. This study showed that a helpful machine learning interpretation tool combined with domain expertise can make the results of machine learning models more understandable.
The ASHRAE global thermal comfort database II contains extensive data about thermal comfort, including field measurement data (such as temperature, humidity, met, clo, wind speed, etc.) and more detail about the personal characteristics of users (such as height, weight, age, gender, etc.), as well as the metadata of buildings (such as spatial location, climate analysis, cooling type strategy, etc.) [34,53]. The database is very convenient for investigating the effect of various factors on thermal comfort, especially those factors related to spatial and individual differences [21,31,33]. Analysing the ASHRAE global thermal comfort database II provides a more macro perspective and a more comprehensive understanding of the mechanism affecting classroom thermal comfort. It can complement field and laboratory studies on thermal comfort, as the database contains a large amount of data on thermal comfort, allowing the results of field and laboratory studies to be verified and compared. This study examined and confirmed the results of previous field and laboratory studies on thermal comfort, such as determining the key factors affecting thermal comfort, and identifying individual differences and interactive effects. Some findings in this study can also provide research directions for future field and laboratory research; for example, the threshold value for determining the positive or negative effect of the key influencing factors mentioned in this study, and the intensity, proportion, and types of interaction between factors, need to be further verified by field tests and laboratory research.

5.1. The differential effect

The differences in thermal comfort perceptions depending on individuals, climate, and cooling strategies were confirmed [60,61]. In fact, the difference in thermal perception among individuals is more closely related to the metabolic rate of the human body [12,59]. Individual factors such as BMI, gender and age often affect the body's metabolic rate, thus indirectly affecting the thermal balance of the human body [30,59]. Although the metabolic rate was considered in the model, the error caused by the oversimplified measurement method meant that the measured metabolic rate could not fully reflect the real body metabolic rate. Applying a standard Met value to all individuals has been questioned in the past decade by the scientific community [71–73], and therefore, improving the accuracy of the prediction model


Fig. 9. Interactions among the 10 features.


Fig. 10. SHAP interaction values affect the model.

of classroom thermal comfort, on the one hand, requires adopting more advanced equipment to measure human metabolic rate [23,54,59,74]. On the other hand, considering more indicators related to human metabolic rate can indirectly make up for the shortage of measurement accuracy [26,30,75].
More and more studies have demonstrated that elements of geographical location can alter thermal comfort perceptions [76–78]. One recent research discovery is that thermal perception can be nurtured by climate. It points out that the loss of the protein α-actinin-3 (encoded by the ACTN3 gene) in muscle could improve cold tolerance [79]. Modern humans who moved from Africa to colder climates in Europe and Asia lack α-actinin-3, as this would have helped them better tolerate cooler climates [79]. The genetic finding may provide a convincing explanation of why classroom users in temperate and continental climate zones have lower thermoneutral temperatures. This study also confirmed that, compared with students in an air-conditioned environment, the acceptable temperature range of students in naturally ventilated conditions is wider [20,78].

5.2. The intensity, proportion and types of interactive effect

The results of the SHAP values demonstrated that the main and interactive effects of key influencing factors accounted for 80% and 20% of the thermal comfort model, respectively. Influencing factors such as indoor air temperature, BMI, Met, and indoor radiation temperature interact more easily with each other, and the interactive effects between them are even greater than the main effects of other factors. This study summarises three types of interaction effects: two-way interaction, one-way interaction, and cross-interaction. Two-way interaction often leads to additive spillover effects. One-way interaction can lead to spillover effects under extreme conditions, which means that the interactions between features only become obvious under certain extreme conditions. In cross-interaction, changing the same environmental or individual factor leads to completely different trends under different conditions.

5.3. Comparison with previous study

Over the years, there have been many studies conducted to assess thermal comfort in educational buildings. This study confirms that indoor physical environment factors, such as temperature, humidity, and air velocity, as well as personal influencing factors, such as BMI, gender, Met, and Clo, and climate and ventilation type, all have a significant impact on thermal comfort [30,36,37,39]. Age, however, had a relatively smaller impact on thermal comfort among students in this study compared with individual factors like Met, BMI, Clo, and gender. This could be due to the fact that the sample used in the study consisted mainly of college students with a narrow age range of 20.9 ± 2.15 years, making it difficult to observe significant differences in the impact of age on thermal comfort.
Although there are individual differences, this study has generally found that classroom occupants prefer a temperature range of 22–26 °C. However, some studies in educational buildings have found that the optimal temperature range may vary more widely, around 21.1–28.6 °C [9]. This suggests that the optimal temperature range for thermal comfort can vary based on factors such as geographical location, season, type of building, and occupants' specific characteristics.
Furthermore, the study finds that there are common interactive effects between these key factors, which is consistent with previous studies [31–33,80]. It is important to highlight that, unlike previous studies, this research not only identifies the presence of interactive effects but also quantifies their strength and qualitatively categorises their types.

5.4. Limitations and future studies

There are still some deficiencies in data quantity and spatial distribution. Although there are more than 100,000 records of data, the database only has about 10,000 records on the building type of


classroom. Some indicators have missing values, resulting in less than Hyperparameter tuning is vital to the performance of XGBoost. The
4600 records of data being available. The data were collected primarily from only a few regions and countries. These limitations in data quality and quantity mean that this study may be insufficient or inconclusive for data analysis. Furthermore, the scarce valid records on indoor air quality indicators, such as CO2, hinder a thorough examination of the effect of air quality indicators on thermal comfort.

The findings of this study indicate that further research is necessary to enhance the accuracy and interpretation of personal thermal comfort models. This can be achieved by collecting more diverse personal characteristics as input, such as heart rate, pulse, and oxygen saturation, from a wider range of classroom users and investigating the interaction of these factors. In addition, future studies should also consider additional indoor environment indicators, such as air quality, to provide a more comprehensive understanding of the factors that affect thermal comfort.

To validate and reinforce the study’s findings, more comprehensive case studies should be conducted to examine the conditions and circumstances in various universities. The results of multiple case studies can be used to refine the recommendations and make them more applicable to universities worldwide. Finally, there is a need to choose the best solution from various options to meet the thermal comfort needs of students and decrease energy consumption in classrooms. This will require further research to identify and evaluate different solutions, such as passive and active design strategies, and determine their effectiveness in different contexts.

6. Conclusions

This study screened out the data related to classroom thermal comfort from the ASHRAE Global Thermal Comfort Database II. The cleaned data were used to train a hybrid XGBoost and Bayesian optimisation model, which was then interpreted using SHAP values. The main findings of this study are summarised as follows.

There is a strong relationship between SET and PMV calculated from objective data. TSV and TPV, the subjective vote data, also have a strong relationship. The correlation between objective data and subjective data, however, is weak. As a result, the thermal comfort index generated from objective data cannot adequately reflect classroom thermal perception.

The thermal perceptions of students differ depending on BMI, gender, climate, and building cooling strategies. Overweight and obese students prefer lower temperatures than normal-weight and underweight students. Females prefer a warmer environment than males. Students in dry climatic zones have greater heat tolerance. Students in naturally ventilated classrooms are more sensitive to indoor temperature change than students in air-conditioned classrooms.

Factors are of different importance to individuals. A thermoneutral situation can be achieved when the positive and negative effects of the key features are balanced. The effects of factors on classroom thermal comfort can be divided into main effects (80%) and interaction effects (20%), and some interaction effects are even greater than the main effects. There are three typical types of interactive effects: two-way interaction, one-way interaction, and cross-interaction.

proposed BO-XGBoost model for predicting thermal comfort presents better performance in efficiency and accuracy. SHAP values can make the complex XGBoost model more understandable.

This database analysis complements field and laboratory studies on thermal comfort because it provides a more profound understanding of individual differences and interactive effects. Overall, this study provides valuable insights into the complex interplay of various factors that influence thermal comfort in educational buildings. By considering these factors, building designers and facility managers can create more comfortable and efficient learning environments that meet the needs of a diverse range of occupants.

Thermal comfort is the result of multiple influencing factors, and relying solely on traditional and typical environmental indicators may lead to some inaccuracies, because individual factors can lead to differences in thermal preferences, and the importance of various individual factors may differ slightly in the specific environment of classrooms. Moreover, this study offers an initial account, both qualitative and quantitative, of the interactive effects between the factors influencing classroom thermal comfort. This helps to understand which influencing factors are more likely to interact and the type of interaction produced. Additionally, the network relationship diagram of the influencing factors on classroom thermal comfort makes it easier to intuitively understand the complex relationship between the influencing factors and thermal comfort in the classroom. This study can help guide the design and regulation of HVAC systems and support real-time interaction between students and the thermal environment of their classrooms.

CRediT authorship contribution statement

Haifeng Lan: Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Huiying (Cynthia) Hou: Writing – review & editing, Supervision, Project administration, Conceptualization. Zhonghua Gou: Writing – review & editing, Supervision, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

We acknowledge ASHRAE for providing the data and platform, as well as the efforts of academics and organisations throughout the world in making their research datasets available to the public. This study was supported by a grant from the Hong Kong Polytechnic University: Start-up Fund for New Recruits (Project ID: P0040305).

Appendix 1. The principle and formula derivation of XGBoost

The objective function of XGBoost consists of two parts: loss function and regularisation term.

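$$
\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k) \tag{A1}
$$

$$
\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2 = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{A2}
$$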



$$
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) = \sum_{k=1}^{t} f_k(x_i) \tag{A3}
$$


where $l(y_i, \hat{y}_i)$ is the loss function; $\sum_{k=1}^{t} \Omega(f_k)$ is the regularisation term; $\hat{y}_i$ is the predicted value; $y_i$ is the true value; $x_i$ is the feature vector of sample $i$; $f_t$ is the $t$-th tree model; $T$ is the number of leaf nodes; $w$ is the vector of leaf weights; $\gamma$ is the penalty on the number of leaf nodes; and $\lambda$ is the regularisation coefficient that penalises the leaf weights.
The objective function can be rewritten as Equation (A4).

$$
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{A4}
$$

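The second-order Taylor expansion is given in Equation (A5), so the objective function can be approximated as Equation (A6), where $g_i$ and $h_i$ are the first and second derivatives of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$. The regularisation terms of the first $t-1$ trees are already fixed at step $t$ and are absorbed into the constant.

$$
f(x + \Delta x) \cong f(x) + f'(x)\Delta x + \frac{1}{2} f''(x)\Delta x^2 \tag{A5}
$$

$$
\mathrm{Obj}^{(t)} \cong \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{constant} \tag{A6}
$$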
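As a worked illustration of $g_i$ and $h_i$, assume the squared-error loss $l(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2$. This choice is an assumption made only for illustration; the loss used to train the model is not stated here. Differentiating with respect to $\hat{y}_i^{(t-1)}$ gives

$$
g_i = \hat{y}_i^{(t-1)} - y_i, \qquad h_i = 1,
$$

so the bracketed term in the second-order approximation becomes

$$
\frac{1}{2}\left(y_i - \hat{y}_i^{(t-1)}\right)^2 + \left(\hat{y}_i^{(t-1)} - y_i\right) f_t(x_i) + \frac{1}{2} f_t^2(x_i) = \frac{1}{2}\left(y_i - \hat{y}_i^{(t-1)} - f_t(x_i)\right)^2 .
$$

For a quadratic loss the expansion is therefore exact; for other losses, such as the logistic loss, it is only an approximation around the previous prediction.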
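Because $l(y_i, \hat{y}_i^{(t-1)})$ is also a constant, the objective function can be further expressed as Equation (A7), where $I_j$ is the set of samples assigned to leaf node $j$ (so that $f_t(x_i) = w_j$ for $i \in I_j$), $G_j = \sum_{i \in I_j} g_i$ is the sum of $g_i$ in leaf node $j$, and $H_j = \sum_{i \in I_j} h_i$ is the sum of $h_i$ in leaf node $j$.

$$
\begin{aligned}
\mathrm{Obj}^{(t)} &\cong \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \\
&= \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \\
&= \frac{1}{2} \sum_{j=1}^{T} \left( H_j + \lambda \right) \left( w_j + \frac{G_j}{H_j + \lambda} \right)^2 + \gamma T - \frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}
\end{aligned} \tag{A7}
$$

Equation (A7) is a quadratic function of each $w_j$, so the function attains its minimum value when $w_j = -G_j / (H_j + \lambda)$, as shown in Equation (A8).

$$
\widehat{\mathrm{Obj}} \cong -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{A8}
$$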
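The closed-form result of the derivation can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it assumes a binary logistic loss (for which $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$), a hypothetical tree with two leaves, and arbitrary values for $\lambda$ and $\gamma$. It uses the fact that the per-leaf quadratic $G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2$ is minimised at $w_j = -G_j/(H_j + \lambda)$, with $\gamma$ penalising the number of leaves as in Equation (A2). All function names are hypothetical.

```python
import math


def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)


def optimal_leaf_weight(g, h, lam):
    """Minimiser of G*w + 0.5*(H + lam)*w**2 for one leaf."""
    return -sum(g) / (sum(h) + lam)


def leaf_objective(g, h, w, lam):
    """Per-leaf part of the second-order objective for a given weight w."""
    return sum(g) * w + 0.5 * (sum(h) + lam) * w * w


def structure_score(leaves, lam, gamma):
    """Minimised objective: -0.5 * sum_j G_j^2 / (H_j + lam) + gamma * T."""
    quality = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * quality + gamma * len(leaves)


# Hypothetical example: five samples split into two leaves, previous raw score 0.
labels_per_leaf = [[1, 1, 0], [0, 0]]
lam, gamma = 1.0, 0.1  # illustrative regularisation values

leaves = []
for labels in labels_per_leaf:
    pairs = [logistic_grad_hess(y, 0.0) for y in labels]
    leaves.append(([g for g, _ in pairs], [h for _, h in pairs]))

weights = [optimal_leaf_weight(g, h, lam) for g, h in leaves]
score = structure_score(leaves, lam, gamma)
print("leaf weights:", weights)
print("minimised objective:", score)
```

Evaluating `leaf_objective` at the optimal weights and adding `gamma * len(leaves)` reproduces `score`, which is a quick consistency check on the completed square.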

Appendix 2. The principle and implementation of Bayesian optimisation

Fig. A1. The processing of Bayesian optimisation

As shown in Fig. A1, Bayesian optimisation assumes that the function to be optimised is $f(x)$, with $x \in X \subset \mathbb{R}^n$. In each iteration ($t = 1, 2, \ldots, T$), a candidate point $x_t \in X$ is selected according to the acquisition function $\alpha_t$, and a noisy observation $y_t = f(x_t) + \varepsilon_t$ is obtained, where $\varepsilon_t$ follows the zero-mean Gaussian distribution $\varepsilon_t \sim N(0, \sigma^2)$. The new observation $(x_t, y_t)$ is added to the observation data, and then the next iteration is performed. The process can be expressed as follows.

1: For $t = 1, 2, \ldots, T$ do
2:   Find $x_t$ by maximising the acquisition function $\alpha_t$, which combines attributes of the posterior distribution:

$$
x_t = \arg\max_{x \in X} \alpha_t(x \mid D_{1:t-1}) \tag{A9}
$$

3:   Sample the objective function:

$$
y_t = f(x_t) + \varepsilon_t \tag{A10}
$$

4:   Augment the data $D_{1:t} = \{D_{1:t-1}, (x_t, y_t)\}$ and update the Gaussian process.
5: End for
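The loop above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate is a small hand-written Gaussian process with an RBF kernel, the acquisition function is an upper confidence bound (the acquisition function actually used is not specified here), the search space is a one-dimensional grid, and already evaluated candidates are skipped to keep the kernel matrix well conditioned. All names and parameter values (`bayes_opt`, `kappa`, `length`) are hypothetical.

```python
import math
import random


def rbf(a, b, length=0.5):
    """RBF kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, candidates, n_iter=10, kappa=2.0, noise_sd=0.0, seed=0):
    """Maximise f over a finite candidate grid with a GP surrogate and UCB."""
    rng = random.Random(seed)
    xs = [candidates[0], candidates[-1]]  # two initial design points
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]

        def ucb(x):
            mean, sd = gp_posterior(xs, ys, x)
            return mean + kappa * sd

        x_t = max(remaining, key=ucb)             # step 2: maximise acquisition
        y_t = f(x_t) + rng.gauss(0.0, noise_sd)   # step 3: noisy observation
        xs.append(x_t)                            # step 4: augment the data;
        ys.append(y_t)                            # the GP is refitted on the next call
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], xs


# Toy objective with its maximum at x = 2 on the grid 0.0, 0.1, ..., 4.0.
grid = [i / 10 for i in range(41)]
best_x, best_y, history = bayes_opt(lambda x: -(x - 2.0) ** 2, grid)
print("best x:", best_x, "best y:", best_y)
```

In hyperparameter tuning, `f` would be a cross-validated score of the model as a function of one hyperparameter, and the grid would be replaced by a continuous, multi-dimensional search space.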

Appendix 3. Confusion matrix and ROC curve

Fig. A2. Performance metrics

The confusion matrix records true positives (TP), samples correctly assigned to the relevant class; false positives (FP), samples assigned to the relevant class when they belong to another class; false negatives (FN), samples assigned to another class when they belong to the relevant class; and true negatives (TN), samples correctly assigned to another class. The most commonly used classification performance metrics based on these values are accuracy (ACC), precision (P), sensitivity (Sn, also called recall), specificity (Sp), and the F-score. These metrics are calculated from the values in the confusion matrix according to Equations (A11) to (A16).

$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{A11}
$$

$$
\text{Precision} = \frac{TP}{TP + FP} \tag{A12}
$$

$$
\text{Recall} = \frac{TP}{TP + FN} \tag{A13}
$$

$$
\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{A14}
$$

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity).

$$
\text{Sensitivity} = \frac{TP}{TP + FN} \tag{A15}
$$

$$
\text{Specificity} = \frac{TN}{TN + FP} \tag{A16}
$$
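The metrics above follow directly from the four confusion-matrix counts. The sketch below is a minimal illustration with made-up counts for a single class; the function name is hypothetical, specificity is taken as TN / (TN + FP) (the standard definition), and the false positive rate used on the ROC axis is computed as 1 − specificity. It assumes all denominators are non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to sensitivity (true positive rate)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)       # true negative rate
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,  # x-axis of the ROC curve
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class problem, the same function can be applied to each class in a one-vs-rest fashion, and the per-class values averaged.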


Appendix 4. Indicator distribution and importance ranking

Fig. A3. The distribution and importance ranking of all selected influencing factors

Appendix 5. The result of SHAP values

Fig. A4. SHAP interactive value among 10 key influencing factors.


References [25] S. ichi Amari, Machine Learning, Appl. Math. Sci., 2016, https://doi.org/10.1007/
978-4-431-55978-8_11.
[26] F. Jazizadeh, A. Ghahramani, B. Becerik-Gerber, T. Kichkaylo, M. Orosz, User-led
[1] M.J. Mendell, G.A. Heath, Do indoor pollutants and thermal conditions in schools
decentralized thermal comfort driven HVAC operations for improved efficiency in
influence student performance? A critical review of the literature, Indoor Air 15
office buildings, Energy Build. 70 (2014) 398–410, https://doi.org/10.1016/J.
(2005) 27–52, https://doi.org/10.1111/J.1600-0668.2004.00320.X.
ENBUILD.2013.11.066.
[2] J.Y. Shen, S. Kojima, X.Y. Ying, X.J. Hu, Influence of thermal experience on thermal
[27] H. Tang, Y. Ding, B. Singer, Interactions and comprehensive effect of indoor
comfort in naturally conditioned university classrooms, Lowland Technol. Int. 21
environmental quality factors on occupant satisfaction, Build. Environ. 167 (2020),
(2019) 107–122. https://cot.unhas.ac.id/journals/index.php/ialt_lti/article/vie
106462, https://doi.org/10.1016/J.BUILDENV.2019.106462.
w/566. (Accessed 26 September 2022).
[28] Y.H. Yau, H.S. Toh, B.T. Chew, N.N. Nik Ghazali, A review of human thermal
[3] C. Ramírez-Dolores, L.A. Lugo-Ramírez, B.A. Hernández-Cortaza, G. Alcalá, J. Lara-
comfort model in predicting human–environment interaction in non-uniform
Valdés, J. Andaverde, Dataset on thermal comfort, perceived stress, and anxiety in
environmental conditions, J. Therm. Anal. Calorim. 147 (2022) 14739–14763,
university students under confinement due to COVID-19 in a hot and humid region
https://doi.org/10.1007/S10973-022-11585-0/TABLES/8.
of Mexico, Data Brief 41 (2022), 107996, https://doi.org/10.1016/J.
[29] R.J. De Dear, T. Akimoto, E.A. Arens, G. Brager, C. Candido, K.W.D. Cheong, B. Li,
DIB.2022.107996.
N. Nishihara, S.C. Sekhar, S. Tanabe, J. Toftum, H. Zhang, Y. Zhu, Progress in
[4] V. Lovec, M. Premrov, V.Ž. Leskovar, Practical impact of the COVID-19 pandemic
thermal comfort research over the last twenty years, Indoor Air 23 (2013)
on indoor air quality and thermal comfort in kindergartens. A case study of
442–461, https://doi.org/10.1111/INA.12046.
Slovenia, Int. J. Environ. Res. Publ. Health 18 (2021), https://doi.org/10.3390/
[30] Z. Wang, R. de Dear, M. Luo, B. Lin, Y. He, A. Ghahramani, Y. Zhu, Individual
IJERPH18189712.
difference in thermal comfort: a literature review, Build. Environ. 138 (2018)
[5] S. Barbhuiya, S. Barbhuiya, Thermal comfort and energy consumption in a UK
181–193, https://doi.org/10.1016/J.BUILDENV.2018.04.040.
educational building, Build. Environ. 68 (2013) 1–11, https://doi.org/10.1016/J.
[31] M. Luo, J. Xie, Y. Yan, Z. Ke, P. Yu, Z. Wang, J. Zhang, Comparing machine
BUILDENV.2013.06.002.
learning algorithms in predicting thermal sensation using ASHRAE Comfort
[6] L. Dias Pereira, D. Raimondo, S.P. Corgnati, M. Gameiro Da Silva, Energy
Database II, Energy Build. 210 (2020), 109776, https://doi.org/10.1016/J.
consumption in schools – a review paper, Renew. Sustain. Energy Rev. 40 (2014)
ENBUILD.2020.109776.
911–922, https://doi.org/10.1016/J.RSER.2014.08.010.
[32] C.K.C. Lam, Y. Gao, H. Yang, T. Chen, Y. Zhang, C. Ou, J. Hang, Interactive effect
[7] R.J. de Dear, G.S. Brager, Developing an adaptive model of thermal comfort and
between long-term and short-term thermal history on outdoor thermal comfort:
preference, ASHRAE Trans 104 (1998) 145–167.
comparison between Guangzhou, Zhuhai and Melbourne, Sci. Total Environ. 760
[8] A. ASHRAE, ASHRAE Standard 55: Thermal Environmental Conditions for Human
(2021), 144141, https://doi.org/10.1016/J.SCITOTENV.2020.144141.
Occupancy, 2004.
[33] F. Zhang, R. de Dear, Impacts of demographic, contextual and interaction effects on
[9] Z.S. Zomorodian, M. Tahsildoost, M. Hafezi, Thermal comfort in educational
thermal sensation—evidence from a global database, Build. Environ. 162 (2019),
buildings: a review article, Renew. Sustain. Energy Rev. 59 (2016) 895–906,
https://doi.org/10.1016/J.BUILDENV.2019.106286.
https://doi.org/10.1016/J.RSER.2016.01.033.
[34] V.F. Ličina, T. Cheung, H. Zhang, R. De Dear, T. Parkinson, E. Arens, C. Chun,
[10] H.H. Liang, T.P. Lin, R.L. Hwang, Linking occupants’ thermal perception and
S. Schiavon, M. Luo, G. Brager, Development of the ASHRAE global thermal
building thermal performance in naturally ventilated school buildings, Appl.
comfort database II, Build. Environ. 142 (2018) 502–512.
Energy 94 (2012) 355–363, https://doi.org/10.1016/J.APENERGY.2012.02.004.
[35] S. Jing, Y. Lei, H. Wang, C. Song, X. Yan, Thermal comfort and energy-saving
[11] Y. Liu, J. Jiang, D. Wang, J. Liu, The indoor thermal environment of rural school
potential in university classrooms during the heating season, Energy Build. 202
classrooms in Northwestern China, Indoor Built Environ. 26 (2017) 662–679,
(2019), 109390, https://doi.org/10.1016/J.ENBUILD.2019.109390.
https://doi.org/10.1177/1420326X16634826/ASSET/IMAGES/10.1177_
[36] H. Mohammadpourkarbasi, I. Jackson, D. Nukpezah, I. Appeaning Addo, R. Assasie
1420326X16634826-IMG2.PNG.
Oppong, Evaluation of thermal comfort in library buildings in the tropical climate
[12] G. Havenith, I. Holmér, K. Parsons, Personal factors in thermal comfort assessment:
of Kumasi, Ghana, Energy Build. 268 (2022), 112210, https://doi.org/10.1016/J.
clothing properties and metabolic heat production, Energy Build. 34 (2002)
ENBUILD.2022.112210.
581–591, https://doi.org/10.1016/S0378-7788(02)00008-7.
[37] M.K. Singh, R. Ooka, H.B. Rijal, S. Kumar, A. Kumar, S. Mahapatra, Progress in
[13] D.P. Wyon, P. Wargocki, The adaptive thermal comfort model may not always
thermal comfort studies in classrooms over last 50 years and way forward, Energy
predict thermal effects on performance, Indoor Air 24 (2014) 552–553, https://doi.
Build. 188–189 (2019) 149–174, https://doi.org/10.1016/J.
org/10.1111/ina.12098.
ENBUILD.2019.01.051.
[14] P.O. Fanger, Thermal comfort. Analysis and applications in environmental
[38] R.L. Hwang, T.P. Lin, N.J. Kuo, Field experiments on thermal comfort in campus
engineering, Therm. Comf. Anal. Appl. Environ. Eng. (1970).
classrooms in Taiwan, Energy Build. 38 (2006) 53–62, https://doi.org/10.1016/J.
[15] R. Rana, B. Kusy, R. Jurdak, J. Wall, W. Hu, Feasibility analysis of using humidex as
ENBUILD.2005.05.001.
an indoor thermal comfort predictor, Energy Build. 64 (2013) 17–25, https://doi.
[39] S.P. Corgnati, M. Filippi, S. Viazzo, Perception of the thermal environment in high
org/10.1016/J.ENBUILD.2013.04.019.
school and university classrooms: subjective preferences and thermal comfort,
[16] L. Jiang, R. Yao, Modelling personal thermal sensations using C-Support Vector
Build, Environ 42 (2007) 951–959, https://doi.org/10.1016/J.
Classification (C-SVC) algorithm, Build. Environ. 99 (2016) 98–106, https://doi.
BUILDENV.2005.10.027.
org/10.1016/J.BUILDENV.2016.01.022.
[40] H. Zhang, E. Arens, Y. Zhai, A review of the corrective power of personal comfort
[17] J. Van Hoof, Forty years of Fanger’s model of thermal comfort: comfort for all?
systems in non-neutral ambient environments, Build. Environ. 91 (2015) 15–41,
Indoor Air 18 (2008) 182–201, https://doi.org/10.1111/J.1600-
https://doi.org/10.1016/J.BUILDENV.2015.03.013.
0668.2007.00516.X.
[41] J. Nakano, S.I. Tanabe, K.I. Kimura, Differences in perception of indoor
[18] A. Martinez-Molina, P. Boarin, I. Tort-Ausina, J.L. Vivancos, Post-occupancy
environment between Japanese and non-Japanese workers, Energy Build. 34
evaluation of a historic primary school in Spain: comparing PMV, TSV and PD for
(2002) 615–621, https://doi.org/10.1016/S0378-7788(02)00012-9.
teachers’ and pupils’ thermal comfort, Build, Environ 117 (2017) 248–259,
[42] B.K. Sovacool, Diversity: energy studies need social science, Nature (2014),
https://doi.org/10.1016/J.BUILDENV.2017.03.010.
https://doi.org/10.1038/511529a.
[19] A.K. Mishra, M.T.H. Derks, L. Kooi, M.G.L.C. Loomans, H.S.M. Kort, Analysing
[43] F.R. d’Ambrosio Alfano, B.W. Olesen, B.I. Palella, Povl Ole Fanger’s impact ten
thermal comfort perception of students through the class hour, during heating
years later, Energy Build. 152 (2017) 243–249, https://doi.org/10.1016/J.
season, in a university classroom, Build, Environ 125 (2017) 464–474, https://doi.
ENBUILD.2017.07.052.
org/10.1016/J.BUILDENV.2017.09.016.
[44] J. van Hoof, L. Schellen, V. Soebarto, J.K.W. Wong, J.K. Kazak, Ten questions
[20] M. Puteh, M.H. Ibrahim, M. Adnan, C.N. Che’Ahmad, N.M. Noh, Thermal comfort
concerning thermal comfort and ageing, Build. Environ. 120 (2017) 123–133,
in classroom: constraints and issues, Procedia - Soc. Behav. Sci. 46 (2012)
https://doi.org/10.1016/J.BUILDENV.2017.05.008.
1834–1838, https://doi.org/10.1016/J.SBSPRO.2012.05.388.
[45] S. Karjalainen, Gender differences in thermal comfort and use of thermostats in
[21] S. Lu, W. Wang, C. Lin, E.C. Hameen, Data-driven simulation of a thermal comfort-
everyday thermal environments, Build. Environ. 42 (2007) 1594–1603, https://
based temperature set-point control with ASHRAE RP884, Build. Environ. 156
doi.org/10.1016/J.BUILDENV.2006.01.009.
(2019) 137–146, https://doi.org/10.1016/J.BUILDENV.2019.03.010.
[46] S. Del Ferraro, S. Iavicoli, S. Russo, V. Molinaro, A field study on thermal comfort
[22] G. Tardioli, R. Filho, P. Bernaud, D. Ntimos, An innovative modelling approach
in an Italian hospital considering differences in gender and age, Appl. Ergon. 50
based on building physics and machine learning for the prediction of indoor
(2015) 177–184, https://doi.org/10.1016/J.APERGO.2015.03.014.
thermal comfort in an office building, Buildings 12 (2022), https://doi.org/
[47] L. Schellen, W.D. van Marken Lichtenbelt, M.G.L.C. Loomans, J. Toftum, M.H. de
10.3390/buildings12040475.
Wit, Differences between young adults and elderly in thermal comfort,
[23] J. Kim, Y. Zhou, S. Schiavon, P. Raftery, G. Brager, Personal comfort models:
productivity, and thermal physiology in response to a moderate temperature drift
predicting individuals’ thermal preference using occupant heating and cooling
and a steady-state condition, Indoor Air 20 (2010) 273–283, https://doi.org/
behavior and machine learning, Build. Environ. 129 (2018) 96–106, https://doi.
10.1111/J.1600-0668.2010.00657.X.
org/10.1016/J.BUILDENV.2017.12.011.
[48] M.A. Humphreys, J. Fergus Nicol, The validity of ISO-PMV for predicting comfort
[24] M.I. Jordan, T.M. Mitchell, Machine learning: trends, perspectives, and prospects,
votes in every-day thermal environments, Energy Build. 34 (2002) 667–684,
Science (80-.) 349 (2015) 255–260, https://doi.org/10.1126/science.aaa8415.
https://doi.org/10.1016/S0378-7788(02)00018-X.

20
H. Lan et al. Building and Environment 236 (2023) 110259

[49] J. Antoy, Design of Experiments for Engineers and Scientists, second ed., Des. Exp. [66] J. Yang, Fast TreeSHAP: Accelerating SHAP Value Computation for Trees, (n.d.).
Eng. Sci., 2014 https://doi.org/10.1016/C2012-0-03558-2, 1–672. [67] K. Futagami, Y. Fukazawa, N. Kapoor, T. Kito, Pairwise acquisition prediction with
[50] K. Liu, T. Nie, W. Liu, Y. Liu, D. Lai, A machine learning approach to predict SHAP value interpretation, J. Financ. Data Sci. 7 (2021) 22–44, https://doi.org/
outdoor thermal comfort using local skin temperatures, Sustain. Cities Soc. 59 10.1016/J.JFDS.2021.02.001.
(2020), 102216, https://doi.org/10.1016/J.SCS.2020.102216. [68] J. Zhou, Y. Qiu, S. Zhu, D.J. Armaghani, M. Khandelwal, E.T. Mohamad,
[51] C. Vasilikou, M. Nikolopoulou, Outdoor thermal comfort for pedestrians in Estimation of the TBM advance rate under hard rock conditions using XGBoost and
movement: thermal walks in complex urban morphology, Int. J. Biometeorol. 64 Bayesian optimization, Undergr. Space 6 (2021) 506–515, https://doi.org/
(2020) 277–291, https://doi.org/10.1007/S00484-019-01782-2/FIGURES/10. 10.1016/J.UNDSP.2020.05.008.
[52] X. Du, Y. Zhang, S. Zhao, Research on interaction effect of thermal, light and [69] H. Lan, Z. Gou, C. Hou, Understanding the relationship between urban morphology
acoustic environment on human comfort in waiting hall of high-speed railway and solar potential in mixed-use neighborhoods using machine learning
station, Build. Environ. 207 (2022), 108494, https://doi.org/10.1016/j. algorithms, Sustain. Cities Soc. 87 (2022), 104225, https://doi.org/10.1016/J.
buildenv.2021.108494. SCS.2022.104225.
[53] Z. Wang, J. Wang, Y. He, Y. Liu, B. Lin, T. Hong, Dimension analysis of subjective [70] A.C. Cosma, R. Simha, Machine learning method for real-time non-invasive
thermal comfort metrics based on ASHRAE Global Thermal Comfort Database prediction of individual thermal preference in transient conditions, Build. Environ.
using machine learning, J. Build. Eng. 29 (2020), 101120, https://doi.org/ 148 (2019) 372–383, https://doi.org/10.1016/J.BUILDENV.2018.11.017.
10.1016/J.JOBE.2019.101120. [71] S. Kozey, K. Lyden, J. Staudenmayer, P. Freedson, Errors in MET estimates of
[54] B. Yang, X. Li, Y. Liu, L. Chen, R. Guo, F. Wang, K. Yan, Comparison of models for physical activities using 3.5 ml x kg(-1) x min(-1) as the baseline oxygen
predicting winter individual thermal comfort based on machine learning consumption, J. Phys. Activ. Health 7 (2010) 508–516, https://doi.org/10.1123/
algorithms, Build. Environ. 215 (2022), 108970, https://doi.org/10.1016/J. JPAH.7.4.508.
BUILDENV.2022.108970. [72] N.M. Byrne, A.P. Hills, G.R. Hunter, R.L. Weinsier, Y. Schutz, Metabolic equivalent:
[55] N. Ma, L. Chen, J. Hu, P. Perdikaris, W.W. Braham, Adaptive behavior and different one size does not fit all, J. Appl. Physiol. 99 (2005) 1112–1119, https://doi.org/
thermal experiences of real people: a Bayesian neural network approach to thermal 10.1152/JAPPLPHYSIOL.00023.2004.
preference prediction and classification, Build. Environ. 198 (2021), 107875, [73] R.G. McMurray, J. Soares, C.J. Caspersen, T. McCurdy, Examining variations of
https://doi.org/10.1016/J.BUILDENV.2021.107875. resting metabolic rate of adults: a public health perspective, Med. Sci. Sports Exerc.
[56] A.A. Farhan, K. Pattipati, B. Wang, P. Luh, Predicting individual thermal comfort 46 (2014) 1352, https://doi.org/10.1249/MSS.0000000000000232.
using machine learning algorithms, IEEE Int. Conf. Autom. Sci. Eng. 2015-October [74] T. Chaudhuri, D. Zhai, Y.C. Soh, H. Li, L. Xie, Random forest based thermal comfort
(2015) 708–713, https://doi.org/10.1109/COASE.2015.7294164. prediction from gender-specific physiological parameters using wearable sensing
[57] B. Lala Φ, H. Rizk, S. Manas Kala, A. Hagishima, Multi-task Learning for technology, Energy Build. 166 (2018) 391–406, https://doi.org/10.1016/J.
Concurrent Prediction of Thermal Comfort, Sensation, and Preference, 2022, ENBUILD.2018.02.035.
https://doi.org/10.48550/arxiv.2204.12380. [75] W. Jung, F. Jazizadeh, Comparative assessment of HVAC control strategies using
[58] Dryad Data – ASHRAE global database of thermal comfort field measurements, (n. personal thermal comfort and sensitivity models, Build. Environ. 158 (2019)
d.). https://datadryad.org/stash/dataset/doi:10.6078/D1F671 (accessed October 104–119, https://doi.org/10.1016/J.BUILDENV.2019.04.043.
11, 2022). [76] G.C. Donaldson, H. Rintamäki, S. Näyhä, Outdoor clothing: its relationship to
[59] D. Li, C.C. Menassa, V.R. Kamat, Personalized human comfort in indoor building geography, climate, behaviour and cold-related mortality in Europe, Int. J.
environments under diverse conditioning modes, Build. Environ. 126 (2017) Biometeorol. 45 (2001) 45–51.
304–317, https://doi.org/10.1016/J.BUILDENV.2017.10.004. [77] K.A. Nice, N. Nazarian, M.J. Lipson, M.A. Hart, S. Seneviratne, J. Thompson,
[60] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, in: Proc. ACM M. Naserikia, B. Godic, M. Stevenson, Isolating the impacts of urban form and
SIGKDD Int. Conf. Knowl. Discov. Data Min. 13-17-August-2016, 2016, fabric from geography on urban heat and human thermal comfort, Build. Environ.
pp. 785–794, https://doi.org/10.1145/2939672.2939785. 224 (2022), 109502, https://doi.org/10.1016/J.BUILDENV.2022.109502.
[61] T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, K. Chen, Xgboost: [78] J.W. Grigg, L.B. Buckley, Conservatism of lizard thermal tolerances and body
extreme gradient boosting, R Packag. Version (2015) 1–4, 0.4-2. 1. temperatures across evolutionary history and geography, Biol. Lett. 9 (2013),
[62] L. Grinsztajn, E. Oyallon, G. Varoquaux, Why Do Tree-Based Models Still https://doi.org/10.1098/RSBL.2012.1056.
Outperform Deep Learning on Tabular Data?, 2022, https://doi.org/10.48550/ [79] V.L. Wyckelsma, T. Venckunas, P.J. Houweling, M. Schlittler, V.M. Lauschke, C.
arxiv.2207.08815. F. Tiong, H.D. Wood, N. Ivarsson, H. Paulauskas, N. Eimantas, D.C. Andersson, K.
[63] P.I. Frazier, A Tutorial on Bayesian Optimization, 2018, https://doi.org/10.48550/ N. North, M. Brazaitis, H. Westerblad, Loss of α-actinin-3 during human evolution
arxiv.1807.02811. provides superior cold resilience and muscle heat generation, Am. J. Hum. Genet.
[64] J. Snoek, H. Larochelle, R.P. Adams, Practical bayesian optimization of machine 108 (2021) 446–457, https://doi.org/10.1016/J.AJHG.2021.01.013.
learning algorithms, Adv. Neural Inf. Process. Syst. 25 (2012). [80] R.F. Rupp, N.G. Vásquez, R. Lamberts, A review of human thermal comfort in the
[65] H. Lan, Z. Gou, Y. Lu, Machine learning approach to understand regional disparity built environment, Energy Build. 105 (2015) 178–205, https://doi.org/10.1016/J.
of residential solar adoption in Australia, Renew. Sustain. Energy Rev. 136 (2021), ENBUILD.2015.07.047.
110458, https://doi.org/10.1016/j.rser.2020.110458.
