
Article - Measurement

FMI real-time co-simulation-based machine deep learning control of HVAC systems in smart buildings: Digital-twins technology

Transactions of the Institute of Measurement and Control
2023, Vol. 45(4) 661–673
© The Author(s) 2022
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/01423312221119635
journals.sagepub.com/home/tim

Saeid-Reza Mohseni1, Meisam Jahanshahi Zeitouni2, Ahmad Parvaresh2, Saber Abrazeh3, Meysam Gheisarnejad4 and Mohammad-Hassan Khooban4

Abstract
As heating, ventilation, and air conditioning (HVAC) systems have become one of the largest contributors to energy consumption worldwide, the control of these large-scale systems remains challenging because of the coupling between control variables. The penetration of these systems in smart buildings has increased in recent years, and the digital twin is being developed as a fast-growing concept. In HVAC systems, independent and accurate control of the temperature and humidity of the indoor air plays an important role in reducing energy consumption. In this paper, to achieve cost-effective energy management in a single-zone HVAC system, a new reliable digital twin proximal policy optimization (PPO)–based model-independent nonsingular terminal sliding-mode control (MINTSMC) methodology is proposed. Owing to the nonlinear characteristics of HVAC systems, MINTSMC is used to handle the un-modeled system dynamics and disturbances. To tune the parameters of the proposed controller, an efficient PPO algorithm, an actor-critic-based reinforcement learning method, has been developed. Extensive examinations and comparative analyses with a particle swarm optimization–designed sliding-mode controller and a proportional–integral–derivative controller have been made using the digital twin of the proposed controller to show the importance, accuracy, and applicability of this method for comfort and energy management in HVAC control systems. A digital signal processor computing device has been utilized for implementation through hardware-in-loop (HIL) in the concept of the digital twin. To define the interface between the established models, software-in-loop, and HIL, the Functional Mock-up Interface has been utilized. The outcomes revealed that the suggested digital twin-based controller outperforms the compared control methodologies in the compensation of unknown uncertainties, fast tracking, and smooth response.

Keywords
HVAC system, PPO algorithm, nonsingular terminal sliding-mode controller, hardware-in-loop, digital twin (DT)

Introduction and preliminaries

Nowadays, heating, ventilation, and air conditioning (HVAC) systems are key components of building mechanical systems that offer inhabitants thermal comfort and good indoor air quality. They are commonly utilized in a variety of structures, including industrial, commercial, residential, and institutional buildings. Because the use of energy by HVAC equipment in industrial and commercial buildings accounts for 50% of global energy consumption, the major difficulty in HVAC systems is energy management (Pérez-Lombard et al., 2008).

Despite the similarities between HVAC and other types of process control, some characteristics, such as nonlinear dynamics, time-varying system dynamics and set-points, time-varying disturbances, and conflicting control loops, make HVAC system control unique and hard (Afram and Janabi-Sharifi, 2014). To address these difficulties, a variety of control approaches have been developed or proposed. Because traditional techniques such as proportional–integral–derivative (PID) and ON/OFF controllers are neither cost-effective, energy-efficient, nor reliable, numerous nonlinear methods have recently been investigated to increase HVAC system efficiency (Afram and Janabi-Sharifi, 2014).

1 Shiraz University, Iran
2 Sharif University of Technology, Iran
3 Shahid Bahonar University of Kerman, Iran
4 Department of Electrical and Computer Engineering, Aarhus University, Denmark

*Saeid-Reza Mohseni is now affiliated to Sharif University of Technology, Tehran, Iran; Saber Abrazeh and Ahmad Parvaresh are now affiliated to Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.

Corresponding author:
Meysam Gheisarnejad, Department of Engineering Electrical and Computer Engineering, Edison Finlandsgade 22, building 5125, 8200 Aarhus N, Denmark.
Email: me.gheisarnejad@gmail.com

The application of multiple-input, multiple-output (MIMO) robust controllers to increase HVAC system performance is examined in Anderson et al. (2008). Temperature and relative humidity have a major direct impact on the functioning of HVAC systems in almost all applications (Chi et al., 2006; Salazar et al., 2007). Several performance measures, such as the energy economy index, thermal comfort index, and predicted mean vote, were utilized to evaluate the performance of the various regulating approaches. Chiang and Fu (2006) proposed a system to maintain good tracking performance for temperature and humidity ratio regardless of parametric uncertainties in the thermal space. Recently, in Hendel et al. (2019), terminal sliding-mode control (TSMC) has been suggested to control higher-order and MIMO systems. It has been shown that nonsingular TSMC offers good overall dynamics and chattering-free performance (Abrazeh et al., 2021; Li et al., 2014; Mohammadi Moghadam et al., 2021b; Vo and Kang, 2018).

The combination of machine learning and deep learning techniques with different control methods is an effective approach to the automatic control of HVAC systems (Drees, 2019; Khooban et al., 2014; Peng et al., 2018; Yang et al., 2019). In addition, various investigations using intelligent and soft computing methods have been presented in the area of HVAC systems. Reinforcement learning (RL) is an attractive area of machine learning that has been used widely in a plethora of processes, including HVAC systems. For example, RL control has been applied for thermal energy storage in Wei et al. (2017) and Namatevs (2018). This type of learning has many advantages; one of the most important is the ability to identify and tune the parameters of intelligent controllers. Among the several types of RL methodologies, proximal policy optimization (PPO) (Schulman et al., 2017) is a highly popular model-free method, which is increasingly being used in nonlinear systems. The key to PPO's success is mathematical and conceptual simplicity combined with excellent or at least good enough performance in a variety of problems (Hajihosseini et al., 2020; Kobayashi, 2021).

To assess controllers and tuning algorithms while having a digital representation of a physical object or system, a fast-growing concept called the digital twin (DT) has been developed (Mateev, 2020). A DT, unlike a virtual prototype, is a virtual representation of a physical system that is updated with the latter's performance, maintenance, and health status data during the actual system's life cycle (Mittal et al., 2019). To define the interface between established models, the Functional Mock-up Interface (FMI) is widely adopted in the DT context (Centomo et al., 2020). Using the FMI, in addition to the standard for co-simulation, the models can be exchanged between simulation tools. The FMI is an open standard for transferring dynamical simulation models in a consistent format between different tools.

In this paper, we have proposed a real-time DT of PPO-based model-independent nonsingular terminal sliding-mode control (MINTSMC) for temperature and relative humidity control in a typical single-zone HVAC system. To achieve this goal, we have used a modified non-interacting control method to decouple the effects of temperature and relative humidity on each other in a thermal space. Implementation and presentation of the proposed controller DT is the main contribution of this paper. The main contributions of this work are as follows:

• An NTSMC controller is developed in a model-free framework to stabilize the temperature and relative humidity in HVAC systems.
• The PPO algorithm with the actor-critic structure is developed to tune the MINTSMC controller parameters in a DT context.
• A combination of software-in-loop with a DT virtual model using the FMI toolbox has been suggested.
• The superiority and high performance of the HVAC system with the proposed controller have been demonstrated by illustration and analysis of simulation results.

This paper is organized as follows. The section "HVAC system nonlinear model" presents the nonlinear model of the typical single-zone HVAC system. Then, the section "Design of proposed controller" introduces all parts of the proposed controller. The "DT controller of HVAC system" section focuses on the description of the DT concept and the FMI toolbox for implementing the proposed controller, and on its results and evaluation scenarios. The results of the simulation and of the implementation of the controller are presented in the "Experimental results" section. Finally, the "Conclusion" section summarizes the goals of the study and describes some additional avenues for continuing research.

HVAC system nonlinear model

This section is devoted to the mathematical modeling of the nonlinear HVAC system. Figure 1 shows a single-zone HVAC schematic with hydronic heating and cooling coils (Jahedi and Ardehali, 2012). A cooling and heating coil, a supply air fan, ductwork, filters, and air mixing dampers are all part of it. The system's inputs are the supply airflow rate and the supply cooling water flow rate. The controlled devices, which are the supply fan's motor, the hot water valve, the chilled water valve, and the outdoor air damper, must be identified to establish a sequence of controls. The temperature of the room being conditioned by this air handler is the controlled variable. The controller is a room thermostat (a sensor and controller in the same enclosure).

The dynamics of the HVAC system can be expressed using differential equations based on the principles of mass and heat transfer (Jahedi and Ardehali, 2012; Kang et al., 2014)

\[ \frac{dW_3}{dt} = \frac{(W_2 - W_3) f_a}{V_z} + \frac{M_z}{\rho_a V_z} \tag{1} \]

\[ \frac{dT_3}{dt} = \frac{(T_2 - T_3) f_a}{V_z} - \frac{h_{fg}(W_2 - W_3) f_a}{C_{pa} V_z} + \frac{1}{\rho_a C_{pa} V_z}\left(Q_z - h_{fg} M_z\right) \tag{2} \]

\[ \frac{dT_2}{dt} = \frac{(T_3 - T_2) f_a}{V_{he}} + \frac{0.25\,(T_o - T_3) f_a}{V_{he}} - \frac{h_w (0.25 W_o + 0.75 W_3 - W_2) f_a}{C_{pa} V_{he}} - \frac{\rho_w \Delta h_w f_w}{\rho_a C_{pa} V_{he}} \tag{3} \]
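As an illustration of how equations (1)–(3) can be evaluated numerically, the following is a minimal Python sketch and not the authors' implementation. The parameter values are those listed in Table 1, taken in the units given there without any conversion. The names `HvacParams`, `hvac_rhs`, and `euler_step` are hypothetical, the forward-Euler step is only one possible integrator, and the value of `dhw` (the term Δh_w) is a placeholder assumption because Table 1 does not list it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HvacParams:
    """Model parameters with the values listed in Table 1."""
    Cpa: float = 1.004      # specific heat of air
    hw: float = 790.84      # enthalpy of water
    hfg: float = 2500.45    # enthalpy of water vapor
    Mz: float = 0.021       # moisture load of thermal zone
    Qz: float = 84.93       # thermal load
    Vhe: float = 1.7198     # volume of heat exchanger
    Vz: float = 1655.11     # volume of thermal zone
    Wo: float = 0.018       # humidity ratio of outdoor air
    W2: float = 0.007       # humidity ratio of supply air
    To: float = 23.88       # outdoor temperature
    rho_a: float = 1.185    # density of air
    rho_w: float = 1000.0   # density of water
    dhw: float = 1.0        # ASSUMPTION: placeholder, not given in Table 1


def hvac_rhs(state, fa, fw, p=HvacParams()):
    """Right-hand sides of equations (1)-(3) for state = (W3, T3, T2)."""
    W3, T3, T2 = state
    dW3 = (p.W2 - W3) * fa / p.Vz + p.Mz / (p.rho_a * p.Vz)
    dT3 = ((T2 - T3) * fa / p.Vz
           - p.hfg * (p.W2 - W3) * fa / (p.Cpa * p.Vz)
           + (p.Qz - p.hfg * p.Mz) / (p.rho_a * p.Cpa * p.Vz))
    dT2 = ((T3 - T2) * fa / p.Vhe
           + 0.25 * (p.To - T3) * fa / p.Vhe
           - p.hw * (0.25 * p.Wo + 0.75 * W3 - p.W2) * fa / (p.Cpa * p.Vhe)
           - p.rho_w * p.dhw * fw / (p.rho_a * p.Cpa * p.Vhe))
    return (dW3, dT3, dT2)


def euler_step(state, fa, fw, dt, p=HvacParams()):
    """One forward-Euler step of the model (illustrative integrator only)."""
    derivs = hvac_rhs(state, fa, fw, p)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))
```

A call such as `euler_step((0.007, 23.88, 23.88), fa=8.02, fw=0.00366, dt=1.0)` advances the state by one step with the nominal flow rates of Table 1; a stiff or variable-step solver may be preferable in practice.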

Figure 1. A schematic of single-zone HVAC systems.

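The variables are defined in Table 1. The state variables, inputs, and outputs are defined as in equation (4)

\[ \begin{aligned} x_1 &= \phi_{r,3}, \quad x_2 = T_2, \quad x_3 = T_3 \\ u_1 &= f_a, \quad u_2 = f_w \\ y_1 &= \phi_{r,3}, \quad y_2 = T_3 \end{aligned} \tag{4} \]

Then, equations (1)–(3) can be written as equation (5)

\[ \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} f_1(x) \\ 0 \\ f_3(x) \end{bmatrix} + \begin{bmatrix} g_{11}(x) \\ g_{12}(x) \\ g_{13}(x) \end{bmatrix} u_1 + \begin{bmatrix} 0 \\ g_{22}(x) \\ 0 \end{bmatrix} u_2 \tag{5} \]

In this equation

\[ \begin{aligned} g_{11}(x) ={}& 5000 a_1 W_2 + 107 a_1 - a_1 x_1 - 1.388 a_1 x_2 + 1.388 a_2 W_2 \\ & - 2.776 \times 10^{-4} a_2 x_1 - 3.85 \times 10^{-4} a_2 x_3 + 29.7 \times 10^{-3} a_2 \end{aligned} \tag{6} \]

\[ \begin{aligned} g_{12}(x) ={}& -1.5 \times 10^{-4} b_3 x_1 - b_1 x_2 + \left(b_1 - b_2 - 2.1 \times 10^{-4} b_3\right) x_3 \\ & + b_2 T_o - 0.25 b_3 W_o + b_3 W_2 + 0.016 b_3 \end{aligned} \tag{7} \]

\[ g_{13}(x) = -a_2 (W_2 + 0.0214) + 2 \times 10^{-4} a_2 x_1 + a_1 x_2 + \left(2.776 \times 10^{-4} a_2 - a_1\right) x_3 \tag{8} \]

\[ g_{22}(x) = -b_4 \tag{9} \]

\[ f_1(x) = 5000 a_4 M_z - 1.388 (Q_z - h_{fg} M_z) \tag{10} \]

\[ f_3(x) = a_3 (Q_z - h_{fg} M_z) \tag{11} \]

where

\[ a_1 = \frac{60}{V_z}, \quad a_2 = \frac{60 h_{fg}}{C_{pa} V_z}, \quad a_3 = \frac{1}{\rho_a C_{pa} V_z}, \quad a_4 = \frac{1}{\rho_a V_z} \]

\[ b_1 = \frac{60}{V_{he}}, \quad b_2 = \frac{15}{V_{he}}, \quad b_3 = \frac{60 h_w}{C_{pa} V_{he}}, \quad b_4 = \frac{6000}{\rho_a C_{pa} V_{he}} \]

Several methods can be applied to control MIMO systems such as HVAC systems. When independent loop control is used for these types of multivariable processes without taking the coupling into consideration, the resulting performance is frequently poor. The coupling issue should be addressed in the control design to improve performance. Due to this fact, the above-described model is bilinear. Therefore, there is no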

way to choose decoupling or non-interacting control methods. It can be seen that when one controlled variable, such as $T_3$, has to be changed by modulating a control input such as $f_a$, the controlled variables $T_3$ and $\phi_{r,3}$ are both forced to change. For the HVAC system described above, there is no static non-interacting controller. Dynamic extension, however, can be used to achieve non-interacting control. Adding an integrator to the input $u_1$ results in the augmented system shown below (Jahedi and Ardehali, 2012)

\[ \begin{aligned} \dot{x} &= \begin{bmatrix} f_1(x) + g_{11}(x) x_4 \\ g_{12}(x) x_4 \\ f_3(x) + g_{13}(x) x_4 \\ -x_4 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} u_1 + \begin{bmatrix} 0 \\ g_{22}(x) \\ 0 \\ 0 \end{bmatrix} u_2 \\ y_1 &= h_1(x) = x_1 \\ y_2 &= h_2(x) = x_3 \end{aligned} \tag{12} \]

In this equation, $u_1$ is now the integrator input and $x_4 = f_a$ is a new state variable. The HVAC augmented system described by equation (12) allows for decoupling, and the non-interacting control law is obtained by nonlinear decoupling theory, resulting in the following expression

\[ \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \frac{1}{a_1 x_4 b_4 \left(g_{11}(x) + 1.388\, g_{13}(x)\right)} \begin{bmatrix} a_1 x_4 b_4 & 1.388\, a_1 x_4 b_4 \\ g_{13}(x) & -g_{11}(x) \end{bmatrix} \begin{bmatrix} V_1 - B_1(x) \\ V_2 - B_2(x) \end{bmatrix} \tag{13} \]

\[ \begin{aligned} B_1(x) ={}& -\left(a_1 x_4 + 2.776 \times 10^{-4} a_2 x_4\right)\left(f_1(x) + g_{11}(x) x_4\right) - (1.388\, a_1 x_4)\, g_{12}(x) x_4 \\ & - 3.85 \times 10^{-4} a_2 x_4 \left(f_3(x) + g_{13}(x) x_4\right) - g_{11}(x) x_4 \end{aligned} \tag{14} \]

\[ \begin{aligned} B_2(x) ={}& 2 \times 10^{-4} a_2 x_4 \left(f_1(x) + g_{11}(x) x_4\right) + a_1 x_4\, g_{12}(x) x_4 \\ & + \left(2.776 \times 10^{-4} a_2 - a_1\right)\left(f_3(x) + g_{13}(x) x_4\right) x_4 - g_{13}(x) x_4 \end{aligned} \tag{15} \]

The application of the previous control law to the augmented system in equation (12) results in a closed-loop system with a new set of coordinates $z(x) \in \mathbb{R}^4$ as follows

\[ \dot{z}_1 = z_2 \tag{16} \]

\[ \dot{z}_2 = V_1 \tag{17} \]

\[ \dot{z}_3 = z_4 \tag{18} \]

\[ \dot{z}_4 = V_2 \tag{19} \]

where $V_1$ and $V_2$ are the revised control variables, which are capable of influencing $T_3$ and $\phi_{r,3}$ individually. The new state variables in equations (16)–(19) are defined as follows

\[ z_1 = \phi_{r,3} \tag{20} \]

\[ z_2 = f_1(x) + g_{11}(x) f_a \tag{21} \]

\[ z_3 = T_3 \tag{22} \]

\[ z_4 = f_3(x) + g_{13}(x) f_a \tag{23} \]

It must be noted that the control variables $f_a$ and $f_w$ do not appear in the newly revised system of equations, as $V_1$ and $V_2$ already contain $f_a$ and $f_w$. Table 1 describes the HVAC model parameters used in the equations.

Table 1. HVAC model parameters description.

Parameter   Description                       Value      Unit
Cpa         Specific heat of air              1.004      kJ/(kg °C)
fa          Volumetric flow rate of air       8.02       m3/s
fw          Volumetric flow rate of water     0.00366    m3/s
hw          Enthalpy of water                 790.84     kJ/kg
hfg         Enthalpy of water vapor           2500.45    kJ/kg
Mz          Moisture load of thermal zone     0.021      kg/s
Qz          Thermal load                      84.93      kJ/s
Vhe         Volume of heat exchanger          1.7198     m3
Vz          Volume of thermal zone            1655.11    m3
Wo          Humidity ratio of outdoor         0.018      kg/kg
W2          Humidity ratio of supply air      0.007      kg/kg
To          Temperature of outdoor            23.88      °C
ρa          Density of air                    1.185      kg/m3
ρw          Density of water                  1000       kg/m3

Design of proposed controller

In this section, an intelligent optimal controller is developed in a model-free framework to stabilize a nonlinear HVAC system. First, the nonsingular terminal sliding-mode control (NTSMC) with a sliding-mode (SM) observer is adopted to regulate the output of the HVAC system. Since many key parameters are embedded in the structure of the NTSMC controller, PPO is used to adjust the parameters in a multi-objective manner.

Model-independent NTSMC technique

In the following, the NTSMC method is presented. This technique is used for general nonlinear plants with uncertainty, such as the HVAC system. The NTSMC controller is implemented in a model-free form to decrease the dependency of the controller on the plant model and to handle the plant nonlinearity (Zhao et al., 2019). The plant dynamics can be described by the following equation

\[ y^{(l)}(t) = \varepsilon + \alpha u_v(t) \tag{24} \]

where $u$ and $y(t)$ are the input signal and the model system dynamics terms, respectively; the derivative order is given by $v$; $\varepsilon$ is the unknown structure of the plant; and $\alpha$ is a constant parameter. To define the sliding surface, the state variables $x_1$ and $x_2$ are defined as in the following equation

\[ \begin{cases} x_1 = \int_0^t e_1(t)\, dt \\ x_2 = e_1(t) \end{cases} \tag{25} \]

According to equation (25), the state-space models are defined as

\[ \begin{aligned} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= \dot{x}^{*} - \alpha u_v - \varepsilon \end{aligned} \tag{26} \]

A second-order NTSM surface is defined in equation (27) with the designed parameter $\gamma$, where the parameters $p$ and $q$ are chosen to satisfy the inequality $1 < p/q < 2$

\[ s_1 = x_1 + \gamma x_2^{p/q} \tag{27} \]

By differentiating equation (27), equation (28) is obtained as follows

\[ \begin{aligned} \dot{s}_1 &= \dot{x}_1 + \gamma \frac{p}{q} x_2^{p/q - 1} \dot{x}_2 \\ &= x_2 + \gamma \frac{p}{q} x_2^{p/q - 1} \left(\dot{x}^{*} - \alpha u_v - \varepsilon\right) \end{aligned} \tag{28} \]

Theorem 1 (Zhao et al., 2019): The system error converges to zero in finite time when the state-space equation is chosen as in equation (25), the NTSM surface is given by equation (27), and the control law is formulated as equation (29)

\[ u_v = \frac{1}{\alpha}\left[ -\hat{\varepsilon} + \dot{x}^{*} + \frac{1}{\gamma}\frac{q}{p} x_2^{2 - p/q} + \eta_1 \operatorname{sgn}(s_1) + \eta_2 s_1 \right] \tag{29} \]

In equation (29), $\hat{\varepsilon}$ is the approximate value of $\varepsilon$, $\eta_1$ and $\eta_2$ are the designed parameters, and $\eta_1 > \lVert \hat{\varepsilon} - \varepsilon \rVert + \mu$ $(\mu > 0)$.

Design of SM observer

In this section, the unknown parameter $\varepsilon$ in equation (24) is estimated as $\hat{\varepsilon}$ by using an SM observer (Zhang et al., 2017), as in the following equation

\[ \dot{\hat{x}} = \alpha u_v + k \operatorname{sgn}(x - \hat{x}) \tag{30} \]

where $k$ is the designed parameter.

Definition: Let us define the observer error as

\[ e_2 = x - \hat{x} \tag{31} \]

After that, the dynamic error equation can be obtained by subtracting equation (30) from equation (24)

\[ \dot{e}_2 = \varepsilon - k \operatorname{sgn}(e_2) \tag{32} \]

Theorem 2 (Zhao et al., 2019): Considering $s_2 = e_2$, if the parameter $k$ is properly set, then $\dot{e}_2 \to 0$ in finite time.

Figure 2 illustrates the block diagram of the MINTSMC with SM observer (Zhao et al., 2019).

Figure 2. Block diagram of MINTSMC scheme with SM observer.

RL

RL is an efficient approach to handling a complex controller that is expected to adapt to different situations of a system. It takes advantage of neural network methods and allows complex behaviors to be represented, especially in nonlinear systems with high complexity. Besides, RL algorithms use data samples efficiently, which leads to a more stable learning process without being biased. RL is performed by different algorithms in order to find the best path toward the

optimal solution in a specific situation. However, the practical implementation of RL, which mainly deals with continuous control problems, has faced many challenges, including continuous input and output spaces and divergence of learning.

RL can be divided into value-based and policy-based algorithms. Unlike the value-based algorithms, the policy-based algorithms are less prone to convergence problems and handle continuous control tasks well. In this category, trust region policy optimization (TRPO) and PPO are typical policy optimization methods. The PPO algorithm, which evolved from TRPO, is simpler and more general than TRPO. Therefore, in order to control the temperature and humidity of the indoor air using HVAC, which is a task with continuous states and actions, the PPO-based sliding-mode control method is proposed to adjust the controller parameters accurately.

Regarding the implementation of the proposed algorithm, a Markov Decision Process (MDP) is a framework characterized by the tuple $\{S, A, r, p, \gamma\}$, by which an RL task can be described:

• State: the state $S \in \mathbb{R}^n$ is the current situation of the agent in the environment, and is considered as the input of the actor-network.
• Action: the action $A \in \mathbb{R}^m$, the output of the agent, is the possible move that the agent can make.
• Reward: $r : S \times A \to \mathbb{R}$, which is used as the evaluation criterion, is the feedback from the environment that determines how successful the agent's actions are.
• Transition function: the transition function $p : S \times A \times S \to [0, 1]$, which represents the probability of a transition to a new state $s_{t+1}$ with a reward $r$ when executing action $a_t$ in state $s_t$, is considered part of the environment.
• Discount factor: the discount factor $\gamma \in [0, 1]$ weakens the effect of rewards on the agent's choice of action, so that rewards at different time steps are valued differently.

The agent acquires information about an efficient action selection according to the current state and the feedback received from its interaction with the environment. The goal of this interaction is to maximize the cumulative rewards $E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$ in the presence of environmental uncertainties.

The PPO algorithm

Compared with the traditional neural network and actor-critic-based algorithms, PPO achieves a good balance among algorithm complexity, robustness, precision, and ease of implementation.

Many RL algorithms, including the PPO algorithm, impose a Kullback–Leibler divergence constraint between successive policies during parametric policy iteration to avoid large steps toward unknown regions of the state space. This constraint helps the optimization process converge.

Policy gradient methods should reduce the variance of the gradient estimates so that progress toward better policies is continuous. In order to estimate the samples of the policy loss function and the gradients of the loss function, the Monte Carlo (MC) method can be used as below

\[ J(\theta) = E_{\tau \sim \pi_\theta(\tau)}\left[\sum_t R(s_t, a_t)\right] = E_{\tau \sim \pi_\theta(\tau)}\left[R(\tau)\right] \tag{33} \]

\[ \nabla_\theta J(\theta) = E_{\tau \sim \pi_\theta(\tau)}\left[\sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right] \tag{34} \]

where $\pi_\theta$ is the policy adopted by the algorithm to act on an MDP-based environment, and $R$ is the reward value at time step $t$.

By introducing a new definition of the value function, the actor-critic architecture contributes significantly to this goal

\[ Q^{\pi}(s, a) = E_{\pi_\theta}\left[\sum_t R(s_t, a_t) \,\middle|\, s, a\right] \tag{35} \]

\[ V^{\pi}(s) = E_{\pi_\theta}\left[\sum_t R(s_t, a_t) \,\middle|\, s\right] \tag{36} \]

\[ A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s) \tag{37} \]

The effectiveness of an action is compared with the others available in that state by the advantage function $A^{\pi}(s, a)$, while the value function $V^{\pi}(s)$ measures how good it is to be in that state. The task of the critic network is to predict the value function by analyzing the cumulative received rewards.

The PPO algorithm aims to maximize the objective function presented in the following equation

\[ L(\theta) = \hat{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right)\hat{A}_t\right)\right] \tag{38} \]

In equation (38), $\hat{A}_t$ and $\hat{E}_t$ refer to the approximations of the advantage function and the expectation, $\epsilon$ is a hyper-parameter, and $r_t(\theta)$ denotes the probability ratio as in equation (39)

\[ r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \tag{39} \]

According to the scheme of equations (38) and (39), the change in the probability ratio is ignored when it would make the objective improve, and the change is included when it makes the objective worse. Figure 3 illustrates a single frame (at time step $t$) of the $L$ function. The probability ratio $r$ is clipped at $1 + \epsilon$ when the advantage function value is positive and at $1 - \epsilon$ when it is negative.

Because the expectation under the new policy is obtained from samples of an old policy, PPO uses the concept of importance sampling. Therefore, each sample is used for several gradient ascent steps. Refining the new policy causes the old and new policies to diverge and, hence, increases the variance of the estimation. The old policy is then updated to the new policy. A similar state transition function together with a probability ratio clipped to the region $[1 - \epsilon, 1 + \epsilon]$ serves this purpose.

The pseudo-code depiction of the standard PPO algorithm is shown in Algorithm 1.

Algorithm 1. PPO, Actor-Critic Style.

for iteration = 1, 2, ..., max-iter do
    for actor = 1, 2, ..., N do
        Run policy π_θold in environment for T time steps
        Compute advantage estimates Â_1, ..., Â_T
    end for
    Optimize surrogate L wrt θ, with K epochs and minibatch size M ≤ NT
    θ_old ← θ
end for
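The surrogate that Algorithm 1 optimizes, equation (38) with the probability ratio of equation (39), can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `clipped_surrogate` is hypothetical, the log-probabilities and advantage estimates are assumed to be already available for a minibatch, and the default `eps=0.2` is an assumed value of the clipping hyper-parameter, not one reported in this paper.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Sample mean of min(r*A, clip(r, 1-eps, 1+eps)*A), as in equation (38).

    logp_new / logp_old: log pi_theta(a_t|s_t) under the new and old policy.
    advantages: advantage estimates A_t for the same samples.
    eps: clipping range (ASSUMPTION: 0.2 is only an illustrative default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                 # r_t(theta), equation (39)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)          # pessimistic bound per sample
    return total / len(advantages)
```

With a positive advantage, a ratio above 1 + eps contributes no more than (1 + eps) times the advantage, so there is no incentive to move the policy further in that direction; with a negative advantage the same holds for ratios below 1 − eps.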

Figure 3. A single time step of L function.

Multi-objective RL framework

To handle an issue with more than one control problem, a multi-objective deep RL method that makes decisions to reach all objectives simultaneously is adopted. The key difference between single- and multi-objective RL lies in the reward strategy: a scalar reward is used for single-objective problems and a vector reward for multi-objective problems.

A multi-objective RL is implemented in a multi-objective MDP environment. A multi-objective MDP uses a reward vector of $n$ elements, where $n$ is the number of objectives. A policy $\pi$ with a multi-objective value $V^{\pi}$ is converted to a scalar value by a scalarization function $f$. The output of this function is the weighted linear combination of the objective values, $f(V^{\pi}, w) = w \cdot V^{\pi}$, where the weights determine the importance of the objectives. The solution is a set of policies (based on the function $f$ and named the convex coverage set) that contains at least one optimal policy for each possible preference.

DT controller of HVAC system

DT concept

Indeed, digital solutions have the potential to provide tremendous value to a company that could never have been realized before the emergence of linked, smart technologies (Mohammadi Moghadam et al., 2021a). Due to limits in digital technological capabilities as well as exorbitant compute, storage, and bandwidth costs, the DT and the huge amounts of data it processes have remained elusive to organizations until recently. Such obstacles, however, have become far less common in recent years (Grieves, 2014; Jahanshahi Zeitouni et al., 2020; Parvaresh et al., 2020; Zavareh et al., 2021).

Thanks to the DT, companies may be able to fix physical difficulties faster by recognizing them earlier, anticipating outcomes with greater precision, designing and building better goods, and, ultimately, better serving their consumers. Companies may gain value and benefits iteratively and faster than ever before with this form of smart architectural design (Grieves, 2014).

Combine SIL with DT virtual model using FMI toolbox

The first requirement of this plan is to implement the HVAC system controller on a TMS320F28379D microcontroller (TI), which is set up as hardware-in-loop (HIL). For this purpose, typical inputs (reference and feedback), such as temperatures and humidity (reference) and measurement data points (feedback), have been transmitted by the standard communication protocol. In SIL, the output controller data of HIL are applied to the HVAC system. After that, the virtual model or DT of the controller is designed from the real hardware components.

Building HVAC control strategies are frequently established and implemented in a Building Automation System (BAS) that is unable to interact easily with the building HVAC system steady-state or dynamic models for

Figure 4. An overview of design with FMU.



performance evaluation via closed-loop simulation. HIL testing, for example, necessitates hardware infrastructure, which is time-consuming and costly to set up and maintain. The virtual model necessitates manual control logic translation from several vendor-specific languages to SIMULINK. When replicating a real-time system that requires rapid iterations, SIL testing is a cost-effective way to ensure that the software can meet the demands. Generating the code from the controller model is the first step in SIL testing. This code is then exercised in a virtual environment, with no hardware, to see how effectively it handles the simulated system. Tests with various input conditions, functions, and mathematical methods are run to ensure that the code behaves exactly as the model does. SIL, like HIL, also offers the benefit of faster than real-time simulation, making full annual analysis feasible.

The next stage is to develop and build the suggested DT controller with co-simulation. Co-simulation is a technique for resolving multi-physics model integration problems. It represents a specific type of simulation scenario in which at least two simulators are used to solve connected algebraic equations and exchange data during simulation (Jinzhi et al., 2016). To define this interface and the established models, the FMI is widely adopted in the DT context.

Engineers can use the FMI standard to exchange or co-simulate dynamic models from many disciplines. In this way, FMI can broaden the scope of building and energy system simulation applications. It can also assist in overcoming simulation's current and future limitations (Schwan et al., 2017). Using the FMI, in addition to the standard for co-simulation, the models can be exchanged between simulation tools.

FMI is designed for commercial simulators to transform their models to a normative form (FMI specification). Of the two offered operation modes, model exchange and co-simulation, we prefer co-simulation, which allows individual functional mock-up units (FMUs) to use their own dedicated solver engines. The development of the FMI standard has enabled software-in-the-loop simulation with dynamic system models from different software environments.

Figure 4 illustrates the creation of an integrated tool chain for the model of HVAC systems by FMI co-simulation.

Proposed PPO-based sliding-mode control method

The new proposed PPO-based sliding-mode controller benefits from a multi-objective PPO algorithm to estimate the control parameters adaptively. This method is especially effective in practical applications such as the HVAC system, where controlling the indoor temperature and humidity involves many unknown and nonlinear uncertainties and perturbations. The complete structure of the proposed controller is depicted in Figure 5, where an actor-critic neural network based on the multi-objective RL framework is adopted to tune the sliding-mode controller parameters $[\eta_{1T}, \eta_{2T}, \gamma_T, \eta_{1H}, \eta_{2H}, \gamma_H]$ accurately. The subscript "T" refers to the controller that controls the temperature, and the subscript "H" refers to the controller that controls the humidity.

Figure 5. The structure of the proposed PPO-based sliding-mode controller.

More specifically, the agent generates tuning signals $[\eta_{1T}(t+1), \eta_{2T}(t+1), \gamma_T(t+1)]$ and $[\eta_{1H}(t+1), \eta_{2H}(t+1), \gamma_H(t+1)]$ according to the state variables and

Figure 6. A proposed strategy for the combination of HIL and SIL testing.

Figure 7. Humidity and temperature used in the scenario.


Figure 9. HVAC output comparative results of sliding mode–PPO,
sliding mode–PSO, and PID-PSO controllers for temperature.

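also regulatory signals $[\delta\eta_{1T}(t), \delta\eta_{2T}(t), \delta\gamma_T(t)]$ and $[\delta\eta_{1H}(t), \delta\eta_{2H}(t), \delta\gamma_H(t)]$, as shown in equation (40). The state variables $s_t = \{T_r, e_{T_r}, \int e_{T_r}\,dt, H_r, e_{H_r}, \int e_{H_r}\,dt\}$ include the reference temperature and humidity, the temperature and humidity errors, and their integrals

\[ \begin{aligned} \eta_{1T}(t+1) &= \eta_{1T}(t) + \delta\eta_{1T}(t) \\ \eta_{2T}(t+1) &= \eta_{2T}(t) + \delta\eta_{2T}(t) \\ \gamma_{T}(t+1) &= \gamma_{T}(t) + \delta\gamma_{T}(t) \\ \eta_{1H}(t+1) &= \eta_{1H}(t) + \delta\eta_{1H}(t) \\ \eta_{2H}(t+1) &= \eta_{2H}(t) + \delta\eta_{2H}(t) \\ \gamma_{H}(t+1) &= \gamma_{H}(t) + \delta\gamma_{H}(t) \end{aligned} \tag{40} \]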
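The incremental tuning rule of equation (40) and the construction of the state vector can be illustrated with the following minimal Python sketch. It is not the authors' implementation: the helper names `build_state` and `update_gains` and the dictionary keys are hypothetical, the error is assumed to be defined as reference minus measurement, and the integral is assumed to be accumulated with a simple rectangular rule.

```python
def build_state(T_ref, T_meas, H_ref, H_meas, int_eT, int_eH, dt):
    """Assemble s_t = (T_r, e_T, integral of e_T, H_r, e_H, integral of e_H).

    ASSUMPTIONS: error = reference - measurement; the running integrals
    int_eT and int_eH are advanced by one rectangular step of length dt.
    """
    e_T = T_ref - T_meas
    e_H = H_ref - H_meas
    return (T_ref, e_T, int_eT + e_T * dt, H_ref, e_H, int_eH + e_H * dt)


def update_gains(gains, deltas):
    """Apply equation (40): parameter(t+1) = parameter(t) + delta(t)."""
    return {name: gains[name] + deltas[name] for name in gains}
```

For example, `update_gains({"eta1_T": 1.0, "gamma_T": 0.5}, {"eta1_T": 0.1, "gamma_T": -0.2})` returns the parameters for the next step; in a real controller the updated values would typically also be kept within admissible bounds, which equation (40) does not specify.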

Figure 8. HVAC control board and its digital twin full training rewards.

Finally, the reward function has been created with the multi-objective design in the DT-based controller for both of the

HIL and SIL environments. There are two reward functions of the PPO algorithm, one for the HIL design and one for the SIL design. The quality of the tuning signal produced by the agent is evaluated by the weighted sum of the objective rewards, which are obtained from the humidity and temperature. For this purpose, two reward functions are proposed with the following rules

\[ \text{RewardFunction}_{HIL} = w_1 \frac{1}{e_{T,HIL}^{2}} + w_2 \frac{1}{e_{H,HIL}^{2}} \tag{41} \]

\[ \text{RewardFunction}_{SIL} = w_1 \frac{1}{\operatorname{abs}\left(e_{T,HIL} - e_{T,SIL}\right)} + w_2 \frac{1}{\operatorname{abs}\left(e_{H,HIL} - e_{H,SIL}\right)} \tag{42} \]

where the weights determine the priority of each objective. In these equations, $e_{T,HIL}$ and $e_{H,HIL}$ are the output temperature and humidity errors in HIL, and $e_{T,SIL}$ and $e_{H,SIL}$ are the output temperature and humidity errors in SIL.

The PPO reward function in the HIL setup, equation (41), rewards a reduction of the errors of the system outputs (humidity and temperature errors). In equation (42), the reward function drives down the difference between the system outputs in HIL and SIL.

Finally, it can be concluded that in this paper the DT controller is designed in such a way that the outputs of the SIL system behave similarly to the outputs of the physical one (the HIL system outputs), based on Figure 6.

Figure 10. HVAC digital twin output comparative results of sliding mode–PPO, sliding mode–PSO, and PID-PSO controllers for temperature.

Figure 11. HVAC digital twin output comparative results of sliding mode–PPO, sliding mode–PSO, and PID-PSO controllers for humidity.

Figure 12. DT output comparative results.

Experimental results

The decisive difference between the proposed method and the SM controller is that this method uses the PPO algorithm to adjust the control coefficients, both in the hardware model and in the DT platform. This is why the accuracy of the controller's adjustment coefficients has a tremendous effect on the controller's performance.

In this paper, to design the DT of the HVAC control board, the SM controller is first implemented on the TI hardware board, and then its optimal coefficients are calculated in interaction with the PPO algorithm, which has been implemented in the MATLAB–SIMULINK environment. First, the initial values are dedicated to the control

Table 2. Comparative numerical results in terms of MSE, RMSE, ISE, and IAE errors.

| Control methodology | Humidity MSE | Humidity RMSE | Humidity ISE | Humidity IAE | Temperature MSE | Temperature RMSE | Temperature ISE | Temperature IAE |
|---|---|---|---|---|---|---|---|---|
| DT-PID | 0.0793 | 0.2817 | 0.8548e+4 | 7.293e+4 | 0.0611 | 0.2472 | 0.6601e+3 | 1.2809e+4 |
| DT-SM | 0.0415 | 0.2036 | 0.4466e+4 | 3.317e+4 | 0.0419 | 0.2049 | 0.453e+3 | 0.7819e+4 |
| DT-SM-PPO | 0.0075 | 0.0868 | 0.0811e+4 | 1.9526e+4 | 0.0179 | 0.1339 | 0.1939e+3 | 0.6613e+4 |
| HVAC-PID | 0.1867 | 0.4321 | 2.0117e+4 | 11.013e+4 | 0.0900 | 0.3001 | 0.973e+3 | 1.2525e+4 |
| HVAC-SM | 0.0417 | 0.2042 | 0.4491e+4 | 7.681e+4 | 0.0603 | 0.2456 | 0.6521e+3 | 1.2485e+4 |
| HVAC-SM-PPO | 0.0113 | 0.1063 | 0.1218e+4 | 3.984e+4 | 0.0208 | 0.1444 | 0.2254e+3 | 0.6841e+4 |

MSE: mean squared error; RMSE: root-mean-square error; ISE: integral of the squared error; IAE: integral absolute error.
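As a rough illustration of how the four indices in Table 2 relate to a sampled error signal, the following Python sketch computes them from a list of error samples. The function name `error_metrics`, the sample values, and the sampling period are hypothetical, and approximating the integrals with a rectangle rule is an assumption; the paper does not state how the integrals were evaluated.

```python
import math
from typing import Dict, Sequence


def error_metrics(errors: Sequence[float], dt: float) -> Dict[str, float]:
    """Compute MSE, RMSE, ISE, and IAE for uniformly sampled errors.

    The integrals are approximated with a rectangle rule (an assumption).
    """
    n = len(errors)
    sum_sq = sum(e * e for e in errors)
    sum_abs = sum(abs(e) for e in errors)
    mse = sum_sq / n
    return {
        "MSE": mse,               # mean of the squared error samples
        "RMSE": math.sqrt(mse),   # square root of the MSE
        "ISE": sum_sq * dt,       # approximates the integral of e(t)^2
        "IAE": sum_abs * dt,      # approximates the integral of |e(t)|
    }


# Hypothetical error samples and sampling period, for illustration only.
sample_errors = [0.5, -0.25, 0.0, 0.25]
metrics = error_metrics(sample_errors, dt=0.5)
print(metrics)
```

Since RMSE is the square root of MSE, the two columns in Table 2 should agree up to rounding, which they do (for example, 0.0793 and 0.2817 for the DT-PID humidity entry).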

Figure 13. HVAC and its digital twin output comparative results of sliding mode–PPO controller for temperature and humidity.

coefficients of the NTSMC controller to offer a primary regulation for the HVAC system by trial and error. Then, the PPO algorithm is adopted to adjust the controller parameters by employing the capability of deep neural networks.

For comparison purposes, the SM controller and the PID controller are designed by the particle swarm optimization (PSO) algorithm. In this way, the controller parameters in HIL and SIL are designed by minimizing the following objective functions

$$
\mathrm{Obj.}_{HIL} = w_1 \int e_{T_{HIL}}^{2}\,dt + w_2 \int e_{H_{HIL}}^{2}\,dt \tag{43}
$$

$$
\mathrm{Obj.}_{SIL} = w_1 \int \mathrm{abs}(e_{T_{HIL}} - e_{T_{SIL}})\,dt + w_2 \int \mathrm{abs}(e_{H_{HIL}} - e_{H_{SIL}})\,dt \tag{44}
$$

In the next step, the hardware board control model is implemented in the MATLAB–SIMULINK environment using equivalent hardware blocks and receives the output signals generated in the previous step as input. Finally, the coefficients of this DT platform are estimated using the PPO algorithm.

In general, the designed platform is expected to track the hardware model behavior very well. In this section, under the scenario in which temperature and humidity vary, the performance of the proposed control algorithm as well as the DT of the HVAC control board is examined. In this scenario, as shown in Figure 7, temperature and humidity change in the ranges of 18–23 °C and 37%–44.5%, respectively. Figure 8 illustrates the average of accumulated rewards for the full-training steps in the HVAC hardware control board and the DT platform. According to these results, the humidity and temperature errors of both the hardware and the DT structure decline. This is shown by keeping

the graph constant in episodes 30 and above at $83 \times 10^{4}$ and $61 \times 10^{4}$, respectively.

Figures 9–12 depict the temperature and humidity outputs of the hardware model, as well as of its DT model, using the PID, SM, and PPO-SM controllers.

As can be seen comparatively, the best performance is provided by the PPO-SM controller in both the hardware control board and its DT. Besides, the numerical results in terms of mean squared error, root-mean-square error, integral of the squared error, and integral absolute error for both temperature and humidity outcomes are presented in Table 2, which confirms the superiority of the proposed algorithm.

In order to validate the designed DT performance, Figure 13 shows the humidity and output temperature of the HVAC system under the proposed algorithm in the real system and its DT. The high similarity in the displayed outputs indicates the same behavior of both systems, and thus the correct performance of the designed DT.

Conclusion

In this paper, an innovative DT idea is considered for HVAC systems. An adaptive NTSMC-based controller is designed to control the temperature and humidity of nonlinear MIMO HVAC systems. Besides, by employing the adaptive capability of the PPO algorithm, the NTSMC parameters are tuned appropriately, which leads to an efficient controller. By applying a nonlinear decoupling method, the MIMO HVAC system is changed to two independent single-input and single-output systems. To develop the DT concept, the electronic board has been simulated as part of the HVAC system, and it is the main and most important part of this paper. Because the electronic board behaves as a black box, the FMI standard is employed for the creation of FMUs. Overall, the outcomes of the HVAC system revealed that the proposed controller is more efficient than the other controllers (PID and NTSMC designed by the PSO algorithm). To verify the efficiency of the suggested DT controller, the results are compared, and it is shown that the difference between the HVAC DT output and the HVAC system output is very small. It means that the electrical board as part of the HVAC system is well implemented and simulated in a DT board. As future work, the adaptive potentials of the PPO algorithm can be adopted for designing other control methodologies in the context of DT.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Meysam Gheisarnejad https://orcid.org/0000-0001-7370-5106

References

Abrazeh S, Parvaresh A, Mohseni S-R, et al. (2021) Nonsingular terminal sliding mode control with ultra-local model and single input interval type-2 fuzzy logic control for pitch control of wind turbines. IEEE/CAA Journal of Automatica Sinica 8(3): 690–700.
Afram A and Janabi-Sharifi F (2014) Theory and applications of HVAC control systems – A review of model predictive control (MPC). Building and Environment 72: 343–355.
Anderson M, et al. (2008) MIMO robust control for HVAC systems. IEEE Transactions on Control Systems Technology 16(3): 475–483.
Centomo S, Lora M and Fummi F (2020) Generation of functional mockup units for transactional cyber-physical virtual platforms. In: Kazmierski T, Steinhorst S and Große D (eds) Languages, Design Methods, and Tools for Electronic System Design. Cham: Springer, pp. 27–46.
Chi P-H, Weng F-B, Su A, et al. (2006) Numerical modeling of proton exchange membrane fuel cell with considering thermal and relative humidity effects on the cell performance. Journal of Fuel Cell Science and Technology 3: 292–302.
Chiang M-L and Fu L-C (2006) Adaptive and robust control for nonlinear HVAC system. In: 2006 IEEE international conference on systems, man and cybernetics, Taipei, 8–11 October, pp. 64982–64987. New York: IEEE.
Drees KH (2019) Building management system with augmented deep learning using combined regression and artificial neural network modeling. Google Patents, patent grant number: 10935940, Cedarburg, WI, USA.
Grieves M (2014) Digital twin: Manufacturing excellence through virtual factory replication. White Paper 1: 1–7.
Hajihosseini M, Andalibi M, Gheisarnejad M, et al. (2020) DC/DC power converter control-based deep machine learning techniques: Real-time implementation. IEEE Transactions on Power Electronics 35(10): 9971–9977.
Hendel R, Khaber F and Essounbouli N (2021) Adaptive high order sliding mode controller/observer based terminal sliding mode for MIMO uncertain nonlinear system. International Journal of Control 94: 486–506.
Jahanshahi Zeitouni M, Parvaresh A, Abrazeh S, et al. (2020) Digital twins-assisted design of next-generation advanced controllers for power systems and electronics: Wind turbine as a case study. Inventions 5(2): 19.
Jahedi G and Ardehali M (2012) Wavelet based artificial neural network applied for energy efficiency enhancement of decoupled HVAC system. Energy Conversion and Management 54(1): 47–56.
Jinzhi L, Chen D, Törngren M, et al. (2016) A model-driven and tool-integration framework for whole vehicle co-simulation environments. In: 8th European congress on embedded real time software and systems (ERTS 2016), Toulouse, January.
Kang C-S, Park J-I, Park M, et al. (2014) Novel modeling and control strategies for a HVAC system including carbon dioxide control. Energies 7(6): 3599–3617.
Khooban MH, Abadi DNM, Alfi A, et al. (2014) Optimal type-2 fuzzy controller for HVAC systems. Automatika 55(1): 69–78.
Kobayashi T (2021) Proximal policy optimization with relative Pearson divergence. In: IEEE international conference on robotics and automation (ICRA), Xi’an, China, 30 May–5 June, pp. 8416–8421. New York: IEEE.
Li P, Ma J, Zheng Z, et al. (2014) Fast nonsingular integral terminal sliding mode control for nonlinear dynamical systems. In: 53rd IEEE conference on decision and control, Los Angeles, CA, 15–17 December, pp. 4739–4746. New York: IEEE.
Mateev M (2020) Industry 4.0 and the digital twin for building industry. Industry 4.0 5(1): 29–32.
Mittal S, Tolk A, Pyles A, et al. (2019) Digital twin modeling, co-simulation and cyber use-case inclusion methodology for IOT

systems. In: 2019 winter simulation conference (WSC), National Harbor, MD, 8–11 December 2019, pp. 2653–2664. New York: IEEE.
Mohammadi Moghadam H, Foroozan H, Gheisarnejad M, et al. (2021a) A survey on new trends of digital twin technology for power systems. Journal of Intelligent & Fuzzy Systems 41: 3873–3893.
Mohammadi Moghadam H, Gheisarnejad M, Yalsavar M, et al. (2021b) A novel nonsingular terminal sliding mode control-based double interval type-2 fuzzy systems: Real-time implementation. Inventions 6(2): 40.
Namatevs I (2018) Deep reinforcement learning on HVAC control. Information Technology & Management Science 21. Available at: https://itms-journals.rtu.lv/article/view/itms-2018-0004
Parvaresh A, Abrazeh S, Mohseni S-R, et al. (2020) A novel deep learning backstepping controller-based digital twins technology for pitch angle control of variable speed wind turbine. Designs 4(2): 15.
Peng Y, Rysanek A, Nagy Z, et al. (2018) Using machine learning techniques for occupancy-prediction-based cooling control in office buildings. Applied Energy 211: 1343–1358.
Pérez-Lombard L, Ortiz J and Pout C (2008) A review on buildings energy consumption information. Energy and Buildings 40(3): 394–398.
Salazar R, López I and Rojano A (2007) A neural network model to predict temperature and relative humidity in a greenhouse. In: International symposium on high technology for greenhouse system management: Greensys2007 801: 539–546.
Schulman J, Wolski F, Dhariwal P, et al. (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Schwan T, Unger R and Pipiorke J (2017) Aspects of FMI in building simulation. In: Proceedings of the 12th international Modelica conference, Prague, 15–17 May, vol. 132, pp. 73–78. Linköping: Linköping University Electronic Press.
Vo AT and Kang H-J (2018) An adaptive terminal sliding mode control for robot manipulators with non-singular terminal sliding surface variables. IEEE Access 7: 8701–8712.
Wei T, Wang Y and Zhu Q (2017) Deep reinforcement learning for building HVAC control. In: Proceedings of the 54th annual design automation conference 2017, Austin, TX, 18–22 June 2017, pp. 1–6. New York: IEEE.
Yang C, Shen W, Gunay B, et al. (2019) Toward machine learning-based prognostics for heating ventilation and air-conditioning systems. ASHRAE Transactions 125: 106–115.
Zavareh B, Foroozan H, Gheisarnejad M, et al. (2021) New trends on digital twin-based blockchain technology in zero-emission ship applications. Naval Engineers Journal 133(3): 115–135.
Zhang C, Wu G, He J, et al. (2017) Sliding observer-based demagnetisation fault-tolerant control in permanent magnet synchronous motors. The Journal of Engineering 2017(6): 175–183.
Zhao K, Yin T, Zhang C, et al. (2019) Robust model-free nonsingular terminal sliding mode control for PMSM demagnetization fault. IEEE Access 7: 15737–15748.
