Mohseni Et Al 2022 Fmi Real Time Co Simulation Based Machine Deep Learning Control of Hvac Systems in Smart Buildings
Digital-twins technology
Abstract
Heating, ventilation, and air conditioning (HVAC) systems are among the largest contributors to energy consumption worldwide, and controlling these large-scale systems remains challenging because of the coupling effects among control variables. At the same time, the penetration of these systems in smart buildings has increased in recent years, and the digital twin is a fast-growing concept under active development. In HVAC systems, independent and accurate control of the temperature and humidity of the indoor air plays an important role in reducing energy consumption. In this paper, to achieve cost-effective energy management in a single-zone HVAC system, a new reliable digital twin proximal policy optimization (PPO)–based model-independent nonsingular terminal sliding-mode control (MINTSMC) methodology is proposed. Because of the nonlinear characteristics of HVAC systems, MINTSMC is used to handle the un-modeled system dynamics and disturbances. To regulate the parameters of the proposed controller, an efficient PPO algorithm, an actor-critic-based reinforcement learning method, has been developed. Extensive examinations and comparative analyses against a sliding-mode controller designed by particle swarm optimization and a proportional–integral–derivative controller have been carried out using the digital twin of the proposed controller to show the importance, accuracy, and applicability of this method for comfort and energy management in HVAC control systems. A digital signal processor has been used for the hardware-in-loop (HIL) implementation within the digital twin concept. To define the interface between the established models, software-in-loop, and HIL, the Functional Mock-up Interface has been used. The outcomes show that the suggested digital twin-based controller outperforms the compared control methodologies in compensating for unknown uncertainties, fast tracking, and smoothness of response.
Keywords
HVAC system, PPO algorithm, nonsingular terminal sliding-mode controller, hardware-in-loop, digital twin (DT)
1 Shiraz University, Iran
2 Sharif University of Technology, Iran
3 Shahid Bahonar University of Kerman, Iran
4 Department of Electrical and Computer Engineering, Aarhus University, Denmark

*Saeid-Reza Mohseni is now affiliated to Sharif University of Technology, Tehran, Iran; Saber Abrazeh and Ahmad Parvaresh are now affiliated to Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.

Corresponding author:
Meysam Gheisarnejad, Department of Engineering Electrical and Computer Engineering, Edison Finlandsgade 22, building 5125, 8200 Aarhus N, Denmark.
Email: me.gheisarnejad@gmail.com

Introduction and preliminaries

Nowadays, heating, ventilation, and air conditioning (HVAC) systems are key components of building mechanical systems that offer inhabitants thermal comfort and good indoor air quality. They are commonly used in a variety of structures, including industrial, commercial, residential, and institutional buildings. Because the energy used by HVAC equipment in industrial and commercial buildings accounts for 50% of global energy consumption (Pérez-Lombard et al., 2008), the major difficulty in HVAC systems is energy management.

Despite the similarities between HVAC and other types of process control, some characteristics, such as nonlinear dynamics, time-varying system dynamics and set-points, time-varying disturbances, and conflicting control loops, make HVAC system control unique and hard (Afram and Janabi-Sharifi, 2014). To address these difficulties, a variety of control approaches have been developed or proposed. Because traditional techniques such as proportional–integral–derivative (PID) and ON/OFF controllers are neither cost-effective, energy-efficient, nor reliable, numerous nonlinear methods have recently been investigated to increase HVAC system efficiency (Afram and Janabi-Sharifi, 2014).
662 Transactions of the Institute of Measurement and Control 45(4)
The application of multiple-input, multiple-output (MIMO) robust controllers to increase HVAC system performance is examined in Anderson et al. (2008). Temperature and relative humidity have a direct impact on the functioning of HVAC systems in almost all applications (Chi et al., 2006; Salazar et al., 2007). Several performance measures, such as the energy economy index, thermal comfort index, and predicted mean vote, have been used to evaluate the performance of the various regulating approaches. Chiang and Fu (2006) proposed a system to maintain good tracking performance for temperature and humidity ratio regardless of parametric uncertainties in the thermal space. Recently, in Hendel et al. (2019), terminal sliding-mode control (TSMC) has been suggested to control higher-order and MIMO systems. It has been shown that nonsingular TSMC offers good overall dynamics and chattering-free performance (Abrazeh et al., 2021; Li et al., 2014; Mohammadi Moghadam et al., 2021b; Vo and Kang, 2018).

The combination of machine learning and deep learning techniques with different control methods is an effective approach to the automatic control of HVAC systems (Drees, 2019; Khooban et al., 2014; Peng et al., 2018; Yang et al., 2019). In addition, various investigations using intelligent and soft computing methods have been presented in the area of HVAC systems. Reinforcement learning (RL) is an attractive area of machine learning that has been used widely in a plethora of processes including HVAC systems. For example, RL control has been applied for thermal energy storage in Wei et al. (2017) and Namatevs (2018). This type of learning has many advantages; one of the most important is the ability to identify and tune the parameters of intelligent controllers. Among the several types of RL methodologies, proximal policy optimization (PPO) (Schulman et al., 2017) is a highly popular model-free method that is increasingly being used in nonlinear systems. The key to PPO's success is mathematical and conceptual simplicity combined with excellent or at least good enough performance in a variety of problems (Hajihosseini et al., 2020; Kobayashi, 2021).

To assess controllers and tuning algorithms while also having a digital representation of a physical object or system, a fast-growing concept called the digital twin (DT) has been developed (Mateev, 2020). A DT, unlike a virtual prototype, is a virtual representation of a physical system that is updated with the latter's performance, maintenance, and health status data during the actual system's life cycle (Mittal et al., 2019). To define the interface between established models, the Functional Mock-up Interface (FMI) is widely adopted in the DT context (Centomo et al., 2020). The FMI is an open standard for transferring dynamical simulation models in a consistent format between different tools; in addition to serving as the standard for co-simulation, it allows models to be exchanged between simulation tools.

In this paper, we propose a real-time DT of PPO-based model-independent nonsingular terminal sliding-mode control (MINTSMC) for temperature and relative humidity control in a typical single-zone HVAC system. To achieve this goal, we use a modified non-interacting control method to decouple the effects of temperature and relative humidity on each other in a thermal space. Implementation and presentation of the DT of the proposed controller is the principal contribution of this paper. The main contributions of this work are as follows:

- An NTSMC controller is developed in a model-free framework to stabilize the temperature and relative humidity in HVAC systems.
- The PPO algorithm with the actor-critic structure is developed to tune the MINTSMC controller parameters in a DT context.
- A combination of software-in-loop with a DT virtual model using the FMI toolbox is suggested.
- The superiority and high performance of the HVAC system under the proposed controller are demonstrated by illustration and analysis of simulation results.

This paper is organized as follows. The section "HVAC system nonlinear model" presents the nonlinear model of the typical single-zone HVAC system. Then, the section "Design of proposed controller" introduces all parts of the proposed controller. The "DT controller of HVAC system" section focuses on the description of the DT concept and the FMI toolbox for implementing the proposed controller, together with its results and evaluation scenarios. The results of the simulation and of the implementation of the controller are presented in the "Experimental results" section. Finally, the "Conclusion" section summarizes the goals of the study and describes some additional avenues for continuing research.

HVAC system nonlinear model

This section is devoted to the mathematical modeling of the nonlinear HVAC system. Figure 1 shows a single-zone HVAC schematic with hydronic heating and cooling coils (Jahedi and Ardehali, 2012). A cooling and heating coil, a supply air fan, ductwork, filters, and air mixing dampers are all part of it. The system's inputs are the supply airflow rate and the supply cooling water flow rate. The controlled devices, which are the supply fan's motor, the hot water valve, the chilled water valve, and the outdoor air damper, must be recognized to establish a sequence of controls. The temperature of the room being conditioned by this air handler is the controlled variable. The controller is a room thermostat (a sensor and controller in the same enclosure).

The dynamics of the HVAC system can be expressed using differential equations based on the principles of mass and heat transfer (Jahedi and Ardehali, 2012; Kang et al., 2014)

$$\frac{dW_3}{dt} = \frac{(W_2 - W_3) f_a}{V_z} + \frac{M_z}{\rho_a V_z} \tag{1}$$

$$\frac{dT_3}{dt} = \frac{(T_2 - T_3) f_a}{V_z} - \frac{h_{fg} (W_2 - W_3) f_a}{C_{pa} V_z} + \frac{1}{\rho_a C_{pa} V_z} (Q_z - h_{fg} M_z) \tag{2}$$

$$\frac{dT_2}{dt} = \frac{(T_3 - T_2) f_a}{V_{he}} + \frac{0.25 (T_o - T_3) f_a}{V_{he}} - \frac{h_w (0.25 W_o + 0.75 W_3 - W_2) f_a}{C_{pa} V_{he}} - \frac{\rho_w \Delta h_w f_w}{\rho_a C_{pa} V_{he}} \tag{3}$$
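To make the structure of the model in equations (1)–(3) concrete, the following is a minimal Python sketch of its right-hand side with a forward-Euler step. It is an illustration under stated assumptions, not the authors' implementation: the parameter values are those listed in Table 1, the moisture source term in equation (1) is taken as $M_z/(\rho_a V_z)$, the subtraction signs follow the usual single-zone mass and heat balance, and the value of `delta_h_w` is a placeholder because $\Delta h_w$ is not listed in Table 1. The names `HvacParams`, `hvac_rhs`, and `euler_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HvacParams:
    """Parameter values as listed in Table 1, plus one assumed value."""
    c_pa: float = 1.004     # specific heat of air, kJ/(kg degC)
    h_w: float = 790.84     # enthalpy of water, kJ/kg
    h_fg: float = 2500.45   # enthalpy of water vapor, kJ/kg
    m_z: float = 0.021      # moisture load of thermal zone, kg/s
    q_z: float = 84.93      # thermal load, kJ/s
    v_he: float = 1.7198    # volume of heat exchanger, m^3
    v_z: float = 1655.11    # volume of thermal zone, m^3
    w_o: float = 0.018      # outdoor humidity ratio, kg/kg
    w_2: float = 0.007      # supply-air humidity ratio, kg/kg
    t_o: float = 23.88      # outdoor temperature, degC
    rho_a: float = 1.185    # density of air, kg/m^3
    rho_w: float = 1000.0   # density of water, kg/m^3
    # ASSUMPTION: Delta h_w is not given in Table 1; placeholder value only.
    delta_h_w: float = 1.0


def hvac_rhs(state, f_a, f_w, p=HvacParams()):
    """Time derivatives (dW3/dt, dT3/dt, dT2/dt) following equations (1)-(3)."""
    w3, t3, t2 = state
    dw3 = (p.w_2 - w3) * f_a / p.v_z + p.m_z / (p.rho_a * p.v_z)
    dt3 = ((t2 - t3) * f_a / p.v_z
           - p.h_fg * (p.w_2 - w3) * f_a / (p.c_pa * p.v_z)
           + (p.q_z - p.h_fg * p.m_z) / (p.rho_a * p.c_pa * p.v_z))
    dt2 = ((t3 - t2) * f_a / p.v_he
           + 0.25 * (p.t_o - t3) * f_a / p.v_he
           - p.h_w * (0.25 * p.w_o + 0.75 * w3 - p.w_2) * f_a / (p.c_pa * p.v_he)
           - p.rho_w * p.delta_h_w * f_w / (p.rho_a * p.c_pa * p.v_he))
    return (dw3, dt3, dt2)


def euler_step(state, f_a, f_w, dt, p=HvacParams()):
    """Advance the state (W3, T3, T2) by one forward-Euler step of size dt."""
    derivs = hvac_rhs(state, f_a, f_w, p)
    return tuple(s + dt * d for s, d in zip(state, derivs))
```

Under these assumptions, setting equation (1) to zero gives the steady-state zone humidity ratio $W_3 = W_2 + M_z/(\rho_a f_a)$, which is approximately 0.0092 kg/kg for the Table 1 values. The two inputs enter differently: $f_a$ multiplies state-dependent terms in all three equations, while $f_w$ appears only in the coil equation (3).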
Mohseni et al. 663
The variables are defined in Table 1. The state variables are defined as in equation (4)

$$x_1 = \phi_{r3},\quad x_2 = T_2,\quad x_3 = T_3;\qquad u_1 = f_a,\quad u_2 = f_w;\qquad y_1 = \phi_{r3},\quad y_2 = T_3 \tag{4}$$

Table 1. HVAC model parameters description.

| Parameter | Description | Value | Unit |
| $C_{pa}$ | Specific heat of air | 1.004 | kJ/(kg °C) |
| $f_a$ | Volumetric flow rate of air | 8.02 | m³/s |
| $f_w$ | Volumetric flow rate of water | 0.00366 | m³/s |
| $h_w$ | Enthalpy of water | 790.84 | kJ/kg |
| $h_{fg}$ | Enthalpy of water vapor | 2500.45 | kJ/kg |
| $M_z$ | Moisture load of thermal zone | 0.021 | kg/s |
| $Q_z$ | Thermal load | 84.93 | kJ/s |
| $V_{he}$ | Volume of heat exchanger | 1.7198 | m³ |
| $V_z$ | Volume of thermal zone | 1655.11 | m³ |
| $W_o$ | Humidity ratio of outdoor | 0.018 | kg/kg |
| $W_2$ | Humidity ratio of supply air | 0.007 | kg/kg |
| $T_o$ | Temperature of outdoor | 23.88 | °C |
| $\rho_a$ | Density of air | 1.185 | kg/m³ |
| $\rho_w$ | Density of water | 1000 | kg/m³ |

$$g_{13}(x) = a_2 (W_2 + 0.0214) + 2 \times 10^{4} a_2 x_1 + a_1 x_2 + 2.776 \times 10^{4} a_2 a_1 x_3 \tag{8}$$

$$g_{22}(x) = b_4 \tag{9}$$

$$f_1(x) = 5000 a_4 M_z - 1.388 (Q_z - h_{fg} M_z) \tag{10}$$

The application of the previous control law to the augmented system in equation (12) results in a closed-loop system with a new set of coordinates $z(x) \in \mathbb{R}^4$ as follows

$$\dot{z}_1 = z_2 \tag{16}$$

$$\dot{z}_2 = V_1 \tag{17}$$

$$\dot{z}_3 = z_4 \tag{18}$$

$$\dot{z}_4 = V_2 \tag{19}$$

where $V_1$ and $V_2$ are the revised control variables, each of which is capable of influencing $T_3$ and $\phi_{r3}$ individually. The new state variables in equations (16)–(19) are defined as follows

$$z_1 = \phi_{r3} \tag{20}$$
$$\begin{cases} x_1 = \int_0^t e_1(t)\,dt \\ x_2 = e_1(t) \end{cases} \tag{25}$$

According to equation (25), the state-space model is defined as

$$\begin{cases} \dot{x}_1 = x_2 \\ \dot{x}_2 = \dot{x} - a u_v - e \end{cases} \tag{26}$$

A second-order NTSM surface is defined in equation (27), with the design parameter $\gamma$ and with the parameters $p$ and $q$ chosen to satisfy the inequality $1 < p/q < 2$

$$s_1 = x_1 + \gamma x_2^{p/q} \tag{27}$$

By differentiating equation (27), equation (28) is obtained as follows

$$\dot{s}_1 = \dot{x}_1 + \gamma \frac{p}{q} x_2^{p/q - 1} \dot{x}_2 = x_2 + \gamma \frac{p}{q} x_2^{p/q - 1} (\dot{x} - a u_v - e) \tag{28}$$

Theorem 1 (Zhao et al., 2019): The system error converges to zero in finite time when the state-space equation is chosen as in equation (25), the NTSM surface is given by equation (27), and the control law is formulated as equation (29)

$$u_v = \frac{1}{a} \left[ -\hat{e} + \dot{x} + \frac{1}{\gamma} \frac{q}{p} x_2^{2 - p/q} + \eta_1 \, \mathrm{sgn}(s_1) + \eta_2 s_1 \right] \tag{29}$$

In equation (29), $\hat{e}$ is the approximate value of $e$, $\eta_1$ and $\eta_2$ are the design parameters, and $\eta_1 > \lVert \hat{e} - e \rVert + \mu$ $(\mu > 0)$.

Design of SM observer

In this section, an SM observer is used to estimate the unknown parameter $e$ in equation (24) as $\hat{e}$ (Zhang et al., 2017), according to the following equation

$$\dot{\hat{x}} = a u_v + k \, \mathrm{sgn}(x - \hat{x}) \tag{30}$$

where $k$ is the design parameter.

Definition: Let us define the observer error as

$$e_2 = x - \hat{x} \tag{31}$$

After that, the error dynamics are obtained by subtracting equation (30) from equation (24)

$$\dot{e}_2 = e - k \, \mathrm{sgn}(e_2) \tag{32}$$

Theorem 2 (Zhao et al., 2019): Considering $s_2 = e_2$, if the parameter $k$ is properly set, then $\dot{e}_2 \to 0$ in finite time.

Figure 2 illustrates the block diagram of the MINTSMC with SM observer (Zhao et al., 2019).

RL

RL is an efficient approach to handling a complex controller that is expected to adapt to different operating situations of a system. It takes advantage of neural network methods and allows complex behaviors to be represented, especially in nonlinear systems with huge complexities. Besides, RL algorithms use data samples efficiently, which leads to a more stable learning process without being biased. RL is performed by different algorithms in order to find the best path toward the
optimal solution in a specific situation. However, the practical implementation of RL, which mainly deals with continuous control problems, has faced many challenges, including continuous input and output spaces and divergence of learning.

RL can be divided into value-based and policy-based algorithms. Unlike the value-based algorithms, the policy-based algorithms have no convergence problem and deal well with continuous control tasks. In this category, trust region policy optimization (TRPO) and PPO are typical policy optimization methods. The PPO algorithm, which evolved from TRPO, is a simpler and more general approach than TRPO. Therefore, in order to control the temperature and humidity of indoor air using HVAC, which is a task with continuous state and action spaces, the PPO-based sliding-mode control method is proposed to adjust the controller parameters accurately.

Regarding the implementation of the proposed algorithm, a Markov Decision Process (MDP) is a platform characterized by the tuple $\{S, A, r, p, \gamma\}$, by which an RL task can be described:

- State: the state $S \in \mathbb{R}^n$ is the current situation of the agent in the environment, and is considered as the input of the actor network.
- Action: the action $A \in \mathbb{R}^m$, the output of the agent, is the possible move that the agent can make.
- Reward: $r : S \times A \to \mathbb{R}$, which is used as the evaluation criterion, is the feedback from the environment that determines how successful the agent's actions will be.
- Transition function: the transition function $p : S \times A \times S \to [0, 1]$, which represents the probability of transition to a new state $s_{t+1}$ and of receiving a reward $r$ when executing action $a_t$ in the state $s_t$, is considered a part of the environment.
- Discount factor: the discount factor $\gamma \in [0, 1]$ weights the effect of rewards received at different time steps on the agent's choice of action.

The agent acquires information about an efficient action selection according to the current state and the feedback received from its interaction with the environment. The goal of this interaction is to maximize the cumulative reward $E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$ in the presence of environmental uncertainties.

The PPO algorithm

Compared with the traditional neural network and actor-critic-based algorithms, PPO achieves a good balance among algorithm complexity, robustness, precision, and ease of implementation.

Many RL algorithms, including the PPO algorithm, impose a Kullback–Leibler divergence constraint between successive policies during parametric policy iteration to avoid large steps toward unknown regions of the state space. This constraint supports stable convergence during the optimization process.

Policy gradient methods should reduce the variance of the gradient estimates so that the policy improves steadily. To estimate samples of the policy loss function and of its gradients, the Monte Carlo (MC) method can be used as below

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta(\tau)} \left[ \sum_t R(s_t, a_t) \right] = \mathbb{E}_{\tau \sim \pi_\theta(\tau)} [R(\tau)] \tag{33}$$

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta(\tau)} \left[ \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) R(\tau) \right] \tag{34}$$

where $\pi_\theta$ is the policy that is adopted by the algorithm to act on an MDP-based environment, and $R$ is the reward value at time step $t$.

By introducing a new definition of the value function, the actor-critic architecture makes a significant contribution to this goal

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi_\theta} \left[ \sum_t R(s_t, a_t) \,\middle|\, s, a \right] \tag{35}$$

$$V^{\pi}(s) = \mathbb{E}_{\pi_\theta} \left[ \sum_t R(s_t, a_t) \,\middle|\, s \right] \tag{36}$$

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s) \tag{37}$$

The advantage function $A^{\pi}(s, a)$ compares the effectiveness of an action with that of the other actions available in that state, while the value function $V^{\pi}(s)$ measures how good it is to be in that state. The task of the critic network is to predict the value function by analyzing the cumulative received rewards.

The PPO algorithm aims to maximize the objective function presented in the following equation

$$L(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta) \hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1 - \varepsilon, 1 + \varepsilon) \hat{A}_t \right) \right] \tag{38}$$

In equation (38), $\hat{A}$ and $\hat{\mathbb{E}}$ refer to the approximations of the advantage function and the expectation, $\varepsilon$ is a hyper-parameter, and $r_t(\theta)$ denotes the probability ratio as in equation (39)

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \tag{39}$$

According to the clipping scheme in equation (38), the change in probability ratio is ignored when it would make the objective improve, and the change is included when it makes the objective worse. Figure 3 illustrates a single frame (at time step $t$) of the $L$ function. The probability ratio $r$ is clipped at $1 - \varepsilon$ or $1 + \varepsilon$ depending on whether the advantage function value is positive or negative.

Because the expectation under the new policy is estimated from samples collected with an old policy, PPO uses the concept of importance sampling. Therefore, each sample is used for several gradient ascent steps. Refining the new policy makes the old and new policies diverge and hence increases the variance of the estimation. The old policy is then updated to the new policy. Keeping the clipped probability ratio in the region $[1 - \varepsilon, 1 + \varepsilon]$ serves this purpose. The pseudo-code depiction of the standard PPO algorithm is shown in Algorithm 1.
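The clipped objective in equations (38) and (39) can be illustrated with a few lines of Python. This is a minimal sketch over plain lists of per-sample values, not the authors' training code: the function names `probability_ratio` and `clipped_surrogate` are hypothetical, and the default `eps=0.2` is an assumed, commonly used value rather than one reported in this paper.

```python
import math


def probability_ratio(logp_new, logp_old):
    """r_t(theta) of equation (39), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def clipped_surrogate(ratios, advantages, eps=0.2):
    """Sample mean of min(r*A, clip(r, 1-eps, 1+eps)*A), as in equation (38).

    eps=0.2 is an ASSUMED default, not a value taken from the paper.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)  # clip(r, 1-eps, 1+eps)
        total += min(r * adv, clipped_r * adv)          # pessimistic bound
    return total / len(ratios)
```

With a positive advantage, a ratio above $1 + \varepsilon$ contributes only $(1 + \varepsilon)\hat{A}_t$, so pushing the ratio further yields no extra gain; with a negative advantage and a ratio above $1 + \varepsilon$, the unclipped term is kept because it makes the objective worse. This is the asymmetry described in the text.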
Figure 3. A single time step of L function.

Multi-objective RL framework

To handle an issue with more than one control problem, a multi-objective deep RL method is adopted that makes decisions to reach all objectives simultaneously. The key difference between single- and multi-objective RL lies in the reward strategy: a scalar reward is used for single-objective problems and a vector reward for multi-objective problems.

A multi-objective RL is implemented in a multi-objective MDP environment. A multi-objective MDP uses a reward vector of $n$ elements, where $n$ is the number of objectives. A policy $\pi$ with a multi-objective value $V^{\pi}$ is converted to a scalar value by the scalarization function $f$. The output of this function is the weighted linear combination of objective values, $f(V^{\pi}, w) = w \cdot V^{\pi}$. In this function, the weights determine the importance of the objectives. The solution is a set of policies (based on the function $f$ and named the convex coverage set) that contains at least one optimal policy for each possible preference.

DT controller of HVAC system

DT concept

Digital solutions have the potential to provide tremendous value to a company that could never have been realized before the emergence of linked, smart technologies (Mohammadi Moghadam et al., 2021a). Due to limits in digital technological capabilities as well as exorbitant compute, storage, and bandwidth costs, the DT and the huge amounts of data it processes remained out of reach for organizations until recently. Such obstacles, however, have become far less common in recent years (Grieves, 2014; Jahanshahi Zeitouni et al., 2020; Parvaresh et al., 2020; Zavareh et al., 2021).

Thanks to the DT, companies may be able to fix physical difficulties faster by recognizing them earlier, anticipating outcomes with greater precision, designing and building better goods, and, ultimately, better serving their consumers. Companies may gain value and benefits iteratively and faster than ever before with this form of smart architectural design (Grieves, 2014).

Combine SIL with DT virtual model using FMI toolbox

The first requirement of this plan is implementing the HVAC system controller on a TMS320F28379D Microcontroller (TI) that is established as hardware-in-loop (HIL). For this purpose, typical inputs such as the temperature and humidity references and the measured data points (feedback) are transmitted by the standard communication protocol. In SIL, the controller output data of the HIL are applied to the HVAC system. After that, the virtual model or DT of the controller is designed from the real hardware components.

Building HVAC control strategies are frequently established and implemented in a Building Automation System (BAS) that is unable to interact easily with the building HVAC system steady-state or dynamic models for
performance evaluation via closed-loop simulation. HIL testing, for example, necessitates hardware infrastructure, which is time-consuming and costly to set up and maintain. The virtual model necessitates manual control logic translation from several vendor-specific languages to SIMULINK. When replicating a real-time system that requires rapid iterations, SIL testing is a cost-effective way to ensure that the software can meet the demands. Generating code from the controller model is the first step in SIL testing. This code is then exercised in a virtual environment, with no hardware, to see how effectively it handles the simulated system. Tests with various input conditions, functions, and mathematical methods are run to ensure that the code behaves exactly as the model does. SIL, like HIL, also offers the benefit of faster than real-time simulation, making full annual analysis feasible.

The following stage is to develop and build the suggested DT controller with co-simulation. Co-simulation is a technique for resolving multi-physics model integration problems. It represents a specific type of simulation scenario in which at least two simulators are used to solve connected algebraic equations and exchange data during simulation (Jinzhi et al., 2016). To define this interface between the established models, the FMI is widely adopted in the DT context.

Engineers can use the FMI standard to exchange or co-simulate dynamic models from many disciplines. In this way, FMI can broaden the scope of building and energy system simulation applications. It can also assist in overcoming simulation's current and future limitations (Schwan et al., 2017). In addition to serving as the standard for co-simulation, the FMI allows models to be exchanged between simulation tools.

FMI is designed for commercial simulators to transform their models to a normative form (FMI specification). Of the two offered operation modes, model exchange and co-simulation, we prefer co-simulation, which allows individual functional mock-up units (FMUs) to use their own dedicated solver engines. The development of the FMI standard has enabled software-in-the-loop simulation with dynamic system models from different software environments.

Figure 4 illustrates creating an integrated tool chain for the model of HVAC systems by FMI co-simulation.

Proposed PPO-based sliding-mode control method

The proposed PPO-based sliding-mode controller uses a multi-objective PPO algorithm to estimate the control parameters adaptively. This method is especially effective in practical applications such as the HVAC system, where controlling the indoor temperature and humidity involves many unknown and nonlinear uncertainties and perturbations. The complete structure of the proposed controller is depicted in Figure 5, where an actor-critic neural network based on the multi-objective RL framework is adopted to tune the sliding-mode controller parameters $[\eta_{1T}, \eta_{2T}, \gamma_T, \eta_{1H}, \eta_{2H}, \gamma_H]$ accurately. The notation "T" refers to the controller that controls the temperature, and the notation "H" refers to the controller that controls the humidity. More specifically, the agent generates tuning signals $[\eta_{1T}(t+1), \eta_{2T}(t+1), \gamma_T(t+1)]$ and $[\eta_{1H}(t+1), \eta_{2H}(t+1), \gamma_H(t+1)]$ according to the state variables and
Figure 6. A proposed strategy for the combination of HIL and SIL testing.

Figure 8. HVAC control board and its digital twin full training rewards.

Finally, the reward function has been created with the multi-objective design in the DT-based controller for both of the
HIL and SIL environments. There are two reward functions of the PPO algorithm, one for the HIL design and one for the SIL design. The quality of the tuning signal produced by the agent is evaluated by the weighted sum of the objective rewards, which is obtained based on humidity and temperature. For this purpose, two reward functions are proposed with the following rules

$$\mathrm{RewardFunction}_{HIL} = w_1 \frac{1}{e_{T,HIL}^{2}} + w_2 \frac{1}{e_{H,HIL}^{2}} \tag{41}$$

$$\mathrm{RewardFunction}_{SIL} = w_1 \frac{1}{\mathrm{abs}(e_{T,HIL} - e_{T,SIL})} + w_2 \frac{1}{\mathrm{abs}(e_{H,HIL} - e_{H,SIL})} \tag{42}$$

Figure 10. HVAC digital twin output comparative results of sliding mode–PPO, sliding mode–PSO, and PID-PSO controllers for temperature.

Figure 11. HVAC digital twin output comparative results of sliding mode–PPO, sliding mode–PSO, and PID-PSO controllers for humidity.

Figure 12. DT output comparative results.

Experimental results

The decisive difference between the proposed method and the SM controller is that this method uses the PPO algorithm to adjust the control coefficients, both in the hardware model and in the DT platform. This matters because the accuracy of the controller's adjustment coefficients has a tremendous effect on the controller's performance.

In this paper, to design the DT of the HVAC control board, the SM controller is first implemented on the TI hardware board, and then its optimal coefficients are calculated in interaction with the PPO algorithm that has been implemented in the MATLAB–SIMULINK environment. First, the initial values are dedicated to the control
coefficients of the NTSMC controller to offer a primary regulation for the HVAC system by trial and error. Then, the PPO algorithm is adopted to adjust the controller parameters by employing the capability of deep neural networks.

For comparison purposes, the SM controller and the PID controller are designed by the particle swarm optimization (PSO) algorithm. In this way, the controller parameters in HIL and SIL are designed by minimizing the following objective functions

$$\mathrm{Obj}_{HIL} = w_1 \int e_{T,HIL}^{2} \, dt + w_2 \int e_{H,HIL}^{2} \, dt \tag{43}$$

$$\mathrm{Obj}_{SIL} = w_1 \int \mathrm{abs}(e_{T,HIL} - e_{T,SIL}) \, dt + w_2 \int \mathrm{abs}(e_{H,HIL} - e_{H,SIL}) \, dt \tag{44}$$

Table 2. Comparative numerical results in terms of MSE, RMSE, ISE, and IAE errors for humidity and temperature.
MSE: mean squared error; RMSE: root-mean-square error; ISE: integral of the squared error; IAE: integral absolute error.

Figure 13. HVAC and its digital twin output comparative results of sliding mode–PPO controller for temperature and humidity.

In the next step, the hardware board control model is implemented in the MATLAB–SIMULINK environment using equivalent hardware blocks and receives the output signals generated in the previous step as input. Finally, the coefficients of this DT platform are estimated using the PPO algorithm.

In general, the designed platform is expected to track the hardware model behavior very well. In this section, the performance of the proposed control algorithm and of the DT of the HVAC control board is examined under a scenario in which temperature and humidity vary. In this scenario, as shown in Figure 7, temperature and humidity change in the ranges of 18–23 °C and 37%–44.5%, respectively. Figure 8 illustrates the average of accumulated rewards for the full-training steps in the HVAC hardware control board and the DT platform. According to the details, it is concluded that the humidity and temperature errors of the hardware and DT structures decline. And this is shown by keeping
systems. In: 2019 winter simulation conference (WSC), National Harbor, MD, 8–11 December 2019, pp. 2653–2664. New York: IEEE.
Mohammadi Moghadam H, Foroozan H, Gheisarnejad M, et al. (2021a) A survey on new trends of digital twin technology for power systems. Journal of Intelligent & Fuzzy Systems 41: 3873–3893.
Mohammadi Moghadam H, Gheisarnejad M, Yalsavar M, et al. (2021b) A novel nonsingular terminal sliding mode control-based double interval type-2 fuzzy systems: Real-time implementation. Inventions 6(2): 40.
Namatevs I (2018) Deep reinforcement learning on HVAC control. Information Technology & Management Science 21. Available at: https://itms-journals.rtu.lv/article/view/itms-2018-0004
Parvaresh A, Abrazeh S, Mohseni S-R, et al. (2020) A novel deep learning backstepping controller-based digital twins technology for pitch angle control of variable speed wind turbine. Designs 4(2): 15.
Peng Y, Rysanek A, Nagy Z, et al. (2018) Using machine learning techniques for occupancy-prediction-based cooling control in office buildings. Applied Energy 211: 1343–1358.
Pérez-Lombard L, Ortiz J and Pout C (2008) A review on buildings energy consumption information. Energy and Buildings 40(3): 394–398.
Salazar R, López I and Rojano A (2007) A neural network model to predict temperature and relative humidity in a greenhouse. In: International symposium on high technology for greenhouse system management: Greensys2007 801: 539–546.
Schulman J, Wolski F, Dhariwal P, et al. (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Schwan T, Unger R and Pipiorke J (2017) Aspects of FMI in building simulation. In: Proceedings of the 12th international Modelica conference, Prague, 15–17 May, vol. 132, pp. 73–78. Linköping: Linköping University Electronic Press.
Vo AT and Kang H-J (2018) An adaptive terminal sliding mode control for robot manipulators with non-singular terminal sliding surface variables. IEEE Access 7: 8701–8712.
Wei T, Wang Y and Zhu Q (2017) Deep reinforcement learning for building HVAC control. In: Proceedings of the 54th annual design automation conference 2017, Austin, TX, 18–22 June 2017, pp. 1–6. New York: IEEE.
Yang C, Shen W, Gunay B, et al. (2019) Toward machine learning-based prognostics for heating ventilation and air-conditioning systems. ASHRAE Transactions 125: 106–115.
Zavareh B, Foroozan H, Gheisarnejad M, et al. (2021) New trends on digital twin-based blockchain technology in zero-emission ship applications. Naval Engineers Journal 133(3): 115–135.
Zhang C, Wu G, He J, et al. (2017) Sliding observer-based demagnetisation fault-tolerant control in permanent magnet synchronous motors. The Journal of Engineering 2017(6): 175–183.
Zhao K, Yin T, Zhang C, et al. (2019) Robust model-free nonsingular terminal sliding mode control for PMSM demagnetization fault. IEEE Access 7: 15737–15748.