IFAC SAFEPROCESS 2018
Warsaw, Poland, August 29-31, 2018
Carlos A. Muñoz et al. / IFAC PapersOnLine 51-24 (2018) 433–440
model is trained based on data scaled using variable scaling; this keeps the original covariance and preserves meaningful data trends. Next, the residual matrix is used to rescale the data. Since the aim is to learn evenly from every variable, the variable-wise unfolded matrix X_(2) is scaled using the error variance for each variable, i.e., the variance of each row in the variable-wise unfolded residual matrix. Then, a new model is trained based on the scaled data, and the loop is repeated until convergence is achieved. The condition for convergence is defined as achieving unit variance in the approximation error of every variable. Therefore, this algorithm guarantees having the model that best approximates the scaled data and produces unit-variance residuals for every variable. The advantages of this approach with respect to traditional scaling can be pointed out. Since it is based on an initial variable scaling, the elements in the data related to the deterministic behavior of the system are not lost or weakened. But since the data is rescaled based on the variance of the residual for each variable, it is guaranteed that the same weight is given to the variance present in each variable, so the model is trained uniformly.

Table 1. Algorithm for simultaneous error scaling and PCA training.

1. Unfold X into X_(1)
2. Mean center each column of the data
3. Scale the variable-wise data X_(2) to the range 0-1
4. Compute T, P via PCA for the scaled data, equation (5)
5. While σ²_{E_(2)} = Variance(E_(2)) ≠ 1:
   5.1 Establish new scaling parameters as M = 1/σ²_{E_(2)}
   5.2 Scale the data via X_(2)^T M
   5.3 Compute T, P via PCA for the new scaled data
End

An alternative formulation results from considering the problems of scaling the data and training the model as a single optimization problem. In this form, a scaling parameter and a regularization term are added to the original least squares problem according to equation (8). This formulation is based on training a bilinear model such as PCA, but it can be applied equivalently to multilinear methods. In this equation, M is the vector containing the scaling factors for each variable; the scaling is applied as the product between M and the unfolded data. The regularization term, on the other hand, aims at making the variance of the residual of each variable equal to 1. In this term, N is the number of elements along the variable-wise unfolded matrix, which corresponds to the number of batches × the number of time points. When applying the algorithm presented in Table 1, the solution to the optimization problem is obtained via an iterative procedure that guarantees first making the regularization term as close as possible to zero, and secondly finding the optimal least squares solution to the problem given by the first term in equation (8), with M being the cumulative product of the variance of the residuals in each iteration.

\min_{M,T,P} \left\| (X_{(2)}^{T} M)^{T} - [T, P] \right\|^{2} + \lambda \left\| \frac{\operatorname{diag}(E_{(2)}^{T} E_{(2)})}{N} - 1 \right\|^{2} \quad (8)

5. CONSTRAINED TENSOR DECOMPOSITION

The use of tensor based decomposition methods to train models that reproduce the variability present in batch process data is driven by the interest in keeping the structure of the data intact. However, the added value of such tensor based methods has been investigated only recently, and several aspects still have to be developed and understood further to exploit the comparative advantages of a tensor based approach. In this contribution, the higher interpretability that can be obtained from a tensor decomposition is investigated to drive the traditional data mining approach towards the extraction of meaningful features from the process data, which in turn can result in better performance and interpretation regarding data approximation, process monitoring and fault detection. Thus, the use of structural sparsity in the factor matrices and/or core tensor is explored to constrain certain linear combinations of the loadings. The final aim is to guarantee independence between the loadings in the time mode that approximate the dependent variables and those that approximate the independent variables.

When (multi)linear methods are used to decompose a given data set, a numerical search for the loadings that best approximate the data is performed. This results in finding the basis or features that numerically produce the best results, i.e., the minimum approximation error in the least squares sense. However, this could imply the emergence of numerical correlations between variables which are non-existing in the physical system. A clear example is the distinction between dependent and independent variables. Since there exists a clear causality relation between these two subsets of variables, the multilinear decomposition could be adjusted to reflect the independence of the inputs with respect to the states of the system. The strategy of imposing independence between loadings in tensor based models has already been explored in chemometrics when modeling data from chemical reactions that are monitored via UV-vis spectroscopy (Gurden et al., 2001). Equivalently, in this contribution the proposed approach is investigated aiming at extracting more interpretable features for the variability present in batch processes. Based on CPD and Tucker3, two different approaches are presented to impose the structural constraints on the models. In the case of CPD, since this model requires the fewest number of parameters and the combinations are restricted to be only between corresponding loadings and scores, it is enough to impose structural sparsity on the factor matrix B to generate the desired independence. This sparsity is structured as shown in equation (9), where zeros are added in the rows n to n + k, which correspond to the position of the independent variables in the data set, and in the first i columns, to guarantee independence between the set of independent variables and the first i time loadings. Thus, those first i time loadings will only influence the dependent variables.

B = \begin{bmatrix}
b_{11} & \cdots & b_{1i} & b_{1(i+1)} & \cdots & b_{1R} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & b_{n(i+1)} & \cdots & b_{nR} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & b_{(n+k)(i+1)} & \cdots & b_{(n+k)R} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
b_{J1} & \cdots & b_{Ji} & b_{J(i+1)} & \cdots & b_{JR}
\end{bmatrix} \quad (9)

In case of Tucker3, given the more complex combination pattern between loadings, it is, according to equation (4), not possible to determine a unique set of parameters in the variable loadings which are related with a unique variable.
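The structural sparsity patterns of equations (9) and (10) can likewise be expressed as boolean masks. The dimensions below (J, R, i, n, k and the core sizes) are hypothetical, chosen only to make the pattern visible; the paper itself does not prescribe these values.

```python
import numpy as np

# Hypothetical dimensions for illustration: J variables and R components
# for the CPD factor matrix B, and a (r1 x r2 x r3) core for Tucker3.
J, R = 11, 6          # rows/columns of B
i = 3                 # components reserved for the dependent variables
n, k = 4, 3           # independent variables occupy rows n .. n+k (0-based)
r1, r2, r3 = 4, 11, 6

# Equation (9): zeros in rows n..n+k of the first i columns of B.
mask_B = np.ones((J, R), dtype=bool)
mask_B[n:n + k + 1, :i] = False

# Equation (10): in the first i frontal slices of the core G only columns
# outside n..n+k may be nonzero; in slices i+1..r3 only columns n..n+k may.
mask_G = np.zeros((r1, r2, r3), dtype=bool)
mask_G[:, :, :i] = True
mask_G[:, n:n + k + 1, :i] = False
mask_G[:, n:n + k + 1, i:] = True

# In an alternating least squares fit, the pattern would be enforced by
# projecting after each update, e.g. B *= mask_B and G *= mask_G.
```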
Therefore, only adding sparsity to the factor matrix V is not enough to guarantee the independence relations as in the case of CPD. To fully establish independence between the loadings that approximate the independent and the dependent variables, structural sparsity has to be imposed both in the factor matrix V and in the core tensor G. As the elements of the core tensor can be interpreted as the weighting factors for the combinations applied between loadings, making these parameters equal to zero avoids certain combinations. The sparsity imposed on the factor matrix V is equivalent to that applied to matrix B in the case of CPD. Equation (10) represents the unfolded core tensor G with the structural sparsity required to guarantee the desired independence.

G(:,:,1) = \begin{bmatrix}
g_{1,1,1} & \cdots & 0 & \cdots & 0 & \cdots & g_{1,r_2,1} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
g_{r_1,1,1} & \cdots & 0 & \cdots & 0 & \cdots & g_{r_1,r_2,1}
\end{bmatrix}

G(:,:,i) = \begin{bmatrix}
g_{1,1,i} & \cdots & 0 & \cdots & 0 & \cdots & g_{1,r_2,i} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
g_{r_1,1,i} & \cdots & 0 & \cdots & 0 & \cdots & g_{r_1,r_2,i}
\end{bmatrix}

G(:,:,i+1) = \begin{bmatrix}
0 & \cdots & g_{1,n,(i+1)} & \cdots & g_{1,(n+k),(i+1)} & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & g_{r_1,n,(i+1)} & \cdots & g_{r_1,(n+k),(i+1)} & \cdots & 0
\end{bmatrix}

G(:,:,r_3) = \begin{bmatrix}
0 & \cdots & g_{1,n,r_3} & \cdots & g_{1,(n+k),r_3} & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & g_{r_1,n,r_3} & \cdots & g_{r_1,(n+k),r_3} & \cdots & 0
\end{bmatrix} \quad (10)

6. CASE STUDY

The in-silico Pensim case study (Birol et al., 2002) included in the Matlab based software tool RAYMOND (Gins et al., 2014) is used in this contribution as a benchmark to evaluate and illustrate the advantages that the novel data driven monitoring framework offers. The Pensim model consists of a fed-batch reactor that is used in the production of penicillin. In the first stage the reactor operates in batch condition; then the initial concentration decreases until the point when the feeding is started and the substrate in the reactor reaches an equilibrium. All 11 variables depicted in Table 2 are measured during the reactor operation. Disturbances and noise are introduced to simulate the variability expected in a real process. The other conditions of the process were set as in the original benchmark and are presented in detail by Birol et al. (2002). A total of 130 batches were simulated for a period equivalent to 400 time points. 100 batches were used as training data set and 30 batches for validation of the NOC and the definition of the control limits for SPE and T².

7. RESULTS

7.1 Training and validation

The dimensionality reduction via (multi)linear data driven methods requires estimating the rank to be used in the approximation. The rank selected is equivalent to the number of latent variables or, in other words, the dimensionality of the latent space. Different methods have been investigated in order to determine the best rank approximation for a given data set. Traditionally, the relative increase in the explained variance with each extra latent variable has been considered as one possible criterion. However, in all cases the challenge is to identify a well defined limit for the rank where the approximation is good enough to reproduce the desired systematic variability while avoiding non-systematic behavior that can lead to overfitting. Regularization techniques have been used to reduce the risk of overfitting. The extra regularization term is traditionally formulated to reduce the complexity of the model, e.g., by introducing non-structural sparsity in the factor matrices. In this way an equilibrium between the least squares minimum error of the approximation and the model complexity is obtained. As mentioned before, the novel scaling approach results in the addition of a regularization term to the optimization problem. Thus, the first aspect investigated in the application of this method to the Pensim case study is the rank estimation.

In Figs. 4 and 5 the training curves for the standard PCA of autoscaled data and the proposed alternative using the simultaneous scaling-training approach are presented. In the figures, the blue curve corresponds to the ratio between the variance explained by the given rank and the one obtained if a lower rank is used. The orange curve is the relative variability explained. Wold's criterion (Gins et al., 2014) defines the limit in terms of the relative gain in the variance explained with each extra latent variable, to establish an equilibrium between model complexity and the best rank approximation. As can be seen in Figs. 4 and 5, the proposed novel approach produces a steeper increase in the variance explained at low ranks and a sharper change when the point of no more significant improvement is reached. Thus, while for the autoscaled data the standard threshold for Wold's criterion results in requiring 8 latent variables, in the case of the proposed scaling-training procedure a rank 3 approximation explains sufficient variance, and it is clear that higher complexity will not generate any significant improvement of the model.

Based on these results the rank approximation is fixed to three latent variables for all applied methods. This provides a common ground to compare them. In case of CPD

Table 2. Measured variables and initial conditions for the Pensim case study.

Variable               | Type of variable | Initial condition | Sensor noise (SN) / Disturbance (D)
Dissolved O2 [mmol/L]  | Dependent        | 1.16-1.18         | σ = 0.002 (SN)
Volume [L]             | Dependent        | 90-115            | -
pH                     | Dependent        | 5                 | -
Temperature [K]        | Dependent        | 298               | -
Feed rate [L/h]        | Independent      | 0                 | σ = 0.005 (D)
Aeration rate [L/h]    | Independent      | 8                 | σ = 0.3 (D)
Agitation power [W]    | Independent      | 30                | σ = 1 (D)
Feed temp. [K]         | Independent      | 296               | σ = 0.5 (D)
Cooling water [L/h]    | Dependent        | -                 | -
Base flow [L/h]        | Dependent        | -                 | -
Acid flow [L/h]        | Dependent        | -                 | -
produced by PCA of autoscaled data has a higher bias but a good approximation of the trend behavior. On the other hand, the standard Tucker3 of variable scaled data results in a lower bias but with inconsistencies in the systematic behavior of the variable. The latter deviation probably results from over-combination of loadings in the standard Tucker3 model. This means that certain dynamic behaviors that do not have a significant correlation in the physical system are numerically combined to achieve a better data approximation. In contrast, the two results using the proposed simultaneous approach are significantly better and similar, independently of the method applied. At this level, the relative advantages of using the proposed structurally constrained Tucker3 method do not play a big role in comparison to the results obtained via PCA. A clear advantage is found when comparing the proposed approach with the standard Tucker3. In these results it can be seen how the imposed constraint reduces the risk of over-combination of the loadings, and therefore the wrong dynamic behavior disappears from the estimation.

Finally, in Figs. 6, 7 and 8 it is observed how the proposed novel scaling approach results not only in a more homogeneous error distribution but also in the standardization of the distribution. It can be seen in Fig. 6 that this has particular importance, because applying autoscaling results in indirectly giving more weight to the variability of one variable and its residual error. The dissolved oxygen is approximated more accurately because any error in this variable has a higher scale than the variability in the volume. In Table 4 two parameters are evaluated over the residual matrices to determine how well the different scaling-decomposition methods perform at learning uniformly from the set of variables. First, the overall relative error of the approximation for the validation batches is presented. As can be seen, the overall approximation is equivalent for all methods applied, since the same rank approximation was used. However, the results for the modified E-criterion show a clear difference between the methods. In OED this criterion is used to determine how well distributed the uncertainty of the estimation is along all parameters (Telen et al., 2012). Equivalently in this case, being computed over the variance-covariance matrix of the residuals, it represents how well distributed the approximation error is over all variables. Thus, the results show that the models trained using the proposed novel approach for simultaneous scaling and training result in a better global distribution of the error along all variables. This implies that those models have learned the variability of the system evenly from all the variables.

Table 4. Residuals evaluation for (multi)linear decomposition.

                    | PCA / autoscal. | PCA / sim. scaling | Tucker3 / Variable scaled | Const. Tucker3 / sim. scaling
Relative error val. | 0.167           | 0.166              | 0.157                     | 0.166
Mod. E-crit.        | 166.8           | 4.71               | 208.8                     | 9.04

7.2 Monitoring and fault detection

First, the interpretability of the features extracted from the data was evaluated. The proposed novel approach for simultaneous error based scaling and training of the structurally constrained Tucker3 was compared with the results of the standard PCA of autoscaled data. In Fig. 9 the features (loadings) extracted by PCA are presented. As expected, since these features only represent the directions of highest variability of the data in the batch-wise unfolded version, it is very difficult to extract any further meaningful information from them. In contrast, the features extracted in the time mode of the model trained using the proposed novel approach (Fig. 10) show a clear connection with the trends of the physical variables. Additionally, a clear distinction can be made between the features that approximate the dependent variables and those that approximate the independent variables. The three with a clear, strong deterministic behavior are the features extracted for the dependent variables, while the other three correspond to the disturbances that were introduced for the independent variables.

Fig. 9. Loadings of the PCA based model for autoscaled data.

Fig. 10. Time loadings of the constrained Tucker3 model for data scaled and trained simultaneously.

A set of 10 new batches of the process was simulated to evaluate the fault detection performance of the trained models. For these batches, the parameter that determines the oxygen uptake for maintenance of the microbial culture in the kinetic model was modified (i.e., from the standard value 0.467 to 0.867). This deviation was intended to simulate a change in the process that was not directly related to a change in one variable but to the dynamic system itself. This deviation simulates biological variability that has an impact on the dynamics. Graphical results are presented for the two online monitoring statistics, SPE and T². Fig. 11 corresponds to the case using standard PCA of autoscaled data, while Fig. 12 presents the case using the novel proposed approach combining the simultaneous scaling and training of the constrained Tucker3 decomposition.
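For reference, the two monitoring statistics and the eigenvalue-ratio reading of the modified E-criterion used in Table 4 can be sketched as follows. This is a generic formulation of SPE and Hotelling's T² for a bilinear model, not the paper's exact implementation, and control limits are omitted.

```python
import numpy as np

def spe_t2(x_new, P, T_train):
    """SPE and Hotelling T2 for one new, already scaled and unfolded
    observation, given loadings P and the training scores T_train."""
    t = P.T @ x_new                          # projection on the latent space
    residual = x_new - P @ t                 # part not captured by the model
    spe = float(residual @ residual)         # squared prediction error
    S_inv = np.linalg.inv(np.cov(T_train, rowvar=False))
    t2 = float(t @ S_inv @ t)                # Hotelling T2
    return spe, t2

def modified_e_criterion(E):
    """Modified E-criterion over the residual variance-covariance matrix
    (cf. Telen et al., 2012): ratio of the largest to the smallest
    eigenvalue. Values near 1 indicate the approximation error is evenly
    spread over the variables."""
    w = np.linalg.eigvalsh(np.cov(E, rowvar=False))
    return float(w[-1] / w[0])
```

A residual matrix with one dominant variable yields a large criterion value, while evenly distributed residuals give a value close to 1, consistent with the comparison reported in Table 4.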