
Computers and Chemical Engineering 71 (2014) 77–93


Adaptive soft sensor modeling framework based on just-in-time learning and kernel partial least squares regression for nonlinear multiphase batch processes

Huaiping Jin, Xiangguang Chen∗, Jianwen Yang, Lei Wu

Department of Chemical Engineering, Beijing Institute of Technology, Beijing 100081, People's Republic of China

Article history: Received 20 December 2013; received in revised form 3 April 2014; accepted 19 July 2014; available online 28 July 2014.

Keywords: Adaptive soft sensor; Batch process; Kernel partial least squares; Just-in-time learning; Partial mutual information; Chlortetracycline fermentation process.

Abstract

Batch processes are characterized by inherent nonlinearity, multiple phases and time-varying behavior that pose great challenges for accurate state estimation. A multiphase just-in-time (MJIT) learning based kernel partial least squares (KPLS) method is proposed for multiphase batch processes. A Gaussian mixture model is estimated to identify the different operating phases, for which various JIT-KPLS frameworks are built. By applying a Bayesian inference strategy, the query data is classified into the particular phase with the maximal posterior probability, and the corresponding JIT-KPLS framework is thus chosen for online prediction. To further improve the predictive accuracy of the MJIT-KPLS algorithm, a hybrid similarity measure and an adaptive selection strategy are proposed for selecting local modeling samples. Moreover, a maximal similarity replacement rule is proposed to update the database. A procedure for input variable selection based on partial mutual information is also presented. The effectiveness of the MJIT-KPLS algorithm is demonstrated through application to an industrial fed-batch chlortetracycline fermentation process.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Batch or semibatch processes have been widely used to produce special chemicals, materials for microelectronics, and pharmaceutical and agricultural products (Cinar et al., 2003). As a prior requirement, reliable real-time measurements play a crucial role in process automation, monitoring and optimization (Alford, 2006). However, the lack of reliable online sensors that can accurately detect the important state variables has become one of the major challenges in controlling batch processes accurately, automatically and optimally (Chen et al., 2006; Dochain, 2008; Nicoletti and Jain, 2009).

Over the past two decades, soft sensors have received increasing attention in both academia and industry due to their inferential estimation capability. Although soft sensors have been applied in broad fields, online prediction remains the dominant application (Kadlec et al., 2009). When online analyzers are not available, soft sensing technology aims to provide estimates of difficult-to-measure variables based on some easy-to-measure variables. Generally, soft sensors can be classified into two groups: first-principle models and data-driven models (Kadlec et al., 2009). The focus of this work is on data-driven soft sensor modeling. Reviews of this type of soft sensor application have been published in Fortuna et al. (2007), Haimi et al. (2013), Kadlec et al. (2009), Kano and Nakagawa (2008), Kano and Fujiwara (2013), and Sliskovic et al. (2011).

The most common data-driven modeling techniques for developing soft sensors are multivariate statistical techniques such as multivariate linear regression (MLR) (Kano and Ogawa, 2010), principal component regression (PCR) (Jolliffe, 2002) and partial least squares (PLS) (Lin et al., 2007; Kano and Fujiwara, 2013; Sharmin et al., 2006). These linear modeling methods account for 90% of the soft sensors used in industry (Kano and Ogawa, 2010) due to their statistical background, ease of interpretability, and their ability to deal efficiently with data collinearity, which is common among industrial datasets (Kadlec and Gabrys, 2011). Nevertheless, these linear methods cannot function well when applied to highly nonlinear processes such as batch processes. Thus many efforts have been devoted to nonlinear approaches, such as artificial neural networks (ANN) (Cui et al., 2012; Gonzaga et al., 2009), KPLS (Jia et al., 2013; Yu, 2012a), support vector regression (Yu, 2012b), neuro-fuzzy systems (Jang, 1993; Jassar et al., 2009) and Gaussian process regression (GPR) (Grbić et al., 2013). However, some issues remain to be solved in developing soft sensors for batch processes.

∗ Corresponding author. Tel.: +86 13601333018; fax: +86 010 68914662. E-mail addresses: jinhuaiping@gmail.com (H. Jin), xgc1@bit.edu.cn (X. Chen), yangjianwen@bit.edu.cn (J. Yang), wulei@bit.edu.cn (L. Wu).

http://dx.doi.org/10.1016/j.compchemeng.2014.07.014
0098-1354/© 2014 Elsevier Ltd. All rights reserved.

One particular drawback of many current soft sensors is their non-adaptive nature. Traditionally, the predictive models are not adaptive: once deployed into real-life operation, the models will not change, whereas the operating environment often does. To cope with changes in process characteristics, model maintenance is essential to maintain high estimation performance over the long term. Without process or expert knowledge, the soft sensor model has to be updated automatically. Thus many kinds of recursive modeling methods, which update models by prioritizing newer samples, have been proposed (Kadlec et al., 2011). Although these methods can adapt the soft sensor to a new operating condition recursively, they cannot cope with abrupt changes in process characteristics, caused by replacement of a catalyst, cleaning of equipment, etc., because a query sample just after an abrupt change becomes significantly different from the prioritized samples. It is also common practice to enhance model adaptation through an ensemble learning framework (Grbić et al., 2013; Kadlec et al., 2011; Kadlec and Gabrys, 2011; Yu, 2012c). In the ensemble framework, data are divided into different sub-domains and local sub-models are constructed over each domain. In this way, instead of using a single global model, multiple simpler models are developed and then combined to obtain the final prediction. Some other adaptive soft sensors were developed by partitioning the process data into multiple clusters corresponding to different operating phases, with a local predictive model built for each phase (Yu, 2012a; Yu and Qin, 2008, 2009). When implemented online, the query sample, for which an output estimate is required, is first classified into a particular phase, and then the corresponding local model representing that phase is adaptively chosen for prediction. Although such phase-based multi-model methods outperform single models, they are unable to effectively capture the between-phase transient dynamics. Thus, a Bayesian model averaging (BMA) based multi-model method was proposed to tackle this issue (Yu et al., 2013). In practice, however, it is difficult to determine the partition number due to the lack of quantitative and precise information about phase divisions. More importantly, the local models used in the BMA-based multi-model methods are built offline and not updated once deployed online, so changes in process characteristics cannot be handled well given the time-varying nature of real-life processes.

Recently, just-in-time (JIT) learning has attracted increasing attention in process modeling and soft sensor development (Cheng and Chiu, 2004; Fujiwara et al., 2009). In JIT learning, a local model is constructed from the samples most similar to the query sample. Thus, on one hand, a JIT-based model can cope with abrupt changes as well as gradual ones; on the other hand, it can deal with nonlinearity since it builds a local model repeatedly. Compared to traditional modeling methods, which can be considered global modeling, the JIT-based method exhibits a local model structure in which a local model is built from historical data selected by some similarity measure to the query data whenever an estimate is required. Once the estimated output is obtained, the built local model is discarded. Nevertheless, the estimation accuracy of JIT-based models can be further improved by simultaneously selecting the optimal combination of local regression function, input variable selection, similarity measure, database updating scheme, etc.

To enhance the predictive performance and adaptability of JIT-based models, several problems remain to be addressed. Although linear local model based JIT methods (Cheng and Chiu, 2004; Kim et al., 2013) can successfully address process nonlinearity, highly nonlinear variable relationships are widespread in batch processes, where a local linear model may not always function well. Thus a nonlinear modeling technique with high computational efficiency is preferable. In addition, the commonly used similarity measures rarely take into account process characteristics. In real applications, the most frequently used similarity measures are based on the Euclidean distance or the Mahalanobis distance (Cheng and Chiu, 2004; Ge and Song, 2010; Schaal et al., 2002), most of which are defined only from the perspective of sample algebraic space, irrespective of specific process knowledge. Moreover, the optimal local modeling size is usually determined offline, as reported in Ge and Song (2010). An adaptive strategy is required to choose local modeling samples adaptively for each query datum. Besides, a reliable database updating scheme is also crucial for JIT-based models to tackle changes in process characteristics. Usually, the database is updated only by simply removing the oldest samples and adding the new samples (Shigemori et al., 2011).

The second problem encountered in soft sensor modeling is the lack of a systematic guideline for input variable selection. As reported in the literature (Cui et al., 2012; Kadlec and Gabrys, 2011; Kim et al., 2013; Pani et al., 2013), input variables are often selected based on engineers' personal experience and prior process knowledge. However, it is time-consuming for engineers to select the input variables since trial and error is inevitable. Additionally, the selected variables may not be optimal. It also becomes very difficult even for experienced engineers to properly select input variables when a large number of variables are measured and the physical and chemical phenomena are not sufficiently understood. Consequently, various data-based methods have been proposed for selecting proper input variables.

One popular approach for reducing the input variable dimension is to project the original input space into an adequate lower dimensional space. The most popular methods for achieving such a projection are based on linear projection of the input space, such as the widely known principal component analysis (PCA) and PLS. However, the new variables resulting from such methods are difficult to interpret in terms of actual process variables (Delgado et al., 2009). More importantly, the underlying assumption of linearly structured dependence contradicts the development of statistical models of nonlinear processes.

Another approach for reducing the input space dimension is to select the most important variables from all potential variables according to some criteria. As a nonparametric and nonlinear measure of relevance derived from information theory, mutual information (MI) has recently been applied to nonlinear process modeling (May et al., 2008a), process monitoring (Chen et al., 2013; Rashid and Yu, 2012a, 2012b) and soft sensor design (Grbić et al., 2013). Unlike linear methods that only consider linear relationships between variables, MI is theoretically able to identify relations of any type. It furthermore makes no assumption about the distribution of the data. However, several issues have arisen in the formulation of MI-based selection algorithms, namely the handling of inter-dependencies between candidate variables and the lack of an appropriate principle for determining when to halt the selection procedure (Chow and Huang, 2005). To tackle this issue, the partial mutual information (PMI) criterion has been developed by considering the effect of the already selected input variables when evaluating the relevance between a plausible input variable and the output variable (May et al., 2008a, 2008b; Sharma, 2000).

Apart from model adaptation and input variable selection, much attention during data-driven soft sensor modeling is paid only to the plant data, whereas the process characteristics are usually ignored. In practice, batch processes are often characterized by multiphase behavior in which multiple operating phases are involved. Thus the accuracy and reliability of quality variable prediction can degrade heavily as the operating phase and process dynamics change. Multiphase modeling strategies with phase identification have been reported to be more efficient than conventional single-model based methods (Yu, 2012a; Yu and Qin, 2008,

2009; Yu et al., 2013). These results indicate that the multiphase nature essentially needs to be considered when designing data-driven soft sensors for batch processes.

To address the above-mentioned issues, a novel adaptive soft sensor modeling framework is proposed for nonlinear multiphase batch processes. This soft sensing algorithm is outlined as follows:

(i) The JIT learning framework is adopted due to its capability of dealing with changes in process characteristics as well as nonlinearity. Within this learning framework, a new hybrid similarity measure is defined by integrating Euclidean distance based similarity with process phase similarity. Further, the optimal local modeling samples are adaptively determined by online cross-validation optimization. Besides, a maximal similarity replacement rule is proposed to update the sample database.
(ii) KPLS is chosen as the local modeling technique for two reasons. One reason is that it is more effective at capturing the nonlinear characteristics of batch processes than linear methods such as PLS regression. Another reason is that KPLS essentially requires only linear algebra, making it as simple as regular linear PLS regression.
(iii) A Gaussian mixture model (GMM) is estimated to identify the operating phases of the batch process, and then various JIT-KPLS modeling frameworks are constructed for the different phases.
(iv) The input variables are selected based on the PMI criterion between potential input variables and the output variable by performing a stepwise selection procedure, which effectively alleviates the effect of redundancy between input variables.

The proposed MJIT-KPLS soft sensing framework allows model adaptation in four respects. First, a query sample can be automatically classified into a particular operating phase based on the estimated GMM through a Bayesian inference strategy. Second, in the JIT learning framework, a local KPLS model is constructed online when an estimate for the query sample is required. Third, an adaptive strategy is proposed to select the optimal local modeling size adaptively for local KPLS modeling. Finally, the proposed framework can adapt to a new process state by adding new samples into the database from which the samples for local modeling are selected.

The rest of this paper proceeds as follows. Section 2 briefly outlines the theory of KPLS regression, JIT learning, the PMI criterion, and the GMM algorithm. The proposed adaptive soft sensing algorithm, MJIT-KPLS, is discussed in detail in Section 3. Subsequently, a case study of an industrial fed-batch chlortetracycline fermentation process is used to evaluate the proposed algorithm in Section 4. Finally, this research is concluded in Section 5.

2. Preliminaries

In this section, KPLS regression, JIT learning, the PMI criterion, and the GMM algorithm are briefly introduced.

2.1. Kernel partial least squares

As an extension of the standard PLS proposed by Wold (1966), the KPLS model is constructed by mapping the original input into a feature space $F$ where a linear PLS model is created (Rosipal and Trejo, 2001). Assume a nonlinear transformation of the input variables $\{x_i\}_{i=1}^{n}$ into a high-dimensional feature space $F$, i.e. a mapping given by

$\Phi : x_i \in R^d \rightarrow \Phi(x_i) \in F$    (1)

Effectively this means that we can obtain a nonlinear regression model in the space of the original input variables.

Due to the curse of dimensionality, it is not feasible to calculate the nonlinear mapping of each original input and then conduct the PLS regression in the feature space. To tackle this issue, by applying the so-called 'kernel trick', the inner product of two mapped samples can be given by

$K_{ij} = K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$,    (2)

where $K(\cdot, \cdot)$ denotes a nonlinear kernel function. Denoting by $\Phi$ the $(n \times m)$ matrix whose $i$th row is the vector $\Phi(x_i)$ in an $m$-dimensional feature space $F$, we can see that $\Phi \Phi^T$ represents the $(n \times n)$ kernel Gram matrix $K$ of the cross dot products between all mapped input data points $\{\Phi(x_i)\}_{i=1}^{n}$. Depending on the nonlinear transformation $\Phi(\cdot)$, the feature space can be high-dimensional, even infinite-dimensional when a kernel function is used. In this study, the following radial basis function (RBF) is selected as the kernel function

$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$,    (3)

where $\sigma$ stands for the width of the RBF kernel.

The KPLS algorithm is outlined as follows. Let $A$ be the desired number of latent variables (LVs). Repeat for $i = 1$ to $A$:

Step 1. $t_i = K_i u_i$, $t_i \leftarrow t_i / \|t_i\|$.
Step 2. $c_i = Y_i^T t_i$.
Step 3. $u_i = Y_i c_i$, $u_i \leftarrow u_i / \|u_i\|$.
Step 4. $K_{i+1} = (I - t_i t_i^T) K_i (I - t_i t_i^T)$, where $I$ is an $n$-dimensional identity matrix.
Step 5. $Y_{i+1} = Y_i - t_i t_i^T Y_i$.

After extracting the desired latent variables, the corresponding regression coefficient matrix is expressed as

$B = \Phi^T U (T^T K U)^{-1} T^T Y$    (4)

and to make predictions on the training data we can write

$\hat{Y} = \Phi B = K U (T^T K U)^{-1} T^T Y = T T^T Y$,    (5)

where $T$ may be expressed as $T = \Phi R$ with $R = \Phi^T U (T^T K U)^{-1}$.

For predictions made on testing points $\{x_i\}_{i=n+1}^{n+n_t}$ the matrix of regression coefficients $B$ has to be used, i.e.

$\hat{Y}_t = \Phi_t B = K_t U (T^T K U)^{-1} T^T Y$,    (6)

where $\Phi_t$ is the matrix of the mapped testing samples and the resulting $K_t$ is the $(n_t \times n)$ "test" matrix whose elements are $K_{ij} = K(x_i, x_j)$, where $\{x_i\}_{i=n+1}^{n+n_t}$ and $\{x_j\}_{j=1}^{n}$ are the testing and training samples, respectively.

It is worth noting that before the PLS regression in the feature space, mean centering should be conducted on the above kernel matrices by the following procedures:

$\tilde{K} = \left(I - \frac{1}{n} 1_n 1_n^T\right) K \left(I - \frac{1}{n} 1_n 1_n^T\right)$    (7)

$\tilde{K}_t = \left(K_t - \frac{1}{n} 1_{n_t} 1_n^T K\right) \left(I - \frac{1}{n} 1_n 1_n^T\right)$,    (8)

where $I$ is again an $n$-dimensional identity matrix, and $1_n$, $1_{n_t}$ represent vectors of ones with lengths $n$ and $n_t$, respectively.

2.2. Just-in-time learning

Just-in-time learning (Cybenko, 1996) is also called instance-based learning, lazy learning or model-on-demand. This approach can cope with changes in process characteristics as well as nonlinearity, and has thus been used for nonlinear process modeling, monitoring as well as soft sensing (Cheng and Chiu, 2004; Fujiwara et al., 2009; Kim et al., 2013).
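As a concrete illustration of Section 2.1, the following NumPy sketch implements the RBF kernel of Eq. (3), the kernel centering of Eqs. (7) and (8), the deflation loop of Steps 1–5, and the prediction of Eq. (6). It is a minimal sketch, not the authors' implementation: all names are ours, a single output column is assumed, and the score vector u is initialized from the current Y residual (for a one-dimensional output the inner NIPALS iteration converges in a single pass).

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # Eq. (3): K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center_kernels(K, Kt):
    # Eqs. (7)-(8): mean-center the training and test Gram matrices
    n, nt = K.shape[0], Kt.shape[0]
    In, one_n = np.eye(n), np.ones((n, n)) / n
    Kc = (In - one_n) @ K @ (In - one_n)
    Ktc = (Kt - (np.ones((nt, n)) / n) @ K) @ (In - one_n)
    return Kc, Ktc

def kpls_fit(K, Y, A):
    # Steps 1-5 on a centered Gram matrix K and a single centered output column Y
    n = K.shape[0]
    Ki, Yi, I = K.copy(), Y.copy(), np.eye(n)
    T, U = [], []
    for _ in range(A):
        u = Yi[:, [0]]                           # init score from Y residual
        t = Ki @ u
        t = t / np.linalg.norm(t)                # Step 1
        c = Yi.T @ t                             # Step 2
        u = Yi @ c
        u = u / np.linalg.norm(u)                # Step 3
        Ki = (I - t @ t.T) @ Ki @ (I - t @ t.T)  # Step 4: deflate K
        Yi = Yi - t @ t.T @ Yi                   # Step 5: deflate Y
        T.append(t)
        U.append(u)
    return np.hstack(T), np.hstack(U)

def kpls_predict(K, Kt, Y, T, U):
    # Eq. (6): Y_t = K_t U (T' K U)^{-1} T' Y, with K the original centered Gram matrix
    M = np.linalg.solve(T.T @ K @ U, T.T @ Y)
    return Kt @ U @ M
```

Predicting back on the (centered) training kernel reproduces the training prediction of Eq. (5); in the multiphase framework one such local model would be rebuilt for every query from the locally selected samples.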

In comparison with conventional global modeling methods, JIT modeling has the following features: (i) the available input and output data are stored in a database; (ii) only when an estimate is required is a local model constructed, from relevant samples determined through some similarity measure to the query data, and then the output variable is estimated; (iii) the built local model is discarded after its use for estimation; and (iv) the JIT-based model is inherently adaptive, simply by adding newly available input and output data to the database.

[Fig. 1 shows the two workflows side by side, 'Global modeling' and 'Just-in-time modeling': in global modeling, a model is built offline from the historical database, and once the model is built the database is discarded; in JIT modeling, a relevant dataset similar to the query data x_new is selected from the database, a local model is built to predict ŷ_new, and once the predicted output is obtained the local model is discarded.]

Fig. 1. Comparison between global and JIT modeling methods.

The difference between conventional global modeling and JIT modeling is illustrated in Fig. 1. The advantages of global models are that they enjoy a mature mathematical and computational theory, so the resulting models are easy to interpret in the framework of the underlying applications (as physical laws, for example). However, global methods have two major drawbacks. First, the optimizations are difficult to perform and, with the data essentially replaced by the model, there are no good methods to update the models if new data become available (Cybenko, 1996). Second, global modeling needs to build a single global model, which in practice is extremely compute-intensive and rarely yields good approximations on large problems. Alternatively, JIT learning uses relevant data similar to the query data to build models dynamically as the need arises.

In the JIT learning framework, given a query datum x_new, there are three main steps to predict its output: (i) the samples most relevant to x_new are selected from the database by a similarity measure; (ii) a local model is built based on the selected dataset; and (iii) the new output y_new is predicted using the built local model, and the local model is then discarded. When a new query datum arrives, a new local model is built following the same procedure.

2.3. Partial mutual information

Mutual information (MI) is a nonparametric and nonlinear measure of relevance derived from information theory. It can be used to identify nonlinear dependence between input and output variables. However, MI cannot deal with dependencies between input variables. To overcome this problem, Sharma (2000) introduced the partial mutual information (PMI) criterion, which addresses the issue of redundant variables. This is achieved by quantifying the nonlinear dependence of a candidate input X on the output Y that is not accounted for by the already selected input variables Z. Thus, the PMI between X and Y, for a set of already selected inputs Z, is given by

$PMI(X, Y|Z) = \iint f_{X',Y'}(x', y') \ln\left(\frac{f_{X',Y'}(x', y')}{f_{X'}(x') f_{Y'}(y')}\right) dx' \, dy'$,    (9)

where

$x' = x - E(x|Z), \quad y' = y - E(y|Z)$,    (10)

and $E(\cdot)$ denotes the expectation operation. Use of the conditional expectations ensures that the resulting variables $x'$ and $y'$ stand for the residual information in the variables x and y once the effect of the already selected variable(s) Z has been taken into consideration.

PMI is symmetric under the same condition Z: PMI(X, Y|Z) = PMI(Y, X|Z). We have 0 ≤ PMI(X, Y|Z), where zero is obtained only if X and Y are independent under condition Z. In the case of Z = ∅, PMI(X, Y|Z) = MI(X, Y).

A sample estimate of the PMI criterion can be formulated as

$PMI(X, Y|Z) = \frac{1}{n} \sum_{i=1}^{n} \ln\left(\frac{f_{X',Y'}(x'_i, y'_i)}{f_{X'}(x'_i) f_{Y'}(y'_i)}\right)$,    (11)

where $x'_i$ and $y'_i$ are the $i$th residuals in the sample data set of size $n$; $f_{X'}(x'_i)$, $f_{Y'}(y'_i)$, and $f_{X',Y'}(x'_i, y'_i)$ are the marginal and joint probability densities, respectively.

The computation of PMI is similar to that of MI in that both require estimation of probability densities. In MI, the probability densities are estimated for the original inputs and output, whereas in PMI, the densities are estimated for the residual information in the variables X and Y after considering the effect of the already selected inputs Z. Various methods have been proposed for PMI estimation, such as the box-counting algorithm (Paluš et al., 2001), histograms (Fernando et al., 2009), kernel estimators (May et al., 2008a, 2008b; Scott, 1992; Sharma, 2000), and k-nearest neighbor (k-NN) statistics (Frenzel and Pompe, 2007).

2.4. Gaussian mixture model

The Gaussian mixture model (GMM) is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. As a statistical inference based clustering method, the GMM has been widely used in pattern recognition and machine learning (Bishop, 2006) as well as in industrial applications such as process monitoring (Yu and Qin, 2008, 2009) and soft sensor modeling (Yu, 2012a). For an arbitrary datum $x \in R^d$, we suppose that it comes from a Gaussian mixture distribution given by

$p(x|\Theta_{GM}) = \sum_{i=1}^{C} \pi_i N(x|\mu_i, \Sigma_i)$,    (12)

where $C$ denotes the number of Gaussian components; $\Theta_{GM} = \{\mu_1, \ldots, \mu_C, \Sigma_1, \ldots, \Sigma_C, \pi_1, \ldots, \pi_C\}$ is the vector of parameters of the mixture model; $N(x|\mu_i, \Sigma_i)$ represents a multivariate Gaussian distribution with mean vector $\mu_i$ and covariance matrix $\Sigma_i$, whose probability density function is given by

$p(x|\theta_{GM,i}) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma_i)}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right)$    (13)

with the mixing coefficients $\{\pi_i\}_{i=1}^{C}$ satisfying

$\sum_{i=1}^{C} \pi_i = 1, \quad 0 \leq \pi_i \leq 1$.    (14)
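Once the mixture parameters are available, Eqs. (12)–(14) can be evaluated directly. The following NumPy sketch is illustrative only (function names are ours, and the parameters are hand-set rather than estimated by EM as in the paper); it also computes the Bayesian posterior over components that the framework later uses for phase assignment.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    # Multivariate normal density N(x | mu, Sigma), Eq. (13)
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def mixture_density(x, weights, mus, covs):
    # p(x | Theta_GM) = sum_i pi_i N(x | mu_i, Sigma_i), Eq. (12)
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, mus, covs))

def posterior(x, weights, mus, covs):
    # Posterior probability of each component given x (Bayes' rule)
    comp = np.array([w * gaussian_pdf(x, m, c)
                     for w, m, c in zip(weights, mus, covs)])
    return comp / comp.sum()
```

In practice the parameters would be fitted by the EM algorithm, after which `posterior` assigns a query sample to the operating phase with the maximal posterior probability.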

Suppose we have a dataset of observations X = {xi }ni=1 that are inde- values of PMI between variables are not known, and thus have to
pendently drawn from the Gaussian mixture distribution, then the be estimated from the available data {X,y}. For this reason, a k-NN
parameters of mixture can be estimated by maximizing the log of based estimator proposed by Frenzel and Pompe (2007) is used to
the likelihood function given by calculate the PMI criterion.
C  However, two difficulties are encountered in the PMI based

n
selection of input variables. The first one concerns the choice of
ln p(X|GM ) = ln c N (xi |␮c , c ) (15)
neighbor number used in k-NN based estimator. The number k of
i c=1
neighbors taken into account in the PMI estimation must be chosen
The expectation-maximization (EM) algorithm is commonly used carefully, especially in the case of a small sample and noisy data.
for GMM estimation (McLachlan and Peel, 2000). Often, the result of the selection highly depends on the choice of
According to the Bayesian inference strategy, the posterior this parameter. If the number of neighbors is too small, the esti-
probability of x belonging to the ith Gaussian component can be mation will have a large variance; if the number of neighbors is
calculated by too large, the variance of the estimator will be small, but all esti-
mations will converge to zero, even for highly-dependent variables.
i N(x|␮i , i )
P(GM,i |x) = (16) To address this issue, a combined K-fold/permutation test (François

C
et al., 2007) is used for determining the optimal number of neigh-
c N(x|␮c , c ) bors.
c=1 The second difficulty lies in deciding when to halt the selection
procedure. As necessary part of any stepwise predictor selection
3. Proposed adaptive soft sensor method is a criterion that indicates whether the selected input
variable is significant to the output variable. In this work, a con-
The proposed MJIT-KPLS soft sensing algorithm builds soft fidence limit is calculated assuming independence between inputs
sensors online by performing KPLS regression in multiphase JIT and the output variable. This independence is formed by random-
modeling framework. The JIT learning framework for each operat- izing samples from the original variable, and this can be achieved
ing phase is similarly determined by a group of offline parameters by bootstrapping for which the bootstrapping size is chosen as
which are different from phase to phase. Thus the development 100 in our work. Then, we can obtain a 95th percentile confidence
of MJIT-KPLS soft sensors consists of building multiple similar limit (referred to as the 95th percentile randomized sample PMI
JIT-KPLS modeling frameworks. Without considering a particular score). That is, if the PMI score of the selected input is greater than
phase, the development of JIT-KPLS soft sensors can be split into the estimated 95th percentile randomized sample PMI score, one
the following steps: (i) outlier detection of multiway data; (ii) input can conclude that there would be less than a 5% chance that the
variable selection; (iii) similarity measure definition; (iv) adaptive variables are truly independent.
sample selection, online training and prediction; and (v) database A stepwise procedure of input variable selection based PMI cri-
updating. The above steps will be discussed in detail in the fol- terion can now be summarized as follows:
lowing sections. The selection of multiphase offline parameters is
finally presented. Step 1. Initialize the potential input variables denoted by {Xi }di=1 ,
and set the selected inputs as the vector Zin .
3.1. Outlier detection of multiway data Step 2. Determine the optimal number of neighbors for k-NN esti-
mator using the combined K-fold/permutation test.
To develop robust soft sensors, 3 edit rule is firstly used to Step 3. Compute the PMI values between the output variable Y and
clean outliers in modeling samples. The presence of outliers may each of the plausible new input variables in Zin , conditional to the
lead to model misspecification, biased parameter estimation and pre-existing input variables Z.
incorrect analysis results (Liu et al., 2004). However, the outlier Step 4. Identify the variable with the highest PMI score in the pre-
detection of batch process data is different from other processes vious step. If this PMI score is greater than the 95th percentile
due to their unique three-dimensional data matrix X(I_B × J_B × K_B), with I_B, J_B, and K_B representing the number of batches, process variables, and sampling instants, respectively. Firstly, X is converted into a two-dimensional matrix X(I_B × K_B·J_B) via batch-wise unfolding (Westerhuis et al., 1999). Then the 3σ edit rule is performed on the data block within the same sampling instant to remove the outliers. Finally, the clean samples are further rearranged to X(I_B·K_B × J_B) through variable-wise unfolding (Westerhuis et al., 1999), as illustrated in Fig. 2.

3.2. Input variable selection

Selecting appropriate input variables is crucial to enhance the estimation accuracy and maintain the reliability of soft sensors, but variable selection is one of the most difficult tasks concerning soft sensor development (George, 2000). The presence of irrelevant (or less relevant) variables in the input data will lead to performance deterioration of soft sensors. A soft sensor works satisfactorily if only those secondary variables that are most relevant to the primary variable are employed (Pani and Mohanta, 2011). To select the input variables most relevant to the output variable while at the same time considering the redundancy between input variables, the PMI criterion is applied in our work. In practice, the true …

… if the PMI score of the candidate exceeds the 95th percentile randomized sample PMI score, add the new input variable into Z; otherwise terminate the selection procedure.
Step 5. Repeat Steps 3 and 4 until all significant input variables have been selected.

3.3. Similarity measure definition

In the JIT learning framework, those samples with high similarity to the query data are selected for building a local model when an estimation is required. The similarity measure between samples has a significant influence on the estimation accuracy of JIT based models. The most frequently used similarity is defined on the basis of the Euclidean distance or the Mahalanobis distance (Ge and Song, 2010; Kim et al., 2013). Some other similarity measures are based on angle based (Cheng and Chiu, 2004) and correlation based (Fujiwara et al., 2009) methods. More similarity measures were surveyed by Kano and Fujiwara (2013) and Kim et al. (2013). However, these similarity measures are defined only from the perspective of the sample algebraic space, while no process knowledge is concerned. Actually, the aim of defining a similarity measure is to select those samples that describe similar process characteristics. A drawback of evaluating sample similarity from the algebraic space alone is that, by using such a similarity measure, some samples with high similarity may exhibit greatly different process behavior since they do not belong to similar process phases.

Fig. 2. Illustration of outlier detection for batch processes data.
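The unfolding and 3σ screening illustrated in Fig. 2 can be sketched as follows (a minimal NumPy sketch; the array sizes, the random data and the planted outlier are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: I_B batches, J_B process variables, K_B sampling instants.
I_B, J_B, K_B = 50, 3, 5
X3 = rng.normal(size=(I_B, J_B, K_B))
X3[2, 1, 4] = 25.0  # plant an outlier: batch 2, variable 1, sampling instant 4

# Batch-wise unfolding X(I_B x J_B x K_B) -> X(I_B x K_B*J_B): one row per
# batch, with columns grouped by sampling instant.
Xb = X3.transpose(0, 2, 1).reshape(I_B, K_B * J_B)

# 3-sigma edit rule within the same sampling instant: each column collects one
# variable at one instant across all batches, so column-wise statistics act on
# "the data block within the same sampling instant".
mu, sd = Xb.mean(axis=0), Xb.std(axis=0)
outlier_mask = np.abs(Xb - mu) > 3.0 * sd  # flagged entries would be removed

# Variable-wise unfolding of the screened data X(I_B*K_B x J_B): all sampling
# instants of all batches stacked as rows, one column per process variable.
Xv = Xb.reshape(I_B, K_B, J_B).reshape(I_B * K_B, J_B)
```

The planted spike is flagged by the column-wise 3σ test, and the same entry reappears at row i·K_B + k after variable-wise unfolding.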

Hence, for batch processes, we aim to enhance the predictive capability of JIT based soft sensors by integrating Euclidean distance based sample similarity with phase similarity. Subsequently, it is necessary to define a measure for describing phase similarity. In batch processes each run has the same duration and the variables follow closely some predetermined trajectories (Westerhuis et al., 1999). In general, batch processes go through multiple phases during the batch run. Among the process variables, the process duration plays a significant role in indicating the progress of the batch. Thus, in this work, the process duration is chosen as an indicator variable to evaluate the phase similarity. Although the phase similarity can be roughly defined based on the process duration, it is not enough to select local modeling samples by adopting phase similarity alone, because batch-to-batch variation in batch processes is inevitable. Therefore, a hybrid similarity measure is defined by

s_i = λ s_i^samp + (1 − λ) s_i^phase,  (17)
s_i^samp = exp(−d_i^samp / (σ_d^samp φ_samp)),  (18)
s_i^phase = exp(−d_i^phase / (σ_d^phase φ_phase)),  (19)
d_i^samp = ||x_new − x_i||,  (20)
d_i^phase = ||t_new − t_i||,  (21)

where s_i, s_i^samp and s_i^phase (i = 1, 2, ..., n) denote the hybrid similarity, sample similarity and phase similarity, respectively; λ is a weight parameter between 0 and 1; σ_d^samp and σ_d^phase are the standard deviations of d_i^samp and d_i^phase, respectively; t_new and t_i represent the process durations corresponding to the query data x_new and the historical samples in the database, respectively; and φ_samp and φ_phase are two localization parameters. The proposed hybrid similarity measure is equivalent to the conventional single similarity measure when λ = 1.

Usually, the two similarity measures s_i^samp and s_i^phase should be normalized before the weighted calculation to eliminate the influence resulting from magnitude differences. In the absence of knowledge about the relative importance of the variables, the standard multivariate approach is used to (i) scale each variable to unit variance by dividing by their standard deviations, and (ii) center them by subtracting their averages.

3.4. Adaptive sample selection, online training and prediction

After defining the similarity measure, the similarities between the query data and the historical samples can be calculated. For JIT based soft sensors, only those samples that are most relevant to the query data are selected. Thus, the number of local modeling samples N_local is usually less than the number of samples (n) in the database. In practice, how to determine the local modeling size is difficult. The conventional method is to determine this parameter offline by trial and error, as reported in Ge and Song (2010). Although the local modeling size determined offline gives rise to high accuracy on the training data, such an empirical method is time-consuming, and the obtained parameters are no longer optimal for new process data. Thus, the predictive performance may deteriorate if a fixed local modeling size is used. Moreover, different query data may correspond to different optimal local modeling sizes. To enhance the predictive accuracy and adaptability of JIT based soft sensors, an adaptive sample selection strategy based on online cross-validation is proposed. Specifically, two parameters N_local^min and N_local^max are chosen such that only the relevant datasets formed by the N_local^min-th relevant data to the N_local^max-th data are used in the regression. For N_l ∈ [N_local^min, N_local^max] (l = 1, 2, ..., L) with the search step N_step, the total number of potential local models involved in the optimization procedure is

L = int((N_local^max − N_local^min) / N_step) + 1.  (22)

The detailed algorithm of the JIT-KPLS soft sensor is described as follows. Given a database D = {X, y} = {x_i, y_i} (i = 1, ..., n), the offline parameters N_local^min, N_local^max, N_step, the weight parameter λ, the number of latent variables LV, the RBF kernel width σ, and a query data x_new with time variable t_new:

Step 1. Set l = 1, and calculate the similarity s_i (i = 1, 2, ..., n) between x_new and each data x_i by using Eq. (17).
Step 2. Arrange all the s_i in descending order. The relevant dataset D_l = (X_l, y_l) is constructed by selecting the N_l most relevant samples, corresponding to the largest s_i to the N_l-th largest s_i, where N_l = N_local^min + (l − 1)N_step.
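Steps 1 and 2 above (the hybrid similarity of Eqs. (17)–(21) followed by the relevance ranking) can be sketched as below; the data, the weight lam and the localization parameters are illustrative stand-ins for the offline parameters of Section 3.6:

```python
import numpy as np

def hybrid_similarity(x_new, t_new, X, t, lam=0.6, phi_samp=2.5, phi_phase=2.5):
    """Hybrid similarity of Eq. (17), mixing sample and phase similarity."""
    d_samp = np.linalg.norm(X - x_new, axis=1)                # Eq. (20)
    d_phase = np.abs(t - t_new)                               # Eq. (21)
    s_samp = np.exp(-d_samp / (d_samp.std() * phi_samp))      # Eq. (18)
    s_phase = np.exp(-d_phase / (d_phase.std() * phi_phase))  # Eq. (19)
    return lam * s_samp + (1.0 - lam) * s_phase               # Eq. (17)

# Illustrative database: 100 historical samples with 4 inputs and a process
# duration t each; the query is a slightly perturbed copy of sample 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
t = rng.uniform(0.0, 90.0, size=100)
x_new, t_new = X[0] + 0.01, t[0]

# Step 1: similarities; Step 2: descending ranking and selection of the
# N_l most relevant samples, forming the local dataset D_l.
s = hybrid_similarity(x_new, t_new, X, t)
order = np.argsort(s)[::-1]
N_l = 30
relevant_idx = order[:N_l]
```

Because the query is a near-copy of sample 0, that sample tops the ranking; both similarity terms are bounded in (0, 1], so the hybrid score is as well.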

Step 3. Compute the prediction performance of the local KPLS model on the training dataset D_l via K-fold cross-validation. We split the selected dataset D_l into K roughly equal-sized parts. For the kth part, we fit the local KPLS model to the other K − 1 parts of the data, and the prediction error of the fitted model when predicting the kth part of samples is given by

RMSE_k = sqrt( (1/n_k) Σ_{i=1}^{n_k} (y_i − f_KPLS(x_i, θ_k))² ),  (23)

where f_KPLS(x, θ_k) represents the local KPLS model built from the K − 1 parts of the dataset D_l, θ_k denotes the parameters of the local KPLS model, and n_k denotes the number of samples in the kth part. Repeat the above training and prediction K times, and then the cross-validation estimate of the prediction error is calculated by

CV_l = (1/K) Σ_{k=1}^{K} RMSE_k.  (24)

Step 4. Set l = l + 1, and go to Step 2 until l = L.
Step 5. According to the cross-validation errors, the optimal N_l is determined by

N_l^opt = arg min_l (CV_l).  (25)

Step 6. Train the local KPLS model using the N_l^opt most relevant samples, corresponding to the largest s_i to the N_l^opt-th largest s_i; the output variable is then predicted by

ŷ_new = f_KPLS(x_new, θ_opt),  (26)

where θ_opt denotes the optimal model parameters of the local KPLS model.
Step 7. When the next query data comes, go to Step 1.

In the present work, five-fold cross-validation is adopted. The KPLS regression detailed in Section 2.1 serves as the local modeling method in the proposed adaptive soft sensing algorithm.

3.5. Database updating scheme

JIT based models assume that all available observations are stored in a database, which distinguishes JIT modeling from other global modeling techniques. Obviously, database technology is crucial to JIT based models. JIT modeling updates models in a different way from traditional global methods. For example, neural networks need to be retrained to adjust the network parameters according to the new operation. In extreme cases, the network structure needs to be re-determined to achieve better prediction. Clearly, this procedure is not desirable from a computational point of view. In contrast, a JIT based model is inherently adaptive by simply adding the currently available input and output data to the database.

As Kim et al. (2013) stated, for nonlinear time-varying processes, both the age and the density of samples are important indexes to evaluate the goodness of a database. The estimation performance may deteriorate when the samples in the database are old and sparse. The samples in the database should therefore be carefully selected to assure the estimation performance. Assuming a database of fixed size has been built in advance, our focus shifts to updating the database during online implementation. To maintain the database size, adding a new sample means removing an old one. A popular updating strategy is the moving time-window method, in which a newly obtained sample is added to the database and the oldest sample is removed. This method has been successfully used in steel processes for more than 7 years (Shigemori et al., 2011). Nevertheless, this updating method should be improved for application in batch processes. The main drawback of the conventional updating method is that the samples in some process regions may become sparse, because the deleted oldest samples may describe different characteristics from those described by the new samples. For batch processes, new points should be used to replace those old points that describe similar process behaviors, to assure that the samples in every process region are new and dense.

Most data-driven soft sensor modeling techniques rely on supervised learning, where the model is trained on the available historical data. This learning method is an inductive process: given a finite training set, we wish to infer a function that makes predictions for all possible input values. For all their variety, supervised learning algorithms are based on the idea that similar input patterns will usually give rise to similar outputs (or output distributions) (Rasmussen and Williams, 2006). This idea is the essence of data-driven modeling based on supervised learning. Therefore, changes in process characteristics will result in changes of the relations between the input and output data. In particular, similar inputs under a new operating condition may not give rise to similar outputs as before. Consequently, to capture the newest process characteristics, the task of updating the database shifts to finding the old samples whose inputs are most similar to the inputs of new samples, and then replacing them.

Following this idea, the maximum similarity replacement (MSR) rule is proposed to update the database in the MJIT-KPLS soft sensing algorithm. In the proposed updating strategy, sample updating is only carried out among samples with similar process characteristics. That is, when a new sample is added into the database, an old sample that is most similar to the new sample will be removed.

The modeling framework of the JIT-KPLS soft sensor is depicted in Fig. 3. Given a new sample (x_new, y_new) with time variable t_new and similarity weight λ_update, the database updating algorithm is explained as follows:

Step 1. Check whether the new sample is an outlier; if yes, discard it; otherwise go to Step 2.
Step 2. Compute the similarity between the new sample and the historical samples based on the proposed hybrid similarity measure, given by

s_i = λ_update s_i^samp + (1 − λ_update) s_i^phase,  (27)

where the similarity weight λ_update may take a different value from that used in the adaptive sample selection.
Step 3. Find the historical sample (x_old, y_old) whose input has the highest similarity to that of the new sample, then remove it and add the new sample (x_new, y_new) into the database.
Step 4. When a new sample is available, go to Step 1.

3.6. Selection of offline parameters

Although the local model is built online in the JIT learning framework, there are still some critical parameters that need to be optimized offline using the modeling data. These parameters are as follows:

(i) The number of latent variables for KPLS regression, LV.
(ii) The width of the RBF kernel function for KPLS regression, σ.
(iii) The similarity weights, λ and λ_update, which are used for selecting local modeling samples and updating the database, respectively.
(iv) The localization parameters, φ_samp and φ_phase, which are used for the sample similarity and phase similarity, respectively.
(v) The search parameters of the optimal local modeling size, N_local^min, N_local^max, and N_step, which represent the minimum, the maximum and the search step.
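The role of the search parameters in (v), namely the cross-validation search of Steps 3–5 over candidate local modeling sizes (Eqs. (22)–(25)), can be sketched as follows; ordinary least squares stands in here for the local KPLS model, and all data and ranges are illustrative:

```python
import numpy as np

def cv_rmse(Xl, yl, K=5):
    """Eqs. (23)-(24): K-fold cross-validation RMSE of a local model
    (ordinary least squares is used here in place of local KPLS)."""
    folds = np.array_split(np.arange(len(yl)), K)
    rmses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        A = np.c_[Xl[train], np.ones(len(train))]            # add intercept
        coef, *_ = np.linalg.lstsq(A, yl[train], rcond=None)
        pred = np.c_[Xl[test], np.ones(len(test))] @ coef
        rmses.append(np.sqrt(np.mean((yl[test] - pred) ** 2)))
    return float(np.mean(rmses))

# Illustrative data, assumed already sorted by descending similarity.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

N_min, N_max, N_step = 40, 90, 10
L = int((N_max - N_min) / N_step) + 1           # Eq. (22)
sizes = list(range(N_min, N_max + 1, N_step))
cv_errors = [cv_rmse(X[:N], y[:N]) for N in sizes]
N_opt = sizes[int(np.argmin(cv_errors))]        # Eq. (25)
```

Each candidate size N_l uses the N_l most similar samples; the size with the smallest cross-validation error is retained for the final local model.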

Fig. 3. Modeling framework of the JIT-KPLS soft sensing algorithm.
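The MSR updating step embedded in the framework of Fig. 3 (Section 3.5, Eq. (27)) can be sketched as follows; the database contents and the parameter values are illustrative:

```python
import numpy as np

def msr_update(X, y, t, x_new, y_new, t_new,
               lam_update=0.8, phi_samp=2.5, phi_phase=2.5):
    """Maximum similarity replacement: overwrite the stored sample most
    similar (Eq. (27)) to the new one, keeping the database size fixed."""
    d_samp = np.linalg.norm(X - x_new, axis=1)
    d_phase = np.abs(t - t_new)
    s = (lam_update * np.exp(-d_samp / (d_samp.std() * phi_samp))
         + (1.0 - lam_update) * np.exp(-d_phase / (d_phase.std() * phi_phase)))
    i_old = int(np.argmax(s))            # most similar old sample
    X[i_old], y[i_old], t[i_old] = x_new, y_new, t_new
    return i_old

# Illustrative database of 50 samples; the new sample nearly duplicates
# sample 7, so MSR should overwrite exactly that entry.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
t = rng.uniform(0.0, 90.0, size=50)

replaced = msr_update(X, y, t, X[7] + 0.01, 1.23, t[7])
```

Unlike a moving time window, the oldest sample is not necessarily discarded; only the entry describing the most similar process behavior is replaced, so every process region stays populated.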

These parameters construct a vector given below:

θ_off = {LV, σ, λ, λ_update, N_local^min, N_step, N_local^max, φ_samp, φ_phase}.  (28)

Since multiple phases are often involved in batch processes, multiphase modeling is preferable to the conventional single modeling framework. That is, multiple groups of offline parameters, i.e., θ_off,i (i = 1, 2, ..., C), are required for the different operating phases. Thus, a GMM is estimated to identify the operating phases throughout the batch run, resulting in C Gaussian components. For the C identified phases, the corresponding JIT-KPLS modeling frameworks are expressed as

{JIT-KPLS(x|θ_off,1), JIT-KPLS(x|θ_off,2), ..., JIT-KPLS(x|θ_off,C)}.  (29)

Within each phase, the optimal parameters θ_off,i are optimized offline for the ith operating phase. When implemented online, the Bayesian inference based posterior probabilities for the query data x_new with respect to the different phases are estimated, and then the JIT-KPLS modeling framework corresponding to the batch phase with the maximal posterior probability is adaptively chosen for online prediction.

It is worth noting that the sample database is equally shared by all JIT-KPLS modeling frameworks. That means that for every query data, the local modeling samples are searched throughout the whole database, irrespective of which phase the query data belongs to. By contrast, only the offline parameters corresponding to the same phase as the query data are adaptively chosen for local KPLS training and prediction.

Owing to the lack of a systematic guideline for determining the optimal values of the offline parameters, two methods are recommended: (i) selection by optimization on a validation set; and (ii) cross-validation on the training data. Ideally, if we had enough data, we would set aside a validation set and use it to assess the performance of the predictive model characterized by the candidate offline parameters. If data are scarce, K-fold cross-validation is preferable, in which part of the available training data is used to fit the model and a different part is used to test it. The combination of offline parameters resulting in the smallest validation error is finally chosen.

The schematic diagram of the proposed MJIT-KPLS algorithm is depicted in Fig. 4, and the step-by-step procedure is listed as follows:

Step 1. Collect the input and output data of the batch processes and remove the outliers using the multiway 3σ edit rule.
Step 2. Perform GMM clustering to identify the C operating phases, for which the various JIT-KPLS modeling frameworks are determined.
Step 3. When a query data comes, first compute its posterior probabilities with respect to the different operating phases and choose the JIT-KPLS modeling framework corresponding to the maximal posterior probability.
Step 4. Select the dataset most relevant to the query data from the database and then build a local KPLS model.

Fig. 4. Schematic diagram of the proposed MJIT-KPLS soft sensing algorithm.
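Step 3 of the procedure, assigning the query to the phase with the maximal GMM posterior probability, can be sketched as follows; the three components and their parameters are illustrative, not fitted from process data:

```python
import numpy as np

def phase_posteriors(x, weights, means, covs):
    """Posterior probability of each Gaussian component (operating phase)
    for a query point x, via Bayes' rule over the mixture."""
    dens = []
    for w, m, S in zip(weights, means, covs):
        d = x - m
        norm = 1.0 / np.sqrt(((2.0 * np.pi) ** len(x)) * np.linalg.det(S))
        dens.append(w * norm * np.exp(-0.5 * d @ np.linalg.solve(S, d)))
    dens = np.array(dens)
    return dens / dens.sum()

# Three illustrative phases (e.g., exponential, stationary, death).
weights = [0.3, 0.4, 0.3]
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([8.0, 0.0])]
covs = [np.eye(2), np.eye(2), np.eye(2)]

x_query = np.array([3.8, 4.1])
post = phase_posteriors(x_query, weights, means, covs)
phase = int(np.argmax(post))   # index of the JIT-KPLS framework to use
```

The query lies close to the second component's mean, so its framework (and its offline parameter group θ_off,i) would be selected for prediction.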

Fig. 5. Schematic diagram of the industrial fed-batch CTC fermentation process.

Step 5. Estimate the output of the query data, and then discard the model.
Step 6. When a new query data comes, return to Step 3.

4. Case study

In this section, the proposed soft sensing algorithm is tested on an industrial fed-batch chlortetracycline fermentation process. Two indexes are used to evaluate the model performance: the root-mean-square error (RMSE) and the coefficient of determination (R²), given by

RMSE = sqrt( (1/N_test) Σ_{i=1}^{N_test} (ŷ_i − y_i)² ),  (30)

R² = 1 − Σ_{i=1}^{N_test} (ŷ_i − y_i)² / Σ_{i=1}^{N_test} (y_i − ȳ)²,  (31)

where ŷ_i is the estimated output, y_i is the actual output, and ȳ is the mean of the actual output; N_test denotes the number of testing samples.

RMSE is used as an indicator of the predictive accuracy in online prediction. R² represents the squared correlation between the actual output and the predicted output; it measures how much of the total variance in the output variable data can be explained by the model. If the model accounts for only a small percentage of the total variation of the output variable (R² → 0), i.e., if the unexplained variation is high, then the model is either not good or there is only a weak correlation between the input and output variables.

The configuration of the computer used for soft sensor modeling is as follows: OS: Windows 7 Ultimate (32 bit); CPU: Pentium(R) Dual-Core E6600 (3.06 GHz, 3.07 GHz); RAM: 2 GB; MATLAB version: 2010a.

4.1. Industrial chlortetracycline fed-batch fermentation process

Chlortetracycline (CTC) is a broad-spectrum antibiotic of the tetracycline family produced by the cultivation of Streptomyces aureofaciens. It has been widely used in pharmaceuticals, agriculture and animal husbandry. In particular, CTC is an excellent feed additive owing to its ability to promote the growth of animals. In addition, CTC is characterized by a low residual amount, low production cost and mature production technology. Thus, CTC has long been the bacteriostatic growth additive with the largest consumption in the feed industry, and this trend is expected to continue.

The CTC fermentation process under study is an industrial process practiced in a biochemical factory belonging to the Charoen Pokphand Group. The schematic diagram of the fed-batch CTC fermentation process is shown in Fig. 5.

The cultivation of S. aureofaciens is carried out in an air-lift stirred fermentor with a volume of 120 m³. This process mainly goes through two stages: the cultivation begins with a short period of batch operation to grow the microorganisms and maximize the cell density, and it is then switched to fed-batch mode in order to boost the synthesis of CTC as the target metabolite while the cell growth continues at a slower rate. The fermentor serves as the major equipment in the process, with the substrate and air continuously fed in to supply the raw materials for cell culture and maintain the necessary oxygen consumption of the microorganisms. The industrial plant and product are shown in Fig. 6. Meanwhile, the trend plots of the variables are given in Fig. 7.

A distributed control system is applied to allow the automatic supervision and control of the main process variables. The supervisory system is built using Ethernet-interconnected programmable logic controllers, which act directly on the process equipment, with human–machine user interfaces. Five control loops are involved: temperature, pH, antifoam feeding, DO, and substrate feeding.

The temperature is controlled at … in the batch culture period and at 29 °C in the fed-batch period by manipulating the flow rate of cold water. Ammonia water is automatically fed in to control the pH value at about 5.95. Antifoam feeding is triggered once the foam has reached the foam detection electrode. DO is controlled by manipulating the air flow rate using a cascade PID controller. Different from the four variables mentioned above, the feeding rate of substrate remains to be determined by skilled operators based on their experience. This situation results from the fact that no reliable online sensor is available for the measurement of substrate concentration.

Fig. 6. The industrial plant and product of CTC fermentation process.

Although some attempts at online measurement have been made, no reliable sensors are, up to now, suitable for industrial use in fermentations. Bioprocesses are remarkably harsh environments for online sensors, as the growing culture can infiltrate the sensors and invalidate their results. In the real production process, the substrate concentration has to be analyzed offline every 6 h. In addition, a large time delay is incurred, since a large number of fermentors need to be analyzed in each analysis period. To provide online prediction of the substrate concentration, a data-driven soft sensor based on the proposed soft sensing algorithm is developed in the next section.

4.2. Online prediction of substrate concentration

Three industrial datasets were collected for soft sensor modeling and evaluation: (i) the training dataset (from September 3 to November 18, 2011), collected from 50 batches, consists of 755 samples with a sampling interval of 6 h; (ii) testing dataset I (from November 19, 2011, to January 2, 2012), collected from 25 batches, consists of 388 samples with a sampling interval of 6 h; and (iii) testing dataset II (from March 27 to April 22, 2013), collected from 30 batches, consists of 355 samples with a sampling interval of 8 h. The training dataset is used for model learning, while testing dataset I is used for model evaluation. In addition, testing dataset II is used to assess the adaptation capability of the proposed MJIT-KPLS and the other soft sensors, because it describes a new process state whose characteristics have significantly changed. Nine input variables were finally selected for soft sensor development based on the PMI criterion (see Table 1).

Fig. 7. Trend plots of variables in fed-batch CTC fermentation process. (Panels: dissolved oxygen (%), pH, air flow rate (m³/h), substrate feeding rate (L/h), air consumption (m³), culture volume (m³), ammonia consumption (L), substrate consumption (L), and substrate concentration (%), each versus time (h).)
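The acceptance test behind Table 1, keeping a candidate only if its score exceeds the 95th percentile of scores computed on randomized (shuffled) samples, can be sketched as follows. A histogram mutual information estimate stands in for the PMI estimator (the paper's PMI additionally conditions on the already selected inputs), and all data are synthetic:

```python
import numpy as np

def mi_hist(x, y, bins=10):
    """Histogram-based mutual information estimate (a simple stand-in for
    the partial mutual information estimator used in the paper)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def significant(x, y, n_shuffles=200, q=95.0, seed=0):
    """Accept x if its score beats the q-th percentile of scores obtained
    after randomly permuting x (the randomized-sample null)."""
    rng = np.random.default_rng(seed)
    score = mi_hist(x, y)
    null = [mi_hist(rng.permutation(x), y) for _ in range(n_shuffles)]
    return score > np.percentile(null, q), score

rng = np.random.default_rng(4)
x_rel = rng.normal(size=500)
y = np.sin(x_rel) + 0.1 * rng.normal(size=500)  # strongly related input
x_irr = rng.normal(size=500)                    # unrelated input, usually rejected

keep_rel, _ = significant(x_rel, y)
keep_irr, _ = significant(x_irr, y)
```

A genuinely related input scores far above the shuffled null and is kept, mirroring how pH in Table 1 falls below its 95th percentile threshold and is excluded.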



Table 1
Results of input variable selection based on PMI criterion. The 95th percentile PMI corresponds to the 95th percentile randomized sample PMI score. Note that the last row with numbers in bold represents the case where the PMI score is lower than the 95th percentile randomized sample PMI score, because of which pH is not included in the final input variable set.

No.  Potential input variable       PMI     95th percentile PMI
1    Cultivation time (min)         0.7105  0.1557
2    Substrate feeding rate (L/h)   0.6347  0.1188
3    Ammonia consumption (L)        0.4967  0.1120
4    Substrate consumption (L)      0.4296  0.1321
5    Air consumption (m³)           0.3946  0.1205
6    Temperature (°C)               0.3209  0.0953
7    Culture volume (m³)            0.2905  0.1258
8    Dissolved oxygen (%)           0.2737  0.1192
9    Air flow rate (m³/h)           0.2524  0.1023
10   pH                             0.1026  0.1225

Table 2
Average prediction results from MJIT-KPLS A-D models (testing dataset I).

Model        RMSE    R²
MJIT-KPLS A  0.2483  0.9442
MJIT-KPLS B  0.2416  0.9472
MJIT-KPLS C  0.2461  0.9452
MJIT-KPLS D  0.2393  0.9482

To enable multiphase modeling, the operating phases of the industrial fed-batch CTC fermentation process are identified using a Gaussian mixture model, and the phase identification results are shown in Fig. 8. A three-stage cultivation is applied in the industrial CTC production process, where S. aureofaciens first goes through progressive scale-up cultivation during the former two stages until the microorganism concentration reaches a desired level, after which the seed liquid is inoculated into the fermentor corresponding to the third stage. Thus, for the third stage, on which we focus, the entire CTC cultivation process is identified with three distinct phases that correspond to the exponential phase, stationary phase, and death phase of the microbial culture cycle. Subsequently, the various JIT-KPLS modeling frameworks are built for the different operating phases.

The first goal of our experiments is to evaluate the effectiveness of the proposed hybrid similarity measure and the adaptive sample selection strategy. Thus, the following four models with different combinations of similarity measure and sample selection strategy are compared:

MJIT-KPLS A: MJIT-KPLS with single similarity measure and fixed local modeling size.
MJIT-KPLS B: MJIT-KPLS with hybrid similarity measure and fixed local modeling size.
MJIT-KPLS C: MJIT-KPLS with single similarity measure and adaptive sample selection.
MJIT-KPLS D: MJIT-KPLS with hybrid similarity measure and adaptive sample selection.

To build the MJIT-KPLS soft sensors, the vector of offline parameters θ_off discussed in Section 3.6 must be determined before online implementation. The localization parameters φ_samp and φ_phase are set to 2.5, and the similarity weight for database updating is determined by trial and error in the range [0, 0.1, ..., 1]. Note that a fixed local modeling size N_local is used for the MJIT-KPLS A and B soft sensors, while a group of search parameters {N_local^min, N_step, N_local^max} is employed in the MJIT-KPLS C and D soft sensors. The remaining undetermined offline parameters are optimized by five-fold cross-validation over the following ranges:

(i) LV = [1, 2, ..., 15];
(ii) σ = [1, 2, ..., 15];
(iii) λ = [0, 0.1, ..., 1];
(iv) N_local = [40, 45, ..., 150];
(v) N_local^min = [40, 45, ..., 90], N_local^max = [110, 115, ..., 150], and N_step = 5.

The offline parameters for the different phases are determined in the same way. The above settings result in building a large number of JIT modeling frameworks with different parameter combinations θ_off. For every type of MJIT-KPLS model, ten groups of optimal offline parameters determined using the training data are selected to obtain a reliable evaluation of the model performance.

The performance comparison of the MJIT-KPLS A-D models on testing dataset I is shown in Fig. 9, and the average prediction results are listed in Table 2. Comparing model A with B reveals that the MJIT-KPLS soft sensors using the proposed hybrid similarity measure outperform those using the conventional single similarity measure. The comparison results between models C and D also support the efficiency of the new hybrid similarity measure. This fact indicates that, for batch processes, the phase similarity measure plays a critical role in evaluating process similarity.

Fig. 8. Phase identification of industrial fed-batch CTC fermentation process. (Identified phases over time (h): exponential, stationary, and death phase.)

Fig. 9. Performance comparison between MJIT-KPLS A-D models (testing dataset I). (RMSE of ten optimal parameter groups per model.)

It also shows that prior process knowledge or expert knowledge has to be taken into account, especially when developing data-driven soft sensors. Moreover, the adaptive sample selection strategy is also analyzed. A direct comparison of the results of models B and D indicates that the prediction accuracy is improved by using the proposed adaptive sample selection strategy instead of the conventional fixed modeling size. According to the above comparison results, it can be concluded that MJIT-KPLS D performs best, owing to the hybrid similarity measure and the adaptive sample selection method.

In the next step, the effectiveness of the proposed database updating scheme, i.e., the maximum similarity replacement (MSR) rule, is evaluated. For traditional data-based modeling methods, it is not a trivial task to update models online. In contrast, MJIT-KPLS is inherently adaptive by simply adding available new input-output pairs into the database, which makes the soft sensor capable of capturing the new process characteristics. In the present soft sensing algorithm, the MSR rule is applied to update the database. Two scenarios are compared: online prediction without updating, and with updating under various λ_update similarity weights. In the former, the original database remains unchanged, whereas in the latter the database is constantly updated by the MSR rule once a new input-output pair is available. Compared to the prediction results without database updating, the prediction errors of the MJIT-KPLS A-D models have, without exception, all been clearly reduced by using the MSR database updating scheme (see Fig. 10).

To illustrate the predictive capability of the proposed soft sensing algorithm, the prediction results from the MJIT-KPLS D model for four testing runs are depicted in Fig. 11. It is readily observed that the proposed soft sensor can be used to predict the substrate concentration with small RMSE and high R². It should be emphasized that the adaptive sample selection method is used to determine the local modeling size for the MJIT-KPLS D model. The optimal local modeling sizes corresponding to the four testing batches are shown in Fig. 12. It is seen that the obtained optimal local modeling size changes along the sampling time. This may be due to the fact that a batch process shows different nonlinearity within different phases, resulting in variations of the optimal number of samples required to build the local models.

To further assess the predictive accuracy and the adaptation capability of the proposed MJIT-KPLS soft sensing algorithm, the following soft sensors are developed for comparison:

(i) Single-model based soft sensors. Three popular global modeling techniques are selected: PLS (de Jong, 1993), KPLS (Rosipal and Trejo, 2001), and GPR (Rasmussen and Williams, 2006).
(ii) Conventional multi-model based soft sensors. The adaptive multi-model soft sensor modeling framework (Yu, 2012a) is adopted to handle the multiphase characteristics of batch processes by constructing local predictive models for the various operating phases. First, a Gaussian mixture model is estimated from the training data to identify the different operating phases. Further, various localized models are built to characterize the shifting dynamics across the different phases. Using the Bayesian inference strategy, a new sample is classified into a particular phase with the maximal posterior probability, and thus the local model representing the identical phase is chosen for prediction. By applying the adaptive multi-model modeling framework, multi-kernel KPLS (MKPLS) and multi-kernel GPR (MGPR) soft sensors are constructed.
(iii) Bayesian model averaging (BMA) based multi-model soft sensors. Though multiple localized models can handle multiphase operation, they may not effectively characterize the transient dynamics during the transitional stages. Thus, the BMA based multi-model soft sensing framework (Yu et al., 2013) is used to …

Fig. 10. Prediction performance of MJIT-KPLS A-D models before and after database updating (testing dataset I). (Panels (a)-(d): MJIT-KPLS A-D; curves compare no updating with MSR updating for λ_update = 1, 0.8, and 0.6.)

Fig. 11. Prediction results of four testing batches by using MJIT-KPLS D model with database updating, and λ_update = 0.8: (a) testing batch 1, RMSE = 0.1599, R² = 0.9647; (b) testing batch 2, RMSE = 0.1576, R² = 0.9822; (c) testing batch 3, RMSE = 0.1475, R² = 0.9865; (d) testing batch 4, RMSE = 0.1484, R² = 0.9861. (Each panel plots the offline analysis against the MJIT-KPLS D prediction of substrate concentration (%) over time (h), with phases I-III marked.)

Fig. 12. The adaptive optimal number of local modeling samples from MJIT-KPLS D soft sensor, λ_update = 0.8: (a)-(d) testing batches 1-4 (number of local modeling samples over time (h), with phases I-III marked).
The BMA-MKPLS and BMA-MGPR soft sensors are built accordingly. Compared to the multi-model methods, the local models for the BMA based methods are constructed in the same way, whereas the prediction for samples within the transitional stage is performed differently. In the BMA framework, the between-phase transitional stage is determined by the posterior probabilities of measurement samples with respect to the local models. Further, the posterior probabilities of the corresponding two consecutive phases serve as adaptive weightings to integrate the two adjacent local models for prediction.
(iv) JIT learning based soft sensors. JIT-PLS and JIT-KPLS soft sensors are developed within the newly proposed JIT learning framework; that is, the proposed hybrid similarity measure, the adaptive sample selection strategy, and the MSR database updating scheme are applied to both soft sensors. Unlike the MJIT-KPLS approach, JIT-PLS and JIT-KPLS build only a global JIT learning framework irrespective of the multiphase characteristics of batch processes. It should be noted that the JIT-PLS method is the same as the JIT-KPLS approach except that a local PLS model instead of a KPLS model is built.

The prediction results on testing datasets I and II from the different soft sensors are given in Fig. 13, and the quantitative model comparison is listed in Table 3. The prediction performance on testing dataset I is analyzed first. One can see that the single PLS based soft sensor leads to fairly poor prediction in terms of much higher RMSE and lower R2. This is because PLS is essentially a linear modeling technique and thus cannot handle process nonlinearity well. In contrast, both the single KPLS and GPR soft sensors yield predictions that are far more accurate than those of the single PLS soft sensor. This marked enhancement of prediction accuracy indicates that nonlinear modeling techniques are more suitable for nonlinear batch processes.

Fig. 13. Prediction RMSE from the different soft sensor modeling methods on testing datasets I and II (methods 1-10: PLS, KPLS, GPR, MKPLS, MGPR, BMA-MKPLS, BMA-MGPR, JIT-PLS, JIT-KPLS, MJIT-KPLS).

Though single nonlinear soft sensors are capable of dealing with process nonlinearity, they may be ill-suited for batch processes with multiple operating phases. The multi-model based KPLS and GPR are adopted so that the multiphase operation of batch processes can be well characterized, giving rise to better prediction performance than the regular single KPLS and GPR soft sensors (see Fig. 13 and Table 3).

Although the multiple localized models in MKPLS and MGPR perform better than the single models, such methods may not effectively characterize the transient dynamics during the transitional stages between consecutive phases. Compared to the MKPLS and MGPR soft sensors, the BMA-MKPLS and BMA-MGPR soft sensors perform better because the Bayesian model averaging strategy adaptively integrates the local model predictions within the transitional stages between two adjacent operating phases (see Fig. 13 and Table 3).

In contrast with the above-mentioned single or multi-model based soft sensors, which build global or local models offline on training data, the JIT based soft sensors build local models by selecting samples similar to the query data. Thus, JIT learning based models can effectively cope with abrupt changes in process characteristics as well as gradual ones. Furthermore, they can cope with nonlinearity by building local models repeatedly. From the results in Fig. 13 and Table 3, it is obvious that JIT-PLS, JIT-KPLS, and MJIT-KPLS are superior to the other methods. Even though linear PLS is adopted in the JIT-PLS method, it still obtains satisfactory prediction results owing to the JIT learning framework. The JIT-KPLS method performs better than JIT-PLS since the nonlinear KPLS rather than PLS is built. However, the JIT-PLS and JIT-KPLS soft sensors construct a single JIT modeling framework without considering the multiplicity of operating phases of batch processes, which compromises the prediction performance. Among all soft sensors, MJIT-KPLS performs best due to the integration of multiphase modeling, the hybrid similarity measure, adaptive sample selection, and MSR database updating.

All soft sensors are also evaluated on testing dataset II, which describes a new process state different from the training data and testing dataset I. Similar to the comparison results on testing dataset I, the nonlinear modeling techniques are more capable of handling process nonlinearity than the linear method. Compared to the single KPLS and GPR models, however, the prediction accuracies of MKPLS and MGPR are not improved by applying the multi-model framework. This performance degradation may be due to the changes in process characteristics. In contrast, the BMA based MKPLS and MGPR approaches give lower RMSE and higher R2 than the other single-model and multi-model based methods.
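The BMA combination within a transitional stage amounts to a posterior-weighted average of the adjacent local model predictions. A minimal numpy-free sketch, where the two local "models" are arbitrary linear stand-ins for illustration rather than the paper's KPLS/GPR models:

```python
# Stand-in local models for two adjacent phases (assumed, for illustration)
def local_model_1(x):
    return 2.0 * x + 1.0   # phase k

def local_model_2(x):
    return 0.5 * x + 6.0   # phase k+1

def bma_predict(x, posterior):
    """Integrate the adjacent local model predictions using their
    posterior probabilities as adaptive weights."""
    p1, p2 = posterior
    return p1 * local_model_1(x) + p2 * local_model_2(x)

x = 2.0
# Deep inside phase k the posterior is ~(1, 0), so the prediction
# follows local model 1; in the transitional stage both models contribute.
inside = bma_predict(x, (1.0, 0.0))       # equals local_model_1(x) = 5.0
transitional = bma_predict(x, (0.6, 0.4)) # blend, approx. 5.8
print(inside, transitional)
```

As the batch moves through the transition, the posterior weights shift smoothly from one phase to the next, which is what lets BMA track the transient dynamics that a hard phase switch misses.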

Table 3
Quantitative comparison of prediction results from different soft sensors. Testing dataset I (from November 19, 2011, to January 2, 2012); testing dataset II (from March 27 to April 22, 2013). The similarity weight used for database updating is λupdate = 0.8.

No.  Method      Type              Testing dataset I    Testing dataset II
                                   RMSE     R2          RMSE     R2
1    PLS         Single-model      0.3837   0.8668      0.4780   0.8315
2    KPLS        Single-model      0.2669   0.9356      0.3595   0.9047
3    GPR         Single-model      0.2691   0.9345      0.3518   0.9088
4    MKPLS       Multi-model       0.2587   0.9395      0.3786   0.8943
5    MGPR        Multi-model       0.2539   0.9417      0.3585   0.9052
6    BMA-MKPLS   BMA multi-model   0.2432   0.9465      0.3421   0.9137
7    BMA-MGPR    BMA multi-model   0.2471   0.9448      0.3424   0.9135
8    JIT-PLS     JIT learning      0.2366   0.9494      0.3147   0.9270
9    JIT-KPLS    JIT learning      0.2342   0.9506      0.3129   0.9282
10   MJIT-KPLS   JIT learning      0.2294   0.9524      0.3088   0.9297
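The RMSE and R2 values in Table 3 follow the usual definitions. As a quick reference, a minimal sketch with made-up vectors (not the paper's data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([4.0, 4.5, 5.0, 5.5, 6.0])  # e.g. offline analyses
y_pred = np.array([4.1, 4.4, 5.2, 5.4, 6.1])  # soft sensor predictions
print(rmse(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and higher R2 both indicate better agreement with the offline analysis, which is how the ranking in the table should be read.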
Fig. 14. Average CPU running time for all testing samples from ten optimal models: (a) MJIT-KPLS A; (b) MJIT-KPLS B; (c) MJIT-KPLS C; (d) MJIT-KPLS D. Each panel also reports the mean and standard deviation of the CPU time.

Fig. 15. An example of the CPU time for each testing sample: (a) MJIT-KPLS A; (b) MJIT-KPLS B; (c) MJIT-KPLS C; (d) MJIT-KPLS D. Each panel also reports the mean and standard deviation of the CPU time.
In addition, the prediction results of the JIT based soft sensors once again outperform those of the other soft sensors. However, it is obvious that the performance of all soft sensors deteriorates due to the great changes in process characteristics. Nevertheless, the JIT based soft sensors obtain much better results than the other methods owing to their capability of adapting to the new process environment. In practice, the conventional or BMA based multi-model methods cannot deal well with the changes in process characteristics because such models are built offline and remain unchanged once implemented online. By contrast, JIT based soft sensors are continuously updated by adding new input-output data into the database.

Apart from prediction accuracy, real-time performance is also very important for JIT learning. Unlike traditional modeling methods, which construct models offline, a JIT based model is built only when the estimation of a query sample is required. The computational load of the proposed MJIT-KPLS method is therefore analyzed. The average CPU running time for all testing samples is shown in Fig. 14, where ten optimal models are selected for each type of MJIT-KPLS method. Meanwhile, the CPU time for each testing sample from an example prediction is given in Fig. 15.
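The per-query behaviour of JIT learning, selecting similar samples, fitting a local model on demand, and measuring the resulting CPU cost, can be illustrated with a simplified sketch. Here Euclidean distance and ordinary least squares stand in for the paper's hybrid similarity measure and local KPLS, and all data are synthetic:

```python
import time
import numpy as np

rng = np.random.default_rng(1)
# Historical database; in a JIT soft sensor this grows as new
# input-output pairs arrive from the offline analyses
X_db = rng.uniform(0, 10, size=(500, 3))
y_db = X_db @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 500)

def jit_predict(x_query, n_local=60):
    """Select the samples most similar to the query (Euclidean distance
    as a stand-in similarity), fit a local linear model, and predict."""
    dist = np.linalg.norm(X_db - x_query, axis=1)
    idx = np.argsort(dist)[:n_local]                 # most similar samples
    Xl = np.hstack([X_db[idx], np.ones((n_local, 1))])  # add intercept
    coef, *_ = np.linalg.lstsq(Xl, y_db[idx], rcond=None)
    return float(np.append(x_query, 1.0) @ coef)

t0 = time.perf_counter()
y_hat = jit_predict(np.array([5.0, 5.0, 5.0]))
cpu_time = time.perf_counter() - t0                  # per-query cost
print(y_hat, cpu_time)
```

Because the model is rebuilt at every query, the per-query CPU time measured this way is the quantity plotted in Figs. 14 and 15, and appending each new input-output pair to X_db and y_db is what gives the JIT sensors their adaptation capability.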
Clearly, models C-D spend more time than models A-B due to the optimization process used to determine the local modeling size adaptively by cross-validation. The statistical mean values and standard deviations of the CPU running time show that the real-time performance of the proposed MJIT-KPLS methods is satisfactory for most batch processes, where reliable online sensors for key quality variables are usually not available. For the batch fermentation application under study, such a short running time is negligible compared to the 6 h interval of the offline analysis.

5. Conclusions

The MJIT-KPLS soft sensing algorithm, the main contribution of this work, provides a novel adaptive soft sensor modeling framework for multiphase batch processes. By exploiting the JIT learning framework, MJIT-KPLS can cope with changes in process characteristics as well as process nonlinearity. Moreover, the Gaussian mixture model enables automatic phase identification and multiphase modeling. Through the Bayesian inference strategy, the JIT-KPLS modeling framework representing the same phase as the query data is adaptively chosen for online prediction, which favors capturing the shifting dynamics throughout the multiple operating phases of batch processes. Besides, the estimation accuracy of the MJIT-KPLS approach is further improved by the combination of a hybrid similarity measure, an adaptive sample selection strategy, a maximal similarity replacement based database updating scheme, and PMI based input variable selection.

The MJIT-KPLS soft sensor supports model adaptation in four aspects: (i) automatically identifying the operating phase based on the Gaussian mixture model and Bayesian inference strategy; (ii) performing JIT learning to build a local KPLS model online from the samples similar to the query data; (iii) adaptively determining the optimal local modeling size based on cross-validation; and (iv) adapting to new process environments by updating the database.

The application to the industrial fed-batch CTC fermentation process has demonstrated that, in terms of predictive accuracy and model adaptation capability, the proposed MJIT-KPLS soft sensing framework is superior to the single-model based methods, the conventional multi-model based methods, and the Bayesian model averaging based multi-model methods.

Acknowledgements

We thank Charoen Pokphand Group for their financial support and for providing the industrial datasets of the fed-batch CTC fermentation process. We also appreciate the valuable comments and suggestions of the anonymous reviewers.

References

Alford JS. Bioprocess control: advances and challenges. Comput Chem Eng 2006;30(10):1464-75.
Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
Chen J, Yu J, Mori J, Rashid MM, Hu G, Yu H, et al. A non-Gaussian pattern matching based dynamic process monitoring approach and its application to cryogenic air separation process. Comput Chem Eng 2013;58:40-53.
Chen LZ, Nguang SK, Chen XD. Modelling and optimization of biotechnological processes: artificial intelligence approaches. Berlin/Heidelberg: Springer; 2006.
Cheng C, Chiu M-S. A new data-based methodology for nonlinear process modeling. Chem Eng Sci 2004;59(13):2801-10.
Chow TW, Huang D. Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information. IEEE Trans Neural Netw 2005;16(1):213-24.
Cinar A, Parulekar SJ, Undey C, Birol G. Batch fermentation: modeling, monitoring, and control. New York: CRC Press; 2003.
Cui L, Xie P, Sun J, Yu T, Yuan J. Data-driven prediction of the product formation in industrial 2-keto-l-gulonic acid fermentation. Comput Chem Eng 2012;36:386-91.
Cybenko G. Just-in-time learning and estimation. In: Bittanti S, Picci G, editors. Identification, adaptation, learning: the science of learning models from data. Berlin: Springer; 1996. p. 423-34.
de Jong S. SIMPLS: an alternative approach to partial least squares regression. Chemom Intell Lab Syst 1993;18(3):251-63.
Delgado MR, Nagai EY, de Arruda LVR. A neuro-coevolutionary genetic fuzzy system to design soft sensors. Soft Comput 2009;13(5):481-95.
Dochain D. Bioprocess control. London: ISE Ltd; 2008.
Fernando TMKG, Maier HR, Dandy GC. Selection of input variables for data driven models: an average shifted histogram partial mutual information estimator approach. J Hydrol 2009;367(3):165-76.
Fortuna L, Graziani S, Rizzo A, Xibilia MG. Soft sensors for monitoring and control of industrial processes. London: Springer; 2007.
Fujiwara K, Kano M, Hasebe S, Takinami A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J 2009;55(7):1754-65.
François D, Rossi F, Wertz V, Verleysen M. Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 2007;70(7):1276-88.
Frenzel S, Pompe B. Partial mutual information for coupling analysis of multivariate time series. Phys Rev Lett 2007;99(20):204101.
Ge Z, Song Z. A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemom Intell Lab Syst 2010;104(2):306-17.
George EI. The variable selection problem. J Am Stat Assoc 2000;95(452):1304-8.
Gonzaga J, Meleiro L, Kiang C, Maciel Filho R. ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process. Comput Chem Eng 2009;33(1):43-9.
Grbić R, Slišković D, Kadlec P. Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models. Comput Chem Eng 2013;58:84-97.
Haimi H, Mulas M, Corona F, Vahala R. Data-derived soft-sensors for biological wastewater treatment plants: an overview. Environ Model Softw 2013;47:88-107.
Jang J-S. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 1993;23(3):665-85.
Jassar S, Liao Z, Zhao L. Adaptive neuro-fuzzy based inferential sensor model for estimating the average air temperature in space heating systems. Build Environ 2009;44(8):1609-16.
Jia R, Mao Z, Wang F. KPLS model based product quality control for batch processes. CIESC J 2013;64(4):1332-9.
Jolliffe I. Principal component analysis. 2nd ed. New York: Springer; 2002.
Kadlec P, Gabrys B, Strandt S. Data-driven soft sensors in the process industry. Comput Chem Eng 2009;33(4):795-814.
Kadlec P, Gabrys B. Local learning-based adaptive soft sensor for catalyst activation prediction. AIChE J 2011;57(5):1288-301.
Kadlec P, Grbić R, Gabrys B. Review of adaptation mechanisms for data-driven soft sensors. Comput Chem Eng 2011;35(1):1-24.
Kano M, Nakagawa Y. Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry. Comput Chem Eng 2008;32(1):12-24.
Kano M, Ogawa M. The state of the art in chemical process control in Japan: good practice and questionnaire survey. J Process Control 2010;20(9):969-82.
Kano M, Fujiwara K. Virtual sensing technology in process industries: trends and challenges revealed by recent industrial applications. J Chem Eng Jpn 2013;46(1):1-17.
Kim S, Kano M, Hasebe S, Takinami A, Seki T. Long-term industrial applications of inferential control based on just-in-time soft-sensors: economical impact and challenges. Ind Eng Chem Res 2013;52(35):12346-56.
Lin B, Recke B, Knudsen JK, Jørgensen SB. A systematic approach for soft sensor development. Comput Chem Eng 2007;31(5):419-25.
Liu H, Shah S, Jiang W. On-line outlier detection and data cleaning. Comput Chem Eng 2004;28(9):1635-47.
May RJ, Maier HR, Dandy GC, Fernando T. Non-linear variable selection for artificial neural networks using partial mutual information. Environ Model Softw 2008a;23(10):1312-26.
May RJ, Dandy GC, Maier HR, Nixon JB. Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems. Environ Model Softw 2008b;23(10):1289-99.
McLachlan G, Peel D. Finite mixture models. New York: John Wiley & Sons; 2000.
Nicoletti MC, Jain LC. Computational intelligence techniques for bioprocess modelling, supervision and control. Berlin/Heidelberg: Springer; 2009.
Paluš M, Komárek V, Hrnčíř Z, Štěrbová K. Synchronization as adjustment of information rates: detection from bivariate time series. Phys Rev E 2001;63(4):046211.
Pani AK, Mohanta HK. A survey of data treatment techniques for soft sensor design. Chem Prod Process Model 2011;6(1): Article 2.
Pani AK, Vadlamudi VK, Mohanta HK. Development and comparison of neural network based soft sensors for online estimation of cement clinker quality. ISA Trans 2013;52(1):19-29.
Rashid MM, Yu J. A new dissimilarity method integrating multidimensional mutual information and independent component analysis for non-Gaussian dynamic process monitoring. Chemom Intell Lab Syst 2012a;115:44-58.
Rashid MM, Yu J. Nonlinear and non-Gaussian dynamic batch process monitoring using a new multiway kernel independent component analysis and multidimensional mutual information based dissimilarity approach. Ind Eng Chem Res 2012b;51(33):10910-20.
Rasmussen C, Williams C. Gaussian processes for machine learning. Cambridge, MA: MIT Press; 2006.
Rosipal R, Trejo LJ. Kernel partial least squares regression in reproducing kernel Hilbert space. J Mach Learn Res 2001;2:97-123.
Schaal S, Atkeson CG, Vijayakumar S. Scalable techniques from nonparametric statistics for real time robot learning. Appl Intell 2002;17(1):49-60.
Scott DW. Multivariate density estimation: theory, practice, and visualization. New York: John Wiley & Sons; 1992.
Sharma A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: part 1—a strategy for system predictor identification. J Hydrol 2000;239(1):232-9.
Sharmin R, Sundararaj U, Shah S, Griend LV, Sun Y-J. Inferential sensors for estimation of polymer quality parameters: industrial application of a PLS-based soft sensor for a LDPE plant. Chem Eng Sci 2006;61(19):6372-84.
Shigemori H, Kano M, Hasebe S. Optimum quality design system for steel products through locally weighted regression model. J Process Control 2011;21(2):293-301.
Sliskovic D, Grbic R, Hocenski Z. Methods for plant data-based process modeling in soft-sensor development. Automatika 2011;52(4):306-18.
Westerhuis JA, Kourti T, MacGregor JF. Comparing alternative approaches for multivariate statistical analysis of batch process data. J Chemom 1999;13(3-4):397-413.
Wold H. Nonlinear estimation by iterative least squares procedures. In: Research papers in statistics. New York: Wiley; 1966. p. 411-44.
Yu J. Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Ind Eng Chem Res 2012a;51(40):13227-37.
Yu J. A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Comput Chem Eng 2012b;41:134-44.
Yu J. Online quality prediction of nonlinear and non-Gaussian chemical processes with shifting dynamics using finite mixture model based Gaussian process regression approach. Chem Eng Sci 2012c;82:22-30.
Yu J, Chen K, Rashid MM. A Bayesian model averaging based multi-kernel Gaussian process regression framework for nonlinear state estimation and quality prediction of multiphase batch processes with transient dynamics and uncertainty. Chem Eng Sci 2013;93:96-109.
Yu J, Qin SJ. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J 2008;54(7):1811-29.
Yu J, Qin SJ. Multiway Gaussian mixture model based multiphase batch process monitoring. Ind Eng Chem Res 2009;48(18):8585-94.