
Fault Detection in Industrial Plant Using κ-Nearest Neighbors with Random Subspace Method

Fellipe do Prado Arruda, Anderson da Silva Soares, Valniria da Silva Bandeira, Telma Woerle de Lima, Kleiton Vinícius Braga, Gustavo Teodoro Laureano
Institute of Informatics, Federal University of Goiás, Goiânia, GO, Brazil

Clarimar José Coelho
Computer Science Department, Pontifical Catholic University of Goiás, Goiânia, GO, Brazil

Abstract—In this paper we propose an ensemble approach using κ-nearest neighbors (κ-NN) combined with the random subspace method (RSM) to achieve improved classification performance on the fault detection problem. Fault detection and isolation is a subfield of control engineering concerned with monitoring a system, identifying when a fault has occurred, and pinpointing the type of fault and its location. Fault detection is used to determine that a problem has occurred within a certain channel or area of operation. In other words, the software application may recognize that the system is operating successfully, but performing at a level that is sub-optimal with respect to a predetermined target. Our study shows that the proposed methodology is more efficient than a classical artificial neural network.

Keywords: Fault detection, Pattern recognition, Random subspace method, κ-nearest neighbors.

I. INTRODUCTION

Faults in industrial plants are deviations from the normal behavior of the plant or its instrumentation. Faults occur due to sensor faults, actuator faults, or faults in the plant itself. The diagnosis of a fault determines whether a fault has occurred, the location of the fault, the fault size, and the time it occurred [1].

Rapid fault detection is critical to prevent a fault from spreading and causing further deterioration, loss of system equipment, and loss of human lives [2], [3]. If a failure is detected, diagnosis consists of locating the component/sub-system involved and, if possible, the source of the problem. A correct and detailed diagnosis is of great value to guide procedures for failure accommodation and timely corrective maintenance actions, reducing the time spent on this process [4], [5].

Algorithms traditionally used in pattern classification problems in different fields can be adapted to detect and diagnose faults. In this case, the problem of fault detection and diagnosis is formulated as a classification problem: two classes are defined, one for normal operation and one for faulty operation, based on sensor data [6].

Reddy [7] developed a detection scheme for simultaneous failures at a functional level, applied to a particular class of problems.

Chiang [8], [9], [10] developed an information criterion that automatically determines the order of the dimensionality reduction for Fisher's discriminant analysis (FDA) and discriminant partial least squares (DPLS). The Tennessee Eastman process (TEP) simulator was used to generate overlapping datasets to evaluate the classification performance.

Ling [11] proposed a novel optimization algorithm, based on a modified binary particle swarm optimization with mutation combined with a support vector machine (SVM), to select the fault feature variables for fault diagnosis. Simulations on the Tennessee Eastman process show that the proposed methodology can effectively escape from local optima and find the global optimum, compared with standard particle swarm optimization.

Samanta [12] compared the performance of three types of artificial neural networks (ANN), namely the multilayer perceptron (MLP), the radial basis function (RBF) network, and the probabilistic neural network (PNN), for bearing fault detection. ANNs have potential applications in the automated detection and diagnosis of machine conditions; multilayer perceptrons and radial basis function networks are the most commonly used ANNs.

Peter and Jin [13] propose a fault detection method in which kNN is developed to explicitly account for some unique characteristics of most semiconductor processes. Traditional univariate statistical process control charts have long been used for fault detection; more recently, multivariate statistical fault detection methods such as principal component analysis (PCA)-based methods have drawn increasing interest for fault detection in the manufacturing industry. However, unique characteristics of semiconductor processes, such as nonlinearity in most batch processes, multimodal batch trajectories due to product mix, and process steps with variable durations, have posed difficulties for the PCA-based methods. To explicitly account for these characteristics, a fault detection method using the k-nearest neighbor rule (FD-kNN) is developed: the traditional kNN algorithm is adapted such that only normal operation data is needed to build a process model. The results are illustrated by simulated examples, and the kNN method performs better than PCA in a real industrial example with limited data preprocessing.

Baraldi [14] presents an ensemble-based scheme for nuclear transient identification. The approach adopted to construct the ensemble of classifiers is bagging. The performance of the proposed classification scheme was verified by comparison with a single supervised, evolutionary-optimized fuzzy C-means classifier on the task of classifying artificial datasets. The results obtained indicate that, for datasets of large or very small sizes and/or with complex decision boundaries, bagging ensembles can improve classification accuracy.
In this work, classification techniques are investigated for fault detection, in particular the κ-NN algorithm. Ensemble techniques are also considered to enhance the performance of the classifier. The main idea of an ensemble is to learn a set of classifiers and combine their predictions. The combination of classifiers is based on the hypothesis that combining the decisions of a set of classifiers can increase the rate of correct detection, surpassing the performance of the individual classifiers. Once created, the ensemble members must have their opinions combined into a single decision. The majority vote rule is the most popular function for combining the classifiers of the ensemble [15], [16].

In particular, we explore the RSM proposed by Ho [17]. The RSM algorithm creates different training sets by resampling, without repetition, the variables available. Each set is used to generate a classifier according to a predetermined learning rule, for example, an ensemble in which each classifier votes with equal weight. The final classifier is obtained as a combination of the individual classifiers, for example, by majority voting [18].

Simple majority voting is a decision rule that selects one of many alternatives, based on the predicted classes with the most votes. It is the decision rule used most often in ensemble methods. Weighted majority voting can be used instead if the decision of each classifier is multiplied by a weight that reflects the individual confidence in these decisions. Simple majority voting is a special case of weighted majority voting, assigning an equal weight of 1/k to each classifier, where k is the number of classifiers in the ensemble [19].
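The two voting rules above can be sketched as follows (a Python sketch with our own illustrative names; the paper's implementation was in Matlab):

```python
from collections import Counter

def simple_majority_vote(predictions):
    # predictions: one predicted class label per classifier in the ensemble
    return Counter(predictions).most_common(1)[0][0]

def weighted_majority_vote(predictions, weights):
    # each classifier's vote counts with a weight reflecting its confidence
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Simple majority voting is weighted voting with equal weights 1/k:
votes = ["fault", "normal", "fault"]
k = len(votes)
print(simple_majority_vote(votes))                   # fault
print(weighted_majority_vote(votes, [1.0 / k] * k))  # fault
```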
II. METHODOLOGY AND ALGORITHMS

In this work, fault detection is formulated as a classification problem in which two classes are defined:

w0: Normal operation
w1: Fault

The process of building the classification model involves separating the data into a training set and a testing set. In the classifier training stage, the decision frontiers are delimited in the feature space P in order to separate it into two regions, S0 and S1, corresponding to classes w0 and w1, respectively. The features should be conveniently extracted from the monitored signals in order to emphasize the differences between normal operating conditions and faults [20].

During system operation, the decision rule causes a set of features extracted from the signals at a given instant to be labeled as a pattern of class w0 (normal) or w1 (fault). For this purpose, several classification techniques can be used, such as linear discriminant analysis (LDA), ANN, SVM, and techniques based on clustering of patterns [21], [22].

A. κ-Nearest Neighbors Algorithm

The κ-NN algorithm identifies the category of an unknown data point on the basis of its nearest neighbors, whose classes are already known. The training set consists of n-dimensional vectors, and each element of this set is a point in n-dimensional space [23].

The class of an element that does not belong to the training set is determined by the κ-NN classifier by searching for the κ elements of the training set that are at the smallest distance from the unknown element, that is, nearest to it [24].

In this work the distance is the Euclidean distance. Let X = [x1, ..., xn]^T be an element of a known class and Y = [y1, ..., yn]^T an element of an unknown class; then

D(X, Y) = √((x1 − y1)² + ... + (xn − yn)²).

The κ elements found are called the κ-nearest neighbors. The algorithm checks the classes of the κ nearest neighbors, and the most frequent class is assigned to the unknown element [25].

Figure 1 illustrates the classification of a new pattern, described by two attributes, according to the κ-NN rule for κ = 3. The neighborhood of the considered pattern comprises one element belonging to class 1 and two elements belonging to class 2, which implies assigning class 2 to the new pattern.

Figure 1. Illustration of the classification of a new pattern using the κ-NN algorithm.

The κ-NN is a nonparametric method: it does not require the choice of a model for the data distribution and does not present ill-conditioning problems in parameter estimation. However, its performance can be very sensitive to the influence of outliers and to the choice of the number of neighbors considered [26].
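The κ-NN rule just described can be sketched in a few lines (a Python sketch; the paper's implementation was in Matlab, and the toy points below are ours, merely mimicking the geometry of Figure 1):

```python
import math
from collections import Counter

def euclidean(x, y):
    # D(X, Y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(train_X, train_y, query, k=3):
    # find the k training elements nearest to the unknown element
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda p: euclidean(p[0], query))[:k]
    # the most frequent class among the k neighbors is assigned
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Toy version of Figure 1: one class-1 neighbor, two class-2 neighbors
X = [(0.4, 0.5), (1.2, 1.1), (1.3, 0.9), (1.8, 1.7)]
y = [1, 2, 2, 1]
print(knn_classify(X, y, (1.2, 1.0), k=3))  # 2 (two of the three nearest are class 2)
```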
Figure 2. Schematic representation of the Tennessee Eastman Process. (FI: Flow Indicator; SC: Speed Controller; TI: Temperature Indicator; CSW: Cold Water Source; PI: Pressure Indicator; CWR: Cold Water Return; LI: Level Indicator; JI: J Indicator.)

B. Random Subspace Method

The RSM, also named attribute bagging, is an ensemble classifier that consists of several classifiers and outputs the class based on the outputs of these individual classifiers. The RSM is a generalization of the random forest algorithm [27].

The ensemble classifier is constructed using the following algorithm:

• Let n be the number of objects in the training set and D the number of features in the training data;
• Let L be the number of individual classifiers in the ensemble;
• For each individual classifier l, choose dl (dl < D) input variables for l. It is common to have a single value of dl for all the individual classifiers;
• For each individual classifier l, create a training set by choosing dl features from the D available, without replacement, and train the classifier;
• To classify a new object, combine the outputs of the L individual classifiers by majority voting or by combining the posterior probabilities.

The RSM may benefit from using random subspaces both for constructing the classifiers and for aggregating them. When the dataset has many redundant features, one may obtain better classifiers in random subspaces than in the original feature space [19]. The combined decision of such classifiers may be superior to that of a single classifier constructed on the original training set in the complete feature space. To perform the classification, all subspaces are applied to classify a new pattern, counting the results obtained by the single classifiers. The class with the most votes is assigned [18].

III. CASE STUDY: TENNESSEE EASTMAN PROCESS

The process simulator for the Tennessee Eastman Process (TEP) industrial challenge problem was created by the Eastman Chemical Company to provide a realistic industrial process for evaluating process control and monitoring methods. The TEP simulator has been widely used by the process monitoring community as a source of data for comparing various approaches, and the TEP is recognized as a benchmark in fault detection studies.

It was proposed and implemented by [28] in the FORTRAN language, out of a real need for realistic problems for the application and discussion of different multivariate process control techniques. Later, [29] proposed some improvements and provided a SIMULINK TEP application. The TEP can be described as a reactor-separator-recycle arrangement; a schematic representation is shown in Figure 2.
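Returning to the random subspace construction above, the bulleted steps can be sketched as follows (a Python sketch with illustrative names of our own; a minimal 1-NN stands in for the base classifier, while the paper's κ-NN-RSM was implemented in Matlab):

```python
import math
import random
from collections import Counter

def fit_1nn(X, y):
    # minimal 1-NN base learner: predict the label of the closest training point
    def predict(q):
        return min(zip(X, y), key=lambda p: math.dist(p[0], q))[1]
    return predict

def train_rsm(X, y, n_classifiers, d, base_fit=fit_1nn, seed=0):
    # each classifier l sees d of the D features, chosen without replacement
    rng = random.Random(seed)
    D = len(X[0])
    ensemble = []
    for _ in range(n_classifiers):
        feats = rng.sample(range(D), d)                 # random subspace
        Xs = [[row[j] for j in feats] for row in X]
        ensemble.append((feats, base_fit(Xs, y)))
    return ensemble

def predict_rsm(ensemble, x):
    # combine the L individual outputs by majority voting
    votes = [clf([x[j] for j in feats]) for feats, clf in ensemble]
    return Counter(votes).most_common(1)[0][0]

# two well-separated classes in a 3-feature space
X = [(0.0, 0.0, 5.0), (0.0, 1.0, 5.0), (5.0, 5.0, 0.0), (5.0, 6.0, 0.0)]
y = [0, 0, 1, 1]
ens = train_rsm(X, y, n_classifiers=5, d=2)
print(predict_rsm(ens, (0.1, 0.5, 5.0)))  # 0
```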
The TEP produces two products (G and H) from four reactants (A, C, D and E). A further inert trace component (B) and one byproduct (F) are present. The reactions are:

A(gas) + C(gas) + D(gas) → G(liq)
A(gas) + C(gas) + E(gas) → H(liq)
A(gas) + E(gas) → F(liq)
3D(gas) → 2F(liq)

These reactions are irreversible and exothermic, with rates that depend on the temperature and on the reactor gas-phase concentrations of the reactants. In addition to the reactants fed through streams 1, 2, 3 and 4, a purge is necessary to prevent the inert component B from building up in the recycle stream. The reactions occur in the reactor, which has a cooling coil to control the temperature of the mixture.

The recycle is then fed back to the mixing zone, using a compressor to compensate for pressure losses in the reactor, condenser and flash. The liquid bottom stream of the flash is pumped into a stripper in order to obtain the desired product purity [30].

The historical database for the TEP was generated by simulating the plant for a period of 1020 sampling periods for each of the 52 variables (41 measured variables + 11 manipulated variables) [31]. The data was sampled every 3 min, and the random seed (used to specify the stochastic measurement noise and disturbances) was changed before the computation of the data set for each fault. Ten testing sets were generated using the preprogrammed faults (faults 1-10) described in Table I. In addition, one testing set (normal operation) was generated with no faults [30].

Table I
SIMULATED FAULTS

Fault  Fault Description
1      Step in A/C feed ratio, B composition constant
2      Step in B composition, A/C ratio constant
3      Step in D feed
4      Step in reactor cooling water inlet temperature
5      Step in condenser cooling water inlet temperature
6      A feed loss
7      C header pressure loss (reduced availability)
8      Random variation in A-C feed composition
9      Random variation in D feed temperature
10     Random variation in C feed temperature

A. Proposed Algorithm and Implementation Details

All algorithms were implemented in Matlab 2013b. The proposed algorithm is named κ-NN-RSM, and the number of classifiers in the ensemble was set to 30. Artificial Neural Networks (ANN) using the multilayer perceptron architecture were implemented for comparison purposes. Each ANN contains 20 neurons in the hidden layer. For both κ-NN-RSM and ANN, one classifier is built to distinguish between the normal operation class and one fault condition; that is, for the 10 types of fault, 10 classifiers are obtained.

To measure the overall accuracy of a classifier we used the hit rate, defined as

Accuracy = 1 − (Number of errors / Number of objects).    (1)

IV. RESULTS

First, we ran an experiment to determine the number of classifiers in the ensemble. We varied the number of classifiers from 5 to 100 and measured the overall accuracy (averaged over all faults). The result can be seen in Figure 4. From 30 classifiers onward, the accuracy shows no further significant gain; thus, the number of classifiers was set at 30.

Figure 4. Accuracy versus number of classifiers in the ensemble.

Table II presents the results obtained by the proposed method (κ-NN-RSM) and by the ANN: κ-NN-RSM and ANN obtained average accuracies of 93.38% and 81.70%, respectively. κ-NN-RSM has better accuracy on 6 of the 10 faults considered and, on average, is about 15% better than the ANN. Only for fault types 1, 2, 6 and 10 does the ANN have better accuracy than κ-NN-RSM; even for these faults, however, κ-NN-RSM shows good classification performance.

Table II
FAULT DETECTION RESULTS FOR THE PROPOSED METHOD (κ-NN-RSM) AND THE COMPARISON METHOD (ANN)

Fault    κ-NN-RSM  ANN
Type 1   96.67     98.36
Type 2   96.67     98.33
Type 3   97.67     82.23
Type 4   98.33     62.96
Type 5   82.73     50.93
Type 6   97.67     99.60
Type 7   96.34     93.56
Type 8   90.68     89.59
Type 9   97.01     55.83
Type 10  80.06     85.60
Average  93.38     81.70

κ-NN-RSM presents good results because it uses a classifier ensemble, while the ANN is a single classifier. Figures 3(a), 3(b) and 3(c) present scatter plots of the training objects.
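The hit rate of Eq. (1) amounts to the following (a Python sketch; the labels are illustrative):

```python
def hit_rate(y_true, y_pred):
    # Eq. (1): Accuracy = 1 - (number of errors / number of objects)
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return 1.0 - errors / len(y_true)

print(hit_rate([1, 1, 2, 2], [1, 2, 2, 2]))  # 1 error in 4 objects -> 0.75
```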
Figure 3. Scatter plots of training objects (normal operation vs. fault type 1) and recognition of a new object using three classifiers with different subspaces: (a) the new object is assigned to class 1 (normal operation); (b) the new object is assigned to class 2 (fault type 1); (c) the new object is assigned to class 2 (fault type 1).

We chose three classifiers from the whole classification ensemble to show what happens in the classification of a new object. As can be seen, the first classifier, using the data from sensors 4 and 20, wrongly assigns the new object to class 2 (fault type 1). However, the other two classifiers, using the data from sensors 4 and 19 and from sensors 4 and 5, correctly assign it to class 1 (normal operation).

V. CONCLUSION

In this work we proposed an ensemble based on κ-nearest neighbors for fault detection in industrial plants. The ensemble was implemented using the random subspace method. The proficiency of this technique for fault detection was evaluated by application to data collected from the Tennessee Eastman chemical plant simulator. The combination of κ-nearest neighbors with the random subspace method (κ-NN-RSM) was shown to achieve better accuracy than a classical implementation of an artificial neural network: on average, κ-NN-RSM was 15% better than the ANN.

ACKNOWLEDGEMENTS

The authors thank the research agencies CAPES, FAPEG, FAPESP and CNPq for the support provided to this research. This is also a contribution of the National Institute of Advanced Analytical Science and Technology (INCTAA) (CNPq - proc. no. 573894/2008-6 and FAPESP proc. no. 2008/57808-1).

REFERENCES

[1] J. J. Gertler, Fault Diagnosis: Models, Artificial Intelligence, Applications. Marcel Dekker, 1998.
[2] P. Mhaskar, J. Liu, and P. D. Christofides, Fault-Tolerant Process Control: Methods and Applications. Bristol, UK: Springer, 2013.
[3] A. Zolghadri, D. Henry, J. Cieslak, D. Efimov, and P. Goupil, Fault Diagnosis and Fault-Tolerant Control and Guidance for Aerospace Vehicles: From Theory to Application, 2nd ed. London: Springer, 2013.
[4] J. M. Candelaria, Fault Detection and Isolation in Low-Voltage DC-Bus Microgrid Systems. ProQuest, 2012.
[5] R. Casimir, E. Boutleux, and G. Clerc, "Fault diagnosis in an induction motor by pattern recognition methods," 2003, pp. 294–299.
[6] S. Wang and F. Xiao, "AHU sensor fault diagnosis using principal component analysis method," Energy and Buildings, vol. 36, no. 2, pp. 147–160, 2004.
[7] A. Reddy and P. Banerjee, "Algorithm-based fault detection for signal processing applications," IEEE Transactions on Computers, vol. 39, pp. 1304–1308, 1990.
[8] L. H. Chiang, E. L. Russell, and R. D. Braatz, "Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 50, pp. 243–252, 2000.
[9] L. H. Chiang and R. J. Pell, "Genetic algorithms combined with discriminant analysis for key variable identification," Journal of Process Control, vol. 14, pp. 143–155, 2004.
[10] L. H. Chiang, M. E. Kotanchek, and A. K. Kordon, "Fault diagnosis based on Fisher discriminant analysis and support vector machines," Computers and Chemical Engineering, vol. 28, pp. 1389–1401, 2004.
[11] L. Wang and J. Yu, "Fault feature selection based on modified binary PSO with mutation and its application in chemical process fault diagnosis," Lecture Notes in Computer Science, vol. 3612, pp. 832–840, July 2005.
[12] B. Samanta, K. R. Al-Balushi, and S. Al-Araimi, "Artificial neural networks and genetic algorithm for bearing fault detection," Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol. 10, no. 3, February 2006.
[13] Q. P. He and J. Wang, “Fault detection using the k-nearest neighbor rule
for semiconductor manufacturing processes,” Semiconductor Manufac-
turing, IEEE Transactions on, vol. 20, no. 4, pp. 345–354, 2007.
[14] P. Baraldi, R. Razavi-Far, and E. Zio, “Bagged ensemble of fuzzy c-
means classifiers for nuclear transient identification,” Annals of Nuclear
Energy, vol. 38, no. 5, pp. 1161–1171, 2011.
[15] F. P. G. Marquez and M. Papaelias, Fault Detection: Classification,
Techniques and Role in Industrial Systems, 2nd ed. New York: Nova
Science Pub Inc, 2013.
[16] C. Zhang and Y. Ma, Ensemble Machine Learning: Methods and Applications. New York: Springer, 2012.
[17] T. K. Ho, “The random subspace method for constructing decision
forests,” Pattern Analysis and Machine Intelligence, IEEE Transactions
on, vol. 20, no. 8, pp. 832–844, 1998.
[18] C. Zhang and Y. Ma, Ensemble Machine Learning: Methods and Applications. New York: CRC Press, 2012.
[19] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms. Springer,
2012.
[20] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed.
Academic Press, February 2006.
[21] S. K. Pal, Pattern Recognition: From Classical to Modern Approaches.
Singapore: World Scientific, 2002.
[22] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed.
New York: John Wiley, 2001.
[23] N. Bhatia and V. S. S. C. S., “Survey of nearest neighbor techniques,”
International Journal of Computer Science and Information Security,
vol. 8, no. 2, 2010.
[24] G. Shakhnarovich, T. Darrell, and P. Indyk, Nearest-Neighbor Methods
in Learning and Vision: Theory and Practice, ser. Neural Information
Processing Series. The MIT Press, 2005.
[25] B. V. Dasarathy, Nearest Neighbor: Pattern Classification Techniques,
third edition ed., ser. Nn Pattern Classification Techniques. IEEE
Computer Society, 1990.
[26] S. Cost and S. Salzberg, "A weighted nearest neighbor algorithm for learning with symbolic features," Machine Learning, vol. 10, no. 1, pp. 57–78, 1993.
[27] T. K. Ho, “Nearest neighbors in random subspaces,” in Advances in
Pattern Recognition. Springer, 1998, pp. 640–648.
[28] J. J. Downs and E. F. Vogel, “A plant-wide industrial process control
problem,” Computer and Chemical Engineering, vol. 17, no. 3, pp. 245–
255, 1993.
[29] N. L. Ricker, "Decentralized control of the Tennessee Eastman challenge process," Journal of Process Control, vol. 6, no. 4, pp. 205–221, 1996.
[30] J. Biegler and Wachter, “Tennessee eastman plant-wide industrial pro-
cess challenge problem. complete model,” Comp. Chem. Eng., vol. 27,
2003.
[31] P. N. Lodal, “Case history: A steam line rupture at tennessee eastman
division,” Process Safety Progress, vol. 19, no. 3, p. 154–159, 2003.
