
Computers chem. Engng, Vol. 21, Suppl., pp. S655-S660, 1997
Pergamon © 1997 Elsevier Science Ltd
All rights reserved
Printed in Great Britain
PII: S0098-1354(97)00124-5 0098-1354/97 $17.00+0.00

Signed Digraph Based Multiple Fault Diagnosis

Hiranmayee Vedam
Venkat Venkatasubramanian*
Laboratory for Intelligent Process Systems
School of Chemical Engineering
Purdue University
W. Lafayette, IN 47906-1283, U.S.A.

Abstract. Abnormal Situation Management (ASM) has received considerable attention from industry and academia
recently. The first step towards better ASM is the timely detection and diagnosis of the abnormal situation. Most of
the existing methods for fault diagnosis assume that only a single fault occurs at any given time. However, multiple
faults do occur in processes, albeit less frequently than single faults. When multiple faults occur, existing methods
either lead to incorrect diagnosis or complete lack of diagnosis.
Multiple fault diagnosis (MFD) is a difficult problem because the number of combinations grows exponentially
with the number of faults. In this paper, a signed directed graph (SDG) based algorithm for MFD is developed. The
computational complexity is efficiently handled by assuming that the probability of occurrence of a multiple fault
scenario decreases with an increasing number of faults involved. SDG based diagnosis, like any other qualitative
method, has poor resolution. This poor resolution is overcome by using a knowledge base consisting of knowledge
about the process constraints, maintenance schedules, etc. The proposed algorithm is implemented in Gensym's expert
system shell, G2. The application of the algorithm is illustrated using an industrial scale simulation of the standard
FCCU called TRAINER.

1 Introduction

Abnormal situation management (ASM) involves the timely detection, diagnosis and correction of abnormal process conditions. An estimated $20 B is lost annually by the petrochemical industry in the US due to insufficient ASM. It is also estimated that there were 240 plant shutdowns during a one year period that could have been prevented (Nimmo, 1995). The identification of the root causes for the abnormal situation from process measurements is known as fault diagnosis.1

* Author to whom all correspondence should be addressed. Email: venkat@ecn.purdue.edu Fax: 1-317-494-0805
1 The terms root cause and fault are used interchangeably in this paper.

During an abnormal situation, the operators are typically flooded with dozens of alarms. An operator is expected to make a quick decision about the root causes and take corrective action in a very short span of time, making ASM a difficult task. An automated framework to support operator decision-making in performing fault diagnosis and suggesting possible corrective actions can improve the situation significantly. Significant attention has been given by the research community in recent years to automating fault diagnosis. These automated fault diagnosis methods can be classified into model based methods like signed directed graphs (SDG) (Iri et al., 1979; Wilcox and Himmelblau, 1994), observer based methods (Frank, 1990) and assumption based methods (Petti et al., 1990), and process history based methods like neural networks (Thompson and Kramer, 1994; Kavuri and Venkatasubramanian, 1994), principal component analysis (Nomikos and MacGregor, 1994), and qualitative trends analysis (Rengaswamy and Venkatasubramanian, 1995). Qualitative methods have the advantage that they do not require detailed knowledge about the process and are relatively easy to develop. The disadvantage of such methods is their lack of resolution, i.e., the number of root causes the diagnostic method reports as possible is very large. Also, most of the diagnostic methods assume that the abnormal situation is due to a single root cause. Hence these methods would lead to either wrong diagnosis or complete lack of diagnosis when multiple faults occur.

Multiple fault diagnosis is a difficult problem because the problem has a combinatorially explosive search space. The probability of multiple faults occurring in a process is often small, but when multiple faults do occur, they make it difficult for the operator to identify the root causes and reduce the number of possible corrective actions. Operators receive little or no training in identifying multiple faults. This can significantly hamper the ability of the operator to cope with the abnormal situation when multiple faults occur. DeKleer and Williams (1987) developed an assumption based algorithm to diagnose multiple faults in digital circuits. The disadvantage of their approach is that the computational complexity increases exponentially with the number

$656 PSE '97-ESCAPE-7 Joint Conference
of faults. Morales and Garcia (1990) modified their development by using a group propagation technique to reduce the computational complexity. They applied their modular approach to diagnose multiple faults in digital circuits. However, the above approaches are not directly applicable to chemical processes because of the dynamic nature of chemical processes. Furthermore, the diagnosis in chemical processes for ASM should be performed on-line in a short period of time. Hence, computational efficiency is a crucial factor. Chung et al. (1994) developed an SDG-neural network based method for identifying multiple incipient faults in a nuclear power plant. They developed SDGs for subsystems of the nuclear plant. Each SDG operates under a single fault assumption. The root causes from individual SDGs are combined to arrive at multiple faults.

In this paper, we will discuss algorithms to perform multiple fault diagnosis (MFD) using an SDG for the whole process. The computational complexity arising due to combinatorics is reduced by an assumption that the probability of occurrence of a multiple fault scenario decreases with the number of faults in a given scenario. The resolution of SDG based diagnosis is significantly enhanced by using a knowledge base to screen out root nodes which have very little probability. The proposed algorithms will be illustrated on a simulation of a standard FCCU called TRAINER (SACDA, 1995). Section 2 will discuss the development of signed digraphs, the knowledge base development and the algorithm for SDG based single fault diagnosis. In Section 3, the shortcomings of the existing algorithm in the context of MFD are elaborated and the algorithms to perform MFD are presented. In Section 4, the case study is briefly discussed and results of MFD for the case study are presented. We will conclude with a discussion of the present work and suggestions for future work.

2 Signed Directed Graphs

A signed directed graph is a representation of the process causal information, in which the process variables (and parameters) are represented as graph nodes and causal relations are represented by directed arcs. Nodes in the SDG assume values of (0), (+) and (-), representing the nominal steady state value, and values higher and lower than steady state respectively. Directed arcs point from a cause node to its effect node. Arc signs associated with each directed arc can take values of (+) and (-), representing whether the cause and effect change in the same direction or opposite direction respectively. An SDG may also include conditional arcs which become active only if certain conditions are satisfied. For example, the arc connecting a manipulated variable to the controlled variable is active only if the controller is not in manual mode.

An SDG for the process can be developed from model equations representing the process or from the operator's knowledge of the process. An automated framework for the development of SDGs based on model equations was developed by Mylaraswamy (1996). For many processes, however, model equations may be available only for some of the units. Hence, a combination of a model equation based approach and operator's knowledge is required to develop the SDG for the entire process. Process data for various abnormal situations is used as the operator's knowledge in this work. A partial digraph for the process can be built using model equations for the units, wherever available. The operator's knowledge is then used to infer the cause-effect relations in the units where model equations are not available. This procedure is elucidated while developing the SDG for TRAINER FCCU in Section 4.

One of the serious limitations of qualitative methods like SDG is their lack of resolution. The reason can be attributed to the qualitative ambiguities involved in this kind of approach (Kuipers, 1986). However, SDGs have the distinction of finding all possible fault candidates. In this work, the resolution of SDG based fault diagnosis is improved using a knowledge base which screens out physically impossible root nodes. This knowledge base can consist of knowledge about reliability of equipment, infeasible root nodes and information about equipment maintenance. If a piece of equipment has been recently serviced and its reliability is high, then the root node pertaining to that equipment has very little probability of being a possible root node. The heat exchanger transfer coefficient decreases with time due to fouling, and if a recent maintenance was not performed, then a positive change in heat exchanger transfer coefficient can be ruled out as a root node. Using such a knowledge base, we will show in Section 4 how the resolution of SDG based fault diagnosis has on average improved by more than 52% for the TRAINER FCCU.

Iri et al. (1979) proposed an algorithm for using SDG for fault diagnosis. The inherent assumption in their approach was that a single root cause which can explain the given abnormal situation can be found. They also assume that there exists a valid causal path between the root node and the observed abnormal measurements. A root node is any node in the digraph which has at least one consistent arc connecting it to an effect node and no consistent arc connecting it to a cause node. An arc is said to be consistent if sign(cause) * sign(arc) * sign(effect) = (+). Wilcox and Himmelblau (1994) presented a new digraph-based diagnosis reasoning called the possible cause-effect graph. This approach reduces the search space and hence the number of root causes generated. Based on these approaches, the basic algorithm for performing single fault diagnosis using SDG is summarized in Figure 1. The basic steps involved in performing SDG based single fault diagnosis are:

1. Propagate the deviation in the nodes representing process measurements (measured nodes) from effect to cause node via consistent arcs till the root nodes are identified.

2. Use the knowledge base to screen out physically impossible root nodes.

3. If a conflict exists in assigning a sign to any node then a voting scheme is used.

4. For each of the root nodes a breadth-first search is performed to check if a valid causal path exists between the root node and the observed abnormal measurements.

5. A root node is identified as the root cause for the abnormal situation if such a causal path exists.

Figure 1: Algorithm for SDG based single fault diagnosis (flowchart: propagate node deviations from effect node to cause node; perform breadth-first search with the root node as origin; check from the root node to all the abnormal measured nodes; root cause identified)

Consider, for example, the SDG for the preheater section in Model IV FCCU (McFarlane et al., 1993) in Figure 2. The variables T2 and T3 are measured. If T2 increases and T3 decreases, then T2 and T3 nodes in the SDG receive values of (+) and (-) respectively. These deviations are propagated from effect to cause nodes via consistent arcs till root nodes are identified. The root nodes identified for the present abnormal situation are {F3, T1, Q_loss, UA_f, [illegible]_set, T_lm}. The knowledge base is used to screen out the root node with very little probability, T_lm. A breadth-first search is performed on each of the remaining root nodes. However, there exists no valid causal path from any of the root nodes to both T2 and T3. Hence, no root cause is identified using the above algorithm. In Section 3, we will show how the MFD algorithm identifies the correct root cause, namely {F3, UA_f}.

3 Multiple Fault Diagnosis Using SDG

The algorithm described in Section 2 for performing single fault diagnosis will lead to either lack of diagnosis or wrong diagnosis when multiple faults occur in a process. The algorithm tries to find a single consistent path from the root node to all the abnormal measurements. Since no such path exists when multiple faults occur, the algorithm might find a wrong root cause or might not find any root node that can explain the abnormal measurements. Hence, the above algorithm needs to be modified to perform multiple fault diagnosis. A simplistic approach would be to change the single fault diagnosis algorithm as follows:

1. Perform Steps 1-3 as in the single fault diagnosis discussed in Section 2.

2. All combinations of root nodes which can explain all the observed measured node deviations are identified.

3. Minimal cut sets of such combinations are identified as possible multiple root causes for the observed abnormal situation. A minimal cut set is defined as the minimal number of root causes required to explain the given abnormal situation.

4. A combination of root nodes which can explain all the observed measured node deviations and is minimal is called a root cause.
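The single fault procedure of Section 2 and the exhaustive combination steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the G2 implementation: the graph ARCS, the node names (R1, R2, R3, X, M1, M2) and all function names are hypothetical, candidate roots are simply taken to be nodes without incoming arcs, and the knowledge base screening and the voting scheme are left out.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy SDG (NOT the paper's Figure 2).
# ARCS maps (cause, effect) -> arc sign, +1 or -1.
ARCS = {
    ("R1", "X"): +1,
    ("X", "M1"): +1,
    ("R2", "M2"): +1,
    ("R3", "M2"): -1,
}

def reachable(root, root_sign):
    """Breadth-first search from a signed root; returns every (node, sign) it can cause."""
    seen = {(root, root_sign)}
    queue = deque(seen)
    while queue:
        node, sign = queue.popleft()
        for (cause, effect), arc_sign in ARCS.items():
            # Effect sign that keeps the arc consistent:
            # sign(cause) * sign(arc) * sign(effect) = (+)
            state = (effect, sign * arc_sign)
            if cause == node and state not in seen:
                seen.add(state)
                queue.append(state)
    return seen

def explained(root, root_sign, observed):
    """Measured nodes whose observed deviation the signed root can explain."""
    states = reachable(root, root_sign)
    return {m for m, s in observed.items() if (m, s) in states}

def candidate_roots(observed):
    """Signed nodes with no incoming arc that explain at least one deviation (simplification)."""
    effects = {effect for _, effect in ARCS}
    causes = {cause for cause, _ in ARCS}
    roots = sorted(causes - effects)
    return [(r, s) for r in roots for s in (+1, -1) if explained(r, s, observed)]

def single_fault(observed):
    """Roots that have a valid causal path to every abnormal measurement."""
    return [c for c in candidate_roots(observed)
            if explained(*c, observed) == set(observed)]

def mfd1(observed):
    """Exhaustive search: collect all explaining combinations, then keep the minimal ones."""
    cands = candidate_roots(observed)
    covers = []
    for k in range(1, len(cands) + 1):
        for combo in combinations(cands, k):
            covered = set().union(*(explained(*c, observed) for c in combo))
            if covered == set(observed):
                covers.append(frozenset(combo))
    return [c for c in covers if not any(other < c for other in covers)]

observed = {"M1": +1, "M2": -1}   # M1 is high, M2 is low
print(single_fault(observed))      # no single root explains both deviations
print(mfd1(observed))              # minimal combinations that explain both
```

In this toy graph no single root reaches both measurements with the right signs, so the single fault search returns nothing, while the exhaustive search returns two minimal two-node combinations. The exhaustive loop visits every subset of the candidates, which is what makes this variant expensive as the number of root nodes grows.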

Figure 2: SDG for Preheater Section of Model IV FCCU

The MFD algorithm discussed above, called MFD1, is computationally expensive and the computational time increases exponentially with the number of root nodes. For example, if the number of root nodes after Step 1 is, say, 20, then the worst case estimate of the number of combinations explored in Step 2 above is of the order of 10^10. In order to perform MFD in a reasonable period of time, MFD1 is modified based on the assumption that the probability of occurrence of multiple faults decreases with the number of faults in a given scenario. We call this algorithm MFD2. For example, if the number of root nodes is 20, assume all root nodes are equally likely for the given abnormal situation. The probability that one of them can explain the abnormal situation is 0.05, that two root nodes can explain the scenario is 0.0025 and so on. Hence a single fault (probability 0.05) is 8000 times more likely than a four fault scenario (probability 0.05^4) that can explain the same abnormal situation. The algorithm can be described as follows:

1. Perform Steps 1-3 as in the single fault diagnosis discussed in Section 2.

2. Arrange the root nodes in decreasing order according to the number of measured node deviations each root node can explain.

3. Combinations of root nodes are called root node lists (RNLs). RNLs with fewer root nodes are explored before exploring RNLs with a larger number of root nodes. For example, single root nodes which can explain the observed abnormal situation are first explored before RNLs consisting of two root nodes that can explain the same abnormal situation.
4. Whenever an RNL can explain the observed abnormal situation, all other combinations involving that RNL are removed from the search space, thus reducing the search space dramatically. The rationale behind this step is that if an RNL, say RNL1, can explain a given abnormal situation, then every other RNL containing all the nodes in RNL1 is a superset of RNL1; hence RNL1 is the minimal cut set of all such RNLs.

5. If the number of root nodes in any RNL exceeds a maximum number, N_max, and if that RNL is still unable to explain the abnormal situation, then such RNLs are also eliminated, further reducing the search space. This is a valid procedure because of the assumption that the probability of occurrence of an RNL that can explain the observed abnormal situation decreases with the number of nodes in that RNL.

6. An RNL which can explain the measured node deviations and is minimal is called a root cause.

Single fault diagnosis is a special case of this algorithm when N_max is set equal to 1. For the example with 20 root nodes, if N_max is set to 3, then the number of combinations explored in the worst case is 1350. The reduction in computational load for 20 nodes with N_max equal to 3 is of the order of 10^8, making the algorithm an attractive one. Revisiting the SDG for the preheater section of Model IV FCCU, each of the root nodes is examined to see if it can explain the deviation in T2 and T3. Since no such root node exists, two node combinations of the root nodes form two node RNLs. Each RNL is examined to see if the root nodes in the RNL can explain the deviation in T2 and T3. An RNL consisting of {F3, Q_loss} can explain the deviations and hence becomes a root cause. Then the RNL is removed from the search space. Similarly an RNL consisting of {F3, UA_f} also becomes a root cause and is hence removed from the search space. However, an RNL consisting of {F3, T1} cannot explain the observed deviations and is hence retained in the search space. If N_max is set to 3, then combinations of three nodes are formed from the two node RNLs that cannot explain the observed measured node deviation and the root nodes. One such RNL is {F3, T1, Q_loss}. Even though this RNL can explain the observed measured node deviation, it does not become a root cause because its subset {F3, Q_loss} is already a root cause. Hence this RNL is discarded. If any RNLs with three nodes still remain in the search space after all the combinations are explored, the search is stopped because N_max is set to 3. If none exists, then the algorithm stops because the search space has been explored completely. This algorithm differs from MFD1 in that MFD1 explores all the possible combinations before finding the minimal cut sets. The approach to root cause identification in MFD1 is similar to identifying minimal cut sets in Fault Tree Analysis. Only the performance of the MFD2 algorithm on TRAINER FCCU is discussed in Section 4, since MFD1 never converged for all the abnormal situations examined.

Figure 3: Signed Digraph for Slurry PA of TRAINER FCCU

4 MFD for TRAINER FCCU

In this Section, MFD2 is applied to an industrial simulation of a standard FCCU called TRAINER. This simulation was provided to us by Honeywell Technology Center in the form of executables. No model equations for any part of the FCCU are available. The FCCU simulated is a stacked regenerator/disengager design with the regenerator on the bottom and operating at a higher pressure. The converter consists of an external riser standpipe and single stage stripper. A waste heat boiler is used to extract heat from the hot flue gas that emerges from the regenerator. It provides steam required for the air blower steam turbine and a number of steam driven pumps. A centrifugal
air blower provides the regenerator air. A main fractionator with several pumparounds and a single stage wet gas compressor form the downstream processing units. The feed to the riser is preheated using the slurry pumparound (PA) from the main fractionator. Additional heating of the feed is provided using a feed furnace before the feed enters the riser. A slide valve controls the catalyst flow from the regenerator to the riser. The flow and temperature of the feed, catalyst, steam and air are controlled using low level PID controllers. The level controllers are cascaded around the flow controllers where necessary. Field operated devices and override controllers are also provided. The heating of feed using slurry PA and the waste heat boiler make the system highly coupled. This makes decomposition into subsystems a difficult task. Over 300 different abnormal scenarios can be simulated (Mylaraswamy, 1996; SACDA, 1995).

Table 1: Increase in Resolution Using a Knowledge Base

Fault No.   Number Before KB   Number After KB   Improvement (%)
1           42                 23                45
2           33                 10                70
3           33                 10                70
4           33                 7                 79
5           34                 11                68
6           8                  5                 37
7           4                  4                 0

The SDG for TRAINER is developed using a combination of model based and operator's knowledge as discussed in Section 2. Model equations are available for equipment like controllers, shell and tube heat exchangers, fan type heat exchangers and valves. Model equations for large equipment like main fractionator, air blower, riser/regenerator and wet gas compressor are not available. The knowledge base consists of the data obtained from the simulation for ten different faults. The partial SDGs for equipment with known model equations are combined with the data from the operator's knowledge to construct the SDG for the TRAINER FCCU. The SDG consists of 69 measured nodes and 224 unmeasured nodes. The SDG has the capability to diagnose 88 different faults and their combinations. It took about 100 (wo)man hours to build the SDG. The SDG for slurry PA is shown in Figure 3.

Knowledge based systems to perform both single fault diagnosis and multiple fault diagnosis are implemented in Gensym's real time expert system shell, G2. The single fault diagnosis algorithm has been implemented as a part of AEGIS (Abnormal Event Guidance and Information System), Honeywell's prototype for ASM (ASM Home Page). The original SDG algorithm had very poor resolution as shown in Table 1. The actual abnormal situations used cannot be discussed due to proprietary reasons. Significant improvements in the algorithm have been achieved by using a knowledge base which can sieve out physically impossible faults. The number of root causes identified after using the knowledge base (KB) is shown in Table 1. A sample of the rules used in the knowledge base is shown in Table 2.

Table 2: Sample Rules Used in the Knowledge Base

Pump efficiency has values (-) or (0)
UA_f has values (-) or (0)
Power is (-) or (0)
Fans are (-) or (0)

MFD2 has been implemented on several combinations of the faults. In all the cases, the actual fault combination is correctly identified. The computational time is less than a minute on a Sparc 10 machine for all the abnormal situations studied. The algorithm can correctly identify the fault combination even in cases when the effect of one fault annulled the effect of the other on some of the measured variables. It can also identify multiple faults that occur both sequentially and simultaneously. Only one abnormal situation is discussed in detail here due to space constraints.

The abnormal situation is a combination of loss in slurry PA due to decrease in pump efficiency and a positive drift in the controller transmitter that controls the air inflow to the regenerator. The simulation starts at steady state. After 1 minute the air controller transmitter begins to drift. The slurry PA pump efficiency starts decreasing at t=5 minutes. The algorithm identifies air flow controller transmitter failure as one of the root causes between t=1 minute and t=5 minutes. After t=5 minutes, both the faults are identified. The output of the algorithm after t=5 minutes in the final form presented to the operator is shown in Figure 4. Some of the other root causes that can show similar deviations in the process measurements, like transmitter drifts in slurry PA controllers and reduction in air blower capacity, are also identified.

5 Discussion and Future Work

This paper discusses algorithms for performing MFD using SDGs. The algorithm differs from the earlier approaches in that the multiple faults are explored only on an as needed basis, reducing the computational load significantly. The lack of resolution in SDG is improved significantly using a knowledge base. This algorithm is illustrated using TRAINER FCCU. The multiple faults identified need to be presented coherently to the operator. An approach for presentation is as shown in Figure 4. Better ways to present the results are being explored. From Figure 4 it can be seen that the resolution in the case of multiple faults is still inadequate. We are currently investigating several methods to improve the resolution and to perform conflict resolution.
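The as-needed exploration summarized in this discussion (MFD2 in Section 3) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the G2 implementation: the names mfd2, worst_case_rnls and explains are hypothetical, the SDG propagation is abstracted into a precomputed map from each root node to the abnormal measurements it can explain, and larger root node lists are generated with itertools.combinations plus a subset check instead of being built from the unexplained smaller lists, which should yield the same minimal sets up to N_max.

```python
from itertools import combinations
from math import comb

def worst_case_rnls(n_roots, n_max):
    """Number of RNLs with at most n_max nodes drawn from n_roots root nodes."""
    return sum(comb(n_roots, k) for k in range(1, n_max + 1))

def mfd2(explains, abnormal, n_max):
    """Explore root node lists (RNLs) smallest first, pruning supersets of root causes.

    explains: hypothetical map, root node -> set of abnormal measured nodes it explains
    abnormal: set of all abnormal measured nodes
    n_max:    largest RNL size that is explored
    """
    # Step 2: order root nodes by how many deviations each one explains.
    roots = sorted(explains, key=lambda r: len(explains[r]), reverse=True)
    root_causes = []
    for size in range(1, n_max + 1):            # Step 3: fewer nodes first
        for rnl in combinations(roots, size):
            candidate = frozenset(rnl)
            # Step 4: skip any RNL that contains an already identified root cause.
            if any(found <= candidate for found in root_causes):
                continue
            covered = set().union(*(explains[r] for r in rnl))
            if covered >= abnormal:
                root_causes.append(candidate)    # Step 6: minimal explaining RNL
    return root_causes                           # Step 5: nothing beyond n_max is tried

# Hypothetical example: no single root explains both abnormal measurements.
explains = {"A": {"m1"}, "B": {"m2"}, "C": {"m2"}, "D": {"m1"}}
print(mfd2(explains, {"m1", "m2"}, n_max=3))
print(worst_case_rnls(20, 3))   # 20 + 190 + 1140
```

Setting n_max to 1 reduces the search to single fault diagnosis, and worst_case_rnls(20, 3) reproduces the count of 1350 combinations quoted in Section 3 for 20 root nodes.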

Figure 4: Diagnosis output of MFD2 for Failure of Slurry PA pump and Bias in air flow controller transmitter. F5 - Air flow controller transmitter bias; F1 - Slurry PA pump failure

References

ASM Home Page, The abnormal situation management joint research and development consortium. URL Ref.: http://www.iac.honeywell.com/Pub/Tech/asmwww.html.

Chung, H., Z. Bien, J. Park, and P. Seong, Incipient multiple fault diagnosis in real time with application to large-scale systems. IEEE Trans. Nuclear Science, 41(4), 1692-1703 (1994).

Frank, P. M., Fault diagnosis systems using analytical and knowledge-based redundancy - a survey. Automatica, 26, 459-474 (1990).

Iri, M., K. Aoki, E. O'Shima, and H. Matsuyama, An algorithm for diagnosis of system failures in chemical processes. Comput. Chem. Engng, 3, 489-493 (1979).

Kavuri, S. and V. Venkatasubramanian, Neural network decomposition strategies for large-scale fault diagnosis. Int. J. Control, 59(1994), 767-792 (1994).

Kleer, J. D. and B. C. Williams, Diagnosing multiple faults. Artificial Intelligence, 32, 97-130 (1987).

Kuipers, B., Qualitative simulations. Artificial Intelligence, 29, 289-338 (1986).

McFarlane, R., R. Reineman, J. Bartee, and G. Georgakis, Dynamic simulator for a model IV fluid catalytic cracking unit. Comput. Chem. Engng, 17(3), 275-300 (1993).

Morales, E. and H. Garcia, Artificial Intelligence in Process Engineering, chapter 5. Academic Press (1990).

Mylaraswamy, D., DKIT: A Blackboard-based, distributed, multi-expert environment for Abnormal Situation Management. PhD thesis, Purdue University (1996).

Nimmo, I., Adequately address abnormal situation operations. Chem. Eng. Prog., 91(9), 36-45 (1995).

Nomikos, P. and J. F. MacGregor, Monitoring of batch processes using multiway principal component analysis. AIChE J., 40, 1361-1375 (1994).

Petti, T., J. Klein, and P. Dhurjati, Diagnostic model processor, using deep knowledge for process fault diagnosis. AIChE J., 36(4), 565-575 (1990).

Rengaswamy, R. and V. Venkatasubramanian, A syntactic pattern-recognition approach for process monitoring and fault diagnosis. Engng Applic. Artif. Intell., 8(1), 35-51 (1995).

SACDA, Process Model Description: Fluidized Catalytic Cracking Unit Standard Model. SACDA Inc. (1995).

Thompson, M. and M. A. Kramer, Modeling chemical processes using prior knowledge and neural networks. AIChE J., 40(8), 1328-1338 (1994).

Wilcox, N. A. and D. M. Himmelblau, The possible cause-effect graph model for process fault diagnosis - I. Methodology. Comput. Chem. Engng, 18(2), 103-116 (1994).

Acknowledgement

The authors would like to thank Honeywell Technology Center and the CIPAC consortium member companies which provided the financial support for this research. We also would like to thank the ASM consortium members for their valuable feedback and insights into FCC operations.
