The Metal Press Case Study

Kirsten Winter
The University of Queensland, ARC Centre for Complex Systems, School of ITEE, St Lucia, QLD 4072, Australia

Nisansala Yatapanage
Institute for Integrated and Intelligent Systems, Griffith University, Nathan, QLD 4111, Australia

Abstract

In this report we present our experience with the tool-supported FMEA process using Behavior Trees as a modeling notation, the failure mode injection technique, and a model checker as the analysis tool. The approach is based on the work in [3]. As a case study we use the industrial press [1], which is similar to the one described by [2, 3]. The report describes in detail how we derive the Behavior Tree model from the functional requirements. We provide a formalization of the safety requirements, describe the injected failure modes and present the results of the FMEA. We conclude the report with an analysis of the performance of the proposed process.

1 The Industrial Metal Press

The industrial metal press is a system for compressing sheets of metal into body parts for vehicles. When the press is turned on, a plunger begins rising with the aid of an electric motor. The operator may then load the metal into the press. When the plunger reaches the top of the press, the operator can push and hold a button, which turns off the motor, causing the plunger to fall. For the safety of the operator, the button is located a safe distance away from the falling plunger. When the plunger reaches the bottom, thereby compressing the metal, the motor is automatically turned on. The plunger begins rising again, repeating the cycle. The operator may abort the fall of the plunger by releasing the button, but only if the plunger has not yet reached the "point of no return", also referred to as PONR. It is dangerous to turn on the motor when the plunger is falling beyond this point, as the momentum of the plunger could cause the motor to explode, exposing the operator to flying parts. A software controller operates the motor based on inputs received from the system's operator via the button, as well as three sensors for detecting the location of the plunger: one located at the top of the press (top sensor), one at the point of no return (PONR sensor) and one at the bottom (bottom sensor). The symmetry of the plant is sketched in Figure 1.

Figure 1: Diagram of the Industrial Press showing the plunger at the top (labelled parts: top sensor, PoNR sensor, bottom sensor, motor, hydraulic clutches, parallel I/F to PLC, serial I/F from PLC)

2 The Model

The initial system requirements for the metal press are as follows:

1. The plunger is initially resting at the bottom with the motor off.
2. When power is supplied to the system the controller shall turn the motor on, causing the plunger to rise.
3. When at the top, the plunger shall be held there until the operator pushes and holds down the button. This shall cause the controller to turn the motor off and the plunger will begin to fall.
4. If the operator releases the button while the plunger is falling slowly (above PONR), the controller shall turn the motor on again; this will cause the plunger to start rising again, without reaching the bottom.
5. If the plunger is falling fast (below PONR) then the controller shall leave the motor off until the plunger reaches the bottom.
6. When the plunger is at the bottom the controller shall turn the motor on and the plunger will rise again.

2.1 Modelling a single requirement in an RBT

Each of these requirements is systematically captured as a Requirements Behavior Tree (RBT), which usually consists of a sequence of BT nodes specifying the interaction between the various components' behaviours as sequential system behaviour. The tag in each node indicates the requirement from which it is derived. An additional "+" in the tag shows where information was found to be missing in the requirements and has been added to the respective RBT.

2.2 Integration of RBTs into an IBT

The resulting six RBTs are then integrated. Our notion of integration is based on the idea that RBTs can be joined where their control flow relies on the same precondition being satisfied; this precondition is represented by the (sometimes implicit) root node of the RBT. This means an integration is possible at places where the root node of one RBT (R) matches a node in a second RBT (R′). The matching node becomes a grafting point such that RBT R continues the sequence in RBT R′ (if the matching node is the leaf node of R′) or becomes a branch of RBT R′ (if the matching node is a non-leaf node).
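The grafting rule can be illustrated with a minimal sketch. It is not the tool's implementation: the `Node` class, the `chain`, `find` and `integrate` helpers, and the plain-string node labels are all hypothetical, and the choice between alternative and concurrent branching (made by the modeller) is not represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One BT node: a component plus a decorated behaviour label (hypothetical encoding)."""
    component: str
    behaviour: str
    children: List["Node"] = field(default_factory=list)

    def matches(self, other: "Node") -> bool:
        # Two nodes match when component and behaviour agree; children are ignored.
        return (self.component, self.behaviour) == (other.component, other.behaviour)


def chain(*nodes: Node) -> Node:
    """Link the given nodes into a simple sequence and return its root."""
    for parent, child in zip(nodes, nodes[1:]):
        parent.children.append(child)
    return nodes[0]


def find(tree: Node, target: Node) -> Optional[Node]:
    """Depth-first search for the first node in `tree` that matches `target`."""
    if tree.matches(target):
        return tree
    for child in tree.children:
        hit = find(child, target)
        if hit is not None:
            return hit
    return None


def integrate(base: Node, rbt_root: Node) -> bool:
    """Graft the RBT rooted at `rbt_root` onto `base` at the matching node.

    If the matching node is a leaf, the RBT continues the sequence;
    otherwise its behaviour becomes an additional branch.
    Returns False when no grafting point exists.
    """
    point = find(base, rbt_root)
    if point is None:
        return False
    point.children.extend(rbt_root.children)
    return True


# R4 and R5 share the root node Operator ??Release button??.
r4 = chain(Node("Operator", "??Release button??"),
           Node("Plunger", "?Falling slowly?"),
           Node("Controller", "[Opening]"))
r5 = chain(Node("Operator", "??Release button??"),
           Node("Plunger", "?Falling fast?"),
           Node("Plunger", "??At bottom??"))
integrate(r4, r5)
print([child.behaviour for child in r4.children])
```

Because the matching node in R4 already has a child, R5 is attached as a second branch below the shared root, which corresponds to the non-leaf case described above.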
Figure 2: Integrated Behavior Tree of the press system

Figure 2 shows the integrated Behavior Tree (IBT), which is the result of integrating the six RBTs that were derived from the given requirements. RBTs R1, R2, and R3 continue the sequence of behaviour. The precondition for R2 is given by the single node in R1, and the (implicit) precondition of R3 is provided by the leaf node of R2. The integration of all three yields a simple sequence. The integration of RBT R4 and RBT R5 leads to a branching. It is based on the fact that both RBTs share the same root node, namely Operator??ReleaseButton??. R5 becomes an (alternative) branch of R4. The choice whether a branch becomes an alternative or a concurrent branch is made by the modeller.

To integrate the subtree for requirements R4 and R5 we consider the following. The precondition for the node Operator??ReleaseButton??, the root node of the subtree, is that the button must be pressed before it can be released. However, since we are not interested in the behaviour where the button gets pressed and instantaneously released, we also require that the plunger is falling before the operator can release the button again. Plunger[Falling] therefore becomes a precondition for the subtree consisting of the RBTs for R4 and R5, and we integrate the subtree below the node Plunger[Falling]. Reversions and macros are added where a leaf node matches a node higher up the tree or in parallel branches, respectively.

2.3 Refining the Integrated Behavior Tree

After the integration is achieved we find that the selection in node Plunger?FallingFast? does not make sense if the plunger has only one state of falling (and rising). Falling and rising of the plunger with respect to the PONR, i.e., below and above PONR, is implicit in the requirements. We add this information by replacing the BT node Plunger[Falling] by a sequence of nodes: Plunger[FallingSlowly] followed by the event Plunger??belowPONR?? and Plunger[FallingFast]. The node Plunger[Rising] is replaced accordingly.

2.3.1 Transforming an IBT into a DBT: adding structure

The result is the initial IBT that is derived from the requirements. In terms of FMEA, however, there is not much that can be analysed on this level. The model assumes that the controller has access to the plunger's current state and can essentially observe the environment. In real systems, however, the environment (here the plunger) is not directly visible to the controlling component. Sensors have to be added to monitor the environment and report on any changes. Failing of sensors and the effect thereof can be analysed by means of FMEA. Additionally, we also add architectural structure and communication mechanisms to the model. This allows analysis of the communication between the components to reveal problems. We choose the system to be composed of parallel components that are synchronised via event-driven communication. The components are as given in the requirements (controller, plunger, operator, button, motor) and additionally three sensors (top sensor, bottom sensor, PONR sensor). The operator and the plunger model the environment as far as it influences the operation of the system. (The plunger is a "hybrid" that is half environment and half controlled by the system via the motor.) The architecture of this system is depicted in Figure 3. The dashed border of the plunger and operator components indicates that both have access to the environment, i.e., they receive external input.
Figure 3: Architecture of Press model (components: Motor, Plunger, TopSensor, PONRSensor, BottomSensor, Controller, Operator, Button)

To obtain the BT model for this architecture we transform our IBT into a Design Behavior Tree (DBT). To derive a DBT from an IBT, we separate the components and create a thread for each component behaviour. For the press system we get a DBT with eight main threads, which is shown in Figure 4.

Figure 4: Overview of the Design Behavior Tree of the press system

The interaction between the component threads is triggered by internal input and output events. Instead of accessing the current state of a second component, a component waits for a message sent by the second component that reports on the relevant state changes. Moreover, the controller maintains its own internal image of the current status of the sensors. This internal representation of the sensors is modelled using attributes of the controller component. The controller's "polling" of each sensor is modelled by a separate thread within the controller thread.

2.3.2 Example: the controller thread
Figure 5: Transforming the IBT into the DBT: the controller thread (left: part of the IBT; right: the corresponding part of the controller thread)

We demonstrate the transformation from the IBT to the DBT on the controller behaviour. Figure 5 shows on the left hand side (a part of) the (original) IBT and on the right hand side (the corresponding part of) the controller thread that results from the transformation. We can observe three steps:

1. IBT nodes of the controller component are kept in the controller thread.
2. Where a controller IBT node is followed by a state realisation of another component (one might say the controller node triggers this state change) we add an internal output event to the DBT controller thread modelling the controller sending a triggering message.
3. Where a controller IBT node is preceded by a guard node or a state realisation of another component (one might say the controller behaviour is triggered by another component) we add either
   - a guard on the controller's internal attribute which represents an image of the other component's status (like in Figure 5), or
   - an internal input event of the controller reading a triggering message from the other component.

Essentially, the causal dependencies between the components' behaviour that are represented by a sequential flow (a simple arrow) in the IBT are replaced by (or refined into) explicit "triggering" mechanisms (like event communication) in the DBT.

Figure 6: Transformation from IBT to DBT: derived interaction between controller, top sensor, and plunger

In Figure 6 we demonstrate in more detail the case where the controller IBT node is preceded by a triggering node. In our case the controller maintains an internal representation of sensor values via attributes. This is achieved by transforming the sequence of two IBT nodes on the left hand side of Figure 6, Plunger??AtTop?? followed by Controller[Open], into a sequence of interactions spread over three components: the plunger, the top sensor, and the controller. The interaction between the environment and the plunger becomes apparent through the external input node in the plunger thread, Plunger>>AtTop<<. The plunger changes its state, Plunger[AtTop], and sends an internal message, Plunger<plungerAtTop>, informing the sensor about this state change. The top sensor receives this message, TopSensor>plungerAtTop<, changes its internal state, TopSensor[High], and informs the controller via TopSensor<topSensorHigh>. The controller receives the message, Controller>topSensorHigh<, updates its internal representation of the top sensor value, Controller[TopSensor:=High] (in a parallel thread), and queries this attribute value, Controller??TopSensor=High?? (in the controller's main thread), before realising Controller[Open].
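The message chain from the plunger via the top sensor to the controller can be mimicked in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the classes, the `run` function and the single shared message queue are hypothetical, every message is broadcast to both receivers, and the controller's parallel polling thread is folded into one handler.

```python
from collections import deque


class Plunger:
    def __init__(self):
        self.state = "Rising"

    def on_external(self, event, send):
        # External input Plunger >> AtTop <<
        if event == "AtTop":
            self.state = "AtTop"          # Plunger [AtTop]
            send("plungerAtTop")          # Plunger < plungerAtTop >


class TopSensor:
    def __init__(self):
        self.state = "Low"

    def on_message(self, msg, send):
        # TopSensor > plungerAtTop <
        if msg == "plungerAtTop":
            self.state = "High"           # TopSensor [High]
            send("topSensorHigh")         # TopSensor < topSensorHigh >


class Controller:
    def __init__(self):
        self.state = "Opening"
        self.top_sensor = "Low"           # internal image of the top sensor

    def on_message(self, msg, send):
        # Polling thread: Controller > topSensorHigh <
        if msg == "topSensorHigh":
            self.top_sensor = "High"      # Controller [TopSensor := High]
        # Main thread guard: Controller ?? TopSensor=High ??
        if self.state == "Opening" and self.top_sensor == "High":
            self.state = "Open"           # Controller [Open]


def run(external_event):
    """Deliver one external event and process internal messages until quiescence."""
    queue = deque()
    plunger, sensor, controller = Plunger(), TopSensor(), Controller()
    plunger.on_external(external_event, queue.append)
    while queue:
        msg = queue.popleft()
        sensor.on_message(msg, queue.append)
        controller.on_message(msg, queue.append)
    return plunger.state, sensor.state, controller.state


print(run("AtTop"))
```

The controller never reads the plunger's state directly; it reaches Open only after the sensor's message has updated its internal attribute, which is the point of the IBT-to-DBT refinement described above.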

3 Safety conditions

The safety conditions for this system are described below. We formalise the safety conditions in linear temporal logic (LTL). Any violation of these formulae provides a scenario for a hazard that might occur.

1. If the operator is not pushing the button and the plunger is at the top, the motor should remain on.

$$\mathbf{G}\,((\mathit{operator} = \mathit{released\_button} \land \mathit{plunger} = \mathit{at\_top}) \Rightarrow \mathit{motor} = \mathit{on})$$

2. If the plunger is falling below the PONR, a state modelled as $\mathit{falling\_fast}$, the motor should remain off.

$$\mathbf{G}\,(\mathit{plunger} = \mathit{falling\_fast} \Rightarrow \mathit{motor} = \mathit{off})$$

3. If the plunger is falling above the PONR, a state modelled as $\mathit{falling\_slow}$, and the operator releases the button, the motor should eventually turn on, before the plunger changes state. This can be modelled in LTL as

$$\mathbf{G}\,((\mathit{plunger} = \mathit{falling\_slow} \land \mathit{operator} = \mathit{released\_button}) \Rightarrow (\mathit{plunger} = \mathit{falling\_slow} \;\mathbf{U}\; \mathit{motor} = \mathit{on}))$$

For technical reasons it is also possible in our model (in contrast to the real system) that the behavior skips forever without performing any action. Therefore, we have to add an antecedent to the formula to exclude paths on which the plunger never falls beyond the PONR (i.e., plunger is $\mathit{falling\_fast}$), which do not reflect any real behavior of the system.

$$\mathbf{G}\mathbf{F}\,(\mathit{plunger} = \mathit{falling\_fast}) \Rightarrow \mathbf{G}\,((\mathit{plunger} = \mathit{falling\_slow} \land \mathit{operator} = \mathit{released\_button}) \Rightarrow (\mathit{plunger} = \mathit{falling\_slow} \;\mathbf{U}\; \mathit{motor} = \mathit{on}))$$

4. The motor should never turn off while the plunger is rising.

$$\mathbf{G}\,(\lnot((\mathit{plunger} = \mathit{rising\_below\_PONR} \lor \mathit{plunger} = \mathit{rising\_above\_PONR}) \land \mathit{motor} = \mathit{off}))$$
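To make the intent of the four conditions concrete, the following sketch evaluates them over a finite list of states. It is an approximation only: LTL is defined over infinite paths and the report uses a model checker, not trace replay. The dictionary encoding of states, the function names, and the operator value `pushing_button` are assumptions introduced for illustration; the fairness antecedent added to condition 3 is not modelled.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def always(trace: List[State], prop: Callable[[State], bool]) -> bool:
    """Finite-trace reading of G(prop): prop holds in every recorded state."""
    return all(prop(s) for s in trace)


def cond1(s: State) -> bool:
    # (operator = released_button and plunger = at_top) => motor = on
    return not (s["operator"] == "released_button" and s["plunger"] == "at_top") \
        or s["motor"] == "on"


def cond2(s: State) -> bool:
    # plunger = falling_fast => motor = off
    return s["plunger"] != "falling_fast" or s["motor"] == "off"


def cond4(s: State) -> bool:
    # not (plunger is rising and motor = off)
    rising = s["plunger"] in ("rising_below_PONR", "rising_above_PONR")
    return not (rising and s["motor"] == "off")


def cond3(trace: List[State]) -> bool:
    """Finite-trace reading of G((falling_slow and released) => (falling_slow U motor = on))."""
    for i, s in enumerate(trace):
        if s["plunger"] == "falling_slow" and s["operator"] == "released_button":
            satisfied = False
            for t in trace[i:]:
                if t["motor"] == "on":
                    satisfied = True      # right-hand side of U reached
                    break
                if t["plunger"] != "falling_slow":
                    break                 # left-hand side failed first
            if not satisfied:
                return False
    return True


# A hand-written trace in which the motor is turned on while falling fast.
trace = [
    {"plunger": "falling_slow", "operator": "pushing_button", "motor": "off"},
    {"plunger": "falling_fast", "operator": "pushing_button", "motor": "off"},
    {"plunger": "falling_fast", "operator": "released_button", "motor": "on"},
]
print(always(trace, cond1), always(trace, cond2), cond3(trace), always(trace, cond4))
```

On this trace only condition 2 fails, since the last state combines `falling_fast` with `motor = on`.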

4 Failure modes of the system

As possible failure modes of the system we consider the failing of sensors and other components to send the correct values to the controller. The system reacts to the signals from four different sensors: the top sensor, the PONR sensor and the bottom sensor, each of which can indicate the values high and low, and the button sensor, which signals pushed or released. Each of these sensors can get stuck at one of its two values, causing the controller to assume a wrong situation because the sensor no longer reflects the actual situation in the plant. For example, the top sensor might get stuck at value high, causing the system to assume that the plunger is still at the top although it might have fallen already. As an additional failure mode, we consider commission failures of the motor component, which can change instantaneously to an on or off state. This leads to ten different failure modes for the industrial press, which we call single-failure modes. When we combine the different single-failure modes we get 40 double-failure modes, each of which models the combination of two single-failure modes.
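The counts of ten and 40 can be reproduced by a short enumeration. The report does not spell out how the 40 is obtained; the sketch below assumes that a double-failure mode pairs two single-failure modes of different components (so that, for example, "top sensor stuck high" is never combined with "top sensor stuck low"). The component names and labels are illustrative.

```python
from itertools import combinations

# Each sensor can get stuck at one of its two values.
SENSOR_VALUES = {
    "TopSensor": ("high", "low"),
    "PONRSensor": ("high", "low"),
    "BottomSensor": ("high", "low"),
    "ButtonSensor": ("pushed", "released"),
}
# Commission failures of the motor: spontaneous change to on or off.
MOTOR_VALUES = ("on", "off")

single_failures = [
    (sensor, f"stuck {value}")
    for sensor, values in SENSOR_VALUES.items()
    for value in values
] + [("Motor", f"commission {value}") for value in MOTOR_VALUES]

# Assumption: two failure modes of the same component are not combined.
double_failures = [
    (a, b) for a, b in combinations(single_failures, 2) if a[0] != b[0]
]

print(len(single_failures), len(double_failures))
```

Under this assumption there are 45 unordered pairs of single-failure modes, of which 5 pair two modes of the same component, leaving 40.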

5 FMEA results

In single-failure mode, seven of the injected faults lead to a violation of a safety condition:

- Button sensor stuck pushed leads to a violation of safety conditions (1) and (3);
- Bottom sensor stuck high leads to a violation of safety condition (2);
- PONR sensor stuck low leads to a violation of safety condition (2);
- PONR sensor stuck high leads to a violation of safety condition (3);
- Top sensor stuck high leads to a violation of safety condition (4);
- Motor on leads to a violation of safety condition (2);
- Motor off leads to a violation of safety conditions (1), (3) and (4).

In all other failure modes the system satisfies the four given safety conditions (see footnote 1). In Figure 7 we give an example of a violation, which the model checker outputs as a counter-example. Generally, a counter-example lists a sequence of states which lead to a state in which the safety condition does not hold. In our application within the process of FMEA, we can read from a counter-example the relationship between a failure mode of a component and the hazard that can occur as a consequence. That is, the counter-example shows how the failure mode leads to the hazard condition.
1 In some cases we have prioritized internal actions over external actions to avoid unrealistic counter-examples that occur due to race conditions in the environment model. This is a very coarse abstraction of the timing of the system (we only assume that external messages are slower than internal ones). To solve these cases, a timed model together with timed model checking could be useful.

Failure: PONRSensor stuck low
Violated Theorem: If the plunger is falling below the PONR the motor should remain off. (2)
Counter-Example: The plunger operates as usual until it reaches the falling slow state. At this point, the PONR sensor fails, remaining stuck in the low state. The plunger then begins falling fast. Normally when the plunger reaches the falling fast state, the PONR sensor would switch to high. However, since it has failed, it remains in the low state. The operator then releases the button. Under normal operation, this abort attempt would not be allowed. However, in this case, the PONR sensor incorrectly indicates that the plunger is still falling slow, so the controller allows the abort and turns on the motor. This violates the safety condition that the motor should not be turned on if the plunger is falling fast.

Figure 7: Counter-Example as indicated by the model checker

Every failure mode was then combined with a second failure mode to exercise a double-failure mode. As a result, in only one case did the combination of failures lead to a safety violation that was not apparent in the single-failure view, namely

- Top sensor and bottom sensor stuck high leads to a violation of safety condition (2).

Interestingly, though, in a number of cases failure modes eliminate each other's impact. For example, if the PONR sensor stuck low and the button stuck pushed failure modes occur simultaneously, safety condition 2 can be proved to be satisfied by the faulty system. However, the press system with only the PONR sensor stuck low violates condition 2.
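The scenario of Figure 7 can be replayed with a toy simulation of failure mode injection. This is a hand-written sketch, not the Behavior Tree model or the model checker used in the report: the controller rule, the three-step scenario and all function names are assumptions chosen to mirror the narrative of the counter-example.

```python
def controller_motor_command(motor, ponr_reading, button_reading):
    """Hypothetical controller rule while the plunger is falling (requirements 4 and 5):
    on button release, turn the motor on only if the PONR sensor still reads low."""
    if button_reading == "released" and ponr_reading == "low":
        return "on"
    return motor


def simulate(ponr_stuck_low):
    """Replay the fall of the plunger, optionally injecting 'PONR sensor stuck low'."""
    motor = "off"
    trace = []
    # (plunger state, button state) for each step of the scenario.
    scenario = [
        ("falling_slow", "pushed"),
        ("falling_fast", "pushed"),
        ("falling_fast", "released"),   # operator attempts to abort too late
    ]
    for plunger, button in scenario:
        actual_ponr = "high" if plunger == "falling_fast" else "low"
        # Injected failure mode: the sensor keeps reporting low.
        reading = "low" if ponr_stuck_low else actual_ponr
        motor = controller_motor_command(motor, reading, button)
        trace.append({"plunger": plunger, "motor": motor})
    return trace


def violates_condition_2(trace):
    """Safety condition 2: the motor must be off whenever the plunger is falling fast."""
    return any(s["plunger"] == "falling_fast" and s["motor"] == "on" for s in trace)


print(violates_condition_2(simulate(ponr_stuck_low=False)))
print(violates_condition_2(simulate(ponr_stuck_low=True)))
```

Without the injected fault the late release is ignored and the motor stays off; with the sensor stuck low the controller accepts the abort and the trace ends in the hazardous state described in Figure 7.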

6 Statistics of the runs

The single-failure graph in Figure 8 depicts the total execution time for running the model checker on each single-failure view of the press system. We ordered the runs by the theorems to be checked. That is, the first ten entries of the diagram describe the performance of the model checker checking theorem 1 for each of the ten failure views. The next ten entries show the performance of the model checker checking theorem 2 for all ten failure views, and so on. The graph shows a fairly homogeneous distribution of all runs with the exception of two runs which took a little more time. In both these runs the tool checked theorem 3, which is the most complex formula, with three nested temporal operators of which one is the until operator, resulting in a complex model checking algorithm to be applied. Moreover, in both cases a violation of the theorem was found, so that in addition a counter-example had to be computed by the model checker.

Figure 8: Total execution time of the press system with single-failure view and double-failure view (panels: "Total Execution Time for Metal Press (Single Failures)" and "Total Execution Time for Metal Press (Double Failures)"; x-axis: Experiment No.; y-axis: Time (seconds))

The double-failure graph in Figure 8 shows the distribution of verification time used for checking the press system in double-failure mode. As in single-failure mode, the runs are relatively homogeneously distributed over time. Note, however, that the double-failure graph uses a higher resolution of the time axis. It also shows that the runs on theorem 3 generally need a higher execution time due to the theorem's higher complexity. Also notable is that the average total execution time is the same for the single-failure and double-failure view, namely 35.5 seconds. That is, the number of faults injected does not increase the complexity of the model checking process. The reason for this, however, might be that the case study is too small to show any significant differences. That is, the actual checking time that is used by the tool might be affected by the double-failure mode, but it is relatively small compared to the time necessary to build up the internal model of the press system, which is roughly the same for single and double points of failure. This relationship becomes more apparent for bigger systems like the mine pump as described in the next section.

References
[1] Atchison B, Lindsay P, Tombs D. A case study in software safety assurance using formal methods. Technical Report, University of Queensland, SVRC 99-31, www.itee.uq.edu.au/~pal/SVRC/tr99-31.pdf 1999.
[2] McDermid J, Kelly T. Industrial press: Safety case. Technical Report, High Integrity Systems Engineering Group, University of York 1996.
[3] Grunske L, Lindsay PA, Yatapanage N, Winter K. An automated Failure Mode and Effect Analysis based on high-level design specification with Behavior Trees. Integrated Formal Methods, 5th International Conference, IFM 2005, Eindhoven, The Netherlands, November 29 - December 2, 2005, Proceedings, Lecture Notes in Computer Science, vol. 3771, Romijn J, Smith G, van de Pol J (eds.), Springer, 2005; 129–149.
