
Diagnosis and Control

LM Automation Engineering

Safety and Risk Assessment
Andrea Paoli
Andrea Tilli
DEI University of Bologna
Email: andrea.tilli@unibo.it

Diagnosis and Control Safety design

A.A. 2014 - 2015

Resume and Introduction


We deal with Faults
Motivations:

Quality
(reliability/availability)

Safety

Fault Tolerance (FT)


or
Fail-Operational FT
Fail-Safe
or
Fail-Safe FT

Countermeasures:
Focus on algorithms: mainly diagnosis
Some basics on FTC (but case dependent)
Some basics also on other countermeasures

Definition
According to IEC-61508 (later):
SAFETY is freedom from unacceptable risk of
physical injury or of damage to the health of
people, property or environment.
Why is this definition relevant for engineering systems?

Introduction
1976: Seveso disaster
uncontrolled exothermal reaction
causing the dissemination of dioxin
1986: Chernobyl disaster
Failure during the test of a safety
emergency core cooling feature
1986: Space Shuttle Challenger
O-ring seal in its right solid rocket
booster failed at liftoff
1996: ESA Ariane 5 unmanned rocket
Inertial reference software exception

Definitions
According to Birolini, Reliability Engineering: Theory and Practice, safety can be divided into:
o Accident prevention
o Technical safety

Technical Safety
Safety preservation considering
a) system malfunctions (faults)
b) off-nominal external conditions
c) incorrect human behaviour (when possible)
which have (or could have) consequences for the health of people or of the environment.

Set of countermeasures to guarantee acceptable risk (see the safety definition)

They have to manage a chain of cause-effect links by
o removing first causes (or reducing their probability) when possible
o breaking the cause-effect chain
(or both)

They have to be defined as soon as possible
o Preferably at the design stage of the system: NO LEARNING BY MISTAKES!

Frequency and probability


Frequency is the number of occurrences of a repeating event per unit of time
o E.g. a fault has a frequency of 1e-3 per year

A probability provides a quantitative description of the likely occurrence of a particular event:
(number of outcomes corresponding to event E) / (total number of outcomes)

Example:
Does an event with frequency 0.1 per year (i.e. 1 every 10 years) have a probability of 0.1 of occurring within 1 year?

Failure rate and reliability R(t)?
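
One possible way to look at the question above is sketched here. It assumes (this is not stated in the slides) that occurrences follow a Poisson process with a constant rate, so that R(t) = exp(-rate * t) is the probability of no occurrence up to time t. The function names are illustrative only.

```python
import math

def reliability(rate, t):
    """Probability of NO occurrence up to time t, assuming a constant rate."""
    return math.exp(-rate * t)

def prob_at_least_one(rate, t):
    """Probability of at least one occurrence within time t (same assumption)."""
    return 1.0 - reliability(rate, t)

# Frequency 0.1 per year, observed over 1 year:
# the result is close to, but slightly below, 0.1.
print(f"{prob_at_least_one(0.1, 1.0):.4f}")

# Over 10 years the probability is about 0.63, not 1.
print(f"{prob_at_least_one(0.1, 10.0):.4f}")
```

Under this assumption, frequency and probability are numerically close only when rate * t is small.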



Hazard and Risk


We mentioned risk, but what is a risk?
o Intuitively, it is a measure of danger

In safety management (also named risk management)
o the term "hazard" is used to mean an event that could directly cause harm to people, the environment, etc.
o the term "risk", linked to a hazard, is used to mean a combination of hazard probability and hazard consequence severity.

Remark:
usually a hazard is a top event in the system, caused by basic events (in the system, in the environment or in human behavior)
o stand-alone basic events or combinations (series, parallel)
o e.g. toxic gas release caused by O-ring failure

Hazard and Risk


There is no such thing as zero risk!!
o The concept of tolerable (socially derived) risk is fundamental
o Acceptable/Unacceptable risks

The focus is on RISK REDUCTION!!!!
o Countermeasures for technical safety (at design level)

What are the dimensions of a risk?
o Frequency
o Consequences

Design of safety critical systems


Hazard/Risk Safety-Critical Systems
Methodologies and tools adopted for the design of safety-critical systems:
o Conceptually the same as for Availability (Reliability/Maintainability)
o Special procedures were developed in technical disciplines such as railway, aircraft, space, military, and nuclear systems.
o Some terms used to indicate the safety property: system integrity or system dependability.

Usually safety and reliability are achieved by a combination of
o Fault Avoidance
o Fault Removal
o Fault Tolerance

Design of safety critical systems


Important difference between Reliability/Availability and Safety:
Design for Reliability/Availability:
o Focus on correct operation with the expected performances
Design for Safety:
o Focus on absence (reduction) of risks for people, environment, etc.

Paradox: a switched-off system offers the maximum safety
o But it is totally useless!!

In general, when a safety-related fault occurs, we need to steer the system to a safe state, not necessarily an operational one
o Sometimes the only safe state is an operational state
o E.g. guidance and propulsion for airplanes, guidance for cars

Design of safety critical systems


Another one:
Design for Reliability/Availability:
o It's a free choice!
o Improve product/system quality

Design for Safety:
o It's mandatory!
o Decrease hazard risks down to an acceptable level, following given design procedures
o According to mandatory standards/rules (IEC, ISO, EN)

Design of safety critical systems


Usually safety and reliability are achieved by a combination of
Fault Avoidance and Fault Removal
o Avoidance: design techniques
o Removal: tests on prototypes
o Mainly related to the avoidance/removal of first causes affecting availability/reliability and safety
o Mainly intended for systematic faults, but they can be applied to random faults, when possible, by increasing the reliability of the basic components

Remark:
- A systematic fault is a design mistake which leads to a certain fault whenever certain conditions take place
- A random fault is an occasional fault due to normal ageing of components

Design of safety critical systems


Usually safety and reliability are achieved by a combination of
Fault Tolerance
o Mainly related to breaking the cause-effect chain
o It is the only solution when basic causes cannot be removed or reduced
- no reliability increase for the component is possible
- external environmental or human events are considered
o Redundancy: for Reliability/Availability and Safety (usually expensive or not feasible)
o Fault detection and diagnosis (possibly forecasting) + automatic supervision and protection: for Safety (and possibly Reliability/Availability, but as complex/expensive as redundancy; sometimes de-rated)

Design of safety critical systems


Focusing on safety:
The Hazard Analysis (HA) extracts safety-critical failures and determines the basic causes with their logical interconnections.
With respect to the previous definitions:
Safety-critical failures are hazards.
o Hazards = events in the system which could directly cause harm to humans, the environment, etc.

Basic causes are:
- low-level faults,
- off-nominal environment conditions,
- human mistakes
From now on we mainly focus on the first one, for the sake of simplicity

Design of safety critical systems


The hazard risk is measured as a number
$R = C \cdot F = C \cdot F_H \cdot (F_{OP}) \cdot (P_{GC})$

o $C$ is the consequence (severity) of the hazard
o $F_H$ is the frequency (probability) of the hazard
o $F_{OP}$ is the frequency of the operation state
o $P_{GC}$ is the probability that the hazard turns into harmful consequences
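
As a minimal sketch of the risk number defined above, the product can be evaluated directly. The function name, the argument names and all numeric values are hypothetical, chosen only to illustrate the arithmetic; the severity unit is arbitrary.

```python
def hazard_risk(consequence, hazard_freq, operation_fraction, harm_prob):
    """Risk as the product C * F_H * F_OP * P_GC (all inputs are assumptions)."""
    return consequence * hazard_freq * operation_fraction * harm_prob

# Hypothetical hazard: severity 100 (arbitrary units), 0.01 events/year,
# system in the relevant operation state half of the time,
# 10% chance that the hazard actually produces harm.
risk = hazard_risk(consequence=100.0, hazard_freq=0.01,
                   operation_fraction=0.5, harm_prob=0.1)
print(risk)
```

Each factor is a separate lever: a countermeasure may lower the hazard frequency, the exposure, or the probability that the hazard turns into harm.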


Hazard analysis and Risk assessment


The procedure is based on the following steps:
1. Hazard identification: TOP events (Top Event: an accidental event such as a release from a pipe, a release from a tank, catastrophic breakage of a vessel, ...)
2. Risk calculation: for all TOP events
o calculate the occurrence probability in a defined time span
- Qualitative analysis to define low-level causes
- Quantitative analysis to compute the probability
o calculate the magnitude of consequences (for people, structures and environment)

Hazard analysis and Risk assessment


Is the resulting risk acceptable or not?
o There is no such thing as zero risk!!
o The concept of tolerable (socially derived) risk is fundamental to define a threshold (usually reported in norms for specific products)

The focus is then on RISK REDUCTION
o to comply with the tolerable threshold
o by countermeasures defined at the design stage
o No learning by mistakes!!

Risk reduction
[Flowchart: Identification of hazards and effects; Frequency Evaluation and Consequence Evaluation; Risk Determination; Risk Evaluation against Risk Acceptability Criteria; Risk Acceptance. If NO: Risk Reduction Measures and a new iteration. If YES: Set Functional Requirements.]

Risk reduction
Definition of a risk reduction countermeasure:
What it is
o The effect on the first causes or on the cause-effect chain has to be clarified
- prevent basic causes?
- prevent the hazard?
- mitigate consequences?

How reliable it is
o It has to be combined with the original hazard frequency/probability to assess the actual risk reduction.
o Often the reliability level (referred to as integrity) is a specification derived from the original hazard risk and the tolerable risk threshold.
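
The link between original risk, tolerable threshold and the required reliability of the countermeasure can be sketched as follows. This is only an illustration under explicit assumptions: the consequence is unchanged, the countermeasure acts on demand, and the mitigated hazard frequency equals the original frequency multiplied by the probability that the countermeasure fails when demanded. Function names and numbers are hypothetical.

```python
def required_risk_reduction(original_freq, tolerable_freq):
    """Factor by which the hazard frequency must be reduced (at least 1)."""
    if original_freq < 0 or tolerable_freq <= 0:
        raise ValueError("frequencies must be non-negative, tolerable one positive")
    return max(1.0, original_freq / tolerable_freq)

def max_failure_probability(original_freq, tolerable_freq):
    """Largest failure-on-demand probability the countermeasure may have."""
    return 1.0 / required_risk_reduction(original_freq, tolerable_freq)

# Hypothetical case: hazard expected 0.1 times/year, tolerable 1e-4 times/year.
print(required_risk_reduction(0.1, 1e-4))   # about 1000
print(max_failure_probability(0.1, 1e-4))   # about 1e-3
```

If the original frequency is already below the tolerable one, the factor is 1 and no reliability requirement follows from this hazard.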


Risk reduction
A possible false friend:
Sometimes a risk reduction measure can itself introduce new hazards
o E.g. automatic braking systems in cars
o A new hazard analysis iteration is then needed (back to the identification of hazards and effects)

Functional Safety
Functional safety is part of the overall safety
It deals with safety functions
Safety functions are a particular kind of risk reduction measure, based on the execution of some actions (as sequences of operations) when a particular hazard occurs, or to prevent hazards
o Different (more passive) measures can also be found in safety
They need a system or a set of systems performing such actions when needed (i.e. systems to which the safety functions are allocated)
o Nowadays these systems are often E/E/PE

The norm IEC 61508

International standard IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems
o Developed in the 1990s
o Concerns electrical, electronic and programmable electronic safety-related systems, where failure will affect people or the environment
o Gives general guidelines, which can be specialized for specific domains (e.g. ISO 26262 for automotive, IEC/EN 61511 for process industries, EN 50126 for railways, IEC/EN 62061 for machinery)
o Gives concepts that should be considered best practice

The norm/standard IEC 61508


Norm structure (Ed. 1: 1998-2000, Ed. 2: 2010):
Part 0: Functional safety and IEC 61508 (IEC TR 61508-0, 2005*)
Part 1: General requirements (required for compliance)
Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems (required for compliance)
Part 3: Software requirements (required for compliance)
Part 4: Definitions and abbreviations (supporting information)
Part 5: Examples of methods for the determination of safety integrity levels (supporting information)
Part 6: Guidelines on the application of parts 2 and 3 (supporting information)
Part 7: Overview of techniques and measures (supporting information)

IEC 61508: definitions


Electrical Systems (E or ES)
Electronic Systems (E or ES)
o The same acronym as electrical systems, since they are equivalent in IEC 61508

Programmable Electronic Systems (PE or PES):
o A system based on one or more programmable electronic devices, connected to input/output devices for the purpose of control, protection or monitoring

IEC 61508: definitions


Safety-related systems (SRS):
Any system (E/E/PE or not) that implements safety functions necessary to achieve a safe state for the EUC (Equipment Under Control)
o Process plant emergency shut-down systems
o Crane automatic safe-load indicator
o Railway signal systems

Safety instrumented system (SIS; actually a term from IEC 61511):
A particular kind of SRS
E/E/PE instrumentation (HW/SW) added specifically to implement one or more safety functions in order to reduce risk
o No other normal-behavior functions

IEC 61508: the risk reduction model


[Figure: the risk assessment flowchart (identification of hazards and effects, frequency and consequence evaluation, risk determination, risk evaluation against risk acceptability criteria, risk acceptance, set functional requirements, risk reduction measures) shown alongside a risk axis (increasing risk) marking Residual Risk, Tolerable Risk and EUC Risk. Necessary risk reduction: from EUC risk to tolerable risk. Actual risk reduction: from EUC risk to residual risk, split into partial risk covered by other-technology safety-related systems, partial risk covered by E/E/PE safety-related systems, and partial risk covered by external risk reduction facilities.]

Risk reduction achieved by all safety-related systems and external risk reduction facilities

Does it look natural?

EN954: a prescriptive standard


Actually, it was not so natural
In machinery, before IEC 61508/62061, EN 954 was adopted
o That was a prescriptive standard/norm

Prescriptive Standards
[Figure courtesy of Exida]

IEC 61508/62061 are Methodology/Performance-based
[Figure courtesy of Exida]


IEC 61508: the risk reduction model


[Figure: risk axis (increasing risk) marking Residual Risk, Tolerable Risk and EUC Risk. Necessary risk reduction: from EUC risk to tolerable risk. Actual risk reduction: from EUC risk to residual risk, split into total/partial risk covered by E/E/PE SRS, total/partial risk covered by other-technology SRS, and partial risk covered by external risk reduction facilities.]

Risk reduction is achieved by one or more safety-related systems which host safety functions, and possibly by external risk reduction facilities (SAFETY ALLOCATION)



IEC 61508: Risk Reduction Model

[Figure: the Equipment Under Control (EUC) with its process control system, surrounded by additional SRS or safety measures; the focus is on E/E/PE SRS. Layers listed in the figure:]
o Inherent design safety features
o Process control system
o Designated safety-related systems (i.e. SIS)
o External risk reduction facilities

Safety Instrumented Systems


Independent system composed of sensors, logic solvers, and final control elements for the purpose of:
o Automatically taking the process to a safe state when predetermined conditions are violated
o Permissive prevention: permitting a process to move forward in a safe manner (preventing the hazard) when specified conditions are met
o Mitigation: taking action to mitigate the consequences of an industrial hazard
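
The sensor / logic solver / final element chain described above can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: the pressure measurement, the trip limit, the shutdown valve and the latching behaviour are assumptions made for illustration, not part of any standard or of the original slides.

```python
from dataclasses import dataclass

@dataclass
class ShutdownValve:
    """Final control element: once closed, it stays closed (latched)."""
    closed: bool = False

    def close(self):
        self.closed = True

def logic_solver(pressure_bar, trip_limit_bar, valve):
    """Take the process to the safe state when the limit is violated."""
    if pressure_bar > trip_limit_bar:
        valve.close()
    return valve.closed

valve = ShutdownValve()
# Simulated sensor readings (bar); the trip limit is assumed to be 10 bar.
for reading in [8.0, 9.5, 10.4, 9.0]:
    tripped = logic_solver(reading, trip_limit_bar=10.0, valve=valve)
    print(reading, "tripped" if tripped else "normal")
```

Note that the last reading is back below the limit but the valve remains closed: in this sketch, returning to operation would require an explicit reset, which is not modelled.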


IEC 61508: Safety Life cycle


ANALYSIS:
Requirements
Specifications
Documentation

REALIZATION:
Design
Implementation
Verification
Documentation
OPERATION:
Startup
Operation
Maintenance
Modifications
Decommissioning


IEC 61508: Safety Life cycle


[Figure: overall safety lifecycle, listed here by phase number]
1 Concept
2 Overall scope definition
3 Hazard and risk analysis
4 Overall safety requirements
5 Safety requirements allocation
Overall planning:
o Operation and maintenance planning
o Safety validation planning
o Installation and commissioning planning
9 Safety-related systems: E/E/PES, Realization
10 Safety-related systems: others, Realization
11 External risk reduction facilities, Realization
12 Overall installation and commissioning
13 Overall safety validation
14 Overall operation and maintenance
15 Overall modification and retrofit
16 Decommissioning or disposal


IEC 61508: Safety Life cycle


9 Safety-related systems: E/E/PES, Realization
9.1 Safety Requirement Specifications (Functions, Integrity)
9.2 Validation Planning
9.3 Design and development
9.4 Integration
9.5 Installation, commissioning, Operation, Maintenance
9.6 Safety Validation

IEC 61508: Safety Life cycle


[Figure: phase 9 (Safety-related systems: E/E/PES, Realization) shown against the axes Time and Details]

(Hazard Analysis) and Risk Assessment


"Hazard Analysis" is sometimes omitted from the name
o just "Risk Assessment"

We focus on this important step of the safety life-cycle
o COTS SIS are often available
o Design issues are for the COTS SIS manufacturers, not for users

An example of risk assessment


Diammonium phosphate (DAP) is made by reacting ammonia
and phosphoric acid.

DAP is non-hazardous.
Ammonia is the most hazardous chemical present.


(Hazard Analysis) and Risk Assessment


Objective:
Use systematic methods to:
- list all of the potential hazards
- assess the risk of each hazard

It is impossible to have formal guarantees
o Have you considered everything?
o Procedures help, but cannot guarantee completeness

Many methods have been proposed

Qualitative methods
Aims:
o Identify and rank by importance potential hazards (and faults)
o Qualitatively evaluate plant safety (and availability)
o Provide inputs to quantitative methods

Scope:
o Reduce risks: safety requirements
o (Improve reliability)

Tools:
o Hazard and operability analysis (HAZOP)
o Failure mode and effects analysis (FMEA)

When:
o At design stage

Hazard and Operability Analysis


HAZOP studies were introduced in chemical industries as
"A formal, systematic, critical examination of the process and engineering intentions of new facilities to assess the hazard potential of maloperation or malfunction of individual items and the effects on the facility as a whole"

Basic concepts:
Partitioning the system into components/items/nodes
o tailored to chemical plants
o circuit of pipes, valves, tanks, reactors, etc.

Bottom-up inductive approach
o Start from component deviations w.r.t. nominal behavior

Why systematic?
o Some rules help not to forget possible deviations

Hazard and Operability Analysis


Basic definitions and procedure:
For each component define:
o Intention: expected function
- Represented by values of some PARAMETERS
- Flow, Pressure, Temperature
o Deviations: departures from the intention
o Causes: reasons for deviations
o Consequences: results of deviations
o Hazards: safety-relevant events due to consequences

Consider each plant item and apply a set of predefined guide words to each item parameter to find possible deviations
o Guide words: NO/NOT, MORE, LESS, PART OF, REVERSE
- Possibly dependent on the parameter type
o Apply all guide words so as not to forget possible deviations
- No formal guarantee, actually
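
The systematic part of the procedure above (every guide word applied to every parameter of every item) can be sketched as a simple enumeration. The plant items and their parameters below are hypothetical examples; the guide words are the ones listed in the slide. The sketch only generates candidate deviations; deciding which ones are meaningful, and finding their causes and consequences, remains the analyst's job.

```python
from itertools import product

GUIDE_WORDS = ["NO/NOT", "MORE", "LESS", "PART OF", "REVERSE"]

def candidate_deviations(items):
    """Yield (item, parameter, guide word) for every combination.

    items: mapping from item name to the list of its parameters.
    """
    for item, parameters in items.items():
        for parameter, word in product(parameters, GUIDE_WORDS):
            yield (item, parameter, word)

# Hypothetical plant partitioning, for illustration only.
plant = {
    "feed pipe": ["Flow", "Pressure"],
    "reactor": ["Temperature"],
}

deviations = list(candidate_deviations(plant))
for item, parameter, word in deviations[:3]:
    print(f"{item}: {word} {parameter}")
print(len(deviations), "candidate deviations to review")
```

The count grows as (number of parameters) x (number of guide words), which is why HAZOP worksheets become long even for small plants.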


[Figures: HAZOP example. Annotations in the figures: item keywords; outputs; part of the example is actually risk assessment and reduction.]


An example of risk assessment

[Figure: risk assessment example with the HAZARD and the TOP EVENT marked]


Failure Mode and Effect Analysis


Failure mode and effect analysis (FMEA):
o Step-by-step procedure to evaluate the severity of potential failure modes
o Identify items where modifications are required to reduce the severity of the effect of failure modes
o Evaluate and rank failures combining severity of consequences and probability of occurrence
o Evaluate and rank the diagnosability of failure modes
o Identify corrective measures and reconfigurations

Described in the US Department of Defense document for military and aerospace systems, MIL-STD-1629A

Failure Mode and Effect Analysis


FMEA general procedure
o Define the system and its required functionalities and performances
o Construct component/reliability block diagrams
o List components
- Define them
- Identify their failure modes
- Evaluate their failure rates
o Analyze the effect of each component failure mode on system performance
o Evaluate severity rankings, failure rates and diagnosability rankings
o Prioritize failure modes
o Recommend design improvements, reconfiguration actions, maintenance strategies

FMEA: worksheets

[Figures: example FMEA worksheets]


FMEA: Criticality Analysis


Criticality is a relative measure of the consequences of a failure mode and its frequency of occurrence
o Failure Mode and Effects Criticality Analysis (FMECA)

Severity ranks and severity index (is)
o Level 1: Minor, no significant effect (is=1)
o Level 2: Major, reduction in operational effectiveness (is=2)
o Level 3: Critical, significant reduction of functional performance with immediate change in operating state (is=3)
o Level 4: Catastrophic, total loss of system, significant property damage, deaths and/or environmental damage (is=4)

FMEA: Criticality Analysis


Frequency ranks
o Level 1: Very low frequency, $\lambda_0 < 0.01$ failures/year
o Level 2: Low frequency, $0.01 < \lambda_0 < 0.1$ failures/year
o Level 3: Medium frequency, $0.1 < \lambda_0 < 1$ failures/year
o Level 4: High frequency, $\lambda_0 > 1$ failures/year

Component failure rate data need to be adjusted:
$\lambda_0 = \alpha \, \beta \, k_1 \, k_2 \, \lambda_b$
o $\lambda_b$: base failure rate
o $k_1$: environmental stress factors
o $k_2$: duty stress factors
o $\alpha$: portion of failures in the specified failure mode
o $\beta$: conditional probability that the failure will have bad consequences
o $\lambda_0$: allocated failure rate
o ($t$: operating time)
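
A minimal sketch of the adjustment and ranking described above, assuming that the allocated failure rate is the plain product of the base failure rate, the two stress factors, the failure-mode portion and the conditional probability of bad consequences. The slide leaves the exact boundary values (0.01, 0.1, 1) unassigned; here they are assigned to the higher level, which is an arbitrary choice. All numbers in the example are hypothetical.

```python
def allocated_failure_rate(base_rate, k1, k2, mode_portion, harm_prob):
    """Adjusted failure rate (failures/year) for one failure mode."""
    return mode_portion * harm_prob * k1 * k2 * base_rate

def frequency_level(rate_per_year):
    """Map a failure rate to the frequency levels 1..4 of the slide."""
    if rate_per_year < 0.01:
        return 1  # very low
    if rate_per_year < 0.1:
        return 2  # low
    if rate_per_year < 1.0:
        return 3  # medium
    return 4      # high

# Hypothetical component: base rate 0.5 failures/year, harsh environment (k1=2),
# nominal duty (k2=1), 30% of failures in this mode, 50% of them harmful.
rate = allocated_failure_rate(0.5, k1=2.0, k2=1.0, mode_portion=0.3, harm_prob=0.5)
print(rate, frequency_level(rate))
```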


FMECA: worksheets

[Figure: example FMECA worksheet]


FMECA: Criticality Matrix


Criticality matrix
o Combines the probability level with the severity classification to compare the frequency of each failure mode to all other failure modes with respect to severity
o Is built by inserting failure mode identification numbers in the appropriate cell of the frequency vs. severity matrix
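
Building such a matrix amounts to grouping failure mode identifiers by their (frequency level, severity level) pair. The sketch below assumes both levels range from 1 to 4; the identifiers and their levels are hypothetical.

```python
from collections import defaultdict

def build_criticality_matrix(failure_modes):
    """Group failure mode identifiers by (frequency level, severity level).

    failure_modes: iterable of (identifier, frequency_level, severity_level).
    """
    matrix = defaultdict(list)
    for identifier, freq_level, sev_level in failure_modes:
        matrix[(freq_level, sev_level)].append(identifier)
    return dict(matrix)

def print_matrix(matrix, levels=(1, 2, 3, 4)):
    """Print rows by decreasing frequency, columns by increasing severity."""
    for freq_level in reversed(levels):
        cells = [",".join(matrix.get((freq_level, s), [])) or "-" for s in levels]
        print(freq_level, " | ".join(f"{c:>10}" for c in cells))

# Hypothetical failure modes, for illustration only.
modes = [("1.1/1", 2, 4), ("2.2/1", 4, 2), ("3.1/2", 4, 2), ("4.1/1", 1, 1)]
matrix = build_criticality_matrix(modes)
print_matrix(matrix)
```

Failure modes ending up towards the high-frequency, high-severity corner are the first candidates for design changes.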

[Figure: criticality matrix, frequency index io (increasing frequency, levels 1 to 4) versus severity index is (increasing severity); example entry: failure mode 2.2/1]



Remark
HAZOP and FMECA do not give only a list of hazards
o Developed for any kind of failure (reliability)
o Risk assessment and reduction are also partially covered
o Anyway, in safety we are mainly interested in using them to get the hazard list

HAZOP and FMECA derive hazards as consequences/effects of a SINGLE component/node fault mode
o There can be others

How to get the others?
o The designer decides their own strategy to find other possible hazards
o HAZOP and FMECA considering all the possible combinations of more than one of the faults previously considered
o Complexity
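
To give a feeling for the complexity issue, the number of fault combinations to be examined can be counted with binomial coefficients. The figure of 50 single faults is a hypothetical example.

```python
from math import comb

def count_fault_combinations(n_faults, max_order):
    """Number of combinations of 2 up to max_order simultaneous faults."""
    return sum(comb(n_faults, k) for k in range(2, max_order + 1))

n = 50  # hypothetical number of single fault modes already analysed
print(count_fault_combinations(n, 2))  # pairs only
print(count_fault_combinations(n, 3))  # pairs and triples
```

With 50 single faults there are already 1225 pairs, and 20825 combinations when triples are included, which is why exhaustive multi-fault HAZOP/FMECA is rarely practical.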

Remark
Besides the previous problem (how to be sure about the completeness of the hazard list):
o hazard frequency assessment!
o Quantitative methods

Quantitative Risk Assessment

[Figures: worked example. Labels shown in the figures: Node, Guide Word, Parameter; SAFETY RELEVANT CONSEQUENCE; TOP EVENT; Cause of Top Event; Release to environment.]


Fault Tree Analysis


Is a system reliability analysis method
o Developed at Bell Labs in 1962
o Fault: the top event is an undesirable event (failure)
o Tree: it has a tree structure
o It is a logical model

Focus on failures
o Try to reduce the probability of the top event
o What could cause the top event to occur?
o Only relevant failure modes are considered

Use it to
o evaluate the probability of the top event
o identify critical causes of the top event
o improve the design to reduce the probability of the top event

Fault Tree Analysis


Recipe:
o Event symbols
- Basic event: requiring no further development
- External event: deterministic (0 or 1)
- Undeveloped event: not developed further
- Intermediate event: determined by other events
o Gate symbols
- AND
- Priority AND
- OR
- XOR
- K-out-of-N
- NOT
- Inhibit gate
o Transfer symbols
- Transfer IN: the event is developed somewhere else
- Transfer OUT: this portion of the tree is attached to the corresponding Transfer IN

Fault Tree Analysis


Top-down process:
o Define the undesired event, i.e., the top event.
o Understand the system thoroughly.
o Determine high-level fault events, then continue the FTA to determine lower-level fault events.
o Construct the fault tree
o Evaluate the fault tree

Fault Tree Analysis


Example:
Develop the fault tree for a system with the following diagram
[Figure: block diagram with components A, B, C and D]
Top Event: no current out of end

Resulting fault tree:
o No current at end = AND (A fails, B fails, No current out of D)
o No current out of D = OR (C fails, D fails)

Fault Tree Analysis


Fault Tree Evaluation:
o Boolean algebra analysis
o Determine minimal cut sets $MC_1, \dots, MC_n$
- Cut sets are the unique combinations of component failures that can cause system failure
- A cut set is said to be a minimal cut set if, when any basic event is removed from the set, the remaining events collectively are no longer a cut set
o Compute the failure probability as
$\Pr(T) = \Pr(MC_1 \cup MC_2 \cup \dots \cup MC_n)$
o OR gate: union of events (independent events)
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A) \cdot \Pr(B)$
o AND gate: intersection of events (independent events)
$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$


Fault Tree Analysis


Fault Tree Evaluation:

Boolean algebra analysis


[Figure: fault tree of the example. "No current at end" = A fails AND B fails AND "No current out of D"; "No current out of D" = C fails OR D fails]

MC1 = {A, B, C}
MC2 = {A, B, D}

$\Pr(T) = \Pr(A \cdot B \cdot C + A \cdot B \cdot D)$
$= \Pr(A \cdot B \cdot C) + \Pr(A \cdot B \cdot D) - \Pr(A \cdot B \cdot C \cdot D)$
$= \Pr(A)\Pr(B)\Pr(C) + \Pr(A)\Pr(B)\Pr(D) - \Pr(A)\Pr(B)\Pr(C)\Pr(D)$

Fault Tree Analysis


Fault Tree Evaluation:

Boolean algebra analysis


[Figure: fault tree of the example. "No current at end" = A fails AND B fails AND "No current out of D"; "No current out of D" = C fails OR D fails]

Evaluating the gates directly:

$\Pr(T) = \Pr(A \cdot B \cdot (C + D))$
$= \Pr(A)\Pr(B)\Pr(C + D)$
$= \Pr(A)\Pr(B)\,(\Pr(C) + \Pr(D) - \Pr(C)\Pr(D))$
$= \Pr(A)\Pr(B)\Pr(C) + \Pr(A)\Pr(B)\Pr(D) - \Pr(A)\Pr(B)\Pr(C)\Pr(D)$
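The two evaluations of the example (minimal cut sets and gate by gate) can be cross-checked numerically. The sketch below is only an illustration: it assumes independent basic events, and the function name and the probability values are hypothetical, not taken from the slides.

```python
from itertools import combinations


def top_event_probability(minimal_cut_sets, p):
    """Pr(MC1 u ... u MCn) by inclusion-exclusion, assuming independent basic events."""
    total = 0.0
    for k in range(1, len(minimal_cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(minimal_cut_sets, k):
            # Intersection of cut-set events = all basic events of the group occur
            events = set().union(*group)
            term = 1.0
            for event in events:
                term *= p[event]
            total += sign * term
    return total


# Hypothetical failure probabilities of the basic events
p = {"A": 0.1, "B": 0.2, "C": 0.05, "D": 0.01}
cut_sets = [{"A", "B", "C"}, {"A", "B", "D"}]

pr_cut_sets = top_event_probability(cut_sets, p)
# Same tree evaluated gate by gate: A AND B AND (C OR D)
pr_gates = p["A"] * p["B"] * (p["C"] + p["D"] - p["C"] * p["D"])
```

Both values coincide, as expected from the Boolean algebra above. Inclusion-exclusion grows exponentially with the number of cut sets, so real tools rely on approximations or other methods.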

Fault Tree Analysis


Exercise:

Develop the fault tree for a system with the following diagram
Top Event: no current (flow) out of end

[Figure: process diagram with components V-1 to V-5, P-1, P-2 and T-1, a Sensing and Control unit and an AC Power Source]

Quantitative Risk Assessment


We have seen FTA
Very common and linked to RBD
Also used for reliability

Usually, for safety:
The Failure (Occurrence) Rate of each Hazard is needed!
How can it be computed by FTA?
o The rules considered so far are more oriented to probability
o Markov models, computations (automatic...)
o In general: simulations

Risk Reduction
Up to now: Hazard Analysis and Risk Assessment
Target: reduce the risk to an acceptable level with some
solutions
Acceptable level definition? Field-specific standard/norm
ALARP: see later

Define which safety solutions(*) are needed
Define the reliability requirements for the safety solutions(*)
o (*) safety functions in functional safety

Usually referred to as integrity

Many ways, with different metrics:
- qualitative
- quantitative (preferable)

Risk Reduction
We will consider a possible quantitative method:
LOPA
exploiting SIL from IEC 61508 to classify integrity/reliability
requirements

Some rules/standards inspired by IEC 61508 actually
use different procedures
Common guidelines: HA and RA -> integrity of safety functions
Qualitative instead of quantitative
Sometimes the accepted risk level is not made explicit

How to reduce risk?


Layer of protection analysis (LOPA)
[Figure: LOPA diagram; with the existing protection layers the hazard probability is 1.9E-3]

ACCEPTABLE?
E.g. tolerable approx. 1E-5!

We need a new protection layer!
[Figure: LOPA diagram with the additional protection layer; the hazard probability becomes 1.9E-5]

-2 orders of magnitude
SIL 2 (see later)
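The arithmetic behind the LOPA example can be sketched as follows. This is a minimal illustration assuming independent protection layers, each described by a probability of failure on demand (PFD); the function names are hypothetical, while the numbers 1.9E-3, 1E-5 and the factor 1E-2 come from the slides.

```python
def residual_probability(initial_probability, layer_pfds):
    """Hazard probability left after a chain of independent protection layers."""
    residual = initial_probability
    for pfd in layer_pfds:
        residual *= pfd
    return residual


def required_risk_reduction(current_probability, tolerable_probability):
    """Risk reduction factor needed to go from the current to the tolerable level."""
    return current_probability / tolerable_probability


current = 1.9e-3      # with the existing layers (from the slides)
tolerable = 1e-5      # approximate tolerable level (from the slides)

with_new_layer = residual_probability(current, [1e-2])    # new layer: -2 orders of magnitude
needed_factor = required_risk_reduction(current, tolerable)
```

With a new layer failing on demand once in 100 times the result is 1.9E-5, which is of the order of the tolerable level; reaching exactly 1E-5 would need a factor of about 190.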

A step back: remark


Layer of protection analysis (LOPA)


IEC 61508: the SIL concept


Safety Integrity Levels (SIL) are used to classify the
reliability levels requested of safety functions (or
solutions in general) into 4 main categories
REMARK: the reliability levels correspond to the
overall risk reduction provided
SIL1, SIL2, SIL3, SIL4
SIL1 has the lowest level of risk reduction.
SIL4 has the highest level of risk reduction (just nuclear).

IEC 61508: the SIL concept


The definition of the 4 categories depends on the working
mode of the safety function
Two modes are considered:
Low demand mode: the frequency of demands for operation made on
a safety-related system is no greater than twice the proof-test
frequency and lower than 1 per year (per year = per annum, pa);
High demand or continuous mode: the frequency of demands for
operation made on a safety-related system is greater than twice the
proof-test frequency or larger than 1 pa

Continuous mode (the Safety Function works at any time) is already covered by
the above definition (demands for operation have an infinite frequency)
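A small lookup can make the SIL classification concrete. The sketch below assumes the numeric bands commonly quoted from IEC 61508 (average probability of failure on demand for low demand mode, probability of dangerous failure per hour for high demand or continuous mode); they should be checked against the standard, and the names `SIL_BANDS` and `sil_level` are hypothetical.

```python
# (SIL, lower bound inclusive, upper bound exclusive)
SIL_BANDS = {
    # average probability of failure on demand (PFDavg)
    "low_demand": [(1, 1e-2, 1e-1), (2, 1e-3, 1e-2), (3, 1e-4, 1e-3), (4, 1e-5, 1e-4)],
    # probability of dangerous failure per hour (PFH)
    "high_demand": [(1, 1e-6, 1e-5), (2, 1e-7, 1e-6), (3, 1e-8, 1e-7), (4, 1e-9, 1e-8)],
}


def sil_level(value, mode):
    """Return the SIL whose band contains the value, or None if outside all bands."""
    for sil, lower, upper in SIL_BANDS[mode]:
        if lower <= value < upper:
            return sil
    return None


low_demand_example = sil_level(5e-3, "low_demand")      # PFDavg of 5e-3
high_demand_example = sil_level(2e-6, "high_demand")    # 2e-6 failures per hour
```

Each SIL step corresponds to one order of magnitude, which is why a risk reduction of about two orders of magnitude in low demand mode points to SIL 2.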


IEC 61508: SIL and ALARP


Unacceptable Region
Risk cannot be justified except in extraordinary circumstances

ALARP Region
Tolerable only if risk reduction is impracticable or if its cost is
disproportionate to the improvement gained.
Usually the cost of risk reduction is more than proportional
w.r.t. risk reduction

Broadly acceptable Region
Negligible Risk
Necessary to maintain assurance that risk remains at this level

IEC 61508: SIL and ALARP


Example

Tolerable risk: 1e-4 per year
Negligible risk: 1e-6 per year
Risk assessed: 8e-5 per year
Hazard: 2 deaths
Cost per life lost: 1.5 M
New SIS: cost 4.5 k, assessed risk 2e-6 per year
Plant life: 30 years

# of saved lives in 30 years: $(8\times10^{-5} - 2\times10^{-6}) \times 30 \times 2 \approx 4.7\times10^{-3}$
Cost of a saved life: $4.5\times10^{3} / 4.7\times10^{-3} \approx 0.96$ M < 1.5 M: OK!!
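The ALARP cost-benefit check of the example can be reproduced with a few lines. The function name and argument names below are hypothetical; the numbers are the ones of the example, with costs in the same (unspecified) currency unit.

```python
def cost_per_saved_life(risk_before, risk_after, deaths_per_event,
                        plant_life_years, measure_cost):
    """Return (cost per statistically saved life, number of saved lives)."""
    saved_lives = (risk_before - risk_after) * plant_life_years * deaths_per_event
    return measure_cost / saved_lives, saved_lives


cost, saved = cost_per_saved_life(
    risk_before=8e-5,       # assessed risk per year
    risk_after=2e-6,        # risk per year with the new SIS
    deaths_per_event=2,
    plant_life_years=30,
    measure_cost=4.5e3,     # cost of the new SIS
)
value_of_life = 1.5e6       # cost per life lost assumed in the example
measure_is_justified = cost < value_of_life
```

Since the cost per saved life is below the assumed value of a life, the cost is not disproportionate and the measure should be adopted.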

IEC 61508: the SIL concept


Typical workflow in industrial automation:
HA and RA (or HRA)
-> Safety functions; risk reduction requests turned into SIL level requests
-> Buy COTS HW and SW IDE+RTE with the requested SIL level

Other fields (e.g. automotive, aerospace): the SRS hosting
the Safety Functions are developed internally
Safety systems providers for industrial automation: have to
deal with design and realization of SRS for SF
How to achieve the requested SIL?
In the following, some basic ideas from IEC 61508

Evaluation/Design of the layer of protection?


Usually we need hardware redundancy!!


Evaluation/Design of the layer of protection?


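Probabilistic data on components

$\lambda_s$: safe failure rate
$\lambda_d$: dangerous failure rate
$\lambda_{dd}$: detected dangerous failures, $\lambda_{du}$: undetected dangerous failures

Diagnostic coverage:

$DC = \frac{\sum \lambda_{dd}}{\sum \lambda_{d}} = \frac{\sum \lambda_{dd}}{\sum \lambda_{dd} + \sum \lambda_{du}}$

Common causes of failure!!! CCF (beta-factor)
[Figure: redundant channels A and B sharing a common cause failure (CCF) block]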

Evaluation/Design of the layer of protection?


Evaluation/Design of the layer of protection?

Probability of failure per hour

Sum >= 1e-6
= SIL 1
We need SIL 2!!
How to move to SIL 2?
- New structure
- Decrease CCF

Remark on previous table


Those tables are obtained using RBD or Markov
models applied to the considered scheme of adopted
safety functions
We focused on HW; SW should also be considered (V&V model to avoid systematic failures).

REDUNDANCY
Premises:
General objectives we dealt with:
- (increase system availability)
- obtain a prescribed integrity for a safety function/measure
How to get them?
- push toward component perfection:
new technologies etc.
- introduce some fault tolerance:
the system should be able to work even after some faults

Usual tool for fault tolerance: REDUNDANCY

REDUNDANCY
Reminder:
Redundancy is the most obvious countermeasure to faults:
if a component does not work at all or works in a partially bad
way (partial fault), another one does its job
For partial faults, this is not the only solution:
Control solutions which can obtain a good (or slightly degraded)
system behaviour
Diagnosis + Control reconfiguration or robustness
Case-dependent (not always possible)

REDUNDANCY
Remark:
Redundancy is usually intended for fault tolerance
Another feature of redundancy is:
Inherent and straightforward fault detection and diagnosis
If two components should give the same output and they do not, a fault has
clearly occurred.
Usually, fault detection needs fewer redundant components than fault
tolerance.

Components degradation (1/3)


For components (possibly complex components) the following
levels of degradation are introduced
Fail-operational (FO): One fault (or more) is tolerated (i.e., the
component stays operational after one fault).

Availability: increased reliability, improved system quality;
maintenance of failed subparts can be scheduled while the component is still
operational
For safety issues: this is required if no safe state can be reached when that
component fails (even with additional safety actions/countermeasures). The only
solution is to keep the component in operational state as much as possible.
REDUNDANCY is the usual tool for FO

Sometimes control for partial faults

Components degradation (2/3)


For components (possibly complex components) the following
levels of degradation are introduced
Fail-safe (FS): After one (or several) fault(s), the component directly
reaches a safe state
o passive fail-safe, without external power
Usually depends on the plant inherent (open loop) characteristics

or is brought to a safe state by a special action
o active fail-safe, with external power
in such a case an ADDITIONAL SAFETY FUNCTION is needed

In both cases the fail-safe property is not just a property of the
component: it depends on the plant or on other safety functions

Components degradation (3/3)


For components (possibly complex components) the following
levels of degradation are introduced
Fail-silent (FSIL): After one (or several) fault(s), the component
exhibits quiet behavior externally (i.e., stays passive by switching
off) and therefore does not wrongly influence other components.

o This is usually interesting for basic components
o It allows the use of redundancy
The faulty component does not impair the correct behavior of the others

Redundancy for components


There are two basic approaches:
STATIC REDUNDANCY
DYNAMIC REDUNDANCY
o Hot Standby
o Cold Standby

Static redundancy for components


Uses three or more parallel modules that have the same input
signal and are all active.

Their outputs are connected to a voter that compares these signals
and decides by majority which signal value is the correct one.
No process knowledge is needed

[Figure: modules 1, 2, 3, ..., n in parallel feeding a VOTER with output xm]

This structure is actually for sensors or controllers, not for
actuators (see later)

Static redundancy for components


If a triple-redundant modular system is applied, and a fault in
one of the modules generates a wrong output, this faulty
module is masked (i.e., not taken into account) by the two-out-of-three voting.

Hence, a single faulty module is tolerated without any effort for
specific fault detection.

With n (odd) redundant modules it is possible to mask/tolerate
(n - 1)/2 faults.
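A majority voter for static redundancy can be sketched in a few lines. This is a simplified illustration for discrete (exactly comparable) module outputs; for analog signals a median or a tolerance-based comparison would typically be used instead. The function names are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value given by more than half of the modules, or None if no majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def max_masked_faults(n):
    """Number of faulty modules masked by majority voting with n (odd) modules."""
    return (n - 1) // 2


voted = majority_vote([1, 1, 0])      # third module is faulty and is masked
```

The voter needs no knowledge of the process and no explicit fault detection: the faulty module is simply outvoted.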

Dynamic redundancy
Requires fewer modules at the cost of more information
processing.
A minimal configuration consists of two modules

One module is usually in operation
If it fails, the standby or backup unit takes over.

This requires fault detection to observe whether the operational
modules become faulty:
- consistency checking
- comparison with redundant modules
- information redundancy (parity checking or watchdog timers).

After fault detection, it is the task of the reconfiguration to
switch to the standby module and to remove the faulty one.

Dynamic redundancy: hot standby


The standby module is continuously operating
- short transfer time
- at the cost of its operational aging (wear-out)

[Figure: modules 1 and 2 with Fault Detection and Reconfiguration blocks selecting the output xm]

Dynamic redundancy: cold standby


The standby system is out of function during nominal
functioning
- does not wear
- requires two additional switches at the input and more transfer time
due to a startup procedure

[Figure: modules 1 and 2 with Fault Detection and Reconfiguration blocks selecting the output xm]

Redundancy for components


Dynamic redundancy, and cold standby in particular, is especially attractive
for mechatronic systems:
more measured signals and embedded computers are already
available
fault detection can be improved by applying model-based
approaches.

Redundant schemes exist also for software fault tolerance.
Always remember that software is a component and can fail!!!!
o Usually deterministic faults, not covered by V&V at development stage
o DIVERSITY in REDUNDANCY, otherwise CCF!!

SW redundancy to prevent latent deterministic faults is used for very
demanding risk reduction
The Validation-Verification process is assumed not to be enough!

Components degradation with Redundancy


Fault behavior for different redundant structures

Duplex in static redundancy (with voter) is just to detect faults by
comparison, no FO actually
Duo-duplex is similar (two Duplex)

Components degradation with Redundancy

Fault behavior for different redundant structures

For flight-control computers, a triplex structure with dynamic redundancy (hot
standby) is typically used, which leads to FO-FO-F
Two failures are tolerated and a third gives a failure.
The third could be FS if the pilot can operate manually

Some complex planes cannot be governed manually (instability)

REM: usually different SW implementations to avoid deterministic CCF
Usually valid for any redundant control system with SW control algorithm

Components degradation with Redundancy


Different interpretations of FO-FO-FO-F according to
application of the component.
Three basic cases:
- Nominal working function with no safety issue
- Nominal working function with safety-critical requirements
- Additional safety function to handle nominal system faults
leading to hazards


Components degradation with Redundancy


Different interpretations of FO-FO-FO-F according to
application of the component.
Nominal working function with no safety issue
Possible safety issues are solved with additional safety function/
measures
Objective: to improve availability
The system with that component can work safely until the F
condition is reached

When F is reached, the safety function/measure takes over, if needed

Components degradation with Redundancy


Different interpretations of FO-FO-FO-F according to
application of the component.
Nominal working function with safety-critical requirements
If the component fails, the function is lost and a hazard occurs
immediately
The system with that component can work safely up to two faults
before the F:
FO-FO-F
When the FO just before F is reached (FO-F), the system should be steered
quickly to a safe stop condition to prevent the final fault

Rem: If the stop procedure is too long and the probability of the final fault
during such stop procedure is unacceptable, then the safe stop should be
started just after the previous fault.

Components degradation with Redundancy


Different interpretations of FO-FO-FO-F according to
application of the component.
Additional safety function to handle nominal system faults
leading to hazards
If the component fails, the safety function is lost, but the hazard
depends on the nominal system faults
The system with that component can work up to the F condition of the
safety function
As soon as F occurs, the system should be steered quickly to a safe
stop condition to prevent system faults leading to a hazard with no
safety function

Rem: If the stop procedure is too long and the probability of the faults during
such stop procedure is unacceptable, then the safe stop should be started
just after the previous fault.

Fault Tolerant Sensors


A fault-tolerant sensor configuration should be at least fail-operational (FO) for one sensor fault.
Otherwise the fault can open the feedback loop!!

This can be obtained by:
- hardware redundancy with the same type of sensors
(static/dynamic redundancy)
- analytical redundancy with different sensors and process models
(virtual sensors)

Sensor hardware redundancy


Sensor systems with static redundancy are realized with a
triplex system and a voter
[Figure: Sensor 1, Sensor 2 and Sensor 3 feeding a VOTER with output xm]

Sensor hardware redundancy


Sensor dynamic redundancy needs at least two sensors and
fault detection for each sensor.

Usually only hot standby is feasible.

[Figure: Sensor 1 and Sensor 2 with Fault Detection and Reconfiguration blocks selecting the output xm]

Often Fault Detection uses knowledge and models of the Plant
Another, less powerful, possibility is to use plausibility checks for the two
sensors, as well as signal models (e.g., variance), to select the
more plausible one.

Sensor analytical redundancy


Consider a process with one input u, one main output y1
and an auxiliary output y2.

Assuming the process input signal u is not available,
one of the output signals (e.g., y1) can be reconstructed and
used as a redundant signal if the process models GM1 and GM2
are known and significant disturbances do not appear (ideal
case)

[Figure: process with blocks G1 and G2 from u to y1 and y2, measured by Sensor 1 and Sensor 2; the process model GM1/GM2 reconstructs y1 from y2]

Sensor analytical redundancy


For a process with only one output sensor y1 and one input
sensor u, the output can be reconstructed if the process
model GM1 is known

[Figure: process G1 with Sensor 1 measuring y1 and an Input Sensor measuring u; the process model GM1 reconstructs y1 from u]

Sensor analytical redundancy


Remark: a single analytical redundancy enables fault
detection but not tolerance (similar to static duplex)

[Figure: the two previous schemes: y1 reconstructed from y2 through GM1/GM2, and y1 reconstructed from the input sensor through GM1]

Sensor analytical redundancy


To obtain a fault-tolerant measurement, at least 3 different
values for y (e.g. 1 measurement + 2 reconstructions) must be available

A sensor fault is then detected and masked by a majority voter to
obtain FO-F (further decisions on the first FO depend on context: safety
related or not, see before)

[Figure: the measured y1, y1 reconstructed from y2 through GM1/GM2 and y1u reconstructed from the input sensor through GM1 feed a VOTER giving yFT]

Sensor analytical redundancy


Remark 1:
This kind of scheme can be adjusted to get
hot-standby dynamic redundancy (FO-F)

One of the analytical redundancies (u and GM1) is used to take a decision
ASYMMETRY: this one is assumed PERFECT to achieve fault detection

[Figure: same scheme as before, with the VOTER replaced by a Reconfiguration block that selects yFT using y1u]

Sensor analytical redundancy


Remark 1 (contd):

When is such ASYMMETRY convenient?

- If the sensor and both reconstructions have similar accuracy, a triplex with voter
is preferable (SYMMETRICAL SOLUTION)
- If one reconstruction has poorer accuracy, but is very reliable (practically fail-free), it can be used for fault detection

[Figure: hot-standby scheme with Reconfiguration block, as in Remark 1]

Sensor analytical redundancy


Remark 2 (Triplex with voter):
Up to now, analytical redundancy has been used to mimic static HW
redundancy just for y1
Actually, Analytical Redundancy can be exploited for interlaced
knowledge/model-based multiple redundancy (i.e. also for y2 and u)

[Figure: triplex-with-voter scheme for y1, as before]

Sensor analytical redundancy


Interlaced knowledge-based multiple redundancy:
In the considered example, a general fault-tolerant sensor
system can be designed if two output sensors and one input
sensor yield measurements of the same quality (symmetry).
No HW redundancy is needed. Plant knowledge is exploited
Three test signals can be generated (residuals), and by
decision logic, fault-tolerant outputs can be obtained in the
case of single faults of any of the three sensors.

The residuals can be generated by parity equations based on
models or by suitable state observers

Sensor analytical redundancy


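[Figure: interlaced analytical redundancy scheme. Sensor 1, Sensor 2 and the Input Sensor measure y1, y2 and u; the process models GM1, GM2 and GM1/GM2 give the reconstructions y1u, y2u and y1 from y2; the "Residual Generation and Decision Logic" block computes the residuals r1, r2, r3 and gives the fault-tolerant signals y1FT, y2FT, uFT]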
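The residual generation and decision logic can be illustrated with a deliberately simplified sketch. It assumes static process models (plain gains `gm1` and `gm2`, so that ideally y1 = gm1*u and y2 = gm2*u), a single common threshold and single faults only; the residual definitions, the function name and the numbers are assumptions made for illustration, not the scheme of the slides.

```python
def fault_tolerant_measurements(y1, y2, u, gm1, gm2, threshold):
    """Isolate a single faulty sensor from three parity residuals and rebuild its signal."""
    r1 = y1 - gm1 * u             # affected by Sensor 1 and by the input sensor
    r2 = y2 - gm2 * u             # affected by Sensor 2 and by the input sensor
    r3 = y1 - (gm1 / gm2) * y2    # affected by Sensor 1 and by Sensor 2
    fired = (abs(r1) > threshold, abs(r2) > threshold, abs(r3) > threshold)

    if fired == (False, False, False):
        faulty = None
    elif fired == (True, False, True):
        faulty, y1 = "sensor1", gm1 * u      # rebuild y1 from the input
    elif fired == (False, True, True):
        faulty, y2 = "sensor2", gm2 * u      # rebuild y2 from the input
    elif fired == (True, True, False):
        faulty, u = "input", y1 / gm1        # rebuild u from y1
    else:
        faulty = "unknown"                   # pattern not explained by a single fault
    return faulty, (y1, y2, u)


# Hypothetical plant: y1 = 2*u, y2 = 4*u, true input u = 1
healthy = fault_tolerant_measurements(2.0, 4.0, 1.0, gm1=2.0, gm2=4.0, threshold=0.5)
sensor1_fault = fault_tolerant_measurements(5.0, 4.0, 1.0, gm1=2.0, gm2=4.0, threshold=0.5)
```

Each sensor fault affects exactly two of the three residuals, so the pattern of fired residuals isolates the faulty sensor. With dynamic models the products above become filters or observers, and thresholds must account for model uncertainty.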

Remark
Analytical redundancy vs Diagnosis Algorithms
Analytical redundancy aims at providing an additional copy of
a piece of information already available from another source
Obtained through a different path, based on an algorithm
Diagnosis algorithms aim at detecting when a piece of information (or a
component) is not reliable
An additional copy is not strictly necessary

Usually:
- analytical redundancies give diagnosis
- diagnosis algorithms do not give analytical redundancy

Fault Tolerant Actuators


Actuators generally consist of different parts:
- input transformer,
- actuation converter,
- actuation transformer,
- actuation element
(e.g., dc amplifier, dc motor, gear and valve).
Available measurements are frequently the input signal ui, the
manipulated variable uo, and an intermediate signal u3.

[Figure: actuator chain: Signal Transformer (Amplifier) -> Actuation Converter (Motor) -> Actuation Transformer (Gear) -> Actuation Element (Valve), with signals ui, u1, u2, u3, uo]

Fault Tolerant Actuators


Multiple complete actuators in parallel
with either static redundancy or dynamic redundancy with cold or hot
standby.

One example of static redundancy is hydraulic actuators for fly-by-wire
aircraft, where at least two independent actuators operate with two
independent hydraulic energy circuits.

Limit the redundancy to the parts of the actuator that have the
lowest reliability.

E.g. the actuation converter is split into separate parallel parts.
As cost and weight are generally higher for actuators than for sensors,
actuators with fail-operational duplex configuration are preferred.
One goal should always be that the faulty part of the actuator fails
silent: it has no influence on the redundant parts.

Comments on redundancy (1/2)


We expect that redundancy gives better reliability in
every condition. Is it true?
Example: a triplex with an ideal voter (2oo3)
o Compute R(t) w.r.t. a single component
o Check by Matlab
o What is the result?
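The question can be explored numerically (here in Python rather than Matlab). The sketch assumes identical, independent components with a constant failure rate, i.e. R(t) = exp(-lambda*t), and an ideal voter, so that the 2oo3 system works when at least two components work: R_2oo3 = 3R^2 - 2R^3. The failure rate value is hypothetical.

```python
import math


def r_single(lam, t):
    """Reliability of one component with constant failure rate lam."""
    return math.exp(-lam * t)


def r_2oo3(lam, t):
    """Reliability of a 2-out-of-3 system with an ideal voter."""
    r = r_single(lam, t)
    return 3.0 * r**2 - 2.0 * r**3


lam = 1e-3                          # hypothetical failure rate [1/h]
t_cross = math.log(2.0) / lam       # time at which R = 0.5

early = (r_2oo3(lam, 100.0), r_single(lam, 100.0))
late = (r_2oo3(lam, 2000.0), r_single(lam, 2000.0))
```

Under these assumptions the triplex is better than the single component only while R(t) > 0.5, i.e. for t < ln(2)/lambda; for longer mission times without maintenance it is actually worse.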

Comments on redundancy (2/2)


Actually, redundancy gives its best only when its inherent
diagnosis capability is used to trigger maintenance
Again the triplex example
o Markov model and Matlab
o Check also asymptotic availability
The same maintenance is assumed after total failure (Question)
Computations and check by Matlab
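A possible Markov model for the triplex with maintenance is sketched below (in Python rather than Matlab). The assumptions are mine: three states (S0: all healthy, S1: one component failed, F: system failed), constant failure rate `lam` per component, one repair action at rate `mu` that restores S0 from S1, and, for the availability, the same repair rate from F back to S0. All names and numbers are hypothetical.

```python
def mttf_2oo3(lam, mu=0.0):
    """Mean time to system failure starting from S0 (F absorbing).

    Solves 3*lam*T0 - 3*lam*T1 = 1 and -mu*T0 + (2*lam + mu)*T1 = 1 by Cramer's rule.
    """
    a11, a12 = 3.0 * lam, -3.0 * lam
    a21, a22 = -mu, 2.0 * lam + mu
    det = a11 * a22 - a12 * a21
    return (a22 - a12) / det


def availability_2oo3(lam, mu):
    """Asymptotic availability when F is also repaired back to S0 at rate mu."""
    pi1 = 1.0                                  # unnormalised probability of S1
    pi_f = 2.0 * lam * pi1 / mu                # balance of F: mu*piF = 2*lam*pi1
    pi0 = (pi1 + pi_f) * mu / (3.0 * lam)      # balance of S0: 3*lam*pi0 = mu*(pi1 + piF)
    return (pi0 + pi1) / (pi0 + pi1 + pi_f)


lam, mu = 1e-3, 1e-1                           # hypothetical rates [1/h]
mttf_no_repair = mttf_2oo3(lam)                # 5 / (6*lam), below 1/lam of one component
mttf_with_repair = mttf_2oo3(lam, mu)          # (5*lam + mu) / (6*lam**2)
availability = availability_2oo3(lam, mu)
```

Without repair the triplex MTTF is lower than that of a single component; with repair triggered by the first detected fault it becomes much larger, which is the point of the slide.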

What else?
Reliability of the human operator?
Reliability of algorithms for analytical redundancy?
Similar to software reliability
Systematic faults
o Formal validation and verification
o Extensive tests (corner cases)

But random faults, as well
o Models are not perfect: stochastic and deterministic uncertainties
o Robustness?
o Silent algorithms?
o Check by Monte Carlo Analysis?

Automotive
Many nominal working functions are
- safety functions, as well
- or need high availability for customer satisfaction
Keep operational (possibly degraded) after a fault to prevent
hazards or increase availability

Large use of analytical redundancy (for diagnosis and
fault tolerance)
Reduce costs and HW complexity
Achieve diagnosis and some fault tolerance for sensors,
actuators and other components (i.e. plant)
o HW redundancy is used only when unavoidable, mainly for actuators
or some safety measures (e.g. mechanical limp home)

Algorithms are crucial!

Automotive
Reliability/integrity of such algorithms?
A single computing unit (ECU) for each automotive
system is usually adopted (no computing HW
redundancy): how to improve its reliability/integrity?
Not one for the whole car, but one for a subsystem or a set of
subsystems:
o A main ECU for Engine Control and many others: ABS ECU,
ESC ECU

What is the common solution?
Usually represented by four levels (0-3)
o Often referred to as diagnostic levels, but they are more than that

Automotive
LEVEL 0: ELECTRIC SIGNALS CONSISTENCY
Very basic HW checks on electric signals in ECU I/O
Minimal use of information on the system under control
o Limit checking
o Max/min slope
o Stuck signal
o ...

If any consistency check is violated, a fault detection alarm is
issued
Usually managed in the next level
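The Level 0 checks listed above (limit, slope, stuck signal) can be sketched on a buffer of samples. In an ECU they are implemented in HW or low-level SW; the function below is only an illustration, and its name, parameters and thresholds are hypothetical.

```python
def level0_checks(samples, dt, lower, upper, max_slope, stuck_samples):
    """Return the set of violated consistency checks for a buffer of signal samples."""
    alarms = set()

    # Limit checking: every sample must stay inside the electrical range
    if any(s < lower or s > upper for s in samples):
        alarms.add("limit")

    # Max/min slope: the signal cannot change faster than physically plausible
    if any(abs(b - a) / dt > max_slope for a, b in zip(samples, samples[1:])):
        alarms.add("slope")

    # Stuck signal: too many consecutive identical samples
    run = 1
    for a, b in zip(samples, samples[1:]):
        run = run + 1 if a == b else 1
        if run >= stuck_samples:
            alarms.add("stuck")
            break
    return alarms


ok = level0_checks([1.0, 1.1, 1.2, 1.3], dt=0.01, lower=0.0, upper=5.0,
                   max_slope=50.0, stuck_samples=3)
spike = level0_checks([1.0, 6.0], dt=0.01, lower=0.0, upper=5.0,
                      max_slope=50.0, stuck_samples=3)
```

These checks use almost no knowledge of the plant, which is why their alarms are usually interpreted by the functional diagnosis of the next level.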

Automotive
LEVEL 1: FUNCTIONAL DIAGNOSIS
Functional because it considers the functions linked to signals.
Models and knowledge about the plant

Open loop models, observers and other algorithms are
combined with measurements and the results of Level 0 to get:
interlaced model-based multiple analytical redundancy
This leads to:
- fault detection and isolation
- fault tolerance, when possible
It triggers safety or just recovery
functions when needed

[Figure: interlaced analytical redundancy scheme with Residual Generation and Decision Logic, as in the sensor analytical redundancy slides]

Automotive
LEVEL 2: SUPERVISORY DIAGNOSIS
Can we trust the algorithms giving analytical redundancy at
LEVEL 1?
Random faults are possible due to approximations!

Solution: other algorithms to check the consistency of the
LEVEL 1 results
Based on different models
o The same output considered from a different side of the physical system

Simple and approximated
o The differences between the Level 1 and Level 2 results should be within a given
tolerance range.

Discrepancies trigger some safety/recovery functions

Automotive
LEVEL 3: COMPUTING HW OPERATIONAL SUPERVISION
ECU: unique computing platform (µC-based + RTOS) for basic
control functions and Levels 0-1-2 diagnosis and functions
Limited fault-tolerance capabilities
µC+RTOS fault probability is fairly low

But incipient µC+RTOS faults
have to be detected as soon as
possible to prevent hazards
Self-test routines running on the µC
Auxiliary simple µC (smart watchdog)
It checks the main one by questions
It can perform some Level 3 safety functions
when needed and possible
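The question-based check performed by the smart watchdog can be illustrated as a challenge-response scheme. This is only a behavioural sketch: the answer function, the class name and the miss counter policy are assumptions, not a description of any specific automotive device.

```python
def expected_answer(question):
    """Hypothetical computation the main microcontroller must perform correctly."""
    return (question * 31 + 7) % 256


class SmartWatchdog:
    """Auxiliary unit that questions the main controller and requests a safe state."""

    def __init__(self, max_misses=2):
        self.max_misses = max_misses
        self.misses = 0
        self.safe_state_requested = False

    def check(self, question, answer):
        """Evaluate one answer (None models a missing or late answer)."""
        if answer == expected_answer(question):
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.safe_state_requested = True   # latched: needs an explicit reset
        return self.safe_state_requested


watchdog = SmartWatchdog(max_misses=2)
first = watchdog.check(5, expected_answer(5))   # correct answer
second = watchdog.check(6, None)                # missing answer
third = watchdog.check(7, 0)                    # wrong answer
```

Compared with a simple timer watchdog, a wrong answer reveals a computation fault even when the main controller is still running and answering in time.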

Automotive
REMARK: FUNCTIONS TRIGGERED BY LEVELS 0-1-2-3
Active, or to enable passive countermeasures
Active: for fault tolerance or other active functions
To enable passive countermeasures: impose fail-silent behaviour so that the mechanical
countermeasure is effective
o Limp home

The higher the triggering level is, the more abrupt and unpleasant the
actions are
If Level 2 or 3 trigger actions: very critical conditions
o No margins for smooth recovery

Recording in the ECU fault/event list for maintenance is mandatory
Often they include signaling to the driver (dashboard)

Automotive
COMMENT: AUTOMOTIVE DIAGNOSTIC LAYERS vs GENERAL
ARCHITECTURES
Similarities with the Layers of Protection seen in LOPA
- Level 0 and Level 1 should detect all the faults and
trigger the related actions
- If they fail, Level 2 should act
Level 3 is rather unusual and parallel to the previous ones:
Shared computation HW + external testing unit
No clear reliability assessment for the layers
o No clear quantitative methods to evaluate the reliability of algorithms
o you add an algorithm and give a personal qualitative assessment
o you try to validate it by simulations and tests

Automotive
FEATURES & TRENDS:
Many ECUs by different manufacturers for automotive subsystems
(usually referred to as tier-1 providers for car-makers)
o E.g. Bosch, Magneti Marelli, Delphi, Omron

Functions on an ECU can be developed by third parties
o Not the tier-1 providers, but the car-maker itself or other external entities selected
by the car-makers.

Integration in a comprehensive nominal control + safety/diagnostic
system?
Standardization is needed
o AUTOSAR for general organization
o ISO 26262 functional safety in automotive (inspired by IEC 61508)

Automatic Machinery

PRELIMINARIES: Typical functional model


Reference unit: Machine (or part of it)
Group of (1-dof) mechanisms which have to give synchronized
motion
In the past: mechanical links from a single source
Nowadays: electronic control is more and more common
electric drives

Electric or Electronic Cams
E.g.: labelling machine (course introduction)

[Figure: the machine mechanisms driven through Mechanical Axes and through Electronic-Electric Axes]

Automatic Machinery
Typical functional model:

Logic control is very important:
Almost completely automatic
Working and manipulating sequences
trigger:
- different motion trajectories
- different links among the motions of each mechanism
Significant direct actions on the field

Automatic Machinery
Typical functional model (contd):

Control of temporal systems
Almost completely inside the electric drives
Often seen as intelligent actuators

Some other occurrences
Embedded controllers
E.g. temperature control of labelling glue
E.g. mechanical tension control
(films for packaging: dancer mechanisms)

Monitoring/action by a human operator is occasional
Only for relevant changes in the performed manufacturing
E.g.: change in product format (format change-over)

Automatic Machinery
Typical function model:
[Figure: layered functional model: Supervision (Supervisor), Control (Logic Ctrl, Trajectory generator, F(z) blocks, MotorCtrl blocks) and Plant]

Automatic Machinery
Typical technological architecture
[Figure: PLC / Axes Controller (Motion Control System) connected through a field-bus to the Vector Drives (Inverters) of the machine motors and to the machine I/O]

Automatic Machinery
What about safety?
Automatic machines work without humans
Main potential safety issues (according to standards/rules):
- Contact of moving parts with humans
- Some objects are lost and thrown away: they could hit people or
property
- Release of chemicals in the nearby environment (pharmaceutics)

Main approach:
Mechanical barriers!
o PASSIVE SAFETY MEASURES


Automatic Machinery
Whatever fault we consider by FMECA, HAZOP, ...
mechanical passive barriers reduce the hazard risk
to a very low level
This is not a solution for availability
For safety (mandatory by law) it is usually enough.

No need of functional safety?

Automatic Machinery
No need for functional safety?
Barriers can be opened accidentally
while the machine is running
Sometimes the barrier is only optical (light curtains)
o Photocells - Optical fork

Machine must stop ASAP in a safe way

Additional requests by standards/rules:
- External emergency button
- Safety during maintenance with the machine moving
- Stop if a max-safe-speed is exceeded

Safety functions are needed for that!!




Automatic Machinery
Basic safety function:
stop the machine as soon as the barriers are opened, or
the emergency button has been pressed, or the max-safe-speed is exceeded during maintenance.
Very simple
The main trouble is: integrity certification
Standard PLC/Motion Controller cannot be used to
implement the Safety Function
o Computation HW reliability is not suitable
o Automotive-like solutions (additional µC) look too expensive:
  the design process is expensive w.r.t. production volumes and the size of the safety function
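
The basic safety function stated on this slide is plain combinational logic, and a short sketch can make the stop condition explicit. The Python below is only an illustration under assumed names: `SafetyInputs`, `stop_requested`, the signal names and the speed units are hypothetical and do not come from any standard or product. As the slide points out, a real safety function has to run on integrity-certified hardware, so this shows the logic only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyInputs:
    """Hypothetical snapshot of the signals read by the safety function."""
    barrier_closed: bool      # True when every guard/barrier is closed
    emergency_pressed: bool   # True when the emergency button is pressed
    maintenance_mode: bool    # True while the machine is under maintenance
    speed: float              # measured machine speed (arbitrary units)


def stop_requested(inp: SafetyInputs, max_safe_speed: float) -> bool:
    """Return True if the machine must be stopped.

    Stop when a barrier is open, OR the emergency button is pressed,
    OR the max-safe-speed is exceeded while in maintenance.
    """
    overspeed = inp.maintenance_mode and abs(inp.speed) > max_safe_speed
    return (not inp.barrier_closed) or inp.emergency_pressed or overspeed


# Usage: normal production at high speed, barriers closed: no stop.
running = SafetyInputs(barrier_closed=True, emergency_pressed=False,
                       maintenance_mode=False, speed=120.0)
# Usage: maintenance with speed above the assumed limit: stop.
maintenance = SafetyInputs(barrier_closed=True, emergency_pressed=False,
                           maintenance_mode=True, speed=12.0)
print(stop_requested(running, max_safe_speed=10.0))
print(stop_requested(maintenance, max_safe_speed=10.0))
```

In this sketch the speed limit is checked only in maintenance mode, matching the wording of the slide; during normal production only the barrier and emergency-button conditions apply.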

Automatic Machinery
The main trouble is integrity certification (contd)
Use simple safety relays to make the computations
o Electromechanical component integrity certification is available
o Some troubles:
  rough stop management -> long downtimes to restore the machine

Nowadays: smarter solutions

Remark: these are examples of SIS



Automatic Machinery
Safety-relay-based solution


Automatic Machinery
New solution: safe-logic, safe fieldbus, safe drives


Automatic Machinery
Norms for automatic machine safety:
ISO EN 954, valid up to 31st Dec. 2011
ISO EN 13849
o More oriented to mechanical and electromechanical safety functions

IEC EN 62061
o More oriented to Programmable Electronics (IEC 61508)

Some overlap between the scopes of the ISO and IEC norms, but with some differences in safety solutions.
A joint ISO/IEC norm is expected in 2016(?): ISO-IEC 62737

Bibliography
1. CEI EN 61508: Sicurezza funzionale dei sistemi elettrici, elettronici ed elettronici programmabili per applicazioni di sicurezza (Functional safety of electrical/electronic/programmable electronic safety-related systems), CEI, 2002.
2. IEC EN 61511: Functional safety - Safety instrumented systems for the process industry sector, IEC, 2003.
3. Smith, D.J., Simpson, K.G.L., Functional Safety, Elsevier, 2004.
4. AA.VV., Guidelines for Hazard Evaluation Procedures, CCPS Center for Chemical Process Safety, 1995.
5. Lees, F.P., Loss Prevention in the Process Industries, Butterworth-Heinemann, 1996.
6. Andrews, J.D., Moss, T.R., Reliability and Risk Assessment, Professional Engineering Publications, ISBN 1 86058 290 7, 2002.
7. Birolini, A., Reliability Engineering: Theory and Practice, Springer Verlag, ISBN 3 540 66385 1, 1999.
8. Grassani, E., La sicurezza sulle macchine, Editoriale Delfino, ISBN 978 88 89518 50 2, 2008.
9. Isermann, R., Schwarz, R., Stolz, S., Fault-Tolerant Drive-by-Wire Systems, IEEE Control Systems Magazine, October 2002.
