LM Automation Engineering
Quality
(reliability/availability)
Safety
Countermeasures:
Focus on algorithms: mainly diagnosis
Some basics on FTC (but case dependent)
Some basics also on other countermeasures
Diagnosis and Control Safety design
Definition
According to IEC-61508 (later):
SAFETY is freedom from unacceptable risk of
physical injury or of damage to the health of
people, property or environment.
Why is this definition relevant for engineering
systems?
Introduction
1976: Seveso disaster
uncontrolled exothermal reaction
causing the dissemination of dioxin
1986: Chernobyl disaster
Failure during the test of a safety
emergency core cooling feature
1986: Space Shuttle Challenger
O-ring seal in its right solid rocket
booster failed at liftoff
1996: ESA Ariane 5 unmanned rocket
Inertial reference software exception
Definitions
According to Birolini, "Reliability Engineering: Theory and
Practice",
safety can be divided into:
Accident prevention
Technical safety
Technical Safety
Safety preservation considering
a) system malfunctions (faults)
b) off-nominal external conditions
c) incorrect human behaviour (when possible)
which have (or could have) consequences affecting the
health of people or of the environment.
Set of countermeasures to guarantee acceptable risk
Example:
Does an event with frequency 0.1 per year (i.e. once every 10 years) have
a probability of 0.1 of occurring within 1 year?
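One way to look at the question above is sketched here. It assumes (this is not stated on the slide) that occurrences follow a Poisson process with constant rate, so the probability of at least one occurrence in time t is 1 - exp(-rate * t). The function name is illustrative.

```python
import math

def prob_at_least_one(rate_per_year: float, years: float) -> float:
    """Probability of one or more occurrences in `years`.

    Assumption: occurrences follow a Poisson process with constant rate.
    """
    return 1.0 - math.exp(-rate_per_year * years)

p_1y = prob_at_least_one(0.1, 1.0)    # close to, but strictly below, 0.1
p_10y = prob_at_least_one(0.1, 10.0)  # well below 1 even after 10 years
```

Under this assumption frequency and probability are numerically close only when rate * t is small; they are not the same quantity.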
Frequency
Consequences
Fault Avoidance
Fault Removal
Fault Tolerance
IEC, ISO EN
Mandatory
Mainly intended for systematic faults, but they can also be applied to random faults, when
possible, by increasing the reliability of the basic components
Rem:
- A systematic fault is a design mistake which leads to a certain fault whenever certain conditions occur
- A random fault is an occasional fault due to normal ageing of components
Hazards = events in the system which could directly cause harm to humans,
the environment, etc.
From now on we mainly focus on the first one, for the sake of simplicity
No learning by mistake!
Risk reduction
[Flowchart: Identification of hazards and effects -> Frequency Evaluation and Consequence Evaluation -> Risk Determination -> Risk Evaluation (against Risk Acceptability Criteria) -> Risk Acceptance? NO: Risk Reduction Measures, then re-evaluate; YES: Set Functional Requirements]
Risk reduction
Definition of risk reduction countermeasure:
What it is
The effect on the first causes or on the cause-effect chain has to
be clarified
How reliable it is
It has to be combined with original hazard frequency/probability
to assess actual risk reduction.
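The combination described above can be sketched as follows. This assumes the countermeasure is characterised by a probability of failure on demand (PFD) and is independent of the hazard cause; the numbers and function names are hypothetical.

```python
def mitigated_frequency(hazard_freq_per_year: float, pfd: float) -> float:
    """Frequency of the hazardous event when the countermeasure fails to act.

    Assumption: the countermeasure fails on demand with probability `pfd`,
    independently of what caused the hazard.
    """
    return hazard_freq_per_year * pfd

def risk_reduction_factor(pfd: float) -> float:
    """Factor by which the original hazard frequency is divided."""
    return 1.0 / pfd

# Hypothetical figures: hazard once every 10 years, countermeasure PFD = 0.01
residual = mitigated_frequency(0.1, 0.01)
rrf = risk_reduction_factor(0.01)
```

The less reliable the countermeasure (higher PFD), the smaller the actual risk reduction, whatever its intended effect.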
Risk reduction
A possible false friend:
Sometimes a risk reduction measure can introduce new hazards
[Flowchart fragment: Risk Reduction Measures, Frequency Evaluation, Consequence Evaluation, Risk Determination, Risk Evaluation]
Functional Safety
Functional safety is part of the overall safety
It deals with safety functions
Safety functions are a particular kind of risk reduction
measure, based on the execution of some
actions (such as sequences of operations) when a particular
hazard occurs, or to prevent hazards
[Flowchart: Frequency Evaluation, Consequence Evaluation, Risk Determination, Risk Evaluation, Risk Acceptability Criteria, Risk Acceptance, Risk Reduction Measures, Set Functional Requirements. Risk scale (increasing risk): Residual Risk, Tolerable Risk, EUC Risk]
Prescriptive Standards
Courtesy of Exida
[Figure: risk scale (increasing risk) from Tolerable Risk to EUC Risk; total/partial risk covered by other technologies SRS; total/partial risk covered by E/E/PE SRS]
Risk reduction achieved by one or more safety-related systems which host safety
functions and possibly external risk reduction facilities (SAFETY ALLOCATION)
[Figure: Equipment Under Control (EUC) with additional SRS or safety measures; focus on E/E/PE SRS]
REALIZATION:
Design
Implementation
Verification
Documentation
OPERATION:
Startup
Operation
Maintenance
Modifications
Decommissioning
[Figure: overall safety lifecycle. Boxes shown: Concept; Overall Scope Definition; Hazard and risk analysis; Overall Safety requirements; Safety requirements allocation; Operation and maintenance planning; Safety Validation planning; Installation and commissioning planning; 9 Safety-related Systems E/E/PES: Realization; 10 Safety-related Systems, Others: Realization; 11 External risk reduction facilities: Realization; 15 Overall modification and retrofit; boxes 12, 13, 14 and 16 (labels not recoverable)]
Safety-related Systems E/E/PES: Realization
9.1 Safety Requirement Specifications (Functions, Integrity)
9.2 Validation Planning
9.3 Design and development
9.4 Integration
9.5 Installation, commissioning, Operation, Maintenance
9.6 Safety Validation
[Figure: Safety-related Systems E/E/PES Realization; axes: Time, Details]
Design issues are for the COTS SIS manufacturers, not for users
DAP is non-hazardous.
Ammonia is the most hazardous chemical present.
Qualitative methods
Aims:
Scope:
Tools:
When:
At design stage
Basic concepts:
Partitioning the system in components/items/nodes
Why systematic?
Consider each plant item and apply a set of predefined guide words
to each item parameter to find possible deviations
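The systematic step described above can be sketched as a simple enumeration. The plant items, their parameters and the reduced guide-word list below are hypothetical placeholders; a real study uses the list agreed by the HAZOP team and then discards combinations that are not meaningful.

```python
from itertools import product

# Hypothetical, reduced guide-word list and plant description
GUIDE_WORDS = ["NO", "MORE", "LESS", "REVERSE"]
ITEMS = {
    "feed line": ["flow", "pressure"],
    "reactor": ["temperature", "level"],
}

def candidate_deviations(items, guide_words):
    """Return every (item, parameter, guide word) combination to be examined."""
    rows = []
    for item, params in items.items():
        for param, guide_word in product(params, guide_words):
            rows.append((item, param, guide_word))
    return rows

deviations = candidate_deviations(ITEMS, GUIDE_WORDS)
for item, param, guide_word in deviations[:3]:
    print(f"{item}: {guide_word} {param}")
```

Enumerating every combination is what makes the method systematic: no item or parameter is skipped because it looks harmless.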
Item keywords
Outputs
HAZARD
TOP EVENT
Define them
Identify their failure modes
Evaluate their failure rates
FMEA: worksheets
FMECA: worksheets
[Figure: risk matrix; axes: Increasing Frequency, Increasing Severity]
Remark
HAZOP and FMECA do not give only a list of hazards
They were developed for any kind of failure (reliability)
Risk assessment and reduction are also partially covered
o In safety, however, we are mainly interested in using them to obtain the
hazard list
Remark
Besides the previous problem (how to be sure about hazard list
completeness...):
hazard frequency assessment!
Quantitative methods
Guide Word
Parameter
TOP EVENT
Focus on failures
Try to reduce the probability of the top event
What could cause the top event to occur?
Only relevant failure modes are considered
Event symbols
Gate symbols
Transfer symbols
Gates: AND, Priority AND, OR, XOR, K-out-of-N, NOT, Inhibit gate
Develop the fault tree for a system with the following diagram
[Figure: system diagram with components A, B, C and D]
Top Event: no current out of end
Resulting fault tree:
No current at end = AND(A fails, B fails, No current out of D)
No current out of D = OR(C fails, D fails)
Cut sets are the unique combinations of component failures that can
cause system failure
A cut set is said to be a minimal cut set if, when any basic event is
removed from the set, the remaining events collectively are no longer a
cut set
$$MC_1 = \{A, B, C\}, \qquad MC_2 = \{A, B, D\}$$
$$\Pr(T) = \Pr\big((A \cap B \cap C) \cup (A \cap B \cap D)\big)$$
$$= \Pr(A \cap B \cap C) + \Pr(A \cap B \cap D) - \Pr(A \cap B \cap C \cap D)$$
$$= \Pr(A)\Pr(B)\Pr(C) + \Pr(A)\Pr(B)\Pr(D) - \Pr(A)\Pr(B)\Pr(C)\Pr(D)$$
$$\Pr(T) = \Pr\big(A \cap B \cap (C \cup D)\big)$$
$$= \Pr(A)\Pr(B)\Pr(C \cup D)$$
$$= \Pr(A)\Pr(B)\big[\Pr(C) + \Pr(D) - \Pr(C)\Pr(D)\big]$$
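The fault tree of this example (top event = A fails AND B fails AND (C fails OR D fails)) can be checked numerically. The sketch below finds the minimal cut sets by brute-force enumeration, which is only practical for very small trees, and evaluates the top-event probability in two ways. It assumes independent basic events; the probability values are hypothetical.

```python
from itertools import combinations

EVENTS = ("A", "B", "C", "D")

def top_event(failed):
    """No current at end: A AND B AND (C OR D)."""
    return "A" in failed and "B" in failed and ("C" in failed or "D" in failed)

def minimal_cut_sets():
    """Enumerate failure combinations by increasing size; keep the minimal ones."""
    cuts = []
    for size in range(1, len(EVENTS) + 1):
        for combo in combinations(EVENTS, size):
            candidate = frozenset(combo)
            # A superset of an already found cut set is a cut set, but not minimal
            if top_event(candidate) and not any(c <= candidate for c in cuts):
                cuts.append(candidate)
    return cuts

def top_probability_from_tree(p):
    """Pr(A) Pr(B) Pr(C or D), assuming independent basic events."""
    p_c_or_d = p["C"] + p["D"] - p["C"] * p["D"]
    return p["A"] * p["B"] * p_c_or_d

def top_probability_from_cut_sets(p):
    """Inclusion-exclusion over the two minimal cut sets {A,B,C} and {A,B,D}."""
    abc = p["A"] * p["B"] * p["C"]
    abd = p["A"] * p["B"] * p["D"]
    abcd = p["A"] * p["B"] * p["C"] * p["D"]
    return abc + abd - abcd

p = {"A": 0.1, "B": 0.2, "C": 0.05, "D": 0.01}  # hypothetical values
cuts = minimal_cut_sets()
pr_tree = top_probability_from_tree(p)
pr_cuts = top_probability_from_cut_sets(p)
```

Both routes give the same value, as expected from the two derivations.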
Develop the fault tree for a system with the following diagram
Top Event: no current (flow) out of end
[Figure: system diagram with components V-1, V-2, V-3, V-4, V-5, P-1, P-2, T-1, Sensing and Control, AC Power Source]
Risk Reduction
Up to now: Hazard Analysis and Risk Assessment
Target: reduce the risk to an acceptable level with some
solutions
Acceptable level definition? Field-specific standard/norm
ALARP (see later)
Risk Reduction
We will consider a possible quantitative method:
LOPA
and exploit SIL from IEC 61508 to classify integrity/reliability
requirements
Prob: 1.9E-3
Acceptable? E.g. tolerable approx. 1E-5!
Prob: 1.9E-5
-2 orders of magnitude: SIL 2 (see later)
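The arithmetic behind this example can be sketched as follows. The SIL bands are the average probability of failure on demand ranges given by IEC 61508 for low demand mode; treating the two figures on the slide as directly comparable, and the function names, are assumptions of this sketch.

```python
# IEC 61508 low demand mode: SIL -> (lower bound, upper bound) of average PFD
SIL_PFD_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}

def required_pfd(unmitigated: float, tolerable: float) -> float:
    """Largest PFD of the added safety function that still meets the target."""
    return tolerable / unmitigated

def sil_for_pfd(pfd: float):
    """Return the SIL whose band contains `pfd`, or None if outside all bands."""
    for sil, (low, high) in SIL_PFD_BANDS.items():
        if low <= pfd < high:
            return sil
    return None

pfd = required_pfd(1.9e-3, 1e-5)  # roughly 5.3e-3
rrf = 1.0 / pfd                   # required risk reduction factor
sil = sil_for_pfd(pfd)
```

A required reduction of about two orders of magnitude falls in the SIL 2 band, in line with the remark on the slide.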
[Figure: risk regions: ALARP Region; Broadly acceptable Region (Negligible Risk)]
- Safety functions
- Risk reduction
requests turned into
SIL requests
Diagnostic coverage:
$$DC = \frac{\sum \lambda_{dd}}{\sum \lambda_{d}} = \frac{\sum \lambda_{dd}}{\sum \lambda_{dd} + \sum \lambda_{du}}$$
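A minimal numerical sketch of the diagnostic coverage ratio, reading the dd and du subscripts as detected and undetected dangerous failure rates. The component failure rates below are hypothetical.

```python
def diagnostic_coverage(lambda_dd, lambda_du):
    """DC = sum of detected dangerous rates / sum of all dangerous rates.

    `lambda_dd` and `lambda_du` are per-component lists of detected and
    undetected dangerous failure rates (same unit, e.g. failures per hour).
    """
    total = sum(lambda_dd) + sum(lambda_du)
    if total == 0:
        raise ValueError("no dangerous failures: DC is undefined")
    return sum(lambda_dd) / total

# Hypothetical rates for two components, in failures per hour
dc = diagnostic_coverage([4e-7, 5e-7], [1e-7, 0.0])
```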
REDUNDANCY
Premises:
General objectives we dealt with:
- (increase system availability)
- obtain a prescribed integrity for a safety function/measure
How to get them?
- push toward component perfection
new technologies etc.
REDUNDANCY
Reminder:
Redundancy is the most obvious countermeasure to faults:
if a component does not work at all, or works only partially
(partial fault), another one does its job
For partial faults, this is not the only solution:
Control solutions which can obtain a good (or slightly degraded)
system behaviour
Diagnosis + Control reconfiguration or robustness
Case-dependent (not always possible)
REDUNDANCY
Remark:
Redundancy is usually intended for fault tolerance
Another feature of redundancy is:
Inherent and straightforward fault detection and diagnosis
If two components should give the same output and they do not, a fault has
clearly occurred.
Usually, fault detection needs fewer redundant components than fault
tolerance.
Hot Standby
Cold Standby
[Figure: modules 1, 2, 3, ..., n feeding a VOTER, output xm]
This structure is actually for sensors or controllers, not for
actuators (see later)
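Two simple voting rules for the structure above are sketched here: a median voter for analogue readings and a majority voter for discrete outputs. The slide does not say which rule is used, so both are illustrative choices, and the readings are hypothetical.

```python
from collections import Counter
from statistics import median

def median_voter(readings):
    """Analogue voting: with three modules the median masks one outlier."""
    return median(readings)

def majority_voter(outputs):
    """Discrete voting: return the value produced by more than half the modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among module outputs")
    return value

xm_analogue = median_voter([10.1, 10.0, 55.0])  # third module is faulty
xm_discrete = majority_voter([1, 1, 0])
```

With three modules, either rule tolerates a single faulty module without needing to know which one failed.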
Dynamic redundancy
Requires fewer modules at the cost of more information
processing.
A minimal configuration consists of two modules
consistency checking
comparison with redundant modules
information redundancy (parity checking or watchdog timers).
[Figure: dynamic redundancy scheme with Fault Detection and Reconfiguration blocks, output xm]
Rem: If the stop procedure takes too long and the probability of the final fault
occurring during it is unacceptable, then the safe stop should be
started right after the previous fault.
Static/Dynamic redundancy
Virtual sensors
[Figure: static redundancy: sensors feeding a VOTER, output xm]
[Figure: dynamic redundancy: Sensor 1 and Sensor 2 with Reconfiguration, output xm]
[Figures: analytical redundancy schemes. Blocks: Process, Process Model, Sensor 1 (G1), Sensor 2 (G2), Input Sensor, sensor models GM 1 and GM 2, VOTER, Reconfiguration. Signals: y1, y2, y1u, y2u, y1FT, y2FT, yFT, uFT, residuals r1, r2, r3]
Remark
Analytical redundancy vs Diagnosis Algorithms
Analytical redundancy aims at providing an additional copy of
a piece of information already available from another source
Obtained through a different path based on an algorithm
Diagnosis algorithms aim at detecting when a piece of information (or a
component) is not reliable
An additional copy is not strictly necessary
Usually:
- analytical redundancy gives diagnosis
- diagnosis algorithms do not give analytical redundancy
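The first point of the remark can be illustrated with a residual check: a model fed by the measured input produces an analytical copy of the sensor output, and comparing the copy with the real measurement gives a diagnosis signal. The first-order model, its coefficients, the injected bias and the threshold are all hypothetical.

```python
def simulate_model(u, a=0.9, b=0.1, y0=0.0):
    """Hypothetical first-order process model: y[k+1] = a*y[k] + b*u[k]."""
    y, out = y0, []
    for uk in u:
        out.append(y)
        y = a * y + b * uk
    return out

def residual_alarm(y_measured, y_model, threshold):
    """Flag samples where measurement and analytical copy disagree too much."""
    return [abs(m - e) > threshold for m, e in zip(y_measured, y_model)]

u = [1.0] * 10
y_model = simulate_model(u)  # analytical copy of the sensor output
# Simulated sensor: ideal up to sample 5, then affected by a 0.5 bias fault
y_measured = [y + (0.5 if k >= 6 else 0.0) for k, y in enumerate(y_model)]
alarms = residual_alarm(y_measured, y_model, threshold=0.2)
```

In practice the threshold has to absorb model approximations and noise, which is why the reliability of such algorithms is itself a design issue.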
input transformer,
actuation converter,
actuation transformer,
actuation element (e.g., dc amplifier, dc motor, gear and valve).
Available measurements are frequently the input signal ui,
manipulated variable uo, and intermediate signal u3.
[Figure: actuator chain: ui -> Signal Transformer (Amplifier) -> u1 -> Actuation Converter (Motor) -> u2 -> Actuation Transformer (Gear) -> u3 -> Actuation Element (Valve) -> uo]
What else?
Reliability of human operator?..
Reliability of algorithms for analytical redundancy?
Similar to software reliability
Systematic faults
Automotive
Many nominal working functions
- are safety functions as well
- or need high availability for customer satisfaction
Keep operational (possibly degraded) after a fault to prevent
hazards or increase availability
Automotive
Reliability/integrity of such algorithms?
A single computing unit (ECU) is usually adopted for each automotive
system (no computing HW redundancy): how to improve its reliability/integrity?
Not one ECU for the whole car, but one for a subsystem or a set of
subsystems:
o A main ECU for Engine Control and many others: ABS ECU,
ESC ECU
Automotive
LEVEL 0: ELECTRIC SIGNALS CONSISTENCY
Very basic HW checks on electric signals in ECU I/O
Minimal use of information on the system under control
o Limit checking
o Max/min slope
o Stuck signal
o ...
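The Level 0 checks listed above can be sketched on a buffer of signal samples. In the slides these are basic HW checks on ECU I/O; the software version below only illustrates the logic, and the limits, slope bound, sample time and stuck-sample count are hypothetical parameters.

```python
def level0_checks(samples, lo, hi, max_slope, dt, stuck_count):
    """Return the set of basic signal-consistency faults found in `samples`."""
    faults = set()
    # Limit checking: every sample must stay inside the electrical range
    if any(s < lo or s > hi for s in samples):
        faults.add("limit")
    # Max/min slope: the signal cannot change faster than physically plausible
    if any(abs(b - a) / dt > max_slope for a, b in zip(samples, samples[1:])):
        faults.add("slope")
    # Stuck signal: too many consecutive identical samples
    run = 1
    for a, b in zip(samples, samples[1:]):
        run = run + 1 if a == b else 1
        if run >= stuck_count:
            faults.add("stuck")
            break
    return faults

spike = level0_checks([1.0, 1.1, 1.2, 6.0], lo=0.0, hi=5.0,
                      max_slope=1.0, dt=1.0, stuck_count=3)
frozen = level0_checks([2.0, 2.0, 2.0, 2.0], lo=0.0, hi=5.0,
                       max_slope=1.0, dt=1.0, stuck_count=3)
healthy = level0_checks([1.0, 1.5, 2.0], lo=0.0, hi=5.0,
                        max_slope=1.0, dt=1.0, stuck_count=3)
```

These checks use almost no knowledge of the controlled system, which is what distinguishes Level 0 from the functional diagnosis of Level 1.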
Automotive
LEVEL 1: FUNCTIONAL DIAGNOSIS
Functional because it considers the functions linked to signals.
Models and knowledge about the plant
[Figure: functional diagnosis scheme. Blocks: Process, Process Model, Sensor 1 (G1), Sensor 2 (G2), Input Sensor, sensor models GM 1 and GM 2. Signals: y1, y2, y1u, y2u, y1FT, y2FT, yFT, uFT, residuals r1, r2, r3]
Automotive
LEVEL 2: SUPERVISORY DIAGNOSIS
Can we trust the algorithms giving analytical redundancy at
LEVEL 1?
Random faults are possible due to approximations!
Automotive
LEVEL 3: COMPUTING HW OPERATIONAL SUPERV.
ECU: Unique computing platform (µC-based + RTOS) for basic
ctrl functions and Levels 0-1-2 diagnosis and functions
Limited fault-tolerance capabilities
µC + RTOS fault probability is fairly low
Automotive
REMARK: FUNCTIONS TRIGGERED BY LEVELS 0-1-2-3
Active countermeasures, or actions enabling passive countermeasures
Active: for fault tolerance or other active functions
To enable passive countermeasures: impose fail-silent behaviour so that the mechanical
countermeasure is effective
o Limp home
The higher the triggering level is, the more abrupt and unpleasant the
actions are
If Level 2 or 3 triggers actions: very critical conditions
o No margins for smooth recovery
Automotive
COMMENT: AUTOMOTIVE DIAGNOSTIC LAYERS vs GENERAL
ARCHITECTURES
Similarities with Layers of Protections seen in LOPA
- Level 0 and Level 1 should detect all the faults and
trigger related actions
- If they fail, Level 2 should act
Level 3 is rather unusual and parallel to previous:
Shared computation HW + external testing unit
No clear reliability assessment for the layers
Automotive
FEATURES & TRENDS:
Many ECUs by different manufacturers for automotive subsystems
(usually referred to as tier-1 providers for car-makers)
o E.g. Bosch, Magneti Marelli, Delphi, Omron
Automatic Machinery
[Figure: machine with Mechanical Axes (blocks labelled MT and MO)]
[Figure: machine with Electronic-Electric Axes (blocks labelled MT and MO)]
Automatic Machinery
Typical functional model:
[Figure: typical functional model. Supervision: Supervisor, Logic Ctrl, Trajectory generator. Control: F(z) and MotorCtrl blocks. Plant]
Automatic Machinery
Typical technological architecture
[Figure: typical technological architecture. Labels: Field-bus, PLC, Axes Controller, Vector Drives, Inverters, Motion Control System, MACHINE, MT and MO blocks]
Automatic Machinery
What about safety?
Automatic machines work without humans
Main potential safety issues (according to standards/rules):
Contact of moving parts with humans
Some objects come loose and are thrown away: they could hit people or
property
Release of chemicals into the nearby environment (pharmaceutics)
Main approach:
Mechanical barriers!
o PASSIVE SAFETY MEASURES
Automatic Machinery
Whatever fault we consider by FMECA or HAZOP,
mechanical passive barriers reduce the hazard risk
to a very low level
This is not a solution for availability
For safety (mandatory by law) it is usually enough.
Automatic Machinery
No need for functional safety?
Barriers can be opened accidentally
while the machine is running
Sometimes just optical barriers (light curtains)
o Photocells - Optical fork
Automatic Machinery
Basic safety function:
stop the machine as soon as the barriers are opened, or
the emergency button has been pressed, or the maximum safe speed is exceeded in maintenance.
Very simple
The main trouble is: integrity certification
Standard PLC/Motion Controller cannot be used to
implement Safety Function
o Computation HW reliability is not suitable
o Automotive-like solutions (additional µC) look too expensive:
expensive design process w.r.t. volumes and safety function entity
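The logic of the basic safety function is indeed very simple, as the sketch below shows. It only illustrates the decision rule: as noted above, the difficulty is certifying the integrity of whatever executes it, which this code does not address. Signal names are hypothetical.

```python
def stop_required(barrier_open: bool, emergency_pressed: bool,
                  maintenance_mode: bool, speed: float,
                  max_safe_speed: float) -> bool:
    """Decision rule of the basic safety function (illustrative only)."""
    overspeed_in_maintenance = maintenance_mode and speed > max_safe_speed
    return barrier_open or emergency_pressed or overspeed_in_maintenance

# Normal production at full speed with barriers closed: no stop
normal_run = stop_required(False, False, False, speed=100.0, max_safe_speed=5.0)
# Maintenance with the axis moving faster than the safe speed: stop
maintenance_overspeed = stop_required(False, False, True, speed=8.0, max_safe_speed=5.0)
# Barrier opened while running: stop
barrier_trip = stop_required(True, False, False, speed=100.0, max_safe_speed=5.0)
```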
Automatic Machinery
The main trouble is integrity certification (contd)
Use simple safety relays to make the computations
o Electromechanical component integrity certification is available
o Some troubles:
rough stop management, long downtimes to restore the machine
Automatic Machinery
Safety Relays based solution
Automatic Machinery
New solution: safe-logic, safe fieldbus, safe drives
Automatic Machinery
Norms for automatic machines safety:
ISO EN 954 up to 31st Dec. 2011
ISO EN 13849
o More oriented to mechanical and electromechanical safety
functions
IEC EN 62061
o More oriented to Programmable Electronics (IEC 61508)
Bibliography
1. Norma CEI EN 61508: Sicurezza funzionale dei sistemi elettrici, elettronici ed elettronici programmabili per applicazioni di sicurezza, CEI, 2002.
2. Norma IEC EN 61511: Functional safety: Safety instrumented systems for the process industry sector, IEC, 2003.
3. D.J. Smith, K.G.L. Simpson, Functional safety, Elsevier, 2004.
4. AA.VV., Guidelines for Hazard Evaluation Procedures, CCPS Center for Chemical Process Safety, 1995.
5. Lees, F.P., Loss prevention in the process industries, Butterworth-Heinemann, 1996.
6. Andrews, J.D., Moss, T.R., Reliability and Risk Assessment, Professional Engineering Publications, ISBN 1 86058 290 7, 2002.
7. Birolini, A., Reliability Engineering: Theory and Practice, Springer Verlag, ISBN 3 540 66385 1, 1999.
8. Grassani, E., La sicurezza sulle macchine, Editoriale Delfino, ISBN 978 88 89518 50 2, 2008.
9. Isermann, R., Schwarz, R., Stolz, S., Fault-Tolerant drive-by-wire systems, IEEE Control Systems Magazine, October 2002.