
Safety Engineering in Aviation

21BILD

Introduction lecture

2.10.2023 1 / 21
Andrej Lališ

• Head of the Aviation Safety and Security Laboratory (A234c)
• lalisand@fd.cvut.cz
• MS Teams for urgent matters

2.10.2023 2 / 21
Subject conditions

• Credit, exam
• Attendance at exercises, routine tests
• Semester assignments – optional
• Learning materials available on Moodle:
https://moodle-vyuka.cvut.cz
• Lecture recordings on MS Teams

2.10.2023 3 / 21
Textbooks

• N. Leveson, Engineering a Safer World: Systems Thinking Applied to
Safety, MIT Press, Cambridge, 2012.
• S. Dekker, Drift into Failure: From Hunting Broken Components to
Understanding Complex Systems, CRC Press, 2011.
• S. Dekker, The Field Guide to Understanding ‘Human Error’, Ashgate,
2014.
• E. Hollnagel, Safety-I and Safety-II: The Past and Future of Safety
Management, Ashgate, 2014.
• E. Hollnagel, Safety-II in Practice: Developing the Resilience
Potentials, Routledge, 2018.

2.10.2023 4 / 21
What will you learn?

1. Understand the issue of safety
2. Assess a new product/system for safety
3. Manage safety in everyday operations
4. Explain and follow up on a safety occurrence
5. Practical details from aviation

2.10.2023 5 / 21
August 2023

Source: https://avherald.com/

2.10.2023 6 / 21
September 2023

Source: https://avherald.com/

2.10.2023 7 / 21
January 2023

One passenger received serious injuries
(sacral spine fracture); four other passengers
received minor injuries.

2.10.2023 8 / 21
Safety

a) “Freedom from all conditions that cause injury of any kind or death
of a person, or damage to or loss of equipment or property”
(Malasky 1974)

b) “Freedom from danger, risk or threat” (Crutchfield & Roughton 2013)

c) WHO divides safety into two dimensions based on external
(objective) or internal (subjective) criteria

2.10.2023 9 / 21
Safety

2.10.2023 10 / 21
Safety

2.10.2023 11 / 21
Safety

2.10.2023 12 / 21
Security

a) “Safeguarding civil aviation against acts of unlawful interference.”
(ICAO, 2017)

b) “The prevention of and protection against assault, damage, fire,
fraud, invasion of privacy, theft, unlawful entry, and other such
occurrences caused by deliberate action.” (Kumar, 2014)

c) “The extent to which a computer system is protected from data
corruption, destruction, interception, loss, or unauthorized
access.” (Francis, 2014)

2.10.2023 13 / 21
Safety stats

2.10.2023 14 / 21
Safety stats

2.10.2023 15 / 21
Safety stats

Source: https://aviation-safety.net/database/2022-analysis

2.10.2023 16 / 21
Safety stats

Source: https://aviation-safety.net/database/2022-analysis

2.10.2023 17 / 21
Safety stats

Fatal accident likelihood as of 2023:

~10⁻⁷

2.10.2023 18 / 21
Safety stats

~10⁻⁴

2.10.2023 19 / 21
Safety stats

~10⁻⁵

Source: https://www.policie.cz/

2.10.2023 20 / 21
Why improve?

1. Fast pace of technological change (new hazards)
2. Reduced ability to learn from experience
3. Changing nature of accidents – complexity and coupling
4. Decreasing tolerance for single accidents
5. Difficulty in selecting priorities and making trade-offs

2.10.2023 21 / 21
References

Crutchfield, N. and Roughton, J. (2013). Safety Culture: An Innovative Leadership
Approach. 1st Edition, Butterworth-Heinemann.

Francis, A. (2014). The Roles of Peace and Security, Political Leadership, and
Entrepreneurship in the Socio-Economic Development of Emerging Countries.
AuthorHouse UK.

ICAO (2017). Annex 17 – Safeguarding Civil Aviation Against Acts of Unlawful
Interference. International Civil Aviation Organization (ICAO), Montréal, Quebec,
10th edition.

Kumar, A. (2014). A Textbook of Security. Goyal Brothers Prakashan.

Malasky, Sol W. (1974). System Safety: Planning/Engineering/Management. Spartan
Books.
Safety Engineering in Aviation
21BILD

Safety models and methods

9.10.2023 1 / 28
Overview

1. Where and how did safety emerge?
2. How did we start to deal with safety?
3. What were the key discoveries?
9.10.2023 2 / 28
Railway couplers 1893

Source: https://www.american-rails.com/

9.10.2023 3 / 28
1880 – Buffalo sugar mill

Source: https://www.alamy.com/

9.10.2023 4 / 28
Railway couplers 1893

Source: https://www.youtube.com/

9.10.2023 5 / 28
Railway couplers 1893

Railway statistics (1888–1894):
• ~16 000 workers killed (~7 a day)
• ~170 000 workers crippled

In the U.S., an employer did not have to pay injured employees if:
• the employee contributed (even if only partly) to the cause of the accident
• another employee contributed to the accident
• the employee knew of the hazards involved in the accident before
the injury and still agreed to work in the conditions for pay

Source: Gloss and Wardle (1984)

9.10.2023 6 / 28
Heinrich’s Domino Model

9.10.2023 7 / 28
Heinrich’s Domino Model

Adapted from Heinrich (1931)

9.10.2023 8 / 28
Heinrich’s Domino Model

Adapted from Heinrich (1931)

9.10.2023 9 / 28
Heinrich’s 300-29-1 Model

Source: https://avatarms.com/whats-300291-really-mean/

9.10.2023 10 / 28
FMEA (1949)

• Failure Modes and Effects Analysis
• Allows both bottom-up and top-down analysis
• Provides a specific structure to guide the analysis
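
As a concrete illustration of that structure, the sketch below ranks hypothetical worksheet rows by Risk Priority Number (RPN = severity × occurrence × detection, each rated 1–10), one common FMEA prioritization scheme; the failure modes and ratings are invented for the example:

    # FMEA worksheet rows: (failure mode, severity, occurrence, detection).
    rows = [
        ("Fuel pump seal leak",   7, 4, 3),
        ("Sensor wiring chafing", 8, 2, 6),
        ("Valve stuck closed",    9, 2, 2),
    ]
    # Rank by RPN so mitigation effort goes to the highest-priority modes first.
    for mode, s, o, d in sorted(rows, key=lambda r: r[1] * r[2] * r[3], reverse=True):
        print(f"RPN {s * o * d:4d}  {mode}")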

9.10.2023 11 / 28
FMEA (1949)

9.10.2023 12 / 28
Safety models/methods

9.10.2023 13 / 28
Safety models/methods

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

9.10.2023 14 / 28
FTA (1961)

• Fault Tree Analysis
• Deductive top-down approach
• Boolean algebra – using TRUE/FALSE values
• AND, OR, NOT operators and visual representation
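
A minimal sketch of how the gate logic propagates probabilities, assuming independent basic events (the tree structure and the numbers are invented for illustration, not taken from the lecture):

    # Fault tree evaluation sketch. Assuming independent events:
    # AND gate: all inputs must fail; OR gate: any single input failing suffices.
    from math import prod

    def p_and(*probs):
        return prod(probs)

    def p_or(*probs):
        return 1 - prod(1 - p for p in probs)

    # Hypothetical top event: thrust loss = fuel starvation OR both pumps failing
    p_fuel = 1e-5
    p_pump_a = p_pump_b = 1e-3
    p_top = p_or(p_fuel, p_and(p_pump_a, p_pump_b))
    print(f"P(top event) = {p_top:.2e}")   # ~1.10e-05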

9.10.2023 15 / 28
FTA (1961)

9.10.2023 16 / 28
Linear Chain-of-Failure Model

Adapted from Hollnagel (2009)

9.10.2023 17 / 28
Linear Chain-of-Failure Model

if-then?

9.10.2023 18 / 28
Linear Chain-of-Failure Model

if-then?

9.10.2023 19 / 28
Linear Chain-of-Failure Model

Adapted from Mündel (2020)

9.10.2023 20 / 28
Linear Chain-of-Failure Model

Adapted from Mündel (2020)

9.10.2023 21 / 28
Reliability vs. Safety

Unreliable but safe


Reliable but unsafe
Unreliable and unsafe

9.10.2023 22 / 28
Reliability

a) “Reliability is the probability of success or the probability that the
system will perform its intended function under specified design
limits” (Pham 2006)

b) “Resistance to failure of an item over time” (Anderson & Neri 1990)

c) “Capacity of an item to perform as required over time” (Hamada et
al. 2008)

High reliability is neither necessary nor sufficient for safety!

9.10.2023 23 / 28
HAZOP (1960)

• Hazard and Operability Study
• Hazard identification

• Hazard:
A hazard is any existing or potential condition that can lead to injury,
illness, or death to people; damage to or loss of a system, equipment, or
property; or damage to the environment.

9.10.2023 24 / 28
HAZOP (1960)
The guide words:
• No or Not – complete negation (not used, not done)
• More – quantitative increase
• Less – quantitative decrease
• Part of – qualitative modification/decrease
• Reverse – logical opposite of the design intent
• Other than – complete substitution
• Early – relative to clock time
• Late – relative to clock time
• Before – relative to order or sequence
• After – relative to order or sequence
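
To make the guide-word mechanics concrete, the sketch below generates candidate deviations by crossing guide words with process parameters; the parameters are illustrative, and a real study filters out combinations that make no sense for the node under review:

    from itertools import product

    parameters = ["flow", "pressure", "timing"]          # illustrative only
    guide_words = ["no", "more", "less", "part of", "reverse",
                   "other than", "early", "late", "before", "after"]

    # Each (guide word, parameter) pair is a candidate deviation to examine.
    for word, param in product(guide_words, parameters):
        print(f"{word.upper()} {param}")                 # e.g. "NO flow"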

9.10.2023 25 / 28
HAZOP (1960)

9.10.2023 26 / 28
References

Heinrich, H. W. (1931). Industrial accident prevention: A scientific approach. McGraw-Hill.

Hollnagel, E (2009). The ETTO Principle: Why things that go right sometimes go wrong.
Ashgate.

How to conduct a failure modes and effects analysis (FMEA): White Paper [online]
(2016). 60071-A3 10/16 F. Siemens PLM Software.
https://polarion.plm.automation.siemens.com/hubfs/Docs/Guides_and_Manuals/Siemens-PLM-Polarion-How-to-conduct-a-failure-modes-and-effects-analysis-FMEA-wp-60071-A3.pdf

Ferry, T. S. (1988). Modern accident investigation and analysis. 2nd ed. New York: Wiley.
References

Gloss, D. S. and Wardle, M. G. (1984) Introduction to Safety Engineering, John Wiley &
Sons, New York.

Mündel, K. (2020). ATA36 Pneumatic System Reliability and Maintenance for B737NG
Operators, Bachelor thesis, Czech Technical University in Prague.

Safety Management International Collaboration Group (SM ICG) (2013). Hazard Taxonomy
Examples [online]. https://www.skybrary.aero/bookshelf/books/2301.pdf
Safety Engineering in Aviation
21BILD

Safety models and methods (part 2)

16.10.2023 1 / 25
Safety models/methods

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

16.10.2023 2 / 25
Human Factors
• Three Mile Island (1979)
• Eastern Flight 401 (1972)

• Chernobyl (1986)

16.10.2023 3 / 25
SHELL (1972)

• Conceptual model of human factors
• Consists of building blocks
• Emphasizes the interfaces between the blocks

• Software (procedures, training, support etc.)
• Hardware (equipment and devices)
• Environment (operational conditions)
• Liveware (other people at the workplace)

16.10.2023 4 / 25
SHELL (1972)

16.10.2023 5 / 25
SHELL (1972)

Liveware is determined by:


• Physical factors (strength, height, vision, etc.)
• Physiological factors (fitness, health condition, stress, fatigue etc.)
• Psychological factors (skills, knowledge, capabilities to carry out
tasks, adaptability etc.)
• Psychosocial factors (ability to work in teams, family relations,
financial security etc.)

16.10.2023 6 / 25
SHELL (1972)

Source: Chen et al. (2021)

16.10.2023 7 / 25
THERP (1983)

• Technique for Human Error Rate Prediction
• Human error estimation – support to identify socio-technical system
degradation due to human error
• Originates from the nuclear power industry

HRA: Human Reliability Analysis (Human Error analysis) methods
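
A sketch of the arithmetic behind a THERP-style series task: if every step must succeed and the steps are treated as independent, the task-level error probability follows directly from the per-step HEPs. The values below are placeholders, not entries from the NUREG/CR-1278 tables:

    from math import prod

    # Hypothetical per-step human error probabilities (HEPs).
    heps = {
        "read checklist item":   0.003,
        "select correct switch": 0.001,
        "verify lamp status":    0.010,
    }
    # Task succeeds only if all steps succeed (independence assumed).
    p_task_failure = 1 - prod(1 - p for p in heps.values())
    print(f"P(task failure) = {p_task_failure:.4f}")   # ~0.0139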

16.10.2023 8 / 25
THERP (1983)

Source: Swain and Guttmann (1983)

16.10.2023 9 / 25
THERP (1983)

Source: Swain and Guttmann (1983)

16.10.2023 10 / 25
THERP (1983)

Source: Swain and Guttmann (1983)

16.10.2023 11 / 25
THERP (1983)

Source: Swain and Guttmann (1983)

16.10.2023 12 / 25
THERP (1983)

Source: Swain and Guttmann (1983)

16.10.2023 13 / 25
HCR (1984)

• Human Cognitive Reliability
• Founded on a three-parameter Weibull distribution
• All parameters are estimated either from data or by expert
assessment

• Necessary parameter estimates:
1. Cognitive demand with respect to skills, rules and knowledge
2. Mean time needed to carry out the task
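
A sketch of the resulting non-response probability using the three-parameter Weibull form; the (gamma, eta, beta) coefficients below are illustrative placeholders, not the published skill-, rule- and knowledge-based values:

    from math import exp

    def p_non_response(t, t_half, gamma, eta, beta):
        # Probability that the crew has NOT responded by time t;
        # t_half is the (estimated) median response time for the task.
        x = (t / t_half - gamma) / eta
        if x <= 0:
            return 1.0          # before the time offset, no response yet
        return exp(-x ** beta)

    print(p_non_response(t=120, t_half=60, gamma=0.7, eta=0.4, beta=1.2))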

16.10.2023 14 / 25
HCR (1984)

Source: OECD (1998)

16.10.2023 15 / 25
Reason’s model (1990)

• Swiss cheese model
• Active failures and latent conditions
• An accident is the result of a trajectory passing through holes in all
the barriers in the system

16.10.2023 16 / 25
Reason’s model (1990)

16.10.2023 17 / 25
Reason’s model (1990)

Source: Reason (1997)

16.10.2023 18 / 25
Reason’s model (1990)

16.10.2023 19 / 25
Reason’s model (1990)

16.10.2023 20 / 25
HFACS (2000)

• Human Factors Analysis and Classification System
• Investigation of human factors, support for training and prevention
• Method to identify active failures and latent conditions

16.10.2023 21 / 25
HFACS (2000)

Source: Shappell et al. (2007)

16.10.2023 22 / 25
Organizational era
• British Airways Flight 5390 (1990)

16.10.2023 23 / 25
AcciMap (1997)

Source: Rasmussen (1997)

16.10.2023 24 / 25
AcciMap (1997)

Source: Parnell et al. (2017)

16.10.2023 25 / 25
References

Chen, N., Li, J. and May, Y. (2021) Using SHELL and Risk Matrix Method in Identifying the
Hazards of General Aviation Flight Approach and Landing. In: 6th International
Conference on Transportation Information and Safety: New Infrastructure Construction
for Better Transportation (ICTIS 2021). DOI: 10.1109/ICTIS54573.2021.9798561

OECD (1998). Critical Operator Actions: Human Reliability Modeling and Data Issues.
Nuclear Energy Agency, Committee on the Safety of Nuclear Installations
NEA/CSNI/R(98)1.

Parnell, K., Stanton, N. and Plant, K. (2017). What’s the law got to do with it? Legislation
regarding in-vehicle technology use and its impact on driver distraction. Accident Analysis
and Prevention, 100, pp. 1-14. DOI: 10.1016/j.aap.2016.12.015
References

Rasmussen, J. (1997). Risk Management in a Dynamic Society: A Modelling Problem.
Safety Science, 27(2/3), pp. 183-213. DOI: 10.1016/S0925-7535(97)00052-0

Reason, J. (1990). The Contribution of Latent Human Failures to the Breakdown of
Complex Systems. Philosophical Transactions of the Royal Society of London. Series B,
Biological Sciences, 327(1241), pp. 475-484. DOI: 10.1098/rstb.1990.0090

Reason, J. (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing
Limited, England.

Shappell, S., Detwiler, C., Holcomb, K., Hackworth, C., Boquet, A. and Wiegmann, D. A.
(2007). Human error and commercial aviation accidents: An analysis using the human
factors analysis and classification system. Human Factors, 49(2), pp. 227-242. DOI:
10.1518/001872007X312469
References

Swain, A. D. and Guttmann, H. E. (1983). Handbook of Human Reliability Analysis with
Emphasis on Nuclear Power Plant Applications. Final Report. NUREG/CR-1278,
Washington, DC.

Safety Engineering in Aviation
21BILD

Safety management

6.11.2023 1 / 40
Aviation SMS

• Safety Management System – A systematic approach to managing
safety, including the necessary organizational structures,
accountability, responsibilities, policies and procedures.

• ICAO Annex 19 (L19) – Safety Management (Ed. 2)
• ICAO Doc. 9859 – Safety Management Manual (Ed. 4)

6.11.2023 2 / 40
Aviation SMS

Introduction of SMS

Source: S. Dekker

6.11.2023 4 / 40
Hazards

• “In aviation, a hazard can be considered as a dormant potential for
harm which is present in one form or another within the system or
its environment. This potential for harm may appear in different
forms, for example: as a natural condition (e.g. terrain) or technical
status (e.g. runway markings).”

• Hazards can materialize through one or many consequences.

• Hazards can be identified reactively (past outcomes) or proactively
(low-consequence events, process performance).

6.11.2023 5 / 40
Aviation SMS
• SHELL / Swiss Cheese
• Practical Drift

Source: ICAO (2018)

6.11.2023 6 / 40
The risk matrix

Source: Socha et al. (2018)

6.11.2023 7 / 40
The risk matrix

Source: Socha et al. (2018)

6.11.2023 8 / 40
ICAO Risk matrix

Source: ICAO (2018)

6.11.2023 9 / 40
ICAO Risk matrix

Source: ICAO (2018)

6.11.2023 10 / 40
ICAO Risk matrix

Source: ICAO (2018)

6.11.2023 11 / 40
Risk

Risk = [p, s]
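
In the ICAO scheme the pair [p, s] is encoded as an index such as 3B and looked up in the matrix; a minimal sketch of that lookup follows (the tolerability bands here are illustrative, not the normative ICAO regions):

    PROBABILITY = {"frequent": 5, "occasional": 4, "remote": 3,
                   "improbable": 2, "extremely improbable": 1}
    SEVERITY = {"catastrophic": "A", "hazardous": "B", "major": "C",
                "minor": "D", "negligible": "E"}
    INTOLERABLE = {"5A", "5B", "5C", "4A", "4B", "3A"}   # illustrative bands
    ACCEPTABLE = {"3E", "2D", "2E", "1C", "1D", "1E"}

    def assess(probability: str, severity: str) -> tuple[str, str]:
        index = f"{PROBABILITY[probability]}{SEVERITY[severity]}"
        if index in INTOLERABLE:
            return index, "intolerable: stop or mitigate before operating"
        if index in ACCEPTABLE:
            return index, "acceptable"
        return index, "tolerable: mitigate as far as reasonably practicable"

    print(assess("remote", "hazardous"))   # ('3B', 'tolerable: ...')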

6.11.2023 12 / 40
Risk mitigation

• Avoidance: The operation or activity is cancelled or avoided because
the safety risk exceeds the benefits of continuing the activity,
thereby eliminating the safety risk entirely.

• Reduction: The frequency of the operation or activity is reduced, or
action is taken to reduce the magnitude of the consequences of the
safety risk.

• Segregation: Action is taken to isolate the effects of the
consequences of the safety risk or build in redundancy to protect
against them.

6.11.2023 13 / 40
Hazard register

• Safety risk management activities should be documented, including
any assumptions underlying the probability and severity
assessment, decisions made, and any safety risk mitigation actions
taken.

• Hazard registers are usually in a table format and typically include:
the hazard; potential consequences; assessment of the associated
risks; identification date; hazard category; short description; when or
where it applies; who identified it; and what measures have been put
in place to mitigate the risks.
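
A sketch of one register row as a typed record, using the fields the slide lists (the field names and the example values are illustrative):

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class HazardRecord:
        hazard: str
        category: str
        description: str
        consequences: list[str]
        risk_index: str                  # e.g. "3C" from the risk matrix
        identified_on: date
        identified_by: str
        applies_to: str                  # when/where the hazard applies
        mitigations: list[str] = field(default_factory=list)

    row = HazardRecord(
        hazard="FOD on apron", category="ground operations",
        description="Debris on stand 12 during turnaround",
        consequences=["engine ingestion", "tire damage"], risk_index="3C",
        identified_on=date(2023, 11, 6), identified_by="ramp agent",
        applies_to="stand 12",
        mitigations=["daily FOD walk", "FOD bins at every stand"])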

6.11.2023 14 / 40
Hazard register

6.11.2023 15 / 40
2P dilemma

Source: ICAO (2018)

6.11.2023 16 / 40
Aviation SMS framework

A systematic approach to managing safety, including the
necessary organizational structures, accountability,
responsibilities, policies and procedures.

Source: ICAO (2018)

6.11.2023 17 / 40
SDCPS

• Safety Data Collection and Processing System

• Safety data is what is initially reported or recorded as the result of an
observation or measurement. It is transformed into safety information
when it is processed, organized, integrated or analyzed in a given
context to make it useful for the management of safety.

• Data > Information > Knowledge > Wisdom
• D3M (data-driven decision making) support

6.11.2023 18 / 40
SDCPS

• Occurrence reports (mandatory/voluntary)
• Incident/accident investigation reports
• Audits
• Safety Inspections
• Surveys/Safety studies

• Flight/radar data
• Training data
• ....

6.11.2023 19 / 40
SDCPS

6.11.2023 20 / 40
SDCPS

• It is advisable to streamline the amount of safety data and safety
information by identifying what specifically supports the effective
management of safety within the organization.

• Safety data should ideally be categorized using taxonomies so that
the data can be captured and stored using meaningful terms.

• Common taxonomies establish a standard language, enable analysis
(filtering) and facilitate information sharing and exchange.

6.11.2023 21 / 40
Aviation safety taxonomies

• ICAO ADREP (Accident/Incident Data Reporting)
• CAST/ICAO CICTT (Common Taxonomy Team)
• ECCAIRS (European Co-ordination Centre for Accident and Incident
Reporting Systems)
• RIT (Reduced Interface Taxonomy)
• HEIDI (Harmonisation of European Incident Definitions Initiative for
ATM)
• Boeing MEDA (Maintenance Error Decision Aid)
• HFACS (The Human Factors Analysis and Classification System)
• ...

6.11.2023 22 / 40
Data protection

• The objective of protecting safety data, safety information and their
related sources is to ensure their continued availability.
• Individuals and organizations are protected by:
a) ensuring they are not punished on the basis of their report,
b) limiting the use of reported safety data to purposes aimed at maintaining or
improving safety.
• “Employees and operational personnel may trust that their actions or
omissions that are commensurate with their training and experience
will not be punished”.
• Principles of exception set out the circumstances in which a
departure from those protective principles may be permissible.

6.11.2023 23 / 40
Safety culture

• The beliefs, values, biases and their resultant behavior that are shared
by members of a society, group or organization.

• How people behave in relation to safety and risk when no one is
watching. “The way we do safety around here.”

• Assessment is usually done by:
a) questionnaires;
b) interviews and focus groups;
c) observations; and
d) document reviews.

6.11.2023 24 / 40
Safety culture

Source: CANSO (2008)

6.11.2023 25 / 40
Safety culture
• Safety commitment is valued.
• Safety information is surfaced without fear and incident analysis is
conducted without blame
• Incidents and accidents are valued as an important window into
systems that are not functioning as they should - triggering
improvement actions.
• There is a feeling of openness and honesty, where everyone’s voice is
respected. Employees feel that managers are listening.
– There is trust among all parties.
– Employees feel psychologically safe about reporting concerns.
– Employees believe that managers can be trusted to hear their concerns and will
take appropriate action.
– Managers believe that employees are worth listening to and are worthy of respect.

Source: Leveson (2012)

6.11.2023 26 / 40
Safety culture

Source: Flannery (2001)

6.11.2023 27 / 40
11.10.2023 - Stansted

6.11.2023 28 / 40
Safety performance

6.11.2023 29 / 40
Safety performance

“A State’s or service provider’s safety achievement as defined by its
safety performance targets and safety performance indicators.”

To measure safety performance, safety objectives (SOs), safety
performance indicators (SPIs) and safety performance targets (SPTs)
are needed.

6.11.2023 30 / 40
Safety objectives

Safety objectives are brief, high-level statements of safety achievements
or desired outcomes to be accomplished.

Source: ICAO (2018)

6.11.2023 31 / 40
SPIs

Source: ICAO (2018)

6.11.2023 32 / 40
Lagging SPIs

Low probability/high severity: outcomes such as accidents or serious
incidents. Aggregation of data (at industry segment level or regional
level) may result in more meaningful analyses.

High probability/low severity: sometimes also referred to as precursor
indicators. Primarily used to monitor specific safety issues and
measure the effectiveness of existing safety risk mitigations.

6.11.2023 33 / 40
SPTs

Source: ICAO (2018)

6.11.2023 34 / 40
Safety triggers

Source: ICAO (2018)

6.11.2023 35 / 40
Safety triggers

Source: ICAO (2018)

6.11.2023 36 / 40
Safety performance

Source: EUROCONTROL (2009)

6.11.2023 37 / 40
Safety performance

Source: EUROCONTROL (2009)

6.11.2023 38 / 40
Safety performance

Source: Lintner et al. (2009)

6.11.2023 39 / 40
Safety performance

Source: Lintner et al. (2009)

6.11.2023 40 / 40
References

CANSO (2008). Safety Culture Definition and Enhancement Process. Civil Air Navigation
Services Organisation (CANSO). Available from:
https://www.canso.org/sites/default/files/Safety%20Culture%20Definition%20and%20E
nhancement%20Process.pdf

EUROCONTROL (2009). The Aerospace Performance Factor (APF). Developing the
EUROCONTROL ESARR 2 APF. The European Organisation for the Safety of Air
Navigation (EUROCONTROL).

Flannery, J. (2001). Safety culture and its measurement in aviation. The Australian
Society of Air Safety Investigators.
References

ICAO (2013). Doc 9859: Safety Management Manual (SMM). International Civil Aviation
Organization (ICAO), Montréal, Quebec, 3rd edition.

ICAO (2016). Annex 19 – Safety Management: International Standards and
Recommended Practices. International Civil Aviation Organization (ICAO), Montréal,
Quebec, 2nd edition.

ICAO (2018). Doc 9859: Safety Management Manual (SMM). International Civil Aviation
Organization (ICAO), Montréal, Quebec, 4th edition.

Leveson, N. (2012). Engineering a Safer World: Systems Thinking Applied to Safety. MIT
Press, Cambridge.
References

Lintner, T., Smith, S., Licu, A., Cioponea, R., Stewart, S., Majumdar, A. and Dupuy M.
(2009). The measurement of system-wide safety performance in aviation: Three case
studies in the development of the aerospace performance factor (APF). In: Proceedings
of the Flight Safety Foundation International Aviation Safety Seminar.

Socha, L., Socha, V., Vaško, B., Čekanová, A., Hanáková, L., Hanák, P. and Kraus, J.
(2018). Risk Management in the Process of Aircraft Ground Handling. In: 2018 XIII
International Scientific Conference - New Trends in Aviation Development (NTAD 2018).
DOI: 10.1109/NTAD.2018.8551753
Safety Engineering in Aviation
21BILD

Systemic model: STAMP

6.11.2023 1 / 36
Safety models/methods

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

6.11.2023 2 / 36
GEnx (B787)

Source: https://www.ge.com/

6.11.2023 3 / 36
PW JT3C (B707)

Source: https://airandspace.si.edu/

6.11.2023 4 / 36
Complexity vs. Simplicity

Source: Leveson (2012)

6.11.2023 5 / 36
Simple system

• Sense
• Categorize
• Respond

6.11.2023 6 / 36
Complicated system

• Sense
• Analyze
• Respond

6.11.2023 7 / 36
Complex system

• Probe
• Sense
• Respond

Unknown
unknowns

6.11.2023 8 / 36
Chaotic system

• Act
• Sense
• Respond

No cause-effect relations

Act to establish order

6.11.2023 9 / 36
STAMP

• Systems-Theoretic Accident Model and Processes
• Prof. Nancy G. Leveson
• Introduced in 2004 at MIT

6.11.2023 10 / 36
STAMP

Source: https://www.nhpr.org/

6.11.2023 11 / 36
STAMP

Source: https://www.dallasnews.com/

6.11.2023 12 / 36
Systems Theory
Systems theory is a set of principles that can be used to understand
the behavior of complex systems, whether they be natural or man-made
systems.

Complexity theory describes natural systems where seemingly
independent agents spontaneously order and reorder themselves into a
coherent system using laws of nature that we do not yet fully
understand.

Systems theory appears to be most appropriate for engineered or
designed systems, while complexity theory is most appropriate for
natural and sociological systems where the design is unknown.

Source: Leveson (2019)

6.11.2023 13 / 36
Systems Theory
• The foundation of systems theory rests on two pairs of ideas:

a) emergence and hierarchy
Complex systems can be expressed in terms of a hierarchy of levels of
organization, each more complex than the one below, where a level is
characterized by having emergent properties. Emergent properties do not exist
at lower levels; they are meaningless in the language appropriate to those
levels.

b) communication and control
Control is always associated with the imposition of constraints: any
description of a control process entails an upper level imposing constraints
upon the lower. Control in open systems implies the need for communication.

6.11.2023 14 / 36
STAMP

Source: Rasmussen (1997)

6.11.2023 15 / 36
STAMP

• STAMP expands the linear model to more complex causes of
accidents.

• STAMP is not linear: it does not model losses as chains of failure
events. An important difference between systems theory and the
standard linear causality models is that more types of causality are
included.

6.11.2023 16 / 36
STAMP

6.11.2023 17 / 36
STAMP

Source: Leveson (2012)

6.11.2023 18 / 36
STAMP

Source: Leveson (2012)

6.11.2023 19 / 36
STAMP

Source: Leveson (2012)

6.11.2023 20 / 36
STAMP

• Passive control – maintains safety by its presence: the system fails
into a safe state, or simple interlocks are used to limit the
interactions among system components

• Active control – requires some action(s) to provide protection:
a) detection of a hazardous event or condition (monitoring)
b) measurement of some variable(s)
c) interpretation of the measurement (diagnosis)
d) response

6.11.2023 21 / 36
Process model

• Any controller, human or automated, needs a model of the
process being controlled in order to control it effectively

• The model consists of:
– the relationships among the system variables (the control laws)
– the current state (the current values of the system variables)
– the ways the process can change state

Source: Thomas (2013)
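
A minimal sketch of the idea (the scenario and names are invented): a controller whose internal process model has drifted away from the real process issues an unsafe control action even though every component "works":

    class DoorController:
        # Toy controller with an explicit process model of the door state.
        def __init__(self):
            self.model_door_closed = True     # believed state (process model)

        def command(self, sensor_door_closed: bool) -> str:
            # Control law: authorize departure only when the doors are closed.
            # Consulting the stale model instead of the sensor feedback is the
            # model/process mismatch STAMP points at.
            if self.model_door_closed:        # should use sensor_door_closed
                return "authorize departure"
            return "hold"

    ctrl = DoorController()
    print(ctrl.command(sensor_door_closed=False))   # unsafe control action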

6.11.2023 22 / 36
Process model
• Accidents are often due to the process model used by the controller
not matching the process.

Source: Leveson (2012)

6.11.2023 23 / 36
Accident classification

• Using STAMP, if there is an accident, one or more of the
following must have occurred:

1. The safety constraints were not enforced by the controller.
a) The control actions necessary to enforce the associated safety constraint at
each level of the sociotechnical control structure for the system were not
provided.
b) The necessary control actions were provided but at the wrong time (too early or
too late) or stopped too soon.
c) Unsafe control actions were provided that caused a violation of the safety
constraints.

2. Appropriate control actions were provided but not followed.

6.11.2023 24 / 36
STAMP

a) In STAMP there is no root cause, only inadequate safety control
(structure).
b) It separates factual data from interpretations of that data.
c) It augments the typical failure-based design focus and encourages
a wider variety of risk reduction measures than simply adding
redundancy and overdesign to deal with component failures.
d) It helps in identifying appropriate safety metrics and supports new
ways of risk assessment.

6.11.2023 25 / 36
STAMP

Source: Fletcher (2014)

6.11.2023 26 / 36
STPA
• System-Theoretic Process Analysis

Source: Leveson and Thomas (2018)

6.11.2023 27 / 36
STPA

Source: Leveson and Thomas (2018)

6.11.2023 28 / 36
STPA

Source: Leveson and Thomas (2018)

6.11.2023 29 / 36
STPA

Source: Leveson and Thomas (2018)

6.11.2023 30 / 36
CAST

Source: Leveson (2019)

6.11.2023 31 / 36
STAMP – case study
• London, Clapham South station, 12 March 2015

Source: RAIB (2016)

6.11.2023 32 / 36
STAMP – case study
• London, Clapham South station, 12 March 2015

Source: RAIB (2016)

6.11.2023 33 / 36
STAMP – case study

Source: RAIB (2016)

6.11.2023 34 / 36
STAMP – case study

Source: RAIB (2016)

6.11.2023 35 / 36
STAMP
[Figure: unsafe control actions of the train operator and of the CSA]

Source: Zhou and Yan (2018)

6.11.2023 36 / 36
References

Fletcher, R. (2014). CAST (Causal Analysis using System Theory) Accident Analysis.
https://system-safety.org/issc2014/57_CAST.pdf

Leveson, N. (2012). Engineering a Safer World: Systems Thinking Applied to Safety. MIT
Press, Cambridge.

Leveson, N. and Thomas, J. (2018). STPA Handbook. Available from:
http://psas.scripts.mit.edu/home/get_file.php?name=STPA_handbook.pdf

Leveson, N. (2019). CAST Handbook. Available from:
http://sunnyday.mit.edu/CAST-Handbook.pdf
References

RAIB (2016). Rail Accident Report: Passenger trapped in train doors and dragged
at Clapham South station, 12 March 2015. Rail Accident Investigation Branch (RAIB),
Department for Transport.

Thomas, J. (2013). Extending and Automating a Systems-Theoretic Hazard Analysis for
Requirements Generation and Analysis. Ph.D. thesis, MIT.

Zhou, Y. and Yan, F. (2018). Causal Analysis to a Subway Accident: A Comparison of
STAMP and RAIB. MATEC Web Conf., Vol. 160, International Conference on Electrical
Engineering, Control and Robotics (EECR 2018).
Safety Engineering in Aviation
21BILD

Engineering safe systems

13.11.2023 1 / 34
Airworthiness

• The status of an aircraft, engine, propeller or part when it conforms
to its approved design and is in a condition for safe operation

• ICAO Annex 8 – Airworthiness of Aircraft (Ed. 12)
• ICAO Doc. 9760 – Airworthiness Manual (Ed. 3)
• Commission Regulation (EU) No 748/2012
(Annex I – Part 21)

13.11.2023 2 / 34
Airworthiness

State airworthiness responsibilities:
• State of Design and State of Manufacture
• State of Registry
• State of the Operator

Two main functions:
• Type certification / design approval
• Continuing airworthiness

13.11.2023 3 / 34
Airworthiness

Source: https://www.youtube.com/

13.11.2023 4 / 34
Airworthiness

• NAAs with an airworthiness engineering division (AED)

• The AED will normally establish and carry out procedures for the
type certification or other design approval of aircraft, engine,
propellers, equipment and instruments that are designed or
produced in that State.

• All Contracting States are encouraged to give maximum credit and
recognition to the type certification already done by the State of
Design, and to avoid duplicate or redundant testing

13.11.2023 5 / 34
Airworthiness

• The majority of airworthiness standards currently used by States
with aviation manufacturing industries are already harmonized.

• The remaining differences are either unique technical
requirements, due to operational or environmental constraints,
and/or interpretations of the same requirements.

• Full harmonization of all airworthiness requirements is yet to come.

13.11.2023 6 / 34
Type certification process

• There are five key activities associated with a type certification
process:

a) establishing the certification basis;
b) establishing the means or methods of compliance;
c) demonstration and findings of compliance;
d) certifying the type design; and
e) post-type certification activities.

13.11.2023 7 / 34
Type certification process

• The means of compliance:

a) Test – performed when the requirement explicitly calls for a
demonstration by test (physical, actual or simulation). Examples of
tests are flight test, ground test, fatigue test, functional test, bird
strike test, and engine ingestion test.
b) Analysis – performed when the requirement explicitly calls for a
demonstration by analysis (qualitative, quantitative, or
comparative), or when the applicant can demonstrate, based on
previously accepted test results, the validity of using analysis in
lieu of testing.

13.11.2023 8 / 34
Type certification process

• The means of compliance:

c) Inspection or evaluation – performed against an item that does not
require test or analysis, but relies on observation, judgment,
verification, evaluation, or a statement of attestation from the
applicant or its vendors/contractors.

13.11.2023 9 / 34
Safety studies

Source: https://future.prg.aero/

13.11.2023 10 / 34
Safety studies
• Safety Assessment Methodology (SAM) – EUROCONTROL

Source: Štumper and Kraus (2016)

13.11.2023 11 / 34
Safety studies

1. Functional Hazard Assessment (FHA)
Identify hazards, assess their effects and the related severity. The main goal is to set
Safety Objectives (SOs).

2. Preliminary System Safety Assessment (PSSA)
Fault tree analysis, event tree analysis, common cause analysis. System architecture
evaluation against the SOs, proposing safety requirements.

3. System Safety Assessment (SSA)
Documentation of the evidence, collecting data, testing and validation.

Source: Štumper and Kraus (2016)

13.11.2023 12 / 34
FHA

Source: Štumper (2016)

13.11.2023 13 / 34
FHA

• A Safety Objective is a qualitative or quantitative statement that
defines the maximum frequency or probability at which a hazard can
be expected to occur.

• ICAO Doc. 9859, ESARR 4 (Risk Assessment and Mitigation in ATM)

13.11.2023 14 / 34
FHA – ESARR 4 SCs

Source: EUROCONTROL (2001).

13.11.2023 15 / 34
FHA – ESARR 4 SCs
Maximum acceptable frequency of        Severity class of the worst
hazard occurrence (Safety Objective)   credible hazard effect
[per operational hour]                 [as per ESARR 4]
SO < 10⁻⁷                              SC1
10⁻⁷ < SO < 10⁻⁵                       SC2
10⁻⁵ < SO < 10⁻⁴                       SC3
10⁻⁴ < SO < 10⁻³                       SC4
10⁻³ < SO < 10⁻¹                       SC5

Maximum acceptable frequency of        Severity class of the worst
hazard occurrence (Safety Objective)   credible hazard effect
                                       [as per ESARR 4]
EXTREMELY RARE                         SC1
RARE                                   SC2
OCCASIONAL                             SC3
LIKELY                                 SC4
NUMEROUS                               SC5

Source: EUROCONTROL (2006).

13.11.2023 16 / 34
PSSA

Source: Štumper (2016)

13.11.2023 17 / 34
PSSA
[Figure: bow-tie around a “pivotal” hazard event – FTA on the causes side
(e.g. a procedure, hazard probability Ph), ETA on the consequences side,
each barrier branching success (S) / failure (F) with probability Pe into
Effect1–Effect4]

Source: EUROCONTROL (2006)
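
A sketch of the ETA half of the figure: starting from a hazard frequency Ph, each barrier branches into success/failure, and the leaves give the frequencies of the individual effects (the numbers are illustrative):

    ph = 1e-4                  # hazard frequency per operational hour (assumed)
    barriers = [0.90, 0.99]    # per-barrier success probabilities (assumed)

    def leaf_frequencies(freq, remaining):
        # Success at a barrier terminates the sequence in a mitigated effect;
        # failure passes the residual frequency on to the next barrier.
        if not remaining:
            return {"worst credible effect": freq}
        p, rest = remaining[0], remaining[1:]
        out = {f"mitigated at barrier {len(barriers) - len(rest)}": freq * p}
        out.update(leaf_frequencies(freq * (1 - p), rest))
        return out

    for effect, f in leaf_frequencies(ph, barriers).items():
        print(f"{effect}: {f:.2e}")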

13.11.2023 18 / 34
Hazard register

13.11.2023 19 / 34
Risk mitigation strategy

Source: Leveson (2012)

13.11.2023 20 / 34
SSA

Source: Štumper (2016)

13.11.2023 21 / 34
Systems engineering

• The glue that integrates the activities of engineering and operating
complex systems is specifications and the safety information
system.

• Specifications are critical to the engineering of systems of the size
and complexity we are attempting to build today. They are no longer
simply a means of archiving information; they need to play an active
role in the system engineering process, allowing us to:
– reason about particular properties
– construct the system and the software in it to achieve them
– validate that the evolving system has the desired qualities

13.11.2023 22 / 34
Systems engineering

• The key issue: engineers are unlikely to be able to read through
volumes of hazard analysis information and relate it easily to the
specific component upon which they are working.

• Most complex systems have voluminous documentation, much of it
redundant or inconsistent, and it degrades quickly as changes are
made over time.

• Sometimes important information is missing, particularly
information about why something was done the way it was: the
intent or design rationale.

13.11.2023 23 / 34
Intent specification

Source: Leveson (2012)

13.11.2023 24 / 34
Intent specification

Source: Leveson (2012)

13.11.2023 25 / 34
Case study – TCAS II

• Goals of the system (L1):

1. Provide affordable and compatible collision avoidance system options for a
broad spectrum of National Airspace System users.

2. Detect potential mid-air collisions with other aircraft in all meteorological
conditions; throughout navigable airspace, including airspace not covered by
ATC primary or secondary radar systems; and in the absence of ground
equipment.

13.11.2023 26 / 34
Case study – TCAS II

• Hazards considered during the design (L1):

1. TCAS causes or contributes to a near mid-air collision (NMAC), defined as a
pair of controlled aircraft violating minimum separation standards.
2. TCAS causes or contributes to a controlled maneuver into the ground.
3. TCAS causes or contributes to the pilot losing control over the aircraft.
4. TCAS interferes with other safety-related aircraft systems (for example, ground
proximity warning).
5. TCAS interferes with the ground-based air traffic control system (e.g.,
transponder transmissions to the ground or radar or radio services).
6. TCAS interferes with an ATC advisory that is safety-related (e.g., avoiding a
restricted area or adverse weather conditions).

13.11.2023 27 / 34
Case study – TCAS II

• The requirements for integrating the new subsystem safely
into the larger system (L1):

1. The behavior or interaction of non-TCAS equipment with TCAS must not
degrade the performance of the TCAS equipment or the performance of the
equipment with which TCAS interacts.
2. Among the aircraft environmental alerts, the hierarchy shall be: wind shear has
first priority, then the Ground Proximity Warning System (GPWS), then TCAS.
3. The TCAS alerts and advisories must be independent of those using the master
caution and warning system.

13.11.2023 28 / 34
Case study – TCAS II

• High-level functional requirements implementing the goals for
TCAS (L2):
– 1.18: TCAS shall provide collision avoidance protection for any two aircraft
closing horizontally at any rate up to 1200 knots and vertically up to 10,000 feet
per minute.
– Assumption: This requirement is derived from the assumption that commercial
aircraft can operate up to 600 knots and 5000 fpm during vertical climb or
controlled descent (and therefore two planes can close horizontally up to 1200
knots and vertically up to 10,000 fpm).

– 1.19.1: TCAS shall operate in enroute and terminal areas with traffic densities
up to 0.3 aircraft per square nautical mile (i.e., 24 aircraft within 5 nmi).
– Assumption: Traffic density may increase to this level by 1990, and this will be
the maximum density over the next 20 years.

13.11.2023 29 / 34
The issue of probability

Source: https://mars.nasa.gov/
13.11.2023 30 / 34
Risk

Risk = [p, s]

13.11.2023 31 / 34
The issue of probability

Source: Leveson and Dulac (2009)

13.11.2023 32 / 34
The issue of probability

Source: Leveson and Dulac (2009)

13.11.2023 33 / 34
Risk re-definition

Risk: “A combination of the severity of the hazard and the mitigation effectiveness in
controlling the hazard.”
Source: Gregorian and Yoo (2021)

13.11.2023 34 / 34
References

EUROCONTROL (2001). Safety Regulatory Requirement (ESARR 4): Risk Assessment and
Mitigation in ATM. The European Organisation for the Safety of Air Navigation.
https://www.eurocontrol.int/sites/default/files/article/content/documents/single-sky/src/esarr4/esarr4-e1.0.pdf

EUROCONTROL (2006). Safety Assessment Methodology. A framework of methods and
techniques to develop safety assessments of changes to functional systems. The
European Organisation for the Safety of Air Navigation.
https://www.eurocontrol.int/tool/safety-assessment-methodology

Gregorian, D. and Yoo, S. (2021). A System-Theoretic Approach to Risk Analysis. Master’s
thesis, MIT.
References

Leveson, N. and N. Dulac (2009). Incorporating Safety in Early System Architecture Trade
Studies. Journal of Spacecraft and Rockets. 46(2), pp. 430-437.

Leveson, N. (2012). Engineering a Safer World: Systems Thinking Applied to Safety. MIT
Press, Cambridge.

Štumper, M. (2016). Studie bezpečnosti v letectví [Safety Studies in Aviation]. Master’s
Thesis, Czech Technical University in Prague.

Štumper, M. and Kraus, J. (2016). Safety Study in Aviation. Magazine of Aviation
Development, 4(19), pp. 19-22.
Safety Engineering in Aviation
21BILD

Safety-II

20.11.2023 1 / 36
Safety models/methods

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

20.11.2023 2 / 36
FRAM and RAG

• Prof. Erik Hollnagel


(University of Southern Denmark)

• FRAM – 2004
• RAG – 2011

20.11.2023 3 / 36
Dynamic non-event

• Safety is measured indirectly, as the so-called “dynamic non-event”,
i.e. a high number of accidents equals a low level of safety and vice
versa
(SESAR sets the target to improve the safety of European ATM by a factor of 10)

• Safety paradox: the safer the system, the less feedback there is and
the less certainty about the current state

• Modern high-reliability organizations with a safety management
system (SMS) practically cannot estimate the actual state

20.11.2023 4 / 36
Data from operations

Source: EUROCONTROL (2013)

20.11.2023 5 / 36
Safety data

Source: EUROCONTROL (2013)

20.11.2023 6 / 36
Safety data

Is it possible to understand what a
happy marriage is by analyzing and
learning from divorces alone?

Is it possible to understand what
safety is by analyzing and learning
from accidents and incidents alone?

20.11.2023 7 / 36
Safety data

Source: Hollnagel (2014)

20.11.2023 8 / 36
Safety data

Source: EUROCONTROL (2013)

20.11.2023 9 / 36
Safety-I

• When things go right there is no discernible difference between
expected and actual outcomes, hence nothing that attracts
attention.

• Things obviously went well because the system worked as it should.

• When a difference between Work-As-Imagined and Work-As-Done is
found, it is conveniently used to explain why things went wrong.

• Safety-I promotes a bimodal or binary view of work: the outcome
can be acceptable or unacceptable; we succeed or fail.

20.11.2023 10 / 36
Safety-I

Source: EUROCONTROL (2013)

20.11.2023 11 / 36
Safety-I

Source: EUROCONTROL (2013)

20.11.2023 12 / 36
Safety-I

• Hypothesis of different causes: the causes or ‘mechanisms’ that
lead to failures are different from those that lead to ‘normal’
outcomes (success)

• Outcomes can be understood as effects that follow from prior
causes

• Since all adverse outcomes have a cause that can be found, all
accidents can be prevented

20.11.2023 13 / 36
The theory of safety

Source: Hollnagel (2014)

20.11.2023 14 / 36
Ontology of Safety-I

The cornerstone assumptions:

1. Systems are decomposable
2. The functioning of the components can be described in bimodal terms
3. It is possible to determine in advance the order in which events will
develop

20.11.2023 15 / 36
Ontology of Safety-II

The cornerstone assumptions:

1. Human performance, individually or collectively, is always variable.
2. It is neither possible nor meaningful to characterise components in
terms of whether they have worked or have failed, or whether the
functioning is correct or incorrect.

20.11.2023 16 / 36
Safety-II
• Systems are not flawless and people must learn to identify and
overcome design flaws and functional glitches.

• People are able to recognise the actual demands and can adjust
their performance accordingly.

• When procedures must be applied, people can interpret and apply
them to match the conditions.

• People can detect and correct when something goes wrong or when
it is about to go wrong, and hence intervene before the situation
seriously worsens.

20.11.2023 17 / 36
Safety-II

Adapted from: Hollnagel (2014)

20.11.2023 18 / 36
Aetiology of Safety-II
• Emergent outcomes can be understood as arising from unexpected
– and unintended – combinations of performance variability where
the governing principle is resonance rather than causality.

• The principle of emergence means that while we may plausibly
argue that a condition existed at some point in time, we can never
be absolutely sure.

• In the case of emergent outcomes, the ‘causes’ represent patterns
that existed at one point in time but which did not leave any kind of
permanent trace. The outcomes can therefore not be traced back to
specific components or functions.

20.11.2023 19 / 36
Aetiology of Safety-II

Source: Hollnagel (2014)

20.11.2023 20 / 36
Aetiology of Safety-II

Source: Hollnagel (2012)

20.11.2023 21 / 36
Safety-I vs. Safety-II

Source: Hollnagel (2014)

20.11.2023 22 / 36
FRAM and RAG

• Prof. Erik Hollnagel


(University of Southern Denmark)

• FRAM – 2004
• RAG – 2011

20.11.2023 23 / 36
FRAM
• Functional Resonance Analysis Method

4 basic principles:
a) Failures and successes are equivalent.
b) The everyday performance of socio-technical systems is always
adjusted to match the conditions.
c) Many of the outcomes we notice must be described as emergent
rather than resultant.
d) Relations and dependencies among the functions of a system
must be described as they develop in a specific situation, using
functional resonance.

20.11.2023 24 / 36
FRAM

The method:

1. Produce a functional model of the system
2. Identify performance variability
3. Analyze variability resonance
4. Propose control measures for undesired resonance
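
A minimal sketch of step 1 (the two functions are invented for illustration): a FRAM function is described by its six aspects (Input, Output, Precondition, Resource, Control, Time), and potential couplings are found by matching one function's Output against the aspects of another:

    from dataclasses import dataclass, field

    @dataclass
    class Function:
        name: str
        inputs: set[str] = field(default_factory=set)
        outputs: set[str] = field(default_factory=set)
        preconditions: set[str] = field(default_factory=set)
        resources: set[str] = field(default_factory=set)
        controls: set[str] = field(default_factory=set)
        times: set[str] = field(default_factory=set)

    approach = Function("Fly approach", inputs={"approach clearance"},
                        outputs={"stabilized at 1000 ft"},
                        controls={"SOP"}, resources={"crew"})
    land = Function("Land", inputs={"stabilized at 1000 ft"})

    # Output of one function feeding an aspect of another = a coupling where
    # performance variability can propagate (and possibly resonate).
    couplings = [(approach.name, land.name, o)
                 for o in approach.outputs if o in land.inputs]
    print(couplings)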

20.11.2023 25 / 36
FRAM – model

20.11.2023 26 / 36
FRAM – model

20.11.2023 27 / 36
FRAM – variability

Source: Hollnagel (2012)

20.11.2023 28 / 36
FRAM – variability

20.11.2023 29 / 36
FRAM – variability

Source: Patriarca et al. (2017)

20.11.2023 30 / 36
FRAM – variability

Source: Patriarca et al. (2017)

20.11.2023 31 / 36
FRAM – case study

Source: https://www.flickr.com/

20.11.2023 32 / 36
FRAM – case study

Source: https://ais.avinor.no/

20.11.2023 33 / 36
FRAM – case study

Source: AIBN (2004)

20.11.2023 34 / 36
FRAM – case study

Source: Herrera and Woltjer (2010)

20.11.2023 35 / 36
FRAM – case study

Source: Herrera and Woltjer (2010)

20.11.2023 36 / 36
References
AIBN (2004). Rapport etter alvorlig luftfartshendelse ved Oslo Lufthavn Gardermoen 9.
februar 2003 med Boeing 737-36N, NAX541, operert av Norwegian Air Shuttle [Report on
a serious aviation incident at Oslo Airport Gardermoen on 9 February 2003 involving a
Boeing 737-36N, NAX541, operated by Norwegian Air Shuttle]. Aircraft Investigation
Board Norway (AIBN), SL RAP.: 20/2004.

EUROCONTROL (2009). White Paper on Resilience Engineering for Air Traffic
Management. The European Organisation for the Safety of Air Navigation.
https://www.eurocontrol.int/publication/white-paper-resilience-engineering-air-traffic-management

EUROCONTROL (2013). From Safety-I to Safety-II: A White Paper. The European
Organisation for the Safety of Air Navigation.
https://www.skybrary.aero/bookshelf/books/2437.pdf

Herrera, I. A., and Woltjer R. (2010). Comparing a multi-linear (STEP) and systemic
(FRAM) method for accident analysis. Reliability Engineering and System Safety. 2010,
95, pp. 1269-1275.
References

Hollnagel, E. (2012). FRAM: the Functional Resonance Analysis Method: Modelling
complex socio-technical systems. Burlington, VT: Ashgate.

Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management.
Burlington, VT, USA: Ashgate.

Leveson, N. (2012). Engineering a Safer World: Systems Thinking Applied to Safety. MIT
Press, Cambridge.

Patriarca, R., Di Gravio, G. and Costantino F. (2017). A Monte Carlo evolution of the
Functional Resonance Analysis Method (FRAM) to assess performance variability in
complex systems. Safety Science. 2017, 91, pp. 49-60.
Safety Engineering in Aviation
21BILD

Human Factors

27.11.2023 1 / 30
Safety-I vs. Safety-II

Source: Hollnagel (2014)

27.11.2023 2 / 30
Human factors in Safety-I

• “85 percent of our incidents or accidents are due to ‘human error’”

• ‘Human error’ is the cause of trouble, but it is a simple problem:
once all systems are in place, just get people to pay attention and
comply

• Tighter procedures, compliance, technology and supervision can
reduce the ‘human error’ problem

• Zero errors is the ultimate goal

27.11.2023 3 / 30
The Bad Apple Theory
• Safety problems are the result of a few Bad Apples in an otherwise
safe system.

• These Bad Apples don’t always follow the rules, and they don’t always
watch out carefully.

• They undermine the organized and engineered system that other
people have put in place.

• Some Bad Apples have negative attitudes toward safety, which
adversely affects their behavior. So not attending to safety is a
personal problem, a motivational one, an issue of individual choice.

27.11.2023 4 / 30
The Bad Apple Theory

The solution:
• get rid of Bad Apples;
• put in more rules, procedures and compliance demands;
• tell people to be more vigilant (with posters, memos, slogans);
• get technology to replace unreliable people.

27.11.2023 5 / 30
The Old view

Source: https://www.nata.aero/

27.11.2023 6 / 30
The Old view
• Investigation and disciplinary actions under the same department or
with the same people

• Behavior modification programs to deal with human errors

• Safety managers going to conferences about motivation, attitude,
personality

• People fired due to safety infractions

• Sometimes HR department responsible for safety

27.11.2023 7 / 30
The New view

• People do not come to work to do a bad job.

• New View: All behavior is affected by the context (system) in which
it occurs. The best way to change human behavior is to change the
system in which it occurs.

• The shortcuts and adaptations people have introduced into their
work often do not serve their own goals, but those of the organization.

27.11.2023 8 / 30
The ETTO

• Efficiency-Thoroughness Trade-Off (ETTO)

• In daily activities, at work or at leisure, people routinely make a
choice between being effective and being thorough, since it rarely is
possible to be both at the same time.

• If demands for productivity or performance are high (no time,
insufficient resources …), thoroughness is reduced until the
productivity goals are met. If demands for safety are high, efficiency
is reduced until the safety goals are met.

27.11.2023 9 / 30
The ETTO rules

• ‘It looks fine’


• ‘It is not really important’
• ‘It is normally OK, there is no need to check’
• ‘It is good enough for now’
• ‘It will be checked later by someone else’
• ‘It has been checked earlier by someone else’
• ‘(Doing it) this way is much quicker’
• ‘It looks like a Y, so it probably is a Y’
• ‘It normally works’
• ‘If you don’t say anything, I won’t either’

27.11.2023 10 / 30
The ETTO rules

Source: Hollnagel (2009)

27.11.2023 11 / 30
The ETTO principle

• People (humans and organisations) do not ETTO as a deliberate
choice. They ETTO because they have learned that this is an
effective strategy – by imitation or by active encouragement.

• Given that positive outcomes normally outnumber adverse
outcomes by several orders of magnitude, people who change from
being thorough to being efficient will be reinforced in their behavior.

• They continue to do so because they are rewarded or reinforced in
using the heuristic.

27.11.2023 12 / 30
The ETTO principle
• They are warned not to do it when they fail. But since success is
normal and failure is rare, it requires a deliberate effort not to ETTO,
to go against the pressure of least effort.

Source: Hollnagel (2009)

27.11.2023 13 / 30
The ETTO principle

• The efficiency-thoroughness trade-off (or performance variability) is
a characteristic of people, because they are flexible and intelligent
beings, and likewise of organisations as socio-technical systems.

• The efficiency-thoroughness trade-off cannot be made by machines
unless it is part of their design or programming. Yet in such cases
the trade-off is different from what humans do, since it is
algorithmic rather than heuristic.

27.11.2023 14 / 30
The ETTO principle

Source: Hollnagel (2009)

27.11.2023 15 / 30
The Old vs. The New

The Old View                        The New View

Asks who is responsible             Asks what is responsible
Says what people failed to do       Asks why people did what they did
Human error is a cause              Human error is a symptom
                                    (of a system that needs redesign)
Human error is an acceptable        Human error is only a starting
conclusion of an investigation      point for the investigation

Source: Dekker (2014a)

27.11.2023 16 / 30
UPS flight 1354 (CFIT)

Source: NTSB (2014)

27.11.2023 17 / 30
UPS flight 1354 (CFIT)
NTSB: The probable cause of this accident was the flight crew’s
continuation of an unstabilized approach and their failure to monitor
the aircraft’s altitude during the approach, which led to an inadvertent
descent below the minimum approach altitude and subsequently into
terrain

Contributing factors:
1. flight crew’s failure to properly configure and verify the flight
management computer for the profile approach;

2. captain’s failure to communicate his intentions to the first officer
once it became apparent the vertical profile was not captured;

Source: NTSB (2014)

27.11.2023 18 / 30
UPS flight 1354 (CFIT)
3. flight crew’s expectation that they would break out of the clouds at
1,000 feet above ground level due to incomplete weather
information;

4. first officer’s failure to make the required minimums callouts;

5. captain’s performance deficiencies, likely due to factors including,
but not limited to, fatigue, distraction, or confusion, consistent with
performance deficiencies exhibited during training; and

6. first officer’s fatigue due to acute sleep loss resulting from her
ineffective off-duty time management and circadian factors.

Source: NTSB (2014)

27.11.2023 19 / 30
Problem of ’failure’
• Humans do not “fail” (unless their heart stops). They simply react to
the situations in which they find themselves.

• What they did may turn out to be the wrong thing to do. But why it
seemed to them to be the right thing at the time needs to examined
to make useful recommendations.

• Companies also do not fail unless they go out of business.
“Company X failed to learn from prior events.” Companies are
pieces of paper describing legal entities. Documents do not learn or
fail.

27.11.2023 20 / 30
Problem of ’failure’
• Software does not “fail” either; it simply executes the logic that was
written.

• There needs to be an examination of why unsafe software was
created – usually it can be traced to requirements flaws – and
recommendations involving improvement of the process that
produced the unsafe software.

• Concluding that the “software failed” makes no technical sense and
provides no useful information.

27.11.2023 21 / 30
UPS flight 1354 (CFIT)
Contributing factors, reframed:

flight crew’s failure to properly configure and verify the flight
management computer for the profile approach;
→ flight crew did not configure and verify the flight management
computer for the profile approach;

captain’s failure to communicate his intentions to the first officer once
it became apparent the vertical profile was not captured;
→ captain did not communicate his intentions to the first officer when
the vertical profile was not captured;

27.11.2023 22 / 30
The Old vs. The New

The Old View                        The New View

Boss or safety manager              Whoever is expert and knows
gets to say                         the work gets to say
Make it impossible for people       Give people space and the
to do wrong things                  possibility to do right things
Bureaucracy                         Mutual coordination
Predictability, standardization     Diversity, innovation

Source: Dekker (2014a)

27.11.2023 23 / 30
Hindsight bias

Hindsight gets you to
oversimplify history. You will
see events as simpler, more
linear, and more predictable
than they once were.

Source: Dekker (2014a)

27.11.2023 24 / 30
Hindsight bias

Source: Dekker (2014a)

27.11.2023 25 / 30
Situation awareness

The concept is poorly specified:

• What is the situation of which the operator is required to be aware?


• The investigators want the operator to be aware of “anything in the
environment which might be of importance if it should change
unexpectedly.”
• The set of events of which operators are to be aware is not defined;
it is unreasonable to expect them to monitor the members of an
undefined set, whilst if one defines a set, there is always the
possibility that the dangerous event is not a member of the set, in
which case the operators should not have been monitoring it.

27.11.2023 26 / 30
Procedures and safety

• Safety is not the result of rote rule following.
• Practices do not follow rules; rather, rules follow evolving practices.
• Procedures are resources for action.
• Applying procedures successfully across situations can be a
substantive and skillful cognitive activity.
• Procedures can, in themselves, not guarantee safety. Safety results
from people being skillful at judging when and how to adapt
procedures to local circumstances.
• For progress on safety, organizations must monitor and understand
the reasons behind the gap between procedures and practice.

27.11.2023 27 / 30
Procedures and safety

Source: Leveson (2012)

27.11.2023 28 / 30
What to do?

• You have to put yourself in their shoes;
• Imagine that you don’t know the outcome;
• Try to reconstruct which cues came when, which indications may
have contradicted them;
• Envisage what flow of cues and indications could have meant to
people, given their likely understanding of the situation;
• Try to understand how their understanding of the situation was not
static or complete.

27.11.2023 29 / 30
What about accountability?

• The New View does not claim that people are perfect. But it keeps
you from judging and blaming people for not being perfect.

• People do want to be held accountable fairly. They want to be held accountable by those who really know the details of what it takes to get the job done, not by those (managers, investigators, judges) who only think they know.

27.11.2023 30 / 30
What about accountability?

• You need to be able to show that people had the authority to live up
to the responsibility that you are now asking of them.

• You can hold people accountable by letting them tell their story,
literally “giving their account.”

• The challenge is to create a culture of accountability that encourages learning.

Source: https://www.youtube.com/

27.11.2023 31 / 35
References

Dekker, S. (2014a). The Field Guide to Understanding 'Human Error'. Burlington, VT:
Ashgate.

Dekker, S. (2014b). Safety Differently: Human Factors for a New Era. CRC Press.

Hollnagel, E. (2009). The ETTO Principle: Efficiency-Thoroughness Trade-Off. Burlington, VT: Ashgate.

Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management.
Burlington, VT, USA: Ashgate.

Leveson, N. (2019). CAST Handbook. Available from: http://sunnyday.mit.edu/CAST-Handbook.pdf
References

NTSB (2014). Crash During a Nighttime Non-precision Instrument Landing, UPS Flight 1354, Birmingham, Alabama, August 14, 2013. National Transportation Safety Board (NTSB), Accident Report NTSB/AAR-14/02.
Safety Engineering in Aviation
21BILD

Resilience (Safety-III)

4.12.2023 1 / 23
Future of safety

4.12.2023 2 / 23
FRAM and RAG

• Prof. Erik Hollnagel


(University of Southern Denmark)

• FRAM (Functional Resonance Analysis Method) – 2004
• RAG (Resilience Assessment Grid) – 2011

4.12.2023 3 / 23
RAG

Source: Koren et al. (2017)

4.12.2023 4 / 23
RAG
Resilience Assessment Grid

• The purpose of the RAG is to rate or measure the potentials for resilient performance
• It is based on diagnostic and formative questions

Resilience is an expression of how people, alone or together, cope with everyday situations – large and small – by adjusting their performance to the conditions. An organization’s performance is resilient if it can function as required under expected and unexpected conditions alike.

4.12.2023 5 / 23
RAG

Four potentials for resilient performance are proposed by the method:

• the potential to respond,


• the potential to monitor,
• the potential to learn, and
• the potential to anticipate.

4.12.2023 6 / 23
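
A minimal sketch of how RAG ratings for the four potentials might be aggregated (illustrative assumptions: the questions are placeholders rather than Hollnagel’s actual question sets, and the 0–4 scale with a plain mean is an arbitrary choice, not part of the method’s specification):

from statistics import mean

# Placeholder diagnostic questions, rated 0 (missing) to 4 (excellent).
ratings = {
    "respond":    {"prepared response list": 3, "resources verified": 2},
    "monitor":    {"leading indicators defined": 1, "reviewed regularly": 2},
    "learn":      {"successes analysed too": 1, "lessons change practice": 3},
    "anticipate": {"model of future threats": 0, "looks beyond next quarter": 1},
}

# One summary score per potential; typically drawn as a radar chart.
for potential, answers in ratings.items():
    print(f"potential to {potential}: {mean(answers.values()):.1f} / 4")

Because the scale is relative, the value of the grid lies in tracking how the four scores change between repeated assessments, not in the absolute numbers.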
RAG

Source: Hollnagel (2017)

4.12.2023 7 / 23
RAG

Source: Hollnagel (2017)

4.12.2023 8 / 23
RAG – questions (response)

Source: Hollnagel (2017)

4.12.2023 9 / 23

Source: Hollnagel (2017); Chuang et al. (2020)

4.12.2023 10 / 23

Source: Hollnagel (2017); Chuang et al. (2020)

4.12.2023 11 / 23
Synesis
• There is a need to replace the term safety. With the term „synesis“, safety is defined by the presence of something (‘with’) rather than by the absence of something (‘without’).
• Safety-II, i.e., synesis, is the presence of acceptable outcomes: the more there are, the safer the system.

Source: Hollnagel (2017)

4.12.2023 12 / 23
SMS integration

Safety management should be considered as part of a management system (and not in isolation). Therefore, a service provider may implement an integrated management system that includes the SMS. An integrated management system may include:
• quality management system (QMS),
• safety management system (SMS),
• security management system (SeMS),
• environmental management system (EMS),
• occupational health and safety management system (OHSMS),
• financial management system (FMS),
• fatigue risk management system (FRMS),
• documentation management system (DMS)

4.12.2023 13 / 23
Safety-I and Safety-II

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

4.12.2023 14 / 23
STAMP and Safety-III
Safety-III is needed to put STAMP into context. It is not new, however: the practices have been around since the 1950s, primarily used in the most sophisticated or secretive engineering contexts.

Source: Leveson (2020)

4.12.2023 15 / 23
Safety-I, -II and -III

Adapted from: Eurocontrol, A White Paper on Resilience Engineering for ATM, 2009.

4.12.2023 16 / 23
Weak spots of Safety-II

It is possible to increase the number of things that go right without ever impacting safety. The assumption that increasing the things that go right will decrease the things that go wrong is not valid.
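
A small numeric illustration of this point (the numbers are invented): doubling the operations that go right leaves the count of accidents untouched; only the ratio improves.

# More things going right, same number of things going wrong:
before = {"right": 1_000_000, "wrong": 10}
after  = {"right": 2_000_000, "wrong": 10}

def rate(d):
    return d["wrong"] / (d["right"] + d["wrong"])

print(rate(before), rate(after))  # the rate roughly halves, yet 10 accidents remain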

Hazard analysis uses a backward approach in system safety: it starts from the hazards and identifies the scenarios that can lead to them, because it is impractical to look at all possible scenarios for the operation of a complex system and determine whether any of them leads to a hazard.
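
A minimal sketch of this backward approach (the event graph is invented for illustration): starting from the hazard and walking the “can lead to” relation in reverse enumerates the scenarios of interest without simulating every possible operation.

from collections import deque

# Edge "A": ["B"] means "A can lead to B"; "tank rupture" is the hazard.
leads_to = {
    "valve stuck":  ["overpressure"],
    "sensor drift": ["late alarm"],
    "late alarm":   ["overpressure"],
    "overpressure": ["tank rupture"],
}

def scenarios_for(hazard):
    """States from which the hazard is reachable (backward reachability)."""
    parents = {}
    for src, dsts in leads_to.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    found, queue = set(), deque([hazard])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in found:
                found.add(p)
                queue.append(p)
    return found

print(scenarios_for("tank rupture"))
# {'overpressure', 'valve stuck', 'late alarm', 'sensor drift'} (set order varies)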

It is not possible to describe all human variability because there are just
too many ways to do most jobs and there is no way to determine
whether the job/task is safe under all conditions that can occur.

Source: Leveson (2020)

4.12.2023 17 / 23
Weak spots of Safety-II

Computers and other means of control (beyond human controllers) are omitted in Safety-II. The model is exclusively human-factors oriented and not systemic.

The role of human operators in systems is staying steady or decreasing; the role of hardware is decreasing; the role of designers and managers is staying steady; the role of software is increasing.

Safety-I is a more general accident causation model than Safety-II. This is also visible in how emergence is explained in Safety-II: the explanation suits natural systems, not man-made ones.

Source: Leveson (2020)

4.12.2023 18 / 23
STAMP and resilience

• Safety-II: Resilience is an expression of how people, alone or together, cope with everyday situations – large and small – by adjusting their performance to the conditions. An organization’s performance is resilient if it can function as required under expected and unexpected conditions alike.

• STAMP: Resilience is the ability of the system to maintain the safety constraints in the face of unplanned or inadequately controlled behavior or hazards.

Source: Leveson (2020)


4.12.2023 19 / 23
STAMP and resilience

If we want resilience in systems, then it needs to be integrated into the design of the entire system.

Human operators rarely have the ability to provide resilience in highly automated systems unless the engineers have designed in the ability for them to provide it.

Source: Leveson (2020)

4.12.2023 20 / 23
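
A minimal sketch of that idea (names and numbers are invented): resilience is designed into the control structure itself, as a controller that keeps the safety constraint satisfied under unplanned disturbances, rather than being left to operator improvisation.

SAFE_LIMIT = 100.0   # safety constraint: pressure must stay below this

def control_action(pressure):
    """Act with margin so the constraint holds despite disturbances."""
    return "open_relief_valve" if pressure >= 0.9 * SAFE_LIMIT else "no_action"

pressure = 80.0
for disturbance in [5.0, 8.0, 6.0, 9.0]:      # unplanned inputs
    pressure += disturbance
    if control_action(pressure) == "open_relief_valve":
        pressure = 0.8 * SAFE_LIMIT           # controlled recovery
    assert pressure < SAFE_LIMIT              # constraint maintained throughout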
STAMP and resilience

4.12.2023 21 / 23
Final thoughts

4.12.2023 22 / 23
Final thoughts

• Safety-II is (currently) human-factors-centered
• Safety-II is based on social science

• Safety-III is a total-system-centered model
• Safety-III is based on engineering and systems theory

• STAMP is more evolved: it covers hardware, software, humans, organizations, resilience and other emergent system-level properties (security, quality, efficiency, etc.), with detailed guidance in handbooks

4.12.2023 23 / 23
References

Hollnagel, E. (2017). Safety-II in Practice: Developing the Resilience Potentials. Routledge.

Chuang, S., Ou, J.-Ch. and Ma, H.-P. (2020). Measurement of resilience potentials in emergency departments: Applications of a tailored resilience assessment grid. Safety Science, 121, pp. 385-393.

Leveson, N. (2020). Safety III: A Systems Approach to Safety and Resilience. MIT,
Cambridge.

Koren, D., Kilar, V. and Rus, K. (2017). Proposal for Holistic Assessment of Urban System Resilience to Natural Disasters. In: IOP Conference Series: Materials Science and Engineering, 245 (6).

4.12.2023
