
Resilience, Business Continuity & Safety
in Socio-Technical Systems

Diarmuid Moran
BSc, PhD, CEng, MIEI
+353 87 2844488

Bowline Risk Management Ltd


• Network alarms activated,

• Smoke & fire alarms activated @ key (data) eXchange,

• Engineer Folks drove quite fast,

• Extinguishers were employed,

• A flood was discovered,

• Power was switched off,

• A Crisis was underway


Differences between Robustness,
Agility / Flexibility and Resilience

Robustness: “events” have minimal impact; BAU continues

Agility / Flexibility: certain events do impact; outputs are recovered but may be different

Resilience: a major event impacts, but outputs are quickly restored

“BAU = Business as Usual”

The time & anticipation factor


Selection of Notable “Events”
• Chernobyl Accident, Y2K
• 9-11, wars, fires, air-crashes
• Foot & Mouth Disease
• SARS (SARS-CoV-1), Swine flu
• Strike, Fraud, Regulatory & Social Media gaffes
• IT Virus & Denial of service / data
• Recessions and the Covid-19 Pandemic.
Parallels and Context
• Graded or Proportional Approach
• Hazard, Risk = f(p, C), Mitigation, Transfer/Avoid, Contingency, Exploit
• Cost of BCM – Benefit of measures (cost of loss v cost of mitigation)
• Worst case, expected case & reasonably practicable
• Business Impact Assessment (critical systems)
• Project Risk Management (time, cost, quality)
• Supply Chain Management – “Duty Holder”
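
The Risk = f(p, C) and cost-of-loss v cost-of-mitigation points can be illustrated with a small worked sketch. It assumes the simplest common form of f, the product of probability and consequence, which the slide does not specify. The hazard name, probabilities and euro figures are hypothetical numbers chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    probability: float   # assumed annual probability of the event (0..1)
    consequence: float   # assumed cost of loss if the event occurs (EUR)


def risk(h: Hazard) -> float:
    """Expected annual loss, assuming f(p, C) = p * C (one common choice)."""
    return h.probability * h.consequence


def mitigation_worthwhile(h: Hazard, mitigation_cost: float,
                          residual_probability: float) -> bool:
    """Compare the cost of mitigation with the reduction in expected loss."""
    residual_risk = residual_probability * h.consequence
    return (risk(h) - residual_risk) > mitigation_cost


# Hypothetical example: a flood at a key data exchange.
flood = Hazard("exchange flood", probability=0.02, consequence=5_000_000)
print(f"Expected annual loss: {risk(flood):,.0f} EUR")
print("Mitigation justified:",
      mitigation_worthwhile(flood, mitigation_cost=40_000,
                            residual_probability=0.005))
```

A graded or proportional approach would apply this kind of comparison first to the critical systems identified in the Business Impact Assessment, and would also weigh worst-case outcomes that an expected-value figure hides.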
Safety in Design & Procur€m€nt
• Life Cycle Design and Planning
– Design for needs and functionality eg MTBF
– Design for operation (IoT)
– Design for security / capacity (SIL)
– Design for ease of maintenance
– Design for environmental performance
– Design for decommissioning / ease of recall
– Design for economics and efficiency
– Design for future proofing / IPR mgmt & control (eg BIM)
– Life time records / risk ( Residual / Latent)

• Human factors & Non-Linear “coupling” or Functional Resonance
Redundancy Principle N+1, N+2, N+N, N+3

N+N
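
The effect of N+1, N+2 or N+N redundancy can be sketched numerically. This is a minimal illustration, assuming identical units that fail independently, with a hypothetical per-unit availability. Real systems also suffer common-cause failures, which is what the Diversity and Separation principles address. The function name and figures are assumptions, not part of the original slides.

```python
from math import comb


def system_availability(n_required: int, n_spare: int,
                        unit_availability: float) -> float:
    """Probability that at least n_required of (n_required + n_spare)
    identical, independent units are working (binomial model)."""
    total = n_required + n_spare
    a = unit_availability
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(n_required, total + 1)
    )


# Hypothetical case: 2 units needed to carry the load, each 99% available.
for label, spares in [("N", 0), ("N+1", 1), ("N+2", 2), ("N+N", 2)]:
    print(label, f"{system_availability(2, spares, 0.99):.6f}")
```

With N = 2, the N+2 and N+N rows coincide. They diverge for larger N, where N+N doubles the installed plant.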
Diversity Principle

ESB + Battery

Eircom + BT

Fibre + Wireless

Front + Back
Separation Principle
Human factors & Non-Linear “coupling” or Functional Resonance

Modified Perrow diagram – characterisation of complex systems – framework shows where socio-technical / resilience engineering impacts and factors arise
Non-linear and Safety II
Prof Erik Hollnagel
• Complexity – too many “requirements”
• ETTO (Efficiency-Thoroughness Trade-Off) is the principle that there is a trade-off between efficiency on the one hand and thoroughness (such as safety assurance and human reliability) on the other.
In accordance with this principle, demands for productivity tend to reduce thoroughness, while demands for safety reduce efficiency.

Prof James Reason
• Human error: models and management
• He concludes that high reliability organisations are the prime examples of the system approach. They anticipate the worst and equip themselves to deal with it at all levels of the organisation.
Covid-19 Case Study -
Agile Response & Crisis Management
Emergency Legislation & Communication
FL, ES Workers & volunteers –resilient after xx yy etc.
NPHET - Cabinet

Business Owners – BCM managers?


Various levels / certain robustness in sectors
(on-line, order/collect, test & trace, adaptation, PPE, WFH)
Certain agile and flexible systems – guidelines & definitions

Fake News / Media Issues / Fairness


Very “agile” response / huge event
Widespread & profound tragedy, hardship & pain

2 years on – resilience (BAU) or learning and evolution


Business Continuity Management
• Roles and Activities ISO 22301 Programme Elements
» Planners / Implementers / Systems
» Test & Exercise
» Managers / Oversight
» Discipline Experts & Threat Specialists
» Business Impact Assessment / Analysis
» Auditors & Reviewers
» Documentation and Sustaining
» BC Strategy
» Risk Analysis

– Could mirror entire organisation
– Apply good succession planning &
– Cross training / Org change / CPD
BCP and Crisis Management

[Diagram: Crisis Management and Business Continuity Management; labels recovered from the figure]

Crisis Management (strategic)
• Business service / process disruption / failure
• Fraud / Major Accident
• Market Crash
• PR Issues / Leak

Business Continuity Management
• Restore: technology, business process, site, environment, other etc
• Technology Continuity / Disaster Recovery: restore software, hardware, network, data
• Crisis strategies and other treatments: HR / Insurance / Legals / Compensation

Is there a BC plan for the CMT / ERT?

Have they a back-up venue, PPE & communications system?
BCP Process
[Diagram: BCP process cycle; labels recovered from the figure]
• Project Initiation and Management
• Risk Evaluation and Control; Risk Reduction Measures
• Business Impact Analysis
• Define Business Critical Functions
• Define Resource Requirements (business / switches, work area, comms, hardware etc, facilities, people, logistics, SLA)
• Developing Business Continuity Strategies
• Team Structure; Plan Design; Plan Structure; Data Input; Gather Contact Information
• Written Plans: Emergency Response, Business Resumption, Disaster Recovery
• Implement Business Continuity Strategy, Insurance & Risk Reduction
• Awareness & Training
• Maintaining & Exercising
• Continuous Improvement
ISO 22301
Crisis & Incident Management
• Crisis Management Roles / Training
– Emergency Response
– Crisis or Incident / Investigation / RCA Team
– Incident Command – Site control
– CM Communications
– CM External co-ordination
– Implementers / Managers / Experts
– Resilience or Quality Planners
– Scenario & Practices
“all bases may not be covered”
So Crisis Management is the non-standard procedure undertaken by individuals or groups to attempt to save the day and mitigate an incident or event

This is a “Resilience” approach / objective


It is applicable in a socio-technical system

• Covid-19 Global Pandemic and Crisis

• Russian Invasion of Ukraine


Prepare / Practice / Learn from Experience
– “Exaggerate”
• Safety in complex systems needs to learn from
millions of successful (variable) outcomes
• Experience and team wisdom
• Exaggeration “never waste a crisis”
• Culture of understanding / peer systems
• Culture of learning / leadership
• Example - Nuclear safety culture
From others including E Hollnagel
Safety Level

Global Leadership in High Tech, Health, Safety, Design and Risk Management
Resilience – system that deals with recognition,
adaptation, return to BAU
• 4 key qualities
– Learning
– Responding
– Monitoring
– Anticipating
Developing a resilience culture alongside, or versus, a safety “compliance” culture is an interesting aspect.
E.g. Incident >> call the ERT / FA / CMT >> Q. Does the ERT / FA / CMT comply with xxx, or do they have flexibility?
What is the embedded culture and resilience level?
Example R 115 …………… Air Accident Investigation Report
5 Resilience Principles

• Learning Environment – complex systems


• Good Habits – question & logs, tools etc
• Diversity – use and recognise (don’t restrict)
• No blame – objective to learn and avoid
• Teams and agility – synergy & balance / fun
Resilience,
Business Continuity,
and Safety,

Dr Diarmuid Moran
+353 87 2844488
