
CERN Reliability Training

Andrea Apollonio, TE-MPE-PE

Acknowledgements: J. Gutleber, R. Schmidt

11/05/2022 2
Scope of the Training

- Raise awareness of the importance of RAMS studies in the domain of particle accelerators
- Provide CERN staff members with the basics for performing reliability analyses
- Train 120-150 people from CERN over 4 years

- 2 people from IMA Stuttgart
- 1 from RAMENTOR

First Training - Test

- Validate contents (quantity, complexity, ...)
- Tailor contents to the accelerator domain

Participants (total ~20):

- CERN staff members WITH previous experience with reliability analyses
- CERN staff members WITHOUT previous experience with reliability analyses
- Students from IMA Stuttgart

Contents
Days 1-3 (29th Feb – 2nd Mar):
1. Introduction
2. Mathematical Treatment of Reliability
3. Determination of System Reliability
4. Failure Mode and Effects Analysis
5. Fault Tree Analysis
6. Evaluation of Lifetime Tests and Failure Statistics
7. Availability
8. Modeling and Simulation of Availability

Day 4 (3rd Mar):


9. Hands-on training with ELMAS

Institute of Machine Components
Department: Reliability Engineering

External and Internal Influences on Reliability

[Figure: influences acting on reliability, marked negative (-) or positive (+)]

Negative influences (-):
- Reduced development costs
- Shorter development time
- Greater complexity
- Greater functionality

Positive influences (+):
- Increasing customer demands
- Increased product liability
- Minimization of fault costs

19.-20.06.2013

© Institute of Machine Components, University of Stuttgart │Training “Reliability (Basic Level)“ 6



Power of Ten Rule: Relationship Between Fault Costs and Product Lifetime

[Figure: costs per fault and influence on faults over the product life cycle. Phases: concept phase; design, product development; planning, purchasing; production; product utilization (grouped as Development, Production, Field). Costs per fault rise by a factor of ten per stage: 0.10, 1.00, 10.00, 100.00. Early phases offer fault prevention (opportunity for action); late phases leave fault detection (necessary reaction).]



Reliability Methods in the Product Life Cycle

[Figure: reliability methods along the time axis of the product life cycle: Planning, Conception, Layout, Design, Production, Field Usage, Recycling]

Qualitative:
- Know-How
- Specifications
- ABC-Analysis
- Design Review
- FMEA
- FTA
- Quality Management
- Audits
- Field Data Collection
- Early Warning
- Recycling Potential

Quantitative:
- Reliability Goal
- Fuzzy Data
- Calculation
- Boolean Theory
- Markov Model
- Weibull distribution, exponential, ...
- Test planning
- Statistical Process Planning
- Field Data Analysis
- Remaining Lifetime

Accelerator Operation

[Source: CERN]

Result of equipment reliability, maintenance actions and protection requirements

Tradeoff: availability versus protection


Survival Probability and Failure Probability

Survival probability R(t) as complement of failure probability F(t): $R(t) = 1 - F(t)$

[Figure: density function f(t) over failure time t. The area under f(t) to the left of t_x is F(t_x); the area to the right is R(t_x).]
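The complement relation can be sketched numerically. The snippet below evaluates F(t) and R(t) = 1 - F(t) for a Weibull distribution; the choice of the Weibull form, the parameter names `b` (shape) and `T` (characteristic life) and all numeric values are illustrative assumptions, not taken from the slide.

```python
import math

def failure_probability(t, b, T):
    """Weibull failure probability F(t) = 1 - exp(-(t/T)^b)."""
    return 1.0 - math.exp(-((t / T) ** b))

def survival_probability(t, b, T):
    """Survival probability as the complement R(t) = 1 - F(t)."""
    return 1.0 - failure_probability(t, b, T)

# Hypothetical parameters: shape b = 1.5, characteristic life T = 1000 h
b, T = 1.5, 1000.0
t_x = 500.0  # hypothetical evaluation time in hours

F_tx = failure_probability(t_x, b, T)
R_tx = survival_probability(t_x, b, T)
print(f"F(t_x) = {F_tx:.4f}, R(t_x) = {R_tx:.4f}, sum = {F_tx + R_tx:.4f}")
```

At t = T the Weibull failure probability equals 1 - 1/e (about 63.2 %) for any shape parameter, which is a quick sanity check for such a function.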

Classification of Reliability Methods

Qualitative methods

- Employ methodical, systematic procedures to determine possible faults/failures and their effects
- Provide evaluation by way of estimation and classification (not by calculation)
- Aim: (qualitative) ranking of weak points

Methods:
- FMEA (FMECA)
- Fault tree analysis
- Event tree analysis
- Check lists / Design review

Quantitative methods

- Make use of terms and processes from statistics and probability theory
- Provide (quantitative) probability values for reliability by way of calculation

Methods:
- Fault tree analysis
- Boolean system theory
- Markov's theory
- Monte-Carlo method


FMEA Form VDA 4.2


Failure mode and effect analysis (FMEA), University of Stuttgart

Header fields:
- FMEA-No., Page, System
- Type/Model/Fabrication/Load: Item Code, Responsible, Compartment, State, Company, Date
- System-No./System Element: Item Code, Responsible, Compartment
- Function/Task: State, Company, Date

Form sections:
- System element, system structure and functions
- Risk analysis and assessment
- Optimization

Columns:
Potential Effects | S | C | Potential Failure | Potential Causes | Preventive Actions | O | Detection Actions | D | RPN | Responsibility/Date

S = Assessment value for the severity
O = Assessment value for the occurrence
D = Assessment value for the detection
RPN = Risk priority number
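A short sketch of how the risk priority number is commonly formed from the three assessment values, RPN = S · O · D. The product rule and the 1 to 10 rating scale are stated here as assumptions about usual FMEA practice (the slide only names the quantities); the class name `FmeaEntry` and all entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FmeaEntry:
    failure: str
    severity: int    # S
    occurrence: int  # O
    detection: int   # D

    def rpn(self) -> int:
        """Risk priority number, assumed here to be the product S * O * D."""
        for value in (self.severity, self.occurrence, self.detection):
            if not 1 <= value <= 10:
                raise ValueError("assessment values are assumed to lie in 1..10")
        return self.severity * self.occurrence * self.detection

# Hypothetical entries, for illustration only
entries = [
    FmeaEntry("hypothetical failure A", severity=8, occurrence=3, detection=4),
    FmeaEntry("hypothetical failure B", severity=5, occurrence=6, detection=2),
    FmeaEntry("hypothetical failure C", severity=9, occurrence=2, detection=7),
]

# Rank the weak points: highest RPN first
ranked = sorted(entries, key=FmeaEntry.rpn, reverse=True)
for entry in ranked:
    print(f"{entry.failure}: RPN = {entry.rpn()}")
```

The ranking, not the absolute RPN value, is what supports the qualitative aim of ordering weak points for the optimization step.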


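Boolean System Theory

Serial structure

$$R_S(t) = \prod_{i=1}^{n} R_i(t)$$

Example (three components in series within the system limit): $R_1 = 0.9$, $R_2 = 0.9$, $R_3 = 0.9$

$$R_S = 0.9 \cdot 0.9 \cdot 0.9 = 0.729$$

Parallel structure

$$R_S(t) = 1 - \prod_{i=1}^{n} \left(1 - R_i(t)\right)$$

Example (three components in parallel within the system limit): $R_1 = 0.9$, $R_2 = 0.9$, $R_3 = 0.9$

$$R_S = 1 - (1 - 0.9) \cdot (1 - 0.9) \cdot (1 - 0.9) = 0.999$$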
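The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch assuming independent components; the function names are illustrative.

```python
from math import prod

def series_reliability(reliabilities):
    """Serial structure: the system survives only if every component survives."""
    return prod(reliabilities)

def parallel_reliability(reliabilities):
    """Parallel structure: the system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

components = [0.9, 0.9, 0.9]  # R1, R2, R3 from the slide examples

r_series = series_reliability(components)
r_parallel = parallel_reliability(components)
print(f"serial:   R_S = {r_series:.3f}")    # 0.729
print(f"parallel: R_S = {r_parallel:.3f}")  # 0.999
```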



Top-down-method (deductive procedure)

TOP = undesired result

[Figure: fault tree developed from TOP to DOWN; each level is linked to the next by OR gates (≥1)]

- system failure: vehicle failure
- failure of the partial system: …, transmission failure, …
- failure of the assembly group: …, failure of the bearing, output failure, failure of the housing, …
- failure of the component: …, failure of the synchronization, failure of gear 1, failure of gear 2, …
- failure modes: …, fretting, breakage, pittings, …
- component characteristic, design flaw: …, overload, false calculations, incorrect operation, …

DOWN = basic event (causes)


Exercise 5.4: Task

a) Create the system structure of the given simplified system for the top event “Safety
Critical Fault”. Consider only the safety critical failure modes.

b) Add functions, malfunctions and failure modes.

c) Create a qualitative Fault Tree based on a)-b).


[Figure: Accelerator (i.e. LHC) and MPS. Chain: Primary Fault to be detected (rate 10^-4 / h) -> Sensor (e.g. voltage measurement) -> Communication link 1 -> Decision logics (electronics board) -> Communication link 2 -> Actuator (e.g. Beam Dumping System, …)]

Functionality:
- Sensor: measure a given quantity to detect possible failures
- Communication link 1: transmit information to the rule-based process
- Decision logics: decide if an emergency action has to be undertaken
- Communication link 2: transmit information to the actuator
- Actuator: execute the requested emergency action

Exercise 5.6: Task

Create a Fault Tree of the given redundant MPS system and evaluate its failure probability
after one production year (t = 5000h).

[Figure: LHC with a redundant MPS made of String 1 and String 2. String 1: Primary Fault to be detected (rate 10^-4 / h) -> Sensor (e.g. voltage measurement) -> Communication link 1 -> Decision logics (electronics board) -> Communication link 2 -> Actuator (e.g. Beam Dumping System, …)]
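One possible way to evaluate such a fault tree numerically is sketched below. It is not the exercise solution: the component failure rates are HYPOTHETICAL, and the structure is an assumption (two identical, independent strings, each a series chain of sensor, links, decision logics and actuator, with constant failure rates). An OR gate combines the components of one string, an AND gate combines the two redundant strings.

```python
import math

def component_failure_probability(rate_per_hour, t_hours):
    """Failure probability of one component, assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * t_hours)

def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    survive = 1.0
    for p in probabilities:
        survive *= 1.0 - p
    return 1.0 - survive

def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

T_HOURS = 5000.0  # one production year, from the exercise text

# HYPOTHETICAL failure rates in 1/h; the real exercise values are not given here
string_rates = {
    "sensor": 1e-6,
    "communication link 1": 1e-6,
    "decision logics": 1e-6,
    "communication link 2": 1e-6,
    "actuator": 1e-6,
}

component_f = [component_failure_probability(r, T_HOURS) for r in string_rates.values()]
f_string = or_gate(component_f)            # one string fails if any component fails
f_system = and_gate([f_string, f_string])  # redundant system fails if both strings fail

print(f"F_string({T_HOURS:.0f} h) = {f_string:.3e}")
print(f"F_system({T_HOURS:.0f} h) = {f_system:.3e}")
```

If the actuator is shared between the strings rather than duplicated, it would move out of the string OR gate and be combined with the AND-gate output through a further OR gate.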
Demonstrating Reliability: Test overview
Thurel Yves, CERN Edition, 2015

Real Case: 113 PSU units (same reference) delivered to CERN

- A PSU-LAB is in charge of the qualification / reception
- 2 different tests were organized to check the reliability of the units
  - 1st test: all units running for 24 hours (nominal conditions)
  - 2nd test: only 5 units running for 6 months

[Figure: test setup with up to 10 units under test conditions and test duration counters (1 per unit under test)]

te-epc-lpc
Re-Testing reworked units, with clear goal!

The 0-failure test was successful (done on reworked units)

- Even if testing only 10 units makes the 1st failure heavier in the distribution, the test duration was long and exclusion of steep beta becomes possible.
  (1 failure over 10 tested units = 7 %, using Benard’s approximation)
- Compare to 100 units running 340 hours!
  (1 failure over 100 tested units would represent 1 %, but exclusion of high beta is very poor)
- Both cases: 34 000 hours in total

[Figure: Cumulative Distribution Function versus test duration, with a mark at 3 400 hours. Annotations: "This point was reached without any failure (34 000 hours in total)"; "These distributions are still possible".]
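The percentages quoted for the first failure can be checked with Benard's approximation of the median rank, F ≈ (i - 0.3) / (n + 0.4) for the i-th failure out of n units. The sketch below also confirms that both test plans accumulate the same 34 000 unit-hours; the function name is illustrative.

```python
def benard_median_rank(i, n):
    """Benard's approximation of the median rank of the i-th failure out of n units."""
    return (i - 0.3) / (n + 0.4)

HOURS_TOTAL = 34_000  # cumulated test time quoted on the slide

for n_units in (10, 100):
    hours_per_unit = HOURS_TOTAL / n_units
    rank = benard_median_rank(1, n_units)
    print(f"{n_units:3d} units x {hours_per_unit:.0f} h: first failure at F = {rank:.1%}")
```

With 10 units the first failure sits at about 6.7 % (the 7 % on the slide); with 100 units it sits at about 0.7 %, which the slide rounds to 1 %.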

Definition of availability (accelerator specific, adaptation of availability definition from literature) [Source: Diss. Apollonio & CERN]

- Availability for Physics (PA):
  “It is the probability that an accelerator provides beam to the experiments for measurements or to the next accelerators.”
  … Run Time (RT): scheduled operation time of the accelerator
  … Stable Beams (SB): effective time for beam delivery to experiments or other accelerators

- Beam Availability (BA):
  “It is the probability that an accelerator can be operated with beam at time t.”
  … Machine Cycle Time (MC): time needed to achieve nominal operating conditions of the accelerator, when no faults occur

- Machine Availability (MA):
  “It is the probability that an accelerator can be operated at time t, even without beam.”
  … Fault Time (FT): time needed to clear a system fault

NB: an extension of the availability definitions for the injectors is currently under discussion.


The role of injectors – simple sketch


[Source: CERN]

[Figure: injector chain PS -> SPS -> LHC]

- PS: 26 GeV (EP to SPS), or beam to different destination
- SPS: 450 GeV (EP to LHC), or beam to different destination
- LHC: 7 TeV


Step 2: Construction of the transition matrix and of the system of differential equations

Row-wise
- Transition probability: product of status probability and transition rate
  - Arrows towards a state: positive
  - Arrows away from state: negative

[Figure: transition graph with states 1 and 2; transition rate λ from state 1 to state 2, transition rate µ from state 2 to state 1]

System of differential equations

$$\frac{d}{dt} p_1(t) = p_1(t) \cdot (-\lambda) + p_2(t) \cdot \mu$$

$$\frac{d}{dt} p_2(t) = p_1(t) \cdot \lambda + p_2(t) \cdot (-\mu)$$

Summary: Based on the transition graph, possible state transitions and their related transition rates will be arranged in a transition matrix.
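A minimal numerical sketch of the two-state model, assuming state 1 is "operating" and state 2 is "failed", with λ as failure rate and µ as repair rate. The rate values are HYPOTHETICAL and the Runge-Kutta integrator is just one simple way to solve the system; the result is compared with the closed-form solution for a start in state 1.

```python
import math

def derivatives(p1, p2, lam, mu):
    """Right-hand side: arrows away from a state negative, towards it positive."""
    dp1 = -lam * p1 + mu * p2
    dp2 = lam * p1 - mu * p2
    return dp1, dp2

def integrate(lam, mu, t_end, dt=0.1):
    """Classical 4th-order Runge-Kutta integration, starting in state 1."""
    p1, p2 = 1.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = derivatives(p1, p2, lam, mu)
        k2 = derivatives(p1 + 0.5 * dt * k1[0], p2 + 0.5 * dt * k1[1], lam, mu)
        k3 = derivatives(p1 + 0.5 * dt * k2[0], p2 + 0.5 * dt * k2[1], lam, mu)
        k4 = derivatives(p1 + dt * k3[0], p2 + dt * k3[1], lam, mu)
        p1 += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        p2 += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return p1, p2

LAM = 1e-3     # HYPOTHETICAL failure rate in 1/h
MU = 1e-1      # HYPOTHETICAL repair rate in 1/h
T_END = 100.0  # hours

p1_num, p2_num = integrate(LAM, MU, T_END)

# Closed-form solution for p1(0) = 1
p1_exact = MU / (LAM + MU) + LAM / (LAM + MU) * math.exp(-(LAM + MU) * T_END)
steady_state = MU / (LAM + MU)

print(f"p1({T_END:.0f} h) numerical = {p1_num:.6f}, analytical = {p1_exact:.6f}")
print(f"steady-state availability = {steady_state:.6f}")
```

For long times p1(t) tends to µ / (λ + µ), the steady-state availability of the repairable element.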



ELMAS – Hands On Training

Feedback

Thanks a lot for your attention!

