Butterworth-Heinemann
Linacre House, Jordan Hill, Oxford OX2 8DP
225 Wildwood Avenue, Woburn, MA 01801-2041
A division of Reed Educational and Professional Publishing Ltd
A member of the Reed Elsevier plc group

OXFORD BOSTON JOHANNESBURG MELBOURNE NEW DELHI SINGAPORE

[…] or incidentally […] some other […] without the written permission of the copyright holder except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, England W1P 9H[…]. Applications for the copyright holder's written permission to reproduce any part of this publication should be addressed to the publishers.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Typeset by the author
Printed and bound in Great Britain
For Edith

Contents

Preface
Acknowledgements

1 Introduction to Reliability-centred Maintenance
  […]
2 Functions
  […]
  The operating context
  Different types of functions
  How functions should be listed
3 Functional Failures
  […]
4 Failure Modes and Effects Analysis
  What is a failure mode?
  […]
5 Failure Consequences
  […]
  Hidden and evident […]
  Safety and environmental consequences
  5.4 Operational consequences
  5.5 […] consequences
  5.6 […] consequences
  […]
6 Proactive Maintenance 1: Preventive Tasks
  6.1 […] and proactive tasks
  6.2 Age and deterioration
  6.3 Age-related failures and preventive maintenance
  6.4 Scheduled restoration tasks
  6.5 Scheduled discard tasks
  6.6 Failures which are not age-related
7 Proactive Maintenance 2: Predictive Tasks
  7.1 […] on-condition maintenance
  7.2 […]
  7.3 […] tasks
  7.4 […] of on-condition techniques
  7.5 […] of the pitfalls
  7.6 […] P-F curves
  7.7 How to determine the P-F interval
  7.8 When on-condition tasks are worth doing
  7.9 Selecting proactive tasks
8 Default Actions 1: Failure-finding
  8.1 Default actions
  8.2 Failure-finding
  8.3 […]
  […] failure-finding
9 Other Default Actions
  9.1 No scheduled maintenance
  9.2 Redesign
  9.3 Walk-around checks
10 The RCM Decision Diagram
  10.1 Integrating consequences and tasks
  10.2 The RCM decision process
  10.3 Completing the decision worksheet
  10.4 Computers and RCM
11 Implementing RCM Recommendations
  Implementation - the key steps
  The RCM audit
  Task descriptions
  Implementing once-off changes
  Work packages
  Maintenance planning and control systems
  Reporting defects
12 Actuarial A[…]
  […]
13 Applying the RCM Process
  Who knows?
  RCM review groups
  […]
  Implementation strategies
  RCM in perpetuity
  How RCM should not be applied
  Building skills in RCM
14 What RCM Achieves
  […] maintenance performance
  Maintenance effectiveness
  Maintenance efficiency
  What RCM achieves
15 A Brief History of RCM
  15.1 The experience of the airlines
  15.2 RCM in other sectors
  15.3 Why RCM 2?
Appendix 1: Asset hierarchies and functional block diagrams
Appendix 2: Human error
Appendix 3: A continuum of risk
Appendix 4: Condition monitoring
Bibliography
Index

Preface

Humanity continues to depend to an […] generated by highly mechanised and aut[omated] […] or trains run on time. More than ever, these depend in tu[rn] […] of physical assets.

Yet when these as[sets] […] are these services […] incidents which have […] Bhopal and Piper Alpha. As a […] evolved […] completely new strategic framework for en[suring] […] continues to perform a[s] […]

[…] and prepared by Stanley Nowlan and the late Howard Heap in 1978. The report provided a comprehensive description of the development and app[lication] […]

The first edition of this book (pu[blished] in the UK in 1991 and the USA […] 1992) provided a comprehensive [intro]duction to RCM2. […] [cont]inued to evolv[e] […] that it bec[ame] […] developments. Several new chapters have been added, while others have been revised and extended.
Foremost among the changes are:
• a comprehensive review of the role of functional a[nalysis] […] [definit]ion of failed states in Chapters 2 and 3
• a much broader and deeper look at failure modes and effects a[nalysis] in the context of RCM, with special emphasis on the question of levels of analysis and the degree of detail required, in Chapter 4
• new material on how to establish acceptable levels of risk in Chapter 5 and Appendix 3
• […] of more rigorous approaches to the determination of failure-finding task intervals in Chapter 8
• […] [recom]mendations in Chapter […]
• […] [mater]ial on the measurement of the overall performance of the [maintena]nce function in Chapter 14
• a brief review of asset hierarchies in Appendix […] [sum]mary of the (often overstated) role played by functional hierar[chies] […] functional block diagrams in the app[lication] […]
• a review of different types of human error in Appendix 2, together with a look at the part they play in the failure of physical assets
• […] of no fewer than 50 new techniques to the appendix on [conditio]n monitoring (now Appendix 4).

In the […] impression of the second edition, the word […] about ri[sk] […] in Appendix […] used in the world of risk, […] includes further mat[erial] […]

• […] is written for those who only wis[h] […] [Reliabilit]y-centred Maintenance.
• Chapters 2 to 10 descr[ibe] […] and will be of most […] technical grasp of the subject.
• […] [rema]ining chapters are for those who […] summary of the key […] achieves. […] 15 provides a brief history of RCM.

JOHN MOUBRAY
Lutterworth
Leicestershire
September 1997

Acknowledgements

[…] his colleagues in the civil avi[ation] […] work […] field.

[…] are also due to Dr Mark Horton, for his help in develop[ing] […] the concepts embodied in Chapters 5 and 8, and to Peter Stock […] to co-author Appendix 4.

[…] [mem]bers of the Aladon network for their help in applying the concepts and for their con[…] feedback about what works and what doesn't work, much of whic[h] […] reflected […] […]jack, Chris James, Hug[…] Katchmar, Sandy Dunn, Tony Geraghty, Fra[…] Amarra, Phil Clarke, Michael Hawdon, Brian […] […]mon Deakin and Theuns Koekemoer.
Among the many clients who have proved and are continuing to prove that RCM is a viable force in industry, I am especially indebted to the following:

Gino Palarchio and Ron Thomas of Dofasco Steel
[…] of Ford of Europe
Mike Hopcraft, Terry Belton and Barry […]
Joe Campbell of the British Steel Corpor[ation]
Vincent Ryan and Frank O'Connor of the Irish Electricity Supply Board
[…] of Hong Kong Electric
[…] of the New Venture Gear Company
[…] Udy, Roger Crouch, Kevin Weedon and Malcolm Regler of the Royal Navy
Don Turner and Trevor Ferrer of China Light & Power
John Pearce of the Mars Corporation
Dick Pettigrew of Rohm & Haas
Pat McRory of BP Ex[…]
Al Weber and Jerry […]

[…] Aladon Ltd for permi[ssion] […] [Deci]sion Worksheets and the RCM 2 Decision Dia[gram] […]

1 Introduction to Reliability-centred Maintenance

1.1 The Changing World of Maintenance

[…] [maint]enance has changed, perhaps […] [Mainte]nance is also responding to changing expectat[ions] […] growing awareness of the exten[t] […] into a coherent pattern, so […] those likely to be of most […]

[…] sharply. This led to increased mec[hanisation] […] types were more numerous and more complex. […] to depend on them.

[…] this dependence grew, downtime came into sharper focus. This led […] [failu]res could and should be prevented, which […] mainly of equipment overhauls done at fixed intervals. […] [m]aintenance also started to rise sharply relative to other […] led to the growth of maintenance planning and […] to bring maintenance under control […] in indu[stry] […] has gathered […] [un]der the head[ings] […]

New expectations

Figure 1.1 shows how expectations of maintenance have evolved.
[Figure 1.1: Growing expectations of maintenance, by generation, 1940 to 2000. Recoverable entries: higher plant availability; longer equipment life; lower costs; greater cost effectiveness.]

[…] progress mean[s] […] stop a whole pl[ant] […] automation has meant that […] achievement of […] More and […] [conse]quences, at a time […] on the integrity of our physical asset[s] […] which becomes a simple matter of org[…] our dependence on physi[cal] […] To secure the maximum ret[urn] […] they must be kept working eff[iciently] […]

Figure 1.2 shows how the earl[iest] […] was simply that as things got […]

[Figure 1.2: timeline, 1940 to 2000]

[…] been explosive growth in new m[aintenance] […] [tech]niques. Hundreds have been developed over the past fifteen yea[rs] […] more are emerging every week.

[Figure 1.3: Changing maintenance techniques, Second and Third Generation. Recoverable entries: condition monitoring; scheduled overhauls; failure modes and effects analyses; expert systems; big, slow computers; […]killing and teamwork.]

The new development[s] […] problems are created while existing problems only get worse.

The challenges facing maintenance

In a nutshell, the key challenges facing modern maintenance managers can be summarised as […]
• to select the most appro[priate] […]

RCM provides a framework which enables users to respond to these challenges, quickly and simply. It does so because it never loses sight of the fact that maintenance is about physical assets. If these assets did not exist, the maintenance function itself would not exist. So RCM starts with a comprehensive, zero-based review of the maintenance requirements of […] [as]set in its operating context.

[…] too often, these requirements are taken for granted. This results in the development of organisation structures, the deployment of resources and the implementation of systems on the basis of incomplete or incorrect assumptions about the real needs of the assets.
On the other hand, if these requirements are defined correctly in the light of modern thinking, it is possible to achieve remarkable step changes in maintenance efficiency and effectiveness.

The rest of this chapter introduces RCM in more detail. It begins by […] [me]aning of 'maintenance' itself. It goes on to define RCM […] seven key steps involved […] this process.

1.2 Maintenance and RCM

From the engineering viewpoint, there are two elements to the management of any physical asset. It must be maintained and from time to time it may also need to be modified.

The major dictionaries define maintain as cause to continue (Oxford) or keep in an existing state (Webster). This suggests that maintenance means preserving something. On the other hand, they agree that to modify something means to change it in some way. This distinction between maintain and modify has profound implications which are discussed at length in later chapters. However, we focus on maintenance at this point.

When we set out to maintain something, what is it that we wish to cause to continue? What is the existing state that we wish to preserve?

The answer to these questions can be found in the fact that every physical asset is put into service because someone wants it to do something. In other words, they expect it to fulfil a specific function or functions. So it follows that when we maintain an asset, the state we wish to preserve must be one in which it continues to do whatever its users want it to do.

Maintenance: Ensuring that physical assets continue to do what their users want them to do

What the users want will depend on exactly where and how the asset is being used (the operating context).
This leads to the following formal definition of Reliability-centred Maintenance:

Reliability-centred Maintenance: a process used to determine the maintenance requirements of any physical asset in its operating context

In the light of the earlier def[inition] […] RCM could be 'a process used to determine what must be done to ensure that any physical asset continues to do whatever its users want it to do in its present operating context'.

1.3 RCM: The seven basic questions

The RCM process ent[ails] […] about the asset or system under […]:
• what are the functions and associated performance standards of the asset in its present operating context?
• in what ways does it fail to fulfil its functions?
• what causes each functional failure?
• what happens when each failure occurs?
• in what way does each failure matter?
• what can be done to predict or prevent each failure?
• what should be done if a suitable proactive task cannot be found?

These questions are introduced briefly in the following paragraphs, and then considered in detail in Chapters 2 to 10.

Functions and Performance Standards

Before it is possible to apply a process used to determine what must be done to ensure that any physical asset continues to do whatever its users want it to do in its present operating context, we need to do two things:
• determine what its users want it to do
• ensure that it is capable of doing what its users want to start with.

[…] which summarise why the asset was acquired in the […] speed, o[utput] […] […]ary functions. Users also have expecta[tions] […] containment, comfort, structural […] [efficie]ncy of operation, compliance with […] the appearance of the asset.

The users of the […] best position by far to know exactly what contribution each asset makes to the physical and financial […] a remarkable amount - often a fright[ening] […] actually works.

[…] [objecti]ves of maintenance are defined by the functions and associated performance expectations of the asset under consideration. But how […] objectives by adopting a suitable approach to […] [fai]lure. However, before we can apply a suitable blend […] what failures can occur.
In addition to the total inability to f[unction] […] partial failures, where the asset […] F[unctional fail]ures are discussed at greater length in Chapter 3.

Failure Modes

[…] These ev[ents] […] [in]clude […] [main]tenance regimes, and […] considered to be real possi[bilities] […]

Failure Effects

The fourth step […] describe what happens when […]
• what must be done to repair the […]

The proces[s] […] and failure e[ffects] […] for improving pe[rformance] […]

Failure Consequences

[…] [detai]led analysis of an average industrial undertaking […] between three and ten thousand possible failure modes. Each of these […] the organisation in some way, but in each case, the effects […] affect operations. They may also affect product […], safety or the environment. They will all take time and cost money to repair.

[…] these consequen[ces] […] recognises that the consequences of […] four groups, as follows:
• Hidden failure consequences: […] but they expose the organi[sation] […] catastrophic, consequences. (Most […] protective devices which are not fail-safe.)
• […] could hurt or kil[l] […] a breach of any corporate, regi[onal] […]
• […] it affects pro[duction] […] [oper]ating costs in addition to the direct cost of repa[ir] […]
• […] [failu]res which […]

The consequence evaluation process is discussed […] [th]is chapter, and in much more detail in Chapter 5. The next sect[ion of this] chapter looks at proactive tasks in more detail.

Proactive Tasks

Many people […]

[Figure 1.4: The traditional view of failure (conditional probability of failure […])]

Figure 1.4 is based on the assumption that most items operate reliably for a period 'X', and then wear out. Classical thinking suggests that extensive records about fa[ilure] […] enable us to determine this life and so make plans to take preventive action shortly before the item is due to fail in future.

[…] for certain types of simple equipment, and for some […] dominant failure modes. In particular, wear-out characteristics are often found where equipment comes into direct contact with the product. Age-related failures are also often associated with fatigue, corrosion, abrasion and evaporation.
However, equipment in general is far more complex than it was twenty […] led to startling changes in the patterns of failure, as shown in Figure 1.5. The graphs show conditional probability of failure against operating age for a variety of electrical and mechanical items.

Pattern A is the well-known bathtub curve. It begins with a high […] [failu]re (known as infant mortality) followed by a constant or a […] conditional probability of failure, then by a wear-out zone. Pattern B shows constant or slowly increasing conditional probability of failure, ending in a wear-out zone (the same as Figure 1.4).

[Figure 1.5: […] patterns of failure]

Pattern C shows slowly increasing conditional probability of failure, but there is no identifiable wear-out age. Pattern D shows low conditional […] shop, then a […] [sho]ws a constant con[ditional] […] [failu]re). Pattern F starts with high infant mortality, which drops event[ually] to a constant or very slowly increasing conditional probab[ility] […]

[…] to pattern A, 2% to B, 5% to C, 7% to D, 14% to E and no fewer t[han] […] pattern F. (The number of times these patterns occur in ai[rcraft] […] necessarily the same as in industry. But there is no doubt th[at] […] become more complex, we see mor[e] […]

[…] [reliabil]ity and operating age. This bel[ief] […] more often an item is overhauled, the less likely it […] Nowadays, this is seldom true. Unless there is a dominant age-[related failu]re mode, […] failure rates by introducing infant mortality into other[wise] […] systems.

An awareness of these facts has led some or[ganisat]ions to abandon the […] thing to do for failures wit[h] […] consequences are significant, something must be done to prevent or predict the failures, or at least to reduce the consequences.

This brings us back to the question of proactive tasks. As mentioned earlier, RCM divides proactive tasks into three categories:
• scheduled restoration tasks
• scheduled discard tasks
• scheduled on-condition tasks.
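The distinction drawn above between age-related wear-out, random failure and infant mortality can be made concrete with a small numerical sketch. The Weibull hazard rate used here is an assumption introduced purely for illustration (the text does not name any distribution), and the shape values, scale value and operating ages are hypothetical.

```python
def weibull_hazard(age, shape, scale):
    """Conditional probability of failure per unit time at a given operating age.

    Assumption for this sketch only: a Weibull hazard rate,
    h(t) = (shape / scale) * (t / scale) ** (shape - 1).
    """
    if age <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("age, shape and scale must all be positive")
    return (shape / scale) * (age / scale) ** (shape - 1)


def trend(shape, scale, ages):
    """Classify how the conditional probability of failure changes across `ages`."""
    rates = [weibull_hazard(age, shape, scale) for age in ages]
    pairs = list(zip(rates, rates[1:]))
    if all(later > earlier for earlier, later in pairs):
        return "increasing"   # failure becomes more likely with age (wear-out)
    if all(later < earlier for earlier, later in pairs):
        return "decreasing"   # failure is most likely when new (infant mortality)
    return "constant"         # age tells us nothing (random failure)


ages = [100, 500, 1000, 2000]  # hypothetical operating ages, in hours
for label, shape in [("infant mortality", 0.5), ("random", 1.0), ("age-related", 3.0)]:
    print(label, "->", trend(shape, 1000.0, ages))
```

Under these assumptions, only the increasing case gives a fixed-interval restoration or discard task a life limit to aim at; for the constant and decreasing cases, operating age carries no useful information about when to intervene.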
Scheduled restoration and scheduled discard tasks

Scheduled restorat[ion] […] [co]mponent or overhauling an assembly at or before a […] [con]dition at the time. Sim[ilarly] […] at or before a specified life […]

Collectively, these two types of tasks are now generally known as preventive maintenance. They used to be by far the most widely used form […] [main]tenance. However, for the reasons discussed above, they […] used than they were twenty years ago.

[…] need to prevent certain types of failure, and the growing […] techniques to do so, are behind the growth of new types of failure management. The majority of these techniques rely on the fact that most failures give some warning of th[e fact that t]hey are about to occur. These warnings are known as potential failures, and are defined […] [identi]fiable physical conditions which indicate that a functional fail[ure is about] to occur or is in the process of occurring.

The new techniques are used to detect potential failures so that action can be taken to avoid the consequences which could occur if they degen[erate] […]

[…] decisions in this area to be made with particular confidence. […] also covers once-off changes to procedures.
• [no scheduled main]tenance: as the name implies, this defau[lt] […] default is also called run-to-fa[ilure].

The RCM Task [Selec]tion Process

A great strength of RCM is the way it provides simple, precise and easily understood criteria for deciding wh[ich] ([if] any) of the proactive tasks is technically feasible in any context, and if so for deciding how often they should be done and who should do them. These criteria are discussed in Chapters 6 and 7.

[…] low level. If […] [ca]nnot be found then a scheduled failure-finding task must be performed. If a suitable failure-finding task cannot be found, then the secondary def[ault] […] [re]designed (depending […]
• […] [fai]lures with safety or […] only worth doing if it reduces the risk of that f[ailure] […] not be found which reduces the […] item must be redesigned or the process must be changed.
• […] on economic grounds.
If it is not j[ustified] […] is agai[n] […] the secondary default decision is once again red[esign].

This approach means that proactive tasks are only specifi[ed] […] which really need them, which in turn leads to s[…] routine workloads. Less ro[utine] […] [conse]quences apply in different operating contexts. Th[is] […] numbers of schedules which are wasted, not because they are 'wrong' in the technical sense, but because they a[…] RCM process considers the maintenance require[ments] […] necessary to reconsider the […] design, […] today has to m[…] there or what mi[ght] […] [t]here at some stage in the future.

1.4 Applying the RCM process

Before s[…] to analyse the maintenance requirements of the asset[s] in any organisation, we need to know what these assets are and to decide which of them are to be subjected to the RCM review process. This mea[ns] […] [regist]er must be prepared if one does […] [al]ready. In fact, nowadays […] already possess plant […]

[…] applied, RCM leads to remarkable improvements in mai[ntenance effecti]veness […] However, the successful appl[ication] […] preparation. The key elements of the planning process are as follows:
• […] [benef]it from the RCM process, […] justify the investment
• decide in detail […] is, when and where
• […] [tra]ining […] clearly understood, and arrange for them to receive appropri[ate] […]
• ensure that the operating context of the asset […]

Review groups

We have seen how the RCM process embodies seven basi[c] […]

[Figure: review group members, including Facilitator, Operations Supervisor, Engineering Supervisor and Craftsman]

The us[e] […] only enables management [to g]ain access to the […]
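but the members themselves gain a greatly enhanced understanding of the asset in […]

[…] group reaches conse[nsus] […] the enthusiasm and c[ommitment] […]

[…] [outcome]s of an RCM analy[sis] […] manner suggested abov[e] […]
• […] schedules to be done by the maintenance department
• […] procedures for the operators of the asset
• […] if changes must be made to the design of the […] where the new maintenance programs […]

1.5 What RCM Achieves

Desirable as they are, the outcomes listed above should only be seen as […] [Th]ey should enable the maintenance func[tion] […] listed in Figure 1.1 at the beginning of […] following paragraphs.

• Improved operating performance ([…] [custo]mer service): RCM recognises th[at] […] trial which is [frus]trating, and error which can be very c[ostly] […]
• Greater maintenance cost-effecti[veness] […] on the maintenance […]
• […] scratch […] [mainte]nance progr[am] […] by more and more regulators). […]
• […] the eff[…] each asset […] also […] [ma]intain each […]
• […] of individuals, especially people who are involved […] [pro]cess. This leads to greatly improved general under[standing] […]
• […] [R]CM provides […] who […] [m]aintenance. This gives maintenance […] [unders]tanding […] (and cannot) achieve a[nd] […] must be done […]

[…] [the]se issues are part of the m[…] many are already the […] [involv]ing everyone who has any[thing] […] results very quickl[y] […] applied, RCM reviews can pay for […] matter of months and sometimes even a matter of weeks, […] transform both the perceived maintenance requirements […] used by the orga[nisation] and the way in which the […] is perceived. The result is more cost-[…]

2 Functions

[…] why the RCM process […]
• what are the functions and associated performance standards of the […] present operating context?

[…] [func]tions should be listed. […] is not complete unless […] performance desired by th[e] […]

For instance, the primary function of the pump in Figure 2.1 woul[d] […] pump water from Tank X to Tank Y at not less than 800 litres per […]

Th[is] example […] complete function statement consists of a verb, an object and the st[andard] of performance desired by the user.

A function statement should consist of a verb, an object and a desired standard of performance

2.2 Performance standards

[…] (also known as 'chaos' or […] whatever process is causing the system to det[eriorate] […] into a tank from which the […] This happens regardless of whether the impeller is made […] wear to the point that […]

[Figure 2.1: Initial capability vs desired performance]

[…] able to deliver […] standard of pe[rformance] […] [relation]ship between this capabi[lity] [an]d desired perf[ormance] […] [a]llowing for deterioration […] only restore the asset […] beyond it […]

[…] [incor]porate more […] For example, one function of a chemical reactor in a batch-type chemical plant might be listed as:
• To heat up to 500 kg of product X from ambient temperatu[re] […] (125°C) in one hour.
In this case, the weight of product, the temperature […] [r]ange […]

[Figure 2.3: A maintainable asset]

For instance, […] [Fig]ure 2.1 had an […] [S]ince the maintenance program does not exist which makes pumps bigger, maintenance cannot deliver the [desi]red performance in this context. Si[milarly] […] of maintenance will make this motor big enough. It may be perfectly adequately [de]signed and built in its own right; it just cannot deliver the desired performance […]

[…] the need to be precise, it is s[ometimes] […] [qualita]tive perform[ance] statements. For instanc[e] […] a painting is us[ually] […] not 'attractive'). What is meant by 'acceptable' var[ies] […] person and is impossible to quantify. As a res[ult] […] care to ensure that they share a common understanding of what is meant by words like 'acceptable' before setting up a system intended to preserve that accept[ability].

[…] performance statement which contains no performance standard at all usually […] For instance, the concept of containment is associated with nearly all enclosed […] [functi]on statements covering containment are often written as follows:
• To contain liquid X
[…] the liquid, and that an[…] an enclosed system can tolerate some leakage, the amount […] [toler]ated should be incorporated as a performance standard in th[e] […]

[…] [init]ial capability […] the 'worst case' load, which […] [cap]ability does not drop […] case it would automat[ically] […] range of performance expectations.

[Figure 2.5: Variable performance standards]

For example, a grinding machine used to finish grind a crankshaft will not prod[uce] […] on every journal. The diameters wil[l] […] machine in a food factory will not […] exactly the same weight of food. The weights will vary […] desired performance. […] [indus]tries are spending a gre[at] […] and energy on designing proc[esses] […] [a]spect of design and develop[ment] […] we are concerned purely […] discussed further in Chapter 3.

2.3 The Operating Context

[…] [de]fined as 'a process used to determine the m[aintenance] […] [operatin]g context' […] [strat]egy formulation process.

[…] [p]rogram is being devel[oped] […] to Endburg. Before the […] vehicle can be defined, the[y] […] thoroughly under[stand] […] route? What load is the truck carrying? What speed limits and other regulatory constraints apply to the […] [facili]ties exist along the way?

[…] these questions might lead us to define the primary function […] [as fol]lows: To transport up to 40 tonnes of steel slabs at speeds […] [ave]rage 45 mph) from Startsville to Endburg on one tank of fuel.

[…] Endburg may det[…] Not only does the context drast[ically] […] affects the n[…] and consequences, how often […] what must be done to manage them.

[…] [Con]sider again the pump shown in Figure 2.1. I[f] it were moved to […] [p]umps mildly abrasive slurry into a Tank B from wh[ich] […] the slurry […] of 900 litres per minute, the primary function would be:
• […] 900 litres per minute.
[Th]is is [a] higher performance sta[nd]ard […] to which it ha[s] to be maintained rises acco[rdingly] […]

[…] process must ensure that they have a crystal clear underst[anding] […] context before they start. Some of the most i[mportant] […] are discussed in the fol[lowing] […]

[…] most of the machines are […] In flow processes, the f[ailure] […] asset can either stop the entire […] [st]and-by plant is available. O[…] [fail]ures only curtail the output of a […] of such failures are determined […]

The importance of redundancy is illustrated by the three identical pumps shown in Figure 2.7. Pump B has a stand-by, while pump A does not.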
Figure 2.5: Variable performance standards Forexample, a grinding machine used'o finish grinda crankshaft wll not prods ‘on every journal. The diameters wil ing machine ina foad factory wil not thoxactly the same weightof food. The weights wil vary, Mean desired performance. triesare spending a gre. and energy on designing proc spect of design and develop’ wwe are concerned purely discussed further in Chapter 3. 28 —Reliabil 2.3 The Operating Context fined as “a process used to determine the my yg context egy formulation process, rogram is being devel to Endburg. Betore the vehicle can be defined, they thoroughly under- route? What loadis the tuck carrying What speedlimits andother regulatory constraints apply tothe ties exist along the way? these questions might lead us to define the primary function jows: To transport up to 40 tonnes of steel slabs at speeds | rage 45 mph) from Startsville to Endburgon one tank ot fue’ Endburg may det ‘Notonly does the context drast affects the n is and consequences, how often ‘what must be done to manage them, sider again the pump shown in Figure 2.1. It were moved to imps mildly abrasive slurry intoa TankBfrom when the slurry “of 900 lites per minuta, the primary function would be: ‘900 litres per minute. ‘isis higher performance sta ard towhich t hae tobe maintained rises acco process must ensure that they have a erystal clear underst context before they start. Some of the most in red are discussed in the fol ‘most of the machines are In low processes, the f asset can cither stop the entire and-by plant is available. Or tures only curtail the output ofa of such failures are determined The importance of redundancy is ilustrated bythe three identical pumps shown in Figure 2.7, Pump B has a stand-by, while pump A does not. 
Figure 2.7: Stand Alone Stand-by Duty Ditlerent operating contexts ‘Thismeans that the primary function of pump A to another on is own, and that of pump B to do ‘This itference means thatthe maintenance requirements ofthese pumps wl 4.4 How Much Detail? it was mentioned th: tobe possibte toselect Failure modes should be defined in enough detail for it to be possible to select a suitable failure management policy Failure Modes of the key factors which need to} following paragraphs. Causation remoter corners of the psyche of the operators define so-called root causes of failure. ‘The exte ‘failure modes can be described at di of detail is Figure 4.7 is based on the pump set shown in Figure 42, some of whose f ‘modes were listed in Figure 4.1. Figure 47 lists ways in which the pump but it would very seldom be necessary to do Soin practice the failure modes listed onty apply to the functional failure ‘unable to transter water at al. Figure 4.7 does not show failure modes which would cause other res, such as loss of containment or loss of protection. {WEIap 10 S010) 1uBLEYID 18 Sapo BIW ponusuod) £°p andi He eS ET a ee de erage Foreign emo eee "= Taster may ——~Ntwosisaresed ee : es nasn myname PSE | ae tig aa ape | ~arnaees argo — Rose ee me fees worst Wenge oe | oe ean | polars ~~ weg gaaied Despre | veer “Wore sor ugiee Posteo aaa a RE Ha aS Ra ————Sre pet] “in amar gant FaTs on as SR a Ssaeors : ole erg obec ees son Suen ae FARSI Aer e1 z RC ee dan 3 “ag each ang : __ Eichiseoeipe = a Serine ey ——— + 3 “caagamanar — Co mateo sag Sic g ong rorenbpoatca Des er g —sramesycopatonay — tag yee = Se 8 “rap sara aera sar 3 se Se aa ay DG : “Eavrscpls—— ssw oe : Sees! —— eee a ‘oy sealer 08 ne g a = “rag seatgaciog ——— Benge ter Sra? 
zs _rrarerrana a oases — soln See — * a Pengsad Suoueditat— Peco aer See ern eT we] eT HAE aT frases vais — Bhs Til oo ge Wetr under — “Uieston aire Basrrg eas Sara = zi - ‘Weg onic sted aralacureg ot “Baa eG PARTE Carag pee nsaiaon ——Beatng copped sia Sarstsepig ener | Bear daviagad Wael — Prcuere er ‘aga AES — argh ane Reel ener | Became Teeny efor Seo dpe —— isch by WEaI6T eB Sgped — uansacg 6 — “Beata cote nts Sere or ~ gear sures ‘tg bet sgt ——— Poorer “Nerves — ior wietsrecay — ae ‘Siar wndg Motrin as ures See Apperae ‘al ar andiea ‘Gpering enor See Anan? ovr ea oss “esa ap rea, ‘Casey gastro faec ise sre on ci __Noteresrg ate see 1s —— Fara “Fe leo wor 35 spe — farts Sea B aoupuasupyyy p2ustie>-S) y s122{f puo sopoyy aun 68 Reliability-centred Maintenance ee] eT war ‘Shak sears ndash St sna Weng ras acts see hak ears dst aig Pong setts Sei as ‘ese ‘Cover ty a Cover rated ‘acu ‘sera or a Pons ia ‘Settee sree t ng as To Tg hie Gee. RE co aa ei | Ba Bess 2sSs ley | t 3 ae Re a | toe AD k FRE Fede iz GE ke e i PGR oF of 4 Rs aoe Figure 4.7 (continued): Failure modes at diferent levels of detail ‘modes that can be listed, Forinstanos, there are ive failure modes lstedatlvel2forthe pump setin Figure. ‘Two more key issues which arise from Figure 4.7 concern “root causes’ and human error. They are discussed below. Root causes ‘The term ‘root cause’ is often used in connection with the analysis of fi ures. implies that if one drills down far enough, itis possible to ative and absolute level of causation, In fact, this is seldom the case, istod 3c was distracted (level 8). He might have been distracted because his ‘child was il (level 9). This failure might have occurred because the ‘child ate bad food in restaurant (level 10). ost Forever ~ way. FMEA has any con: re level at leo identify an appropriate failure management policy (This isequally true whether one is carrying out an FMEA before failures ‘occur or a ‘root cause analysis’ after a failure has occurred.) 
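The idea of drilling down only as far as the level at which a failure management policy can be identified can be sketched as a simple search over a causal chain. Everything in this sketch is hypothetical: the `Cause` record, the `manageable` flag (standing for the analyst's judgement that a policy can be chosen at that level) and the level 7 entry are illustrative assumptions. Only the level 8 to 10 descriptions echo the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cause:
    level: int          # depth in the causal chain (smaller = less detailed)
    description: str
    manageable: bool    # analyst's judgement: can a policy be chosen at this level?


def stopping_level(chain: List[Cause]) -> Optional[Cause]:
    """Return the least detailed cause at which a policy can be chosen, if any."""
    for cause in sorted(chain, key=lambda c: c.level):
        if cause.manageable:
            return cause
    return None


chain = [
    Cause(7, "component assembled incorrectly (hypothetical entry)", True),
    Cause(8, "person doing the work was distracted", False),
    Cause(9, "child was ill", False),
    Cause(10, "child ate bad food in a restaurant", False),
]

chosen = stopping_level(chain)
print(chosen.level, chosen.description)
```

Drilling past the chosen level adds detail without adding anything the organisation could act on, which is the argument the text makes against chasing an absolute root cause.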
The fact that the level which is appropriate varies for different f[ailure] modes means that we do not have to list all failure modes at the same level on the Information Worksheet. Some failure modes might be identified at level 2, others at level 7, and the rest somewhere in between.

For instance, in one particular context, it may be appropriate to list only those failure modes shaded in grey in Figure 4.7. In another context, it may be appropriate for an entire FMEA for an identical pump set to consist of the single failure mode 'pump set fails'. Another context may call for yet another selection.

Obviously, in order to be able to stop at an appro[priate] level, the people doing such analyses need to be aware of […] of the f[ailure manage]ment policy options. These are discussed at length in Chapters 6 to 9. Other factors which influence the level of detail are considered in the rest of this part of this chapter and again in part 7.

Human error

Part 3 of this ch[apter] […] [fail]ure modes are thought […] in the FMEA. This has been done in Figure 4.7, where all […] [mo]des ending with the word 'error' are some form of human error. Ap[pendix] […] summary of key issues involved in the classification and management of such errors.

Probability

Different failure modes occur at different frequencies. Some may occur […] [in]tervals measured in months, weeks or even days. […] [im]probable, with mean times between occur[rences] […] of years. When preparing an FMEA, deci[sions] […] [fa]ilure modes are so unlikely […]

When listing failure modes, do not try to list every single failure possibility regardless of its likelihood

[…] [m]ight reasonably be expected to occur in the […] of 'reasonably likely' failure modes sho[uld] [inc]lude the following:
• failures […] occurred before on the same or similar assets. These […] candidates for inclusion in an FMEA unless the asset has been mo[dified] […] sources of information about these failures include […] (your own employees, vendors or other […] technical history records and data banks.
(Note the comments about these records in Part 6 of this chapter and in Chapter 12.)
• failure modes which are already the subject of existing maintenance tasks. One way to ensure that none of these failure modes have been overlooked is to study existing maintenance schedules and ask 'what failure mode would occur if we did not do this task?'

For example, sealed bearings are fitted to the pump shown in Figure 4.7. This means that the likelihood of failure due to lack of lubrication is low, so low that it would not be included in most FMEAs. On the other hand, failure due to lack of lubricant probably would be included for manually lubricated components, centralised lube systems and gearboxes.

However, the decision not to list a failure mode should be tempered by careful consideration of its consequences.

Consequences
If the consequences of a failure are likely to be very severe, then even less likely failure possibilities should be listed.

For instance, if the pump set in Figure 4.7 was installed in a food factory or a vehicle plant, some very unlikely failure modes would be left out of the FMEA [...] (On the other hand, where things must be switched on in a particular sequence and something will be damaged if they are not, then this failure mode should be considered.)

Cause vs Effect
Care should be taken not to confuse causes and effects when listing failure modes. This is a subtle point for people who are new to the RCM process.

For example, one plant had some 200 gearboxes, all performing more or less the same function on the same type of equipment. Initially, the following failure modes were recorded for one of these gearboxes:
• Gearbox bearings seize
• Gear teeth stripped
These failures had occurred in the past (some of the gearboxes were twenty years old). It emerged that lubrication had not been attended to as it should have been, so the gearboxes had actually failed due to lack of oil. No one could recall that any of the gearboxes had failed if they had been properly lubricated. As a result, the failure mode was eventually recorded as:
• Gearbox fails due to lack of oil
This underlined the importance of the obvious proactive task, which was to check the oil level periodically. (This is not to suggest that all gearboxes should be analysed in this way. Some are much more complex or much more heavily loaded, and so are subject to a wider variety of failure modes. In other cases, the failure consequences may be much more severe, which would call for a more defensive view of failure possibilities.)

Failure Modes and the Operating Context
We have seen how the functions and functional failures of any item are influenced by its operating context. This is also true of failure modes. For instance, consider the three pumps shown in Figure 2.7. The failure modes which might affect the stand-by pump (such as brinelling of the bearings, stagnation of water in the pump casing and even the 'borrowing' of key components to use elsewhere in an emergency) are different from those which might affect the duty pump, as set out in Figure 4.7. The operating context affects both the causes and the consequences of failure.

4.5 Failure Effects
The fourth step in the RCM review process entails listing what happens when each failure mode occurs. These are known as failure effects.

Failure effects describe what happens when a failure mode occurs

(Note that failure effects are not the same as failure consequences.) A description of failure effects should record, among other things, the evidence that the failure has occurred and what must be done to repair it. These issues are reviewed in the following paragraphs.

Evidence of Failure
Failure effects should be described in a way which enables the team doing the analysis to decide whether the failure will become evident to the operating crew under normal circumstances.

For instance, the description should state whether the failure causes warning lights to come on or alarms to sound, and whether the warning is given on a local panel or in a central control room (or both). Similarly, the description should state whether the failure is accompanied by obvious physical effects such as loud noises, escaping steam or unusual smells. It should also state whether the machine shuts down as a result of the failure.

For example, the effects of the seizure of the bearings of the pump might be described as follows:
• Motor trips out and trip alarm sounds in the control room. Tank Y low level alarm sounds after 20 minutes, and tank runs dry after 30 minutes.

In another example, deposits on the blades of a turbine could be partially removed by the periodic injection of special materials, a process known as jet blasting. The failure effects were described as follows:
• Efficiency declines and governor compensates to sustain power output, causing exhaust temperature to rise. Exhaust temperature is displayed on the local control panel and in the central control room. If no action is taken, exhaust gas temperature rises above 475°C under full power. A high exhaust gas temperature alarm annunciates on the local control panel and a warning light comes on in the central control room. Above 500°C, the control system shuts down the turbine. (Running at temperatures above 475°C shortens the creep life of the turbine blades.) The blades can be partially cleaned by jet blasting, and jet blasting takes about 30 minutes.
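The turbine example describes evidence of failure that appears in stages as the exhaust temperature rises. As a minimal sketch of that staging, assuming only the two thresholds quoted in the example (475°C and 500°C), the function below lists which effects would be evident at a given temperature. The function and constant names are hypothetical and are not taken from any control system.

```python
from typing import List

# Thresholds quoted in the turbine example; the names are illustrative only.
ALARM_TEMP_C = 475.0     # high exhaust gas temperature alarm
SHUTDOWN_TEMP_C = 500.0  # control system shuts down the turbine


def evident_effects(exhaust_temp_c: float) -> List[str]:
    """Return the evidence of failure that would be present at this temperature."""
    # The temperature reading itself is always displayed.
    effects = ["exhaust temperature displayed on local panel and in central control room"]
    if exhaust_temp_c > ALARM_TEMP_C:
        effects.append("alarm on local control panel and warning light in central control room")
    if exhaust_temp_c > SHUTDOWN_TEMP_C:
        effects.append("control system shuts down the turbine")
    return effects


for temp in (450.0, 480.0, 510.0):
    print(temp, evident_effects(temp))
```

Writing the effects out this way makes it easy to check that a failure effect description says what the operating crew would actually see at each stage.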
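The turbine example describes a fairly complex failure mode, so the description of the failure effects is correspondingly long. The average description of a failure effect is considerably shorter.

When describing failure effects, do not prejudge the evaluation of failure consequences by using the words 'hidden' or 'evident'. These words belong to the consequence evaluation process, and using them prematurely might bias that evaluation.

When describing the failure of a protective device, the description should state briefly what would happen if the protected device were to fail while the protective device was unserviceable.

Safety and environmental hazards
If a failure could hurt or kill someone or harm the environment, the failure effects should describe how this could happen. Examples include:
• increased risk of fire or explosion
• the escape of hazardous chemicals (gases, liquids or solids)
• hazards involving pressure vessels and hydraulic systems

Figure 4.8: Downtime vs repair time. (Downtime runs from the moment the machine stops until it is back in service. It covers finding the person who can repair it, diagnosing the fault, finding the spare parts, repairing the fault, revalidating or testing the machine and putting the machine back into service. Repair time is only part of this total.)

Downtime as defined above can vary greatly for different occurrences of the same failure, and the most serious consequences are usually caused by the longest outages. For instance, if the downtime caused by a failure which occurs on a night shift is usually much longer than when the failure occurs on a normal day shift, and if such night shifts are a regular occurrence, we list the former.

It is of course possible to reduce the operational consequences of a failure by taking steps to shorten the downtime, most often by reducing the time it takes to obtain what is needed for the repair. However, we should base the assessment of consequences on the downtime as discussed above.

In addition to downtime, any other ways in which the failure affects operations should be recorded. Possibilities include:
• whether and how product quality is affected, and if so whether any financial penalties are involved
• whether any other equipment or activity is affected
• whether the failure leads to costs over and above the direct cost of repair

Examples of downtime statements are:
• Downtime to replace a switch about 30 minutes
• Downtime to strip the turbine and replace the disc about 2 weeks.

4.6 Sources of Information about Modes and Effects
When considering where to get the information needed to draw up a comprehensive FMEA, remember that as much emphasis should be placed on what could happen as on what has happened.

The manufacturer or vendor of the equipment
When carrying out an FMEA, the source of information which springs to mind first is the manufacturer, especially in the case of new equipment.

In practice, few manufacturers are involved in the day-to-day operation of the equipment. After the end of the warranty period, almost none get regular feedback from the equipment users about what fails and why. The best that many of them can do is try to draw conclusions about how their equipment is performing from anecdotal evidence and an analysis of [...] (except when a really spectacular failure occurs, in which case lawyers tend to take over and technical discussion about root causes often ceases).

If the manufacturer is asked for help, issues such as technical support and confidentiality should be settled at an early stage, so that everyone knows clearly what to expect.

Generic lists of failure modes
These are lists of failure modes, or sometimes entire FMEAs, prepared by third parties. They take no account of the operating context of the equipment, desired standards of performance, failure consequences or the skills of operators and maintainers. As a result, generic lists should be treated with caution.

Other users of the same equipment
Other users are an obvious and very valuable source of information about what can go wrong with commonly used assets, provided that the information can be exchanged, for instance through the offices of regulatory bodies or between different branches of the same organisation. However, note the above comment about generic data.

Technical history records
These records can also be a valuable source of information. However, they should be treated with caution, because they often describe what was done to repair the failure rather than what caused it. These drawbacks mean that technical history records should be used as a supplementary source of information, never as the sole source.

The people who operate and maintain the equipment
In most cases, the best sources of information are the people who operate and maintain the equipment on a day-to-day basis. They tend to know the most about how the equipment works, what can go wrong with it, how much each failure matters and what must be done to fix it. If they don't know, they are the ones who have the most reason to find out.

The best way to capitalise on their knowledge is to arrange for them to participate in the preparation of the FMEA as part of the overall RCM process. This is done under the guidance of a suitably trained facilitator at a series of meetings. (The most valuable source of additional information is a comprehensive set of process and instrumentation drawings, coupled with ready access to process and/or technical specialists on an ad hoc basis.) This approach to RCM was introduced in Chapter 1 and is discussed at much greater length in Chapter 13.

4.7 Levels of Analysis and the Information Worksheet
Failure modes can be described at almost any level of detail. Some end up being subjected to failure-finding, while others are likely to be subjected to some sort of proactive maintenance.

The detail used to describe failure modes is also influenced by the level in the physical asset hierarchy at which the analysis is carried out.

For example, if we apply RCM to a truck, should we analyse the vehicle as a whole, or subdivide it into units such as differentials, axles and wheels? Or should the engine not be divided into engine block, engine management system, cooling system, fuel system and so on before starting the analysis?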
What about subdividing the fuel system into tank, pump, pipes and filters? This issue needs careful thought, because the level at which an analysis is carried out strongly affects how practical it is.

Starting at a low level
One of the most common mistakes in the RCM process is carrying out the analysis at too low a level in the equipment hierarchy. For instance, when thinking about the failure modes of a vehicle, one which comes to mind is a blocked fuel system, so it seems sensible to address it by raising an RCM Information Worksheet for the fuel system alone. However, if we consider that the vehicle can be divided into literally dozens, if not hundreds, of sub-systems, and that each is analysed as a separate system, the following problems begin to arise:
• the further down the hierarchy the analysis progresses, the more difficult it becomes to conceptualise and define performance standards. (No one cares about the precise amount of fuel passing through the fuel system as long as the fuel economy of the vehicle is acceptable and the vehicle has enough power.)
• the further down the hierarchy one goes, the more difficult it becomes to visualise and hence to analyse [...]
• control arrangements which cross sub-system boundaries become awkward to handle when a sensor in one sub-system drives a device in another. For instance, a sensor which reads a signal off the flywheel sub-system might send a signal through a processor sub-system to a fuel shutoff valve in the fuel sub-system.
• a failure-finding task may end up being prescribed more than once
• a new worksheet has to be raised for each new sub-system.

Instead of starting the analysis towards the bottom of the hierarchy, we could start at the top.

For example, the primary function of a truck was listed on page 28 as follows: To transport up to 40 tons of material. The failure modes concerned could all cause a loss of this function, so they could all be listed on an Information Worksheet covering the entire truck, as shown in Figure 4.10.

Figure 4.10: Failure modes of a truck (RCM Information Worksheet for the 40 ton truck)

Analyses carried out at this level consume far less paper. However, the main disadvantage of performing the analysis at this level is that a very large number of failure modes has to be listed for each function.

For instance, the blocked fuel system, which would be prominent in an analysis carried out at the level of the fuel system, is only one of a great many failure modes at the level shown in Figure 4.10.

Most assets can be subdivided into many levels and the RCM process applied at any one of these levels.

For example, Figure 4.11 shows how the 40 ton truck could be divided into a number of levels. It traces the hierarchy from the level of the truck as a whole down to the level of the fuel lines.
It goes on to show how the primary function of the asset might be defined at each level on an RCM Information Worksheet, and how the same failure might be described at each level.

Given this range of possibilities, how do we select the level at which to carry out the analysis?

We have seen that the top level usually embodies too many failure modes per function to permit sensible analysis. In spite of this, however, the main functions of the asset or system at the highest level provide a framework for the rest of the analysis. This is because the overall performance of the asset tends to be judged at the highest levels.

For instance, people are much more likely to ask 'How is truck X performing?' than 'How is the fuel system on truck X performing?' (unless the fuel system is known to be causing a problem).

[Figure 4.11: hierarchy of the 40 ton truck, with sub-systems including the steering system, cabin, propshaft, differentials, braking system and exhaust]

Chapter 2 explained that in practice, a statement of the operating context provides a record of the main functions and performance standards of any asset or system at levels above the level at which the analysis is to be carried out.

The most common mistake is nearly always to start too low in the asset hierarchy. For this reason, a good general rule is to start the analysis at a higher level than at first seems necessary.

Different parts of the same asset may need to be analysed at different levels. For instance, the entire braking system could be analysed at level 2 as part of the higher level system, but it may be necessary to analyse the engine at level 3 or even level 4 as a separate exercise.

There is no technical reason why failure modes (together with their effects) should not be listed at whatever level enables a suitable failure management policy to be selected. The gearbox failure modes discussed earlier illustrate this:

FAILURE MODE
1  Gearbox bearings seize
2  Gear teeth stripped
3  Gearbox seizes due to lack of oil

High-level analyses sometimes generate too many failure modes when the asset incorporates complex sub-assemblies which could themselves suffer from a large number of failure modes.

Examples of such sub-assemblies include small electric motors, small hydraulic systems, small gearboxes, control loops, protective circuits and complex couplings.

One option is to list a single failure mode for the sub-assembly and analyse the sub-assembly separately. For example, the failure of the gearbox discussed above could have been listed as follows:

FAILURE MODE        FAILURE EFFECT
1  Gearbox fails    Gearbox analysed separately

A sub-assembly is usually worth treating in this way if a large number of failure modes of the sub-assembly could cause failure of the main assembly.
(If there are between 7 and 9 failure modes [...])

[Figure: RCM Information Worksheet]

5 Failure Consequences

Previous chapters have explained how the RCM process asks the following seven questions about each asset:
• what are the functions and associated performance standards of the asset in its present operating context?
• in what ways does it fail to fulfil its functions?
• what causes each functional failure?
• what happens when each failure occurs?
• in what way does each failure matter?
• what can be done to predict or prevent each failure?
• what if a suitable proactive task cannot be found?

The answers to the first four questions were discussed at length in earlier chapters, which dealt with functions, functional failures, failure modes and failure effects. The last three questions are asked about each individual failure mode. This chapter considers the fifth question: in what way does each failure matter?

5.1 Technically Feasible and Worth Doing
Every time a failure occurs, the organisation which uses the asset is affected in some way. Some failures affect output, product quality or customer service. Others threaten safety or the environment. Some increase operating costs, for instance by increasing energy consumption, while a few have an impact in four, five or even all six of these areas. Still others may appear to have no effect at all if they occur on their own, but may expose the organisation to the risk of much more serious failures.

If any of these failures are not prevented, the time and effort which need to be spent correcting them also affects the organisation, because repairing failures consumes resources which might be better used elsewhere.

The nature and severity of these effects depend on the operating context of the asset, the performance standards which apply to each function, and the physical effects of each failure mode.

If the consequences are very serious, then considerable efforts will be made to prevent the failure, or at least to reduce its consequences. This applies to failures which could hurt people or harm the environment, which interfere with operations, or which cause significant secondary damage. On the other hand, if the failure has only minor consequences, it is possible that no proactive action will be taken and the failure is simply corrected each time it occurs.

This suggests that the consequences of failures matter more than their technical characteristics. It also suggests that the whole idea of proactive maintenance is not so much about preventing failures as it is about avoiding or reducing the consequences of failure.

Proactive maintenance has much more to do with avoiding or reducing the consequences of failure than it has to do with preventing the failures themselves

If this is accepted, then it stands to reason that any proactive task is only worth doing if it deals successfully with the consequences of the failure which it is meant to prevent.

A proactive task is worth doing if it deals successfully with the consequences of the failure which it is meant to prevent

This of course presupposes that it is possible to anticipate or prevent the failure in the first place. Whether or not a proactive task is technically feasible depends on the technical characteristics of the task and of the failure which it is meant to prevent.
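The criteria for technical feasibility are discussed in later chapters. If no suitable proactive task can be found, the failure consequences also indicate what default action should be taken. Default tasks are reviewed in later chapters.

The rest of this chapter considers the criteria used to evaluate the consequences of failure, and hence to decide whether any form of proactive maintenance is worth doing. These consequences are divided in two stages into categories. The first stage separates hidden functions from evident functions.

5.2 Hidden and Evident Functions
Some failures cause warning lights to flash or alarms to sound, or both. Others cause machines to shut down or some other part of the process to be disrupted. Others lead to product quality problems which become apparent straight away, and yet others are accompanied by obvious physical effects such as loud noises, escaping steam or unusual smells. Failures of this kind are classed as evident because someone will eventually find out about them when they occur on their own. This leads to the following definition:

An evident function is one whose failure will on its own eventually and inevitably become evident to the operating crew under normal circumstances

However, some failures occur in such a way that nobody knows that the item is in a failed state. A hidden function is one whose failure will not become evident to the operating crew under normal circumstances if it occurs on its own. Hidden failures are discussed later in this chapter. Most hidden functions are associated with protective devices.

Categories of Evident Failures
Evident failures are classified into three categories in descending order of importance, as follows:
• safety and environmental consequences. A failure has safety consequences if it could hurt or kill someone. It has environmental consequences if it could lead to the breach of an environmental standard
• operational consequences. A failure has operational consequences if it affects production or operations
• non-operational consequences.

This approach also means that the safety, environmental and economic consequences of each failure are assessed in one exercise, which is much more cost-effective than considering them separately.

The next four sections of this chapter consider each of these categories in turn, starting with the evident categories and then moving on to the rather more complex issues surrounding hidden functions.

5.3 Safety and Environmental Consequences

Safety First
As we have seen, the first step in the consequence evaluation process is to identify hidden functions so that they can be dealt with appropriately. All remaining failure modes (in other words, failures which are not classified as hidden) must by definition be evident.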
The above paragraphs explained that the RCM process considers the safety and environmental implications of each evident failure mode first. It does so for two reasons:
• a more and more firmly held belief among employers, employees, customers and society in general that hurting or killing people in the course of business is simply not tolerable, and hence that everything possible should be done to minimise the possibility of any sort of safety-related incident or environmental excursion.
• the probabilities which are tolerated for safety-related incidents are much lower than those which are tolerated for failures which have operational consequences. As a result, in most of the cases where a proactive task is worth doing from the safety viewpoint, it is likely to be more than adequate from the operational viewpoint as well.

At one level, safety refers to the safety of individuals in the workplace. Specifically, RCM asks whether anyone could get hurt or killed either as a direct result of the failure mode itself or by other damage which may be caused by the failure.

A failure mode has safety consequences if it causes a loss of function or other damage which could hurt or kill someone

At another level, 'safety' refers to the safety or well-being of society in general. Nowadays, failures which affect society tend to be classed as 'environmental' issues. In fact, in many parts of the world the point is fast approaching where organisations either conform to society's environmental expectations, or they will no longer be allowed to operate. So quite apart from any personal feelings which anyone may have on the issue, environmental probity is becoming a prerequisite for corporate survival.
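Some organisations also have their own, sometimes even more stringent, corporate standards. A failure mode is said to have environmental consequences if it could lead to the breach of any of these standards.

A failure mode has environmental consequences if it causes a loss of function or other damage which could lead to the breach of any known environmental standard or regulation

Note that when considering whether a failure mode has safety or environmental consequences, we are considering whether one failure mode on its own could have the consequences. This is different from the later part of this chapter, in which we consider the failure of both elements of a protected system.

The Question of Risk
Much as most people would like to live in an environment where there is no possibility at all of death or injury, it is generally accepted that there is an element of risk in everything we do. In other words, absolute zero is unattainable, even though it is a worthy target to keep striving for. This immediately leads us to ask what is attainable.

To answer this question, we first need to consider the question of risk in more detail.

Risk assessment consists of three elements. The first asks what could happen if the event under consideration did occur. The second asks how likely it is for the event to occur at all. The combination of these two elements provides a measure of the degree of risk. The third, and often the most contentious element, asks whether this risk is tolerable.

For example, consider a failure which could result in the death or injury of ten people (what could happen). The probability that this failure mode could occur is one in a thousand in any one year (how likely it is to occur). On the basis of these figures, the risk associated with this failure is:
10 × (1 in 1 000) = 1 casualty per 100 years

What could happen if the failure occurred?
What actually happens if any failure occurs is recorded on the RCM Information Worksheet as failure effects, as discussed at length in Chapter 4. Part 5 of Chapter 4 also listed a number of typical effects which pose a threat to safety or the environment.

The fact that these effects could hurt or kill someone does not necessarily mean that they will do so every time they occur. Some may even occur quite often without doing so. However, the issue is not whether such consequences are inevitable, but whether they are possible.

For example, if a load fell from a travelling crane used to carry steel coils, it would only hurt anyone who happened to be standing under it at the time. If no one was nearby, then no one would get hurt. However, the possibility that someone could be hurt means that this failure mode should be treated as a safety hazard.

This example shows that the RCM process assesses safety consequences at the most conservative level. For the same reason, the consequences of the failure of protected functions should be assessed as if protective devices of this type are not present. The existence of a protective or warning system does not necessarily remove the hazard. The protective device is then treated as a hidden function in its turn, as discussed in Part 6 of this chapter and Chapter 8.

How likely is the failure to occur?
Part 4 of Chapter 4 mentions that only failure modes which are reasonably likely to occur in the context in question should be listed. Failure modes which are wildly unlikely but which nonetheless have very severe consequences may also need to be considered.

Is the risk tolerable?
People tend to tolerate a higher level of risk when they believe that they have some control over the situation than when they believe that they do not.

For example, the proportion of people who travel by air between New York and Los Angeles in the USA and who are likely to be killed while doing so is of the order of one in millions, while 1 person in 14 000 who makes the trip by road is likely to be killed. And yet some people insist on making this trip by road because they believe that it is safer.

This example illustrates the relationship between the probability of being killed which any one person is prepared to tolerate and the extent to which that person believes he or she is in control. In more general terms, this is shown in Figure 5.2.

[Figure 5.2: probability which an individual tolerates of being killed in any one year, plotted against the degree of control and choice the individual believes he or she has]

The figures do not necessarily reflect the views of the author. They merely illustrate what one individual might decide he or she is prepared to tolerate. Note also that they are based on the perspective of one individual going about his or her daily business. This view then has to be translated into a degree of risk for the whole population. The result can only be a rough approximation.

For example, the probability which I tolerate of being killed at work can be combined with the equivalent probability for any one of my 1 000 co-workers (assuming that everyone on the site faces roughly the same hazards). Furthermore, if the activities carried out on the site embody (say) 10 000 events which could kill someone, then the average probability that each event kills someone must be reduced accordingly.

The techniques by which one moves up and down a hierarchy of probability in this fashion are known as probabilistic or quantitative risk assessments. This approach is explored further elsewhere in this book.

Although perceived degree of control usually dominates decisions about the tolerability of risk, it is by no means the only issue. Other factors include the following:
• individual values: To explore this issue in any depth is well beyond the scope of this book. Suffice it to contrast the views on tolerable risk likely to be held by a mountaineer with those of someone who suffers from vertigo, or those of an underground miner with those of someone who suffers from claustrophobia.
• industry values: some industries are intrinsically more dangerous than others. Some even compensate for higher levels of risk with higher pay levels.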
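The views of any individual who works in such an industry ultimately boil down to his or her perception of whether the risk is worthwhile, in other words, whether the benefit justifies the risk.
• the effect on 'future generations': the safety of children has an especially powerful effect on people's views about tolerable risk. Adults frequently display a surprising and even distressing disregard for their own safety. However, little effort has to be spent persuading some people to take a risk seriously if it threatens their offspring and their future.

For example, the author worked with one group which had occasion to discuss the properties of a certain chemical. Words like 'toxic' and 'carcinogenic' were treated with indifference, even though most of the members of this group were the people most at risk. However, as soon as it emerged that the chemical was also mutagenic and teratogenic, and the meaning of these words was explained to the group, the chemical was suddenly viewed with much greater respect.

• knowledge: perceptions of risk are greatly influenced by how much people know about the asset, the process of which it forms part and the failure mechanisms associated with each failure mode. The more they know, the better their judgement. (Ignorance is often a two-edged sword. In some situations people take the most appalling risks out of sheer ignorance, while in others they wildly exaggerate the risks, also out of ignorance. On the other hand, familiarity can breed contempt.)

Other factors, such as the value placed on human life, also influence views about risk. All of this means that the tolerability of any risk is both collective and subjective, so it is not possible to specify it in a way which will be universally accepted.

Who should evaluate risks?
The people who evaluate risks should include those who are at risk and those who will be held accountable if someone is hurt or killed. If it is applied in a properly focused and structured fashion, the collective wisdom of such a group will do much to ensure that the organisation does its best to identify and manage all the failure modes that could affect safety or the environment. (The use of such groups is discussed elsewhere in this book.)

Groups of this nature can usually reach consensus quickly when dealing with direct safety hazards, because they include the people at risk.

Safety and Proactive Maintenance
If a failure could affect safety or the environment, the RCM process stipulates that we must try to prevent it. The above discussion suggests that:

For failure modes which have safety or environmental consequences, a proactive task is only worth doing if it reduces the probability of the failure to a tolerably low level

If a proactive task cannot be found which achieves this objective, something must be changed in order to make the system safe. This 'something' could be the asset itself or an operating procedure.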
Once-off changes of this sort are classified as redesigns. They are undertaken, for example:
• to change things so that the failure no longer has safety or environmental consequences.

The question of redesign is discussed in more detail in Chapter 9.

When dealing with safety and environmental issues, RCM obliges us either to prevent the item from failing, or to make it safe. This suggests that the decision process for failure modes which have safety or environmental consequences can be summarised as shown in Figure 5.3 below:

[Figure 5.3: decision process for failure modes which have safety or environmental consequences]
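The risk arithmetic and the decision rule discussed in this section can be sketched together in a few lines. This is only an illustration under stated assumptions. The casualty count of ten is the value implied by the worked example (a one in a thousand annual probability giving 1 casualty per 100 years). The tolerable probability is an input which the organisation has to decide for itself, and all function, parameter and enum names are hypothetical.

```python
from enum import Enum
from fractions import Fraction


class Policy(Enum):
    NOT_SAFETY_RELATED = "assess under the other consequence categories"
    PROACTIVE_TASK = "proactive task is worth doing"
    REDESIGN = "something must be changed to make the system safe"


def annual_risk(casualties_per_event, annual_probability):
    """Degree of risk: what could happen multiplied by how likely it is."""
    return casualties_per_event * annual_probability


def safety_policy(could_hurt_or_kill, could_breach_env_standard,
                  probability_with_task, tolerable_probability):
    """Sketch of the decision rule for safety and environmental consequences.

    probability_with_task is None when no proactive task can be found.
    tolerable_probability is an assumption supplied by the organisation.
    """
    if not (could_hurt_or_kill or could_breach_env_standard):
        return Policy.NOT_SAFETY_RELATED
    if probability_with_task is not None and probability_with_task <= tolerable_probability:
        return Policy.PROACTIVE_TASK
    return Policy.REDESIGN


# Worked example: ten casualties at a 1 in 1 000 annual probability.
risk = annual_risk(10, Fraction(1, 1000))
print(risk)      # casualties per year
print(1 / risk)  # years per casualty
```

Using `Fraction` keeps the arithmetic exact, so the result reads directly as "1 casualty per 100 years" instead of a rounded decimal.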