You are on page 1of 10

Failure mode and effects analysis 1

Failure mode and effects analysis


Failure Mode and Effects Analysis (FMEA) was one of the first systematic techniques for failure analysis. It was
developed by reliability engineers in the 1950s to study problems that might arise from malfunctions of military
systems. A FMEA is often the first step of a system reliability study. It involves reviewing as many components,
assemblies, and subsystems as possible to identify failure modes, and their causes and effects. For each component,
the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet.
There are numerous variations of such worksheets. A FMEA is mainly a qualitative analysis.[1]
A few different types of FMEA analysis exist, like
• Functional,
• Design, and
• Process FMEA.
Sometimes the FMEA is called FMECA to indicate that Criticality analysis is performed also.
An FMEA is an Inductive reasoning (forward logic) single point of failure analysis and is a core task in reliability
engineering, safety engineering and quality engineering. Quality engineering is specially concerned with the
"Process" (Manufacturing and Assembly) type of FMEA.
A successful FMEA activity helps to identify potential failure modes based on experience with similar products and
processes - or based on common physics of failure logic. It is widely used in development and manufacturing
industries in various phases of the product life cycle. Effects analysis refers to studying the consequences of those
failures on different system levels.
Functional analyses are needed as an input to determine correct failure modes, at all system levels, both for
functional FMEA or Piece-Part (hardware) FMEA. A FMEA is used to structure Mitigation for Risk reduction based
on either failure (mode) effect severity reduction or based on lowering the probability of failure or both. The FMEA
is in principle a full inductive (forward logic) analysis, however the failure probability can only be estimated or
reduced by understanding the failure mechanism. Ideally this probability shall be lowered to "impossible to occur"
by eliminating the (root) causes. It is therefore important to include in the FMEA an appropriate depth of
information on the causes of failure (deductive analysis).

Introduction
The FME(C)A is a design tool used to systematically analyze postulated component failures and identify the
resultant effects on system operations. The analysis is sometimes characterized as consisting of two sub-analyses, the
first being the failure modes and effects analysis (FMEA), and the second, the criticality analysis (CA). Successful
development of an FMEA requires that the analyst include all significant failure modes for each contributing element
or part in the system. FMEAs can be performed at the system, subsystem, assembly, subassembly or part level. The
FMECA should be a living document during development of a hardware design. It should be scheduled and
completed concurrently with the design. If completed in a timely manner, the FMECA can help guide design
decisions. The usefulness of the FMECA as a design tool and in the decision making process is dependent on the
effectiveness and timeliness with which design problems are identified. Timeliness is probably the most important
consideration. In the extreme case, the FMECA would be of little value to the design decision process if the analysis
is performed after the hardware is built. While the FMECA identifies all part failure modes, its primary benefit is the
early identification of all critical and catastrophic subsystem or system failure modes so they can be eliminated or
minimized through design modification at the earliest point in the development effort; therefore, the FMECA should
be performed at the system level as soon as preliminary design information is available and extended to the lower
levels as the detail design progresses.
Failure mode and effects analysis 2

Remark: For more complete scenario modelling other type of Reliability analysis may be considered, for example
fault tree analysis(FTA); a deductive (backward logic) failure analysis that may handle multiple failures within the
item and/or external to the item including maintenance and logistics. It starts at higher functional / system level. A
FTA may use the basic failure mode FMEA records or an effect summary as one of its inputs (the basic events).
Interface hazard analysis, Human error analysis and others may be added for completion in scenario modelling.

Functional analysis
The analysis may be performed at the functional level until the design has matured sufficiently to identify specific
hardware that will perform the functions; then the analysis should be extended to the hardware level. When
performing the hardware level FMECA, interfacing hardware is considered to be operating within specification. In
addition, each part failure postulated is considered to be the only failure in the system (i.e., it is a single failure
analysis). In addition to the FMEAs done on systems to evaluate the impact lower level failures have on system
operation, several other FMEAs are done. Special attention is paid to interfaces between systems and in fact at all
functional interfaces. The purpose of these FMEAs is to assure that irreversible physical and/or functional damage is
not propagated across the interface as a result of failures in one of the interfacing units. These analyses are done to
the piece part level for the circuits that directly interface with the other units. The FMEA can be accomplished
without a CA, but a CA requires that the FMEA has previously identified system level critical failures. When both
steps are done, the total process is called a FMECA.

Ground rules
The ground rules of each FMEA include a set of project selected procedures; the assumptions on which the analysis
is based; the hardware that has been included and excluded from the analysis and the rationale for the exclusions.
The ground rules also describe the indenture level of the analysis, the basic hardware status, and the criteria for
system and mission success. Every effort should be made to define all ground rules before the FMEA begins;
however, the ground rules may be expanded and clarified as the analysis proceeds. A typical set of ground rules
(assumptions) follows:
1. Only one failure mode exists at a time.
2. All inputs (including software commands) to the item being analyzed are present and at nominal values.
3. All consumables are present in sufficient quantities.
4. Nominal power is available

Benefits
Major benefits derived from a properly implemented FMECA effort are as follows:
1. It provides a documented method for selecting a design with a high probability of successful operation and safety.
2. A documented uniform method of assessing potential failure mechanisms, failure modes and their impact on
system operation, resulting in a list of failure modes ranked according to the seriousness of their system impact
and likelihood of occurrence.
3. Early identification of single failure points (SFPS) and system interface problems, which may be critical to
mission success and/or safety. They also provide a method of verifying that switching between redundant
elements is not jeopardized by postulated single failures.
4. An effective method for evaluating the effect of proposed changes to the design and/or operational procedures on
mission success and safety.
5. A basis for in-flight troubleshooting procedures and for locating performance monitoring and fault-detection
devices.
6. Criteria for early planning of tests.
Failure mode and effects analysis 3

From the above list, early identifications of SFPS, input to the troubleshooting procedure and locating of
performance monitoring / fault detection devices are probably the most important benefits of the FMECA. In
addition, the FMECA procedures are straightforward and allow orderly evaluation of the design.

Basic terms
The following covers some basic FMEA terminology.
Failure
The loss of an intended function of a device under stated conditions.
Failure mode
The specific manner or way by which a failure occurs in terms of failure of the item (being a part or (sub)
system) function under investigation; it may generally describe the way the failure occurs. It shall at least
clearly describe a (end) failure state of the item (or function in case of a Functional FMEA) under
consideration. It is the result of the failure mechanism (cause of the failure mode). For example; a fully
fractured axle, a deformed axle or a fully open or fully closed electrical contact are each a separate failure
mode.
Failure cause and/or mechanism
Defects in requirements, design, process, quality control, handling or part application, which are the
underlying cause or sequence of causes that initiate a process (mechanism) that leads to a failure mode over a
certain time. A failure mode may have more causes. For example; "fatigue or corrosion of a structural beam"
or "fretting corrosion in a electrical contact" is a failure mechanism and in itself (likely) not a failure mode.
The related failure mode (end state) is a "full fracture of structural beam" or "an open electrical contact". The
initial Cause might have been "Improper application of corrosion protection layer (paint)" and /or
"(abnormal) vibration input from another (possible failed) system".
Failure effect
Immediate consequences of a failure on operation, function or functionality, or status of some item.
Indenture levels (bill of material or functional breakdown)
An identifier for system level and thereby item complexity. Complexity increases as levels are closer to one.
Local effect
The failure effect as it applies to the item under analysis.
Next higher level effect
The failure effect as it applies at the next higher indenture level.
End effect
The failure effect at the highest indenture level or total system.
Detection
The means of detection of the failure mode by maintainer, operator or built in detection system, including
estimated dormancy period (if applicable)
Risk Priority Number (RPN)
Cost (of the event) * Probability (of the event occurring) * Detection (Probability that the event would be
detected before the user was aware of it)
Severity
The consequences of a failure mode. Severity considers the worst potential consequence of a failure,
determined by the degree of injury, property damage, system damage and/or time lost to repair the failure.
Failure mode and effects analysis 4

Remarks / mitigation / actions


Additional info, including the proposed mitigation or actions used to lower a risk or justify a risk level or
scenario.

History
Procedures for conducting FMECA were described in US Armed Forces Military Procedures document MIL-P-1629
(1949); revised in 1980 as MIL-STD-1629A). By the early 1960s, contractors for the U.S. National Aeronautics and
Space Administration (NASA) were using variations of FMECA or FMEA under a variety of names. NASA
programs using FMEA variants included Apollo, Viking, Voyager, Magellan, Galileo, and Skylab. The civil aviation
industry was an early adopter of FMEA, with the Society for Automotive Engineers (SAE) publishing ARP926 in
1967. After two revisions, ARP926 has been replaced by ARP4761, which is now broadly used in civil aviation.
During the 1970s, use of FMEA and related techniques spread to other industries. In 1971 NASA prepared a report
for the U.S. Geological Survey recommending the use of FMEA in assessment of offshore petroleum exploration. A
1973 U.S. Environmental Protection Agency report described the application of FMEA to wastewater treatment
plants. FMEA as application for HACCP on the Apollo Space Program moved into the food industry in general.
The automotive industry began to use FMEA by the mid 1970s. The Ford Motor Company introduced FMEA to the
automotive industry for safety and regulatory consideration after the Pinto affair. Ford applied the same approach to
processes (PFMEA) to consider potential process induced failures prior to launching production. In 1993 the
Automotive Industry Action Group (AIAG) first published an FMEA standard for the automotive industry. It is now
in its fourth edition. The SAE first published related standard J1739 in 1994. This standard is also now in its fourth
edition.
Although initially developed by the military, FMEA methodology is now extensively used in a variety of industries
including semiconductor processing, food service, plastics, software, and healthcare.[2] Toyota has taken this one
step further with its Design Review Based on Failure Mode (DRBFM) approach. The method is now supported by
the American Society for Quality which provides detailed guides on applying the method.

Example worksheet (ARP4761) - Design (Hardware) FMEA

Example FMEA worksheet


FMEA Item Potential Potential Mission Local Next System (P) (S) Severity Detection (D) Risk Level Actions for Mitigation /
Ref. failure cause(s) / Phase effects of higher Level End Probability (Indications Detection P*S (+D) further Requirements
mode mechanism failure level Effect (estimate) to Dormancy Investigation
effect Operator, Period / evidence
Maintainer)

1.1.1 Brake Internal a) O-ring Landing Decreased No Left Severely (C) (VI) (1) Flight Built-In Unacceptable Check Require
Manifold Leakage Compression pressure Wheel Reduced Moderate Catastrophic Computer Test Dormancy redundant
Ref. from Set (Creep) to main Braking Aircraft (this is the and interval is Period and independent
Designator Channel failure b) brake deceleration worst case) Maintenance 1 minute probability of brake
2b, A to B surface hose on ground Computer failure hydraulic
channel A, damage and side will indicate channels
O-ring during drift. "Left Main and/or
assembly Partial loss Brake, Require
of runway Pressure redundant
position Low" sealing and
control. Classify
Risk of O-ring as
collision Critical Part
Class 1
Failure mode and effects analysis 5

Probability (P)
In this step it is necessary to look at the cause of a failure mode and the likelihood of occurrence. This can be done
by analysis, calculations / FEM, looking at similar items or processes and the failure modes that have been
documented for them in the past. A failure cause is looked upon as a design weakness. All the potential causes for a
failure mode should be identified and documented. This should be in technical terms. Examples of causes are:
Human errors in handling, Manufacturing induced faults, Fatigue, Creep, Abrasive wear, erroneous algorithms,
excessive voltage or improper operating conditions or use (depending on the used ground rules). A failure mode is
given an Probability Ranking.

Rating Meaning

A Extremely Unlikely (Virtually impossible or No known occurrences on similar products or processes, with many running hours)

B Remote (relatively few failures)

C Occasional (occasional failures)

D Reasonably Possible (repeated failures)

E Frequent (failure is almost inevitable)

Severity (S)
Determine the Severity for the worst case scenario adverse end effect (state). It is convenient to write these effects
down in terms of what the user might see or experience in terms of functional failures. Examples of these end effects
are: full loss of function x, degraded performance, functions in reversed mode, too late functioning, erratic
functioning, etc. Each end effect is given a Severity number (S) from, say, I (no effect) to VI (catastrophic), based on
cost and/or loss of life or quality of life. These numbers prioritize the failure modes (together with probability and
detectability). Below a typical classification is given. Other classifications are possible. See also hazard analysis.

Rating Meaning

I No relevant effect on reliability or safety

II Very minor, no damage, no injuries, only results in a maintenance action (only noticed by discriminating customers)

III Minor, low damage, light injuries (affects very little of the system, noticed by average customer)

IV Moderate, moderate damage, injuries possible (most customers are annoyed, mostly financial damage)

V Critical (causes a loss of primary function; Loss of all safety Margins, 1 failure away from a catastrophe, severe damage, severe injuries,
max 1 possible death )

VI Catastrophic (product becomes inoperative; the failure may result complete unsafe operation and possible multiple deaths)

Detection (D)
The means or method by which a failure is detected, isolated by operator and/or maintainer and the time it may take.
This is important for maintainability control (Availability of the system) and it is specially important for multiple
failure scenarios. This may involve dormant failure modes (e.g. No direct system effect, while a redundant system /
item automatic takes over or when the failure only is problematic during specific mission or system states) or latent
failures (e.g. deterioration failure mechanisms, like a metal growing crack, but not a critical length). It should be
made clear how the failure mode or cause can be discovered by an operator under normal system operation or if it
can be discovered by the maintenance crew by some diagnostic action or automatic built in system test. A dormancy
and/or latency period may be entered.
Failure mode and effects analysis 6

Rating Meaning

1 Certain - fault will be caught on test

2 Almost certain

3 High

4 Moderate

5 Low

6 Fault is undetected by Operators or Maintainers

DORMANCY or LATENCY PERIOD The average time that a failure mode may be undetected may be entered if
known. For example:
• During aircraft C Block inspection, preventive or predictive maintenance, X months or X flight hours
• During aircraft B Block inspection, preventive or predictive maintenance, X months or X flight hours
• During Turn-Around Inspection before or after flight (e.g. 8 hours average)
• During in-built system functional test, X minutes
• Continuously monitored, X seconds
INDICATION
If the undetected failure allows the system to remain in a safe / working state, a second failure situation should be
explored to determine whether or not an indication will be evident to all operators and what corrective action they
may or should take.
Indications to the operator should be described as follows:
• Normal. An indication that is evident to an operator when the system or equipment is operating normally.
• Abnormal. An indication that is evident to an operator when the system has malfunctioned or failed.
• Incorrect. An erroneous indication to an operator due to the malfunction or failure of an indicator (i.e.,
instruments, sensing devices, visual or audible warning devices, etc.).
PERFORM DETECTION COVERAGE ANALYSIS FOR TEST PROCESSES AND MONITORING (From
ARP4761 Standard):
This type of analysis is useful to determine how effective various test processes are at the detection of latent and
dormant faults. The method used to accomplish this involves an examination of the applicable failure modes to
determine whether or not their effects are detected, and to determine the percentage of failure rate applicable to the
failure modes which are detected. The possibility that the detection means may itself fail latent should be accounted
for in the coverage analysis as a limiting factor (i.e., coverage cannot be more reliable than the detection means
availability). Inclusion of the detection coverage in the FMEA can lead to each individual failure that would have
been one effect category now being a separate effect category due to the detection coverage possibilities. Another
way to include detection coverage is for the FTA to conservatively assume that no holes in coverage due to latent
failure in the detection method affect detection of all failures assigned to the failure effect category of concern. The
FMEA can be revised is necessary for those cases where this conservative assumption does not allow the top event
probability requirements to be met.
After these three basic steps the Risk level may be provided.
Failure mode and effects analysis 7

Risk level (P*S) and (D)


Risk is the combination of End Effect Probability And Severity. Where probability and severity includes the effect
on non-detectability (dormancy time). This may influence the end effect probability of failure or the worst case
effect Severity. The exact calculation may not be easy in case multiple scenarios (with multiple events) are possible
and detectability / dormancy plays a crucial role (as for redundant systems). In that case Fault Tree Analysis and/or
Event Trees may be needed to determine exact probability and risk levels.
Preliminary Risk levels can be selected based on a Risk Matrix like shown below, based on Mil. Std. 882.[3] The
higher the Risk level, the more justification and mitigation is needed to provide evidence and lower the risk to an
acceptable level. High risk should be indicated to higher level management, who are responsible for final decision
making.

Probability / Severity --> I II III IV V VI

A Low Low Low Low Moderate High

B Low Low Low Moderate High Unacceptable

C Low Low Moderate Moderate High Unacceptable

D Low Moderate Moderate High Unacceptable Unacceptable

E Moderate Moderate High Unacceptable Unacceptable Unacceptable

• After this step the FMEA has become like a FMECA.

Timing
The FMEA should be updated whenever:
• A new cycle begins (new product/process)
• Changes are made to the operating conditions
• A change is made in the design
• New regulations are instituted
• Customer feedback indicates a problem

Uses
• Development of system requirements that minimize the likelihood of failures.
• Development of designs and test systems to ensure that the failures have been eliminated or the risk is reduced to
acceptable level.
• Development and evaluation of diagnostic systems
• To help with design choices (trade-off analysis).
Failure mode and effects analysis 8

Advantages
• Improve the quality, reliability and safety of a product/process
• Improve company image and competitiveness
• Increase user satisfaction
• Reduce system development time and cost
• Collect information to reduce future failures, capture engineering knowledge
• Reduce the potential for warranty concerns
• Early identification and elimination of potential failure modes
• Emphasize problem prevention
• Minimize late changes and associated cost
• Catalyst for teamwork and idea exchange between functions
• Reduce the possibility of same kind of failure in future
• Reduce impact on company profit margin
• Improve production yield

Limitations
If used as a top-down tool, FMEA may only identify major failure modes in a system. Fault tree analysis (FTA) is
better suited for "top-down" analysis. When used as a "bottom-up" tool FMEA can augment or complement FTA and
identify many more causes and failure modes resulting in top-level symptoms. It is not able to discover complex
failure modes involving multiple failures within a subsystem, or to report expected failure intervals of particular
failure modes up to the upper level subsystem or system.[citation needed]
Additionally, the multiplication of the severity, occurrence and detection rankings may result in rank reversals,
where a less serious failure mode receives a higher RPN than a more serious failure mode. The reason for this is that
the rankings are ordinal scale numbers, and multiplication is not defined for ordinal numbers. The ordinal rankings
only say that one ranking is better or worse than another, but not by how much. For instance, a ranking of "2" may
not be twice as severe as a ranking of "1," or an "8" may not be twice as severe as a "4," but multiplication treats
them as though they are. See Level of measurement for further discussion.

Types
• Functional: before design solutions are provided (or only on high level) functions can be evaluated on potential
functional failure effects. General Mitigations ("design to" requirements) can be proposed to limit consequence of
functional failures or limit the probability of occurrence in this early development. It is based on a functional
breakdown of a system. This type may also be used for Software evaluation.
• Concept Design / Hardware: analysis of systems or subsystems in the early design concept stages to analyse the
failure mechanisms and lower level functional failures, specially to different concept solutions in more detail. It
may be used in trade-off studies.
• Detailed Design / Hardware: analysis of products prior to production. These are the most detailed (in mil 1629
called Piece-Part or Hardware FMEA) FMEAs and used to identify any possible hardware (or other) failure mode
up to the lowest part level. It should be based on hardware breakdown (e.g. the BoM = Bill of Material). Any
Failure effect Severity, failure Prevention (Mitigation), Failure Detection and Diagnostics may be fully analysed
in this FMEA.
• Process: analysis of manufacturing and assembly processes. Both quality and reliability may be affected from
process faults. The input for this FMEA is amongst others a work process / task Breakdown.
Failure mode and effects analysis 9

External links
• Mistake-Proofing Example Wiki [4]

References
[1] System Reliability Theorie, Models, Statistical Methods, and Applications, Marvin Rausand & Arnljot Hoylan, Wiley Series in probability
and statistics - second edition 2004, page 88
[2] Quality Associates International's History of FMEA (http:/ / www. quality-one. com/ services/ fmea. php)
[3] http:/ / www. everyspec. com/ MIL-STD/ MIL-STD-0800-0899/ MIL-STD-882E_41682/
[4] http:/ / pokayoke. wikispaces. com
Article Sources and Contributors 10

Article Sources and Contributors


Failure mode and effects analysis  Source: http://en.wikipedia.org/w/index.php?oldid=574724828  Contributors: 9000store, AbsolutDan, Adambro, Ahhboo, Aimaz, Alexdfromald, Allantonkin,
Anaxial, Ashchori, Ashton1983, Attilios, AxelBoldt, Bdm87, Bill, Billinghurst, Bondegezou, Boogachamp, Brockert, Broeze, Bsaguy, CWDURAND, CatherineMunro, Chris Capoccia, Chris the
speller, ChrisCork, Cohesion, ConconJondor, Ctxppc, Cybercobra, DP2.5, Daleh, DanielPenfield, Darren.bowles, Darth Panda, Davewho2, David Haslam, David n m bond, Derek R Bullamore,
Dfunk58, DonToto, DragonFury, Duncharris, Eastlaw, El C, Emurphy42, Epipelagic, Er.yogesh3486, Everyking, Excirial, FMEA Expert, Fantomian, Fiducial, Fig wright, Fnlayson,
Foamfollower, Gejigeji, Gnowor, Greenrd, HALIGHT, HansMair, Harryzilber, Heat fan1, Hu12, Inwind, IrfanSha, J to2000, J.delanoy, JSSX, Javegeorge, Jayen466, Jdvernier, Jivecat,
Jmuenzing, JohnnyB256, Jojalozzo, Jonahugh, Jsc83, Jt310897, Juliancolton, Jü, Kevconaghan, Kevinmon, Khazar2, Kjwilson76, Knoma Tsujmai, Kobraton, Kozuch, LaurensvanLieshout,
Lexas, Liftarn, LilHelpa, Linforest, LizardJr8, Lupo, LyallDNZ, M2OS, Maradi.mallikarjun, Marasmusine, Marc T Smith, Marianne.jyrgenson, Michael Hardy, Michaelsopa, MilesSmith1776,
Miller17CU94, Mm0therway, Mmoitzh, Modalla, Mongbei, Ncsupimaster, New Light Taiwan, NielsJurgen, Niklo sv, O.Koslowski, OliJWalker, Olufowa, Omerpeksen, Onopearls, Pac72,
Pavanpatidar, Pawankashyap123, PeterBrooks, PeterFV, Peterlewis, Petter73, Philip Sutton, Philip Trueman, Plain Jane 2009, Plrk, Prainog, Prashanthns, Pswift, QAassistant, Quantumobserver,
Quarl, Qwfp, RJBurkhart, RJBurkhart3, RandomAct, Redrose64, Remuel, RexNL, Rich Baldwin, Rich Farmbrough, RichardF, Rlsheehan, Rmhermen, Robin48gx, Ronz, Rp, Rsavoie,
SchreiberBike, SchreyP, SimonP, Spangineer, SueHay, Syona, Taffykins, Thequality, Thopper, Tide rolls, Tolly4bolly, Tom harrison, Tourhaan, Trivialist, Twas Now, Ukexpat, Vicsar,
Vinodh.vinodh, Vortexrealm, Watdenfoo, Whagen2, WikHead, Wittym, Wragge, Wtubeguru, Yamamoto Ichiro, Yintan, Zarius, Zwiadowca21, 488 anonymous edits

License
Creative Commons Attribution-Share Alike 3.0
//creativecommons.org/licenses/by-sa/3.0/

You might also like