Chapter 2

Failure Modes and Effects Analysis
Introduction
Failure Modes and Effects Analysis (FMEA) is a procedure that examines each item
in a system, considers how that item can fail, and then determines how that
failure will affect the operation of the system. It is a structured, logical, and
systematic analysis. Identifying possible component failure modes and
determining their effects on the system operation helps the analyst to develop a
deeper understanding of the relationships among the system components and,
ultimately, to improve the system design by making changes to either eliminate or
mitigate the undesirable effects of a failure.
The FMEA Process
The FMEA methodology is based on a hierarchical, inductive approach to analysis;
the analyst must determine how every possible failure mode of every system
component affects the system operation.

The procedure consists of:


1. Identifying all item failure modes
2. Determining the effect of the failure for each failure mode, both locally and on
the overall system being analysed
3. Classifying the failure by its effects on the system operation and mission
4. Determining the failure’s probability of occurrence
5. Identifying how the failure mode can be detected. (This is especially important
for fault tolerant configurations.)
6. Identifying any compensating provisions or design changes to mitigate the
failure effects.

The details of the FMEA are captured on analysis worksheets. These
worksheets provide a description of the failure modes and their consequences
traceable to diagrams or other design documentation.

Generally they include:


• Identification of the component being analysed
• Its purpose or function
• The component failure mode
• The cause of the failure and how the failure is detected
• The local, subsystem, and system-level effects of the failure mode
• The severity classification and probability of occurrence of the failure mode
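
As a rough illustration, the worksheet fields listed above could be held in a small record type. This is a minimal sketch, not a standard worksheet layout: the class name `WorksheetRow`, its field names, the helper `missing_fields`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class WorksheetRow:
    """One FMEA worksheet entry (hypothetical field names)."""
    item: str              # component being analysed
    function: str          # its purpose or function
    failure_mode: str
    cause: str
    detection: str         # how the failure is detected
    local_effect: str
    subsystem_effect: str
    system_effect: str
    severity: str          # severity classification
    probability: float     # probability of occurrence


def missing_fields(row):
    """Return the names of fields left empty, so incomplete rows can be flagged."""
    return [f.name for f in fields(row) if getattr(row, f.name) in ("", None)]


# Illustrative values only; they do not come from a real analysis.
row = WorksheetRow(
    item="C12 filter capacitor",
    function="Smooth the supply rail",
    failure_mode="short",
    cause="dielectric breakdown",
    detection="",          # not yet filled in by the analyst
    local_effect="supply rail pulled to ground",
    subsystem_effect="power supply shuts down",
    system_effect="loss of controller function",
    severity="critical",
    probability=5e-5,
)
print(missing_fields(row))  # -> ['detection']
```

Keeping every row in one fixed structure makes it easy to check that no worksheet entry is left without a detection method or an effect description.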

A FMEA normally analyses each item failure as if it were the only failure within the
system. When the failure is undetectable or latent or the item is redundant, the
analysis may be extended to determine the effects of another failure, which in
combination with the first failure could result in an undesirable condition. All
single-point failures found during the analysis that have undesirable
consequences must be recorded on the FMEA worksheets for proper disposition.

When analysing failure effects, the analyst must also be concerned about possible
failure cascades and common cause failures where a single event can lead to
multiple failures. Such failures often result from the physical placement of the
components rather than their operational functions. For example, a failure of a
disk turbine in which the disk disintegrates and throws off pieces of broken metal
could disable several independent hydraulic systems in an aircraft if they are all
routed near the turbine.
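
One simple way to screen for placement-driven common cause failures is to group items by physical zone and flag any zone that holds items from more than one supposedly independent system. The sketch below assumes each item has been tagged with a system name and a zone name. The function `common_cause_zones` and the sample data are hypothetical.

```python
from collections import defaultdict


def common_cause_zones(placements):
    """Return zones that contain items from more than one independent system.

    placements: iterable of (item, system, zone) tuples.
    """
    systems_by_zone = defaultdict(set)
    for _item, system, zone in placements:
        systems_by_zone[zone].add(system)
    return {
        zone: sorted(systems)
        for zone, systems in systems_by_zone.items()
        if len(systems) > 1
    }


# Illustrative routing data, loosely modelled on the turbine example.
placements = [
    ("hyd line A", "hydraulic-1", "engine bay"),
    ("hyd line B", "hydraulic-2", "engine bay"),
    ("hyd line C", "hydraulic-3", "wing root"),
]
print(common_cause_zones(placements))
# -> {'engine bay': ['hydraulic-1', 'hydraulic-2']}
```

A flagged zone is only a prompt for further analysis. Whether a single event in that zone can really disable all the listed systems still has to be judged by the analyst.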

In Fig. 5, the functional, interface, and detailed analyses provide tools for
evaluating the design at each phase of the development cycle. The analysis
iterates as the design evolves and expands to include more failure modes as more
design details become available, until all the required equipment elements have
been completely defined, analysed, and documented. Conducting the analysis in
this manner enforces a disciplined review of the baseline design and allows timely
feedback to the design process.
A. FMEA Planning

Careful planning for the FMEA tailors the scope of the analysis to the needs of the
program and provides a process that efficiently identifies design deficiencies so
that corrective actions or compensating provisions can be made in a timely
manner. Proper planning requires that all the system requirements be
specified. During the planning process, data such as field reports, design rules,
checklists, and other guidelines based on lessons learned, technology advances,
and the history or analysis of similar systems are collected and studied. Models
are developed to illustrate the physical and functional relationships between
system components and the interfaces within the system.

These models are especially helpful for:


• Identifying the material and component technologies that are being proposed
• Identifying their characteristic failure modes
• Examining the effects of those types of failures on the system safety and
operation
• Identifying potential compensating provisions in the design

The failures can then be assessed according to their effects on the operation and
mission of the system, how the failures will be detected, and any compensating
provisions or design changes needed to mitigate the effects of the failures.

The complexity of the analysis tasks makes it essential to establish a set of ground
rules for the analysis as early as possible. These rules help to ensure the
completeness, correctness, and consistency of the analysis. Ground rules identify
assumptions, limitations, analysis approach, boundary conditions, failure criteria
for fault models, and what constitutes a failure (in terms of performance criteria,
success/failure criteria, or interface factors). They also identify requirements to be
verified, possible end-item support equipment (e.g., operational or ground
support, maintenance support, special test equipment, etc.), lowest indenture
level to be analysed, assumed environmental conditions, possible mission
objectives and modes of operation, risk factors defined by system safety analyses,
and so on.

It is also useful to define libraries with descriptions of failure modes and
consequences. Such libraries help control the analysis process and ensure
consistency in terminology, types of failure modes considered, and so on among
all the analysts (including future analysts) contributing to the project. They also
provide direction as to the level of detail for the analysis while ensuring more
consistent and uniform documentation.

The following libraries should be developed for the FMEA:


• Functional, interface, and detailed failure modes for each item type
• Mission phases and operating modes
• Effects that each failure mode has on the overall system and on the next-higher
indenture level above the postulated failure mode
• Descriptions with which to classify the severity of each failure mode’s effect on
the end-item
• Monitor descriptions that identify how a failure mode is detected
B. Functional Fault Analysis

A functional fault analysis is performed on the conceptual design to verify that
provisions to compensate for component failures are both necessary and
sufficient.

A functional analysis begins with a functional block diagram or equivalent system
representation. The block diagram indicates the input/output transfer function,
the flow of information, energy, force, fluid, and so on within the system, and the
primary relationship between the items to be covered in the analysis. Functional
failure mode models are assigned to each block resulting in the list of postulated
failure modes to be analysed. Then each function is analytically failed in each of
its failure modes to determine the effects and characteristic indications of the
failure mode in each applicable operating mode.
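
The step of failing each function in each of its failure modes, in each operating mode, amounts to enumerating every combination. The following is a minimal sketch of that enumeration. The function name `postulated_cases` and the sample functions and modes are hypothetical.

```python
from itertools import product


def postulated_cases(function_modes, operating_modes):
    """Yield (function, failure_mode, operating_mode) for every combination.

    function_modes: dict mapping each function block to its list of
    postulated functional failure modes.
    """
    for function, modes in function_modes.items():
        for failure_mode, operating_mode in product(modes, operating_modes):
            yield function, failure_mode, operating_mode


# Illustrative block diagram content.
function_modes = {
    "regulate pressure": ["output high", "output low"],
    "measure flow": ["no output"],
}
operating_modes = ["startup", "normal"]

cases = list(postulated_cases(function_modes, operating_modes))
print(len(cases))  # -> 6
print(cases[0])    # -> ('regulate pressure', 'output high', 'startup')
```

Each generated case is a row the analyst still has to evaluate by hand. The enumeration only guards against a combination being overlooked.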

Ideally, a functional fault analysis focuses on the functions that an item or group
of items perform rather than the characteristics of the specific components used
in their implementation. In practice, the types of failure modes considered for a
function may depend on how the function will be implemented.

When a functional analysis is applied to manufacturing processes, typical failure
mode categories include manufacturing and assembly operations, receiving
inspection, and testing. Process failure modes are described by process
characteristics that can be corrected.

When the analysis is applied to software, typical functional failure modes are: (a)
failure to execute; (b) incomplete execution; (c) execution at an incorrect time
(early, late, or when it should not have been executed); (d) incorrect result. For
some software, the effects of other failure modes may also have to be assessed.
For example, the analysis of a real-time system may require an assessment of
interrupt timing and priority assignments.

As the design details are developed, the functional block diagrams and analyses
are expanded, and the analysis iterates until all the system elements have been
completely defined and documented. Any undetected failures that cause loss of
system-level functions are corrected by incorporating requirements for
compensating provisions into the design and revising the functional fault analysis
to reflect the modifications.

A major benefit of a functional FMEA is that the functional failure modes can be
identified in the conceptual design before the detailed design has been
developed. Thus the analysis is aimed at influencing the design before the
construction of any hardware. Typical results of the analysis identify functional
failure modes that need to be eliminated or mitigated by changing the
functional design of the system.
C. Interface Fault Analysis

The interface fault analysis focuses on determining the characteristics of failures in
the interconnections between subsystem elements. Cables, plumbing, fibre-optic
links, mechanical linkages, and other interconnections between subsystem
modules provide the basis for the postulated failure modes. Each type of
interconnection has its own set of potential failure modes.

The interface fault analysis begins by defining the specific failure modes of the
interfaces between subsystem elements. Typical electrical failure modes are
“signal fails in the open condition,” “signal fails in the short condition,” and “input
or output shorted to ground.” Typical mechanical failure modes are “piping fails in
the closed position,” and “hydraulic pressure low.” Software interface failure
modes focus on failures affecting the interfaces between disparate software and
hardware elements.

The four failure modes most often applied to software interfaces are: (a) failure to
update an interface value; (b) incomplete update of the interface value; (c) update
to interface value occurs at an incorrect time (early or late); and (d) error in the
values or message provided at the software interface. Other failure modes specific
to the software or the interface hardware may also need to be considered.

Process errors that could result in misalignment or improper connection of parts
are an important failure mode for a process interface FMEA. These types of errors
are often eliminated by designing interconnections that can be made in only one
way or ones for which any orientation is valid.

Because the interface analysis involves the interfaces between subsystem
elements, it is the responsibility of the system integrator to ensure that the
analysis is complete. The subsystem designer is responsible for assessing the
effects of all inputs to the subsystem. The integrator uses the results of these
analyses to determine the effects of the interface failure modes on the subsystem
and system.

The advantage of a separate interface fault analysis is that it can be performed
before detailed module designs are available; it can begin as soon as the
subsystem inputs, outputs, and their interconnections are defined. Typical results
of this analysis are interface failure modes that need to be eliminated or mitigated
by interface design changes.
D. Detailed Fault Analysis

A detailed fault analysis is used to verify that the design complies with the system
requirements for:
i. Failures that can cause the loss of system functions,
ii. Single-point failures,
iii. Fault detection capabilities, and
iv. Fault isolation.

It uses component failure modes postulated from the individual components in
the detailed design. This includes the physical devices in the design, software
modules, and the processing steps to produce the item.

A detailed fault analysis on the system hardware has traditionally been done in a
FMEA. This type of analysis is sometimes called a piece-part fault analysis because
it is done on the “piece parts” that compose the system. To perform the analysis,
an established set of component failure modes and their corresponding
occurrence ratios are especially useful.

For example, failure modes normally considered for a capacitor are “open,”
“short,” and “leaking.” For an integrated circuit, they are “output pin stuck high”
and “output pin stuck low.” For a bearing, they are “binding or sticking,” “excessive
play,” and “contaminated.”

The failure mode ratios allow the item failure probability to be apportioned
among its failure modes to give the failure mode probability of occurrence. Failure
mode ratios are best obtained from field data that are representative of the
particular item application, but when such data are not available, generic
references can be used for guidance. Failure mode ratios for a particular
component type may vary depending on the operating environment,
manufacturer, application, and other factors.
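
The apportionment described above is a simple product: the probability of a given failure mode is the item failure probability multiplied by that mode's ratio, with the ratios for an item summing to one. The sketch below assumes this reading. The function name `apportion` and the capacitor ratios are hypothetical illustration values, not field data.

```python
def apportion(item_failure_probability, mode_ratios):
    """Split an item failure probability among its failure modes.

    mode_ratios: dict mapping failure mode name to its ratio. The ratios are
    expected to sum to 1 (within a small tolerance).
    """
    total = sum(mode_ratios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"failure mode ratios sum to {total}, expected 1.0")
    return {
        mode: item_failure_probability * ratio
        for mode, ratio in mode_ratios.items()
    }


# Made-up ratios for a capacitor; real values should come from field data
# or a generic reference appropriate to the application.
capacitor_ratios = {"open": 0.3, "short": 0.5, "leaking": 0.2}
mode_probabilities = apportion(1e-4, capacitor_ratios)
for mode, probability in mode_probabilities.items():
    print(f"{mode}: {probability:.1e}")
```

Rejecting ratio sets that do not sum to one catches a common bookkeeping slip, such as a failure mode being dropped from the library for one component type.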

Process-related failure modes are specific to the manufacturing or maintenance
process. For example, a wax coating may be applied too thinly to provide the
corrosion protection it is intended to provide.

One problem associated with a detailed fault analysis is that the level of detail
required to do the analysis means that it cannot be initiated until the design has
matured to the point that detailed schematics and parts lists are available. This
means that any major errors found by the analysis are likely to be very expensive
to fix. By contrast, even major errors in the design concept are relatively easy to fix
in the early design phase when the functional and interface fault analyses are
done.
E. Identify Failure Consequences

The consequences of a failure mode analysed in a FMEA are its effects, a
classification of the severity of the failure mode based on its system-level effects,
and the probability of the failure mode occurrence. The analysis is conducted for
all phases and modes of system operation including normal operating modes,
contingency modes, and test modes, and with respect to the primary and
secondary mission objectives. The local, next-higher, and end-level failure effects
of each item failure mode must be determined, and corrective actions or
compensating provisions must be identified within each applicable operating
mode.

Assessment of the failure mode effects must identify the system conditions or
operational modes that manifest the anomalous behaviour.
For example, failures in the landing gear of an airplane that would cause “loss of
landing gear extension” will have significantly different effects if the failure occurs
on the ground than if it occurs while attempting to land. Likewise, the discovery
mechanism for detecting the failure may be different.

The analysis identifies the effects of each postulated failure mode in a bottom-up
manner, beginning with the lowest-level items identified. The effects of each
failure mode are evaluated with respect to the function of the item being
analysed. Because the item failure under consideration might impact the system
at several levels of indenture, the failure effects are then related to the functions
at the next-higher indenture level of the design, continuing progressively to the
top or system-level functions.

The local effect(s) description gives a detailed accounting of the impact the failure
has on the local operation or function of the item being analysed. The fault
condition is described in sufficient detail that it can be used with the next-level
effects, end-effects, and detecting monitor(s) to identify and isolate the faulty
equipment, thus providing a basis for evaluating compensating provisions and
recommending corrective actions.

Next-level effects describe the effect the failure has on the next-higher level
operation, function, or status. Descriptions of the next-level effects are normally
compiled in a table for consistency of annotation. The failure effect at one level of
indenture is the item failure mode of the next higher level of which the item is a
component.
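
Because the effect at one indenture level becomes the failure mode of the next-higher item, the bottom-up analysis can be pictured as following a chain of links. The sketch below assumes the next-level effects have been tabulated as a mapping from (item, failure mode) to (parent item, parent failure mode). The function `trace_effects` and the sample table are hypothetical.

```python
def trace_effects(effect_of, item, mode):
    """Follow a failure mode upward until no higher-level effect is recorded.

    effect_of: dict mapping (item, failure_mode) to the
    (next_higher_item, failure_mode) it produces.
    Returns the chain from the local failure mode to the end effect.
    """
    current = (item, mode)
    chain = [current]
    seen = {current}
    while current in effect_of:
        current = effect_of[current]
        if current in seen:
            raise ValueError(f"cycle in effect table at {current}")
        seen.add(current)
        chain.append(current)
    return chain


# Illustrative three-level hierarchy.
effect_of = {
    ("pump bearing", "binding"): ("coolant pump", "no flow"),
    ("coolant pump", "no flow"): ("cooling loop", "loss of cooling"),
}
for level in trace_effects(effect_of, "pump bearing", "binding"):
    print(level)
# -> ('pump bearing', 'binding')
# -> ('coolant pump', 'no flow')
# -> ('cooling loop', 'loss of cooling')
```

The last entry in the chain corresponds to the end effect, and the entries between the first and the last are the next-level effects.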

End-effects describe the effect the failure has on the ability of the system to
operate and properly complete its mission. End-effects also provide a “go/no-go”
assessment of system capability to perform its intended mission. The system-level
failure effect descriptions are best derived from the system requirements and
compiled in a table for consistency of annotation.

Failure modes are usually classified by an assessment of the significance of the
end-effect on the system operation and mission. The FMEA ground rules should
provide a ranking and classification system, and appropriate criteria for assessing
the severity of failures for the product being analysed. Often a four-level
classification system developed for military equipment with severity classifications
ranging from “catastrophic” to “minor” is used.

Classifying a failure mode and ranking the consequences of failure require
knowledge of the system and its phases of operation. For example, in some
situations a failed tire might result in nothing more than the inconvenience of
having to change the tire. In other cases, a failed tire could lead to loss of control
of the vehicle and much more serious consequences.

When items are redundant and there is no warning that a redundant item has
failed, the severity should be assessed as if all of the redundant items have failed.
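
The rule for silent failures of redundant items can be written as a small decision function. In this sketch the four severity levels run from catastrophic to minor as the text describes. The two intermediate names (critical, marginal) follow the common MIL-STD-1629A convention and are an assumption here, since the text names only the endpoints. The function `assessed_severity` and its parameters are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Four-level classification; lower number means more severe."""
    CATASTROPHIC = 1
    CRITICAL = 2    # assumed intermediate level name
    MARGINAL = 3    # assumed intermediate level name
    MINOR = 4


def assessed_severity(single_failure, all_failed, redundant, failure_announced):
    """Pick the severity to record for a failure mode.

    single_failure: severity if only this item fails.
    all_failed: severity if every redundant item has failed.
    If the item is redundant and its failure gives no warning, the failure
    is assessed as if all redundant items had failed.
    """
    if redundant and not failure_announced:
        return all_failed
    return single_failure


# A silent failure of one of two redundant pumps is rated by the worst case.
print(assessed_severity(Severity.MINOR, Severity.CATASTROPHIC, True, False).name)
# -> CATASTROPHIC
```

The conservative rating reflects that an unannounced failure leaves the system running without the protection the redundancy was meant to provide.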
F. Corrective Action Recommendations

Corrective actions are needed for undetectable faults and for faults having
significant consequences, such as unsafe conditions, mission-critical or
safety-critical single-point failures, adverse effects on operating capability, or
high maintenance costs.

Corrective actions may not be needed if the risks for the specific consequence(s)
of a failure are acceptable based on a low enough probability of occurrence.
Corrective actions generally take the form of changes in requirements, design,
processes, procedures, or materials to eliminate the design deficiency.

Development of an appropriate corrective action usually requires understanding
and eliminating the cause of the specific failure mode; conversely, careful analysis
of the failure mode causes may suggest ways to eliminate the failure.

Some examples of failure causes are:


• Incorrect material specification
• Overstressing of a component
• Insufficient lubrication
• Inadequate maintenance instructions
• Poor protection from the environment
• Incorrect algorithm (software)
• Software design errors, including software requirements errors

Special attention to the failure mode causes may be needed to ensure that proper
materials are used when the operational environment is especially severe due to
effects such as extreme temperature cycling, very high or very low operating
temperatures, the presence of corrosive chemicals, and so on. Once a corrective
action is implemented and validated, the affected fault analyses must be revised
to reflect the new baseline configuration.

If a failure results in unsafe system operating conditions, warnings are necessary
to alert the user and to ensure satisfactory system status before commencing
operations. Monitors must be strategically located to cover all undetected
catastrophic, hazardous, and single-point failures, based on the system
requirements and intended uses. Once the necessary monitors have been
identified, a subsequent FMEA iteration is conducted (in support of maintenance
activities) to verify that any remaining undetected failure modes comply with the
system fault detection requirements. Requirements for fault detection monitors
are then derived to cover the remaining undetected failure modes. These
monitoring requirements include operator procedures and human monitoring, as
well as built-in test.
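
The check for remaining undetected failure modes can be sketched as a set difference between the postulated failure modes and those covered by at least one monitor. The function `uncovered_modes`, the monitor names, and the failure mode names below are hypothetical.

```python
def uncovered_modes(failure_modes, monitors):
    """Return the failure modes that no monitor detects, in input order.

    failure_modes: iterable of failure mode names.
    monitors: dict mapping monitor name to the set of failure modes it
    detects. Monitors may be built-in tests or operator procedures.
    """
    detected = set().union(*monitors.values())
    return [mode for mode in failure_modes if mode not in detected]


# Illustrative data only.
failure_modes = ["pump no flow", "sensor stuck high", "valve leaks"]
monitors = {
    "flow built-in test": {"pump no flow"},
    "operator pre-start check": {"valve leaks"},
}
print(uncovered_modes(failure_modes, monitors))
# -> ['sensor stuck high']
```

Each failure mode returned by the check either needs a new monitoring requirement or must be shown to comply with the system fault detection requirements as it stands.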

When design changes to correct a deficiency are not possible or feasible,
compensating provisions must be identified to circumvent or mitigate the effect of
the failure when it occurs. Such provisions are often in the form of design
provisions or designated operator actions that allow continued safe operation
when a failure occurs.