
Analyzing failure to prevent problems
BY MOHAMMED HAMED AHMED SOLIMAN

EXECUTIVE SUMMARY
Failure mode and effects analysis (FMEA) was initiated by the aerospace
industry in the 1960s to improve the reliability of systems. It is a
part of total quality management programs and should be used to
prevent potential failures that could affect safety, production, cost or
customer satisfaction. FMEA can be used during the design, service or
manufacturing processes to minimize the risk of failure, improving the
customers' confidence while also reducing costs.

One ISO requirement is to have a method or system capable of controlling the process that determines the acceptability of product or service quality. Failure mode and effects analysis (FMEA) is a good tool for improving the reliability of the product and its lifecycle. The tool can maximize the mean time between failures by reducing the probability of failure, extending the lifecycle of the product. This can be done during the design phase, manufacturing phase or maintenance service.

FMEA is a risk management tool that is designed to work as a preventive method rather than a corrective one. Organizations that use the tool as a corrective method will find that it does not work as intended. FMEA can be quite useful for design engineers in the design phase of the product, as well as research and development engineers, helping them develop new products with better reliability, quality and safety. FMEA helps manufacturing engineers control the process and eliminate errors during production, thus decreasing warranty costs and waste. Service engineers can use FMEA to improve the lifecycle of the product and lower its service costs by developing a proper maintenance program.

A team and a process
FMEA is not a job for one individual. The best possible results come when teams are composed of contributors from different engineering perspectives. The team should have between four and six members. Team size is determined by the number of areas affected by the FMEA, such as manufacturing, maintenance, design, engineering, material and technical service.

The customer adds another unique perspective and should be considered for team membership. If customers cannot be included, the team should devise ways to generate voice-of-the-customer data.
10 Industrial Management
The team should have a leader who acts as a facilitator, not a decision-maker. The team leader's main goals are to ensure that all resources are available, coordinate the meetings, and make sure the team moves toward completing the FMEA process.

Brainstorming is a well-known technique for generating a large number of ideas in a short time period. It's preferable to use this tool during the start of an FMEA process to determine potential failure modes for each component your team is studying. Brainstorming also helps find the root causes of each failure mode.

To encourage ideas, no theory should be critiqued or commented on when it is first offered. Each idea should be listed and numbered, exactly as offered, on a flip chart. Expect to generate at least 50 to 60 concepts in a 30-minute brainstorming session. Brainstorming sessions should follow four general rules: Do not comment on, judge or critique ideas at the time they are offered; encourage creative and offbeat ideas; the goal is to end up with a large number of ideas; and evaluate ideas later.

FMEA has sequential steps that were summarized in the book Basics of FMEA, Second Edition, by Raymond J. Mikulak, Robin McDermott and Michael Beauregard. Some of the steps are obvious, but others aren't. A basic outline follows.

1. Select a high-risk process. This will depend on the criticality of the process and how a failure in this process can affect safety, environment, health, production or costs. For example, a generator that will supply electricity to a firefighting system during emergencies is a critical safety component and must be considered during an FMEA because failures in such situations cannot be accepted.

2. Review the process. This step involves assigning a team that includes people with various job responsibilities and levels of experience, such as the design engineers, maintenance engineers, production engineers, process engineers, safety engineers and environmental engineers. The purpose of the FMEA team is to bring a variety of perspectives and experiences to the project.

If the process is a manufacturing process, then the team should review the process flowcharts and walk through the process at the gemba (the place where the work is done) to observe the real situation and collect all the data needed.

If the process is a product or machine, then the team should review the assembly drawing. The product should be tested, and every team member should be able to operate it and see how it works.

In this step, everyone in the team must have full knowledge of how the process works and operates.

3. Break down the system into components and subcomponents. If the system is a large system, like a water system that supplies an industrial process, the pump can be a critical component inside the system. A motor pump is a critical subcomponent because its failure can break down the entire process. The motor pump should be broken down into more subcomponents that are likely to fail and will affect the system, such as the motor's bearings and the rotor shaft. The FMEA will be
september/october 2014 11
used to reduce the probability of failure for each component or subcomponent.

4. Brainstorm potential failure modes. Once everyone in the team has a deep understanding about how the process or product works, the team can start thinking about things that could happen to affect the process. After a brainstorming session, organize the ideas by grouping them into categories. Failure modes can be categorized in many different ways, including by failure type (e.g., electrical, mechanical or user-created).

A failure mode is an event that causes a functional failure, any of the myriad ways in which a product or process can fail. Examples of failure modes abound. Low discharge pressure could be a compressor failure mode. Knocking could be an engine failure mode. Seized bearings are a bearing failure mode. Burnout is a motor failure mode. A dead battery is a car battery failure mode.

Note that failures are not limited to problems with the product, and failures could be tied to user mistakes. Those types of failures should be included in the FMEA. Anything that can be done to ensure the product works correctly, regardless of how the user operates it, will move the product closer to 100 percent total customer satisfaction. The use of mistake-proofing techniques, also known by the Japanese term poka-yoke, can be a good tool for preventing failures related to user mistakes.

For example, an FMEA involving a coffee maker could try to engineer out the user mistake of putting too much or too little ground coffee in the filter. This will ensure that the machine is making the right coffee with the same quality of taste for all users.

5. Assign an effect for each failure mode. Each failure mode should have an effect that determines the severity of the failure. It is also known as the consequence of failure.

The effect of a failure mode on the system is influenced by the availability of standby or redundancy in the system. For example, a transformer that supplies electricity is critical, but the existence of a standby generator will reduce the criticality of the system. However, the standby's performance must be considered and compared. If the transformer failed, would the generator be able to supply the electricity needed with the same efficiency? What is the time interval between when the transformer fails and when the generator starts to work? Will any failures have a severe effect on the product, the process or the whole system that will cost a lot of money to repair?

One failure mode could have several effects. For example, an electrical cutoff in the home could stop the refrigerator and damage food or prevent you from doing work on the computer.

Several failure modes could have one effect. A dead car battery or tire failure has the same effect on your vehicle: It will be difficult to make it to work on time with such a failure early in the morning.

The team must determine the end-effect each failure mode has on the system or the process. This means examining how each failure affects the entire system, the facility or the other connected processes.

6. Assign severity rankings. Severity, occurrence and detection are each ranked on a 10-point scale, ranging from one as the lowest ranking to 10 as the highest. Figure 1 shows a standard example of rankings for all three. In the severity category, potential safety, health and environmental failure modes generally indicate high risk, with rankings of nine and 10. Production losses and costs rank from a low of two to a high of eight, depending upon the length of potential delays and the severity of their effects on the entire system.

CATEGORIZING FAILURE
Figure 1. An FMEA process should use 10-point scales to rank the severity, occurrence and detection of each failure mode.

Severity ranking criteria
Ranking  Effect            Description of failure effect
1        None              No reason to expect failure to have any effect on safety, health, environment or mission.
2        Very low          Minor disruption of production. Repair of failure can be accomplished during trouble call.
3        Low               Minor disruption of production. Repair of failure may be longer than trouble call but does not delay mission.
4        Low to moderate   Moderate disruption of production. Some portion of the production process may be delayed.
5        Moderate          Moderate disruption of production. The production process will be delayed.
6        Moderate to high  Moderate disruption of production. Some portion of production function is lost. Moderate delay in restoring function.
7        High              High disruption of production. Some portion of production function is lost. Significant delay in restoring function.
8        Very high         High disruption of production. All of production function is lost. Significant delay in restoring function.
9        Hazard            Potential safety, health or environmental issue. Failure will occur with warning.
10       Hazard            Potential safety, health or environmental issue. Failure will occur without warning.

Occurrence ranking criteria
Ranking  Frequency of occurrence/operating hours  Description
1        1/10,000                                 Remote probability of occurrence; unreasonable to expect failure to occur
2        1/5,000                                  Low failure rate
3        1/2,000                                  Low failure rate
4        1/1,000                                  Occasional failure rate
5        1/500                                    Moderate failure rate
6        1/200                                    Moderate failure rate
7        1/100                                    High failure rate
8        1/50                                     High failure rate
9        1/20                                     Very high failure rate
10       1/10                                     Very high failure rate

Detection ranking criteria
Ranking  Description
1-2      Very high probability of detection
3-4      High probability of detection
5-7      Moderate probability of detection
8-9      Low probability of detection
10       Very low probability of detection
7. Assign an occurrence ranking for each failure mode. Occurrence is the probability of failure during the product's expected lifecycle, usually determined using the failure log history. But when historical data are not available or the failure never has occurred before, the team can determine the causes of each failure mode with techniques such as the five whys. Once the potential causes are determined, the team can estimate an occurrence ranking.

8. Assign a detection ranking for each failure mode. First, the current control and prevention methods applied to prevent, detect or control the failure should be listed, reviewed and evaluated. The detection ranking should be assigned for each failure mode or effect based on the current control/prevention/detection methods. As with the severity and occurrence rankings, the detection ranking table in Figure 1 is standard. If one failure mode or effect has several causes, detection and occurrence rankings should be assigned based on these causes. When potential causes are eliminated, the risk of failure is lowered.

9. Calculate the risk priority number. The risk priority number (RPN) gauges the risk associated with potential problems identified during the FMEA process. It is useful for assessing risk and comparing components to determine priorities. The RPN is calculated by multiplying the severity, occurrence and detection rankings for each failure mode or effect. The number can serve as a gauge to compare with the revised RPN once the FMEA process is completed and risk is lowered.

Many have commented that the ideal tables in Figure 1 do not exactly match their industry type or current conditions. But remember that the ideal is only a guide, and the tables can be adapted and changed as needed. However, it is important to keep the rankings from one to 10 so that the RPN scale has a minimum score of one and a maximum score of 1,000.

10. Prioritize failure modes to take action. This could be done with something like a Pareto chart and the 80-20 rule. Failure modes should be prioritized according to the risk number. High-risk numbers should be given attention first; then you can pay attention to the severity rankings. Thus, if several failure modes have the same risk priority number, the failure mode with the highest severity should be given more priority.

All RPNs above a certain cutoff point should be considered for improvement. The cutoff point number should be one that will improve at least 50 percent of the total risk priority number.

11. Take action to eliminate or reduce the high-risk failure modes. Once the priorities are assigned, organize action through continuous improvement tasks and problem-solving approaches, implementing countermeasures to reduce or eliminate the high-risk failure modes.

Often, the easiest way to make an improvement to the product or process is to increase the detectability of the failure, lowering the detection ranking. Teams can improve the chances of detecting failure through modifying the preventive maintenance program, using a proper condition-monitoring method, eliminating the failure mode during the manufacturing process by changing materials or suppliers, or considering a mistake-proofing method during the design phase. An example would be computer software that automatically warns that you are running out of memory.

12. Calculate risk priority number as high risks are removed. After corrective actions have been taken to lower risks, recalculate the RPN. You can compare this revised RPN with the earlier number to gauge improvement. The expectation is that the FMEA approach will reduce the initial RPN by at least 50 percent.

There always will be a potential for failure modes to occur. The question the company must ask is how much relative risk the team is willing to take. That answer might depend on the type of industry and the seriousness of the failure. For example, the nuclear industry has little margin for errors, as minor problems could escalate into major disasters. Other industries might find it acceptable to take higher risks.

A case of reliable improvement
A good example of a successful FMEA process comes from the case of a system that supplied electricity to a glass-melting furnace in Egypt. The electric transformer is considered critical because a failure causes high production losses: $5,000 an hour. A standby generator could keep the furnace running if the transformer failed. The standby was sufficient to avoid damaging the furnace but did not supply enough electricity to continue production.

The team broke down the transformer into seven components: bushing, tank, core, winding, oil, tap changer and solid isolation. Each component has different failure modes. For each failure mode there is an effect. And for each failure mode and effect there are several causes. Figure 2 shows the seven components and their failure modes, effects, causes, RPNs, rankings, recommended actions and other details.

The severity ranking number was based on the effect of each failure mode. Most of the failures had a medium effect on production because standby was available. An occurrence ranking was assigned based on the potential causes of each failure mode and the historical data.

It is important to discover the problem's root cause first because the cause will help determine the occurrence ranking. A detection ranking was assigned based on an evaluation of the transformer's current preventive maintenance program.

The transformer's maintenance program contained basic measurements and analysis on a monthly and annual basis. No advanced prediction methods were used to detect severe problems that might occur during the system's operation.
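The RPN calculation in step 9 and the prioritization rule in step 10 can be sketched in a few lines of Python. This is a minimal illustration, not the article's tooling: the class `FailureMode`, the helpers `prioritize` and `above_cutoff`, and the three sample failure modes are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (lowest) to 10 (highest)
    occurrence: int  # 1 to 10
    detection: int   # 1 to 10

    def __post_init__(self):
        # Keeping every ranking on the 1-10 scale keeps the RPN within 1..1,000.
        for field in ("severity", "occurrence", "detection"):
            value = getattr(self, field)
            if not 1 <= value <= 10:
                raise ValueError(f"{field} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # Step 9: RPN = severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    # Step 10: highest RPN first; ties are broken by the higher severity.
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)


def above_cutoff(modes, cutoff):
    # Failure modes whose RPN exceeds the cutoff are candidates for action.
    return [m for m in prioritize(modes) if m.rpn > cutoff]


# Hypothetical failure modes, for illustration only.
modes = [
    FailureMode("hypothetical mode A", severity=4, occurrence=1, detection=10),
    FailureMode("hypothetical mode B", severity=8, occurrence=5, detection=1),
    FailureMode("hypothetical mode C", severity=4, occurrence=1, detection=4),
]

for mode in above_cutoff(modes, cutoff=16):
    print(mode.name, mode.rpn)
```

Modes A and B both score 40, so B is listed first because its severity is higher; mode C scores exactly 16 and is not above the cutoff.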
CURBING FUTURE PROBLEMS
Figure 2. This successful FMEA project reduced the total RPN of a transformer's seven components from 540 to 188.

COMPONENT NAME AND FUNCTION: Bushing, supply high voltage
Failure effect: Equipment shutdown. Severity: 4.

Failure mode: Short circuit
  Failure causes: Fault in insulation material; water penetration or dirt; inelastic gasket; aging; lack of maintenance
  Current control detection/prevention methods: Visual inspection and cleaning
  Recommendations: Improve inspection and detectability
  Actions: Use a proper condition-monitoring technique such as ultrasound to detect insulation faults
  Rankings (two cause rows, each): occurrence 1, detection 6, RPN 24. Results: S 4, O 1, D 2, RPN 8.

Failure mode: Damage bushing
  Failure causes: Sabotage, stone, crash or careless handling
  Current control detection/prevention methods: None
  Recommendations: NA
  Actions: NA
  Rankings: occurrence 1, detection 4, RPN 16. Results: S 4, O 1, D 4, RPN 16.

COMPONENT NAME AND FUNCTION: Tank, enclose oil, protect active parts
Failure mode: Leakage. Failure effect: Equipment shutdown. Severity: 4.

Failure causes: Material/method or corrosion; inelastic gasket; aging; insufficient maintenance
  Current controls: Visual inspection
  Recommendations: Improve inspection and detectability
  Actions: Use ultrasound analysis technique to detect arcing phenomena
  Rankings (two cause rows, each): occurrence 1, detection 5, RPN 20. Results: S 4, O 1, D 1, RPN 4.

Failure causes: Mechanical damage; tank damage (rupture); high pressure due to gas generation; arcing
  Current controls: None
  Rankings: occurrence 1, detection 10, RPN 40. Results: S 4, O 1, D 1, RPN 4.

Failure causes: Mechanical damage; careless handling
  Recommendations: NA
  Actions: NA
  Rankings: occurrence 1, detection 1, RPN 4. Results: S 4, O 1, D 1, RPN 4.

COMPONENT NAME AND FUNCTION: Core, carry magnetic flux
Failure mode: Loss of efficiency (reduction of transformer efficiency). Failure effect: Lower voltage, production disturbance. Severity: 4.
Current controls: Basic measurements and gauges monitoring on monthly basis

Failure cause: DC magnetization
  Recommendations: NA
  Actions: NA
  Rankings: occurrence 1, detection 4, RPN 16. Resulting RPN: 16.

Failure cause: Mechanical failure; displacement of the core seal during construction (construction fault)
  Recommendations: NA
  Actions: NA
  Rankings: occurrence 1, detection 4, RPN 16. Resulting RPN: 16.

COMPONENT NAME AND FUNCTION: Winding, carry current
Failure mode: Short circuit. Failure effect: Equipment shutdown. Severity: 4.

Failure cause: Fault insulation; generation of copper sulfide
  Recommendations: Improve inspection and detectability
  Actions: Use ultrasound analysis technique
  Rankings: occurrence 1, detection 8, RPN 32. Results: S 4, O 1, D 2, RPN 8.

Failure cause: Hot spot; low oil quality
  Current controls: Oil sampling
  Recommendations: NA
  Actions: NA
  Rankings: occurrence 1, detection 1, RPN 4. Results: S 4, O 1, D 1, RPN 4.

Failure causes: Mechanical damage and transient overvoltage
  Current controls: None
  Recommendations: Improve inspection and detectability
  Actions: Use ultrasound for early detection
  Cause rows, each ranked occurrence 1, detection 5, RPN 20, with results S 4, O 1, D 2, RPN 8:
    Movement of transformer; aging of cellulose
    Short circuit in the net
    Connection of transformer
    Lightning
    Construction fault

COMPONENT NAME AND FUNCTION: Oil, the oil serves as both cooling medium and part of the insulation system
Failure effect: Equipment shutdown. Severity: 4.
Current controls: Visual monitoring of gauges and oil sampling every three years
Recommendations: Increase the frequency of oil sampling to twice per year in the semiannual maintenance schedule
Actions: Sample oil every six months in the semiannual maintenance schedule

Failure mode: Short circuit in transformer
  Failure causes: Overheated; particles in the oil (pump failure, dirty particles in the oil); water in the oil (overheated or aging)
  Rankings: occurrence 2, detection 4, RPN 32. Results: S 4, O 1, D 2, RPN 8.

Failure mode: Overheated
  Failure causes: Oil is not cooled; oil circulation out of function, or air/water cooling is out of function; fan/pump failure
  Rankings: occurrence 2, detection 4, RPN 32. Results: S 4, O 1, D 2, RPN 8.

COMPONENT NAME AND FUNCTION: Tap changers, regulate voltage (volt leveling)
Failure mode: Change of the voltage output. Failure effect: Can't change voltage level. Severity: 3.

Failure causes: Mechanical damage; wear
  Current controls: Voltage measuring
  Recommendations: Use a proper condition-monitoring technique to detect mechanical damages of the tap changers
  Actions: Use infrared analysis to detect mechanical damages
  Rankings: occurrence 2, detection 6, RPN 36. Results: S 4, O 1, D 2, RPN 8.

COMPONENT NAME AND FUNCTION: Solid isolation in cellulose-based products such as pressboard and paper. Its function is to provide dielectric and mechanical isolation to the windings.
Failure mode: Can't supply insulation. Failure effect: Equipment shutdown. Severity: 4.
Current controls: None

Failure causes: Mechanical damage (short circuit, movement of transformer); aging of cellulose
  Recommendations: Improve inspection and detectability in the maintenance program
  Actions: Use ultrasound to detect early isolation failures
  Rankings (two cause rows, each): occurrence 1, detection 10, RPN 40. Results: S 4, O 1, D 2, RPN 8.

Failure causes: Fault in insulation material; hot spot; low oil quality or overload
  Rankings: occurrence 1, detection 1, RPN 4. Results (as printed): S 4, O 1, D 1, RPN 8.

Failure causes: Generation of copper sulfide
  Recommendations: Use a proper condition-monitoring technique to detect insulation faults early
  Actions: Use ultrasound to detect isolation failures early
  Rankings: occurrence 1, detection 10, RPN 40. Results: S 4, O 1, D 2, RPN 8.
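The totals quoted in the Figure 2 caption can be checked with a short tally. The sketch below is an illustration under stated assumptions: the per-row RPN values are read from the rows of Figure 2 as extracted here, and the names `FIGURE_2_RPNS` and `reduction_percent` are made up for the example. The percentage reduction is computed as (initial RPN - revised RPN) / initial RPN × 100.

```python
# (initial RPNs, revised RPNs) per component, read row by row from Figure 2.
FIGURE_2_RPNS = {
    "bushing": ([24, 24, 16], [8, 8, 16]),
    "tank": ([20, 20, 40, 4], [4, 4, 4, 4]),
    "core": ([16, 16], [16, 16]),
    "winding": ([32, 4, 20, 20, 20, 20, 20], [8, 4, 8, 8, 8, 8, 8]),
    "oil": ([32, 32], [8, 8]),
    "tap changer": ([36], [8]),
    "solid isolation": ([40, 40, 4, 40], [8, 8, 8, 8]),
}


def reduction_percent(initial, revised):
    """Percentage by which the revised RPN improves on the initial RPN."""
    return (initial - revised) / initial * 100


initial_total = sum(sum(initial) for initial, _ in FIGURE_2_RPNS.values())
revised_total = sum(sum(revised) for _, revised in FIGURE_2_RPNS.values())

print(initial_total, revised_total)
print(round(reduction_percent(initial_total, revised_total), 1))
# prints:
# 540 188
# 65.2
```

The tally reproduces the 540 and 188 totals, and the rows left unchanged (core, the damaged bushing and the other low-RPN rows) are the ones at or below the case study's cutoff of 16.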

A risk priority number was calculated, with a cutoff RPN of 16. All RPNs greater than 16 were considered for improvement. The FMEA calculated a total RPN of 540. Applying continuous improvement actions to all RPNs greater than 16 lowered the total RPN to 188. This revised RPN was a 65 percent improvement. The reduction percentage is calculated using this formula: (RPN - revised RPN) / RPN × 100. Here, (540 - 188) / 540 × 100 is about 65 percent.

The improvements that yielded success included using ultrasound to detect issues, increasing the frequency of oil sampling and using infrared analysis to detect mechanical damage.

An FMEA process can trigger a number of such actions to improve a product's service or maintenance processes. They include, but are not limited to: increase the detection rate of high-risk failures using a proper technique to monitor conditions; increase the inspection rate for a specific component or part; modify the routine maintenance program; increase the frequency of replacing a specific spare part; modify the preventive maintenance schedule; change a spare part supplier; redesign a specific part in the system or redesign the whole system; and use different types of materials or spare parts.

While the above example involves a piece of equipment and its parts, FMEA can be applied in many other areas, including the component proving process; the outsourcing or resourcing of a product; developing suppliers to achieve quality; major changes in processes, equipment or technology; cost reductions; and analysis of new products or designs.

Other important considerations
Failure mode and effects analysis can maximize a product's reliability. But don't mistake it for a standalone tool. For example, to determine occurrence ratings, FMEAs rely on the failure log history, and the documentation process also is important. Problem-solving techniques like five whys, brainstorming, fault-tree analysis and Pareto analysis must be engaged. These techniques will help determine potential failure modes; assign the severity, occurrence and detection rankings; and provide solutions or actions to eliminate those failures.

And it cannot be emphasized enough how important customers are for a successful FMEA. A proper FMEA process must consider not only failures related to your organization's quality, but failures and mistakes that can be introduced by your customers. Collecting feedback is important. For example, Toyota's recalls in recent years relied on its dealers and service centers to play a big role in collecting the important data needed to let Toyota know what changes needed to be made on its factory floors. The data was based on customer feedback and comments.

And, as industrial engineers and managers know, the best tools will not work without an inherent culture of continuous improvement. Everything runs a risk of failure. When failure happens, the important thing is to find out what the organization can do to prevent those failures from occurring in the future.

An FMEA is not a one-time job; it should be repeated continuously to keep the process improved. Once the quality and cost of your company's offerings have been improved, competitors will try to match or exceed your value proposition. Continual FMEAs will bring your processes closer to perfection, so the continuous improvement culture should be embedded throughout all levels and with all employees.
