
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/224441904

Failure analysis of FMEA

Conference Paper · March 2009


DOI: 10.1109/RAMS.2009.4914700 · Source: IEEE Xplore


2 authors:

Zigmund Bluvband Pavel Grabov


ALD Group Technion - Israel Institute of Technology
25 PUBLICATIONS   108 CITATIONS    17 PUBLICATIONS   122 CITATIONS   


All content following this page was uploaded by Pavel Grabov on 20 June 2019.



Expanded FMEA (EFMEA)
Zigmund Bluvband, ALD Ltd., Beit-Dagan.
Pavel Grabov, ALD Ltd., Beit-Dagan.
Oren Nakar, MOTOROLA Israel Ltd., Tel-Aviv.

Key Words: Failure Mode and Effect Analysis, RPN prioritization, Corrective action choice, Feasibility ranking.

SUMMARY & CONCLUSIONS

The main FMEA objective is the identification of ways in which a product, process or service fails to meet critical customer requirements, as well as the ranking and prioritization of the relative risks associated with specified failures. The effectiveness of prioritization can be significantly improved by using a simple graphical tool, as described by the authors. Evaluation of the adequacy of corrective actions proposed to improve the product/process/service, and the prioritization of these actions, can be supported by implementing the procedure proposed here, which is based on the evaluation of corrective action feasibility. The procedure supports evaluation of both the feasibility of a corrective action implementation and the impact of the action taken on the failure mode.

1. INTRODUCTION

Failure Mode Effect Analysis (FMEA) is one of the well-known risk-assessment methodologies. FMEA, first used in the 1960s in the Aerospace industry, is now recognized as a fundamental tool in the Reliability Engineering field.
The purpose of FMEA is to examine possible failure modes and determine the impact of these failures on the product (Design FMEA - DFMEA), process (Process FMEA - PFMEA) or service (Service FMEA - SFMEA):
• DFMEA is used to analyze product designs before they are released to production. DFMEA focuses on potential failure modes associated with the functions of the product and caused by design deficiencies;
• PFMEA is used to analyze already developed or existing processes. PFMEA focuses on potential failure modes associated with both the process safety/effectiveness/efficiency, and the functions of a product caused by process problems;
• SFMEA is used to analyze product serviceability, i.e. it is focused on the potential problems associated with both maintenance issues and field failures of the manufactured products.

2. BACKGROUND

FMEA represents a ‘Step by Step’ procedure (some sequence of questions & answers) implying, at its first steps, a tabulation of system functions or system equipment items, the failure mode of each item, and the effects of the failures on the system:
ο What is the input for FMEA? – Functions or items identified as candidates for analysis.
ο What can go wrong? – The following is a list of potential failure modes for every identified item:
• The item doesn’t perform the intended function;
• The item’s performance is poor;
• The item performs an unintended function.
ο What is the effect on the system’s output? – The expected result that the given failure will have on people, system (equipment) or environment.
After the failure effects have been identified, the consequences associated with each effect should be evaluated. One possible method is the use of the conventional ranking procedure to rank the severity (S) of the failure effect. It is ranked on a ‘1’ (Best Case) to ‘10’ (Worst Case) scale, which appears on standard FMEA forms [1]. The ranking procedure takes into account considerations of safety, system downtime, defective unit appearance, reduced performance level, system instability, etc.
The following sequence of questions & answers is needed to perform the root-cause analysis of failure modes, as well as to evaluate the occurrence (O) and detectability (D) ranks:
ο What is the cause of every identified failure? – Specific cause(s) for each failure mode must be identified. Root-cause analysis can be supported by relevant historic data, reports, complaints, service calls, etc.;
ο How often does this cause happen? – Evaluate the probability of occurrence of the cause that resulted in the failure. The likelihood of occurrence should also be ranked on a ‘1’ (Best Case) to ‘10’ (Worst Case) scale [1] using historical data (failure rate, MTBF, FRACAS reports, Cpk data, etc.).
ο How, when and where can we detect this cause and/or a relevant failure mode? – Current control activities associated with the given cause and/or related failure mode should be considered.
ο How well can we detect this cause and/or relevant failure mode? – The probability that the control system will detect the failure mode and/or cause when they occur should be evaluated. Similar to severity and occurrence, detectability is ranked on a ‘1’ (Best Case) to ‘10’ (Worst Case) scale [1] using the data
characterizing the effectiveness of control (testing, inspection, measurements, etc.).
Obtained ranks of severity (S), occurrence (O) and detectability (D) are used for risk assessment via an index called RPN (Risk Priority Number) calculated by multiplying the severity, occurrence and detection ranking factors for every cause:

RPN = S * O * D

Once all items have been analyzed and assigned an RPN value, it is common to plan corrective actions from the highest RPN value down. The intent of any corrective action is reduction of any of the severity, occurrence and/or detection rankings.

3. PITFALLS OF CONVENTIONAL FMEA AND PROPOSED SOLUTIONS

The conventional FMEA procedure is characterized by some pitfalls. The Expanded FMEA (EFMEA) procedure proposed by the authors improves FMEA effectiveness by providing solutions to two very important problems:
ο RPN prioritization,
ο Corrective actions comparison.

3.1. RPN prioritization

The items covered by the FMEA procedure are usually very different from the risk values point of view. Obviously, the most important items, characterized by high RPN, should be separated from those characterized by a significantly lower RPN value. Selected ‘High Priority’ items represent issues for corrective action plan development. The question is ‘How can such separation be performed?’
Common recommendations of the conventional FMEA concerning calculated RPN values are usually very general. For example,
ο ‘For higher RPN’s the team must undertake efforts to reduce the calculated risk through corrective action(s)’ [1];
ο ‘Focus attention on the high RPN’s’ [2];
ο ‘Expend team effort on top 20 to 30% of problems as defined by RPN values’ [3].
The common practice of an FMEA team analyzing RPN values in Pareto fashion is to limit the list of recommended corrective actions to ‘Top ‘X’ Issues’. The chosen X-value could be 3 or 5 or 10, etc. In any case, the ‘X’ will be an entirely arbitrary choice. Obviously, this kind of decision-making is very problematic.
How can we decide which RPN values characterize critical issues that should be immediately treated? This question can be answered by a distribution analysis of the RPN values. Although there are some rather sophisticated statistical techniques supporting distribution analysis, we recommend the application of a very simple and quite effective graphical tool for RPN value analysis. This tool actually represents a graph of ordered RPN values and is similar to the so-called Scree Plot used in principal component analysis.
Scree Plot settings require preliminary ordering of the RPN values by size, from smallest to largest. These values are then plotted, by size, across the graph. Normally, when observing from the right, the Scree Plot appears like a cliff, descending to the base level of ground (see Fig. 1).
The calculated RPN values usually form a right-skewed distribution, with a short tail on the left (negligible risk values) and a long tail on the right (due to critical risk values representing ‘outliers’ from the distribution analysis point of view). Therefore, the shape of the points forms a non-symmetrical upward curve on a Scree Plot.
The lower long part of the plot is characterized by a gradual increase of the RPN values that can, usually, fit a straight line with a rather slight slope (shown by the 1st dotted line on the plot). The RPN values scattered around this line should be considered as a kind of ‘Information Noise’. The issues characterized by these RPN values do not require immediate attention.
The short uppermost part of the Scree Plot is characterized by a very steep increase of the RPN values (RPN jumps). A straight line with a very steep slope (shown by the 2nd dotted line on the plot) could fit it. The RPN values scattered around this line are related to the most critical issues of FMEA that need to be dealt with promptly.

3.2. Choice of preferable corrective action

There are, usually, several possible competitive corrective actions that, theoretically, are capable of reducing the RPN for any given failure mode. Although there are actions that aim at failure mode severity reduction (usually by redesign), the bulk of the actions deemed appropriate aim at either occurrence ranking reduction or detectability ranking reduction. Actions aimed at occurrence ranking reduction seek to prevent the occurrence of the cause of the failure mode, or to reduce the rate at which the cause and/or the failure mode occur. Actions aimed at detectability ranking reduction focus on improving detection of the cause and/or the failure mode prior to its occurrence and on issuing a warning.
Since conventional FMEA does not provide any guidelines for the optimal choice between competitive corrective actions, the FMEA team faces a difficult task. Priority of the alternatives under comparison usually is subjectively established based on intuition, experience and/or feelings of FMEA team members. The final solution recommended by the FMEA team is, often, far from being the optimal one, such as an action preferable from the department manager’s point of view or the one suggested by the loudest member of the team.
The EFMEA procedure provides the basis for the optimal corrective action choice. This procedure implies evaluation of both the feasibility of a corrective action implementation and the expected RPN value after implementing this action. Since feasibility estimation is a multidimensional problem, its
evaluation should be performed by posing the question: ‘How feasible is it to implement a given corrective action under the existing constraints of safety, cost, resources, time, quality & reliability requirements, organizational structure, personnel resistance, etc.?’ Moreover, EFMEA takes into consideration both the chance of success (i.e. the RPN reduction) and the probability of an undesirable impact (on people, system, product, process or environment) as a result of a corrective action implementation.
Similar to the conventional FMEA’s procedure, the feasibility rank (F) is estimated on a ‘1’ (Best Case) to ‘10’ (Worst Case) scale using the criteria proposed by the authors and presented in Table 1.
The EFMEA procedure results in prioritization of the analyzed alternatives. The final decision, i.e. the choice of the optimal corrective action, is based on the results of the comparative analysis of the differences between the RPN values before and after the implementation of given corrective actions divided by the corresponding feasibility ranking factors:

$$\frac{\mathrm{RPN}_{i}^{\mathrm{Before}} - \mathrm{RPN}_{i}^{\mathrm{After}}}{F_i} = \frac{\Delta \mathrm{RPN}}{F_i}$$

Where: $\mathrm{RPN}_{i}^{\mathrm{Before}}$ and $\mathrm{RPN}_{i}^{\mathrm{After}}$ are the RPN values for a given item before and after implementation of the i-th corrective action, $\Delta \mathrm{RPN}$ is the difference between these values, and $F_i$ is the feasibility rank of the i-th corrective action.
The calculated ratio belongs to the LTB family (‘The Larger-the-Better’), i.e. the most preferable corrective action is characterized by the largest ratio.
There is an alternative approach for feasibility evaluation based on a known procedure of Pareto Priority Index calculation [4]. The feasibility estimate can be calculated as the geometrical mean of values of all feasibility related dimensions (such as cost, time consumption, chance of success, etc.) [5]. Obviously, the same dimensions, measured on the same scales, should characterize all competitive corrective actions.
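The geometric-mean alternative for the feasibility estimate mentioned in Section 3.2 can be illustrated with a minimal Python sketch. The function name, the dimension names and the scores below are hypothetical and chosen only for illustration; the sketch assumes every dimension is scored on the same ‘1’ (Best Case) to ‘10’ (Worst Case) scale.

```python
import math


def feasibility_geometric_mean(scores):
    """Geometric mean of per-dimension feasibility scores.

    Assumption: each score is a positive number on the same
    '1' (best) to '10' (worst) scale for every competing action.
    """
    if not scores:
        raise ValueError("at least one dimension score is required")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    return math.prod(scores) ** (1.0 / len(scores))


# Hypothetical scores for one corrective action (illustrative values only).
action_scores = {"cost": 2, "time_consumption": 8, "chance_of_success": 4}
f_estimate = feasibility_geometric_mean(list(action_scores.values()))
print(f"Feasibility estimate: {f_estimate:.2f}")
```

The resulting estimate would then take the place of F in the ∆RPN/F ratio, provided all competing actions are scored on the same dimensions.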

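[Figure 1: RPN values (vertical axis) plotted in ascending order against causes 1, 2, 3, ..., n (horizontal axis). The 1st straight line fits the gradual RPN increase (‘Information Noise’); the 2nd straight line fits the RPN jumps (Critical Issues).]

Figure 1. Scree Plot RPN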

Ranking | Criteria: Feasibility of Corrective Actions Implementation
10 | Safety Problem and/or Noncompliance to Government Regulation and/or unavailable necessary resources and/or unacceptable cost and/or time consumption and/or zero chance of success and/or 100% probability of undesirable impact.
9 | Very remote availability of necessary resources and/or almost unacceptable cost and/or time consumption and/or almost zero chance of success and/or almost 100% probability of undesirable impact.
8 | Remote availability of necessary resources and/or near unacceptable cost and/or time consumption and/or remote chance of success and/or near 100% probability of undesirable impact.
7 | Very low availability of necessary resources and/or very high cost and/or time consumption and/or very low chance of success and/or very high probability of undesirable impact.
6 | Low availability of necessary resources and/or high cost and/or time consumption and/or low chance of success and/or high probability of undesirable impact.
5 | Rather low availability of necessary resources and/or rather high cost and/or time consumption and/or rather low chance of success and/or rather high probability of undesirable impact.
4 | Moderate availability of necessary resources, cost, time consumption, chance of success and probability of undesirable impact.
3 | Rather highly available resources, rather low cost and time consumption, rather high chance of success and rather low probability of undesirable impact.
2 | Highly available resources, low cost and time consumption, high chance of success and low probability of undesirable impact.
1 | Fully available resources, very low cost and time consumption, near 100% chance of success and near zero probability of undesirable impact.

Table 1. Feasibility Ranking

Item (Failure Mode & Cause) | Recommended Corrective Action | S Current | S Expected | O Current | O Expected | D Current | D Expected | RPN Current | RPN Expected | ∆RPN (Difference) | F | ∆RPN/F | Corrective Action's Priority
Failed Product Due to Insufficient Strength | Change of raw material | 9 | 9 | 8 | 1 | 6 | 8 | 432 | 72 | 360 | 8 | 45 | 2
Failed Product Due to Insufficient Strength | Change of process temperature | 9 | 9 | 8 | 4 | 6 | 8 | 432 | 288 | 144 | 2 | 72 | 1
Failed Product Due to Insufficient Strength | Change of process pressure | 9 | 9 | 8 | 4 | 6 | 8 | 432 | 288 | 144 | 5 | 29 | 3

Table 2. Example of Corrective Actions Prioritization
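The prioritization in Table 2 can be reproduced with a short Python sketch. The S, O, D and F values are taken from Table 2; the function and variable names are illustrative assumptions, not part of the EFMEA software referenced by the authors.

```python
def rpn(s, o, d):
    """Risk Priority Number: RPN = S * O * D."""
    return s * o * d


def prioritize(current, actions):
    """Rank corrective actions by delta-RPN / F, largest ratio first."""
    rpn_before = rpn(*current)
    rows = []
    for name, expected, feasibility in actions:
        delta = rpn_before - rpn(*expected)
        rows.append({"action": name, "delta_rpn": delta,
                     "F": feasibility, "ratio": delta / feasibility})
    # 'The Larger-the-Better': the largest ratio gets priority 1.
    rows.sort(key=lambda row: row["ratio"], reverse=True)
    for priority, row in enumerate(rows, start=1):
        row["priority"] = priority
    return rows


# Current (S, O, D) ranks of the item in Table 2.
current = (9, 8, 6)
# (action, expected (S, O, D) after the action, feasibility rank F).
actions = [
    ("Change of raw material", (9, 1, 8), 8),
    ("Change of process temperature", (9, 4, 8), 2),
    ("Change of process pressure", (9, 4, 8), 5),
]

ranked = prioritize(current, actions)
for row in ranked:
    print(row["priority"], row["action"], row["delta_rpn"], row["F"],
          round(row["ratio"]))
```

Note that the raw material change gives the largest RPN reduction (360) but ranks second, because its feasibility rank of 8 makes it much harder to implement than the temperature change.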

4. CASE STUDY

The proposed procedure has been applied for evaluation of the communication device, designed by Motorola. After identification of all failure effects and root-cause analysis of failure modes in teamwork, all corresponding RPN values have been calculated (see Table 3).
Note: The FMEA will use the "circuit block" approach in which all parts in a circuit block are treated as one FMEA item.

ID | Function Description | Function | Potential Failure Mode | Potential Effect(s) of Failure | SEV | CLASS | Potential Cause(s)/Mechanism(s) of Failure | OCCUR | Current Design Controls | DETEC | RPN
BB-1 | RS 232 | RS 232 port | NO DTE/DCE communication | Data portion is not working | 7 | | U5201 is damaged due to ESD | 6 | Test ESD (x KV) after Pilot | 3 | 126
BB-1 | RS 232 | RS 232 port | NO DTE/DCE communication | Data portion is not working | 7 | | Unit set to communicate in other modes of communication (e.g. USB) | 4 | Verify integration | 6 | 168
BB-1 | RS 232 | RS 232 port | NO DTE/DCE communication | Data portion is not working | 7 | | DTR or RTS are not connected to the processor. Reasons may be contacts in J1/J10/J11 or customer installation problems. | 4 | RTS / DTR connections are checked in final test (Installations recommendations in the developer guide). | 2 | 56
BB-2 | Analog Audio signals | TX Path | High Distortion | Distorted sound in the uplink | 4 | | Injected signal was too high | 3 | N/A | 10 | 120
BB-2 | Analog Audio signals | RX Path | Aux. Audio "Pop" | "Pops" heard when pressing On/Off switch | 5 | | Wrong integration | 10 | N/A | 10 | 500
BB-2 | Analog Audio signals | RX Path | Low (auxiliary) audio or no handsfree audio | Radio turns on automatically, with applying Vcc, low | 6 | | AUDIO_OUT_ONOFF DC loaded/shorted to gnd | 3 | Checked at final test | 3 | 54
BB-2 | Analog Audio signals | RX Path | Low (auxiliary) audio or no handsfree audio | Radio turns on automatically, with applying Vcc, low | 6 | | Line loaded due to bad assembly | 2 | N/A | 10 | 120
BB-3 | Logic circuits & Memories | Main functionality of the phone | Flash Corruption | Phone is not functioning at all | 8 | | unpower/Disconnected the unit during SW load | 3 | Final Test or before | 1 | 24
BB-3 | Logic circuits & Memories | Main functionality of the phone | Flash Boot Sector Corruption | Phone is not functioning at all | 8 | | | 3 | Final Test or before | 1 | 24
BB-3 | Logic circuits & Memories | Main functionality of the phone | Unit issues reset out and turns off at OS startup (after completing initialization) - logo presented | Phone is not functioning at all | 8 | | Internal Discontinuity in PCB | 7 | N/A | 10 | 560
BB-4 | Power Management | Main functionality of the phone | Very High Current Consumption (>xA) at power-up | Phone is not functioning at all, excessive heat | 8 | | "A" damaged by overvoltage although there are recommendations in Developer guide | 3 | Checked at final test | 3 | 72
BB-4 | Power Management | Main functionality of the phone | Very High Current Consumption (>xA) at power-up | Phone is not functioning at all, excessive heat | 8 | | RF PA damaged | 3 | Checked at final test, replace PA | 2 | 48
BB-4 | Power Management | Main functionality of the phone | High Current Consumption (yA < I < xA) at power-up | Radio powers up, but draws high current, excessive heat in "A" area | 6 | | One of "A" regulators supplying RF is shorted | 3 | Checked at final test | 4 | 72
BB-4 | Power Management | Main functionality of the phone | No current | Phone is not functioning at all | 8 | | Main FET not properly soldered | 2 | Checked at final test | 3 | 48
BB-5 | IGNITION | No Ignition functionality | Radio doesn't turn on/off due to Ignition, but turns on/off from audio_out_onoff | Protection diode and resistor burn-out | 5 | | Inadequate zener diode, burns out easily, drawing additional current through the resistor as well | 10 | N/A | 10 | 500
RF-1 | RF RX Path | FER | Unit not receiving / Poor RX quality | Poor receive levels, dropped calls | 7 | | 1. Factory assembly defects. 2. Defective parts. | 3 | 100% test on final test | 1 | 21
RF-1 | RF RX Path | FER | Unit not receiving / Poor RX quality | Wrong RSSI indication to user | 7 | | Phasing problem | 2 | 100% test on final test | 1 | 14
RF-2 | RF TX Path | Output power | No output power / Low power | Limited TX dynamic range - limited unit coverage | 3 | | 1. Factory assembly defects. 2. Defective parts. | 5 | N/A | 9 | 135
RF-2 | RF TX Path | Timing error / Frequency error | TX parametric errors: timing error / Frequency error | Dropped calls | 7 | | 1. Factory assembly defects. 2. Defective parts. | 5 | Tested any proto in extremes | 9 | 315
RF-2 | RF TX Path | Timing error / Frequency error | Damage to power amplifiers | Loss of radio functionality | 8 | | High VSWR on the antenna ports due to damage to output cable / antenna | 5 | N/A | 10 | 400
RF-3 | RF - Front End | Antenna path | Component damage | Loss of radio functionality | 8 | | ESD although coil is placed - L450 | 1 | 1. ESD tested during qualification tests. 2. Parameters tested 100% in factory. | 4 | 32
RF-4 | Antenna Connector | 50ohm connectors | Connector becomes detached from the PCB | Anything from degradation in receive and transmit | 8 | | 1. Connector not soldered properly. 2. High pull force by the user. | 5 | Visual inspection in the factory. | 1 | 40

Table 3. Failure Mode and Effects Analysis (DFMEA)

All RPN values were sorted and plotted on a graph in ascending order (see Fig. 2). The critical issues belonging to the uppermost part of the Scree Plot have been identified and reviewed.

[Figure 2: Scree Plot of ordered RPN values. Vertical axis: RPN (0 to 600); horizontal axis: Item (1 to 22).]

Figure 2. Scree Plot of ordered RPN values
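The separation of critical issues from ‘Information Noise’ is described in the paper as a visual inspection of the Scree Plot. As a rough illustration only, the Python sketch below orders the 22 RPN values of Table 3 and cuts the ordered list at the largest jump between neighbouring values. The largest-gap rule and the function name are assumptions made for this sketch; they are a crude stand-in for fitting the two straight lines, not the authors' procedure.

```python
def split_at_largest_gap(rpn_values):
    """Order RPN values and split them at the largest jump.

    Returns (noise, critical): values below and above the largest gap
    between neighbouring values in the ascending sequence.
    """
    ordered = sorted(rpn_values)
    if len(ordered) < 2:
        return ordered, []
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # first index of the upper group
    return ordered[:cut], ordered[cut:]


# RPN values of the 22 causes listed in Table 3.
table3_rpn = [126, 168, 56, 120, 500, 54, 120, 24, 24, 560, 72,
              48, 72, 48, 500, 21, 14, 135, 315, 400, 32, 40]

noise, critical = split_at_largest_gap(table3_rpn)
print("Information noise:", noise)
print("Critical issues:", critical)
```

For these data the cut falls between 168 and 315, so the upper group contains the five causes with RPN values of 315, 400, 500, 500 and 560, which are the causes carried forward into Table 4. A single largest gap will not always coincide with a visual reading of the plot, so the result should be checked against the graph.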


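ID | Function Description | Function | Potential Failure Mode | Potential Effect(s) of Failure | SEV | CLASS | Potential Cause(s)/Mechanism(s) of Failure | OCCUR | Current Design Controls | DETEC | RPN | Recommended Action(s) | Action Results: SEV | Action Results: OCC | Action Results: DET | Action Results: RPN | F | ∆RPN/F
BB-2 | Analog Audio signals | RX Path | Aux. Audio "Pop" | "Pops" heard when pressing On/Off switch | 5 | | Wrong Integration | 10 | N/A | 10 | 500 | 1. Add recommendations in Developer guide | 5 | 3 | 10 | 150 | 1 | 350
BB-2 | Analog Audio signals | RX Path | Aux. Audio "Pop" | "Pops" heard when pressing On/Off switch | 5 | | Wrong Integration | 10 | N/A | 10 | 500 | 2. Integration with host product | 5 | 2 | 10 | 100 | 5 | 80
BB-3 | Logic circuits & Memories | Main functionality of the phone | Unit issues reset out and turns off at OS startup (after completing initialization) - logo presented | Phone is not functioning at all | 8 | | Internal Discontinuity in PCB | 7 | N/A | 10 | 560 | 1. Improve the vendor's process | 8 | 2 | 10 | 160 | 9 | 44
BB-3 | Logic circuits & Memories | Main functionality of the phone | Unit issues reset out and turns off at OS startup (after completing initialization) - logo presented | Phone is not functioning at all | 8 | | Internal Discontinuity in PCB | 7 | N/A | 10 | 560 | 2. Add test at the vendor facility before shipping to Motorola for every batch (sampling) | 8 | 3 | 10 | 240 | 4 | 80
BB-3 | Logic circuits & Memories | Main functionality of the phone | Unit issues reset out and turns off at OS startup (after completing initialization) - logo presented | Phone is not functioning at all | 8 | | Internal Discontinuity in PCB | 7 | N/A | 10 | 560 | 3. Add Acceptance Inspection (100% at Motorola door) | 8 | 1 | 10 | 80 | 8 | 60
BB-5 | IGNITION | No Ignition functionality | Radio doesn't turn on/off due to Ignition, but turns on/off from audio_out_onoff | Protection diode and resistor burn-out | 5 | | Inadequate zener diode, burns out easily, drawing additional current through the resistor as well | 10 | N/A | 10 | 500 | 1. Protection diode 2. Resistor derating | 5 | 1 | 10 | 50 | 3 | 150
BB-5 | IGNITION | No Ignition functionality | Radio doesn't turn on/off due to Ignition, but turns on/off from audio_out_onoff | Protection diode and resistor burn-out | 5 | | Inadequate zener diode, burns out easily, drawing additional current through the resistor as well | 10 | N/A | 10 | 500 | 3. Add testing in the Final Test | 5 | 10 | 4 | 200 | 2 | 150
BB-5 | IGNITION | No Ignition functionality | Radio doesn't turn on/off due to Ignition, but turns on/off from audio_out_onoff | Protection diode and resistor burn-out | 5 | | Inadequate zener diode, burns out easily, drawing additional current through the resistor as well | 10 | N/A | 10 | 500 | Improve Design (1, 2) & Increase Detectability (3) | 5 | 1 | 4 | 20 | 3 | 160
RF-2 | RF TX Path | Timing error / Frequency error | TX parametric errors: timing error / Frequency error | Dropped calls | 7 | | 1. Factory assembly defects. 2. Defective parts. | 5 | Tested any proto in extremes | 9 | 315 | Test 100% in factory. | 7 | 5 | 1 | 35 | 3 | 93
RF-2 | RF TX Path | Timing error / Frequency error | Damage to power amplifiers | Loss of radio functionality | 8 | | High VSWR on the antenna ports due to damage to output cable / antenna | 5 | N/A | 10 | 400 | PA's specified to ruggedness of x:1, which is sufficient because of the loss from the PA output pin to the antenna ports. | 8 | 1 | 10 | 80 | 2 | 160

Table 4. Corrective Action Analysis & Prioritization (EFMEA)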
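As a consistency check on Table 4, the sketch below recomputes ∆RPN/F for each recommended action from its current and expected S, O, D ranks and then picks the action with the largest ratio for each issue that has several alternatives. The numbers are read from Table 4; the short action labels, the issue keys and the helper names are assumptions introduced for this sketch.

```python
from collections import defaultdict


def rpn(s, o, d):
    """Risk Priority Number: RPN = S * O * D."""
    return s * o * d


# (issue, current (S, O, D), action label, expected (S, O, D), F, reported ratio)
table4 = [
    ("BB-2", (5, 10, 10), "1", (5, 3, 10), 1, 350),
    ("BB-2", (5, 10, 10), "2", (5, 2, 10), 5, 80),
    ("BB-3", (8, 7, 10), "1", (8, 2, 10), 9, 44),
    ("BB-3", (8, 7, 10), "2", (8, 3, 10), 4, 80),
    ("BB-3", (8, 7, 10), "3", (8, 1, 10), 8, 60),
    ("BB-5", (5, 10, 10), "1+2", (5, 1, 10), 3, 150),
    ("BB-5", (5, 10, 10), "3", (5, 10, 4), 2, 150),
    ("BB-5", (5, 10, 10), "1+2+3", (5, 1, 4), 3, 160),
    ("RF-2 parametric errors", (7, 5, 9), "test 100%", (7, 5, 1), 3, 93),
    ("RF-2 PA damage", (8, 5, 10), "PA ruggedness", (8, 1, 10), 2, 160),
]

by_issue = defaultdict(list)
mismatches = []
for issue, current, label, expected, f, reported in table4:
    ratio = (rpn(*current) - rpn(*expected)) / f
    if round(ratio) != reported:
        mismatches.append((issue, label, ratio, reported))
    by_issue[issue].append((ratio, label))

# Largest ratio wins ('The Larger-the-Better').
best = {issue: max(options)[1] for issue, options in by_issue.items()}

print("Mismatches:", mismatches)
for issue, label in best.items():
    print(issue, "-> preferred action:", label)
```

With the values as reconstructed here, every recomputed ratio rounds to the figure printed in the table.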


From 1 to 3 alternatives for every issue have been revealed during discussion of the corrective actions plan by the FMEA team. Prioritization of the proposed alternatives has been performed using the software supporting the suggested procedure of EFMEA [6]. Every corrective action has been evaluated taking into account both the expected RPN value after this action implementation and the feasibility of the action implementation under the existing constraints of cost, resources, time, as well as quality & reliability requirements. Comparative analysis of adequacy of the alternatives has been performed using the proposed ‘standardized-improvement’ criterion (∆RPN/F). Post-FMEA activity (design and testing improvement) was based on the chosen preferable corrective actions (see Table 4).

REFERENCES

1. ‘Potential Failure Mode and Effects Analysis (FMEA)’, QS-9000 Reference Manual, Chrysler Corporation, Ford Motor Company, General Motors Corporation, 1995.
2. ‘Failure Mode and Effect Analysis’, 6σ Green Belt Workshop, AlliedSignal Inc., 1998.
3. James McBride & Edward Schleicher, ‘Failure Mode(s) and Effect(s) Analysis (FMEA)’, McBride, Schleicher, & Associates, 2000.
4. B. Hartman, ‘Implementing Quality Improvement’, The Juran Report, #2, November 1983.
5. Z. Bluvband, ‘Quality Greatest Hits: Classic Wisdom from the Leaders of Quality’, ASQ Quality Press, Milwaukee, Wisconsin, 2002.
6. PFMEA, RAM Commander RAMC 7.2 New Features, ALD WEB Site, www.aldservice.com

BIOGRAPHY

Zigmund Bluvband, Ph.D.
A.L.D. Ltd.
5 Menachem Begin Blvd.
Beit-Dagan, Israel 50200
Internet (e-mail): zigmund@ald.co.il
Zigmund Bluvband is President of Advanced Logistics Developments Ltd. His Ph.D. (1974) is in Operation Research. He is a Fellow of ASQ and ASQ – Certified Quality & Reliability Engineer, Quality Manager and Six Sigma Black Belt. Z. Bluvband has accrued 30 years of industrial and academic experience. Z. Bluvband was the President of the Israel Society of Quality from 1989 to 1994. He has published more than 60 books, papers and tutorials.

Pavel Grabov, Ph.D.
A.L.D. Ltd.
5 Menachem Begin Blvd.
Beit-Dagan, Israel 50200
Internet (e-mail): grabov@ald.co.il
Pavel Grabov is VP and CTO of ALD Ltd. His Ph.D. (1978) is in Nuclear Physics. He is a member of ASQ and ASQ – Certified Quality Engineer and Six Sigma Black Belt. His area of expertise is quantitative methods of quality & reliability engineering. He is a senior lecturer at the Technion (Israel Institute of Technology). He has over 25 years of academic and industrial experience.

Oren Nakar, M.Sc.
MOTOROLA Israel Ltd.
3 Krementski St.
Tel-Aviv 67899
ISRAEL
Internet (e-mail): Oren.Nakar@motorola.com
Oren Nakar is the reliability manager of the Motorola Design Center in Israel. He holds a Master of Science (1997) in Quality & Reliability Engineering from the Technion. He is a member of ASQ and ASQ – Certified Reliability Engineer. Since 2002 he has been responsible for Six Sigma activities in the Motorola Design Center. His areas of interest are: Reliability Analysis and Prediction, FRACAS activity and Experimental Design.
