Failure Mode and Effect Analysis (FMEA) Training

Roy Gill, MD
Director, Requirements


January 25, 2021

Confidential–For Use By Authorized Employees Only. Do Not Distribute. 1


Goals of This Course
• To understand the foundations and implementation of FMEA
- Discuss background and purpose of FMEA
- Discuss definitions and processes regarding FMEA

Agenda

FMEA Background

What is at the heart of an FMEA?

Full FMEA Process

Practice

Recording results

Questions

Background

FMEA Overview – What is FMEA

• FMEA (Failure Mode and Effect Analysis) is a systematic methodology to identify, analyze, prioritize, prevent, mitigate, or control risks before they occur.

• FMEA is carried out on new features of a product before the feature is released to the market.

Importance of Proactive Risk Analysis
• EHR software is an inherently risky product because of its complexity, its direct effects on the healthcare of patients and populations, and its impact on the businesses that administer healthcare.

• This risk demands a Proactive Risk Assessment (PRA) during refinement in order to minimize and mitigate design and development flaws that, if not prevented as early in the process as possible, could be detrimental to our patients and clients.

• FMEA is an industry-standard PRA method, widely used in both healthcare and software development, that has been shown to prevent serious problems from being introduced into released software. Early detection of such issues prevents rework (and expense), and safer software improves outcomes for our patients and clients while raising our profile as a high-quality organization.

FMEA Overview – Why FMEA
• NASA (National Aeronautics and Space Administration) was one of the early adopters of FMEA, in the 1960s. The main focus of FMEA was to improve safety, prevent safety issues, and improve the quality of products.
• FMEA is a versatile methodology that can be applied to any product line, irrespective of the product’s intended use. FMEA therefore provides a standardized approach to identifying critical risks that can be used across the board.

NextGen Healthcare’s goal in using FMEA is to prevent critical defects and improve client satisfaction.

FMEA Overview – Why FMEA
• In addition to standardization, FMEA helps prevent defects, which in turn enhances safety and ultimately increases customer satisfaction.
• FMEA has also proven to be a cost-effective methodology: it identifies improvements early in the development process, when changes are relatively easy and inexpensive, resulting in a more robust product.
• Any product may contain failures in its design or process over its lifetime, and finding and correcting them is what FMEA is for. Nothing is 100% perfect, so there is always room for FMEA.

Big Picture
• Identify Failure Modes
• Determine Severity, Occurrence, and Detectability (SOD) scores
- Calculate Risk Priority Number (RPN)
• Mitigation and Control
• Test plans and closing the loop

What is at the Heart of FMEA?
Failure Modes, SOD and RPN

Identify Failure Modes

• Main pitfall: failure to identify a Failure Mode
- Due to an insufficiently comprehensive brainstorming session

• Failure: “The inability of an item, product, or service to perform required functions on demand due to one or more defects”

What Can Go Wrong?
• The intended function is not performed
• The intended function is performed, but there is some safety problem or a problem in meeting a regulation
associated with the intended function performance
- The intended function is performed, but at the wrong time (availability problem)
- The intended function is performed, but in the wrong place
- The intended function is performed, but in the wrong way
- The intended function is performed, but the performance level is lower than planned
- The intended function is performed, but its cost is higher than planned (unscheduled maintenance or repair, higher consumption of required resources)

• An unintended (unplanned) or undesirable function is performed
• Performing the intended function for the required period is impossible or problematic
• Support for intended function performance is impossible or problematic (maintenance, repairability, or serviceability problem)

Failure Mode Effect
• Once a failure or hazard is identified, the team identifies the potential effect or impact of each failure on the patient.
- Treat this step as an if-then process: if the failure occurs, then what are the consequences?
- One failure can have multiple effects, and each effect should be clearly noted.

• Effects on patients determine the assignment of risk rankings for each failure. At this point the subject matter expert (SME) plays an important role.

Determine Severity, Occurrence, and Detectability scores
• Severity is the defined degree of seriousness of the effect if the given failure does occur.
• The potential cause of a failure mode can be determined from past history and from the knowledge and expertise of the team members. In many cases the team requires technical expertise and a clear understanding of the system architecture to determine the potential cause of a failure mode.
• The potential cause should be clearly noted and understood, since it can help in determining the mitigation activities.

• The team shall assign a Severity score to each Failure Mode/effect per the Severity Rankings table on the next slide.

• The more severe the impact of the failure, the more hazardous it is.
- It is the effect, not the failure, that is rated.

Determine Severity, Occurrence, and Detectability scores
Severity Ranking (Quantitative) | Corresponding Option
0 | No harm
1 | Negligible harm
2 | Minor harm
3 | Major harm
4 | Critical harm
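For worksheet tooling, the severity scale above can be encoded as an ordered enumeration so that out-of-range scores are rejected. The following is a minimal Python sketch; `Severity` and `parse_severity` are hypothetical names, not part of any existing NextGen tool.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity rankings (0-4). Remember: the effect is rated, not the failure."""
    NO_HARM = 0
    NEGLIGIBLE_HARM = 1
    MINOR_HARM = 2
    MAJOR_HARM = 3
    CRITICAL_HARM = 4


def parse_severity(score: int) -> Severity:
    """Convert a raw worksheet score to a Severity, rejecting values off the scale."""
    try:
        return Severity(score)
    except ValueError:
        raise ValueError(f"severity must be 0-4, got {score}") from None


print(parse_severity(3).name)  # MAJOR_HARM
```

Because `IntEnum` members compare as integers, `Severity.CRITICAL_HARM > Severity.MINOR_HARM` holds, and the members can be multiplied directly when computing an RPN.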

Determine Severity, Occurrence, and Detectability scores
• Once the potential cause of failure is identified, the team identifies the occurrence: how often the failure can occur.
• Knowing the potential cause of failure lets the team best estimate the occurrence ranking. Therefore, once the potential cause has been identified, an occurrence ranking can be assigned even if failure data does not exist.

• The best method to determine the occurrence ranking is to use actual data, e.g. failure logs, failure history, or process capability data. If actual failure data is not available, the team must estimate based on the knowledge and expertise of its members.

The more often the hazard occurs, the more hazardous it is.

Note: It is the failure that is rated, not the effect.

Determine Severity, Occurrence, and Detectability scores
Occurrence Ranking (Quantitative) | Corresponding Option
0 | Hazard Cannot Occur
1 | Theoretical
2 | Remote/Occasional
3 | Probable
4 | Frequent/Always

Determine Severity, Occurrence, and Detectability scores

• This ranking estimates how well the controls can detect the failure before the customer is affected. The detectability ranking is determined by the current controls that may detect the failure or its effect. If there are no current controls, the ability to detect the failure will be low, and therefore the ranking will be high.

• The team shall assign a detectability score to each failure per the Detectability Rankings table, which follows the discussion of current controls.

• The more obvious the hazard, the lower the risk.

Determine Severity, Occurrence, and Detectability scores

• Once the occurrence ranking is identified, the team identifies the current process controls that can mitigate the risk and/or inform the user of the failure.

• Current controls include current procedures, processes, techniques, system functionality, standards, etc. Examples include a system error log where the user can view transmission failures; field validation, where the system prevents the user from entering incorrect values; warning messages; and workarounds.

• Current controls help determine the detectability ranking in the next step.

Determine Severity, Occurrence, and Detectability scores
Detectability Ranking (Quantitative) | Corresponding Option
1 | Obvious
2 | Noticeable
3 | Obscure
4 | Undetectable
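The detectability scale can be encoded the same way. Note that, unlike severity and occurrence, it starts at 1, so detectability can scale the RPN but never reduce it to zero. A minimal sketch; the `Detectability` name is hypothetical.

```python
from enum import IntEnum


class Detectability(IntEnum):
    """Detectability rankings (1-4). A lower ranking means the failure is easier to notice."""
    OBVIOUS = 1
    NOTICEABLE = 2
    OBSCURE = 3
    UNDETECTABLE = 4


# There is no 0 on this scale, so the best (lowest) possible ranking is OBVIOUS.
best = min(Detectability)
print(best.name, int(best))  # OBVIOUS 1
```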

Risk Priority Number (RPN)

• The RPN helps prioritize risks and determine whether mitigation is required for a failure.

• The RPN is the product of the severity, occurrence, and detectability rankings (S * O * D = RPN)
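A minimal Python sketch of the calculation the FMEA worksheet is expected to perform automatically. The function name and the range checks are assumptions drawn from the ranking tables in this training, not the worksheet's actual formula.

```python
def rpn(severity: int, occurrence: int, detectability: int) -> int:
    """Risk Priority Number: RPN = S * O * D.

    Ranges follow the ranking tables in this training:
    severity 0-4, occurrence 0-4, detectability 1-4.
    """
    if not 0 <= severity <= 4:
        raise ValueError(f"severity out of range: {severity}")
    if not 0 <= occurrence <= 4:
        raise ValueError(f"occurrence out of range: {occurrence}")
    if not 1 <= detectability <= 4:
        raise ValueError(f"detectability out of range: {detectability}")
    return severity * occurrence * detectability


print(rpn(3, 2, 3))  # 18
print(rpn(4, 4, 4))  # 64, the highest possible RPN
```

With these scales the RPN ranges from 0 to 64, and a score of 0 for either severity or occurrence yields an RPN of 0.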

Risk Priority Number Categorization
Risk Category | RPN Level | Acceptable? | Description
Minor Risk (Green) | 1 through 11 | Acceptable / Yes | Risk which is small; no further mitigation is required. The rationale shall be documented.
Moderate Risk I (Aqua) | 12 through 17 | Tolerable / Yes | Risk is tolerable, but actions can be taken to control the risk, if deemed important by the team.
Moderate Risk II (Yellow) | 18 through 27 | Undesirable / Risk Benefit Analysis | Risk mitigation shall be required to reduce the risk as low as reasonably possible. If the residual risk is still Moderate II after mitigation, the risk may be tolerated with a risk benefit analysis documented and approved.
Major Risk (Red) | 28 and above | Intolerable / No | The risk is intolerable; no risk benefit analysis is acceptable. Risk shall be mitigated, at a minimum, to Moderate Risk I or II and then handled as described above.
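The categorization table can be expressed as a simple threshold lookup. This sketch assumes integer RPNs; because the table starts at 1, treating an RPN of 0 as Minor Risk is an assumption made here, not a rule from the table. The function name is hypothetical.

```python
def risk_category(rpn: int) -> str:
    """Map an RPN to its risk category per the categorization table."""
    if rpn < 0:
        raise ValueError(f"RPN cannot be negative: {rpn}")
    # Assumption: the table starts at 1, so RPN 0 (S or O scored 0) is treated as Minor.
    if rpn <= 11:
        return "Minor Risk"        # Green: acceptable, rationale documented
    if rpn <= 17:
        return "Moderate Risk I"   # Aqua: tolerable
    if rpn <= 27:
        return "Moderate Risk II"  # Yellow: mitigate; risk benefit analysis if still Moderate II
    return "Major Risk"            # Red: intolerable; mitigate to Moderate I or II


for value in (8, 12, 18, 36):
    print(value, risk_category(value))
```

Because S, O, and D are small integers, only certain RPN values can occur: 12 and 16 are the only possible values in the Moderate Risk I band, and 18, 24, and 27 the only ones in the Moderate Risk II band.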

Mitigation and Control
• This step is performed during FMEA sessions.
• Once the team has identified a risk that requires mitigation, risk mitigation actions shall be identified in this step.
• Risk mitigation actions, taken by the team or by other members of the organization, need to bring the risk to an acceptable level.
• Mitigations can be organized problem-solving processes. Ideally, the failure should be eliminated completely; however, this may not be achievable in all cases.
• Often the easiest approach to mitigating a risk is to increase the detectability of the failure, thus lowering the detectability ranking.
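To illustrate how a detectability improvement lowers the RPN, the sketch below scores a hypothetical failure mode (a transmission failure that currently fails silently) before and after adding a warning message. The scores are invented for illustration.

```python
def rpn(severity: int, occurrence: int, detectability: int) -> int:
    """RPN = S * O * D (range checks omitted for brevity)."""
    return severity * occurrence * detectability


# Hypothetical failure mode: a transmission failure that currently fails silently.
severity, occurrence = 3, 3
before = rpn(severity, occurrence, detectability=3)  # Obscure
# Mitigation: show a warning message on failure, so detection becomes Obvious.
after = rpn(severity, occurrence, detectability=1)

print(before, after)  # 27 9
```

The RPN drops from 27 (Moderate Risk II) to 9 (Minor Risk) while severity and occurrence are unchanged: the failure still happens, but the user now notices it.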

I would rather have questions that
can't be answered than answers
that can't be questioned.
Richard Feynman

The wise man doesn't give the
right answers, he poses the right
questions.
Claude Lévi-Strauss

Full FMEA Process

Scope: What, When
• All new features refined and developed in the NextGen suite of products
require PRA/FMEA
• Because the goal is prevention, FMEA should be performed as early as is
reasonable in the SDLC, and will commonly be performed more than once:
- The actual milestones that trigger the FMEA will depend on the general risk and complexity
of the feature and should be discussed and planned by the team. For example:
• After story map is complete and/or when design is complete
• Once there is “code” to demonstrate and test
• Confirm that previously identified risks have been mitigated
• Check for any new risks

Scope: Who, Why
• Refinement team+ (NB: all members may provide input on all matters)
• Requirements manager
• Run the meeting
• Review and explain the feature and plan to date

• Software developer (Architect when the team feels it is needed)
• Identify potential software failure modes
• Identify likelihood of software failures

• Solutions manager
• Identify Severity in most cases
• Identify likelihood of process and workflow failures
• Identify Detectability from a user perspective

• Quality Assurance
• Provide general expertise in the likelihood of failures and Detectability
• Provide test cases to assure that the failure mode risks are mitigated

• Subject matter expert (may be the Solutions manager, Clinical Center of Excellence or
Business Center of Excellence team member or other “guest” as needed)
• Similar to Solutions manager role
FMEA Activities: Preparation
1. Early in refinement, the Requirements manager should address the topic of FMEA timing
with the team
1. Assess which milestones should trigger FMEA meeting(s)
2. Schedule meetings as soon as milestone dates are known

2. Prior to the FMEA meeting, the Requirements manager should prepare to present the functionality in its most advanced state using the available artifacts (this may include a story map, wireframe, working software, etc.).

3. The Requirements manager should prepare the FMEA Worksheet spreadsheet with the
details needed for the meeting
1. The Requirements manager may pre-fill any failure modes they can think of prior to
the meeting if time allows

FMEA Activities: Failure Mode Identification and Scoring
1. Once the team understands the new feature, the team shall determine ‘How can the product fail or not work as expected?’ The answers to this question produce the list of failures/hazards.
• A good starting point is to work through a list of common hazard categories. Several hazard categories, with examples, are described on the Hazard Categories slides.

2. The team shall assign S, O, and D scores for each failure mode (the RPN should auto-calculate).

3. User Acceptance Criteria that state how the system “should” function should be discussed and recorded.

4. It is helpful to discuss Mitigations during the meeting, but these discussions commonly divert the team from its main mission of identifying and evaluating the failure modes.
• If time allows, Mitigations may be discussed and recorded, but not at the expense of Failure Mode identification.
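One way to picture a worksheet row, with the RPN recalculated from the S, O, and D columns, is the Python dataclass below. The field names, the example failure mode (borrowed from the Transport Error hazard category), and its scores are hypothetical; the real FMEA Worksheet spreadsheet may use different columns.

```python
from dataclasses import dataclass, field


@dataclass
class FailureModeRow:
    """One row of a hypothetical FMEA worksheet; field names are illustrative."""
    failure_mode: str
    effect: str
    hazard_category: str
    severity: int       # 0-4, rates the effect
    occurrence: int     # 0-4, rates the failure
    detectability: int  # 1-4, rates how well current controls detect it
    acceptance_criteria: str = ""
    mitigations: list[str] = field(default_factory=list)

    @property
    def rpn(self) -> int:
        # Recomputed on every access, like an auto-calculating spreadsheet cell.
        return self.severity * self.occurrence * self.detectability


row = FailureModeRow(
    failure_mode="eRx transaction not delivered",
    effect="Patient does not receive prescribed medication",
    hazard_category="Transport Error",
    severity=3,
    occurrence=2,
    detectability=4,
    acceptance_criteria="Provider is notified when an eRx transaction fails",
)
print(row.rpn)  # 24
```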

FMEA Activities: Mitigations
1. Once the team has identified a risk that requires mitigation, risk mitigation actions shall be identified in this step.
2. Risk Mitigations are actions to be taken, by the team or other members of the organization, to bring the risk to an acceptable level.
3. Mitigations can be organized problem-solving processes. Ideally, the failure should be eliminated completely; however, this may not be achievable in all cases.
4. Often the easiest approach to mitigating a risk is to increase the detectability of the failure, thus lowering the detectability ranking. Examples of ways to increase detectability follow; they may not be the best approach, but they can be acceptable:
1. Warning messages, error logs, etc.
2. Field validation, where the system prevents the user from entering incorrect information
3. Workarounds in the system to get to the same information
4. User notifications
5. Changing wireframes or screens to make the failure more visible

FMEA Activities: Mitigations

1. The richest opportunity for improvement lies in reducing the likelihood that the failure occurs. After all, if the failure is highly unlikely to occur, there is less need for detection measures.

2. The team should make certain that the mitigations are not merely a ‘Band-Aid’, since ‘Band-Aids’ are often costly and do not actually improve the quality of the product.
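The ranking scales themselves support this point. Detectability bottoms out at 1, so detection measures alone can never bring the RPN below S × O, whereas an occurrence ranking of 0 (Hazard Cannot Occur) removes the risk entirely. The starting scores below are hypothetical.

```python
def rpn(severity: int, occurrence: int, detectability: int) -> int:
    """RPN = S * O * D (range checks omitted for brevity)."""
    return severity * occurrence * detectability


severity, occurrence, detectability = 4, 3, 3  # hypothetical starting scores (RPN 36)

# Best case for a detection-only mitigation: detectability drops to its floor, 1.
detection_only = rpn(severity, occurrence, 1)
# Reducing occurrence instead: to Theoretical (1), or eliminating the hazard (0).
rarer = rpn(severity, 1, detectability)
eliminated = rpn(severity, 0, detectability)

print(detection_only, rarer, eliminated)  # 12 12 0
```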

FMEA Activities: Recording Results

• This slide will outline process steps specific to each product, including:
• Where will artifacts (such as the FMEA worksheet) be stored
• What Jira steps will be involved to assure FMEA is done and the results are logged and traceable

Proposed draft process:

• A spreadsheet will be created for each feature and attached to the Epic in Jira.
• Alternatively, it might be created as a separate Confluence page under RSD.

FMEA Activities: Re-evaluation, Testing

• At least one test case should be created for each failure mode (as informed by the user acceptance criteria). This closes the loop and provides tangible evidence that undesirable risk has not entered the system.

• Follow-up FMEA as needed
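Checking that every failure mode has at least one linked test case is a simple set comparison. In this sketch the failure-mode and test-case IDs, and the idea of holding the links in a dictionary, are hypothetical; in practice the links might live in Jira.

```python
def uncovered_failure_modes(failure_modes: list[str],
                            test_cases: dict[str, list[str]]) -> list[str]:
    """Return failure modes that have no linked test case yet.

    test_cases maps a (hypothetical) test-case ID to the failure modes it verifies.
    """
    covered = {mode for modes in test_cases.values() for mode in modes}
    return [mode for mode in failure_modes if mode not in covered]


modes = ["FM-1 eRx not delivered", "FM-2 wrong patient context", "FM-3 truncated comments"]
tests = {"TC-101": ["FM-1 eRx not delivered"], "TC-102": ["FM-3 truncated comments"]}
print(uncovered_failure_modes(modes, tests))  # ['FM-2 wrong patient context']
```

An empty result means the loop is closed for every failure mode recorded in the worksheet.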

Hazard Categories
Category | Description / Example
Calculation Error | Incorrect system calculation due to an incorrect formula in logic/code. E.g., showing incorrect slots or distances (miles vs. km).
Display Error | Incorrect data display that may cause a Potential Patient Safety Issue. E.g., 1) a physician approves lab results with comments, but a subsequent view of these comments in the Orders module is truncated; 2) data retrieval results in an incorrect display.
Implementation Error | Incorrect feature implementation or product configuration resulting in a Potential Patient Safety Issue. E.g., creation of templates displays an incorrect “date of last” field in health maintenance.
Input Error | Incorrect input by the user resulting in a Potential Patient Safety Issue. E.g., the UI allows data entry of free text as opposed to only numeric values.
Installation Error | Incorrect installation on a user’s computer that may result in a Patient Safety Issue. E.g., 1) a mismatch in setting metric vs. standard units of measure can lead to unexpected results; 2) incorrect client network installation or configuration.
Labeling Hazard | Incorrect output that provides clinically incorrect instructions and may result in a Potential Patient Safety Issue. E.g., documentation providing incorrect instructions.
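Field validation, listed as both a current control and a mitigation in this training, addresses the Input Error example above (free text accepted where only numbers belong). A minimal Python sketch; the function name and the 90 to 110 bounds in the usage lines are hypothetical.

```python
def parse_numeric_field(raw: str, low: float, high: float) -> float:
    """Accept only a number within [low, high]; reject free text.

    Bounds are supplied by the caller; the values used below are hypothetical.
    """
    try:
        value = float(raw.strip())
    except ValueError:
        raise ValueError(f"not a number: {raw!r}") from None
    # NaN fails every comparison, so it is rejected here as well.
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return value


print(parse_numeric_field("98.6", 90.0, 110.0))  # 98.6
try:
    parse_numeric_field("feels warm", 90.0, 110.0)
except ValueError as err:
    print(err)  # not a number: 'feels warm'
```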

Hazard Categories

Category | Description / Example
Patient Context Error | Patient information is mismatched, which may result in a Potential Patient Safety Issue. E.g., Patient A’s medications are displayed within the Med Module for Patient B.
Storage Error | Incorrect data storage that may result in a Potential Patient Safety Issue. E.g., a partially executed database operation is not transactional and does not roll back on failure, leaving data in a partial (corrupted) state.
Transport Error | Incorrect data transport from one system to another that may result in a Potential Patient Safety Issue. E.g., eRx transactions are not delivered and the provider is not notified.
Usability Error | Inappropriate display of fields or data. E.g., 1) field names overlap; 2) the “Save” button placement changes on every screen; 3) units are displayed far from the field or data.
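The Storage Error example describes a database operation that is not transactional and does not roll back on failure. As an illustration only, not the product's actual data layer, this sketch uses Python's standard-library `sqlite3` module with a hypothetical `meds` table: the connection's context manager commits if every insert succeeds and rolls back all of them if any fails.

```python
import sqlite3


def insert_all_or_nothing(conn: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Insert every row in one transaction; on any error, roll them all back."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back the open transaction if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO meds (patient_id, drug) VALUES (?, ?)", rows)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meds (patient_id INTEGER, drug TEXT NOT NULL)")

# The second row violates NOT NULL, so the first row must not be left behind.
ok = insert_all_or_nothing(conn, [(1, "lisinopril"), (1, None)])
count = conn.execute("SELECT COUNT(*) FROM meds").fetchone()[0]
print(ok, count)  # False 0
```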

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
Rudyard Kipling

Examples

Determine Severity, Occurrence, and Detectability scores
Severity Ranking (Quantitative) | Corresponding Option | Description / Example
0 | No harm | No impact to patient care. Example: an un-generated billing charge.
1 | Negligible harm | Inconvenience or temporary discomfort. Examples: abrasion, temporary mild fever.
2 | Minor harm | May result in temporary injury or impairment. Examples: reversible kidney failure, hives.
3 | Major harm | May result in permanent injury or impairment (failure leading to incorrect diagnosis or treatment). Examples: chronic kidney failure, congestive heart failure, loss of limb.
4 | Critical harm | May result in death or a life-threatening event (failure leading to incorrect procedural treatment or death). Examples: anaphylactic shock, ventricular arrhythmia, pulmonary embolism.

* Training note: These examples are clinical in nature and can be adjusted per the trainee situation

Determine Severity, Occurrence, and Detectability scores
Occurrence Ranking (Quantitative) | Corresponding Option | Description / Example
0 | Hazard Cannot Occur | The hazardous situation can never occur.
1 | Theoretical | The hazardous situation can only occur if a negligible set of independent events happen simultaneously or in rapid succession. The practical expectation is that rating-1 events never occur, but they are theoretically possible.
2 | Remote/Occasional | The hazardous situation may occur from time to time but would require the coincidence of unlikely events. Example: network failure impacting data storage.
3 | Probable | The hazardous situation will likely occur at predictable frequencies over the life of the product. Example: data backup, impacting patient context information.
4 | Frequent/Always | The hazardous situation will often or always occur over the life of the product. If frequency cannot be estimated, assume this value. Example: when the user tries to select the “Clear Consciousness” checkbox, the system automatically selects and saves the “Clouded Consciousness” option.
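The occurrence scale, including the rule to assume Frequent/Always when frequency cannot be estimated, can be sketched as follows. `Occurrence` and `occurrence_or_default` are hypothetical names; passing `None` stands for “no estimate available”.

```python
from enum import IntEnum
from typing import Optional


class Occurrence(IntEnum):
    """Occurrence rankings (0-4). The failure is rated, not the effect."""
    HAZARD_CANNOT_OCCUR = 0
    THEORETICAL = 1
    REMOTE_OCCASIONAL = 2
    PROBABLE = 3
    FREQUENT_ALWAYS = 4


def occurrence_or_default(estimate: Optional[int]) -> Occurrence:
    """Use the team's estimate if there is one; otherwise assume Frequent/Always,
    as the table instructs when frequency cannot be estimated."""
    if estimate is None:
        return Occurrence.FREQUENT_ALWAYS
    return Occurrence(estimate)  # raises ValueError for scores off the 0-4 scale


print(occurrence_or_default(2).name)     # REMOTE_OCCASIONAL
print(occurrence_or_default(None).name)  # FREQUENT_ALWAYS
```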

FMEA: Activities - Phase A: Step 8: Detectability Rankings
Detectability Ranking (Quantitative) | Corresponding Option | Description / Example
1 | Obvious | The presence of the hazardous situation is evident even to untrained users of the system. Example: the user receives a fatal error, so he/she clearly knows the function didn’t work; however, this could delay patient care.
2 | Noticeable | The hazardous situation can be detected easily by the intended user of the system. Example: data display errors.
3 | Obscure | The hazardous situation is difficult to detect, even for the intended users of the system in normal operating situations. Examples: hazards due to different versions of operating systems; transmission errors to a different system.
4 | Undetectable | The hazardous situation cannot be detected. Example: a hazard due to external data import, which cannot be detected until the system goes into production.

There are no foolish questions and
no man becomes a fool until he
has stopped asking questions.
Charles Proteus Steinmetz

NOTICE: This document contains information that is confidential and proprietary to NextGen Healthcare, Inc. and its subsidiaries and
affiliates (“Company”) and is intended for use solely by its authorized employees. This document may not be copied, reproduced,
published, displayed, otherwise used, transmitted, or distributed in any form by any means as a whole or in any part, nor may any of
the information it contains be used or stored in any information retrieval system or media, translated into another language, or
otherwise made available or used by anyone other than the authorized employee to whom this document was originally delivered
without the prior, written consent of Company.

By retaining or using this document, you represent that you are a Company employee who is authorized to use this document and
that you will use this document and the information it contains solely as and to the extent permitted by the Company. Any other use or
distribution of the contents of this document, as a whole or in any part, is prohibited.

Although we exercised great care in creating this publication, Company assumes no responsibility for errors or omissions that may
appear in this publication and reserves the right to change this publication at any time without notice.

© 2018 QSI Management, LLC. All Rights Reserved.

The registered trademarks listed at www.nextgen.com/legal-notice are the registered trademarks of QSI Management, LLC.
All other names and marks are the property of their respective owners.

Our issued and published patents can be found at www.nextgen.com/legal-notice.
