HL7 Fhir

Failure Mode and Effect Analysis (FMEA) Training
Roy Gill, MD Director, Requirements

January 25, 2021
Confidential–For Use By Authorized Employees Only. Do Not Distribute.

Goals Of This Course
• To understand the foundations and implementation of FMEA
- Discuss background and purpose of FMEA
- Discuss definitions and processes regarding FMEA

Agenda
FMEA Background
What is at the heart of an FMEA?
Full FMEA Process
Practice
Recording results
Questions

Background

FMEA Overview – What is FMEA
• FMEA (Failure Mode and Effect Analysis) is a systematic methodology to identify, analyze, prioritize,
prevent, mitigate, or control risks before they occur.
• FMEA is carried out on new features of a product before the feature is released to the market.

Importance of Proactive Risk Analysis
• EHR software is an inherently risky product because of its complexity and its direct effects on the healthcare of
patients and populations, as well as its impact on the businesses that administer healthcare.
• This risk demands a Proactive Risk Assessment (PRA) during refinement to minimize and mitigate design and
development flaws that, if not prevented as early in the process as possible, could be detrimental to our
patients and clients.
• FMEA is an industry-standard PRA method, widespread in both healthcare and software development, that has
been demonstrated to prevent serious problems from being introduced into released software. Early detection
of such issues prevents re-work (and expense); safer software will improve the health of our patients and
clients while raising our profile as a high-quality organization.

FMEA Overview – Why FMEA
• NASA (National Aeronautics and Space Administration) was one of the
first organizations to conduct FMEA, in the 1960s. The main focus of FMEA
was to improve safety, prevent safety issues, and improve the quality of products.
• FMEA is a versatile methodology that can be implemented on any product line,
irrespective of the product’s intended use. FMEA therefore gives a standardized
approach to identifying critical risks that can be used across the board.
The goal of NextGen Healthcare in using FMEA is to prevent critical
defects and improve client satisfaction.

FMEA Overview – Why FMEA
• In addition to standardization, FMEA helps prevent defects,
which in turn enhances safety and eventually increases customer
satisfaction.
• FMEA has also proven to be a cost-effective methodology: it helps
identify improvements early in the development process, when
changes are relatively easy and inexpensive, resulting in a more
robust product.
• FMEA is called for whenever we assume or understand that failures can
exist in the design or process of a product during its lifetime and that
these can be improved. There is no such thing as 100% perfect,
so there is always room for FMEA.

Big Picture
• Identify Failure Modes
• Determine Severity, Occurrence, and Detectability (SOD) scores
- Calculate Risk Priority Number (RPN)
• Mitigation and Control
• Test plans and closing the loop

What is at the Heart of FMEA?
Failure Modes, SOD and RPN

Identify Failure Modes
• Main pitfall: Failure to identify a Failure Mode
- Due to an insufficiently comprehensive brainstorming session
• Failure: “The inability of an item, product, or service to perform required
functions on demand due to one or more defects”

What Can Go Wrong?
• The intended function is not performed
• The intended function is performed, but there is some safety problem or a problem in meeting a regulation
associated with the intended function performance
- The intended function is performed, but at a wrong time (availability problem)
- The intended function is performed, but at a wrong place
- The intended function is performed, but in a wrong way
- The intended function is performed, but the performance level is lower than planned
- The intended function is performed, but its cost is higher than planned (unscheduled maintenance or repair,
higher consumption of required resources)
• An unintended (unplanned) or undesirable function is performed

• Period of intended function performance is impossible or problematic
• Support for intended function performance is impossible or problematic (maintenance, repairability,
serviceability problem)

Failure Mode EFFECT
• Once a failure or hazard is identified, the team identifies the potential effect or impact of each
failure on the patient.
- Consider this step as an if-then process: if the failure occurs, then what are the
consequences?
- There can be multiple effects for one failure, and these effects should be clearly noted.
• Effects on patients will determine the assignment of Risk rankings for each failure. At this point
the subject matter expert (SME) will play an important role.

Determine Severity, Occurrence, and Detectability scores
• Severity is the defined degree of seriousness of the effect if the given failure does
occur.
• The potential cause of a failure mode can be determined from past history and from the knowledge and
expertise of the team members. In many cases the team requires technical expertise and a clear
understanding of the system architecture to determine the potential cause of a failure mode.
• The potential cause should be clearly noted and understood, since it can help in determining the
mitigation activities.
• The team shall assign a Severity score to each Failure Mode/effect per the Severity Rankings
table on the next slide.
• The more severe the impact of the failure, the more hazardous it is.
- It is the effect, not the failure, that is rated.

Determine Severity, Occurrence, and Detectability scores
Severity Ranking (Quantitative)    Corresponding Option
0                                  No harm
1                                  Negligible harm
2                                  Minor harm
3                                  Major harm
4                                  Critical harm

Determine Severity, Occurrence, and Detectability scores
• Once the potential cause of failure is identified, the team identifies the occurrence: the frequency with
which the failure can occur.
• The team can best estimate the occurrence ranking by knowing the potential cause of failure. Once the
potential cause has been identified, an occurrence ranking can be assigned even if failure data does not exist.
• The best method to determine the occurrence ranking is to use actual data, e.g. failure logs, history of failure
or even process capability data. If actual failure data is not available, the team must estimate based on
the knowledge and expertise of the team members.
The more frequent the occurrence of the hazard, the more hazardous it is.

Note: It is the failure that is rated, not the effect.

Determine Severity, Occurrence, and Detectability scores
Occurrence Ranking (Quantitative)    Corresponding Option
0                                    Hazard Cannot Occur
1                                    Theoretical
2                                    Remote/Occasional
3                                    Probable
4                                    Frequent/Always

• The detectability ranking estimates how well the controls can detect the failure before the
customer is affected. It is determined by the current controls that may detect the failure or the
effect of the failure. If there are no current controls, the ability to detect the failure will be
low and therefore the ranking will be high.
• The team shall assign a detectability score to each failure per the Detectability Rankings table
on the next slide.
• The more obvious the hazard, the lower the risk.

• Once the occurrence ranking is identified, the team identifies the current process
controls that can mitigate risk and/or inform the user of the failure.
• Current Controls can be defined as current procedures, processes, techniques,
current system functionality and standards, etc. Examples include the system error log,
where the user is able to view a failure in transmission; field validation,
where the system prevents the user from entering incorrect values; and warning
messages and workarounds.
• Current controls can help in determining the detectability ranking in the next step.

Detectability Ranking (Quantitative)    Corresponding Option
1                                       Obvious
2                                       Noticeable
3                                       Obscure
4                                       Undetectable

Risk Priority Number (RPN)
• The RPN helps in prioritizing the Risks and in deciding whether mitigation is
required for the failure.
• The RPN is the product of the severity, occurrence, and
detectability rankings (S * O * D = RPN).
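
The multiplication can be sketched in a few lines of Python. This is only an illustration: the function name `risk_priority_number` is hypothetical, and the range checks assume the ranking tables in this deck (Severity 0 to 4, Occurrence 0 to 4, Detectability 1 to 4).

```python
def risk_priority_number(severity: int, occurrence: int, detectability: int) -> int:
    """Return S * O * D after checking each score against its ranking table."""
    # Assumed ranges, taken from the ranking tables in this deck.
    if not 0 <= severity <= 4:
        raise ValueError("severity must be between 0 and 4")
    if not 0 <= occurrence <= 4:
        raise ValueError("occurrence must be between 0 and 4")
    if not 1 <= detectability <= 4:
        raise ValueError("detectability must be between 1 and 4")
    return severity * occurrence * detectability


# Illustrative scores only: Major harm (3), Remote/Occasional (2), Undetectable (4).
print(risk_priority_number(3, 2, 4))  # 3 * 2 * 4 = 24
```

With these ranges the largest possible RPN is 4 * 4 * 4 = 64, and a Severity or Occurrence of 0 always gives an RPN of 0.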

Risk Priority Number Categorization
Risk Categories, RPN, Level / (Acceptable?)
• Minor Risk (Green): RPN 1 through 11. Acceptable / Yes.
- Risk which is small; no further mitigation is required. The rationale shall be documented.
• Moderate Risk I (Aqua): RPN 12 through 17. Tolerable / Yes.
- Risk is tolerable, but actions can be taken to control the risk, if deemed important by the team.
• Moderate Risk II (Yellow): RPN 18 through 27. Undesirable / Risk Benefit Analysis.
- Risk Mitigation shall be required to reduce the Risk as low as reasonably possible. If the
Residual Risk is still Moderate II after mitigation, the risk may be tolerated with risk
benefit analysis documented and approved.
• Major Risk (Red): RPN 28 and above. Intolerable / No.
- The Risk is intolerable; no risk benefit analysis is acceptable. Risk shall be mitigated, at a
minimum, to a Moderate Risk I or II and then handled as described above.
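
The RPN bands in this categorization can be expressed as a simple lookup. The sketch below is illustrative: `categorize_rpn` is a hypothetical name, and because the categorization starts at 1, the handling of an RPN of 0 is an assumption rather than something the slide defines.

```python
def categorize_rpn(rpn: int) -> str:
    """Map an RPN to the risk category named in the categorization slide."""
    if rpn < 0:
        raise ValueError("RPN cannot be negative")
    if rpn == 0:
        # Assumption: the categorization starts at 1, so 0 is reported separately.
        return "Not categorized"
    if rpn <= 11:
        return "Minor Risk"
    if rpn <= 17:
        return "Moderate Risk I"
    if rpn <= 27:
        return "Moderate Risk II"
    return "Major Risk"


for rpn in (8, 12, 24, 36):
    print(rpn, categorize_rpn(rpn))
```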

Mitigation and Control
• This step is performed during FMEA sessions.
• Once the team has identified a Risk that requires mitigation, Risk
Mitigation actions shall be identified in this step.
• Risk Mitigations are actions to be taken, by the team or other members in the
organization, to bring the Risk to an acceptable level.
• Mitigations can be organized problem-solving processes. Ideally, failure should
be eliminated completely; however, this may not be achievable in all cases.
• Often the easiest approach to mitigating Risks is to increase the detectability
of the failure, thus lowering the detectability ranking.

“I would rather have questions that can't be answered than answers
that can't be questioned.”
Richard Feynman

“The wise man doesn't give the right answers, he poses the right
questions.”
Claude Levi-Strauss

Full FMEA Process

Scope: What, When
• All new features refined and developed in the NextGen suite of products
require PRA/FMEA
• Because the goal is prevention, FMEA should be performed as early as is
reasonable in the SDLC, and will commonly be performed more than once:
- The actual milestones that trigger the FMEA will depend on the general risk and complexity
of the feature and should be discussed and planned by the team. For example:
  • After story map is complete and/or when design is complete
  • Once there is “code” to demonstrate and test
    - Confirm that previously identified risks have been mitigated
    - Check for any new risks

Scope: Who, Why
• Refinement team+ [NB: all members may provide inputs in all matters]
• Requirements manager
- Run the meeting
- Review and explain the feature and plan to date
• Software developer (Architect when felt needed by the team)
- Identify potential software failure modes
- Identify likelihood of software failures
• Solutions manager
- Identify Severity in most cases
- Identify likelihood of process and workflow failures
- Identify Detectability from a user perspective
• Quality Assurance
- Provide general expertise in the likelihood of failures and Detectability
- Provide test cases to assure that the failure mode risks are mitigated
• Subject matter expert (may be the Solutions manager, Clinical Center of Excellence or
Business Center of Excellence team member or other “guest” as needed)
- Similar to Solutions manager role

FMEA Activities: Preparation
1. Early in refinement, the Requirements manager should address the topic of FMEA timing
with the team
   1. Assess which milestones should trigger FMEA meeting(s)
   2. Schedule meetings as soon as milestone dates are known
2. Prior to the FMEA meeting, the Requirements manager should prepare to present the
functionality in its most advanced state using the available artifacts. (This may include a
story map, wireframe, working software, etc.)
3. The Requirements manager should prepare the FMEA Worksheet spreadsheet with the
details needed for the meeting
   1. The Requirements manager may pre-fill any failure modes they can think of prior to
   the meeting if time allows

FMEA Activities: Failure Mode Identification and Scoring
1. Once the team understands the new feature, the team shall determine ‘How can the product fail or not
work as expected?’ The answers to this question produce the list of failures / hazards.
• A good starting point is to identify failures using a list of common hazard categories. Some of
the hazard categories, with examples, are explained in the Hazard Categories slides.
2. The team shall assign S, O and D scores for each failure mode (the RPN should auto-calculate)
3. User Acceptance Criteria that state how the system “should” function should be discussed and recorded
4. It is helpful to discuss Mitigations during the meeting, but these discussions commonly divert from the main
mission of identifying and evaluating the failure modes.
• If time allows, the Mitigations may be discussed and recorded, but not at the expense of
Failure Mode identification
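
As a rough illustration of a worksheet row whose RPN "auto-calculates", the sketch below models one row as a Python dataclass. The class name, field names and example scores are all hypothetical; the actual FMEA Worksheet spreadsheet is not shown in this deck and may be laid out differently.

```python
from dataclasses import dataclass


@dataclass
class FailureModeRow:
    """One hypothetical worksheet row; field names are illustrative."""
    failure_mode: str
    effect: str
    severity: int       # 0-4, per the Severity Rankings table
    occurrence: int     # 0-4, per the Occurrence Rankings table
    detectability: int  # 1-4, per the Detectability Rankings table
    acceptance_criteria: str = ""

    @property
    def rpn(self) -> int:
        # Derived from S, O and D, so it never has to be typed in by hand.
        return self.severity * self.occurrence * self.detectability


# Made-up scores, for illustration only.
rows = [
    FailureModeRow("Field names overlap", "User misreads a value", 1, 2, 1),
    FailureModeRow("eRx transaction not delivered", "Patient does not receive medication", 3, 2, 3),
]

# Highest RPN first, so the riskiest failure modes are reviewed first.
for row in sorted(rows, key=lambda r: r.rpn, reverse=True):
    print(f"{row.rpn:>3}  {row.failure_mode}")
```

Keeping the RPN as a derived value means it cannot drift out of step with the S, O and D scores when the team revises them during the session.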

FMEA Activities: Mitigations
1. Once the team has identified a Risk that requires mitigation, Risk Mitigation actions shall be
identified in this step
2. Risk Mitigations are actions to be taken, by the team or other members in the organization, to bring
the Risk to an acceptable level.
3. Mitigations can be organized problem-solving processes. Ideally, failure should be eliminated
completely; however, this may not be achievable in all cases.
4. Often the easiest approach to mitigating Risks is to increase the detectability of the failure, thus
lowering the detectability ranking. Some examples of ways to increase detectability follow.
These may not be the best approach but can be acceptable:
   1. Warning messages, error logs, etc.
   2. Field validation, where the system prevents the user from entering incorrect information
   3. Workarounds in the system to get to the same information
   4. User notifications
   5. Changes to wireframes or screens to make the failure more visible
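
A short worked example shows how improving detectability moves the RPN. The scores are made up for illustration; the rankings and RPN bands are the ones given in this deck.

```python
# Illustrative scores only; severity and occurrence are unchanged by this mitigation.
severity, occurrence = 3, 2      # Major harm, Remote/Occasional
detectability_before = 4         # Undetectable
detectability_after = 2          # Noticeable, e.g. after adding a warning message

rpn_before = severity * occurrence * detectability_before
rpn_after = severity * occurrence * detectability_after

print(rpn_before, rpn_after)  # 24 12
```

Under the categorization in this deck, 24 falls in Moderate Risk II (18 through 27) and 12 falls in Moderate Risk I (12 through 17), so the mitigation moves the failure mode down one category without changing its severity or occurrence.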

FMEA Activities: Mitigations
1. The richest opportunity for improvement lies in reducing the
likelihood of occurrence of the failure. After all, if it's highly unlikely
that the failure will occur, there will be less need for detection
measures.
2. The team should make certain that the mitigations are not merely
a ‘Band-Aid’, since ‘Band-Aids’ are often costly and do not
actually improve the quality of the product.

FMEA Activities: Recording Results
• This slide will outline process steps specific to each product, including:
• Where will artifacts (such as the FMEA worksheet) be stored
• What Jira steps will be involved to assure FMEA is done and the results are logged and traceable
Draft Process proposed:
• The spreadsheet will be created for each feature and attached to the Epic in Jira.
• It might be created as a separate Confluence page under RSD

FMEA Activities: Re-evaluation, Testing
• At least one test case should be created for each failure mode (as
informed by the user acceptance criteria): this assures that the loop is
closed and provides tangible evidence that undesirable risk has not entered
the system
• Follow up with further FMEA as needed
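
The "at least one test case per failure mode" rule lends itself to a simple automated check. The sketch below is hypothetical: the function name, the identifiers such as `FM-1` and `TC-10`, and the mapping from test case to failure mode are illustrative assumptions, not part of any existing tool or of the Jira process.

```python
def failure_modes_without_tests(failure_modes, test_cases):
    """Return the failure modes that no test case covers.

    test_cases maps a test case name to the failure mode it covers.
    """
    covered = set(test_cases.values())
    return [fm for fm in failure_modes if fm not in covered]


# Hypothetical identifiers for illustration.
failure_modes = ["FM-1", "FM-2", "FM-3"]
test_cases = {"TC-10": "FM-1", "TC-11": "FM-1", "TC-12": "FM-3"}

print(failure_modes_without_tests(failure_modes, test_cases))  # ['FM-2']
```

An empty result would indicate that every recorded failure mode has at least one test case.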

Hazard Categories
Categories: Description / Example
• Calculation Error: Incorrect system calculation due to incorrect formula in logic / code.
- E.g., Showing incorrect slots, distances (miles vs. kms).
• Display Error: Incorrect data displays that may cause a Potential Patient Safety Issue.
- E.g., 1) Physician approves lab results with comments, but subsequent view of these comments
in Orders module is truncated, 2) Data retrieval results in an incorrect display.
• Implementation Error: Incorrect feature implementation or product configuration resulting in a
Potential Patient Safety Issue.
- E.g., Creation of templates displays incorrect “date of last” field in health maintenance.
• Input Error: Incorrect input by the user resulting in a Potential Patient Safety Issue.
- E.g., UI allows data entry of free text as opposed to only numeric values.
• Installation Error: Incorrect installation on a user’s computer that may result in a Patient Safety Issue.
- E.g., 1) Mismatch in setting metric vs. standard units of measure can lead to unexpected results,
2) Incorrect client network installation or configuration.
• Labeling Hazard: Incorrect output which provides clinically incorrect instructions that may result in a
Potential Patient Safety Issue.
- E.g., Documentation providing incorrect instructions.

Hazard Categories
Categories: Description / Example
• Patient Context Error: Patient’s information mismatched that may result in a Potential Patient Safety Issue.
- E.g., Patient A’s medications are displayed within the Med Module for Patient B.
• Storage Error: Incorrect data storage that may result in a Potential Patient Safety Issue.
- E.g., Partially executed database operation is not transactional and does not roll back on failure,
leaving data in a partial (corrupted) state.
• Transport Error: Incorrect data transportation from one system to another that may result in a Potential
Patient Safety Issue.
- E.g., eRx transactions are not delivered and the provider is not notified.
• Usability Error: Inappropriate display of fields or data.
- E.g., 1) Field names overlap, 2) Change in “Save” button placement on every screen, 3) Units
displayed far from the field or data.
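
To make the Storage Error example concrete, the sketch below shows one way a multi-step write can be made transactional so that a failure leaves no partial data behind. It uses Python's standard `sqlite3` module with an in-memory database; the `medication` table and the `save_medications` function are hypothetical and are not taken from any NextGen product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medication (patient_id INTEGER NOT NULL, name TEXT NOT NULL)"
)


def save_medications(conn, patient_id, names):
    """Insert all of the medications or none of them."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception is raised inside it.
    with conn:
        for name in names:
            conn.execute(
                "INSERT INTO medication (patient_id, name) VALUES (?, ?)",
                (patient_id, name),
            )


try:
    # The second value violates NOT NULL, so the whole batch should be undone.
    save_medications(conn, 1, ["aspirin", None])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM medication").fetchone()[0]
print(count)  # 0: the first insert was rolled back together with the failed one
```

Without the transaction, the first row would remain in the table after the failure, which is the partial state the Storage Error example describes.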

“I had six honest serving men - they taught me all I knew: Their names
were Where and What and When - and Why and How and Who.”
Rudyard Kipling

Examples

Determine Severity, Occurrence and Detectability scores
Severity Ranking (Quantitative), Corresponding Option: Description / Example
• 0, No harm: No impact to patient care.
Example: An un-generated billing charge.
• 1, Negligible harm: Inconvenience or temporary discomfort.
Example: abrasion, temporary mild fever.
• 2, Minor harm: May result in temporary injury or impairment.
Examples: reversible kidney failure, hives.
• 3, Major harm: May result in permanent injury or impairment.
Examples: chronic kidney failure, congestive heart failure, loss of limb
(failure leading to incorrect diagnosis or treatment).
• 4, Critical harm: May result in death or life threatening event.
Examples: anaphylactic shock, ventricular arrhythmia, pulmonary
embolism (failure leading to incorrect procedural treatment or death).
* Training note: These examples are clinical in nature and can be adjusted per the trainee situation

Determine Severity, Occurrence and Detectability scores
Occurrence Ranking, Corresponding Option: Description / Example
• 0, Hazard Cannot Occur: The hazardous situation can never occur.
• 1, Theoretical: The hazardous situation can only occur if a negligible set of
independent events happen simultaneously, or in rapid
succession. The practical expectation is that frequency rating 1
events never occur, but are theoretically possible.
• 2, Remote/Occasional: The hazardous situation may occur from time to time but would
require the coincidence of unlikely events. Example: Network
failure impacting data storage.
• 3, Probable: The hazardous situation will likely occur at predictable frequencies
over the life of the product.
Example: Data backup, impacting patient context information.
• 4, Frequent/Always: The hazardous situation will often or always occur over the life of
the product. If frequency cannot be estimated, assume this
value. Example: When the user tries to select the “Clear
Consciousness” checkbox, the system automatically selects and
saves the “Clouded Consciousness” option.

FMEA: Activities - Phase A: Step 8: Detectability Rankings
Detectability Ranking, Corresponding Option: Description / Example
• 1, Obvious: The presence of the hazardous situation is evident to even
untrained users of the system. Example: The user receives a fatal
error so that he/she clearly knows the function didn’t work;
however, this could delay patient care.
• 2, Noticeable: The hazardous situation can be detected easily by the intended
user of the system. Example: Data display errors.
• 3, Obscure: The hazardous situation is difficult to detect even for the intended
users of the system in normal operating situations. Example:
Hazards due to different versions of Operating Systems;
Transmission errors to a different system.
• 4, Undetectable: The hazardous situation cannot be detected. Example: Hazard due
to external data import, which cannot be detected unless the
system goes into production.

“There are no foolish questions and no man becomes a fool until he
has stopped asking questions.”
Charles Proteus Steinmetz

NOTICE: This document contains information that is confidential and proprietary to NextGen Healthcare, Inc. and its subsidiaries and
affiliates (“Company”) and is intended for use solely by its authorized employees. This document may not be copied, reproduced,
published, displayed, otherwise used, transmitted, or distributed in any form by any means as a whole or in any part, nor may any of
the information it contains be used or stored in any information retrieval system or media, translated into another language, or
otherwise made available or used by anyone other than the authorized employee to whom this document was originally delivered
without the prior, written consent of Company.
By retaining or using this document, you represent that you are a Company employee who is authorized to use this document and
that you will use this document and the information it contains solely as and to the extent permitted by the Company. Any other use or
distribution of the contents of this document, as a whole or in any part, is prohibited.
Although we exercised great care in creating this publication, Company assumes no responsibility for errors or omissions that may
appear in this publication and reserves the right to change this publication at any time without notice.
© 2018 QSI Management, LLC. All Rights Reserved.
The registered trademarks listed at www.nextgen.com/legal-notice are the registered trademarks of QSI Management, LLC.
All other names and marks are the property of their respective owners.
Our issued and published patents can be found at www.nextgen.com/legal-notice.
