
For individual use only. © Copyright ISPE 2020. All rights reserved.

GOOD PRACTICE GUIDE:

Equipment Reliability

Disclaimer:
This Good Practice Guide provides practical guidance to help pharmaceutical organizations apply and improve
reliability at all stages of the equipment lifecycle. This Guide is solely created and owned by ISPE. It is not a
regulation, standard or regulatory guideline document. ISPE cannot ensure and does not warrant that a system
managed in accordance with this Guide will be acceptable to regulatory authorities. Further, this Guide does not
replace the need for hiring professional engineers or technicians.

Limitation of Liability
In no event shall ISPE or any of its affiliates, or the officers, directors, employees, members, or agents of each of
them, or the authors, be liable for any damages of any kind, including without limitation any special, incidental,
indirect, or consequential damages, whether or not advised of the possibility of such damages, and on any theory of
liability whatsoever, arising out of or in connection with the use of this information.

© Copyright ISPE 2020. All rights reserved.

All rights reserved. No part of this document may be reproduced or copied in any form or by any means – graphic,
electronic, or mechanical, including photocopying, taping, or information storage and retrieval systems – without
written permission of ISPE.

All trademarks used are acknowledged.

ISBN 978-1-946964-37-3

Page 2 ISPE Good Practice Guide:
Equipment Reliability

Preface
In the current climate, the industry continues to rely on innovation for success. The need for innovation
extends beyond new products and is accompanied by increasing pressure to provide affordable healthcare.
Whether for new or legacy products, a reliable supply chain and effective cost management are critical for satisfying
customer needs.

Applying asset management principles can make equipment a source of competitive advantage. As
innovation extends to and transforms the supply chain, equipment lifecycle costs and availability become ever more
relevant to maintaining a competitive advantage. Reliable equipment improves the likelihood of achieving reliable
operations and thus improves the supply of critical therapies to patients worldwide.

This ISPE Good Practice Guide: Equipment Reliability intends to provide guidance on the application of equipment
reliability concepts to systematically and proactively improve equipment reliability at all stages of the equipment
lifecycle. This Guide is not intended to provide the basic body of knowledge for equipment reliability or reliability
engineering. This Guide offers best practices with respect to equipment reliability, addresses specific opportunities for
the pharmaceutical industry beyond the general reliability of equipment, and can serve as the basis for an effective
reliability program.


Acknowledgements
The Guide was produced by a Task Team led by Michael Berkey (Merck & Co., Inc., Kenilworth, NJ, USA).

Core Team

The following individuals took lead roles in the preparation of this Guide:

John Byrne MSD Ireland


Thomas Harris AstraZeneca USA
Robert Smith CAI USA
Hamid Teimourian AstraZeneca USA
Richard Tree CAI USA
Alfred Ao Yu Resilience Biotechnologies Inc. Canada

Special Thanks

The Team would also like to thank ISPE for technical writing and editing support by Nina Wang and Jeanne Perez
(ISPE Guidance Documents Technical Writers/Editors) and production support by Lynda Goldbach (ISPE Guidance
Documents Manager).

The Team Leads would like to thank the many individuals and companies from around the world who reviewed and
provided comments during the preparation of this Guide; although they are too numerous to list here, their input is
greatly appreciated.

Company affiliations are as of the final draft of the Guide.

Cover photo: https://www.shutterstock.com/

600 N. Westshore Blvd., Suite 900, Tampa, Florida 33609 USA
Tel: +1-813-960-2105, Fax: +1-813-264-2816

www.ISPE.org


Table of Contents
1 Introduction
1.1 Background
1.2 Purpose and Value
1.3 Scope and Audience
1.4 Benefits
1.5 How to Use This Guide

2 Key Concepts and Terms
2.1 Key Concepts
2.2 Key Terms and Acronyms

3 Equipment Lifecycle
3.1 Approach
3.2 Project Phase
3.3 Operation Phase
3.4 Decommissioning Phase

4 Risk Management
4.1 Overview
4.2 Risk Management Process
4.3 Mitigating Risk

5 Supplier Management
5.1 Failure Reporting, Analysis, and Corrective Action System (FRACAS)
5.2 Supplier Activities Throughout the Asset Lifecycle
5.3 Supplier Products and Services
5.4 Planning Phase: Design for Reliability and Front End Planning – “Design it Right”
5.5 Installation Phase – Construction, Commissioning and Validation – “Install it Right”
5.6 Value Creation Phase – Operations and Maintenance – “Operate and Maintain it Right”
5.7 Value Optimization Phase – Lean/Six Sigma, Reliability Improvement – “Improve it Right”
5.8 End-of-Life Phase – Refurbish, Repurpose, Decommission, Replace – “Decommission it Right”

6 Operations and Maintenance
6.1 Risk-Based Decisions
6.2 Lifecycle Cost
6.3 Supplier Documentation – Expectations
6.4 Maintenance and Calibration Programs
6.5 Establishing and Managing Support Services
6.6 Performance Monitoring
6.7 Incident Management
6.8 Change Management
6.9 Periodic Review
6.10 Continuity Management

7 Appendix 1 – Managing Reliability in Operations
7.1 Organizational Readiness
7.2 Risk Management
7.3 Operations Management
7.4 Asset Obsolescence Management and Retirement Process


8 Appendix 2 – Managing Reliability in New Projects
8.1 Stage 1: Feasibility
8.2 Stage 2: Conceptual Development
8.3 Stage 3: Project Delivery Planning
8.4 Stage 4: Design
8.5 Stage 5: Implementation
8.6 Stage 6: Close-out (Project Turnover)

9 Appendix 3 – Special Interests
9.1 Alignment with ASTM E2500
9.2 Alignment with ISO 55000
9.3 Design for Reliability (DfR) Tools
9.4 Reliability Centered Maintenance (RCM)
9.5 Total Productive Maintenance (TPM)
9.6 Predictive Tools (for PdM)
9.7 Precision Maintenance
9.8 Root Cause Analysis (RCA)
9.9 PM Optimization
9.10 Obsolescence
9.11 Emerging Technologies

10 Appendix 4 – References

11 Appendix 5 – Glossary
11.1 Acronyms and Abbreviations
11.2 Definitions
11.3 Additional Definitions for Reference


1 Introduction
1.1 Background

Equipment reliability is concerned with the risk of failures in equipment and processes, providing focus on equipment
availability, fitness for purpose, and cost. The strategy and tactics of reliability contribute to realizing the value of
equipment throughout its useful life and mission. Reliability and maintenance strategies are developed to reduce
the risk and impacts of equipment failure. From a management perspective, reliability can be viewed as a set of
techniques applied with a mindset of anticipating unreliability (instability) and proactively eliminating issues.
The techniques presented in this Good Practice Guide concentrate on equipment reliability, whether directly or
indirectly related to product supply.

1.2 Purpose and Value

This Guide aims to provide guidance on the application of equipment reliability concepts in the context of the
pharmaceutical, medical device, biologics, blood, and/or advanced therapy industries. These concepts are applicable
to facilities, utilities, systems, and equipment assets. The Guide intends to:

• Provide recommendations and best practices for organizations to develop and apply solutions for asset and
maintenance strategies to optimize equipment performance and minimize the total cost of ownership

• Allow for flexibility in implementing equipment reliability principles with respect to organizational size, resources,
asset age, and maturity

• Apply a lifecycle approach to equipment assets

• Be complementary to other ISPE Guides for related aspects of asset management and maintenance [1, 2]

The value offered by this Guide includes:

• Presenting techniques to systematically and proactively improve equipment reliability at all stages of the
equipment lifecycle (design, installation, commissioning/qualification, operations and maintenance, and
decommissioning)

• Raising awareness of how asset condition contributes to the management of business continuity risk

• Influencing the pharmaceutical industry to move toward more reliable assets by focusing on the systematic
reduction of equipment performance variation and its operating impact, through improved equipment design and
management

• Improving manufacturing processes, support processes, and product quality by ensuring continued fitness for
purpose and availability of equipment

• Addressing the events and consequences of equipment failures, by providing guidance on effective tools and
strategies to be applied to an organization’s systems

• Focusing on equipment reliability as a means of reducing and managing risk with respect to business operations,
supply, product quality, compliance, and reputation


1.3 Scope and Audience

This Guide is written for the pharmaceutical and biotechnology industries, including but not limited to the
pharmaceutical, medical device, biologics, blood, and/or advanced therapy medicinal product sectors. In terms of
equipment reliability, the application of this Guide may extend to the physical assets, facilities, utilities, and
equipment that support:

• Production

• Materials management (e.g., warehousing, transport)

• Packaging and labeling

• Laboratory operations

This Guide is intended to bridge to existing ISPE guidance documents within and beyond the knowledge base
of equipment reliability. The connection to other bodies of knowledge is intended to provide a deeper and broader
understanding of equipment reliability within the context of business, including stability, capacity, risk, and compliance.
These supporting ISPE documents include:

• ISPE Baseline® Guide: Volume 5 – Commissioning and Qualification (Second Edition) [3]

• ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Calibration Management [4]

• ISPE Good Practice Guide: Asset Management [1]

• ISPE Good Practice Guide: Decommissioning of Pharmaceutical Equipment and Facilities [5]

• ISPE Good Practice Guide: Good Engineering Practice [6]

• ISPE Good Practice Guide: Maintenance [2]

• ISPE Good Practice Guide: Operations Management [7]

• ISPE Good Practice Guide: Project Management [8]

This ISPE Good Practice Guide: Equipment Reliability is not intended to:

• Be a prescriptive application for equipment reliability, tools, or techniques

• Provide the basic body of knowledge for equipment reliability or reliability engineering

• Define how to fully develop an equipment reliability program

• Address specific asset types

• Apply reliability techniques to specific situations

The strategy and application of equipment reliability to specific equipment is beyond the scope of this Guide. The
design and implementation of several equipment types are currently addressed by separate ISPE documents, and this
Guide may be used to augment their management:

• Biopharmaceutical manufacturing facilities [9]

• Heating, Ventilation, and Air Conditioning (HVAC) [10]


• Process gases [11]

• Quality laboratory facilities [12]

• Sterile product manufacturing facilities [13]

• Water and steam systems [14]

The primary audience is those in positions responsible for designing, implementing, operating, and maintaining
equipment assets, for example:

• Maintenance management

• Operations management

• Quality management

• Reliability and maintenance engineers

• Facilities and utilities engineers

• Project and design engineers

• Craftspersons and operators

Given the intended impact of equipment asset reliability and availability, this Guide also provides strategic insight for
additional functions that interface with asset design, operation, maintenance, and decommissioning. These groups
may include, for example:

• Process engineering

• Supply chain

• Health, Safety, and Environment (HSE)

• Procurement

• Finance/capital management

1.4 Benefits

This Guide presents guidance to address specific opportunities within the industry beyond the general reliability
of equipment. The focus on asset availability, compliance expectations, and strategic management of equipment
supports an organization’s pursuit of supply chain stability and competitive advantage. Additional benefits of this
Guide include:

• Providing a baseline for equipment reliability and asset management best practices that may be implemented to
improve the support of aging equipment and facilities

• Providing an approach to new equipment assets and reliability beginning in the design phase, such that
equipment lifecycle and total cost of ownership are evaluated as part of the procurement process


• Promoting a broader understanding of equipment assets, emphasizing that asset reliability is influenced (in order
of impact) by design, operation, and maintenance

• Providing information advocating that common organizational goals and decision-making account appropriately
for risk contributed by equipment assets, including the business, process/product, and compliance risk aspects

• Providing a basis for continual improvement (product quality, risk, operating cost, employee safety, capital
management, etc.)
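Evaluating total cost of ownership during procurement, as described above, generally means comparing options on a lifecycle basis rather than on purchase price alone. The following is a minimal Python sketch, assuming a simple model in which each option has a year-0 acquisition cost, constant annual operating and maintenance costs, and a disposal cost, all discounted to present value at an assumed rate. The option names, cost figures, 15-year life, and 8% discount rate are hypothetical illustration values, not figures from this Guide.

```python
from dataclasses import dataclass


@dataclass
class EquipmentOption:
    """Hypothetical procurement option; all cost figures are illustrative."""
    name: str
    acquisition_cost: float         # purchase, installation, and C&Q (year 0)
    annual_operating_cost: float    # energy, utilities, consumables per year
    annual_maintenance_cost: float  # PM, PdM, repairs, and spares per year
    service_life_years: int
    disposal_cost: float = 0.0      # decommissioning cost at end of life


def total_cost_of_ownership(option: EquipmentOption, discount_rate: float) -> float:
    """Present value of lifecycle costs, discounting each year's costs."""
    tco = option.acquisition_cost
    annual_cost = option.annual_operating_cost + option.annual_maintenance_cost
    for year in range(1, option.service_life_years + 1):
        tco += annual_cost / (1 + discount_rate) ** year
    # Disposal is assumed to occur at the end of the final service year.
    tco += option.disposal_cost / (1 + discount_rate) ** option.service_life_years
    return tco


options = [
    EquipmentOption("Option A (lower price)", 400_000, 30_000, 45_000, 15, 20_000),
    EquipmentOption("Option B (higher reliability)", 480_000, 28_000, 25_000, 15, 20_000),
]
# List options from lowest to highest lifecycle cost.
for opt in sorted(options, key=lambda o: total_cost_of_ownership(o, 0.08)):
    print(f"{opt.name}: {total_cost_of_ownership(opt, 0.08):,.0f}")
```

With these illustrative figures, the higher purchase price of Option B is outweighed by its lower annual maintenance cost over the service life, which is the kind of trade-off a purchase-price comparison would miss.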

1.5 How to Use This Guide

This Guide offers best practices with respect to equipment reliability and can serve as the basis for an effective
reliability program. For organizations that choose to implement portions rather than the entirety of this Guide,
improved performance can be achieved based on specific areas of opportunity, i.e., there is value in employing the
concepts described in individual sections even if not every aspect of this Guide is implemented.

Key concepts and terms are covered in Chapter 2.

The following major topics are addressed:

• Equipment lifecycle (Chapter 3)

• Risk management (Chapter 4)

• Supplier activities, including equipment and service providers (Chapter 5)

• Operations and maintenance (Chapter 6)

• Managing reliability in operations (Chapter 7, Appendix 1)

• Managing reliability in new projects (Chapter 8, Appendix 2)

• Special interests, including advanced equipment reliability topics (Chapter 9, Appendix 3)


2 Key Concepts and Terms


2.1 Key Concepts

2.1.1 Equipment Lifecycle

The equipment lifecycle consists of three distinct phases:

• Project phase

• Operation phase

• Decommissioning phase

2.1.1.1 Project Phase

During the project phase, the requirements for the equipment are defined and the equipment is designed, built,
installed, commissioned, and turned over.

The project phase is the opportunity to apply Design for Reliability (DfR) principles such that asset philosophy,
operating strategy, maintenance strategy, and lifecycle analysis can be incorporated into the equipment design.
Analysis and risk assessments performed during the project phase assist with developing the strategies and tactics
to mitigate the risk of failures through design, operational, maintenance, and administrative controls. Deploying a
method of defect discovery enables prevention or mitigation of equipment issues during design and installation, and it
also enables effective maintenance strategies and continual improvement in the Operation phase.

Depending on the project framework employed, elements of the project phase may include:

• Feasibility

• Concept

• Specifications

• Design

• Installation

• Commissioning and Qualification (C&Q)

• Turnover

An additional focus of the project phase is preparing for the operational readiness of the equipment. Operational
readiness needs to include the technical, operational, and maintenance aspects of the asset. To achieve right-first-
time deployment, the equipment strategies developed during the Project phase need to include capacities, operating
envelopes, spare parts availability, maintenance tactics, etc.

For best practices associated with managing projects, refer to ISPE Good Practice Guide: Project Management for
the Pharmaceutical Industry [8].


2.1.1.2 Operation Phase

The Operation phase is the longest phase in the equipment lifecycle and encompasses the equipment being actively
operated and maintained, providing value to the organization. The maintenance and reliability elements critical in the
Operation phase include:

• Change management

• Incident management

• Monitoring and trending

• Continual improvement

• Lifecycle planning

Modifications to either the equipment or its usage may require adapting the maintenance strategy during the
Operation phase. The inputs from incidents, performance trends, and improvements can warrant changes to address
the effectiveness and efficiency of maintenance strategies. Ongoing capacity planning may necessitate changes to
both asset mix and maintenance strategies to effectively manage the impact of demand changes.
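Monitoring and trending can be as simple as fitting a straight line to periodic condition readings and projecting when the trend will reach an action limit. The Python sketch below assumes readings of a single condition indicator (for example, vibration velocity in mm/s) and uses an ordinary least-squares fit; the readings and the 7.0 mm/s limit are hypothetical, and in practice limits would come from the equipment supplier or the organization's condition monitoring program.

```python
from __future__ import annotations


def linear_trend(times: list[float], values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values against times; returns (slope, intercept)."""
    n = len(times)
    if n < 2:
        raise ValueError("at least two readings are needed to fit a trend")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must be taken at more than one time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def days_until_limit(times: list[float], values: list[float], limit: float) -> float | None:
    """Projected days after the last reading until the fitted trend reaches
    `limit`; None if the trend is flat or improving."""
    slope, intercept = linear_trend(times, values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - times[-1])  # 0.0 if already past the limit


# Hypothetical weekly vibration readings (mm/s) from a pump bearing.
days = [0, 7, 14, 21, 28, 35]
velocity = [2.1, 2.3, 2.6, 2.8, 3.2, 3.4]
print(days_until_limit(days, velocity, limit=7.0))
```

A projection like this only indicates that the maintenance strategy may need to change; the trend should be confirmed and investigated before a PM interval or task is adjusted.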

2.1.1.3 Decommissioning Phase

The Decommissioning phase applies when the equipment asset is no longer required in service of the organization’s
objectives. Upon decommissioning, the equipment is no longer utilized for operations. Within regulated environments,
equipment or its components may require documentation of End-of-Life (EOL) testing or qualification activities to
demonstrate that design performance and fitness for purpose were maintained throughout the asset life. Consideration
should be given to equipment components, such as critical safety devices (e.g., pressure relief devices), that need to be
maintained even if the equipment is out of service but remains installed. Financial rules may distinguish between an
asset that is owned but out of service and an asset that has been disposed of.

For best practices associated with decommissioning, refer to ISPE Good Practice Guide: Decommissioning of
Pharmaceutical Equipment and Facilities [5].

2.1.2 Operational Readiness for Maintenance

The project planning and implementation activities need to ensure that maintenance system deliverables are
completed during the project phase and provided to the maintenance organization for deployment in the Operation
phase. While final delivery coincides with completion of the project, ideally the project team has collaborated and
shared information with the maintenance department beginning with the project’s concept development. In order
for the maintenance department to be prepared for operational readiness at project turnover, elements of the
Maintenance Work Management System should be developed in parallel with project execution, including but not
limited to:

• Preventive Maintenance (PM) and Predictive Maintenance (PdM) plans

• Condition-Based Maintenance (CBM)

• Spare parts lists, Bills of Materials (BOMs), and stocking plans

• Standard Operating Procedures (SOPs)

• Specialized tools


• Maintenance consumables

• Equipment asset documentation (e.g., manuals, drawings, warranty, service agreements, etc.)

• Training

• Computerized Maintenance Management System (CMMS)

• Computerized Calibration Management System (CCMS)
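One way to track these deliverables is a simple turnover checklist keyed to the elements listed above, so that outstanding items are visible well before project turnover. The Python sketch below is a hypothetical data structure: the deliverable names mirror the list above, while the owner field, the status values, and the rule that turnover is ready only when every item is complete are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"


@dataclass
class Deliverable:
    name: str
    owner: str                          # hypothetical: accountable party
    status: Status = Status.NOT_STARTED


# Deliverable names follow the maintenance readiness list above.
READINESS_ITEMS = [
    "PM and PdM plans",
    "CBM",
    "Spare parts lists, BOMs, and stocking plans",
    "SOPs",
    "Specialized tools",
    "Maintenance consumables",
    "Equipment asset documentation",
    "Training",
    "CMMS",
    "CCMS",
]


def outstanding(items: list[Deliverable]) -> list[str]:
    """Names of deliverables that are not yet complete."""
    return [d.name for d in items if d.status is not Status.COMPLETE]


def ready_for_turnover(items: list[Deliverable]) -> bool:
    """Assumed rule: ready only when no deliverable is outstanding."""
    return not outstanding(items)


checklist = [Deliverable(name, owner="project team") for name in READINESS_ITEMS]
for item in checklist[:8]:              # suppose the first eight are done
    item.status = Status.COMPLETE
print(outstanding(checklist))           # ['CMMS', 'CCMS']
print(ready_for_turnover(checklist))    # False
```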

2.1.3 Risk-Based Approach

Product, process, compliance, and business understanding should be used as the basis for making risk-based
decisions about an asset and its maintenance strategy. Conducting a system-level impact assessment (e.g., direct,
indirect, non-GMP) is a best practice that informs decisions across the equipment lifecycle, from concept through
decommissioning. Compliance factors may extend beyond GMPs to include HSE and financial requirements. Understanding
and effectively addressing operating requirements and current regulatory expectations contribute to the sustainability
of an asset through the Operation phase. Supply chain and cost effects are also key considerations in establishing an
asset’s criticality and priority.
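Asset criticality is often established with a weighted scoring model across the consequence categories named above. The following Python sketch shows one hypothetical scheme: each asset is scored from 1 (low) to 5 (high) for product quality/GMP impact, HSE, supply chain, and cost, and the weights and ranking bands are illustrative assumptions that an organization would set through its own risk management process.

```python
from __future__ import annotations

# Hypothetical category weights (sum to 1.0); each organization sets its own.
WEIGHTS = {"gmp": 0.4, "hse": 0.3, "supply": 0.2, "cost": 0.1}


def criticality_score(scores: dict[str, int]) -> float:
    """Weighted consequence score on a 1-5 scale across all categories."""
    unknown = scores.keys() - WEIGHTS.keys()
    missing = WEIGHTS.keys() - scores.keys()
    if unknown or missing:
        raise KeyError(f"unknown: {sorted(unknown)}, missing: {sorted(missing)}")
    for category, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{category} score must be between 1 and 5")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


def criticality_band(score: float) -> str:
    """Map a score to an assumed ranking band used to prioritize maintenance."""
    if score >= 4.0:
        return "A (high)"
    if score >= 2.5:
        return "B (medium)"
    return "C (low)"


assets = {
    "Purified water generation": {"gmp": 5, "hse": 3, "supply": 5, "cost": 4},
    "Office HVAC unit": {"gmp": 1, "hse": 1, "supply": 1, "cost": 2},
}
for name, s in assets.items():
    score = criticality_score(s)
    print(f"{name}: {score:.1f} -> {criticality_band(score)}")
```

Requiring every category to be scored is a deliberate choice in this sketch, so that no asset is ranked low simply because a consequence category was skipped.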

A deeper understanding of risk may be developed with the evaluation of equipment functions and features, the
components involved in those functions and features, likelihood of failure, and severity of failure impact. Failure
Modes and Effects Analysis (FMEA) and/or Failure Modes, Effects, and Criticality Analysis (FMECA) may be used
to assess potential equipment failures and identify mitigations to address risks presented within the context of a
maintenance strategy.
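A common way to prioritize FMEA results is the Risk Priority Number (RPN), the product of severity, occurrence (likelihood), and detection ratings, each typically rated on a 1 to 10 scale. The Python sketch below assumes that convention. The failure modes, ratings, the action threshold of 100, and the rule that any severity of 9 or more is flagged regardless of RPN are hypothetical; organizations often add such severity-first rules because different rating combinations can produce the same RPN.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (remote) to 10 (very frequent)
    detection: int   # 1 (almost certainly detected) to 10 (undetectable)

    def __post_init__(self) -> None:
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode], threshold: int = 100) -> list[FailureMode]:
    """Failure modes needing mitigation, highest RPN first.

    Hypothetical rule: act on RPN >= threshold, or on severity >= 9 regardless of RPN.
    """
    flagged = [m for m in modes if m.rpn >= threshold or m.severity >= 9]
    return sorted(flagged, key=lambda m: m.rpn, reverse=True)


modes = [
    FailureMode("Agitator seal", "leak into product", severity=9, occurrence=3, detection=3),
    FailureMode("Drive motor", "bearing wear", severity=5, occurrence=5, detection=4),
    FailureMode("Jacket valve", "fails closed", severity=4, occurrence=2, detection=2),
]
for m in prioritize(modes):
    print(f"{m.component} / {m.mode}: RPN {m.rpn}")
```

In this example the agitator seal (RPN 81) is still flagged because of its severity, which a pure RPN threshold would have missed.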

2.1.4 Efficiency and Effectiveness

Maintenance strategies need to provide for both effective and efficient equipment maintenance. During the operating
life of an equipment asset, the applied maintenance strategy may become suboptimal for various reasons.
Factors including operating changes, improved equipment understanding, component wear, and predictive tools
can impact an asset’s performance and its maintenance strategy. Preventive Maintenance Optimization (PMO) may
be deliberately applied to improve suboptimal performance, focusing on one or more contributing factors,
including but not limited to:

• Improper initial maintenance strategy

• Incorrect Root Cause Analysis (RCA) findings and/or corrective actions

• Changes in an asset’s operation

• Newly available and/or improved technologies and tools

• Changes with respect to reevaluated Asset Criticality and risk

• Changes with respect to FMEA/FMECA identified risks and mitigations

• Cumulative effects of changes and incidents
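Failure history recorded in the CMMS is a typical trigger for PMO. The Python sketch below estimates Mean Time Between Failures (MTBF) from corrective work-order dates for one asset and flags the PM task for review when failures occur, on average, more often than the PM interval, which may indicate that the current task or frequency is not addressing the dominant failure modes. The dates, the 90-day interval, and the comparison rule are hypothetical, and the sketch uses calendar days rather than operating hours as a simplification; a real PMO review would also consider failure modes, RCA findings, and the other factors listed above.

```python
from __future__ import annotations

from datetime import date


def mtbf_days(failure_dates: list[date]) -> float | None:
    """Mean calendar days between consecutive failures; None if fewer than two."""
    if len(failure_dates) < 2:
        return None
    ordered = sorted(failure_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def needs_pm_review(failure_dates: list[date], pm_interval_days: int) -> bool:
    """Hypothetical rule: review the PM task if MTBF is shorter than its interval."""
    mtbf = mtbf_days(failure_dates)
    return mtbf is not None and mtbf < pm_interval_days


# Hypothetical corrective work-order dates for one asset.
failures = [date(2020, 1, 10), date(2020, 2, 24), date(2020, 4, 15), date(2020, 5, 30)]
print(mtbf_days(failures))                               # 47.0
print(needs_pm_review(failures, pm_interval_days=90))    # True
```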

Decisions on design and strategy made during the project, properly founded in reliability engineering, will influence
both the effectiveness and efficiency of equipment maintenance tactics in the Operation phase. The following should
be considered:

• Vendor availability and technical support


• Spare parts criticality, availability, storage, and consignment

• Technology platform, complexity, and level of automation

• Improvement tools (RCA, FMEA/FMECA, etc.)

• Asset Criticality

• Deployment of predictive tools and technologies

• CMMS, CCMS, and data analysis

• Equipment education (e.g., technical, operation, maintenance)
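For spare parts criticality and availability, a widely used approach for critical, slow-moving spares is to model demand during the replenishment lead time as Poisson-distributed and stock enough parts to reach a target probability of not running out. The Python sketch below assumes constant failure rates, so expected lead-time demand equals installed quantity × lead time ÷ MTBF. The figures in the example are hypothetical, and the sketch ignores repairable spares, shelf life, and consignment arrangements, any of which may change the result.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson random variable with the given mean."""
    term = math.exp(-mean)   # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i     # P(X = i) from P(X = i - 1)
        total += term
    return total


def spares_to_stock(installed_qty: int, mtbf_hours: float,
                    lead_time_hours: float, service_level: float) -> int:
    """Smallest stock level whose probability of covering lead-time demand
    meets the target service level."""
    if not 0 < service_level < 1:
        raise ValueError("service_level must be between 0 and 1")
    expected_demand = installed_qty * lead_time_hours / mtbf_hours
    stock = 0
    while poisson_cdf(stock, expected_demand) < service_level:
        stock += 1
    return stock


# Hypothetical example: 6 identical pump seals, MTBF 8,760 h (about one year),
# 12-week (2,016 h) supplier lead time, 95% target service level.
print(spares_to_stock(6, 8760, 2016, 0.95))
```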

For best practices associated with the execution of equipment maintenance, refer to ISPE Good Practice Guide:
Maintenance [2] and ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Calibration Management [4].

2.1.5 Operability

Beyond equipment design, operation is the second most significant factor contributing to the overall reliability of an
asset. The operator-equipment interface and error proofing are worthwhile considerations during and after equipment
selection, whether accomplished through design, engineering controls, or procedural controls.

Critical to reliability and the success of the maintenance strategy is ensuring that the operation honors the equipment
design envelope, whether with respect to constraints, intended functions, or capacities. Intentionally operating outside
the equipment’s design envelope both increases the potential for premature or unexpected component failures and
places the failure modes, effects, likelihood, and risks outside the scope of the evaluation that supports the
current maintenance strategy.

Depending on the operating strategy and the organization’s culture, a productive opportunity exists to bridge
equipment operation with maintenance. Operator care recognizes that an operator has greater exposure and
sensitivity to the performance of operating equipment. Maintenance approaches, such as Total Productive
Maintenance (TPM), leverage an operator’s observations and access to the equipment to perform shop floor
adjustments and routine maintenance. Properly trained operators can perform some maintenance more efficiently with
at-hand resources, and maintenance resources may then be utilized for higher priority and/or more technical tasks.

For best practices associated with manufacturing operations, refer to ISPE Good Practice Guide: Operations
Management [7].

2.1.6 Asset Management

Asset management comprises developing a holistic view of assets and their lifecycles. Beginning with a case for new
equipment, asset management presents the framework for applying lifecycle strategies and tactics that bridge the
Project, Operation, and Decommissioning phases. Asset management reflects and interacts with the operations and
maintenance strategies, including rebuild, replacement, and acquisition activities.

With a sound understanding of equipment reliability and the business case for asset availability, an Asset Management
Plan applies a long-range approach to an organization’s equipment assets. A sound plan provides the mechanisms
to address changes to assets, whether planned (e.g., end of operation, useful life), obsoleted, altered, functionally
failed, or decommissioned. As applied through equipment reliability, asset management offers perspective on the full
equipment lifecycle and the total cost of ownership.

For best practices associated with an asset management strategy, refer to ISPE Good Practice Guide: Asset
Management [1].

2.2 Key Terms and Acronyms

This section introduces key terms as they are used in the context of this Guide. Refer to Chapter 11 (Appendix 5) for
an expanded listing of definitions.

Asset

As defined in ISO 55000 [15]:

“An item, thing, or entity that has potential or actual value to an organization.”

Asset Criticality

Asset Criticality is both an attribute of an asset and a process for determining how important the equipment is to
sustaining production in a compliant manner. Asset Criticality is used to prioritize those assets in the
management system(s) that provide the most value toward operating throughput and are most important to quality,
health, safety, and environment.

Asset Management

A coordinated set of activities that will help the organization achieve the optimal amount of value from the assets.

Asset Management System

A management system that supports the asset management policy and the strategic asset management plans.

Availability

The state in which a tangible asset is ready for its intended use. Availability is one of three factors that determine the
Overall Equipment Effectiveness (OEE).

Bad Actor

Equipment that performs the worst in terms of effectiveness to sustain production in a safe and compliant manner.

Design for Reliability (DfR)

The process of incorporating equipment reliability and availability into equipment design.

Downtime

The time during which an asset is not available for its intended use. Downtime may be composed of both planned
and unplanned events.

Equipment

The items, articles, and implements utilized for a specific purpose in an operation or activity. The fixed assets of an
organization, other than land and buildings.

Facility

An area or building that is built, installed, or established to serve a particular purpose.

Lifecycle

The stages that an equipment asset progresses through, from initial development through to disposal.

MTBF

Mean Time between Failure

MTBM

Mean Time between Maintenance

MTTF

Mean Time to Failure

MTTR

Mean Time to Repair

MWT

Mean Waiting Time

Overall Equipment Effectiveness (OEE)

A calculated measurement that indicates how effectively a manufacturing operation performs with respect to its
design capacity, when scheduled to run. The factors in determining OEE are Availability (% Scheduled Time),
Performance (% Design Performance), and Quality (% Good Units). OEE is expressed as a percentage.

Preventive Maintenance Optimization (PMO)

An intentional and structured process that targets the effectiveness and efficiency of an equipment’s PM program.
The focus is upon preserving and restoring the equipment’s condition, with value-added tasks, materials, and
downtime.

Reliability

The probability that the equipment will perform satisfactorily for a given period of time under stated conditions.

Reliability Centered Maintenance (RCM)

A systematic and structured process used to develop an efficient and effective maintenance plan for an asset to
minimize the probability of failure.

Reliability Growth

The positive improvement in a reliability parameter over a period of time due to the implementation of corrective
actions to system design, operation, maintenance procedures, or the associated manufacturing process.

Reliability Integration

The process of seamlessly and cohesively integrating reliability tools to maximize equipment reliability at the lowest
possible cost.

Root Cause Analysis (RCA)/Root Cause Failure Analysis (RCFA)

A method used to determine the underlying cause(s) of a failure. The focus is on identifying and eliminating the causes
so that the failure does not recur, rather than on treating failure symptoms.

Total Effective Equipment Performance (TEEP)

A calculated measurement that indicates how effectively a manufacturing operation performs with respect to its
design capacity, based upon total calendar time. The factors in determining TEEP are Utilization (% Utilized Time),
Performance (% Design Performance), and Quality (% Good Units). TEEP is expressed as a percentage.

Total Productive Maintenance (TPM)

An operational philosophy where the total workforce of an organization plays a role in ensuring the performance of
equipment is maintained and improved. TPM is similar to Total Quality Management, except that the focus is on
assets rather than products.

Utility

An equipment asset that generates and/or distributes an essential supply service (e.g., electricity, water, steam, etc.)
to support a facility’s operation.
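The OEE and TEEP definitions above share the Performance and Quality factors and differ only in the time base of the first factor (scheduled time versus total calendar time). The Python sketch below computes both from one period of production data. The field names, the interpretation of Utilization as run time divided by total calendar time, and the example numbers are assumptions for illustration, not values from this Guide.

```python
from dataclasses import dataclass


@dataclass
class PeriodData:
    """Hypothetical production data for one reporting period (times in minutes)."""
    calendar_time: float     # total calendar time in the period
    scheduled_time: float    # time the asset was scheduled to run
    run_time: float          # time the asset actually ran
    ideal_cycle_time: float  # design cycle time, minutes per unit
    total_units: int         # all units produced
    good_units: int          # units meeting quality requirements


def performance(d: PeriodData) -> float:
    # % Design Performance: ideal time for the actual output vs. actual run time
    return d.ideal_cycle_time * d.total_units / d.run_time


def quality(d: PeriodData) -> float:
    # % Good Units
    return d.good_units / d.total_units


def oee(d: PeriodData) -> float:
    availability = d.run_time / d.scheduled_time  # % Scheduled Time
    return availability * performance(d) * quality(d)


def teep(d: PeriodData) -> float:
    utilization = d.run_time / d.calendar_time  # assumed: % of total calendar time
    return utilization * performance(d) * quality(d)


# One hypothetical week: factors are 0.90 (availability), 0.90 (performance), 0.95 (quality)
week = PeriodData(calendar_time=10080, scheduled_time=7200, run_time=6480,
                  ideal_cycle_time=0.6, total_units=9720, good_units=9234)
print(f"OEE  = {oee(week):.1%}")
print(f"TEEP = {teep(week):.1%}")
```

Under these assumptions TEEP equals OEE multiplied by the scheduled fraction of calendar time (here 7200/10080), which is one way to see how much capacity lies in time the asset is not scheduled to run.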

3 Equipment Lifecycle

3.1 Approach

An asset is defined as follows in ISO 55000 [15]:

“An asset is an item, thing or entity that has potential or actual value to an organization. The value will vary
between different organizations and their stakeholders, and can be tangible or intangible, financial or non-
financial.

The period from the creation of an asset to the end of its life is the asset life. An asset’s life does not necessarily
coincide with the period over which any one organization holds responsibility for it; instead, an asset can
provide potential or actual value to one or more organizations over its asset life, and the value of the asset to an
organization can change over its asset life.”

For the purpose of this Guide, the term equipment is used to describe the subset of an organization’s assets that
includes facilities, utilities, systems, and equipment and that has a useful life of more than one year. The financial
world refers to these assets as Property, Plant, and Equipment, which are tangible, long-term, fixed assets. Fixed
assets should have a useful life assigned to them, which is the set period of time over which the equipment is expected
to produce value for an organization.

The equipment lifecycle management process consists of three distinct phases:

1. Project phase

2. Operation phase

3. Decommissioning phase

An organization should have a capital asset replacement planning program that incorporates the equipment lifecycle
management process. Figure 3.1 is an example swimlane diagram showing typical policies, plans, and processes
included in a capital asset replacement planning program.

Figure 3.1: Example of Capital Asset Replacement Planning Program (Swimlane Diagram)
Used with permission from CAI, www.cagents.com.

Figure 3.2 is an example swimlane diagram showing typical steps that comprise an equipment lifecycle management
process.

Figure 3.2: Example of Equipment Lifecycle Management Process (Swimlane Diagram)


Used with permission from CAI, www.cagents.com.

Not every step in the equipment lifecycle management process flow is intended to be applied to every asset. The steps
represent best practices to be considered during the lifecycle of an asset. Depending on the complexity, criticality,
or size of the asset, some steps will require more or less effort, or may not be required. A step does not need to be
completed before the next step is started. There is typically overlap between the activities in the various steps. Often,
multiple steps will have tasks conducted concurrently and in conjunction with one another.

The equipment lifecycle management process starts when a need has been identified for a new or replacement
asset(s) and its priority has risen to the top of the 5- and 10-year capital plans based on lifecycle planning and
decision-making methods. These methods include:

• Risk management

• Operational planning

• Capital investment planning

• Financial planning

The output from the project decision-making process should include:

• Schedule of projects, including their costs (capital expenditures (CAPEX), operating expenses (OPEX), and
valuation impacts)

• Timings required to deliver the service levels

• Benefits of the preferred options

This information will then be used to help build the 10-year capital plan. The forecasting approaches used will have a
major influence on the capital project priorities and timing. With a 10-year capital plan, the forecasts need to be robust
over the same period. All asset lives and replacement dates need to be reasonably robust for asset valuation and
depreciation purposes.

A conservative approach for developing the 10-year capital plan is to assume the eventual replacement of all
existing assets. Each asset is assigned an estimated replacement date, which is adjusted based on periodic condition
assessments to produce a more accurate replacement date. For assets which have exceeded their projected life
expectancy but are still performing at an acceptable level, the asset life should be extended accordingly.
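As a minimal sketch of this conservative approach, the Python code below assigns every asset a replacement year and groups those that fall inside the planning horizon. It assumes each asset record carries an install year, an assigned useful life, and optionally the remaining life estimated at its latest condition assessment; these field names and the rule that overdue replacements move into the first plan year are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Asset:
    tag: str                      # hypothetical asset identifier
    install_year: int
    useful_life_years: int        # useful life assigned at acquisition
    assessment_year: Optional[int] = None
    remaining_life_years: Optional[int] = None  # from latest condition assessment


def replacement_year(asset: Asset) -> int:
    """Estimated replacement year, refined by condition assessment when available."""
    if asset.assessment_year is not None and asset.remaining_life_years is not None:
        # The assessment overrides the original estimate; this extends the life of
        # an asset that is past its projected life but still performing acceptably.
        return asset.assessment_year + asset.remaining_life_years
    return asset.install_year + asset.useful_life_years


def capital_plan(assets: List[Asset], start_year: int,
                 horizon: int = 10) -> Dict[int, List[str]]:
    """Group every asset by replacement year within the planning horizon."""
    plan: Dict[int, List[str]] = {}
    for asset in assets:
        # Overdue replacements are pulled into the first plan year.
        year = max(replacement_year(asset), start_year)
        if year < start_year + horizon:
            plan.setdefault(year, []).append(asset.tag)
    return dict(sorted(plan.items()))
```

For example, with a start year of 2024, an asset installed in 2008 with a 15-year life would be due in 2023, but a 2023 assessment estimating four more years of acceptable service moves it to 2027. Rerunning the plan after each condition assessment keeps the replacement dates current.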

3.2 Project Phase

This section focuses on the following elements of the Project phase:

• Concept and design

• Installation, C&Q, and validation

3.2.1 Concept and Design

In the concept and design portion of the Project phase, the requirements for the equipment are defined and the
equipment is designed and built. During this phase, the organization should perform a DfR analysis that includes
establishing the asset philosophy, operating strategy, maintenance strategy, and conducting a Lifecycle Cost Analysis
(LCCA). The results from the DfR analysis can then be incorporated into the equipment design and can assist with
the development or revision of the Strategic Asset Management Plan (SAMP). The DfR analysis should also include
risk assessments, including Reliability Centered Maintenance (RCM) or FMEA and Asset Criticality analysis to assist
with developing the strategies and tactics to mitigate the risk of failures through technical design, operations, and
maintenance controls.

The initial User Requirement Specification (URS) that was included in the business case will be reviewed and
updated, as required. The user requirements are used as the basis for developing project specifications and
drawings.

The functional and detailed design should take into consideration the defined user requirements, layouts,
constructability, relevant codes, regulations, technical norms, and standards. Included as part of the functional and
detailed design review process, an asset LCCA and DfR review should be conducted.

The LCCA is used to estimate the total cost of ownership of an asset over the entire asset lifecycle. Investment
decisions are often based on the initial capital cost to acquire the asset, unduly focusing on the low bid. Basing the asset
purchase decision solely on the lowest initial cost ignores the fact that most capital assets incur the bulk
of their lifecycle costs during the Operation phase and that the ratio of operating costs to capital costs can vary
significantly between otherwise equivalent assets.

The information gathered and analyzed from an LCCA is used to compare the relative merits of different asset
alternatives considered during the evaluation and procurement stages in order to select the alternative that offers the
lowest total cost of ownership. The LCCA should satisfy four characteristics established by industry best practices:
credible, well documented, accurate, and comprehensive. [43]
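The comparison can be illustrated with a short Python sketch. It assumes that lifecycle cost is the capital cost plus discounted annual operating and maintenance costs plus a discounted disposal cost. The discounting, the cost categories, the equal-life assumption, and the figures in the example are illustrative assumptions, not a method prescribed by this Guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alternative:
    name: str
    capital_cost: float
    annual_operating_cost: float    # e.g., energy, consumables, operating labor
    annual_maintenance_cost: float  # e.g., PM, repairs, spare parts
    disposal_cost: float            # end-of-life cost, net of any salvage value
    life_years: int


def lifecycle_cost(alt: Alternative, discount_rate: float) -> float:
    """Present value of all costs over the asset life."""
    total = alt.capital_cost  # incurred at year 0, so not discounted
    annual = alt.annual_operating_cost + alt.annual_maintenance_cost
    for year in range(1, alt.life_years + 1):
        total += annual / (1 + discount_rate) ** year
    total += alt.disposal_cost / (1 + discount_rate) ** alt.life_years
    return total


def lowest_tco(alternatives: List[Alternative], discount_rate: float) -> Alternative:
    """Select the alternative with the lowest total cost of ownership."""
    return min(alternatives, key=lambda a: lifecycle_cost(a, discount_rate))


# Hypothetical bids for equivalent assets with the same 15-year life
low_bid = Alternative("Vendor A", 100_000, 30_000, 15_000, 5_000, 15)
efficient = Alternative("Vendor B", 140_000, 22_000, 8_000, 5_000, 15)
print(lowest_tco([low_bid, efficient], discount_rate=0.05).name)  # Vendor B
```

Here the low bid has the lower purchase price but the higher lifecycle cost. Comparing alternatives with different useful lives would need an additional step, such as an equivalent annual cost, which this sketch does not attempt.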

The LCCA provides for a more informed equipment selection by allowing decision makers to:

• Compare the total cost of ownership for equivalent assets from different manufacturers

• Estimate the timing of expected cash flow requirements at all stages of the asset lifecycle

• Assess the feasibility of investing in a specific asset (“go” or “no go”)

• Evaluate the costs of alternative design options to minimize lifecycle costs

• Decide between sources of supply (manufacturers or vendors)

• Select the asset maintenance strategies to maximize asset life

• Assess asset expected EOL, replacement, and disposal requirements

DfR is applied with the purpose of designing out failure modes and defects. The DfR process analyzes the various
“abilities”, e.g., reliability, maintainability, operability, accessibility, cleanability, repairability, serviceability, etc. A cross-
functional team performs the DfR process to evaluate the performance requirements of an asset such as:

• Safety requirements (e.g., HSE, process, etc.)

• Operating context (operational, maintenance, and reliability functional attributes)

• Environmental conditions (e.g., level of vibration, sound, temperature, etc.)

• Precision requirements (e.g., balancing, alignment, torque, tolerances, etc.)

• Metrics (e.g., Mean Time between Failure (MTBF), Mean Time to Repair (MTTR), Overall Equipment
Effectiveness (OEE), etc.)

• Redundancy

• Technology (new or existing)

• Maintenance service and support (e.g., parts standardization and availability, annual service contracts, etc.)

• Failure modes (eliminate or reduce need for mitigation)

• Maintenance and Reconditioning (M&R) standards (e.g., lubrication, materials of construction, foundations,
fasteners, etc.)

• Capacity of equipment cleaning

• Materials of construction compatibility

• Level of automation complexity

• Predictive technologies

• Condition-based monitoring and instrumentation

Studies have documented that up to 85% of an asset’s lifecycle cost is committed during the design phases of the
asset’s life, yet up to 85% of the actual costs are incurred after it is put into operation. [41]

An Asset Criticality analysis identifies significant, previously unrecognized risks and the costs associated with them.
The analysis helps ensure that overall success is balanced against the reality of limited capital and maintenance
budgets. This prime constraint is most commonly known as the level of service versus cost to maintain factor.

An Asset Criticality analysis provides the basis to rank systems quantitatively by reliability priority and to
define the most cost-effective manner for developing the right maintenance strategy for each category of system
criticality.
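A quantitative ranking can be sketched in Python as follows. The 1 to 5 rating scales, the rule of multiplying the worst-case consequence by the likelihood of failure, and the tier thresholds are all hypothetical choices for illustration; an organization would substitute its own Asset Criticality criteria.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AssetRisk:
    tag: str
    quality_impact: int     # 1 (no impact) to 5 (direct product quality impact)
    hse_impact: int         # health, safety, and environment consequence, 1 to 5
    production_impact: int  # effect on operating throughput, 1 to 5
    likelihood: int         # likelihood of failure, 1 (rare) to 5 (frequent)


def criticality_score(a: AssetRisk) -> int:
    # Worst-case consequence across categories, scaled by likelihood of failure
    return max(a.quality_impact, a.hse_impact, a.production_impact) * a.likelihood


def criticality_tier(score: int) -> str:
    # Hypothetical thresholds on the 1 to 25 score range
    if score >= 15:
        return "A (high)"
    if score >= 8:
        return "B (medium)"
    return "C (low)"


def rank_assets(assets: List[AssetRisk]) -> List[Tuple[str, int, str]]:
    """Return (tag, score, tier) rows, most critical first."""
    rows = [(a.tag, criticality_score(a), criticality_tier(criticality_score(a)))
            for a in assets]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Taking the maximum consequence, rather than a sum, keeps an asset with a single severe quality or HSE consequence from being diluted by low scores in other categories. The resulting tiers could then feed a strategy selection such as the decision flowchart in Figure 3.3.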

Figure 3.3 below is an example of a maintenance strategy decision flowchart used to select the right maintenance
strategy for developing asset maintenance programs based on system criticality.

Figure 3.3: Maintenance Strategy Decision Flowchart


Used with permission from CAI, www.cagents.com.

Refer to the ISPE Good Practice Guide: Asset Management [1] for best practices associated with developing
the Strategic Asset Management Plan and to the ISPE Good Practice Guide: Good Engineering Practice [6] for
information about the design process.

3.2.2 Installation, C&Q, and Validation

In the installation, C&Q, and validation portion of the Project phase, the equipment is installed, commissioned,
qualified, validated, and turned over to the operating department. During this phase, the operating department is
responsible for ensuring development of the Asset Management Plan (AMP), which defines the activities undertaken
on the equipment, including specific and measurable objectives. These objectives need to align with operating plans,
site organizational plans, and operating unit business plans.

The AMP is developed for groupings of assets, e.g., assets managed by a business unit, assets at a site/facility,
or a complex asset system. It is common practice for the AMP to contain the rationale for the proposed asset
management activities and the objectives they are intended to achieve, operational and maintenance plans, capital
investment plans (overhaul, renewal, replacement, enhancement, and disposal), and financial and resource plans.

After the system/asset is purchased, and during installation, the asset registry information is verified and entered into
the CMMS. This information includes, for example:

• Asset hierarchy development and asset attributes

• Criticality and system impact

• Projected useful life

• P&ID information

When a project is approved to install a system/asset, documents are assembled into an Engineering Turnover
Package (ETOP or TOP). The ETOP documentation typically covers the design basis, fabrication, assembly,
installation, and testing of equipment and facilities, which provides the basis for qualification, as well as operations
and maintenance information. As the ETOP documentation is made available, PM/PdM tasks and calibrations, spare
parts, vendor records, service contracts, warranty period, and Maintenance, Repair, and Operations (MRO) BOM are
developed and entered into the CMMS.
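The registry data listed above can be pictured as one record per asset, with a parent link that forms the asset hierarchy. The Python sketch below uses hypothetical field names (not the schema of any particular CMMS) and adds a simple check for registry items that are still missing before turnover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AssetRecord:
    tag: str
    parent_tag: Optional[str]               # parent in the asset hierarchy; None at the top
    description: str
    criticality: Optional[str] = None       # result of Asset Criticality analysis
    projected_useful_life_years: Optional[int] = None
    pid_reference: Optional[str] = None     # P&ID drawing reference
    pm_tasks: List[str] = field(default_factory=list)
    spare_parts_bom: List[str] = field(default_factory=list)


# Registry items assumed (hypothetically) to be required before the asset is accepted
REQUIRED = ("criticality", "projected_useful_life_years", "pid_reference",
            "pm_tasks", "spare_parts_bom")


def registry_gaps(rec: AssetRecord) -> List[str]:
    """List required registry items that are still empty."""
    return [name for name in REQUIRED if not getattr(rec, name)]


def hierarchy_path(tag: str, records: Dict[str, AssetRecord]) -> List[str]:
    """Walk parent links from an asset up to the top of the hierarchy."""
    path: List[str] = []
    current = records.get(tag)
    while current is not None:
        path.append(current.tag)
        current = records.get(current.parent_tag) if current.parent_tag else None
    return list(reversed(path))
```

The sketch assumes the parent links contain no cycles; a real registry would validate that when records are entered.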

The system/asset goes through a C&Q process before turnover to the end-user or to the validation team. The ISPE
Baseline® Guide: Commissioning and Qualification (Second Edition) [3] describes a Quality Risk Management
approach (based on ICH Q9 [16] and ICH Q8 [17]) for the classification of systems and equipment based on their
potential impact to product quality.1 System classification is performed to establish whether a system is commissioned
and qualified (direct impact) or only commissioned (not direct impact). A system risk assessment is performed for
direct impact systems to identify the product quality risk controls.

Before the system/asset(s) are accepted and brought into service, training and SOPs need to be developed, and
personnel need to read the applicable SOPs and complete the required training. Examples of these
SOPs include:

• Operational

• Cleaning

• Testing

• Maintenance

• Calibration

1 The system risk assessment approach, as described in the C&Q Baseline® Guide [3], refers to Critical Design Elements as the engineering and
automation design elements used to implement the design controls (Critical Aspects) and to ensure documented Critical Process Parameter
operation. Examples of engineering design features include components, instruments, and materials of construction. Critical Design Elements are
identified and documented based on technical understanding of the product Critical Quality Attributes, process Critical Process Parameters, and
equipment design/automation.

Refer to the ISPE Good Practice Guides for additional best practices related to the Project phase:

• Project management [8]

• AMP development [1]

• C&Q [3]

• Process validation [18]

3.3 Operation Phase

The Operation phase comprises operations and maintenance and is the longest phase in the equipment lifecycle.
During this phase, the manner in which the equipment is operated and maintained determines the amount of value
created for the organization. The critical elements in the Operation phase include:

• Ongoing risk management

• Change management

• Continual improvement (performance monitoring and trending)

The outputs of these critical elements will require changes to the operations, equipment, maintenance, and reliability
strategies.

3.3.1 Operations

The first question to answer is: “Who owns the equipment?” Ownership of an asset depends on its main function and
purpose. In the pharmaceutical industry, assets are typically broken down into four main categories:

• Process

• Facility

• Utility

• Laboratory

Ownership of some equipment is obvious, such as process or laboratory equipment. Shared asset ownership is not
as obvious. With shared assets, it will be important to determine who has overall responsibility for the assets and how
these assets will be managed. The engineering, maintenance, and reliability departments provide support functions
for the operational departments. The operational and support departments need to work closely together to provide
reliable operations that maximize the value created by the equipment for the organization.

The remainder of this section includes select examples to be considered when involving operations in an operations-
driven equipment reliability program.

3.3.1.1 Operator Training and Qualification

Operator training and qualification are critical to the safe and reliable operation of equipment and systems. Typical
items that are included in an operator training and qualification program include:

• Complete overview of the piece of equipment

• Technical discussion on the operation of the unit

• Technical discussion specific to the start-up and shutdown of the unit

• Emergency procedures

• Safety precautions

• Manual and automatic mode operation

• Methods for setting machine operating parameters (e.g., feed rate) and centerlining

• Appropriate machine sanitation measures

• Theory of operation

• Sequence of operation

• Autonomous maintenance activities

• Supporting One-Point Lessons (OPLs)

3.3.1.2 Total Productive Maintenance (TPM)

“Total Productive Maintenance (TPM) is a holistic approach to equipment maintenance [and reliability] that strives
to achieve perfect production:

• No breakdowns

• No small stops or slow running

• No defects . . .

• No accidents

TPM emphasizes proactive and preventative maintenance to maximize the operational efficiency of equipment.
It blurs the distinction between the roles of production and maintenance by placing a strong emphasis on
empowering operators to help maintain their equipment.

The implementation of a TPM program creates a shared responsibility for equipment that encourages greater
involvement by plant floor workers. In the right environment this can be very effective in improving productivity
(increasing up time, reducing cycle times, and eliminating defects).” [19]

3.3.1.3 Centerlining

Centerlining is a continual improvement program that enables an organization to reduce variation in their processes.
To centerline a process, these four steps should be performed:

“1. Identify the important process factors, Critical Process Parameters (CPP), or variables

2. Determine the [optimal] settings and ranges for all of the important variables – by grade or product if multiple
products are being produced

3. Determine how these variables affect the process and the product

4. Ensure that the centerlined settings are always used during production

Centerlining is a never-ending process, and continuous work on all of these steps is needed in order to reap the
most benefits. Benefits can include one or more of the following: improved product quality, increased production,
reduced downtime, reduced off-quality product, reduced waste, reduced costs, and/or increased profits.
Centerlining can also help in the detection of process upsets, which allows corrective or preventative actions to
be taken.” [20]

Centerlining requires input from multiple data sources and can also provide data to support production and process
improvement initiatives such as Lean/Six Sigma.

“It must be stressed that centerlining alone cannot achieve all of the potential benefits. The best results can only
be achieved if the equipment is properly maintained and upstream processes are also centerlined. Additionally,
easy access to process data and trends from past runs can enhance centerlining efforts by making it easy to
determine where the process is, where it’s supposed to be, and where it has been historically.” [20]
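Step 4, ensuring that the centerlined settings are used during production, lends itself to a simple automated check. The Python sketch below compares current settings against a centerline of (low, target, high) values per parameter and reports the deviation from target so drift can be trended. The parameter names and ranges are hypothetical; in practice the centerline would come from the organization's own process data for each product or grade.

```python
from typing import Dict, Tuple

# Hypothetical centerline for one product: parameter -> (low, target, high)
Centerline = Dict[str, Tuple[float, float, float]]


def check_centerline(settings: Dict[str, float], centerline: Centerline) -> Dict[str, str]:
    """Compare current machine settings with the centerlined ranges."""
    status: Dict[str, str] = {}
    for name, (low, target, high) in centerline.items():
        value = settings.get(name)
        if value is None:
            status[name] = "missing"
        elif low <= value <= high:
            # Report deviation from target so drift within the range can be trended
            status[name] = f"ok (deviation {value - target:+g})"
        else:
            status[name] = f"out of range (target {target}, range {low} to {high})"
    return status


fluid_bed = {"inlet_air_temp_C": (58.0, 60.0, 62.0),
             "airflow_m3h": (900.0, 1000.0, 1100.0),
             "spray_rate_g_min": (95.0, 100.0, 105.0)}
current = {"inlet_air_temp_C": 64.0, "spray_rate_g_min": 103.0}
for param, result in check_centerline(current, fluid_bed).items():
    print(param, "->", result)
```

Flagging a missing setting separately from an out-of-range one distinguishes a data-capture gap from an actual process deviation.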

3.3.1.4 Implementation of Lean Management and Six Sigma Techniques

The goal of Lean management techniques is to maximize value by eliminating waste in operations. The goal of Six
Sigma techniques is to reduce variation in the operation.

Lean management and Six Sigma technique examples include [19]:

• Kaizen: A continual improvement process in which employees at all levels of an organization work
together to proactively accomplish ongoing, incremental operational improvements

• 5S Program: Visual workplace

- Sort (clearing)

- Set in Order (configure)

- Shine (clean and check)

- Standardize (conformity)

- Sustain (consensus)

• Mistake Proofing: Developing a process for equipment operators to avoid mistakes

• Gemba Walks: “Go to where the work is done” (plant floor)

• Standard Work: Creating, clarifying, and sharing information about the important tasks that are performed
routinely in an operation

• Kanban (Pull System): Replenishment of production supplies triggered by signal cards or lights that
indicate when more supplies are needed

• Key Performance Indicators (KPIs): Tracking and encouraging progress toward meeting key production goals
and driving desired behaviors in support of achieving these goals

• Overall Equipment Effectiveness (OEE) and 6 Big Losses: Measuring productivity loss for a given production
process by tracking Availability, Performance, and Quality

- OEE is a key measurement to help with the elimination of waste in a production process

- 6 Big Losses are used to track the most common causes of waste in a production process:

> Breakdowns

> Setup/Adjustments

> Small Stops

> Reduced Speed

> Start-up Rejects

> Production Rejects

• Value Stream Mapping: Visually mapping the flow of a production process in order to highlight opportunities to
improve the process flow and eliminate waste

• Statistical Process Control (SPC): Using statistical methods to monitor and control a process

• Root Cause Analysis (RCA): Problem solving method

• Suppliers, Inputs, Process, Outputs, and Customers (SIPOC) Process: Defining a business process from
beginning to end before work begins by summarizing the inputs and outputs

• Responsible, Accountable, Consulted, and Informed (RACI) Matrix: Using a responsibility assignment matrix to
map out tasks, milestones or key decisions involved in completing a project or operational process
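The 6 Big Losses listed above are commonly grouped under the three OEE factors: breakdowns and setup/adjustments reduce Availability, small stops and reduced speed reduce Performance, and start-up and production rejects reduce Quality. The Python sketch below applies that grouping to losses expressed as minutes of a planned production period. Expressing every loss as time and the example shift data are assumptions for illustration.

```python
from typing import Dict, Tuple

# Common grouping of the 6 Big Losses under the OEE factors
LOSS_FACTOR = {
    "Breakdowns": "Availability",
    "Setup/Adjustments": "Availability",
    "Small Stops": "Performance",
    "Reduced Speed": "Performance",
    "Start-up Rejects": "Quality",
    "Production Rejects": "Quality",
}


def oee_from_losses(planned_minutes: float,
                    loss_minutes: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Return (availability, performance, quality, OEE) from loss time per category."""
    lost = {"Availability": 0.0, "Performance": 0.0, "Quality": 0.0}
    for loss, minutes in loss_minutes.items():
        lost[LOSS_FACTOR[loss]] += minutes  # KeyError flags an unknown loss category
    run_time = planned_minutes - lost["Availability"]
    net_run_time = run_time - lost["Performance"]
    productive_time = net_run_time - lost["Quality"]
    availability = run_time / planned_minutes
    performance = net_run_time / run_time
    quality = productive_time / net_run_time
    return availability, performance, quality, availability * performance * quality


# Hypothetical 8-hour shift (480 planned minutes)
shift = {"Breakdowns": 30, "Setup/Adjustments": 18, "Small Stops": 20,
         "Reduced Speed": 22, "Start-up Rejects": 5, "Production Rejects": 15}
a, p, q, oee = oee_from_losses(480, shift)
print(f"A={a:.1%} P={p:.1%} Q={q:.1%} OEE={oee:.1%}")
```

Because the factors telescope, OEE equals fully productive time divided by planned production time (here 370 of 480 minutes), so each loss category's contribution to the gap can be read directly from the loss log.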

3.3.2 Maintenance

Key aspects of equipment reliability are improved when a well-designed and managed maintenance program is
implemented.

“Maintenance is a combination of all technical and administrative actions, including supervision actions, intended
to retain an item in, restore it to, or replace it so that it can perform a required function.” [21]

Maintenance management and work execution consist of the processes and controls used in the delivery of
maintenance services. A maintenance-driven equipment reliability program employs techniques such as RCM, Condition-
Based Maintenance (CBM), and precision maintenance. A maintenance organization that is focused on
equipment reliability strives to optimize maintenance activities.

A maintenance-driven equipment reliability program should include the following aspects:

• Maintenance Training and Qualifications are critical to the safe and reliable maintenance of equipment and
systems. Typical items that are included in a maintenance training and qualification program include:

- Training of maintenance technicians in the complete operation of the equipment and understanding of the
theory of operation, sequence of operation, and operating limits

- Vendor-provided training, operations and maintenance manuals

- Emergency procedures

- Safety precautions

- PM activities

- Condition-based maintenance activities

- Precision maintenance activities

- Lubrication procedures

- Common troubleshooting

- Corrective Maintenance (CM) work plan content

• Maintenance Management processes and controls normally involve the following systems: [2]

- Records for equipment, facilities, and systems

- Labor resources

- Planned maintenance

- Spare parts

- CMMS

- Maintenance documentation

- Work order management

- Documentation assessment

- Technical training program for equipment

Figure 3.4 illustrates the maintenance model.

Figure 3.4: Maintenance Model


Figure courtesy of Arnel N. Cabungcal.

3.3.3 Continual Improvement

Lean management and Six Sigma techniques, discussed earlier, are key elements of a continual improvement
program that supports equipment reliability during the Operation phase.

Another key element of a continual improvement program that supports equipment reliability is a clear definition of
an equipment bad actor. There are several factors that should be considered when establishing a bad actor list:

• System/equipment criticality

• OEE

• Downtime (hours, mechanical and operational, cost)

• Mean measures (MTBF (Mean Time between Failures), MTTF (Mean Time to Failure), MTTR (Mean Time to
Repair), MTBM (Mean Time between Maintenance))

• Waste and rework

• System utilization

• Budget variance (operating and maintenance)

• Work order count (including maintenance labor hours)

• Spare parts cost

• Repair cost

• Type of equipment

- Start at a macro level and then drill down to system and asset level; type examples include process, utilities,
facilities, laboratory, static equipment (piping, heat exchangers, pressure vessels), rotating equipment, fired
equipment (furnaces, boilers), instrumentation, electrical, etc.

- Example: when grouped as an equipment type, jacketed tanks in general might show a high number of failures,
high downtime, etc., which would identify jacketed tanks as a bad actor type.

A bad actor report identifies opportunities to apply RCA and FMEA tools to develop mitigation strategies.
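The mean measures and downtime factors listed above can be combined into a simple bad actor ranking. The sketch below is a minimal Python illustration, not a method prescribed by this Guide: the `Asset` record, its field names, and the scoring rule (criticality rating multiplied by total repair hours) are hypothetical assumptions. MTBF is taken as operating hours divided by failure count, and MTTR as repair hours divided by failure count.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical failure summary for one asset over a review period."""
    tag: str
    criticality: int        # assumed 1 (low) to 5 (high) rating from criticality analysis
    operating_hours: float  # run time in the review period
    failures: int           # number of failures recorded in the period
    repair_hours: float     # total time to restore service across all failures

    @property
    def mtbf(self) -> float:
        # Mean Time Between Failures; infinite if no failures were recorded
        return self.operating_hours / self.failures if self.failures else float("inf")

    @property
    def mttr(self) -> float:
        # Mean Time To Repair; zero if nothing needed repair
        return self.repair_hours / self.failures if self.failures else 0.0


def bad_actor_score(asset: Asset) -> float:
    # Assumed scoring rule: downtime weighted by criticality.
    # repair_hours = failures x MTTR, so both frequency and duration count.
    return asset.criticality * asset.repair_hours


def rank_bad_actors(assets: list[Asset], top_n: int = 3) -> list[Asset]:
    # Highest score first; the top entries form the bad actor list
    return sorted(assets, key=bad_actor_score, reverse=True)[:top_n]


# Illustrative (made-up) data for three assets
fleet = [
    Asset("P-101", criticality=5, operating_hours=4000, failures=4, repair_hours=32),
    Asset("T-205", criticality=3, operating_hours=4000, failures=1, repair_hours=6),
    Asset("AHU-3", criticality=4, operating_hours=8000, failures=2, repair_hours=20),
]

for a in rank_bad_actors(fleet):
    print(f"{a.tag}: score={bad_actor_score(a):.0f}, MTBF={a.mtbf:.0f} h, MTTR={a.mttr:.1f} h")
```

Other factors from the list (waste and rework, spare parts and repair cost, budget variance) could be added as further weighted terms; the choice of weights is an organizational decision.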

Overall Equipment Effectiveness (OEE = Availability × Performance × Quality) is a commonly used KPI across all
manufacturing industries. By tracking and improving OEE, organizations have been able to avoid capital investments
by increasing the efficiency of their manufacturing processes, thereby increasing their manufacturing capacity with
existing equipment.
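The OEE product above can be computed from shift data using the conventional definitions of its three factors: availability as run time over planned production time, performance as ideal cycle time multiplied by total count over run time, and quality as good count over total count. The function name and the example numbers in this Python sketch are illustrative assumptions, not values from this Guide.

```python
def oee(planned_min: float, run_min: float, ideal_cycle_min: float,
        total_count: int, good_count: int) -> dict[str, float]:
    """Return the three OEE factors and their product (all as fractions)."""
    availability = run_min / planned_min                     # share of planned time actually running
    performance = (ideal_cycle_min * total_count) / run_min  # actual output vs. ideal output while running
    quality = good_count / total_count                       # share of units that are good
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical 480-minute shift: 60 min of downtime, 0.5 min ideal cycle,
# 760 units produced, 741 of them good
result = oee(planned_min=480, run_min=420, ideal_cycle_min=0.5,
             total_count=760, good_count=741)
print({k: round(v, 3) for k, v in result.items()})  # OEE is about 0.772 (77.2%)
```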

The Society for Maintenance and Reliability Professionals (SMRP) [22] defines leading and lagging indicators as:

Leading Indicator:

“An indicator that measures performance before the business or process result starts to follow a particular
pattern or trend. Leading indicators can sometimes be used to predict changes and trends.”

Lagging Indicator:

“An indicator that measures performance after the business or process result starts to follow a particular pattern
or trend. Lagging indicators confirm long-term trends, but do not predict them.”

When considering a leading measure, it is beneficial to express it in terms of what it is a leading measure for (i.e.,
which lagging measure will be affected?). Whether an indicator is leading or lagging depends on where in the process
the indicator is applied.

A lagging indicator of one process component can be a leading indicator of another process component, for example:

• PM compliance is a leading indicator for equipment reliability (lagging measure)

• Equipment reliability is a leading indicator for maintenance cost (lagging measure)

• Maintenance cost is a leading indicator for profitability (lagging measure)

SMRP identifies three sets of measurable components that make up the maintenance and reliability process. [22]

• Management processes and behaviors (mission and vision, people skills)

• Operational execution (operations, design, and maintenance)

• Manufacturing performance (availability, quality, cost, and benefits)

Production disruptions due to equipment unavailability are an example of a lagging KPI under operational execution.
The impact of unreliability becomes visible when scheduled batches are not produced, so this disruption needs to be
measured.

KPIs are often used to compare, or benchmark, performance between similar equipment, processes, or sites.
Benchmarking is a method of improving performance by measuring and comparing performance against others or
“best in class” performance measures. Benchmarking should answer the following questions:

• Who performs better?

• Why are they better?

• What actions do we need to take in order to improve our performance?

Internal Benchmarking: A comparison of internal operations at a site or within the same organization. The aim is
to identify “best in organization” processes/practices and then bring the level of performance of the site or the whole
organization to the current “best in organization” benchmark.
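For internal benchmarking, the "best in organization" benchmark and each site's gap to it can be calculated directly once a KPI is defined consistently across sites. This Python sketch assumes a single KPI reported per site with the same definition and period; the function name, site names, and values are hypothetical.

```python
def gaps_to_best(kpi_by_site: dict[str, float],
                 higher_is_better: bool = True) -> tuple[str, dict[str, float]]:
    """Return the best-in-organization site and each site's gap to it.

    Assumes every site reports the KPI with the same definition and period.
    """
    pick = max if higher_is_better else min
    best_site = pick(kpi_by_site, key=kpi_by_site.__getitem__)
    best = kpi_by_site[best_site]
    # Gap is expressed so that a positive number always means "worse than best"
    gaps = {site: (best - value) if higher_is_better else (value - best)
            for site, value in kpi_by_site.items()}
    return best_site, gaps


# Hypothetical PM compliance (higher is better) and MTTR in hours (lower is better)
pm_compliance = {"Site A": 0.92, "Site B": 0.85, "Site C": 0.97}
mttr_hours = {"Site A": 6.5, "Site B": 4.0, "Site C": 9.0}

print(gaps_to_best(pm_compliance))
print(gaps_to_best(mttr_hours, higher_is_better=False))
```

The `higher_is_better` switch matters because some KPIs (e.g., PM compliance) improve upward while others (e.g., MTTR) improve downward.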

External Benchmarking: This type of benchmarking can be competitive or generic. Competitive benchmarking is
a comparison against specific competitor products, services, or functions. Generic benchmarking is a comparison
of business functions or processes that are the same, regardless of industry or country. Generic benchmarking can
typically be used when comparing asset management, maintenance, and reliability processes and practices.

The key benefits of benchmarking are:

• Focusing improvement efforts on issues critical to success

• Ensuring that improvement targets are based on what has been achieved in practice, which removes the
temptation to say “it can’t be done”

• Providing confidence that the organization’s performance compares favorably with best practice

Risk management and review is a key element of the continual improvement process. The following are examples of
processes included in a risk management and review program: [21]

• Risk assessment and management

• Contingency planning and resilience analysis

• Sustainable development

• Management of change

• Asset performance and health monitoring

• Asset management system monitoring

• Management review, audit, and assurance

• Asset costing and valuation

• Stakeholder engagement

For best practices associated with operations and maintenance, refer to the following ISPE Good Practice Guides:

• Asset Management [1]

• Operations Management [7]

• Maintenance [2]

• Good Engineering Practice [6]

• A Risk-Based Approach to Calibration Management [4]

• Sustainability [23]

3.4 Decommissioning Phase

The Decommissioning phase applies when the equipment will no longer be used to meet the organization’s
objectives. This is commonly referred to as the equipment EOL. During the decommissioning phase, the equipment is
removed from operational service and is typically not maintained. The equipment might be maintained in an out-of-
service mode for a period of time while waiting for removal and disposal. From the time the equipment is placed out of
service to the time the equipment is disposed of, certain activities may need to be performed (e.g., final requalification
and calibration close out) to provide the documented evidence that the system was operating as it should at the
time of decommissioning, i.e., verifying that the appropriate state has been maintained at the end of the operational
life of the system. The decommissioning phase also includes managing the disposition of spare parts maintained
in inventory. Prior to the removal of a piece of equipment, it is important to determine if the equipment will require
decontamination. The organization may receive salvage value for the equipment and spare parts, although this may
amount only to scrap value.

Decommissioning should be included as part of the commissioning process. Equipment becomes obsolete as
processes change and as the techniques, technologies, and approaches used to meet process needs change.
Decommissioning obsolete equipment, and recommissioning others where appropriate, is part of the continuous
cycle of commissioning.

To be effective, decommissioning needs to be embedded in the existing approach to commissioning, and that
commissioning approach needs to be strategic and holistic.

The ISPE Good Practice Guide: Decommissioning of Pharmaceutical Equipment and Facilities [5] provides thorough
guidance with respect to the GxP decommissioning process which includes testing responsibilities, employee
support, how to properly decommission GMP documents for equipment that is removed from a validated system,
asset disposal, remediation and demolition, and roles and responsibilities.

The decommissioning and EOL evaluation processes include a refurbish, repurpose, or replace decision process.
The following are considerations for this decision process:

• Evaluating needs, past and present—have the needs changed?

• What are the risks associated with extending the life of the asset? Considerations include decline in performance,
lack of vendor/manufacturer support (e.g., software, obsolescence), physical condition, and whether replacement
equipment based on new technology is superior from both cost and functionality aspects.

• Determining the value of the equipment in its current state—what is the residual value of the equipment? (actual
and/or remaining book value)

• Determining the cost to refurbish the equipment—what will the equipment be worth after a successful repair is
made?

• Can the equipment be repurposed (reused in a different capacity) and replaced in its current role?

• Determining the cost of new equipment—what are the repair versus replacement factors? Considerations include
repair cost as a percentage of new cost, features, reliability, installation costs, energy costs (standard versus high
efficiency), time frame, warranty protection, availability of spare parts, protection from obsolescence, regulatory/
health/safety/environmental considerations, CAPEX versus OPEX funds availability, etc.
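One of the repair versus replacement factors listed above, repair cost as a percentage of new cost, lends itself to a simple first-pass screen. The Python sketch below is hypothetical: the 50% threshold and the treatment of vendor support and spare parts availability as gating conditions are illustrative assumptions, and the result is meant to feed, not replace, the broader evaluation of the other factors.

```python
def repair_ratio(repair_cost: float, new_cost: float, installation_cost: float = 0.0) -> float:
    """Repair cost as a fraction of the installed cost of new equipment."""
    return repair_cost / (new_cost + installation_cost)


def repair_or_replace(repair_cost: float, new_cost: float, installation_cost: float = 0.0,
                      threshold: float = 0.5, vendor_supported: bool = True,
                      spares_available: bool = True) -> str:
    """Screening rule only; the threshold and the two gating flags are assumptions."""
    # Loss of vendor support or spare parts points toward replacement regardless of cost
    if not (vendor_supported and spares_available):
        return "evaluate replacement"
    ratio = repair_ratio(repair_cost, new_cost, installation_cost)
    return "repair" if ratio < threshold else "evaluate replacement"


# Hypothetical figures: 18,000 repair vs. 40,000 new equipment + 8,000 installation
print(repair_ratio(18_000, 40_000, 8_000))       # 0.375
print(repair_or_replace(18_000, 40_000, 8_000))  # repair
print(repair_or_replace(18_000, 40_000, 8_000, vendor_supported=False))
```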

Figure 3.5 illustrates the asset lifecycle, showing the EOL stage.

Figure 3.5: Example of Equipment Reaching End-of-Life [1]

Spare parts obsolescence is unavoidable and is not a result of how inventory is managed or how well storeroom
personnel are trained. It is inevitable that any spare part will become obsolete at some point. Technology changes,
designs change, equipment changes, and processes change; these changes can be difficult to influence. Conversely,
how organizations plan for and respond to these changes can be managed.

The MRO spare parts removal process needs to take into account two types of obsolescence: vendor and owner.

• Vendor Obsolescence: Occurs when the Original Equipment Manufacturer (OEM) no longer sells the spare
part, or when the OEM is purchased by another OEM and the physical parts remain the same but the
manufacturer/model information changes. The vendor may or may not have an alternative spare part to take the
place of the old part. The alternative spare part may require modifications to fit the existing application.

• Owner Obsolescence: Occurs when an organization decides to replace or upgrade existing equipment. The
existing spare parts in inventory will no longer be needed, so the spare parts become obsolete.

Begin with the End in Mind: Few organizations think about the future obsolescence or useful life of spare parts.
This can result in either not having the parts when they are needed or ending up disposing of parts that were bought
and never used.

Managing Spare Parts Obsolescence: It is important to understand and manage vendor and owner obsolescence.

• Vendor obsolescence may or may not be communicated to the owner in advance of the change. Vendor-owner
relationships are critical to ensure the vendor provides the owner with advance notification.

• Owner obsolescence visibility is under the control of the owner. The owner has control of the impending change
that will result in the spare part being obsolete.

Project Spare Parts Evaluation: Engineering and maintenance personnel need to communicate to the storeroom
when equipment is being upgraded and the existing equipment is being decommissioned. This allows the storeroom
to reduce the stocking levels and determine the size of the final order. Typically, a capital project has a cost line item
to account for disposal of obsolete spare parts.

EOL Evaluation Process for Spare Parts: The best way to manage spare parts obsolescence is by adopting a
lifecycle approach to asset management. Lifecycle analysis takes into account the entire asset life, including the EOL
plan. Proactively managing spare parts obsolescence can save significant costs in both procurement and equipment
availability.

For best practices associated with decommissioning, refer to the ISPE Good Practice Guide: Decommissioning of
Pharmaceutical Equipment and Facilities [5].

4 Risk Management
4.1 Overview

This chapter reviews the application of risk management to equipment assets with the goal of improving their reliability
and maintenance. Risk management is the systematic application of policies, procedures, and/or practices that are
used to assess, analyze, evaluate, and control risk. A risk management process is an enabler to making informed
decisions, being proactive, and pursuing continual improvement throughout the equipment lifecycle.

Ideally, risk management is applied throughout the lifecycle of an asset. Risk evaluation is relevant to both new and
existing equipment in an asset portfolio. Risk management should also extend from the asset to the components.
Periodic evaluation allows for the adjustment of priorities based upon changes to the asset mix, design, usage,
performance, and issues.

The risks and impacts pertinent to equipment reliability are influenced by operational context. In the pharmaceutical
industry, equipment failure has the potential to impact business, safety, and compliance standing. The risk priority of
an asset should reflect and address the severity of impact, including:

• Asset management strategy

• Reliable supply

• Product safety and effectiveness (identity, strength, quality, purity, potency)

• Profit/loss

• GMPs

• HSE

Furthermore, risk elements and mitigation may be elicited from risk assessments driven by other management
systems and/or compliance requirements where equipment assets are implicated. These other types of risk
assessment include:

• Quality risk assessment, Quality Risk Management

• Process hazard analysis, Process Safety Management (PSM)

• Environmental risk assessment, environmental risk management plan

Risk evaluation is pertinent to critical equipment components. Practical methods and tools used in establishing and
sustaining a maintenance strategy have risk management as a base; these include:

• RCM

• FMEA

• FMECA

• Calibration management

• Critical spares analysis

Asset Criticality is related to risk, and the Asset Criticality analysis develops a relative priority of equipment assets
within an asset set or portfolio. Asset Criticality is beneficial in prioritizing activity and resources, including but not
limited to:

• Asset improvement

• PM optimization

• Critical spare parts

• Engineering effort

• Maintenance execution

Note: Risk assessments applied for C&Q activities focus on manufacturing process failure modes and are out of
the scope of this Guide.

4.2 Risk Management Process

While the practice of formal risk management is not covered here in detail (refer to ICH Q9 [16]), the process
generally develops the risk evaluation with the following facets:

• Likelihood

• Severity (of impact)

• Detectability

• Mitigating actions

4.2.1 Engagement

The key to an effective evaluation is to include the perspectives necessary to incorporate the causes and effects
of risk. Kickoff of a risk management process begins with identifying and involving stakeholders, including
representatives from the following functions:

• Operations

• Engineering (e.g., facility, utility, process)

• Maintenance

• Quality Assurance (QA)

• HSE

4.2.2 Identify Targets

Prior to launching a risk evaluation, it is crucial to define the scope and objects of evaluation. Facilitation with respect
to the scope is critical to make effective and efficient use of stakeholders’ time.

As suggested previously, risk assessment in the context of equipment reliability may be applied at multiple levels
throughout an asset’s lifecycle. In each case, preemptively define the boundaries of the systems or components
under consideration.

Prompt the stakeholders to engage and assemble evidence of equipment failures and impact to feed the evaluation
process. The basis of the evidence may include:

• Operational issues

• Maintenance and calibration issues

• Repeat failures

• Incidents/deviations/atypical events

• Equipment or process changes

For the implementation of a new asset without history or explicit evidence, equipment failure assumptions may be
collated based upon an existing asset base, network experience, or manufacturer’s data. In addition, equipment
considerations catalogued in parallel assessments may be reviewed and presented. These assessments may
include:

• Quality risk assessment

• Process hazard analysis

• Environmental risk assessment

• Value stream mapping

• Business continuity

4.2.3 Evaluate Targets

To quantify or qualify risk, the elements of risk are determined for each object under evaluation. Calculated risk
is based on:

• Likelihood of failure

• Severity of failure impact

• Detectability of failure

The targeted objects are dependent upon the scope of the risk assessment. The level of detail corresponds to the
method and specific risk being evaluated. Suitable risk assessment tools for example assets include:

• RCM: Equipment (functions) of a pumping skid

• FMEA: Equipment components within a sterile product boundary

• Facility quality risk assessment: HVAC equipment supporting graded space

• Process hazard analysis: Instruments and safety devices within a pressurized system

Developing and aligning on rating scales for each factor is a best practice. The scales should be developed
with functional experts experienced in evaluating equipment risk and with an understanding of the operating context.
Determine the number of brackets (e.g., 1–5) and the range boundaries for each bracket necessary to encompass
the anticipated ranges of each element. Well-developed scales aid with the following:

• Comparable, relative scores across asset groups and business units

• Consistent interpretation among stakeholders

• Effectiveness of facilitation by clearly defined boundaries

• Evaluation efficiency through reduction of subjective debate
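A rating scale with defined bracket boundaries can be encoded so that every assessor maps the same evidence to the same score. The Python sketch below assumes a five-bracket likelihood scale based on observed failures per year; the boundary values are hypothetical and would need to be set with the functional experts described above.

```python
from bisect import bisect_right

# Hypothetical lower boundaries (failures per year) for ratings 2, 3, 4, and 5.
# Rating 1: < 0.1 | 2: 0.1 to < 0.5 | 3: 0.5 to < 1 | 4: 1 to < 4 | 5: >= 4
LIKELIHOOD_BOUNDS = (0.1, 0.5, 1.0, 4.0)


def likelihood_rating(failures_per_year: float,
                      bounds: tuple[float, ...] = LIKELIHOOD_BOUNDS) -> int:
    """Map an observed failure rate onto a 1-5 rating using fixed brackets."""
    if failures_per_year < 0:
        raise ValueError("failure rate cannot be negative")
    # bisect_right counts how many boundaries are <= the value,
    # so a value equal to a boundary falls into the higher bracket
    return bisect_right(bounds, failures_per_year) + 1


print([likelihood_rating(x) for x in (0.05, 0.1, 0.7, 2.0, 12.0)])  # [1, 2, 3, 4, 5]
```

Stating explicitly which bracket a boundary value belongs to is what removes the subjective debate that the list above warns about.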

4.2.3.1 Likelihood of Failure

The likelihood of failure is the central driver of risk in equipment reliability. Multiple sources of information may be
used as an indication of equipment's likelihood to fail, for example:

• Operating experience

• Maintenance experience

• Calibration and corrective work order history

• Safety events

• Quality deviations

• Environmental incidents

• Equipment change history

A key consideration for equipment is the lifecycle stage of an asset and the failure profiles that apply to the equipment
and components. The failure profile in equipment reliability is commonly conceptualized by the Bathtub curve, as
shown in Figure 4.1.

Figure 4.1: Bathtub Curve

The curve plots failure rate versus time, with three major failure rate segments:

• Decreasing: Early “infant mortality” failure

• Constant: Random failures

• Increasing: Wear-out failures

In practice, the curve should represent the specific attributes and failure modes of the equipment or component. For
example, an electronic device may have a very steep decreasing failure rate for infant mortality, after which it remains
at a constant failure rate with no increase due to wear. Conversely, a mechanical device may begin with a constant
failure rate and end with a steep, increasing failure rate, but with no initial decrease due to infant mortality.
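A common way to give the bathtub curve a quantitative form, not prescribed by this Guide, is the Weibull hazard function h(t) = (β/η)(t/η)^(β - 1), where a shape parameter β < 1 gives a decreasing failure rate, β = 1 a constant rate, and β > 1 an increasing rate. Summing three such hazards (treating infant mortality, random, and wear-out failure modes as independent competing risks) produces a bathtub shape. The parameter values in this Python sketch are hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0.

    beta: shape parameter (< 1 decreasing, = 1 constant, > 1 increasing)
    eta:  scale parameter, in the same time units as t
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float) -> float:
    # Hypothetical parameters (hours): infant mortality + random + wear-out.
    # Hazards add because the three failure modes compete independently.
    return (weibull_hazard(t, beta=0.5, eta=2_000)      # decreasing segment
            + weibull_hazard(t, beta=1.0, eta=5_000)    # constant segment
            + weibull_hazard(t, beta=4.0, eta=20_000))  # increasing segment


for hours in (10, 1_000, 10_000, 30_000):
    print(hours, f"{bathtub_hazard(hours):.6f} failures/hour")
```

Dropping terms reproduces the examples in the text: the electronic device corresponds roughly to the first two terms only, and the mechanical device to the last two.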

4.2.3.2 Severity of Impact

The severity of impact should reflect the effects of equipment failure. Severity is independent of the potential or
frequency of failure. Severity dimensions that reflect the business’s priorities should be selected, with impact that may
include:

• Personnel safety

• Environment

• Product quality, safety, and efficacy

• GMPs

• Product supply

• Financial

• Resources/effort

The facets of impact should be considered independently and holistically. For example, “supply” may capture the
scheduled impact to the supply chain due to downtime, whereas “financial” expresses the expense of repair and/or
the cost (loss) of the product. The rating of severity should be based upon experienced or expected values.

To represent risk fairly, the assessment team may consider both common and worst-case failures when establishing
severity. Critically important to an effective assessment is to rate severity based upon a single, independent failure.
Risk facilitators must carefully screen multi-failure scenarios for dependency, feasibility, and scope of evaluation.
Accounting for dependent or cascading failures will inflate the severity and overall risk score.

4.2.3.3 Detectability

Detectability in the context of risk assessment represents the ability to recognize a failure or its effects, given that a
failure has occurred. The higher the detectability, the lower the risk; detectability is therefore inversely related to risk.

The key to detectability is time relevance. Depending on the failure mode and immediacy of impact, detectability can
be determined by various means:

• Operational observation

• PM

• Performance measurement

• Monitoring (predictive)

With respect to equipment reliability strategies (e.g., RCM, PdM, PMO), detectability can also provide meaningful
indicators for design or program effectiveness. While elimination or prevention of failure modes is an obvious target in
risk mitigation, improving failure detection is a powerful aspect of equipment reliability.

If detectability is not applicable or not factored, the outcome of the evaluation is risk rather than risk priority.

4.2.4 Risk Assessment

Depending on the type of assessment, risk scores fall into one of two classifications, risk or risk priority, as shown in
Figure 4.2. Within each classification, a matrix or group of ranges (e.g., High, Medium, Low) may be developed to aid
in sub-classifying the objects assessed. The risk process and/or assessment team should pre-define what level of risk
will require mitigation.

Figure 4.2: Risk Assessment Matrices [24]

Risk is the product of the Likelihood and Severity of failure(s). Detectability is not considered.

Risk priority is the product of Likelihood, Severity, and Detectability. A Risk Priority Number (RPN) is used to quantify/
qualify the risk of an identified failure and its effects.
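The two classifications can be expressed directly as products of the ratings. This Python sketch assumes 1 to 5 scales for each factor, with the detection rating oriented so that 1 means a failure is almost certain to be detected and 5 means it is unlikely to be detected; a higher rating then means higher risk, consistent with detectability being inversely related to risk. The High/Medium/Low cut-offs are hypothetical and would be pre-defined by the risk process or assessment team.

```python
SCALE = range(1, 6)  # assumed 1-5 rating scale for every factor


def _check(**ratings: int) -> None:
    # Reject ratings outside the agreed scale
    for name, value in ratings.items():
        if value not in SCALE:
            raise ValueError(f"{name} rating {value} is outside 1-5")


def risk(likelihood: int, severity: int) -> int:
    """Risk = Likelihood x Severity (detectability not considered)."""
    _check(likelihood=likelihood, severity=severity)
    return likelihood * severity


def rpn(likelihood: int, severity: int, detection: int) -> int:
    """Risk Priority Number = Likelihood x Severity x Detection."""
    _check(likelihood=likelihood, severity=severity, detection=detection)
    return likelihood * severity * detection


def risk_band(score: int) -> str:
    # Hypothetical cut-offs on the 1-25 risk range
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


# Example failure mode: likelihood 3, severity 4, detection rating 2
print(risk(3, 4), risk_band(risk(3, 4)), rpn(3, 4, 2))  # 12 Medium 24
```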

4.3 Mitigating Risk

The primary benefit of risk management is the drive for improving equipment performance and reliability. The risk or
risk priority scores developed during evaluation provide the organization with a measured mechanism to prioritize and
focus resources. The intended outcome is reduction in the overall risk to the business, with attention aimed toward
the highest potential risk(s) presented by the most critical asset(s).

In practice, and complementary to equipment maintenance, mitigation may include corrective, preventive, and
proactive elements. Consider improvements that address each of the factors of likelihood, severity, and detection, for example:

• Likelihood: Redesign or replace a component to eliminate a failure mode

• Severity: Add system redundancy to reduce downtime exposure

• Detection: Implement monitoring to provide early warning of failure

The outcomes of the risk management process should be shared with stakeholders. While risk controls are expected,
risk acceptance is also a valid outcome of risk management. As resources are limited in any organization, risks will be
identified that are partially or not mitigated. Based upon risk assessment and Asset Criticality analysis, communicate
the overall strategy encompassing risk control, risk acceptance, and priority.

4.3.1 Asset Criticality Analysis

The criticality of an asset indicates the relative importance of the equipment to sustaining operations with respect to
functional, business, and compliance requirements. Asset Criticality analysis applies values and ranking to equipment
assets in order to serve as an operating priority.

Asset Criticality is deduced from risk (i.e., failure likelihood and severity of impact). At an asset level, the likelihood
is an estimation of failure, based upon overall components and failure modes. For the severity element to reflect
multiple dimensions, the analysis process and evaluation may create an amalgam of the facets representing impact
(e.g., supply, product quality, personnel safety, financial, etc.). Weighting each facet of severity (e.g., 1X versus 2X,
3X, 4X, etc.) in the criticality calculation allows the analysis team to reflect the organization’s priorities in the context
of the asset priority.

For the purposes of categorizing Asset Criticality, it is beneficial to define impact groups (e.g., High, Medium, and
Low). In addition, the matrix of criticality scores can be utilized to define thresholds for which assets are deemed
“critical”. Further, consider establishing thresholds for severity elements (e.g., product quality, personnel safety) above
which an asset is considered critical regardless of the asset’s overall criticality score. For example, a fatality under
personnel safety or a recall under product quality would warrant a critical asset profile, even with a relatively low
failure likelihood or Asset Criticality.
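The weighted-severity calculation and the override thresholds described above can be sketched in Python as follows. The facet names, weights, override levels, and critical threshold are all hypothetical; using a weighted average keeps the combined severity on the same 1 to 5 scale as the individual facets.

```python
# Hypothetical severity weights (1X to 4X) reflecting organizational priorities
SEVERITY_WEIGHTS = {"personnel_safety": 4, "product_quality": 4, "supply": 3, "financial": 1}

# Hypothetical facet ratings at or above which an asset is critical regardless of score
OVERRIDES = {"personnel_safety": 5, "product_quality": 5}

CRITICAL_THRESHOLD = 12.0  # hypothetical cut-off on the 1-25 criticality range


def weighted_severity(severity: dict[str, int],
                      weights: dict[str, int] = SEVERITY_WEIGHTS) -> float:
    """Weighted average of 1-5 severity ratings, so the result stays on a 1-5 scale."""
    return sum(weights[f] * severity[f] for f in weights) / sum(weights.values())


def asset_criticality(likelihood: int, severity: dict[str, int]) -> float:
    """Criticality = asset-level failure likelihood x weighted severity."""
    return likelihood * weighted_severity(severity)


def is_critical(likelihood: int, severity: dict[str, int]) -> bool:
    # A severe rating on an override facet makes the asset critical outright
    if any(severity[f] >= level for f, level in OVERRIDES.items()):
        return True
    return asset_criticality(likelihood, severity) >= CRITICAL_THRESHOLD


transfer_pump = {"personnel_safety": 1, "product_quality": 3, "supply": 4, "financial": 2}
sterilizer = {"personnel_safety": 2, "product_quality": 5, "supply": 3, "financial": 3}

print(asset_criticality(4, transfer_pump), is_critical(4, transfer_pump))  # 10.0 False
print(asset_criticality(1, sterilizer), is_critical(1, sterilizer))
```

The sterilizer scores low overall because of its low likelihood, yet it is still flagged as critical through the product quality override, mirroring the recall example in the text.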

Asset Criticality

Related to risk, Asset Criticality is determined by failure Likelihood and Severity of failure impact. A criticality number
or group is used as a means to relatively rank objects based upon potential risk, without focus on specific failures.

The criticality of an asset is generally a static attribute and is not intended as an up-to-the-minute indicator of
performance. That said, the criticality of assets is influenced by changes, so it is a good practice to periodically
reassess criticality. Impactful changes may include:

• Business requirements

• Regulatory changes

• Technology

• Design

• Asset age/obsolescence

• Level of redundancy

• Failure frequencies

Asset Criticality is the attribute used to focus organizational priorities with respect to importance and risk:

• Risk mitigation

• Improved design, operation, and/or maintenance

• Maintenance planning, scheduling, and execution

• Maintenance rigor and strategy

• MRO rigor and strategy

• Capital planning

Evaluation of utility equipment and systems (e.g., electrical distribution, steam distribution, compressed air
generation, etc.) can present a unique challenge among asset classes in the evaluation of Asset Criticality. Because of
their importance to plant operations, utilities may be designed to include redundancy. When assessed under the
assumption of a single, independent failure, utility equipment is therefore likely to present relatively deflated risk profiles.
To better represent risk to the business, the assessment may frame likelihood and severity with respect to lack/loss
of redundancy or contingency. Alternatively, the assessment team may consider rating scales and ranking utility
systems separately from other equipment classes. The course chosen may depend upon the strategic differentiation,
importance, and/or health of the utility systems relative to the balance of equipment assets.

4.3.2 Equipment Reliability

The power of risk management, applied to equipment reliability, is the management of the sustained availability
of the equipment for its intended purpose. The overall effect of reliability is reduced risk to operations attributed to
equipment and its maintenance.

Aligned with Asset Criticality, the prioritization of resources and effort applied to equipment reliability should be
driven by the value or risk presented to its operation, as shown in Figure 4.3. The equipment that is most critical
should be the focus for gaining advantage and proactively reducing risk. With equipment of lesser criticality, the organization
may choose to focus on maintenance efficiency and waste reduction. A Run-to-Failure (RTF) strategy and reactive
maintenance tactics may be appropriate for the least critical equipment.

Figure 4.3: Asset Criticality and Resource Allocation

Equipment Prioritization

With Asset Criticality as a baseline, equipment prioritization is founded upon the risk presented by current
performance, that is, the likelihood of failures occurring with adverse impact. The prioritization is dynamic, based on
equipment performance. Frequent evaluation and trending of metrics are important for identifying equipment that
presents as a bad actor and for maintaining relevance to operations. Indicators of the current failure likelihood may
include, but are not limited to:

• Operations and/or maintenance feedback

• Deviations/atypical events

• Unplanned downtime

• Performance degradation

• Change history

The analysis for operational prioritization may be more or less formal. In its simplest form, prioritization can be the
product of criticality and likelihood of failure. In-depth evaluation (e.g., FMEA) may include further quantification of
severity, likelihood, and detectability to determine risk. Bad actors are the equipment of highest importance, having
the worst or unacceptable performance, as shown in Figure 4.4.
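In its simplest form, the prioritization described above is the product of criticality and current likelihood of failure. The Python sketch below assumes both are 1 to 5 ratings and uses a hypothetical bad actor threshold; an in-depth evaluation such as FMEA would replace this single product with a fuller risk calculation.

```python
def prioritize(assets: dict[str, tuple[int, int]], bad_actor_threshold: int):
    """Rank assets by criticality x current failure likelihood.

    assets maps a tag to (criticality rating, current likelihood rating),
    both assumed to be on 1-5 scales; the likelihood rating would be
    refreshed from current indicators such as unplanned downtime.
    """
    scores = {tag: crit * likelihood for tag, (crit, likelihood) in assets.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    bad_actors = [tag for tag, score in ranked if score >= bad_actor_threshold]
    return ranked, bad_actors


# Hypothetical portfolio: (criticality, current likelihood)
portfolio = {"P-101": (5, 4), "AHU-3": (4, 2), "T-205": (2, 5), "CT-1": (5, 1)}
ranked, bad_actors = prioritize(portfolio, bad_actor_threshold=15)
print(ranked)      # [('P-101', 20), ('T-205', 10), ('AHU-3', 8), ('CT-1', 5)]
print(bad_actors)  # ['P-101']
```

In this example, T-205 performs worst but is not flagged because of its low criticality, and CT-1 is highly critical but currently performing well; only P-101 combines high importance with poor performance.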

Figure 4.4: Equipment Bad Actors

4.3.3 Spare Parts

Spare parts management is an integral part of the maintenance and reliability strategy. Assessment of spare parts is
relevant both as an independent element of strategy as well as a potential mitigator for identified asset and equipment
risks.

4.3.3.1 Spare Part Types

Spare parts inventory may serve different functional purposes with respect to equipment reliability, as applied to
execution and strategy. It is important to understand the intention of these spare parts types. The purposes and risks
to each should be considered during stocking evaluations.

Spare parts can be identified as:

• PM parts: Parts that experience regular wear and tear through equipment operation, and are routinely and
proactively replaced. Whether time-based or condition-based, the parts volumes for proactive maintenance are
predictable. While the known quantities may be stocked or ordered in advance of the scheduled activity, PM
parts are likely to also serve as repair parts.

• Repair parts: Parts that are necessary to complete CM on failed or poorly performing equipment. These parts,
stocked or procured in a timely manner, serve as insurance for operational availability. Repair parts may be
subdivided further as either critical or non-critical spare parts.

- Critical parts: Parts that are essential for the equipment’s operational requirements (e.g., safety, quality,
business) and must be stocked due to the risk (consequences) of the equipment being unavailable. Critical
parts are stocked as insurance for equipment availability, where criticality is an effect of acquisition lead time
and/or cost. For critical parts, the impact of equipment downtime justifies the acquisition and carrying costs
of holding the part in stock.

- Non-critical parts: Parts that are either not essential to the equipment’s operational requirements or whose
acquisition does not delay repair beyond the point at which the consequences (risk) become unacceptable. Based
upon risk, non-critical parts are not stocked in inventory. When stocking non-critical parts is considered as a matter of
convenience, the inventory impact (e.g., space, cost) should be evaluated.

• Repairable parts: Parts that are technically and economically able to be refurbished. Upon removal, whether
through corrective or proactive maintenance activity, a repairable part is rebuilt and returned to inventory as an
available spare part.

• Structural parts: Parts that provide the integrity of facilities, buildings, and equipment. Structural elements are
unlikely to fail under normal operating conditions. While elements constructed of common building materials
may be of little concern, the criticality and availability of specialized components may warrant consideration in
stocking (e.g., electronic access controls).

• Maintained parts: Parts that exhibit failure modes during storage and are therefore preventively maintained while
in stock. PM of spare parts in inventory ensures their functionality until they are deployed in service. Long-lead,
low-turnover, high-cost, critical spares (e.g., large electric motors) are possible candidates.

A given component in inventory may be identified as more than one type. For example, a mechanical seal that is
routinely replaced under PM, has a significant lead time, is practical to refurbish, and is critically important to
equipment availability would be considered a PM part, a critical repair part, and a repairable part.
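Because one part can carry several types at once, a bit-flag representation is a natural fit when recording part types in inventory data. The following is a minimal Python sketch, not a prescribed classification method. It assumes hypothetical part attributes (`replaced_under_pm`, `long_lead_time`, `refurbishable`, `availability_critical`, `degrades_in_storage`) and a deliberately simplified rule that treats a repair part as critical only when it is both availability-critical and long-lead.

```python
from dataclasses import dataclass
from enum import Flag, auto


class SparePartType(Flag):
    """Spare part types; a single part may hold several types at once."""
    PM = auto()
    CRITICAL_REPAIR = auto()
    NON_CRITICAL_REPAIR = auto()
    REPAIRABLE = auto()
    STRUCTURAL = auto()  # listed for completeness; not derived in this sketch
    MAINTAINED = auto()


@dataclass
class SparePart:
    # Hypothetical attributes chosen for illustration only.
    description: str
    replaced_under_pm: bool
    long_lead_time: bool
    refurbishable: bool
    availability_critical: bool
    degrades_in_storage: bool = False

    def classify(self) -> SparePartType:
        result = SparePartType(0)  # start with no types set
        if self.replaced_under_pm:
            result |= SparePartType.PM
        # Simplified rule (assumption): a repair part is critical when equipment
        # availability matters and the part cannot be obtained quickly.
        if self.availability_critical and self.long_lead_time:
            result |= SparePartType.CRITICAL_REPAIR
        else:
            result |= SparePartType.NON_CRITICAL_REPAIR
        if self.refurbishable:
            result |= SparePartType.REPAIRABLE
        if self.degrades_in_storage:
            result |= SparePartType.MAINTAINED
        return result


# The mechanical seal example from the text.
seal = SparePart("mechanical seal", replaced_under_pm=True, long_lead_time=True,
                 refurbishable=True, availability_critical=True)
print(seal.classify())
```

Storing the combined flag, rather than a single category, keeps the PM, critical repair, and repairable roles of the same stock item visible during stocking evaluations.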

From a risk perspective, consumable supplies should not be confused with spare parts. Consumables are often
needed as a part of maintenance activity and may be included in MRO inventory, whether storeroom, floor stock, or
bin stock. These common items are likely to be commodities purchased in bulk to be readily available, including but
not limited to:

• Lubricants

• Gaskets

• Filters

• Fasteners

• Finishes (e.g., paint)

• Cleaning agents

• Forms

• Personal Protective Equipment (PPE)

4.3.3.2 Stocking Evaluations

The decision to stock a spare part should be based on a combination of factors, reflecting:

• Asset Criticality

• Impact of the part with respect to functional requirements

• Extent of impact until part is available (i.e., lead time)

• Usage requirements, including preventive and corrective activity

• Expected turnover rate (i.e., annual usage)

• Inventory carrying costs

• Order costs

A spare parts assessment determines the following:

• Spare part criticality

• Stocking decision, based upon criticality and/or the maintenance strategy

• If stocked, the appropriate stocking parameters (e.g., minimum, maximum, reorder)

A visual of these determination factors is provided in Figure 4.5.

Figure 4.5: Critical Spare Parts
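The stocking assessment described above (criticality, stocking decision, and stocking parameters) can be sketched in code. The following minimal Python example assumes a simplified decision rule (stock critical parts, procure non-critical parts on demand). It sets the minimum as expected usage during the lead time plus an optional safety stock, and uses the classic economic order quantity (EOQ) formula, which balances order costs against carrying costs, to size the gap between minimum and maximum. The attribute names and the choice of EOQ are illustrative assumptions, not a method prescribed by this Guide.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class StockingParameters:
    minimum: int  # reorder point: place an order when on-hand stock reaches this level
    maximum: int  # target stock level after replenishment


@dataclass
class PartAssessment:
    # Hypothetical assessment inputs.
    critical: bool
    annual_usage: float      # expected units per year, preventive plus corrective
    lead_time_days: float
    unit_cost: float
    order_cost: float        # cost of placing one order
    carrying_rate: float     # annual carrying cost as a fraction of unit cost (> 0)
    safety_stock: int = 0

    def stocking_parameters(self) -> Optional[StockingParameters]:
        """Return min/max parameters, or None if the part is not stocked."""
        # Simplified rule (assumption): stock critical parts only.
        if not self.critical:
            return None
        lead_time_demand = self.annual_usage * self.lead_time_days / 365.0
        # Keep at least one unit of a critical part on hand.
        minimum = max(1, math.ceil(lead_time_demand) + self.safety_stock)
        # EOQ = sqrt(2 * annual demand * order cost / annual holding cost per unit).
        holding_cost = self.unit_cost * self.carrying_rate
        eoq = math.sqrt(2 * self.annual_usage * self.order_cost / holding_cost)
        return StockingParameters(minimum=minimum, maximum=minimum + max(1, round(eoq)))


# Hypothetical critical bearing: 4 used per year, 90-day lead time.
bearing = PartAssessment(critical=True, annual_usage=4, lead_time_days=90,
                         unit_cost=500.0, order_cost=50.0, carrying_rate=0.25)
print(bearing.stocking_parameters())
```

For low-usage critical spares, EOQ will often round to one or two units, so a fixed maximum based on criticality and cost may be simpler in practice.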

Special consideration must be given to compliance aspects of spare parts and their replacement. Expectations for
material control and change control may have significant impact on maintaining compliance. Examples include:

• Materials of construction for GMP product-contacting and potential product-contacting parts, having direct and
indirect contact with product, respectively

• Materials of construction for Process Safety Management governed processes and equipment

• Instrumentation certifications to satisfy the required Safety Instrumented Functions (SIF) and Safety Integrity
Levels (SIL) within a Safety Instrumented System (SIS).

4.3.4 Risk Management Review

Risk management is a dynamic process. As equipment and its context are not static, managing equipment reliability
warrants active participation. Given the stakeholder investment in performing risk evaluations, apply diligence in risk
management throughout the equipment lifecycle.

The key to managing risk effectively is effective mitigation. Once mitigating actions and priorities are identified,
follow through on the committed actions and timelines. Monitor and report on risk mitigation to ensure progress.
Considerations include:

• Timing and progress of mitigation actions

• Resource availability

• Financial impact

• Effectiveness of implemented mitigations

In addition to monitoring the progress of the current strategy and tactics, it is also valuable to revisit equipment risk
and results on a periodic basis. Continue to engage stakeholders to maintain alignment. Reviewing risk evaluations
and Asset Criticality provides the opportunity to reconsider scoring, mitigation, and priorities based upon changes in
the equipment context. An annual review typically aligns with traditional business processes (e.g., budgeting and
capital planning); however, the review frequency may be adjusted according to how much the operating context
changes. Considerations include:

• Equipment modification

• Equipment performance

• Operating requirements

• Operating context (e.g., impact severity)

• Maintenance (e.g., precision, predictive, preventive)

• Operating budget

• Asset Criticality

• Capital plan

5 Supplier Management
Each organization has a unique combination of assets, objectives, constraints, stakeholder expectations, and
strengths and weaknesses. Each organization also has challenges and opportunities in coordination, alignment of
purpose, delivering performance, controlling costs and risks, retaining stakeholder confidence, and ensuring future
sustainability. [15]

Suppliers provide products and services that can enhance equipment reliability across all phases of the asset
lifecycle. The pharmaceutical industry has unique requirements and challenges with respect to managing equipment-
related supplier activities. Effectively managing supplier activities has the power to boost customer service, eliminate
equipment-related drug shortages, reduce operating costs, ensure compliance with regulatory requirements, and
improve the financial performance of an organization. By applying the correct equipment reliability tools, systems, and
processes, asset managers can ensure that supplier activities will provide maximum value to an organization.

Organizations need to develop a sound equipment reliability program before they can effectively incorporate the
supplier activities that can enhance equipment reliability. Figure 5.1 shows a sound reliability strategy that addresses
all phases of a system’s lifecycle.

Figure 5.1: A Sound Reliability Strategy that Addresses all Phases of a System’s Lifecycle (modified)
Source: Department of Army Technical Manual – Reliability/Availability of Electrical & Mechanical Systems [44]

5.1 Failure Reporting, Analysis, and Corrective Action System (FRACAS)

Capturing, analyzing, and incorporating corrective actions through the use of a FRACAS process is a key component
of a strong equipment reliability program.

Most failures do not just happen; they are caused, which means they are preventable. A closed loop FRACAS
process provides information about system failure, what happened to cause failure, and how to prevent the failure
from recurring. Suppliers provide valuable input throughout the equipment reliability and FRACAS processes.

Figure 5.2: FRACAS Process


Used with permission from Johnson & Johnson, www.jnj.com. [45]

“The FRACAS process requires a source of data before it can be implemented. Once data begins to become
available and the definition and implementation of processes has begun, a working FRACAS should be in place
and failure data collected for all failure data, based on the equipment technical strategy.” [45]

This system supplies the details needed to give meaningful feedback to suppliers of services, assets,
components, and items.
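A closed-loop FRACAS can be thought of as a set of records that cannot be closed until the failure has been analyzed, a corrective action implemented, and its effectiveness verified. The minimal Python sketch below illustrates that idea along with a simple per-supplier feedback summary. The stage names, fields, and the rule that an ineffective action returns the record for re-analysis are assumptions for illustration; they are not the process shown in Figure 5.2.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    # Hypothetical stage names for a closed-loop FRACAS record.
    REPORTED = 1
    ANALYZED = 2
    CORRECTED = 3
    VERIFIED = 4


@dataclass
class FailureRecord:
    asset_id: str
    supplier: str
    failure: str
    root_cause: str = ""
    corrective_action: str = ""
    stage: Stage = Stage.REPORTED

    def analyze(self, root_cause: str) -> None:
        self._require(Stage.REPORTED)
        self.root_cause = root_cause
        self.stage = Stage.ANALYZED

    def correct(self, action: str) -> None:
        self._require(Stage.ANALYZED)
        self.corrective_action = action
        self.stage = Stage.CORRECTED

    def verify(self, effective: bool) -> None:
        self._require(Stage.CORRECTED)
        # An ineffective action sends the record back for re-analysis,
        # so the loop only closes on a verified corrective action.
        self.stage = Stage.VERIFIED if effective else Stage.REPORTED

    def _require(self, expected: Stage) -> None:
        if self.stage is not expected:
            raise ValueError(f"expected {expected.name}, found {self.stage.name}")


def supplier_feedback(records: List[FailureRecord]) -> Dict[str, List[str]]:
    """Group verified root causes and corrective actions by supplier."""
    feedback: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record.stage is Stage.VERIFIED:
            feedback[record.supplier].append(
                f"{record.asset_id}: {record.root_cause} -> {record.corrective_action}"
            )
    return dict(feedback)


# Hypothetical example record.
rec = FailureRecord("PUMP-101", "Supplier A", "seal leak")
rec.analyze("incorrect seal face material")
rec.correct("specify compatible seal face material")
rec.verify(effective=True)
print(supplier_feedback([rec]))
```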

5.2 Supplier Activities Throughout the Asset Lifecycle

The following sections of this chapter describe typical supplier activities and reliability tools that can be used to help
manage these activities throughout the asset lifecycle.

Suppliers can provide valuable input, knowledge, training, and resources for the following equipment reliability elements:

• Reliability analysis and predictions

• Maintainability analysis and predictions

• FMECA

• FRACAS

• Parts and materials reliability assurance

• Critical items analysis

• Monitoring/control of subcontractors and suppliers

• Environmental effects analysis

• Reliability development, testing, and qualification

• Providing an interrelationship between reliability and FRACAS results

• Ensuring reliability performance levels are maintained

• Reliability provisions for spares

• Development and demonstration of product maintainability and testability

5.3 Supplier Products and Services

Suppliers are categorized into several different types, such as contractors, vendors, MRO spare parts manufacturers,
consultants, and service providers. Typical supplier products and services include:

• MRO supplies

• Measurement equipment

• Calibration services

• Maintenance services

• Shutdown services

• PdM equipment and services

• CMMS/EAM (Enterprise Asset Management) software

• Fabrication services

• Engineering services

• Training and certification services

• Instrumentation products and services

• Process controls products and services

• Energy management products and services

• Contract manufacturing services

• Cleanroom products and services

• Equipment manufacturers, including services

• Regulatory consulting services

5.3.1 Types of Suppliers: Equipment Reliability Consultants

Organizations often use outside consultants to make significant strides towards reliability improvements. Consultants
should provide a broad range of equipment reliability knowledge. An experienced consultant has helped other
organizations with similar problems implement reliability processes successfully. In addition, experienced consultants
have seen how change can be accomplished in multiple industries by employing many and varied methods, and they
have experience facilitating change efforts, a very important ingredient for success. Bringing external influence into
the company often enables an organization to create the vision it needs to be successful. It is also helpful if the
consultant has pharmaceutical industry experience to facilitate implementation of a compliant program. Consultants
who do not have pharmaceutical industry experience may lack an appreciation for regulated industry requirements
associated with change management, GxP, data integrity, qualification and validation knowledge, and
documentation.

5.3.2 Types of Suppliers: Equipment Manufacturers

During the design phase, it is critical to work closely with equipment manufacturers to thoroughly analyze the lifecycle
costs, verify that the equipment is adequately designed for its intended use, and evaluate all available options. One of
the fundamentals of a good reliability program is a process to verify that the equipment is designed correctly. In the
design phase, it is much less expensive to address equipment design issues, include key options, and identify ways to
design out the need for maintenance than it is to make changes after the equipment is installed and operating.

Organizations should ensure that manufacturers provide reliability guarantees of prolonged performance, such as
availability or uptime of equipment over its lifecycle, efficiency, MTBF, and annual operating and maintenance costs. [45]

Equipment manufacturers can also provide valuable input with the development of a condition monitoring program.
Their insight and experience can be leveraged to identify the best predictive and condition-based technologies that
can be applied to monitor the condition of their equipment.

It is also important to include equipment manufacturers in ongoing continual improvement programs. They should be
included in failure analysis and in identifying solutions for improving equipment reliability, efficiency, and potential
cost savings.

5.3.3 Types of Suppliers: Service Contract Management

It is often necessary to establish a service contract for highly specialized equipment or for equipment for which
in-house maintenance expertise is not available. The process for developing and managing service contracts can
play an important role in the organization’s equipment reliability program.

When establishing a service contract with a supplier, it is important to structure the contract to include incentives for
improving reliability and performance. The typical service contract specifies:

• Contract duration

• Service scope

• Price

• Maintenance and repair services that the service provider is and is not responsible for providing

• Terms and conditions

The service contract should also include performance-based, or outcome-based, incentives that reward the supplier
for information sharing and cost-saving improvements. These incentives can motivate a culture of shared responsibility
and encourage the supplier to make decisions that serve the organization’s best interest, reduce unnecessary
maintenance, and focus on long-term equipment reliability improvements. The purpose of performance-based, or
outcome-based, incentives in the service contract is to lead the supplier to choose a maintenance policy that maximizes
equipment reliability and thereby increases the combined cost savings and profits for the organization and the supplier.
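To make the idea of an outcome-based incentive concrete, the following Python sketch adjusts a contract fee according to how far achieved equipment availability lands above or below a target, with the adjustment capped in both directions. The availability metric, the rate per percentage point, and the cap are hypothetical contract terms chosen for illustration; actual incentive structures are negotiated case by case.

```python
def adjusted_fee(base_fee: float, target_pct: float, achieved_pct: float,
                 rate_per_point: float, cap_fraction: float) -> float:
    """Return the contract fee after a capped, outcome-based adjustment.

    target_pct and achieved_pct are availability percentages (e.g., 95.0).
    rate_per_point is the fraction of base_fee earned (or lost) per percentage
    point above (or below) target; cap_fraction limits the total adjustment
    to plus or minus that fraction of base_fee.
    """
    delta_points = achieved_pct - target_pct
    adjustment = base_fee * rate_per_point * delta_points
    cap = base_fee * cap_fraction
    adjustment = max(-cap, min(cap, adjustment))  # clamp to the cap
    return base_fee + adjustment


# Hypothetical terms: 2% of the fee per point, capped at 10% of the fee.
print(adjusted_fee(100_000, 95.0, 97.0, 0.02, 0.10))
print(adjusted_fee(100_000, 95.0, 80.0, 0.02, 0.10))
```

Capping the adjustment in both directions limits the exposure of each party while still tying part of the supplier's profit to the reliability outcome.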

A formal Supplier Management Program should detail the process for supplier selection and ongoing management of
key suppliers, including metrics to track supplier quality, cost, and service level.

The following is a list of items that should be included in a performance-based contract:

“1. There is a clearly defined process and associated procedures in place for selection of all Maintenance/
Operations Service Providers at the site.

2. The site has a process in place to ensure that all Maintenance and Operations Service Providers work in
accordance with [approved] practices (CMMS, Work Order System, Compliance, Safety, Quality, etc.)

3. Common objectives are developed with on-site service providers to achieve site goals.

4. Regular meetings with major service providers are used to communicate performance and align goals,
utilizing defined metrics and targets.

5. Performance-based contracts, in which profit is at risk for performance (e.g., response time, skills, and
training), are utilized for all major service providers.

6. There is a documented process in place for evaluating in-house vs. out-sources for specific systems
maintained based on criticality, internal skills level, and cost.

7. Third party or internal audits are used to measure the performance of major service suppliers.

8. A formal service provider consolidation strategy is in place and actively used.” [45]

5.4 Planning Phase: Design for Reliability and Front End Planning – “Design it Right”

During the planning phase, an organization should consider the organizational context and define value:

• Organizational Context: Provides a starting point for a line of sight connection between the organization’s
objectives and supplier activities.

• Define Value: Which organizational values will directly affect supplier activities (e.g., speed to market,
preventing drug shortages).

Supplier quality planning should include the following components [28]:

• “Describe requirements

• Identify technical and process information

• Identify potential supplier(s) (existing approved/new)

• Product/process risk assessment

• Identify controls

- Product specifications/part requirements, instructions

- Potential supplier contact details

- Supplier Risk Assessment

- Product/process controls”

The organization’s Business Processes (BP) and Quality Management System (QMS) will provide the formal
processes to be used during the planning phase. An Asset Management System (AMS) will provide the framework for
aligning the BPs and QMS with a line of sight from the business objectives to the assets that enable the organization
to achieve its objectives.

The QMS and BP will provide guidance for developing user requirement specifications, identifying assets that are
critical to quality, and defining Critical Quality Attributes and Critical Process Parameters. The QMS will support
key planning processes such as system impact assessments and risk assessments.

The BP considerations that are typically evaluated during the planning phase involve DfR and front end planning tools
and processes.

The AMS starts with upper management commitment in the form of an Asset Management Policy that provides the
organization’s vision, values, intentions, and direction about asset management that aligns with the business objectives.
A Strategic Asset Management Plan provides the guidelines for converting business objectives to asset management
objectives, the approach for developing asset management plans and the role of the Asset Management System in
supporting achievement of the asset management objectives. [25] Finally, the development of an Asset Management
Plan for a specific set of assets, called an asset portfolio, establishes the processes for making the best possible
decisions regarding the construction, operation, maintenance, renewal, replacement, expansion, and disposal of assets.

The following are typical processes that are used during the planning phase. It is important to determine the best way
to leverage supplier resources to assist with these processes.

5.4.1 Front End Planning (FEP)

FEP is a systematic and disciplined approach to the early definition of a project’s business and technical
requirements. The purpose of the FEP process is to create an environment very early in the asset lifecycle in which
team members can effectively analyze and address potential risks. With effective FEP, risks can be mitigated through
the development of detailed scope definition and the subsequent efficient use of project resources.

Considerations for performing FEP include:

• Development of user requirements, design specifications, and reliability specifications as part of the FEP
process, which helps to establish the project scope

• Proper fit, form, and function

• Lower project costs, shortened project timeframe, and fewer changes

• Opportunity to influence the outcome of a project and control costs, which has much greater impact during the
early stages of a project

Figures 5.3 and 5.4 illustrate the opportunity for influence and the value of FEP.

Figure 5.3: Front End Planning – Opportunity for Influence


Used with permission from CAI, www.cagents.com.

Figure 5.4: The Value of Front End Definition [8]

5.4.2 Design for Reliability (DfR)

DfR is a methodology that can be applied to the engineering design function as well as to influence the procurement
and capital project functions. The following list of activities, tools, techniques, specifications, and approaches are
elements of a robust DfR program: [26]

• Phase Gate Criteria for Integrating DfR Elements into the Capital Project Process

• Engineering Design Reviews

• DfR Value Proposition Communication and Training Plan

• Request for Information (RFI)

• CMMS Data Mining and Analysis

• Interviews to identify failure modes of concern on similar assets and systems

• M&R Integrated into Engineering Specifications and URS

• OEM Performance Criteria

• LCCA

• Asset Hierarchy Design

• Configuration Management

• Asset Criticality analysis

• Digital Asset Management Strategy

• Reliability, Availability, and Maintainability (RAM) Analysis

• Reliability Centered Design (RCD) Methodology

• Maintenance Shop Design

• Maintenance Strategy Approach Decision and Development

Specifically, DfR supports asset decision making in any of these ways:

• Assess future resource requirements (budgeting)

• Assess the feasibility of investing in a particular asset (“go” or “no go”)

• Assess comparative costs of alternative design options (design appraisal)

• Decide between sources of supply (source selection)

• Identify changes in the design to optimize performance, eliminate or minimize bottlenecks, and make decisions
that positively affect the overall LCC and reliability

• Select optimum asset preservation strategies (to maximize asset life)

• DfR representatives attend and support engineering design reviews

• DfR representatives bring a Defect Discovery and Prevention (DD&P) mindset to the process

5.4.3 System/Asset Criticality and Risk Assessment

Asset Criticality is a risk matrix that is used to assess risk and establish system/asset criticality to determine what is to
be achieved, establish priorities, and align and target resources.
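As a simple illustration of how such a matrix can rank assets, the Python sketch below works in three steps. It takes the severity of consequence as the worst of the safety, quality, and business impacts, multiplies it by likelihood on 1 to 5 scales, and bins the resulting score into criticality classes. The scales, the use of the maximum, the thresholds, and the example assets are all hypothetical; each organization defines its own matrix.

```python
from typing import Dict, Tuple


def criticality(safety: int, quality: int, business: int, likelihood: int) -> Tuple[int, str]:
    """Return (risk score, criticality class) on hypothetical 1-5 scales."""
    for value in (safety, quality, business, likelihood):
        if not 1 <= value <= 5:
            raise ValueError("all ratings must be between 1 and 5")
    severity = max(safety, quality, business)  # worst-case consequence
    score = severity * likelihood
    if score >= 15:
        label = "high"
    elif score >= 6:
        label = "medium"
    else:
        label = "low"
    return score, label


# Hypothetical assets: (safety, quality, business, likelihood)
assets: Dict[str, Tuple[int, int, int, int]] = {
    "WFI still": (2, 5, 4, 3),
    "Air handling unit AHU-2": (1, 4, 3, 2),
    "Office chiller": (1, 1, 2, 2),
}

# Rank assets by score so resources can be targeted at the highest risks first.
for name in sorted(assets, key=lambda a: criticality(*assets[a])[0], reverse=True):
    print(name, criticality(*assets[name]))
```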

5.4.4 Essential Information Management (Master Data)

Essential information management involves determining what equipment reliability information needs to be collected
to support reliability processes and tools.

5.4.5 Critical Items Design, Elimination, Test, and Inspection List Standard and Checklist

Critical items elimination involves identifying and potentially eliminating critical items before the design specifications
are completed. For the remaining items, identify critical design features, tests, inspection points and procedures that
will minimize the probability of failure of an asset.

5.4.6 Single Point of Failure (SPF)

Identifying SPFs involves examining the effects of each single failure to determine whether it could lead directly to
loss of a safety-critical or mission-critical system function. Determine whether the SPF can be designed out of the
system or asset. If the SPF cannot be designed out, develop an SPF failure strategy to identify appropriate corrective
action and risk management activities, such as increasing part quality, increasing testing/inspection, or using fail-safe
design.
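One way to screen a design for SPFs is to model it as a reliability block diagram and flag any component whose removal alone breaks every path from input to output. Here the diagram is treated as a directed graph of success paths. The Python sketch below assumes that simplified structural model (the system works if any path survives) and uses a hypothetical pump skid with redundant pumps. It does not capture common-cause failures or partial degradation, which still require engineering analysis.

```python
from collections import deque
from typing import Dict, List, Set


def path_exists(graph: Dict[str, List[str]], source: str, sink: str, failed: str) -> bool:
    """Breadth-first search from source to sink, skipping the failed component."""
    seen: Set[str] = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in graph.get(node, []):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph: Dict[str, List[str]],
                             source: str = "IN", sink: str = "OUT") -> List[str]:
    """Return components whose individual failure leaves no success path."""
    components = set(graph) | {n for targets in graph.values() for n in targets}
    components -= {source, sink}
    return sorted(c for c in components if not path_exists(graph, source, sink, c))


# Hypothetical pump skid: one inlet valve, two redundant pumps, one check valve.
skid = {
    "IN": ["inlet_valve"],
    "inlet_valve": ["pump_A", "pump_B"],
    "pump_A": ["check_valve"],
    "pump_B": ["check_valve"],
    "check_valve": ["OUT"],
}
print(single_points_of_failure(skid))  # ['check_valve', 'inlet_valve']
```

In this example the redundant pumps are not flagged, while the inlet and check valves are, which points to where design-out or an SPF failure strategy is needed.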

5.4.7 Total Cost of Ownership (TCO)/Lifecycle Cost (LCC) Analysis

A lower purchase price does not necessarily equate to a lower TCO or LCC. A TCO/LCC analysis
should be conducted when considering critical system or asset sourcing options. It is important to analyze the entire
cost that an organization incurs throughout the lifecycle of the system or asset. The analysis includes the system
or asset price plus any costs which are jointly incurred by the supplier and the owner, and internal costs incurred
by the owner – material costs, labor costs, depreciation costs, energy costs, maintenance costs, availability, rebuild
frequency, parts obsolescence, EOL disposal (including spare parts inventory), etc. The TCO of a system or asset
also includes the non-price TCO components such as freight and packaging, inspection labor caused by the system
or asset in the owner’s organization and inventory carrying costs (parts availability), missed customer deliveries due
to shipment delays, and travel costs to visit and manage the supplier.
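A basic LCC comparison can be expressed as the present value of all costs over the asset lifecycle. The Python sketch below discounts hypothetical annual costs (for example, energy, maintenance, and downtime) and an end-of-life disposal cost back to the purchase date. The discount rate, cost categories, and figures are assumptions for illustration. A real analysis would include the full set of price and non-price components listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str
    purchase_price: float
    annual_costs: List[float]  # e.g., energy + maintenance + downtime, per year
    disposal_cost: float       # end-of-life cost, incurred in the final year


def lifecycle_cost(option: Option, discount_rate: float) -> float:
    """Present value of purchase, annual operating, and disposal costs."""
    total = option.purchase_price  # paid at year 0, so not discounted
    for year, cost in enumerate(option.annual_costs, start=1):
        total += cost / (1 + discount_rate) ** year
    final_year = len(option.annual_costs)
    total += option.disposal_cost / (1 + discount_rate) ** final_year
    return total


# Hypothetical 10-year comparison at an assumed 8% discount rate.
option_a = Option("A (lower price)", 100_000, [30_000] * 10, 5_000)
option_b = Option("B (higher price)", 140_000, [22_000] * 10, 5_000)
for opt in (option_a, option_b):
    print(f"{opt.name}: {lifecycle_cost(opt, 0.08):,.0f}")
```

With these made-up figures, option B comes out lower over the lifecycle despite its higher purchase price, which is the situation the paragraph above cautions against overlooking.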

5.4.8 Reliability, Availability, and Maintainability (RAM) Analysis

RAM are three related characteristics, or design attributes, of a system that have significant impacts on the
sustainability or total LCC of a system. Regarding factors that are considered in a RAM analysis, the Department of
Defense (DOD) Guide for Achieving Reliability, Availability, and Maintainability (August 2005) [27] states:

“Many factors are important to RAM: system design; manufacturing quality; the environment in which the system
is transported, handled, stored, and operated, the design and development of the support system; the level
of training and skills of the people operating and maintaining the system; the availability of material required
to repair the system; and the diagnostic aids and tools (instrumentation) available to them. All of these factors
must be understood to achieve a system with a desired level of RAM. During pre-systems acquisition, the most
important activity is to understand the user [requirements]. During system development, the most important RAM
activity is to identify potential failure mechanisms and to make design changes to remove them.”

• Reliability (R) is the probability of no failures over a given duration of time, cycles, etc. For repairable assets, a
common measurement is MTBF. For non-repairable assets, a common measurement is MTTF. Reliability can be
defined in the following two ways:

- Duration or probability of failure-free operation under stated conditions

- Probability that an asset performs its intended function for a specified period of time under stated conditions

• Availability (A) is the percentage of time an asset is available to perform its function(s).

Availability can be measured in the following ways:

A = Uptime ÷ Total Time; Total Time = Uptime + Downtime

A = MTBF ÷ (MTBF + MTTR)

Inherent availability is based solely on reliability and CM or repair as a function of the inherent design
characteristics of the asset. Operational availability is based on the reliability and repair, but also includes other
factors related to PM and logistics. Operational availability takes into account delays such as when spare parts or
maintenance personnel are not immediately available to support maintenance. Availability is impacted by:

- Frequency of occurrence of failures (level of reliability)

- Time required to restore operations following an asset failure or time required to perform maintenance to
prevent a failure (level of maintainability)

- Logistics to support maintenance of systems (number and availability of spare parts, maintenance
personnel, and other logistics resources)

• Maintainability (M) is the measure of the ability of an asset to be retained in or restored to a specified condition
when maintenance is performed by personnel having specified skill levels, using prescribed procedures and
resources, at each prescribed level of maintenance and repair. Another way to describe maintainability is a
measure of how quickly and economically asset failures can be prevented through PM or asset operation can be
restored following a failure through CM. Common measures of maintainability are MTBM for activities intended to
prevent failures and MTTR for CM. “Maintainability” is not the same as “maintenance”.
Maintainability is a design parameter, while maintenance consists of actions to correct or prevent a failure event.

Maintainability is a function of design features, such as standardization and the following “ility” factors:

- Accessibility

- Rig-ability

- Cleanability

- Repairability

- Operability

- Interchangeability and modularity

Maintainability includes designing with the human element of the asset in mind. The human element includes
operators and maintenance personnel. A system that is highly maintainable can be restored to full operation in a
minimum of time with a minimum expenditure of resources.
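The availability formulas above, together with a common reliability model, can be checked with a few lines of code. The Python sketch below assumes a constant failure rate, so that reliability over an operating time t is R(t) = exp(-t / MTBF). This exponential model is an assumption that suits random failures but not wear-out. The sketch also uses one common formulation of operational availability, MTBM ÷ (MTBM + mean downtime), where mean downtime includes PM and logistics delays. All input values are hypothetical.

```python
import math


def reliability(t: float, mtbf: float) -> float:
    """R(t) = exp(-t / MTBF), assuming a constant failure rate."""
    return math.exp(-t / mtbf)


def availability(uptime: float, downtime: float) -> float:
    """A = Uptime / Total Time, where Total Time = Uptime + Downtime."""
    return uptime / (uptime + downtime)


def inherent_availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR); considers corrective repair time only."""
    return mtbf / (mtbf + mttr)


def operational_availability(mtbm: float, mean_downtime: float) -> float:
    """One common form: MTBM / (MTBM + mean downtime incl. PM and logistics delays)."""
    return mtbm / (mtbm + mean_downtime)


# Hypothetical figures, in hours.
print(f"R(500 h)  = {reliability(500, 2000):.3f}")
print(f"A (times) = {availability(8500, 260):.4f}")
print(f"Ai        = {inherent_availability(2000, 8):.4f}")
print(f"Ao        = {operational_availability(1500, 40):.4f}")
```

Under the exponential assumption, only about 37% of units survive to an operating time equal to the MTBF (exp(-1) ≈ 0.368), so MTBF should not be read as a guaranteed failure-free period.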

5.4.9 Risk Management

ISO 55002 [25] states the following:

“Risk management is essential in developing asset management objectives and plans and ensuring decision
making is in line with organizational objectives and stakeholder requirements. . . Based on the risk attitude of the
organization’s top management, asset managers should establish a framework, including decision-making criteria,
to illustrate the relationship between this risk attitude and value creation within the risk management approach.”

5.4.10 Supplier Qualification Program

An effective supplier qualification program should include, at a minimum, the following essential elements:

• Pre-qualification assessment

• Supplier questionnaire and supplier response

• Management and QA evaluation

• Approval process – approved, unapproved, approval withdrawn, reapproval

• Ongoing performance monitoring/oversight

Suppliers should be classified into different risk levels. The risk factors should be included in the supplier qualification
program as noted below.

5.4.10.1 Pre-Qualification Assessment

The objective of a supplier pre-qualification assessment is to obtain enough information to perform a preliminary risk
assessment for a supplier.

The pre-qualification assessment should not be used by itself to qualify the supplier or to assess specific compliance
with applicable GMPs and other relevant regulatory standards. Verification of compliance with relevant regulations for
suppliers and components should occur during the formal supplier qualification.

Supplier considerations include:

• Country of origin

• Supplier regulatory history

• Supplier experience

• Supplier audit results

Documentation considerations include:

• Supplier specifications (comprehensive versus superficial)

• Technical documentation

• Claims substantiation

5.4.10.2 Supplier Questionnaire

The supplier questionnaire may be used in conjunction with other guidelines and protocols. Information received on
supplier questionnaires from potential suppliers serves as the basis for evaluating companies and assessing their
capabilities in line with GMPs and quality and reliability standards.

The following are examples of supplier questionnaire questions, adapted from the Pharmaceutical Quality Group’s
risk management guide [28]:

• “is the product or service off-the-shelf or custom made?

• how complex is the [manufacturing process or service]?

• is the process [or service] adequately defined and understood?

• what is the criticality of the product or service to the compliance of the end-product?

• would any product specification failure be detectable by the organization prior to use?

• what is the detectability of non-conformity in the product supplied and how it can be corrected?

• is packaging, storage and distribution fit for the product characteristics?

• is the supplier currently approved to supply products or services to the organization or are they a new
supplier?

• what is the percentage of supply to the organization’s business sector?”

The Pharmaceutical Quality Group’s guide [28] also states:

“Information about potential suppliers should be used to determine additional potential supply and business risks
and include the following:

• financial viability of supplier

• continuity of supply or service

• liability

• amount of work awarded to supplier in view of the supplier’s overall capacity

• technical capability

• distribution and transportation considerations

• agents and brokers (potential for agents and brokers to change source of supply)

• capital investment needed

• single source suppliers i.e. vulnerability

• supplier company legal status (licensing)

• ethical / political acceptability

• does the supplier have a disaster / contingency plan for supply?”

“The level of requirement depends on the level of potential risk (criticality).” [28]

5.4.10.3 Management and QA Evaluation

Management and QA are responsible for evaluating and selecting suppliers. It might be necessary to perform on-site
audits of suppliers that provide high-risk, critical materials or services to confirm the supplier conforms to GMP and
quality principles.

After management and QA have conducted a thorough supplier evaluation, they should document their decision and
rationale. A quality agreement should be completed for critical suppliers to ensure a mutual understanding between
the organization and supplier with respect to the quality and regulatory requirements relevant to the materials or
services supplied and risk management expectations.

5.4.10.4 Approval Process

At the end of the evaluation process, a supplier’s status is set to approved, conditionally approved, unapproved,
approval withdrawn, or reapproved. Suppliers of critical assets, components, or services must be approved and may
require requalification at regular intervals. An approved supplier list that designates critical suppliers must be maintained.
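The approved supplier list can be kept in any controlled system. The following minimal Python sketch only illustrates one idea: flagging critical suppliers whose requalification is due. The record fields, the requalification intervals, and the assumption that conditionally approved and reapproved suppliers remain usable are all hypothetical. In practice they would come from site procedures.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class ApprovalStatus(Enum):
    """The evaluation outcomes named in the text."""
    APPROVED = "approved"
    CONDITIONALLY_APPROVED = "conditionally approved"
    UNAPPROVED = "unapproved"
    APPROVAL_WITHDRAWN = "approval withdrawn"
    REAPPROVED = "reapproved"


# Assumption: suppliers in these states may currently be used.
USABLE = {
    ApprovalStatus.APPROVED,
    ApprovalStatus.CONDITIONALLY_APPROVED,
    ApprovalStatus.REAPPROVED,
}


@dataclass
class SupplierRecord:
    name: str
    status: ApprovalStatus
    critical: bool              # designated critical on the approved supplier list
    last_qualified: date
    requal_interval_days: int   # hypothetical, set by site procedure


def requalification_due(supplier_list, today):
    """Names of usable critical suppliers whose requalification date has been reached."""
    return [
        s.name
        for s in supplier_list
        if s.critical
        and s.status in USABLE
        and today >= s.last_qualified + timedelta(days=s.requal_interval_days)
    ]


asl = [
    SupplierRecord("Pump OEM A", ApprovalStatus.APPROVED, True, date(2019, 1, 15), 365),
    SupplierRecord("Filter supplier B", ApprovalStatus.CONDITIONALLY_APPROVED, True, date(2020, 3, 1), 365),
    SupplierRecord("Janitorial service C", ApprovalStatus.APPROVED, False, date(2018, 1, 1), 730),
]
print(requalification_due(asl, date(2020, 6, 1)))  # ['Pump OEM A']
```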

5.4.10.5 Supplier Performance Monitoring

Critical suppliers are generally included in a supplier performance monitoring program. The overall performance of
suppliers should be monitored on a regular basis to ensure expected performance metrics are clearly communicated
to the supplier, measured, and reviewed with the supplier. Supplier issues are addressed with appropriate corrective
and preventive action plans, and the effectiveness of corrective actions should be measured. The performance criteria
that are measured vary based on the product or service provided by the supplier. The supplier performance process
should assign a rating such as “Exceeds Expectations”, “Meets Expectations”, or “Does Not Meet Expectations”. The
following are examples of criteria to be evaluated:

• Purchase records

• On-time deliveries or performance of services

• Supplier representatives’ skills (expertise, training, and retraining) and change of personnel

• Supplier complaints

• Supplier corrective and preventive actions—open and completed deviations, out-of-specifications, etc.

• Critical supplier survey responses

• Audit findings
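The following minimal sketch shows how measured criteria might be rolled up into the three ratings named above. It assumes each criterion has already been scored from 0 to 100 for the review period. The criterion names, weights, and the 90 and 75 cut-offs are hypothetical.

```python
def rate_supplier(scores, weights, exceeds_at=90.0, meets_at=75.0):
    """Weighted 0-100 score and rating label for one review period.

    `scores` and `weights` are dicts keyed by criterion name; the criteria,
    weights, and 90/75 cut-offs are hypothetical placeholders.
    """
    if set(scores) != set(weights):
        raise ValueError("every criterion needs both a score and a weight")
    total_weight = sum(weights.values())
    weighted = sum(scores[c] * weights[c] for c in scores) / total_weight
    if weighted >= exceeds_at:
        label = "Exceeds Expectations"
    elif weighted >= meets_at:
        label = "Meets Expectations"
    else:
        label = "Does Not Meet Expectations"
    return round(weighted, 1), label


scores = {"on_time_delivery": 96, "complaints": 80, "capa_closure": 70, "audit_findings": 90}
weights = {"on_time_delivery": 4, "complaints": 2, "capa_closure": 2, "audit_findings": 2}
print(rate_supplier(scores, weights))  # (86.4, 'Meets Expectations')
```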

5.5 Installation Phase – Construction, Commissioning and Validation – “Install it Right”

The following are typical activities performed during the installation phase. Suppliers provide data, resources, and
training to assist with these tasks:

• Asset Registry – data hierarchy, attributes based on asset type, assess performance and failure modes, establish
life expectancy, record current replacement value, determine level of service and criticality

• Asset Management System – Asset Lifecycle Management – Asset Management Plan(s)

• RCM for critical systems and equipment

• Maintenance and Operations Strategy – TPM process, PMs, PdMs, and calibrations

• BOMs – specifications, materials of construction, determine what will be stored on-site, storage space
requirements, storage conditions, handling, receipt inspections

• Training – maintenance and operations

• SOPs, Forms

• Operational and Maintenance SIPOC and RACI

• Installation and Commissioning – focus on RAM and DfR

• System Turnover – training, user documentation, and vendor TOP

• Factory Acceptance Testing (FAT) and Start-Up Testing

• Qualifications – IQ/OQ/PQ

• Supplier and Vendor Management – spare parts management strategy

• Maintainability walkdowns – ensure required maintenance has adequate space and headroom for activities to be
completed in a safe and practical manner

During the Installation phase, it is important to manage the collection of supplier documentation and warranty
information, assist with the coordination of equipment delivery schedules, and work with suppliers to identify
opportunities for vendor supported maintenance and operations activities. This is the time to establish the scope of
services and contracts for ongoing technical support, routine equipment inspections, maintenance, rebuilds, spare
parts stocking strategies, training, etc.

5.6 Value Creation Phase – Operations and Maintenance – “Operate and Maintain it Right”

The value creation phase is where suppliers transition from the initial sales of equipment and services to selling the
output or value that their products and services deliver. It is time for the supplier to demonstrate their ability to deliver
on maximizing equipment and system uptime.

Suppliers provide ongoing support for the following operations and maintenance reliability processes:

• Asset Management Plans

• Ongoing RCM program and TPM process support

• Condition monitoring and failure mitigation – FRACAS

• RAM Analysis – during this phase, the most important RAM activity is to ensure quality in manufacturing so that
the inherent RAM qualities of the design are not degraded

• Critical Items Lists – manage failures and ensure asset management objectives are met

• OEE – availability, throughput, and first-pass quality

• Centerlining – centerline and lock in machine operating settings for optimal line OEE

• Bad actors program – systematic process to minimize repetitive or costly repairs, extend run lengths, and focus
maintenance expenditures on those items that impact plant reliability

• SPF monitoring

• KPIs – it is critical to select the “vital few” KPIs that will provide meaningful and actionable indicators

• PMs, PdMs – continuous monitoring

• Mistake proofing – using an automatic device or method that either makes it impossible for an error to occur or
makes the error immediately obvious once it has occurred

• Spare parts inventory management – managing critical and capital spares

• Planning, kitting, and scheduling process
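OEE, listed above, is commonly calculated as the product of three factors: availability, performance (actual throughput against the ideal rate), and quality (first-pass yield). The following minimal Python sketch uses hypothetical shift data.

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_units, good_units):
    """Return (availability, performance, quality, OEE) as fractions."""
    run_min = planned_min - downtime_min
    availability = run_min / planned_min
    performance = ideal_cycle_min * total_units / run_min   # actual vs. ideal throughput
    quality = good_units / total_units                      # first-pass yield
    return availability, performance, quality, availability * performance * quality


a, p, q, overall = oee(planned_min=500, downtime_min=50, ideal_cycle_min=0.5,
                       total_units=810, good_units=729)
print(f"A={a:.2f} P={p:.2f} Q={q:.2f} OEE={overall:.3f}")  # A=0.90 P=0.90 Q=0.90 OEE=0.729
```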

5.7 Value Optimization Phase – Lean/Six Sigma, Reliability Improvement – “Improve it Right”

The following improvement processes are used to optimize the value organizations receive from their assets through
increased asset reliability. Suppliers can play a key role in improving the reliability of assets. Suppliers provide
training, Subject Matter Experts (SMEs), technical support, analysis of data, and other services to support the
following improvement processes:

• Lean Tools (eliminate waste): 5S, TPM, RCA, Value Stream Mapping, A3 Problem Solving, etc.

Lean establishes a systematic approach to eliminating waste and creating flow throughout the whole
company. It can also be used to develop and implement a long-term plan to streamline operations for success.

Benefits are:

- Reduced cycle time

- Reduced inventory

- Reduced Work-in-Process (WIP)

- Reduced costs

- Increased capacity

- Improved lead times

- Increased productivity

- Improved quality

- Increased profits

• Six Sigma (reduce variation) – SPC, DMAIC approach, visual workplace, Pareto charts, etc.

• Reliability Tools – RCM, FMECA, failure analysis, etc.

• RAM Analysis – During this phase, the most important RAM activity is to monitor performance in order to
facilitate retention of RAM capability, to enable improvements in design (if there is to be a new design increment),
or of the support system (including the support concept, spare parts storage, etc.)

• Spare Parts Management Optimization – Continue to refine stocking strategy, vendor managed inventory,
obsolescence management

• System Operating Limits versus Reliability Operating Limits – Make sure operations understands the difference
between these limits and understands the consequences of operating beyond these limits. Suppliers can often
provide design, operational and reliability information to help define these limits.

• Supplier/Vendor Management – Vendor reduction process helps to look for opportunities to standardize and
consolidate vendor services, spares, etc.

• Unplanned Downtime – Ensure that unplanned downtime events are fed back and reviewed against current
maintenance plans to ensure maintenance tasks are effective, efficient and failure mode driven.

• Facility Condition Assessment (Asset Life Expectancies) – This should not be a “one-time” exercise but should
be an ongoing assessment that is updated on a routine basis. Supplier input is necessary to determine when
equipment and components (especially electronic items) will no longer be supported. The supplier will provide
upgrade and replacement information.

• Risk Management – In the context of equipment reliability, risk is the likelihood that something will happen that
causes asset damage, injury, or loss. Basic equipment reliability risk assessment is the combination of two
things, as illustrated in Figure 5.5:

- The likelihood that something will happen, and

- The consequences if it does

Risk is the likelihood that an operating event will reduce the reliability of the system, asset, or item to the point
that the consequences are unacceptable.

Figure 5.5: Equipment Risk Assessment
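The combination of likelihood and consequence shown in Figure 5.5 is often scored on ordinal scales, and the two scores are multiplied to rank risks. The following minimal sketch assumes hypothetical 1 to 5 scales, band cut-offs, and example failure events.

```python
def risk_score(likelihood, consequence):
    """Combine two 1-5 ordinal ratings into a single score (hypothetical scales)."""
    for rating in (likelihood, consequence):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return likelihood * consequence


def risk_level(score, high_at=15, medium_at=6):
    """Map a score to a band; the cut-offs are hypothetical."""
    if score >= high_at:
        return "High"
    if score >= medium_at:
        return "Medium"
    return "Low"


events = {
    "Seal leak on transfer pump": (4, 3),
    "Loss of WFI loop temperature control": (2, 5),
    "Exhaust fan bearing failure (redundant fan)": (3, 1),
}
# Print events from highest to lowest risk score
for name, (likelihood, consequence) in sorted(
        events.items(), key=lambda kv: -risk_score(*kv[1])):
    score = risk_score(likelihood, consequence)
    print(f"{score:>2} {risk_level(score):<6} {name}")
```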

Because failure events cannot always be prevented, organizations should plan and operate the system,
asset, or item so that when failure events do occur, their effects are manageable and the consequences are
acceptable. One of the keys to providing reliable processes is managing risks.
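Statistical process control (SPC), listed under the Six Sigma item above, can be illustrated with an individuals and moving range (I-MR) chart. This chart type suits data with one measurement per period. The following minimal sketch applies only the basic rule (a point outside the 3-sigma limits), and the baseline data are hypothetical.

```python
def individuals_limits(baseline):
    """Center line and 3-sigma limits for an individuals (I-MR) chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two points
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(points, limits):
    """Indexes of points outside the control limits (basic rule only)."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]


baseline = [10, 12, 11, 13, 12, 11, 10, 12]   # hypothetical, e.g., minutes per changeover
limits = individuals_limits(baseline)
print(out_of_control([11, 16, 12], limits))   # [1]
```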

5.8 End-of-Life Phase – Refurbish, Repurpose, Decommission, Replace – “Decommission it Right”

Regarding the EOL phase, Gransberg and O’Connor [29] state:

“Equipment life-cycle cost analysis (LCCA) is typically used as one component of the [asset management
process] and allows the [asset manager] to make equipment repair, replacement, and retention decisions on the
basis of [an asset’s economic or useful life]. The decision to repair, overhaul, or replace [an asset] is a function of
ownership and operating costs. Determining the remaining service life of an asset must take into consideration
physical life, profit life and economic life.”

The asset replacement decision takes into consideration “depreciation, inflation, investment, maintenance and
repairs, reliability (downtime) and obsolescence.” [29]

Suppliers provide valuable input to assist organizations with asset EOL decisions. Suppliers will typically inform
organizations when assets will no longer be supported or become obsolete. They will provide options for refurbishing
or replacing assets.

System refurbishing, replacement, and retirement decision and management tools include:

• Facility condition assessment

• Repair or replace criteria

• EOL decision-making criteria

• Electronic component lifecycle management (special consideration due to short life span)

• Decommissioning process – spare parts removal

Ownership costs can be determined by computing the Equivalent Uniform Annual Cost (EUAC) of the initial costs and
the estimated salvage value, as shown in Figure 5.6.

Figure 5.6: Economic Life of Equipment Based on the Cost Minimization Method
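The cost minimization method behind Figure 5.6 can be sketched in three steps. For each candidate retirement year n, take the first cost, the discounted salvage value at year n, and the discounted operating costs through year n. Convert these into an equivalent uniform annual cost. Then keep the n with the lowest value. Folding operating costs into the total is an assumption of this sketch, because the paragraph above covers only ownership costs. The asset data and discount rates are hypothetical.

```python
def capital_recovery_factor(rate, years):
    """(A/P, i, n): converts a present value into an equal annual amount."""
    if rate == 0:
        return 1.0 / years
    growth = (1 + rate) ** years
    return rate * growth / (growth - 1)


def euac_by_life(first_cost, salvage, operating, rate):
    """Total EUAC for retiring the asset after n = 1..N years.

    salvage[n-1] is the resale value at the end of year n and operating[t]
    is the operating and maintenance cost incurred in year t+1.
    """
    if len(salvage) != len(operating):
        raise ValueError("salvage and operating must cover the same years")
    results = []
    for n in range(1, len(operating) + 1):
        pv_salvage = salvage[n - 1] / (1 + rate) ** n
        pv_operating = sum(operating[t] / (1 + rate) ** (t + 1) for t in range(n))
        results.append((first_cost - pv_salvage + pv_operating)
                       * capital_recovery_factor(rate, n))
    return results


def economic_life(first_cost, salvage, operating, rate):
    """Years of service with the lowest total EUAC, and that EUAC."""
    costs = euac_by_life(first_cost, salvage, operating, rate)
    best = min(range(len(costs)), key=costs.__getitem__)
    return best + 1, costs[best]


# Hypothetical asset data (thousands of dollars)
salvage = [60, 40, 25, 15, 10]
operating = [10, 15, 22, 32, 45]
print(economic_life(100, salvage, operating, rate=0.0))   # economic life of 3 years at zero interest
print(economic_life(100, salvage, operating, rate=0.08))
```

At a zero discount rate, the sketch reduces to averaging total net cost over n years. This makes the year-3 minimum easy to check by hand: (100 - 25 + 47) / 3 ≈ 40.7.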

After an asset is purchased, installed and used, it eventually begins to wear out and suffer mechanical problems. At
some point, it reaches the end of its useful life and must be replaced. [29] Suppliers provide support for assets and
can also play an important part in determining the asset EOL. Suppliers should inform users of obsolescence and of
the dates after which they will no longer provide updates, upgrades, etc.

Thus, a major element of profitable asset management is the process of making the asset replacement decision. This
decision involves determining when it is no longer economically feasible to repair or refurbish an asset. The following
are components of an asset management economic decision-making model [29]:

• “[Asset] life: Determining the estimated useful life for a given [asset].

• Replacement analysis: Analytic tools to compare alternatives to replace [an asset] that has reached the end
of its useful life.

• Replacement equipment selection: Methods to make [the optimal decision based on options and alternatives
that provide the best solution to the asset] replacement decision.”

Gransberg and O’Connor [29] also state:

“The economic life, alternative selection and replacement timing of equipment can be determined using
replacement analysis. The methods can be categorized as either theoretical replacement methods or practical
replacement methods . . . Determining the appropriate timing to replace a piece of equipment requires that its
owner include not only ownership costs and operating costs, but also other costs that are associated with owning
and operating the given piece of equipment. These include depreciation, inflation, investment, maintenance,
repair, downtime, and obsolescence costs.”

The following methods are commonly used to predict asset EOL [29]:

• Minimum Cost Method

• Maximum Profit Method

• Payback Period Method

• Mathematical Modeling Method
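The Payback Period Method can be illustrated with a short sketch. It assumes that the annual savings of a replacement have already been estimated, for example avoided repairs, downtime, and operating costs relative to keeping the existing asset. The figures are hypothetical.

```python
def payback_period(investment, annual_savings):
    """Years until cumulative savings recover the investment.

    Interpolates linearly within the final year; returns None if the
    savings never recover the investment within the horizon given.
    """
    cumulative = 0.0
    for year, saving in enumerate(annual_savings, start=1):
        if saving > 0 and cumulative + saving >= investment:
            return year - 1 + (investment - cumulative) / saving
        cumulative += saving
    return None


# Hypothetical: a replacement costing 250 that saves 80, 90, 100, 100 per year
print(payback_period(250, [80, 90, 100, 100]))   # 2.8
```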

6 Operations and Maintenance


This chapter describes how operations and maintenance affect equipment reliability over the total lifecycle,
where maintenance should be applied and improved, and where data-driven decisions can be made based on asset
performance.

6.1 Risk-Based Decisions

The operations and maintenance departments are key stakeholders and have a large input into a risk-based
decision-making process. They have first-hand knowledge of how the equipment operates, common failure modes,
maintenance issues, and often the effects of failures as they have occurred in the environment. In a situation
where new assets are being introduced, operations and maintenance can also feed into the risk assessments with
knowledge of past operational experiences.

This tribal knowledge (gained from first-hand experience) of operations and maintenance personnel could override
data-based sources where credibility of the data could be in question. For example, more emphasis may be placed
on an operator’s experience with the reliability of a system rather than recorded corrective repair work orders where
maintenance data is not readily available or reliable.

Examples of risk-based decisions commonly made in the operations and maintenance phase of an equipment
lifecycle include:

• PM strategy changes (frequency, tasks, etc.)

• Calibration strategy changes

• Equipment lifecycle replacement

• Spare parts stock versus non-stock

• Shutdown cycle times

• Equipment reprioritization

6.2 Lifecycle Cost

The lifecycle cost of an asset can be described as the total cost expected from an asset over the life span of the
equipment. The upfront CAPEX for an asset is typically only about 15% of the actual spend expected over the entire
lifecycle, with the majority being spent during the operations and maintenance phase. [41] Cost management during
the operations and maintenance phase should begin with a risk-based analysis of the highest cost contributors,
followed by a systematic approach to controlling those costs throughout the phase. Because maintaining a system is
costly, maintenance is often identified as an easy place to cut expenses, but such cuts can lead to future problems
with equipment reliability.

Key areas where costs can be managed during the operation and maintenance phase include:

• Spare parts usage: Tracking the total cost of spare parts and/or consumables of an asset can be a valuable
source of information to determine if a system is wearing out. An increase in spare parts usage could indicate
premature wear due to various reasons.

• Outside service provider usage: A risk-based analysis of which maintenance activities can be brought in-house
and removed from outside service providers is an effective way to control costs.

• Performance monitoring: Trending of adverse conditions to determine where focused equipment improvement
should be directed. See Section 6.7 for information on incident management.
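Trending spare parts usage, as described above, can start with something as simple as a least-squares slope of spend per period for each asset. The following minimal sketch assumes hypothetical monthly spend data and a hypothetical alert threshold.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against period index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def rising_spares_spend(spend_by_asset, min_slope):
    """Assets whose spare-parts spend per period is rising faster than min_slope."""
    return sorted(asset for asset, spend in spend_by_asset.items()
                  if trend_slope(spend) > min_slope)


monthly_spend = {   # hypothetical monthly spare-parts cost per asset
    "Granulator 1": [400, 420, 480, 530, 610, 700],
    "Chiller 2": [300, 310, 290, 305, 300, 295],
}
print(rising_spares_spend(monthly_spend, min_slope=20))   # ['Granulator 1']
```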

6.3 Supplier Documentation – Expectations

Equipment can be introduced to a facility through a small capital project (e.g., new freezer or pump) or as part of
a large capital project (e.g., air handling unit, air compressor, or water purification skid, etc. during construction of
a new production facility). Either way, the owner of the equipment has certain requirements for the documentation
provided by the equipment supplier/manufacturer. The extent of the documentation varies from an Operations and
Maintenance manual to a full set of TOP that includes a system description and equipment cut sheets, BOM, MOC
certifications, as-built drawings, weld logs, etc.

One of the main concerns with supplier documentation is that it might not provide the expected information or enough
detail required by the customer. It is recommended that the documentation requirements are determined at the
beginning of each project through the URS. Involving suppliers at the early stage of the project, preferably design,
sets clear expectations from the owner perspective and informs what is needed from the supplier and at what stage
of the project execution.

It is a good practice to develop a standard TOP table of contents. Not all sections of the standard format will contain
information for a given system; therefore, it can be modified to contain the information necessary for the TOP being
submitted. A supplier may submit the documents as hard copies, in electronic format, or both. The required format
should be clearly communicated to the supplier at the beginning of the project, as a late change could cause delays in
project execution and/or increase the cost.

Recommended good practices for document management include:

• Maintaining hard copies of documents in a controlled environment (temperature, humidity, light, fire protection,
insect, pest, etc.) with controlled access

• Use of a web-based file sharing system to provide employees with access to electronic format of documents

• Updates to documents/TOPs when changes are made to the equipment, as appropriate; the personnel in charge
of any updates or upgrades to equipment (typically engineering, SME, or system owner) are responsible for the
revision

• Procedure for document management that includes the process for submitting, filing, storing, accessing, and
maintaining documents

6.4 Maintenance and Calibration Programs

Maintenance and calibration programs are a well-known regulatory requirement; specific examples include US FDA
21 CFR § 211.58, 67, and 68 [30], and Chapters 3, 5, and 6 of the EU GMP Guide [31]. These requirements purposefully
omit detail on how to efficiently sustain a maintenance and calibration program. This section of the Guide
provides definitions and best practices for various maintenance and calibration functions.

In the most simplistic sense, the maintenance and calibration functions within a facility exist to maintain the reliability
of a site—repairing and/or replacing equipment before it enters the wear-out zone of the failure profile as shown in
Figure 6.1.

Figure 6.1: Classic Failure Profile for Preventive Maintenance

In Figure 6.1, PM activities are performed before the wear-out zone, when the probability of failure is just
starting to rise. These PM activities take the form of periodic overhauls, intrusive replacements of spare parts,
or major repairs. Performing PM work then “resets” the profile back to the flat line until the next PM interval.
The major assumption with this profile is that the equipment experiences only infrequent, predictable (non-random)
failures between PM intervals; therefore, the invasive PM should be performed before entering the wear-out zone.
Although this is an ideal scenario that does not take into account random failures and changes in performance
requirements, it is the basis for most maintenance programs at implementation.

Maintenance and calibration programs should be a mutual agreement between the operations and maintenance
functions, with input from quality and validation functions where applicable, such that all parties agree on what
activities will be performed and when. It should be understood that these PM and calibration activities are
performed to avoid entering the wear-out zone as indicated above, but also that reactive/CM can still occur, as
reducing the probability of failure of a system to zero is costly and almost always impossible.
Additionally, a PdM program, if implemented properly, can prevent systems from entering the wear-out zone. PdM
uses technology to identify failure modes well before they become evident through more intrusive proactive
maintenance routines such as visual inspections or overhauls.

For all maintenance strategies, it is important that collaboration between the operations and maintenance functions
is well established. Operations personnel, SMEs, reliability engineers, and maintenance teams should work together
when developing, executing, and investigating maintenance activities. Each group has a unique view of the system
being maintained and offers varying information when building a maintenance strategy.

One way to visually represent where the different maintenance programs interact with an asset’s lifecycle is with
a failure curve, as shown in Figure 6.2. This figure represents a continuation of Figure 6.1, where the wear-out
zone begins after the point where the failure starts to occur. Well before failures become detectable with visual
inspections, early signals are possible to detect with sophisticated technologies such as ultrasonic, vibration, oil, and
thermographic analysis. By the time the failure becomes hot, audible, or visible to the operator/technician, it is
often more costly to repair and requires a more intrusive solution to bring the equipment back to the base condition.
This is often when PM is performed, as it relies on a mostly visual method of detection.

Figure 6.2: Theoretical Failure Curve Representing Points of Failure and Their Respective Detection Types

The ISPE Good Practice Guide: Maintenance [2] outlines examples of what to include when building a maintenance
strategy and program (refer to Chapter 4 of the Maintenance Guide). The following sections in this Guide describe in
more detail the various maintenance strategies that exist.

6.4.1 Preventive Maintenance (PM)

As described previously, PM is required on equipment to restore its reliability and performance. A PM can be defined
as regularly scheduled work that is intended to avoid, delay, or detect the onset of failure to ensure that the equipment
continues to function as required by the user. A maintenance strategy involving PM should be developed with sound
principles such as RCM methodology, FMECA, or OEM recommendations. At a minimum, PM strategies should exist
for all systems in a facility that could impact HSE, quality, supply, or cost. Ideally this would be determined through the
process of a criticality analysis. See Section 4.3.1 for more information on performing an Asset Criticality analysis.

Caution should be taken when selecting a PM strategy because there is a point of diminishing returns as more
complex and frequent tasks are added. This is due to the inherent nature of a PM being intrusive and often requiring
downtime of the system. A balanced PM strategy lets a system continue to run until its functional properties no
longer perform to specification, and then replaces or overhauls the affected items, bringing the system back to its
base condition.

6.4.2 Predictive Maintenance (PdM)

PdM is a best practice that should be part of any equipment reliability program. Understanding a system’s condition
using predictive technologies drives higher reliability and availability by detecting failure modes well before the
probability of failure becomes too high a risk. Corrective repair work can be planned and scheduled, and resources
can be allocated in an orderly way, saving costs and creating trust between the maintenance and operations
functions. PdM is a form of condition monitoring that allows overhauls to be planned more effectively based on
understanding when repairs are needed.

Organizations need to balance the cost of implementation with the cost of the repair of assets when determining
which PdM technology to deploy. A low-cost entry method is to contract out the PdM program to a trusted vendor
who will collect, analyze, and report back findings from the data. This function can then be matured and brought in-
house as expertise is gained within the facility. The benefits of bringing the PdM program in-house include a closer
relationship to the actual asset and a lower long-term cost after purchasing the diagnostic equipment. A best practice
is to gain proper training from ASTM [32] accredited training centers when developing an in-house PdM team.

PdM may also be implemented in the form of cloud-based analysis using advanced machine learning or artificial
intelligence technologies. Potential equipment anomalies could be detected by monitoring operational patterns and
alerting the maintenance and operations functions of imminent failure. This may be implemented with existing sensor
networks throughout the facility, e.g., from a Production Control System, or by procuring and installing sensors that
report various types of conditions to the cloud (e.g., tri-axial vibration, ultrasonic, or temperature). This type of
“off-premises” PdM support would be an added layer of monitoring to the established PdM program within the facility.
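The anomaly detection described above can use machine learning, but a much simpler statistical baseline shows the underlying idea: compare each new reading with the mean and standard deviation of the readings just before it. This rolling z-score sketch is a simpler swapped-in technique, not a machine learning method. The window, threshold, and vibration data are hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_alerts(readings, window=5, threshold=3.0):
    """Indexes of readings that deviate from the preceding `window` readings
    by more than `threshold` standard deviations (window must be >= 2)."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        sigma = stdev(baseline)
        if sigma == 0:
            continue   # flat baseline: z-score undefined, so this reading is skipped
        if abs(readings[i] - mean(baseline)) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical overall vibration velocity readings (mm/s) from one pump
vibration = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 2.05, 3.5, 2.0]
print(rolling_zscore_alerts(vibration))   # [7]
```

A spike also enters the baseline for the readings that follow it and temporarily widens that baseline. This is one reason more robust methods are generally preferred in production monitoring.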

For additional information on PdM, including examples of PdM technologies, see Chapter 9 (Appendix 3).

6.4.3 Corrective Maintenance (CM)

While the objective of PM and PdM is to reduce the amount of CM occurrences, a CM strategy is necessary for any
pharmaceutical facility. The goal of the strategy should be to maintain or increase the availability of the system(s)
and to decrease the risk to the process, personnel, and environment. This strategy should include, at a minimum,
an analysis of what equipment is deemed Run-to-Failure (RTF) and a process of escalation when failures do occur.
Typically, RTF scenarios are for equipment with minimal or no impacts to HSE, quality, or cost. Examples of RTF
equipment may be inconsequential valves in a water treatment system or exhaust fans with 100% redundancy. In this
way, equipment is only maintained when necessary; however, the cost of maintenance after failure could be increased.

The process of escalation and documentation of CM work can be just as important as a functional PM program. A
maintenance strategy should be in place that accounts for the expected failure frequency of equipment (e.g., its
MTBF) so that sufficient resources are in place to mitigate the failures. An escalation process may include standards for troubleshooting,
OPLs, or guidelines that describe scenarios of equipment failure and subsequent actions to take. These are in place
to limit the amount of downtime or damage to affected equipment when a failure does occur. If the impact to the
product is not known at the time of failure, an impact assessment should be performed and any resulting deviations
or corrective actions developed. Documentation of the CM could serve as evidence in further RCA, or even during
regulatory audits. Care needs to be taken when documenting the work performed to ensure that the problem
discovered and any corrective actions taken are well described.
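MTBF, mentioned above, can be estimated from operating hours and the number of recorded failures. That estimate can then help size the resources needed to respond to failures. The following minimal sketch uses hypothetical pump data. The expected-failure estimate assumes a constant failure rate, which is only reasonable outside the wear-out zone.

```python
def mtbf(operating_hours, failures):
    """Mean time between failures for a repairable asset."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


def expected_failures(horizon_hours, mtbf_hours):
    """Expected failure count over a planning horizon.

    Assumes a constant failure rate, which does not hold once the asset
    is in its wear-out zone.
    """
    return horizon_hours / mtbf_hours


pump_mtbf = mtbf(operating_hours=17_520, failures=4)
print(pump_mtbf)                             # 4380.0
print(expected_failures(8_760, pump_mtbf))   # 2.0
```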

One measure of the effectiveness of a site’s reliability program is the ratio of proactive maintenance to CM. A
higher percentage of CM to PM would imply that a site’s maintenance strategy is too reactive, and improvement of the
PM program is most likely necessary. A combination of 20/40/40, that is, 20% CM (emergency work), 40% CM
(non-emergency), and 40% PM, is considered optimal and an acceptable industry target. [33]
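The work mix can be tracked against the 20/40/40 target stated above by classifying closed work orders. The following minimal sketch assumes that work orders are counted, although a site may prefer labor hours instead. The category labels and data are hypothetical.

```python
from collections import Counter

# Target mix as stated in the text [33]; categories are work-order types.
TARGET_MIX = {"CM-emergency": 0.20, "CM-non-emergency": 0.40, "PM": 0.40}


def work_mix(work_order_types):
    """Fraction of work orders in each target category."""
    counts = Counter(work_order_types)
    unknown = set(counts) - set(TARGET_MIX)
    if unknown:
        raise ValueError(f"unclassified work-order types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no work orders supplied")
    return {category: counts[category] / total for category in TARGET_MIX}


def gap_to_target(mix):
    """Positive values mean more of that work type than the target."""
    return {category: round(mix[category] - TARGET_MIX[category], 3)
            for category in TARGET_MIX}


orders = ["CM-emergency"] * 30 + ["CM-non-emergency"] * 40 + ["PM"] * 30
print(gap_to_target(work_mix(orders)))
# {'CM-emergency': 0.1, 'CM-non-emergency': 0.0, 'PM': -0.1}
```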

6.4.4 Calibration

A calibration program is similar to a PM program in that the tasks are usually time-based and intrusive to operations,
and instruments have to be taken off-line to complete them. It is also similar in that it is a regulatory requirement for a
site to have a program in place [30, 31]. Instruments should be segregated into critical and non-critical based on their
impact to GMP and product quality. A master list of instruments in these categories should be readily available, as this
is a common request during internal and external audits and regulatory inspections.

The calibration of instruments can have a significant impact on operations and maintenance. Calibrations are
intended to determine that an instrument is operating within the control limits of the process it is monitoring. With
appropriate tolerances, the calibration process ensures that the designed specifications of a process are acceptable.
An unacceptable drift in an instrument’s calibration could result in a deviation from standard, loss of product, or
regulatory findings. Calibrations are often scheduled conservatively, resulting in over-calibration
to compensate for the risk of these impacts. Calibration frequencies should be determined by the criticality,
operating conditions, stability of the process, and the history of the instrument. As with PMs, adhering to the OEM’s
recommended frequency is an acceptable place to start when building a calibration program. With enough passing
results and an understanding of acceptable drift in measurements, frequencies can be extended on a case-by-case
basis. A risk analysis should be documented for each modification to an instrument’s calibration frequency. Similar
to PMs, an RCM approach to instrumentation and calibration may also be an acceptable method of determining the
strategy to maintain the intended function and operational requirements of the system.
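A documented rule for proposing calibration interval changes might look like the following sketch. Every parameter is hypothetical: the number of consecutive passes, the drift limit (half the tolerance), the extension factor, and the cap are all placeholders. The function only proposes a change, and that change would still go through the documented risk analysis described above.

```python
def propose_interval(current_days, as_found_errors, tolerance,
                     passes_required=3, drift_fraction=0.5,
                     extend_factor=1.5, max_days=730):
    """Propose (not approve) a calibration interval from recent as-found errors.

    as_found_errors is ordered oldest to newest, in the instrument's units.
    All parameters are hypothetical placeholders for a site's own rules.
    """
    if as_found_errors and abs(as_found_errors[-1]) > tolerance:
        return max(1, current_days // 2)          # latest result out of tolerance: shorten
    recent = as_found_errors[-passes_required:]
    stable = (len(recent) == passes_required and
              all(abs(e) <= drift_fraction * tolerance for e in recent))
    if stable:
        # consecutive passes with low drift: propose an extension, capped
        return min(max_days, int(current_days * extend_factor))
    return current_days                           # otherwise keep the current interval


print(propose_interval(180, [0.05, 0.08, 0.04], tolerance=0.2))   # 270
print(propose_interval(180, [0.05, 0.15, 0.04], tolerance=0.2))   # 180
print(propose_interval(180, [0.05, 0.08, 0.25], tolerance=0.2))   # 90
```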

In addition to GMP and quality implications, instrumentation calibration can also impact the HSE aspects of a
facility. For example, some sites are classified as Process Safety Management facilities, where a regulatory
agency enforces control of processes that store, manufacture, or handle highly hazardous materials. Heptane,
methylene chloride, and ethanol are examples of hazardous chemicals used in the manufacturing of drug products.
These could be stored and handled in high volumes or under high pressure, where process control is critical.
Calibration of instruments used in these processes would be considered highly critical, whether or not it has
an impact on product quality.

Non-PSM (Process Safety Management) facilities are also likely to have instrumentation that requires calibration for
non-GMP purposes. For example, effluent flow from a waste processing stream in a pharmaceutical manufacturing
application could be regulated by the local authorities to stay within a specified pH tolerance and not to exceed
a total cumulative flow or flow rate over a given period. Failure to control the waste stream could result in a Notice
of Violation, which may indicate that the facility was negligent in its environmental responsibilities and could damage
the organization’s public reputation.
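The effluent example shows why the pH probe and flowmeter calibrations matter: both compliance figures are computed directly from their readings. The sketch below checks logged samples against a pH band and totalizes flow with the trapezoidal rule. The permit limits, units, and the function name `check_effluent` are hypothetical assumptions for illustration only.

```python
from typing import List, Tuple


def check_effluent(samples: List[Tuple[float, float, float]], ph_low: float,
                   ph_high: float, max_total_m3: float) -> dict:
    """Check logged effluent data against hypothetical permit limits.

    samples: (time_h, flow_m3_per_h, ph) tuples, sorted by time.
    """
    # Times at which the measured pH fell outside the permitted band
    ph_excursions = [t for t, _, ph in samples if not ph_low <= ph <= ph_high]

    # Totalize flow over the period with the trapezoidal rule
    total_m3 = 0.0
    for (t0, q0, _), (t1, q1, _) in zip(samples, samples[1:]):
        total_m3 += 0.5 * (q0 + q1) * (t1 - t0)

    return {
        "ph_excursions": ph_excursions,
        "total_m3": total_m3,
        "volume_exceeded": total_m3 > max_total_m3,
    }
```

A drifting pH probe could hide a real excursion, and a drifting flowmeter biases the totalized volume, so both instruments carry compliance risk even though neither affects product quality.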

6.5 Establishing and Managing Support Services

There are multiple reasons why an organization may want to establish a relationship with outside service providers
to support its operation. The decision for this partnership is commonly made when a task requires a special skill or
certification that is not available in-house, to reduce operational and maintenance cost, due to insufficient internal
resources, or to keep the focus of staff on other matters.

Organizations should have a strategic approach in selecting and managing outside service providers to achieve the
best value out of this partnership. Although cost is one of the main reasons for outsourcing services, it should not be
the only focus in this process.

See also ISPE Good Practice Guide: Maintenance [2], Chapter 6 for information about contractor management and
outsourcing.
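One common way to keep cost from dominating supplier selection is a weighted scoring matrix. The sketch below is a hypothetical illustration: the criteria, the weights, and the 1 to 5 scoring scale are assumptions that each organization would define for itself.

```python
from typing import Dict

# Hypothetical criteria and weights; each organization would define its own.
WEIGHTS: Dict[str, float] = {
    "cost": 0.30,
    "quality_compliance": 0.30,
    "technical_capability": 0.20,
    "safety_record": 0.10,
    "responsiveness": 0.10,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine 1-5 criterion scores into one weighted score (1-5 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank_suppliers(candidates: Dict[str, Dict[str, float]]) -> list:
    """Return (supplier, score) pairs, best first."""
    ranked = [(name, weighted_score(s)) for name, s in candidates.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

In this scheme a lowest-cost supplier with weak quality and compliance scores can rank below a more expensive but better-qualified one, which reflects the principle that cost should not be the only focus.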

6.5.1 Identification of Business Need

There should be an established process to identify the business need for hiring outside service providers. This
process can be initiated by an individual employee or a cross-functional team. It is recommended to involve key
stakeholders and impacted functions early in this process and clearly identify the need for the service and the
expectations from the supplier.

6.5.2 Supplier Qualification and Management

Supplier qualification refers to the process by which a supplier is evaluated to confirm that it is qualified to perform
the work and complies with applicable quality standards, prior to its acceptance as a provider of goods or services.
Depending on the nature of the service, this process should include a risk assessment for critical tasks that the
supplier will be performing, covering compliance, safety, sharing of confidential information, etc. It is a good practice to have
an approved list for critical and non-critical suppliers/service providers and provide guidance regarding the minimum
requirements applicable to each category. It is recommended to have backup qualified supplier(s) so that production
is not affected if the first vendor is not available. Refer to Chapter 5 for further information regarding supplier
qualification and management.

6.6 Performance Monitoring

Performance monitoring is essential for a successful reliability program. It involves evaluating, managing, budgeting,
and making decisions about equipment. Performance monitoring indicators should be meaningful and important to all
levels of the organization.

Examples of performance indicators are listed below, grouped into those that impact the business and those that run
the business:

• Impact the Business

- Spare parts cost

- OEE

- Equipment maintenance cost, person-hours, downtime

- Equipment uptime/downtime

- Production schedule adherence due to equipment failure

- Calibration Out of Tolerance (OOT)

• Run the Business

- PM/CM ratio

- PM and calibration schedule compliance

- PM, PdM findings ratio

- Work order schedule compliance

- Deviation rate

- Number of Corrective and Preventive Actions (CAPAs), incidents, observations

For more details on performance indicators, refer to Section 3.3.3 Continual Improvement.
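Several of the "run the business" indicators can be computed directly from CMMS work order data. The sketch below assumes a hypothetical, simplified work order record and count-based definitions (for example, schedule compliance as the share of PM and calibration work orders completed by their due date plus an optional grace window). Organizations define these KPIs differently, for example by labor hours rather than counts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkOrder:
    kind: str                       # "PM", "CM", or "CAL" (hypothetical codes)
    due_day: int                    # scheduled due date, as a day number
    completed_day: Optional[int]    # None if not yet completed
    out_of_tolerance: bool = False  # only meaningful for "CAL" work orders


def kpis(orders: List[WorkOrder], grace_days: int = 0) -> dict:
    pm = [o for o in orders if o.kind == "PM"]
    cm = [o for o in orders if o.kind == "CM"]
    cal = [o for o in orders if o.kind == "CAL"]
    scheduled = pm + cal

    def on_time(o: WorkOrder) -> bool:
        return o.completed_day is not None and o.completed_day <= o.due_day + grace_days

    return {
        # PM/CM ratio: count of planned work orders relative to corrective ones
        "pm_cm_ratio": len(pm) / len(cm) if cm else float("inf"),
        # PM and calibration schedule compliance, as a fraction
        "schedule_compliance": (sum(on_time(o) for o in scheduled) / len(scheduled)
                                if scheduled else None),
        # Calibration out-of-tolerance (OOT) rate
        "oot_rate": sum(o.out_of_tolerance for o in cal) / len(cal) if cal else None,
    }
```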

6.7 Incident Management

When a piece of equipment fails, it is important to understand why it happened, how to fix it, and how to prevent
a repeat failure. Organizations should have a systematic approach for addressing equipment failures to prevent
costly breakdowns, reduce equipment downtime, improve efficiency, and avoid regulatory observations. Incident
management is a data-driven process, preferably performed by SMEs, to determine the root cause of a failure and
take corrective actions to fix the problem and prevent future incidents.

6.7.1 Event Identification

The first step of any failure analysis process is to understand the problem. The investigator needs to know what
happened, when it happened, and where it happened to form a clear picture of the problem. A good problem
statement should be brief, specific, and based on facts. The following questions should be considered when
constructing the problem statement:

• What is the specific problem?

• Who is affected by the problem?

• What is the impact of the problem?

• Where is the problem occurring? (geographic or process location)

• How often does this problem occur?

6.7.2 Incident Investigation

Early in the process, the general approach for conducting the investigation should be clearly defined and the
team members should be identified (including SMEs from relevant functions if necessary). An effective incident
investigation should:

• Be data-driven and based on factual information and events

• Identify the main cause(s) of the problem

• Find and address the core cause(s) rather than acting upon the symptoms

• Establish the relationships between the root cause(s) and the incident

• Properly document the failure analysis process to provide a defensible investigation

• Identify actions to prevent the incident from recurring

Selecting the right tool(s) for the investigation and understanding the results are key components of an effective
incident investigation. There are numerous tools that can be utilized for incident investigation. The following
methodologies are commonly used for incident investigation and RCA:

• FMEA

• 5 Whys

• Pareto chart

• Fishbone diagram

• Is/is not matrix

• Barrier analysis

• Cause mapping

• Scatter diagram

• Fault Tree Analysis (FTA) tool

Refer to Chapter 9 (Appendix 3) for more information about RCA and FMEA.
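As an illustration of one of the listed tools, a simple fault tree can be evaluated quantitatively. The sketch below assumes that all basic events are independent and that no basic event appears more than once in the tree (shared events would require a minimal cut set analysis instead). The event names and probabilities are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    probability: float  # probability of the event over the period of interest


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]]


def probability(node: Union[Gate, BasicEvent]) -> float:
    """Top-event probability, assuming all basic events are independent."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return math.prod(probs)  # all inputs must occur
    if node.kind == "OR":
        return 1.0 - math.prod(1.0 - p for p in probs)  # at least one input occurs
    raise ValueError(f"unknown gate type: {node.kind}")
```

For a hypothetical top event "loss of supply" = (primary pump fails AND standby pump fails) OR supply valve fails, with probabilities 0.1, 0.1, and 0.01, the sketch gives about 0.0199, showing that the single valve contributes about as much risk as the redundant pump pair.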

6.7.3 Corrective and Preventive Actions (CAPA)

CAPAs are implemented to eliminate or reduce the likelihood of a repeat incident and to prevent any other potential
nonconformity. It is critical to ensure that the root cause(s) and/or contributing factors are clearly understood before
evaluating the need for a corrective or preventive action.

Consider the following when selecting the action to address the root cause during incident investigation:

• Whether the immediate correction addressed the root cause of the event

• Any interim controls necessary to reduce the risk during implementation

• Cost to implement

• Time required to implement

• Complexity of CAPA implementation

• Sustainability of the chosen solution

• Potential trials required to test the solution

The characteristics of an effective CAPA are as follows:

• Defined by specific actions with specific results

• Tied to the root cause of the event

• Deliverable

• Measurable

• Sustainable

6.7.4 Effectiveness Check

Effectiveness monitoring is required to ensure that the CAPA has been implemented successfully according to the
CAPA plan and is effective in addressing the root cause of the problem. This will determine if the CAPA met the
desired outcome and had no negative impact after implementation. The success or failure of the CAPA should be
clearly defined before implementation, and quantitative and/or qualitative data should be used to provide conclusions
about the effectiveness.
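A quantitative effectiveness check can compare the failure rate before and after the CAPA against criteria fixed before implementation. The sketch below is a hypothetical example: the required reduction, the minimum observation window, and the "no new failure modes" condition are assumed criteria. With small failure counts, raw rate comparisons are noisy, so a statistical test may be more appropriate in practice.

```python
from dataclasses import dataclass


@dataclass
class EffectivenessCriteria:
    # Defined and approved BEFORE the CAPA is implemented (hypothetical values)
    min_reduction: float   # required fractional reduction in failure rate, e.g. 0.5
    observation_days: int  # minimum post-implementation observation window


def capa_effective(failures_before: int, days_before: int,
                   failures_after: int, days_after: int,
                   new_failure_modes: int, criteria: EffectivenessCriteria) -> bool:
    """Compare failure rates before and after a CAPA against predefined criteria."""
    if days_after < criteria.observation_days:
        raise ValueError("observation window not yet complete")
    rate_before = failures_before / days_before
    if rate_before == 0:
        raise ValueError("no baseline failures to compare against")
    rate_after = failures_after / days_after
    reduction = 1.0 - rate_after / rate_before
    # Effective only if the target reduction is met and no new failure modes appeared
    return reduction >= criteria.min_reduction and new_failure_modes == 0
```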

6.7.5 Knowledge Sharing

A key factor in avoiding repeat failures is sharing the findings once an investigation is complete and an effective
solution has been developed. Organizations should create a culture of learning from failures through effective
communication among employees. It is a good practice to have a knowledge sharing platform with templates available
for employees to share findings with others. The most effective method of sharing the knowledge is to communicate it
directly to the impacted users through discussion in site meetings, team meetings, tiered meetings, pre-job briefings,
shift changes, or any other setting where affected personnel meet and communicate. To facilitate ongoing sharing,
incident lessons can be retained on a web-based file sharing site.
Table 6.1 is an example of a knowledge sharing template.

Table 6.1: Example Knowledge Sharing Template

One-Point Lesson Title:

Relevant information (Work Order, Incident Investigation Report #, Deviation #, etc.):

Key Words:

Incident/Problem:

Solution/Lessons Learned:

Author:

Date:

6.8 Change Management

Organizations should have a change management system in order to evaluate and implement any changes in a GxP
environment. This process is normally managed by quality systems and requires a systematic approach to implement
changes with GxP impact. The process is discussed in detail in the ISPE PQLI® Guide: Part 3 – Change Management
System as a Key Element of a Pharmaceutical Quality System [34].

This section is intended to discuss the change management process and industry good practices from the reliability
engineering perspective. The scope can vary from changes implemented in a new process or computerized system
to replacing outdated equipment or a discontinued component.

A change management procedure should be followed for successful change implementation and to prevent any
undesired outcomes. The procedure should include the following steps:

• Change Justification: The change management process starts with identifying the need for the change (e.g.,
process improvement, cost saving, obsolete parts, etc.). The change can be identified by process owners,
engineering, maintenance personnel or even contractors performing work on-site. It is recommended to involve
SMEs early in this process and seek their opinion about the change.

• Risk Assessment: Evaluating the risk of the change is one of the most important steps in the change management
process. It should include risk identification, risk analysis, and risk control. Refer to ICH Q9 [16].

• Change Approval Process (Pre and Post): The change needs to be approved before and after implementation
by the process owner and/or SME.

• Implementation: The change control implementation plan must ensure execution of all pre, post and phased
action items identified in change control. The action items must be clearly defined with specific deliverables.

• Documentation: Updating SOPs, maintenance and calibration procedures, BOM, spare part list and drawings.

• Training: The requirement for training should be identified during risk assessment and completed pre and/or
post change implementation.

• Effectiveness Check: It is recommended to evaluate the success of the change after implementation. This will
verify whether the change objective(s) have been met without any negative outcome. The success criteria and
evaluation period must be clearly pre-defined.

In case the change involves component replacement, the change management process should define different types
of replacement part(s) and the subsequent actions that must be taken to complete the process. The replacement part
falls in one of the categories below:

• Exact for Exact: Identical to the original part and does not need any further action.

• Like for Like (functionally equivalent): Refers to the replacement components that perform the same function
as the one being replaced. The replacement part may come from a different manufacturer and not be identical to
the part being replaced. However, it has the same specifications (e.g., MOC, process range, etc.) as the original
part with minimal or no impact on the parent system in terms of installation, automation, and control changes.

• Non-Equivalent: A component that differs from the original part in specifications or size and requires significant
changes to the parent system.
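The three replacement categories can be expressed as a simple decision rule. In the sketch below, "identical" is approximated by matching manufacturer and part number, and "same specifications" by comparing a dictionary of key specifications (e.g., MOC, process range); both simplifications and all part data are hypothetical. The `needs_significant_system_change` flag stands in for the SME or change-control judgment on installation, automation, and control impact.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Part:
    manufacturer: str
    part_number: str
    specs: Dict[str, str] = field(default_factory=dict)  # e.g. MOC, process range


def classify_replacement(original: Part, replacement: Part,
                         needs_significant_system_change: bool) -> str:
    """Assign a replacement category using a hypothetical decision rule."""
    if (replacement.manufacturer == original.manufacturer
            and replacement.part_number == original.part_number):
        return "Exact for Exact"   # identical part: no further action
    if replacement.specs == original.specs and not needs_significant_system_change:
        return "Like for Like"     # functionally equivalent
    return "Non-Equivalent"        # different specifications or significant changes
```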

6.9 Periodic Review

Periodic review of equipment reliability helps to verify the health of the equipment within a facility and to identify
problem areas. The output of the review should feed into the 10-year capital plan (as discussed in Chapter 3),
describe where equipment may be failing at unacceptable rates, and drive actions that will adapt the overall
maintenance strategy.

Periodic reviews should include a comprehensive “look back” to the last performed review and should consider the
following data sources:

• CMMS (including master data information and work order history)

• Change controls

• Deviation quality records

• Calibration records

• Process performance, if applicable

• Regulatory standings

A reliability best practice is making the review of these records a repeatable and sustainable process, i.e., including
references to “live” production data as much as possible so that a “refresh” of the review will be connected to the
same data sources. For example, if the work order history is needed for the periodic review, consider making a
template where the work order information can be stored and referenced easily without data manipulation. The site
guidelines and procedures need to be followed when working with data extracted from validated systems to ensure
that the data is accurate and contemporaneous. For best practices associated with data integrity, refer to the ISPE
GAMP® Guide: Records and Data Integrity Guide [35].

Periodic reviews of a facility can also identify system changes that may not have triggered a change in the
maintenance strategy. Adverse trends in CM work order count may indicate that a more robust PM strategy is
needed. Such a finding does not necessarily mean that PM tasks should be performed more frequently; it is more
likely that the PM tasks are not effective. Adverse trends in equipment health should also trigger the use of tools
such as RCA, RCN, and Six Sigma to mitigate the possible issue. Documentation and accountability for the
corrections help to ensure that the trend does not repeat.

Another reliability best practice is the use of a bad actor list that uses a Pareto chart to rank equipment from the
largest to the smallest number of failures (or another measure of equipment health). As shown in Figure 6.3, failure
counts can be an acceptable measure for determining bad actors. The premise of a bad actor Pareto chart is that most
failures typically occur in a small number of equipment items. This view allows the operations and maintenance
functions to focus on the most important equipment and to drive improvement.

Figure 6.3: Hypothetical Bad Actor Pareto Chart by Equipment Failure Count

Alternatively, the category (or X axis) of the Pareto chart could be repeat failure modes, root causes, or other factors
that may lend themselves to be grouped and similarly sorted to show where the majority of the issues are occurring.
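The data behind a bad actor Pareto chart can be prepared with a few lines of code. The sketch below ranks items by failure count and keeps those that together account for a cumulative share of all failures; the 80% cutoff and the equipment IDs are hypothetical. The same function works for failure modes or root causes, since it only counts whatever labels it is given.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bad_actors(failures: Iterable[str], cutoff: float = 0.8) -> List[Tuple[str, int, float]]:
    """Rank items by failure count and return those up to the cumulative cutoff.

    failures: one entry (equipment ID, failure mode, root cause, ...) per failure.
    Returns (item, count, cumulative_fraction) tuples, largest count first.
    """
    counts = Counter(failures).most_common()  # sorted largest to smallest
    total = sum(c for _, c in counts)
    ranked, cumulative = [], 0
    for item, count in counts:
        cumulative += count
        ranked.append((item, count, cumulative / total))
        if cumulative / total >= cutoff:
            break  # the "vital few" account for at least the cutoff share
    return ranked
```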

6.10 Continuity Management

Continuity management minimizes the threat of an uncontrolled interruption to the operations of a facility. A
comprehensive risk management program often identifies areas where continuity is mandatory.

An equipment reliability best practice is to have a prepared Business Continuity Plan (BCP). The BCP should be
relevant to the equipment area and include instructions for resynchronizing operations between areas.
Unplanned downtime can be costly, so having a plan in place to handle the downtime efficiently and effectively is
beneficial. Items to consider including in a BCP are as follows:

• A strategy for triage during the initial onset of failure

• How the organization will transfer skills/communications if necessary (e.g., cross-training)

• Roles and responsibilities

• Physical locations of backup equipment and spare parts

• Business priorities for resynchronization (e.g., lost product, on-time shipments, loss of quality, etc.)

One way to avoid a disruption to continuity in the Operation phase is to ensure there is an acceptable amount of
redundancy in equipment. It is not economical to increase the capacity above design criteria for every system in a
facility, so a risk analysis should be used to determine where redundancy is needed. Single points of failure should
be identified as a high risk, and consideration must be given to building in spares/backups or at least having a BCP in
place to handle an unexpected loss of continuity. An example of a common redundancy in manufacturing is a boiler Feed
Water Pump (FWP) system. A failed FWP could result in the boiler cutting out due to low water level or, at worst, an
erratically operating pump could cause a flash of cold water into the vessel, creating a rapid expansion and explosion.
Having a redundant FWP is an expensive measure when designing a system, but the risk is high enough to almost
always warrant it.
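The benefit of redundancy such as a second FWP can be estimated with steady-state availability, A = MTBF / (MTBF + MTTR). The sketch below assumes identical units, independent failures, and instant, reliable switchover; common-cause failures (e.g., a shared power supply) would make the real benefit smaller. The MTBF and MTTR values used are hypothetical.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state (inherent) availability of one repairable unit."""
    return mtbf_h / (mtbf_h + mttr_h)


def parallel_availability(unit_availability: float, units: int = 2) -> float:
    """Availability when any one of `units` identical units can carry the duty.

    Assumes independent failures and instant, reliable switchover.
    """
    return 1.0 - (1.0 - unit_availability) ** units


def downtime_hours_per_year(avail: float) -> float:
    """Expected unavailable hours per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR
```

With an assumed MTBF of 2,000 h and MTTR of 40 h, a single pump is expected to be unavailable for roughly 172 h per year, while a 1-out-of-2 arrangement reduces this to roughly 3.4 h under the stated assumptions.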

Another example of ensuring the continuity of operations is to provide a Reliability, Availability, Maintainability, and
Safety (RAMS) plan. A RAMS plan describes the functions, responsibilities, and tasks that will be integrated into all
phases of an asset’s life to ensure business continuity. The RAMS may include reliability modeling, FMECA, FTA,
reliability testing, hazard analysis, and hazard tracking. The goal of a RAMS analysis is to identify any significant
sources of impact that may limit the continuity of operations and provide solutions to mitigate those sources.
Reliability and availability computations can be completed during the design phase of a project to estimate the
expected failure rates and determine whether they will be acceptable. Weak points in the design can then be corrected
or planned for in the facility, and any recommendations on redundancy can be accommodated.
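A basic design-phase reliability computation is a series model: if every component must work for the system to work, and each has a constant failure rate λ, the system failure rate is the sum of the component rates and R(t) = exp(-λt). The sketch below applies this under the constant-failure-rate assumption and ranks each component's share of the system failure rate to highlight weak points. The component names and rates are hypothetical.

```python
import math
from typing import Dict


def series_system(failure_rates_per_h: Dict[str, float], mission_h: float) -> dict:
    """Series reliability model assuming constant (exponential) failure rates.

    In a series system every component must work, so failure rates add.
    """
    lam = sum(failure_rates_per_h.values())
    if lam <= 0:
        raise ValueError("at least one positive failure rate is required")
    contributions = sorted(
        ((name, rate / lam) for name, rate in failure_rates_per_h.items()),
        key=lambda item: item[1], reverse=True)  # largest contributors (weak points) first
    return {
        "system_failure_rate_per_h": lam,
        "system_mtbf_h": 1.0 / lam,
        "reliability": math.exp(-lam * mission_h),  # R(t) = exp(-lambda * t)
        "contributions": contributions,
    }
```

For hypothetical rates of 1e-5/h (motor), 2e-5/h (gearbox), and 5e-6/h (controller), the system MTBF is about 28,600 h, the probability of running one year (8,760 h) without failure is about 0.74, and the gearbox is flagged as the largest contributor.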

Continuity management also ensures that the knowledge of employees remains intact and transferable. There should
be a process in place to capture this knowledge, often called tribal knowledge, from employees who maintain or operate
equipment in a procedural format so that others can learn and execute in the same way. Often the most experienced
maintenance and operations technicians understand the nuances of the equipment without having that information
documented. When those employees are reassigned or leave the workplace, there can be a loss of equipment
reliability unless the information is well documented.

7 Appendix 1 – Managing Reliability in Operations

7.1 Organizational Readiness

Reliability culture has been embraced by many industries including pharmaceuticals. The culture is transforming
the traditional way of equipment maintenance by converting from a reactive to a proactive approach using modern
technologies and managing by applying asset lifecycle concepts. Beyond the technical aspects, management
systems and processes are relevant to equipment reliability throughout its lifecycle. As equipment performance
connects multiple key aspects of the business, the engagement of reliability extends beyond the maintenance
department.

7.1.1 Leadership Sponsorship

To achieve cross-functional engagement towards reliability, it is important to obtain full sponsorship from senior
leadership. Effective communication with senior leadership can be achieved by providing a clear business case. The
potential benefits and improvement of reliability implementation need to be translated into financial and operational
benefits. With a clear financial justification or return on investment, reliability can quickly be given the required
priority, not only within the maintenance department but as an organization-wide priority.

Reliability, as with traditional management systems, may incorporate elements throughout the organization including,
but not limited to:

• Management commitment and resources

• People engagement and accountability

• Change management

• Risk management

• Personnel (education and training)

• Continual improvement

• System evaluation

It is important to create the basis for the full context of asset management, equipment reliability, and maintenance.

7.1.2 Maintenance and Reliability Organizational Structure

The maintenance and reliability organizational structure can be built in several ways. Typically, the maintenance and
reliability functions report to engineering. This structure ensures a high level of alignment among project engineering,
maintenance, and reliability to achieve a high level of knowledge transfer and resource efficiency. There are also
other structures in which maintenance and reliability report directly to plant management or to operations
management. Each of these structures has its own advantages based on the business priorities and operational
models of the organization.

The key roles and responsibilities that contribute to reliability are as follows:

• Reliability engineer

- For new projects, ensure equipment design, selection, and installation are implemented in line with the asset
lifecycle strategy

- Provide SME information for equipment maintenance plans

- Set up and monitor maintenance and reliability KPIs

- Establish and manage an equipment condition monitoring program

- Review deficiencies and failure modes in CM and provide solutions

- Develop and manage processes (such as FRACAS) to capture, analyze, and correct bad actors

- Establish equipment engineering standardization

Maintenance engineer responsibilities that are commonly combined with the reliability engineer role:

- Develop procedures and plans for maintenance overhauls and shutdowns

- Review maintenance cost-effectiveness

- Manage or assist the processes for contractor qualifications

- Manage equipment history files

- Provide training to trades for precision installation, alignment technique, and other skills

• Maintenance manager/supervisor

- Manage the maintenance functions including planning and CMMS

- Manage work order system

- Manage PM and CM

- Work closely with reliability functions to identify bad actors and improvement opportunities

- Manage spare part system

- Manage maintenance staff

- Communicate and collaborate with other department functions and other sites (if applicable)

- Manage department costs

• Operations/production management

- Work closely with maintenance and reliability

- Collaborate with maintenance for maintenance planning and scheduling

- Provide production staff with basic equipment maintenance knowledge and skills

- Proactively monitor equipment performance and report early anomalies

• Other organizational involvement

- Engage the organization in the reliability culture; departments such as IT and process engineering are important
reliability partners

- Establish equipment lifecycle concept in business management systems

- Connect equipment reliability risks with business risks

7.1.3 Reliability KPIs

The reliability KPIs should be set with a clear tie to the long-term corporate vision and strategy.
Equipment reliability KPIs are some of the most important measurements to determine asset performance. Such
measurements provide directional and quantitative information for the leadership of the organization to determine
areas for improvement, as well as strategies for facility and equipment investment requirements. Reliability KPIs are
usually set in combination with maintenance operation KPIs.

Typical reliability KPIs include:

• Asset utilization rate

• OEE/production schedule adherence

• Equipment uptime

• MTBF

• MTTR

• CM versus PM rate
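Several of these KPIs reduce to simple ratios. The sketch below computes MTBF, MTTR, and a simplified uptime ratio from one period of history, and OEE as Availability × Performance × Quality using the common definitions (run time / planned time, ideal cycle time × total count / run time, good count / total count). The uptime definition and all input values are assumptions; sites may define uptime against scheduled rather than total time.

```python
def reliability_kpis(operating_h: float, failures: int, repair_h: float) -> dict:
    """MTBF, MTTR, and a simple uptime ratio from one period of history."""
    return {
        "mtbf_h": operating_h / failures if failures else float("inf"),
        "mttr_h": repair_h / failures if failures else 0.0,
        # Simplified uptime: operating time / (operating + repair time)
        "uptime": operating_h / (operating_h + repair_h),
    }


def oee(planned_min: float, run_min: float, ideal_cycle_min: float,
        total_count: int, good_count: int) -> float:
    """OEE = Availability x Performance x Quality."""
    availability = run_min / planned_min
    performance = (ideal_cycle_min * total_count) / run_min
    quality = good_count / total_count
    return availability * performance * quality
```

For example, a 480-minute shift with 420 minutes of run time, a 0.5-minute ideal cycle, and 780 good units out of 800 gives an OEE of 0.8125.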

7.2 Risk Management

The practice of reliability involves managing the risks that threaten equipment availability. Managing equipment
reliability effectively involves understanding the equipment’s risks within its operating context and proactively and
systematically reducing the risk to the overall operation.

While assessments and mitigation strategies developed during design serve as a baseline, Risk-Based Asset
Management persists through the equipment lifecycle. The management challenge includes establishing both
a periodic, systematic review and the mechanisms to measure the facets of risk. As operations and equipment
performance are subject to change, risk management must incorporate and reflect the current reality. Whether
founded in business or compliance drivers, the following operational elements may be considered to identify the
symptoms of equipment issues:

• Operational issues

• System changes

• Performance trends

• Atypical events (e.g., quality, safety, environmental)

• PM or PdM findings

• Repeated equipment failures

• Downtime extent

Refer to Chapter 4 for a detailed discussion on risk management. Refer also to Chapter 9 (Appendix 3) for
information about DfR.

7.3 Operations Management

The objective of equipment reliability is to ensure the function of the equipment satisfies the operational design
requirements (i.e., performance, quality, availability); in the Operation phase of the equipment, achieving this
objective becomes a cooperative effort. Equipment advocacy engages both the production/operations and maintenance
functions to improve reliability.

Through both strategic and tactical activities, reliability efforts identify risks, mitigate risks, and manage failure effects
on equipment functionality and availability. Within the context of operations, management mechanisms need to
support the risk and technical aspects of equipment lifecycle.

Understanding current equipment risks is pivotal to effectively managing asset priorities and proactively addressing
equipment issues. Issues presenting risk may include compliance, product quality, supply/capacity, and cost.
Management may decide to include monitoring of the following factors, depending on the relevance to the operation:

• Equipment functional failures

• Deviations, atypical events

• Changes to operations or equipment requirements

• Changes to equipment

Whether issues are the subject of compliance requirements and/or asset risk, the pursuit of root cause, corrective
actions, and preventive actions should be documented and tracked to resolution. The engineering practices exercised
during equipment design are equally applicable to problem solving. The tool set available for reliability should include
those necessary for problem solving and continual improvement.

In support of risk management, the technical aspects and tools of reliability engineering should be managed through
the equipment lifecycle. The tools and processes that are leveraged for improvement may include the following:

• Performance analysis (e.g., trending, SPC, Weibull)

• RCA (e.g., Fishbone, 5 Whys, flowchart)

• FRACAS

• FMECA

• Application of predictive technologies

• Precision maintenance

• Technical requirements (i.e., equipment, spare parts)

• PM optimization

• Periodic risk evaluation (Risk-Based Asset Management)

• Asset analysis (e.g., utilization, OEE, repair/replace, make/buy)
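As an example of the Weibull analysis mentioned under performance analysis, the shape (β) and scale (η) parameters can be estimated from times to failure by median rank regression, using Bernard's approximation F = (i - 0.3)/(n + 0.4) and the linearization ln(-ln(1 - F)) = β ln t - β ln η. The sketch assumes complete, uncensored failure data; suspended (censored) items would require adjusted ranks or maximum likelihood estimation.

```python
import math
from typing import List, Tuple


def weibull_mrr(times_to_failure: List[float]) -> Tuple[float, float]:
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Assumes complete (uncensored) failure data and at least two distinct values.
    """
    t = sorted(times_to_failure)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)              # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    # Least-squares fit of y = beta * x + c (rank regression on Y)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    c = y_mean - beta * x_mean                 # c = -beta * ln(eta)
    eta = math.exp(-c / beta)
    return beta, eta
```

As a general interpretation, β < 1 suggests early-life failures, β ≈ 1 suggests random failures, and β > 1 suggests wear-out, the only case where time-based PM replacement is likely to reduce failures.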

7.4 Asset Obsolescence Management and Retirement Process

The challenges of operating obsolete equipment are common in the pharmaceutical industry due to compliance
requirements. To ensure the equipment can continue to provide reliable service until a planned replacement or
upgrade, a systematic approach should be used to assess, identify, and implement actions to address these
concerns. These actions include:

• Obtaining spare parts in appropriate quantities to cover the expected operation duration

• Proactive change control to upgrade key obsolete components (e.g., PLC, transmitters, etc.)

• Implementing condition monitoring for early problem detection

• Implementing intensive PM based on equipment condition

• Identifying outside service providers if possible

• Strengthening internal equipment SME training to compensate for the lack of OEM service
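For the first action, one common way to size a lifetime buy of obsolete spare parts is to model demand as a Poisson process and choose the smallest stock that covers demand with a target probability. The sketch below assumes a constant failure rate, no repair or return of failed parts to stock, and a hypothetical 95% service level.

```python
import math


def spares_needed(failure_rate_per_h: float, installed_qty: int,
                  duration_h: float, service_level: float = 0.95) -> int:
    """Smallest spare stock k with P(demand <= k) >= service_level.

    Assumes a Poisson demand process (constant failure rate) and no repair or
    return of failed parts to stock during the period.
    """
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must be between 0 and 1")
    mean_demand = failure_rate_per_h * installed_qty * duration_h
    if mean_demand > 500:
        raise ValueError("mean demand too large for this simple recursion")
    k = 0
    term = math.exp(-mean_demand)   # P(demand == 0)
    cumulative = term
    while cumulative < service_level:
        k += 1
        term *= mean_demand / k     # P(demand == k) from P(demand == k - 1)
        cumulative += term
    return k
```

For example, four installed units with an assumed failure rate of 2e-5 per hour, over five more years of operation (43,800 h), give an expected demand of about 3.5 parts; the sketch recommends stocking 7 to reach the 95% service level.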

The final stage of the lifecycle, the Decommissioning phase, concludes the Operation phase and removes equipment
from service. Whether being idled, repurposed, replaced, or disposed of, decommissioning is the process of removing
an equipment asset from active service. A decommissioning plan, scaled to the scope of change, provides for
equipment retirement.

Proper management of a retired asset extends beyond the physical equipment. The formality of the decommissioning
activities and the required level of change management will depend on the impact of the asset. The extent of the
planning that is required is dependent on the scope of the decommissioning. The following management aspects,
with examples, should be considered when planning for the removal of an equipment asset from service:

• Engineering: Assessing process, personnel, material, and capacity change impacts

• HSE: Addressing decontamination, safety devices/systems, permits

• GxP: Performing final verification, EOL testing, and process/product change management

• Documentation: Archiving design, maintenance, and change records

• Maintenance: Deactivating maintenance plans and contract support

• Inventory: Adjusting or salvaging spare and repair parts

• Financial: Updating operating budget, ledger, depreciation, taxes, and insurance

• Asset: Updating asset register, criticality, and strategy (e.g., Risk-Based Asset Management)

For best practices associated with decommissioning, refer to ISPE Good Practice Guide: Decommissioning [5].

8 Appendix 2 – Managing Reliability in New Projects

Fundamentally, the bases for performance, maintainability, operability, and reliability of new equipment and systems
are established in the early stages of a project, even before the equipment and systems are designed, fabricated,
and installed. The conscious and active participation of the asset management and reliability program from the
beginning of the project is therefore essential to ensuring that equipment performance expectations are aligned
with other project deliverables to meet the business operation plan.

The development phases of an asset reliability plan coincide with the phases of a project. The distinct phases may
vary depending on the complexity of the project and assets. The development of equipment with computerized
systems will also coincide with the system development lifecycle of the respective control systems.

Applying equipment reliability principles during a project phase extends beyond deploying equipment in new
facilities. Capital management plans and Risk-Based Asset Management, exercised through the Operation phase,
can identify equipment that is obsolete, at EOL, or requires upgrade, refurbishment, or replacement. Although they
differ in scale and scope, project lifecycle and equipment reliability elements are equally relevant to small capital
projects and to improvements to assets in existing operations.

This chapter follows the typical project lifecycle described in ISPE Good Practice Guide: Project Management for the
Pharmaceutical Industry [8], as shown in Figure 8.1.

Figure 8.1: Typical Project Lifecycle [8]

8.1 Stage 1: Feasibility

A project begins with an idea. A feasibility study establishes the business need and potential value of a project that
warrants investment. This study should deliver a business case communicating the following:

• Sponsorship

• Project driver

• Project goals and targets

• Constraints

Asset management and equipment reliability are typically not reviewed at this stage as a separate topic; however, a
healthy level of equipment performance should be expected as a basic prerequisite.

8.2 Stage 2: Conceptual Development

With a successful business case and project initiation, the project transitions to the conceptual development stage,
in which the required resources begin to be identified. This stage involves developing a solution with a high-level view
of the assets in the project scope.

The conceptual development stage includes the following activities:

• Establishing project strategies (e.g., quality, asset management)

• Determining the asset mix

• Defining functional requirements

• Developing the project milestone schedule

Whether for an existing site or a new facility, a site Asset Management Plan and/or project-level asset management
strategy should be developed to guide the new assets from the start of their lifecycle. Asset performance and a
reliability plan should be considered together with the drafting of the URS and procurement strategy.

Refer to Chapter 3 (Equipment Lifecycle) for reliability considerations during concept and design.

See also ISPE Good Practice Guide: Asset Management [1].

8.3 Stage 3: Project Delivery Planning

In the project delivery planning stage, the project team members, organization structure, key stakeholders, project
schedule, communication plan, procurement plan, and construction plan should all be defined. The asset
management and equipment reliability elements should be built into all of the project planning aspects. Usually, a
maintenance/reliability representative should be involved as a project team member, full-time or part-time depending
on the size and extent of the project. Engagement at this stage ensures that the required asset management and
reliability elements are captured in the initial plans, which will drive all of the downstream stages of the project.

8.4 Stage 4: Design

8.4.1 Basis of Design

Project requirements and planning are in full force during Basis of Design. Based upon the business case and
concept, the Basis of Design establishes the baseline for project delivery, including the technical, resource, cost, and
schedule requirements. The Basis of Design creates the plan for the assets expected to satisfy the business need.

If not represented in prior project stages, maintenance and reliability perspectives should be engaged at the onset of
design. Asset management, reliability, and maintenance strategies are entwined with asset design and are shaped as
the design progresses. These strategies include:

• Asset specifications

• Asset selection

- Manufacturer preferences

- Equipment component preferences

- Lifecycle costing

- Capacities

• Operating strategy

• Asset Criticality

• Quality expectations for equipment and maintenance (i.e., GxP)

• HSE expectations for equipment and maintenance (e.g., process hazard analysis, ergonomics)

• Availability requirements

• Maintenance strategy

8.4.2 Detailed Design

During Detailed Design, the assets and requirements planned during the Basis of Design progress toward a finalized,
functional design. The design process culminates in the procurement of the assets expected to satisfy the objectives
of the project and the business case. Through the course of developing the asset design, the user group should
envision how assets will be operated and maintained. The ongoing reliability of an asset throughout its lifecycle is
greatly influenced by the design.

As not every asset pursued is of custom design, it is important for the design team to recognize the difference
between the objective requirements, improvements, and the design constraints. Assets may fall along the spectrum
from custom manufactured to Commercial Off-the-Shelf (COTS) equipment. While the same principles apply to
selecting equipment that satisfies user requirements, the provider may be more or less able to customize the
equipment design or options. The design degrees of freedom may be influenced by many factors including, but not
limited to:

• User requirements (e.g., flexibility, specificity, tolerance)

• Level of standardization among the asset type

• Capabilities and availability of the manufacturer

• Schedule constraints (e.g., delivery timeline)

• Cost constraints (e.g., design, manufacture, maintenance)

The design and selection of assets must include a broad view of user requirements. Full consideration of the following
during design may alleviate challenges and reduce the need for changes during installation, testing, and operation:

• DfR

- Maintainability

- Operability

- Reliability

• Compliance

- GxP

- Safety (e.g., Process Safety Management)

- Environmental

• Design standards and practices

• Lifecycle costing

• Selection

- Manufacturers

- Equipment

- Components

- Service

• Maintenance strategy

- Predictive technology

- Precision maintenance

- Maintenance frequencies and scheduling

- Maintenance execution

• Spare parts and MRO strategy

Refer to Chapter 9 (Appendix 3 – Special Interests), which includes a discussion of DfR.

Beyond explicit design reviews, participating in cross-functional project reviews offers an additional perspective
on operating expectations and maintenance requirements for equipment. For example, spatial configuration of
components, criticality of instrumentation, and operational testing may be incorporated or accommodated by design.
These project reviews include, for example:

• Process hazard analysis

• Ergonomic reviews

• GxP facility reviews

When available, reviewing 3D models of assets throughout the design process provides interactive feedback as the
design develops. As the asset’s model is progressively detailed, the 3D model and drawings provide useful orientation
to the equipment’s operability and maintainability. Where the scope of the project warrants the use of Building
Information Modeling (BIM), 3D modeling is extended to place the assets within their context (e.g., the facility). BIM
extends the benefits of 3D modeling beyond the individual assets in the design scope and carries them into the
construction phase. Added benefits of BIM include:

• Asset configuration, dimensions, and position

• Human-machine interfaces

• Component-level approach and accessibility

• Spatial context and measurements

• Design clashes

In practice, the progression of Detailed Design increases both the breadth and granularity of detail. Given the
expansion of design detail, changes to the overall design become relatively more expensive (e.g., cost, time) to
implement as the design progresses. The impact may be realized in later phases of the project, as well as beyond
project turnover.

8.5 Stage 5: Implementation

8.5.1 Procurement

The procurement process involves more than a transaction to purchase an asset; it involves selecting a vendor as
well as the equipment the vendor offers. Because asset management, reliability, and maintenance are affected by the
details of the purchasing agreement, criteria beyond equipment design must be considered and included in the
specifications during the procurement process. These considerations include:

• Documentation/turnover expectations

• Performance requirements

• Training offerings and services

• Maintenance/technical support capabilities

• Service and support contracts

• LCC or TCO

From a maintenance and reliability perspective, procurement begins a relationship with an asset provider. When
entering a contract to purchase an asset, the envisioned maintenance strategy should shape the type of vendor
relationship and the expectations placed on the vendor.

See also Chapter 5 (Supplier Activities).

8.5.2 Construction

The construction phase is the building of the design into a tangible asset and includes both the manufacture of the
asset as well as its installation. As Detailed Design is influenced by the level of asset customization, the construction
of assets may incorporate various avenues of acquisition and installation:

• Commercial Off-the-Shelf (COTS) equipment assembled remotely

• Custom equipment, manufactured remotely

• Custom equipment, assembled on-site

• Field construction (on-site)

As in design, the construction phase includes aspects of both asset and maintenance management, such as:

• Supplier management

• Change management

• Optimize operability

• Optimize maintainability

• Construction/installation inspections

• Alignment, fit, finish, adjustments

• Equipment connections and interfaces

Asset reviews must extend beyond the design phases, both to confirm that the build is progressing as specified and
to validate that the design meets the intended expectations for maintainability, operability, and reliability. Inspections
are important not only for build verification, but also for identifying potential design improvements and eliminating
errors. Installation is an ideal time to consider build integrity and installation best practices for equipment operation
and reliability.

As during design, BIM offers similar benefits when leveraged during on-site construction:

• Asset configuration, dimensions, and position

• Spatial context and measurements

• Asset-level approach and accessibility

• Construction planning

• Installation clashes

In preparation for operational readiness, the tactical development of the maintenance strategy can begin in parallel
with the construction and installation of the assets. As asset components are defined and the operating context is
refined, the following activities can commence in preparation for asset turnover:

• CMMS preparation

• CCMS preparation

• MRO storeroom preparation

• Asset hierarchy

• Asset criticality

• Equipment asset records

• RCM (Reliability Centered Maintenance)

• Risk-Based Asset Management

• Spare parts lists/BOM

• Maintenance planning and schedule development

• Critical spares

• Service contracts

• Special tools/equipment

8.5.3 Testing (FAT, SAT, C&Q)

When construction and installation are completed, an asset is prepared to be commissioned. While the documented
evidence of testing is most often associated with the C&Q processes in the pharmaceutical industry, commissioning is
a key stage gate in an asset lifecycle. The commissioning process represents placing the asset into service.

The testing (e.g., FAT, SAT, C&Q, and validation) of an asset proves its fitness for purpose. The selection of these
steps is based on the purpose and usage of the asset, its risk and criticality, and organizational engineering and
quality policies and practices.

For equipment that is built off-site, it is common to conduct an FAT. The FAT serves as the final gate prior to shipment
and installation. It gives representatives of the user group the opportunity to inspect and gain hands-on experience
with the equipment. Requirements (e.g., safety, performance, operability, maintainability, reliability) and design are
open to empirical review. The FAT provides an opportunity to make adjustments or modifications efficiently while the
equipment is still at the manufacturer’s location and under its control. The SAT is similar in nature to the FAT but is
conducted at the user’s premises, typically in the equipment’s final intended installation location.

The commissioning activities accomplish a checkout of components, functionality, and performance of installed assets
according to equipment requirements. In the context of a regulated industry, the documented results of testing provide
evidence that the assets satisfy user requirements, supporting product quality (e.g., identity, strength, purity, safety,
efficacy), and providing the operational baseline of the validated state of the equipment and the manufacturing process.

Beyond the C&Q testing that provides evidence of user requirements in the support of process and product quality,
commissioning provides an opportunity for equipment maintenance and reliability, for example:

• Verifying installation best practices (e.g., alignment, guarding)

• Confirming components and spare parts

• Documenting baseline configurations and settings

• Capturing equipment baseline measurements for condition monitoring or PdM (e.g., vibration, infrared,
ultrasound, and oil analysis)

Refer to ISPE Baseline® Guide: Volume 5 – Commissioning and Qualification (Second Edition) [3] for further details
on test stage gates and expectations.

With the asset entering service and preparing for operational readiness, the execution of the maintenance strategy
continues with the creation of maintenance elements, including:

• Spare parts

• Equipment BOMs

• Maintenance plans (i.e., preventive, predictive, calibration)

• SOPs

• Corrective work job bank

C&Q culminates in the turnover of assets into the Operation phase.

8.6 Stage 6: Close-out (Project Turnover)

A capital project closes with turnover and marks the beginning of the Operation phase. The business owner accepts
ownership of the assets. While additional performance qualifications and validations may be required in the pursuit
of the manufacturing of product, the owner’s responsibility to maintain assets begins with the Operation phase.
Important project deliverables to the owner group at the onset of the Operation phase are as follows:

• Physical asset

• Financial records

• Asset documentation

• Warranties

• Equipment registrations (e.g., for pressure vessels)

As discussed in Chapter 3, the elements of the equipment lifecycle relevant once an asset has been placed in service
include:

• Operations

• Maintenance

• Modifications

• Upgrades

• Decommissioning

In the Operation phase, equipment reliability expands to the practical and strategic management of asset-based
performance, deviations, change, and risk, both from the perspectives of business outcomes as well as compliance.
Refer to Chapter 6 (Operations and Maintenance).

9 Appendix 3 – Special Interests

Special interests relate to advanced reliability and asset management techniques intended to improve the
performance of manufacturing assets across many dimensions, which may include, but are not limited to: increased
availability, increased reliability, reduced planned and unplanned downtime, extended lifecycle, obsolescence
management, increased maintainability, increased quality and safety, and reduced TCO. Emerging technologies are
also addressed at the end of this chapter.

9.1 Alignment with ASTM E2500

ASTM E2500 (Standard Guide for Specification, Design, and Verification of Pharmaceutical and Biopharmaceutical
Manufacturing Systems and Equipment) [36] applies a lifecycle approach to process validation, with the intention of
ensuring product quality and ongoing control of process variation. The standard includes a risk-based approach to
qualifying equipment and systems (e.g., process equipment, utility equipment, control systems, etc.) that have the
potential to affect product quality and patient safety.

In the context of process design and ongoing process validation, manufacturing systems and equipment are to be fit
for intended use. Equipment is specified, designed, installed, operated, and performs to satisfy process requirements.
As a lifecycle approach, the equipment must maintain fitness for purpose until its removal from service.

Consistent with regulatory expectations, the ASTM guide applies science- and risk-based decision-making to the
implementation and operation of systems and equipment, including their functions and performance. The equipment
and the process it supports are designed for quality, engaging critical quality functions to establish suitability and
performance much earlier in the qualification process.

With respect to reliability, ASTM E2500 resonates with lifecycle management, DfR, Asset Criticality, and critical
functions. Equipment reliability and qualification benefit from engagement beginning with specification and design, to
ensure ongoing fitness for purpose. By focusing on Asset Criticality, critical functions, performance, and risk, effort and
resources can be applied where they have the greatest impact and/or risk mitigation.

Refer also to ISPE Baseline® Guide: Volume 5 – Commissioning and Qualification (Second Edition) [3].

9.2 Alignment with ISO 55000

In January 2014, ISO released a set of standards focused on managing assets:

• ISO 55000 (Asset Management — Overview, principles and terminology) [15]

• ISO 55001 (Asset Management — Management systems – Requirements) [37]

• ISO 55002 (Asset Management — Management Systems – Guidelines for the application of ISO 55001) [25]

These standards create the basis of an Asset Management System, by which an organization can realize the value
of its assets. In this case, lifecycle management is applied to physical assets. The objective is effective and efficient
management of assets through planning, design, implementation, maintenance, improvement, and retirement of
assets.

Refer also to ISPE Good Practice Guide: Asset Management [1].

9.3 Design for Reliability (DfR) Tools

DfR involves applying reliability engineering tools and techniques at specific points throughout the design process.
With this approach, organizations can benefit not only in areas such as safety and quality, but also in the Operation
phase and with regard to OPEX. Small CAPEX investments in reliability at the design stage may yield savings that are
many multiples of the initial investment. Although DfR tools and techniques are generally applied during the design
stage, they should be embedded in an overall system that continues into the operational or sustaining phase, where
their full value will be realized.

Examples of DfR tools include:

• FMEA (Failure Modes and Effects Analysis)

• FMECA (Failure Modes, Effects, and Criticality Analysis)

• FTA (Fault Tree Analysis)

• RBD (Reliability Block Diagram)

• AHP (Analytical Hierarchy Process)

• Weibull analysis

• Reliability KPIs (Key Performance Indicators)

9.3.1 Failure Modes and Effects Analysis (FMEA) and Failure Modes, Effects, and Criticality Analysis (FMECA)

FMEA is a tool generally used at the design stage of a process to identify all failure modes associated with the
equipment and then to evaluate their effects on the process across three dimensions: probability, severity, and
difficulty of detection. The scores for these three dimensions are multiplied to establish the risk to the process and to
prioritize the associated failure modes. If the risk of a failure mode is deemed unacceptable, the design is reviewed to
mitigate or eliminate the risk.

FMEA is a systematic approach to analyzing the causes and effects of product or equipment failures; it is a valuable
tool for assessing failure modes, their causes and effects, and how much they matter by determining an RPN (Risk
Priority Number). The RPN is generally calculated by:

RPN = Probability or Frequency × Severity × Difficulty of Detection

Where:

• Probability relates to the probability of the failure mode occurring and may be given a score, e.g., 1 to 5
with 1 being least probable and 5 being most probable. Each score is sometimes assigned a MTBF range.
Probability determination may be achieved by either historical data or expert opinion.

• Severity relates to the impact on the process or the business as a whole and may include many factors such
as environment, safety, cost, quality, and health. A score is again assigned to indicate the scale of severity.

• Difficulty of detection relates to how difficult it is to detect a failure mode before it actually occurs, with the
lowest number on the scale indicating excellent detection and the highest number indicating that detection is
not possible.

This calculation yields the failure priority. High-risk failure modes with a high RPN can then be analyzed to determine
whether their risk is acceptable.
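
The RPN calculation and ranking described above can be sketched in Python as follows. This is a minimal illustration, not a prescribed tool: it assumes the 1 to 5 scoring scales mentioned earlier, and the failure modes, their scores, and the review threshold of 30 are hypothetical values chosen only to show the mechanics. Each organization defines its own scales and acceptance criteria.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet, scored on an assumed 1-5 scale."""
    description: str
    probability: int  # 1 = least probable, 5 = most probable
    severity: int     # 1 = least severe impact, 5 = most severe impact
    detection: int    # 1 = excellent detection, 5 = detection not possible

    def __post_init__(self):
        # Reject scores outside the assumed 1-5 scale
        for name in ("probability", "severity", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} score must be between 1 and 5, got {score}")

    @property
    def rpn(self) -> int:
        # RPN = Probability x Severity x Difficulty of Detection
        return self.probability * self.severity * self.detection


def prioritize(modes, threshold):
    """Rank failure modes by RPN (highest first) and flag those at or above the threshold."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    return [(m.description, m.rpn, m.rpn >= threshold) for m in ranked]


modes = [
    FailureMode("Pump seal leak", probability=4, severity=3, detection=2),
    FailureMode("Motor bearing seizure", probability=2, severity=5, detection=4),
    FailureMode("Gasket degradation", probability=3, severity=2, detection=1),
]
for description, rpn, needs_review in prioritize(modes, threshold=30):
    print(f"{description}: RPN={rpn}, review design={needs_review}")
```

In this example the bearing seizure ranks first (RPN 40) even though it is the least probable failure mode, because its high severity and poor detectability dominate the product.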

FMECA is an extension of FMEA that incorporates criticality into the calculation to determine the RPN. It goes a step
further than FMEA by creating a maintenance plan based on risk and criticality that strives to mitigate or eliminate
high-risk failure modes through PM activity.

9.3.2 Fault Tree Analysis (FTA)

FTA is an advanced failure analysis tool that graphically shows how low-level events or failure modes may combine
through AND and/or OR logic gates to cause a high-level undesirable or catastrophic event. Other logic gates may
also be used, but AND and OR gates are the most common. FTA can be useful in the design process because it
provides a clear visual representation of how events combine to cause the undesirable event. FMEA tends to
consider failures and failure modes individually, without accounting for other failures that may contribute
simultaneously, while FTA strives to capture all contributions and interactions. Figure 9.1 shows how a basic FTA may
be constructed for a WFI (Water for Injection) system.

Figure 9.1: Example Basic FTA on a WFI System

Based on the FTA in Figure 9.1, redundancy is built into the PUW (Purified Water) system such that both PUW 1 AND
PUW 2 need to fail simultaneously for the entire WFI system to fail due to loss of PUW supply. Additionally, the WFI
system could fail if either the distribution pump or the distribution motor fails, resulting in no WFI distribution to the
user points.
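
The gate logic of this example can be expressed as Boolean functions, which also makes it easy to search for single points of failure. This Python sketch assumes the tree structure described in the text (Figure 9.1 itself is not reproduced here): the top event "WFI system fails" is taken as an OR of "no PUW supply" (an AND of PUW 1 and PUW 2) and "no WFI distribution" (an OR of pump and motor). The function names are illustrative only.

```python
def and_gate(*events: bool) -> bool:
    """AND gate: the output event occurs only if every input event occurs."""
    return all(events)


def or_gate(*events: bool) -> bool:
    """OR gate: the output event occurs if any input event occurs."""
    return any(events)


def wfi_system_fails(puw1_fails: bool, puw2_fails: bool,
                     pump_fails: bool, motor_fails: bool) -> bool:
    """Top event of the WFI example (assumed structure): the WFI system fails."""
    no_puw_supply = and_gate(puw1_fails, puw2_fails)        # redundant PUW systems
    no_wfi_distribution = or_gate(pump_fails, motor_fails)  # pump and motor both required
    return or_gate(no_puw_supply, no_wfi_distribution)


# Single points of failure: basic events that trigger the top event on their own
basic_events = ("PUW 1", "PUW 2", "distribution pump", "distribution motor")
single_points = [
    name for i, name in enumerate(basic_events)
    if wfi_system_fails(*(j == i for j in range(len(basic_events))))
]
print(single_points)  # ['distribution pump', 'distribution motor']
```

Failing each basic event in isolation confirms the reading of the tree: neither PUW system alone brings the system down, whereas the pump or the motor alone does.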

9.3.3 Reliability Block Diagram (RBD)

The RBD is generally, but not always, constructed from an FTA; it takes the low-level events or components within
the FTA and arranges them into series and parallel block diagrams. An OR gate in an FTA is represented in an RBD
as blocks arranged in series, while an AND gate is represented as blocks arranged in parallel. The benefit of the RBD
is that it highlights weaknesses, showing where component or equipment redundancy may or may not be required
within the overall system. Figure 9.2 shows an RBD constructed from the example FTA in Figure 9.1.

Figure 9.2: Example RBD on a WFI System

The PUW 1 and PUW 2 systems were assigned an AND logic gate in the FTA in Figure 9.1 and are shown in parallel
in the RBD in Figure 9.2 to represent the redundancy built into the PUW system, i.e., both PUW systems would need
to fail simultaneously to cause complete system failure. The distribution pump and distribution motor were assigned
an OR logic gate in the FTA in Figure 9.1 and are shown in series in the RBD in Figure 9.2, i.e., if either the pump or
the motor fails, then the complete WFI system will fail.

An RBD can be particularly useful if availability figures are known for each block in the diagram. Availability is
expressed in percentage or decimal form and provides information on the ability of the equipment to function when
required.

In the example RBD in Figure 9.2:

• The availability for parallel blocks A and B is calculated as follows:

Availability of PUW Supply = 1-((1-A) × (1-B))

Where: A is the availability of PUW System 1 and B is the availability of PUW System 2

• The availability for blocks C and D is calculated by simple multiplication since these blocks are in series:

Availability of WFI Distribution = C × D

Where: C is the availability of the distribution pump and D is the availability of the distribution motor

• The total system availability can be expressed by:

Total WFI System Availability = (1-((1-A) × (1-B))) × C × D
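
The three formulas above can be combined in a short Python sketch. It assumes, as these formulas do, that blocks fail independently. The availability values used for A, B, C, and D are hypothetical and serve only to illustrate the arithmetic.

```python
def series(*availabilities: float) -> float:
    """Blocks in series: every block must work, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Redundant blocks in parallel: the path is lost only if every block is unavailable."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= 1.0 - a
    return 1.0 - unavailability


def wfi_availability(a: float, b: float, c: float, d: float) -> float:
    """Total WFI System Availability = (1-((1-A) x (1-B))) x C x D."""
    return series(parallel(a, b), c, d)


# Hypothetical block availabilities: PUW 1, PUW 2, distribution pump, distribution motor
total = wfi_availability(0.95, 0.95, 0.99, 0.98)
print(round(total, 4))  # 0.9678
```

With these figures, the redundant PUW supply reaches 0.9975 even though each PUW system alone is only 0.95, so the series pump and motor become the limiting blocks. This is the kind of weakness an RBD is meant to expose.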

9.3.4 Analytical Hierarchy Process (AHP)

AHP is a structured decision-making tool that supports the selection of equipment and components that must satisfy
multiple criteria, which may lead to a more successful outcome.

Figure 9.3 shows an example of how the AHP process works. Pairwise comparisons and matrix calculations are
combined to derive a score that identifies the best option against the predetermined criteria.

Figure 9.3: Example AHP Process

For more information on AHP, refer to Davidson and Labib [38].
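
The pairwise-comparison step of AHP can be sketched in Python as shown below. This is one common variant and is not necessarily the calculation behind Figure 9.3. It derives criteria weights with the row geometric-mean method, which approximates the principal eigenvector, and checks judgment consistency with Saaty's consistency ratio, CR = CI / RI, where CI = (λmax - n) / (n - 1). The criteria and comparison values are hypothetical. A CR below about 0.1 is the rule of thumb usually applied. The same weighting can then be applied when scoring alternatives against each criterion.

```python
import math

# Saaty's random consistency index (RI) for matrix sizes 1 to 9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Priority weights from a reciprocal pairwise-comparison matrix (row geometric means)."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with lambda_max estimated from (A w)_i / w_i."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("matrix size must be between 1 and 9")
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgments: reliability is 3x as important as lifecycle cost, and so on
criteria = ["reliability", "lifecycle cost", "maintainability"]
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(comparisons)
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
print(f"CR = {consistency_ratio(comparisons, weights):.3f}")
```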

9.3.5 Weibull Analysis

Weibull analysis can be a useful tool in designing for reliability by analyzing Time to Failure (TTF) data sets on similar
equipment or components, and then applying the data to a Weibull Probability Density Function (PDF). The main
advantage of using the Weibull PDF in design is that it can represent the three main types of failures that equipment
or components under analysis may undergo:

• Intrinsic Wear-Out Failure (Normal PDF): Increasing failures over time

• Extrinsic Randomly Occurring Failure (Exponential PDF): Constant failures over time

• Intrinsic Running-In Failure (Hyper-Exponential PDF): Decreasing failures over time
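
The link between the Weibull shape parameter and the three failure patterns above can be shown with the Weibull hazard (instantaneous failure rate) function, h(t) = (β/η)(t/η)^(β-1), where β is the shape parameter and η the characteristic life. In this minimal Python sketch, β < 1 produces a decreasing failure rate, β = 1 a constant rate (the exponential case), and β > 1 an increasing rate. Shape values of roughly 3 to 4 are commonly cited as resembling a normal distribution. The function names, the tolerance used to treat β as "approximately 1", and the 1,000-hour characteristic life are illustrative assumptions.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard (instantaneous failure rate): h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def failure_pattern(beta: float, tolerance: float = 0.05) -> str:
    """Classify the failure pattern implied by the shape parameter beta."""
    if beta < 1 - tolerance:
        return "decreasing failures over time (running-in)"
    if beta > 1 + tolerance:
        return "increasing failures over time (wear-out)"
    return "constant failures over time (random)"


eta = 1000.0  # hypothetical characteristic life, hours
for beta in (0.5, 1.0, 3.5):
    rates = [weibull_hazard(t, beta, eta) for t in (100.0, 500.0, 900.0)]
    print(beta, failure_pattern(beta), [f"{r:.2e}" for r in rates])
```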

Other useful parameters, such as characteristic life and guaranteed life, may also be derived from the Weibull
distribution. Having data available in Weibull format can greatly assist when designing for reliability, enabling more
informed decisions on equipment and component types and their expected failure patterns and lifecycle.

TTF data on equipment and components selected for Weibull analysis may not be readily available in practice;
therefore, alternative approaches may be used to generate a Weibull PDF:

• Median rank method

• Cumulative hazard method

The median rank method is more suitable for equipment or components that have a small or even incomplete TTF
data set. The cumulative hazard method is suited to equipment or components that may have more than one mode of
failure.
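
The median rank method can be sketched in Python as follows. This is a minimal illustration, assuming complete (uncensored) failure data and Bernard's approximation for median ranks, F_i = (i - 0.3) / (n + 0.4). Data sets with suspensions (incomplete data) would need adjusted ranks, which this sketch does not handle. The shape parameter β is taken as the slope of a least-squares regression of ln(-ln(1 - F)) on ln(t), and the characteristic life η is recovered from the intercept. The function names and the bearing failure times are hypothetical.

```python
import math


def median_ranks(n: int):
    """Bernard's approximation of median ranks: F_i = (i - 0.3) / (n + 0.4)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull_median_rank(times):
    """Estimate (beta, eta) by regression on the linearized Weibull CDF:
    ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)."""
    t_sorted = sorted(times)
    n = len(t_sorted)
    if n < 2:
        raise ValueError("at least two failure times are required")
    xs = [math.log(t) for t in t_sorted]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope = shape parameter
    intercept = y_mean - beta * x_mean    # intercept = -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical bearing times to failure, hours
beta, eta = fit_weibull_median_rank([410, 620, 790, 980, 1250])
print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")
```

Least-squares regression on the linearized CDF keeps the sketch dependency-free. Maximum likelihood estimation is a common alternative when larger data sets are available.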

While Weibull analysis can be a powerful aid to decision-making when designing for reliability, it is limited to
generating reliability data on specific components at the lowest level. For example, it would not be useful to perform
Weibull analysis on a motor or a skid because of the large number of components involved (e.g., bearings, rotor,
stator, frame, mount, nuts and bolts), but it would be useful to perform Weibull analysis on one of these components
specifically, provided that the data on that component is available. Because Weibull analysis generates reliability
data on a single component at the lowest level, the data also provides a clear picture of a specific single failure mode
associated with that component, e.g., a failed bearing, rotor, stator, or frame.

Crow-AMSAA (Army Materiel Systems Analysis Activity) is another method of measuring reliability and is a statistical
extension of Weibull analysis. This method is more suitable for measuring the reliability of systems (e.g., motors and
skids) because it can handle data on multiple failure modes and therefore multiple components.
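
A Crow-AMSAA analysis treats the cumulative failures of a repairable system as a power-law process, N(t) = λt^β. This Python sketch uses the standard maximum-likelihood estimates for a time-terminated record, β = n / Σ ln(T/t_i) and λ = n / T^β, where t_i are the cumulative operating times at each failure and T is the end of the observation period. β < 1 indicates a decreasing failure intensity (improving reliability), and β > 1 indicates an increasing one. The function names and the skid failure record are hypothetical.

```python
import math


def crow_amsaa_fit(failure_times, end_time):
    """Crow-AMSAA (power-law) maximum-likelihood fit for a time-terminated record.

    failure_times: cumulative operating hours at each failure of the system.
    end_time: cumulative operating hours at the end of the observation period.
    Returns (beta, lam) for expected cumulative failures N(t) = lam * t**beta.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure is required")
    if any(t <= 0 or t > end_time for t in failure_times):
        raise ValueError("failure times must lie in (0, end_time]")
    log_sum = sum(math.log(end_time / t) for t in failure_times)
    if log_sum == 0:
        raise ValueError("failure times must not all equal end_time")
    beta = n / log_sum
    lam = n / end_time ** beta
    return beta, lam


def instantaneous_mtbf(beta, lam, t):
    """Current MTBF at time t: reciprocal of the failure intensity lam * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1))


# Hypothetical failure record for a skid (cumulative operating hours)
beta, lam = crow_amsaa_fit([120, 310, 650, 1100, 1800], end_time=2000)
trend = "improving" if beta < 1 else "deteriorating" if beta > 1 else "stable"
print(f"beta={beta:.2f} ({trend}), current MTBF={instantaneous_mtbf(beta, lam, 2000):.0f} h")
```

A useful check on the fit is that the expected cumulative failures at the end of the record, λT^β, equal the observed number of failures n.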

9.3.6 Reliability KPIs

Examples of reliability KPIs and calculation methods that may be used in the design process are:

• MTBF (Mean Time between Failure)

• MTTR (Mean Time to Repair)

• MTTF (Mean Time to Failure)

• MWT (Mean Waiting Time)

• Availability

• Downtime

• Uptime

Within the design process, MTBF, MTTR, and MWT are the main KPIs worthy of consideration to improve the
success rate of the design. The other KPIs are either subsets or outcomes of these main KPIs.
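
How availability, uptime, and downtime follow from MTBF, MTTR, and MWT can be sketched in Python from a simple failure log. This sketch makes several assumptions. MTBF is taken as mean operating time between failures. Downtime per failure is taken as waiting time plus repair time. Availability is therefore Uptime / (Uptime + Downtime), which equals MTBF / (MTBF + MTTR + MWT). The record layout and the log values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    """One failure in a hypothetical maintenance log (all values in hours)."""
    uptime_before_failure: float  # operating time since the previous restoration
    repair_time: float            # hands-on time to restore function
    waiting_time: float           # time waiting for people, parts, or permits


def reliability_kpis(events):
    """Derive MTBF, MTTR, MWT, uptime, downtime, and availability from a failure log."""
    n = len(events)
    if n == 0:
        raise ValueError("at least one failure event is required")
    uptime = sum(e.uptime_before_failure for e in events)
    repair = sum(e.repair_time for e in events)
    waiting = sum(e.waiting_time for e in events)
    downtime = repair + waiting
    return {
        "MTBF": uptime / n,
        "MTTR": repair / n,
        "MWT": waiting / n,
        "Uptime": uptime,
        "Downtime": downtime,
        # Equivalent to MTBF / (MTBF + MTTR + MWT)
        "Availability": uptime / (uptime + downtime),
    }


log = [
    FailureEvent(uptime_before_failure=400, repair_time=3, waiting_time=6),
    FailureEvent(uptime_before_failure=500, repair_time=4, waiting_time=6),
    FailureEvent(uptime_before_failure=600, repair_time=5, waiting_time=6),
]
kpis = reliability_kpis(log)
print(round(kpis["Availability"], 4))  # 0.9804
```

In this example, waiting time contributes more downtime than repair time, so reducing MWT (e.g., through spare parts availability) would improve availability more than reducing MTTR.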

Factors that may influence MTBF include:

• Equipment or component choice

• Design

• Redundancy

• Condition monitoring

Factors that may influence MTTR include:

• Accessibility

• Personnel knowledge and training

• Spare parts availability

• Built-in test functions

• Fault indicators and information display

• Personnel motivation

9.4 Reliability Centered Maintenance (RCM)

RCM is a structured and disciplined approach to applying the right maintenance tasks at the right time using the right
resources to generate a maintenance strategy that utilizes advanced maintenance techniques and methods in an
appropriate manner based on risk. It has many similarities to FMECA, with the objective of both RCM and FMECA
being to preserve, or in some cases, increase equipment reliability in its present operating context.

The main difference between these two techniques is that RCM focuses on functional failures, while FMECA focuses
more on component failures. Additionally, FMECA tends to be a higher-level approach to identifying failure modes,
calculating how much these failures matter (criticality or severity), and then creating specific tasks that may eliminate
or mitigate the risk of the failure mode occurring. RCM tends to be a more structured and detailed approach that
achieves a similar but more detailed result (a task list or maintenance life plan).

There are two phases within the RCM structure:

• Analysis and generation of data

• Implementation of tasks

In the analysis phase, there are seven questions which the RCM process asks:

• What are the functions of the asset?

• What are the functional failures?

• What causes the functional failures?

• What happens when a failure occurs?

• How much does each failure matter?

• Can we predict or prevent failures, and should we be doing so?

• How should we manage the failure if prediction or prevention is not an option?

Defining the functions of the asset under analysis establishes the requirements of the asset. For example, the function
of a pump may be to supply water at a flow rate of 20 L/sec and a pressure of 3 bar.

Functional failures can then be determined, i.e., the ways in which the function (e.g., flow rate and pressure) can fail.
Examples include no supply of water, partial supply, supply at low pressure, or supply at low flow.
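The pump example can be expressed as a small check that compares measured values with the required function and names the resulting functional failure. This is a minimal sketch: the 10% tolerance band, the function name, and the failure labels are assumptions chosen for illustration, not values from this Guide, and "partial supply" is not modeled separately from low flow.

```python
REQUIRED_FLOW_LPS = 20.0      # required flow rate, L/sec (from the pump example)
REQUIRED_PRESSURE_BAR = 3.0   # required pressure, bar (from the pump example)
TOLERANCE = 0.10              # hypothetical acceptable shortfall (10%)


def functional_failures(flow_lps: float, pressure_bar: float) -> list[str]:
    """Return the functional failures observed for the pump, if any."""
    if flow_lps <= 0.0:
        return ["no supply"]
    failures = []
    if flow_lps < REQUIRED_FLOW_LPS * (1 - TOLERANCE):
        failures.append("supply at low flow")
    if pressure_bar < REQUIRED_PRESSURE_BAR * (1 - TOLERANCE):
        failures.append("supply at low pressure")
    return failures


print(functional_failures(20.0, 3.0))   # []
print(functional_failures(0.0, 0.0))    # ['no supply']
print(functional_failures(15.0, 2.5))   # ['supply at low flow', 'supply at low pressure']
```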

Once functional failures are identified, the failure causes can be identified. For example, a loss of flow may be caused
by a failed pump, failed pipework, failed motor, etc.

The consequences of each failure are then documented to identify its effects, including its effects on other systems;
for example, the effect on the system that requires the water supply.

How much each failure matters, and whether it can be predicted or prevented, is then determined using an RCM
decision flow chart. The flow chart also suggests alternative strategies if prediction or prevention is not possible, e.g.,
a risk management review or potential redesign of the equipment.
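The decision step can be sketched as a short function. This is a hypothetical and heavily simplified sequence, not the RCM decision flow chart referred to above or one taken from any RCM standard: it checks whether the failure matters, then whether it can be predicted (condition-based task) or prevented (scheduled task), and otherwise falls back to redesign or a risk management review. The field names, the "run to failure" outcome, and the preference for prediction over time-based prevention are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Answers to the RCM questions for one failure mode (hypothetical fields)."""
    name: str
    matters: bool       # significant safety, quality, or cost consequence?
    predictable: bool   # detectable warning before failure, and monitoring worth doing?
    preventable: bool   # scheduled task reliably prevents it, and worth doing?
    high_risk: bool     # remaining risk unacceptable if nothing is done?


def rcm_decision(mode: FailureMode) -> str:
    """Suggest a strategy using a simplified, hypothetical RCM decision sequence."""
    if not mode.matters:
        return "no scheduled task (run to failure)"
    if mode.predictable:
        return "condition-based (predictive) task"
    if mode.preventable:
        return "scheduled (preventive) task"
    if mode.high_risk:
        return "consider redesign of the equipment"
    return "risk management review"


bearing_wear = FailureMode("pump bearing wear", matters=True, predictable=True,
                           preventable=True, high_risk=True)
print(rcm_decision(bearing_wear))  # condition-based (predictive) task
```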

RCM can be an intensive and laborious process. It requires input from numerous personnel across different
departments, as well as completion of the resulting tasks. For these reasons, there needs to be full buy-in to the RCM
process before it begins, from the shop floor right up to the site head. The typical outcome of an RCM process is a
reduction in maintenance tasks, achieved by focusing on high-risk failure modes and reducing focus on failure modes
that do not matter. Reliability, availability, and equipment lifetime should also increase.

Because RCM is time-consuming and laborious, it is important to perform it on critical equipment only, and a criticality
assessment of the equipment should be performed before any RCM process. It is also important to apply RCM at the
right level: for example, RCM could be applied to a single motor, or to the entire skid in which the motor is located,
which may contain multiple components.
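As an illustration of screening equipment before RCM, the sketch below scores each asset as consequence × likelihood on 1 to 5 scales and selects those at or above a threshold, highest score first. The scoring scheme, the threshold of 12, and the asset tags are hypothetical; a site would apply its own criticality assessment method.

```python
from typing import NamedTuple


class Asset(NamedTuple):
    tag: str
    consequence: int  # 1 (negligible) to 5 (severe), hypothetical scale
    likelihood: int   # 1 (rare) to 5 (frequent), hypothetical scale


def rcm_candidates(assets: list[Asset], threshold: int = 12) -> list[str]:
    """Return asset tags whose criticality score meets the threshold, highest first."""
    scored = [(a.consequence * a.likelihood, a.tag) for a in assets]
    critical = [(score, tag) for score, tag in scored if score >= threshold]
    return [tag for _, tag in sorted(critical, reverse=True)]


assets = [
    Asset("WFI-PUMP-01", consequence=5, likelihood=3),    # score 15
    Asset("HVAC-AHU-02", consequence=4, likelihood=4),    # score 16
    Asset("OFFICE-FAN-07", consequence=1, likelihood=4),  # score 4
]
print(rcm_candidates(assets))  # ['HVAC-AHU-02', 'WFI-PUMP-01']
```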

9.5 Total Productive Maintenance (TPM)

TPM originated in Japan in the 1950s, with the car manufacturer Toyota being one of the first companies to utilize
TPM successfully. While TPM has many technical elements, its biggest element is cultural and structural change within
the entire organization. The basic TPM structure is shown in Figure 9.4; the main changes are the integration of the
maintenance and operations teams and a delayered organization.

Figure 9.4: TPM Structure

Structuring the organization in this way makes the maintenance team and the operations team one unit. As a result,
the traditional horizontal boundaries between these two departments are broken down, and a sense of ownership of
the production equipment is instilled in the operations team, with the maintenance team providing expertise and
support. Operators now carry out basic maintenance tasks, which frees the maintenance team to work on more complex
issues and continual improvement.

The main objective of TPM is to remove waste across the entire supply chain in order to maximize value across the
entire organization, which in turn enables a Just-In-Time approach to product delivery. Wasteful activity is measured
using Overall Equipment Effectiveness (OEE), which is the product of three elements:

OEE = Availability × Performance × Quality

Availability takes into account both planned and unplanned stoppages of the equipment; an availability rate of 100%
would mean that the process is always running and never stopped. Performance takes into account the speed of the
machine relative to its validated setting. In some manufacturing processes, the equipment may be deliberately slowed
below its validated speed to enable it to run. However, this practice tends to mask the true root cause of failure at the
validated speed, and it reduces performance and therefore OEE. Quality takes into account defects during the process,
including rework; a quality rate of 100% would mean that the process ran with no defects and produced only good
parts.
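The OEE formula can be worked through with a short example. The sketch assumes the commonly used definitions Availability = run time / planned production time, Performance = (ideal cycle time × total count) / run time, and Quality = good count / total count, with the ideal cycle time standing in for the validated speed. The shift figures and the function name are invented for illustration.

```python
def oee(planned_min: float, stopped_min: float, ideal_cycle_min: float,
        total_count: int, good_count: int) -> dict[str, float]:
    """Return availability, performance, quality, and OEE as fractions (0 to 1)."""
    run_min = planned_min - stopped_min                    # time left after planned and unplanned stops
    availability = run_min / planned_min
    performance = ideal_cycle_min * total_count / run_min  # actual output vs output at validated speed
    quality = good_count / total_count                     # defects and rework count as losses
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical shift: 480 min planned, 48 min stopped, validated speed of
# one unit per 0.5 min, 720 units made, 684 of them good first time.
result = oee(480, 48, 0.5, 720, 684)
print({name: round(value, 4) for name, value in result.items()})
# {'availability': 0.9, 'performance': 0.8333, 'quality': 0.95, 'oee': 0.7125}
```

Note that running below the validated speed shows up only in the performance term, so a line that is "always running" can still have a modest OEE.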

To positively affect the complete supply chain, TPM is usually deployed using highly visual pillars, such as those
shown in Figure 9.5. These pillars are usually tailored to each organization; some organizations use some or all of the
pillars at different times, following different roadmaps.

Figure 9.5: Example TPM Pillars

The one common element of every TPM endeavor is that all activity and pillars are underpinned by a practice known
as 5S:

• Sort

• Set in Order

• Shine

• Standardize

• Sustain

5S is a process of prioritizing the basics in the organization to enable future success. It involves attaining and
maintaining high levels of cleanliness and visibility in the workplace.

While TPM can bring many benefits to an organization, implementation failure rates are exceptionally high: one study
recorded failure rates of up to 90%, and another found that every second attempt at TPM implementation failed. [39]
The following steps for TPM deployment, also shown in Figure 9.6, are integral to a successful TPM program:

• The first step is the official announcement: an indicator that TPM is a high priority among the organization's
leaders

• The second step is education and training: giving personnel on the ground the necessary tools for success and
promoting the strategy to gain buy-in

• The third step is to define the process by establishing the policy and the goals: defining what should be achieved

Figure 9.6: TPM Deployment Steps

Two key points relating to the deployment of TPM are:

• If TPM is not a priority for the site, it will be unimportant to the leadership team

• If TPM is seen as unimportant at upper levels, it will not survive at ground level

9.6 Predictive Tools (for PdM)

PdM tools can be used to detect early-stage defects in equipment and components before functional failure occurs. As
a result, unplanned downtime can be avoided, and small defects can be rectified before they develop into major
breakdowns and costly repairs.

Predictive tools in maintenance may consist of:

• Human senses (e.g., heat, feel, noise, smell, visual)

• Motion amplification

• Motor current analysis

• Non-destructive testing

• Oil analysis

• Parameter monitoring (e.g., temperature, flow rate, etc.)

• Thermographic imaging/thermography

• Ultrasonic testing

• Vibration analysis

• Wear particle testing

Common technologies for PdM are listed in Table 9.1 with a brief summary of their usage.

Table 9.1: Example PdM Technologies

Motion Amplification: A relatively new PdM technique that uses camera and software technology to detect minor
movements of components, machines, and installations, and then amplifies these movements to make them more
visible. Design faults and increases in movement over time can be tracked and potentially acted on before failure
occurs.

Motor Current Analysis (MCA): Similar to VA and UA in that it analyzes a spectrum, but MCA relates to the current of a
rotating motor. As the motor rotates, its electrical current and voltage vary slightly depending on the load. An analyst
can view these changes in current and voltage and detect developing faults early.

Oil Analysis: The process of determining the count and type of particles in a machine’s lubricant. High counts of
certain particle types imply damage to subcomponents such as gear teeth, bearings, or seals. These particles are far
too small to be seen with the naked eye.

Thermographic Analysis: All objects emit electromagnetic radiation according to their temperature. In equipment
diagnostics, a higher temperature than a baseline or a similar piece of equipment often indicates an underlying issue
(e.g., a loose connection in an electrical terminal, or an improperly lubricated bearing on a rotating shaft).

Ultrasonic Analysis (UA): A method similar to VA, but it uses sound instead of movement. Sound frequencies higher
than the human ear can detect are heterodyned so that an analyst can “hear” the fault in rotating equipment. UA is also
used on static equipment such as electrical transformers, steam traps, or compressed air systems.

Vibration Analysis (VA): A sophisticated means of measuring movement in rotating equipment and matching it to
known frequencies, or patterns of frequencies, associated with faults. Raw time-waveform data is converted to the
frequency domain via Fast Fourier Transform (FFT), where an analyst can detect wear patterns well before they cause
damage to the equipment.
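The FFT step described for VA can be illustrated with NumPy. This is a minimal sketch, not an analysis method: it synthesizes a hypothetical vibration signal (a 25 Hz component, equivalent to 1x running speed on a 1,500 rpm machine, plus a smaller 100 Hz component standing in for a fault frequency, and random noise), converts it to the frequency domain with `numpy.fft.rfft`, and reports the largest spectral peaks. The sampling rate, the signal, and the function name are assumptions; in practice, an analyst compares peaks against the known fault frequencies of the specific machine.

```python
import numpy as np


def dominant_frequencies(signal: np.ndarray, sample_rate_hz: float, n_peaks: int = 2) -> list[float]:
    """Return the frequencies (Hz) of the n_peaks largest local maxima in the spectrum."""
    window = np.hanning(len(signal))                  # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))   # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    # Keep only local maxima so that one peak spread over adjacent bins counts once.
    interior = spectrum[1:-1]
    is_peak = (interior > spectrum[:-2]) & (interior >= spectrum[2:])
    peak_idx = np.nonzero(is_peak)[0] + 1
    top = peak_idx[np.argsort(spectrum[peak_idx])[::-1][:n_peaks]]
    return [float(freqs[i]) for i in top]


sample_rate = 2048.0                        # samples per second (hypothetical)
t = np.arange(0, 2.0, 1.0 / sample_rate)    # 2 s of data, 0.5 Hz resolution
rng = np.random.default_rng(0)
vibration = (1.0 * np.sin(2 * np.pi * 25 * t)      # 1x running speed (1,500 rpm)
             + 0.4 * np.sin(2 * np.pi * 100 * t)   # hypothetical fault frequency
             + 0.1 * rng.standard_normal(t.size))  # measurement noise
print(dominant_frequencies(vibration, sample_rate))  # [25.0, 100.0]
```

The Hann window spreads each peak slightly into neighboring bins, which is why the function selects local maxima rather than simply the largest bins.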

PdM can underpin a highly effective maintenance strategy; if not deployed correctly, however, it can waste time,
resources, productivity, and money.

An asset criticality analysis should be used to select the most appropriate equipment for PdM. The most appropriate
PdM technique should then be matched to each high-risk failure mode. An FMECA or RCM study is ideal for matching
PdM techniques to specific failure modes; see Chapter 5 and Sections 9.3 and 9.4.
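The matching step can be sketched as a lookup from failure mode to candidate technologies drawn from the uses described in Table 9.1, applied only to failure modes rated high risk. The failure-mode names, the mapping, and the risk ratings are hypothetical examples, not a recommended cross-reference; an empty result flags a high-risk failure mode that needs another strategy.

```python
# Candidate PdM technologies per failure mode, based on the uses described in Table 9.1
# (hypothetical mapping for illustration).
PDM_OPTIONS = {
    "bearing wear": ["Vibration Analysis", "Ultrasonic Analysis", "Oil Analysis",
                     "Thermographic Analysis"],
    "gear tooth damage": ["Oil Analysis", "Vibration Analysis"],
    "loose electrical connection": ["Thermographic Analysis"],
    "steam trap failure": ["Ultrasonic Analysis"],
    "motor electrical fault": ["Motor Current Analysis"],
}


def pdm_plan(risk_by_mode: dict[str, str]) -> dict[str, list[str]]:
    """Map each high-risk failure mode to candidate PdM technologies.

    risk_by_mode maps a failure-mode name to its risk rating ("high", "medium", "low").
    """
    return {
        mode: list(PDM_OPTIONS.get(mode, []))  # copy so callers cannot alter the table
        for mode, risk in risk_by_mode.items()
        if risk == "high"
    }


plan = pdm_plan({
    "bearing wear": "high",
    "loose electrical connection": "high",
    "PLC firmware fault": "high",
    "steam trap failure": "low",
})
print(plan)
```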

The advantages to using predictive tools include:

• Fewer maintenance interventions, which may in turn reduce safety, quality, and cost issues.

• Longer equipment and component lifetimes due to early defect elimination or mitigation.

• More stable and predictable processes and capacity levels due to decreased unplanned downtime.

• Better alignment of the maintenance department with business goals and objectives due to decreased costs and
increased asset utilization.

• Improved motivation within, and perception of, the maintenance department, due to its ability to add more value to
the wider organization.

9.7 Precision Maintenance

Precision maintenance involves setting up equipment to very fine tolerances so that it operates as efficiently and
effectively as possible, and so that defects and breakdowns are kept to an absolute minimum.

Research presented at the Machinery Reliability Conference in April 2001 [40] found that:

“92% of rotating machinery is reported to have defects at start-up that result in premature failure.”

In his article titled “Safety and Reliability Concepts” [41], Moore states:

“Rohm & Haas Company reported that you’re seven to 17 times more likely to introduce defects during start-
up than normal operation; BP reported that incidents are 10 times more likely during start-up; the chemical
industry reported process safety incidents are five times more likely during start-up; Paul Lucas, Principal
Safety Consultant for ABB Ltd., reported that 56 percent of forced outages occur less than one week after a
maintenance shutdown.”

Equipment and its associated components can move in many dimensions and can be connected or linked in many
ways. As a result, there are many opportunities for the equipment to fail due to excessive movement, misaligned
components, or poor basic cleanliness.

Examples of precision maintenance tools include:

• Laser alignment

• Torque setting

• Belt tensioning

9.8 Root Cause Analysis (RCA)

Incidents are events that depart from normal, whether they are unexpected, unplanned, or unwanted. As is the
case in any management system, incidents present problems to be solved as well as opportunities to improve. The
expectation for effective management should be both to correct the problem and prevent recurrence. To prevent
recurrence, one must understand the cause.

RCA is used to better understand an incident and to separate cause(s) from effects. Specifically, RCA focuses on the
causal factor(s) that, once addressed, will prevent the recurrence of problems and contribute to continual improvement.

When evaluating an incident, it is paramount that the problem be properly identified and clearly stated. Multiple
factors may be discovered that contribute to an event. A root cause, from among the contributing factors, is the
factor that led directly to the problem. The objective of RCA is to drive the investigation to identify the root cause and
facilitate actions that will prevent repeated incidents.

RCA can be accomplished using many different methods and tools including, but not limited to:

• 5 Whys: Asks “Why?” five times in succession, with each answer refining the factors toward the root cause.

• Pareto Analysis: Charts the causes related to an issue by a measure of effect (e.g., frequency, cost). Pareto
charts are traditionally bar charts with causes ranked from high to low, providing focus for troubleshooting. The
underlying principle is that roughly 20% of causes contribute to 80% of the issues. (See Figure 6.3 for an example
Pareto chart.)

• Fishbone (Ishikawa) Diagram: Diagrams causal factors within cause categories (e.g., man, method, material,
machine, and measurement). The analysis yields which causes and factors are key to resolving the issue.

• Is/Is Not Matrix: Focuses on defining the boundaries of an issue and identifying the cause. When considering the
problem (e.g., who, what, why, where, when, and how), factors are examined to differentiate between what “is” and
what “is not”. The matrix may be used in preparation for other RCA tools.

• Barrier Analysis: Analysis considers the target, barriers (to harm), and hazards. The focus is on identifying the
pathways for hazards to the target, such that barriers or controls can be established to prevent hazards.

• Scatter Diagram: Data is charted to provide a graphical representation of relationships. The charts may be used
to evaluate correlations among causes or between causes and effects.

• Fault Tree Analysis (FTA): Starting from the issue, diagrams the causal factors as an inverted tree, from high-level
factors down to specific elements. Branching continues until the lowest-level basic faults are determined. The
analysis can also incorporate logic gates (e.g., AND, OR) to define relationships between factors. (See Figure 9.1
for an example FTA.)

• Cause Mapping: Driven by “why”, diagrams the cause(s) of effects. Key to cause mapping is dependence on
evidence to substantiate causes. The tool is similar to Fishbone with respect to a left-right orientation, but without
categorization of causes. Cause mapping is also similar to FTA, with a drive to continue factoring until basic
faults and causes are identified.

• FMEA/FMECA: See Section 9.3.
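A Pareto analysis can be sketched in a few lines: rank causes by their measure of effect, accumulate their share of the total, and keep the causes needed to reach a cutoff (80% here, reflecting the 80/20 rule of thumb). The downtime figures, cause names, and function name are invented for illustration.

```python
def pareto_vital_few(effects: dict[str, float], cutoff: float = 0.8) -> list[tuple[str, float]]:
    """Return (cause, cumulative share) for the top causes that together reach the cutoff."""
    total = sum(effects.values())
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    vital_few, cumulative = [], 0.0
    for cause, effect in ranked:
        cumulative += effect / total
        vital_few.append((cause, round(cumulative, 3)))
        if cumulative >= cutoff:
            break
    return vital_few


downtime_hours = {  # hypothetical downtime by cause over one quarter
    "label feeder jams": 42.0,
    "capper torque faults": 31.0,
    "conveyor belt tracking": 9.0,
    "sensor misreads": 7.0,
    "guard door interlock": 6.0,
    "other": 5.0,
}
print(pareto_vital_few(downtime_hours))
# [('label feeder jams', 0.42), ('capper torque faults', 0.73), ('conveyor belt tracking', 0.82)]
```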

Capitalizing on RCA by developing effective CAPAs is a key pathway to improvement.

9.9 PM Optimization

Due to the inherent nature of the pharmaceutical industry, PM optimization is an important process that should be
carried out regularly by the maintenance organization and relevant support services and stakeholders.

Over time, a considerable number of additional maintenance tasks may be added to the maintenance schedules and
life plans of equipment. This may be due to a variety of reasons, including poor performance or reliability, regulatory
requirements, or the outcome of an assessment or audit by a governing body. Sometimes tasks are added in reaction
to a recommendation, even though this may not be the correct decision, and the tasks may not be monitored for
effectiveness. The result is an ever-increasing maintenance workload that may not deliver the desired results.

PM optimization works by analyzing historical equipment data to determine the effectiveness of the PM strategy/plan,
and then optimizing the plan accordingly. Historical data analysis may consider performance data (e.g., downtime,
MTBF, MTTR, technical availability) and compliance data (e.g., PMs completed on time, overdue PMs, incomplete
PMs). Optimizing a PM strategy may involve adjusting the frequency of maintenance tasks based on how effective
the PMs are at preventing failures and maintaining compliance, as well as introducing or deleting tasks depending on
the data analysis.

Maintenance data should be analyzed in terms of the following factors:

• Planned PM Work Orders: To measure the frequency of failure modes that were prevented as a result of planned
PM activity. Note that this should include predictive and condition-based work orders.

• Planned CM Work Orders: To measure the frequency of failure modes that were corrected before becoming an
issue.

• Unplanned CM Work Orders: To measure the frequency of failure modes that caused downtime and resulted in
reactive maintenance.
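Counting work orders by failure mode and work-order type is a natural first step in this analysis. The sketch below assumes a simple, hypothetical export of work-order records as (failure mode, type) pairs, with type codes mirroring the three categories above; the data and the function name are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical work-order export: (failure mode, work-order type)
work_orders = [
    ("filter blockage", "PM"),
    ("filter blockage", "PM"),
    ("filter blockage", "planned CM"),
    ("belt wear", "unplanned CM"),
    ("belt wear", "unplanned CM"),
    ("belt wear", "PM"),
    ("seal leak", "planned CM"),
]


def failure_mode_summary(records: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count PM, planned CM, and unplanned CM work orders per failure mode."""
    summary: dict[str, Counter] = defaultdict(Counter)
    for failure_mode, wo_type in records:
        summary[failure_mode][wo_type] += 1
    return dict(summary)


summary = failure_mode_summary(work_orders)
# Failure modes that caused unplanned CM point to PM tasks that need review.
review = [mode for mode, counts in summary.items() if counts["unplanned CM"] > 0]
print(review)  # ['belt wear']
```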

If a specific failure mode has been identified as a cause of downtime and poor equipment reliability, then the
maintenance task that should have prevented that failure mode should be analyzed for effectiveness and frequency.
Conversely, where specific failure modes did not occur during the analysis period, the frequency of the associated
maintenance should be reviewed.

If specific failure modes are age related (i.e., failures follow a consistent time pattern), then it may be useful to
calculate the MTBF of each such failure mode. MTBF may be calculated by dividing the total time within the analysis
period by the number of failures within that period. This calculation can be useful when requesting QA approval to
adjust the frequency of maintenance. For example, a weekly maintenance task may be designed to prevent a specific
failure mode that has an MTBF of three months; in this case, it may be justified to decrease the frequency of
maintenance.
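The MTBF calculation and the weekly-task example can be worked through as follows. The one-year analysis period, the failure count, and the function name are hypothetical; whether a longer task interval is acceptable remains a decision for the site's engineering and QA review.

```python
def mtbf_days(period_days: float, failures: int) -> float:
    """MTBF as total time in the analysis period divided by the number of failures."""
    if failures == 0:
        raise ValueError("MTBF is undefined when no failures occurred in the period")
    return period_days / failures


analysis_period = 365.0   # one year of history (hypothetical)
failure_count = 4         # failures of this mode in that year (hypothetical)
task_interval = 7.0       # current weekly maintenance task

mtbf = mtbf_days(analysis_period, failure_count)
print(f"MTBF = {mtbf:.2f} days, about {mtbf / task_interval:.0f} task intervals")
# MTBF = 91.25 days, about 13 task intervals
```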

If specific failure modes are random in nature (i.e., failures follow an inconsistent time pattern), then the shortest
time period between failures during the analysis period should be established, and an alternative maintenance
frequency may be proposed based on the data.
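For random failure modes, the shortest observed interval can be found from the failure dates. The sketch assumes a list of failure dates for one failure mode; the dates and the function name are hypothetical, and any frequency proposed from the result would still need review and approval.

```python
from datetime import date


def shortest_interval_days(failure_dates: list[date]) -> int:
    """Return the shortest number of days between consecutive failures."""
    if len(failure_dates) < 2:
        raise ValueError("at least two failures are needed to measure an interval")
    ordered = sorted(failure_dates)
    return min((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


failures = [date(2020, 1, 14), date(2020, 6, 2), date(2020, 3, 20), date(2020, 11, 9)]
print(shortest_interval_days(failures))  # 66
```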

9.10 Obsolescence

Managing equipment obsolescence is a challenge for every industry to help ensure survival, competitive advantage,
and production capability into the future.

Equipment may become obsolete not just due to condition and age but from a variety of other factors such as:

• Technological obsolescence: The technology on the equipment is no longer fit for purpose. This may be
due to advancement in technology and non-compatibility with older systems, or lack of supporting services
on equipment with old technology. An example would be outdated PLC software or hardware that will not
communicate with more modern systems.

• Economic obsolescence: Equipment may become more expensive to maintain over time or may be beyond
economic repair. More modern equivalent equipment may be less expensive to run, making it economically sensible
to replace the equipment. Equipment reliability can also be a factor when it negatively affects production and yield,
and in turn the bottom line.

• Functional obsolescence: This may occur if business needs or regulatory requirements have changed, and the
functions of the equipment no longer match the requirements. An example would be a Low Pressure Hot Water
(LPHW) system that becomes undersized due to the expansion of the business.
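The economic case can be illustrated by comparing the cumulative costs of keeping and replacing the equipment. This is a deliberately simple sketch that assumes constant annual costs and ignores discounting, inflation, residual value, and qualification costs; all figures and the function name are hypothetical.

```python
from typing import Optional


def payback_year(replacement_capex: float, old_annual_cost: float,
                 new_annual_cost: float, horizon_years: int = 15) -> Optional[int]:
    """First year in which replacing is cumulatively cheaper than keeping, or None."""
    keep_total, replace_total = 0.0, replacement_capex
    for year in range(1, horizon_years + 1):
        keep_total += old_annual_cost     # maintenance, energy, and downtime losses
        replace_total += new_annual_cost
        if replace_total <= keep_total:
            return year
    return None


# Old equipment costs 120k per year to run and maintain; a replacement costs
# 250k up front and 40k per year (hypothetical figures).
print(payback_year(250_000, 120_000, 40_000))  # 4
```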

9.11 Emerging Technologies

9.11.1 Machine Learning, Artificial Intelligence, and Augmented Reality

The industry is moving from programming the way equipment and systems operate or monitor process outputs to
building analytics that allow machines to provide feedback about their functional performance. Energy management
systems are pioneers in machine learning, with analytics that allow air handling units to send alert messages when a
damper operator does not open or close within a given timeframe, when a chilled water valve is not modulating
properly, or when a reheat coil valve is open beyond the expected value for an excessive amount of time, indicating
that it is compensating for excessive cooling caused by a chilled water malfunction.
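Two of the alerts described above can be sketched as simple rules evaluated on building-management trend data. These are fixed rules rather than a learned model; an ML approach would instead learn the expected values and timeframes from history. The thresholds, field names, and function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical limits; real values would come from design data or learned baselines.
DAMPER_TIMEOUT_S = 90.0          # time allowed for a damper to reach its commanded position
REHEAT_EXPECTED_MAX_PCT = 70.0   # expected upper limit for reheat coil valve opening
REHEAT_MAX_MINUTES = 60.0        # how long the valve may exceed that limit before alerting


@dataclass
class AhuReading:
    seconds_since_damper_command: float
    damper_at_position: bool           # feedback confirms the commanded position
    reheat_valve_pct: float            # current reheat coil valve opening, %
    minutes_reheat_above_limit: float  # continuous time above REHEAT_EXPECTED_MAX_PCT


def ahu_alerts(r: AhuReading) -> list[str]:
    """Return alert messages for the damper and reheat valve conditions."""
    alerts = []
    if not r.damper_at_position and r.seconds_since_damper_command > DAMPER_TIMEOUT_S:
        alerts.append("Damper did not reach commanded position within the allowed time")
    if (r.reheat_valve_pct > REHEAT_EXPECTED_MAX_PCT
            and r.minutes_reheat_above_limit > REHEAT_MAX_MINUTES):
        alerts.append("Reheat valve open beyond expected value: check chilled water for excessive cooling")
    return alerts


print(ahu_alerts(AhuReading(120.0, False, 85.0, 75.0)))
```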

Machine Learning (ML) and Artificial Intelligence (AI) are changing equipment reliability and the supply chain in ways
that can improve efficiency and substantially reduce costs. Machines can identify equipment malfunctions or the early
onset of failures, allowing timely, minor adjustments or repairs that can prevent costly downtime, catastrophic failures,
ancillary damage, and excessive costs. ML is not limited to one piece of equipment at one location: it can be used to
monitor similar equipment across a site or at multiple locations around the globe.

ML and AI can provide suppliers and manufacturers with valuable information about the operating conditions of
their equipment, frequency of adjustments, failures, etc. This also allows organizations to shift from time-based
maintenance to evidence-based maintenance and can reduce dependence on route-based, routine PdM monitoring.

Organizations are currently leveraging augmented reality (AR), or mixed reality, devices. These AR technologies are
being used with customized computer applications in a number of productive and innovative ways, such as remote
DfR reviews, FATs, and remote troubleshooting support from vendors and SMEs.

9.11.2 Application of Industry 4.0 to Pharma 4.0™

ISPE’s Pharma 4.0™ Special Interest Group (SIG) has developed an operating model to apply the principles of
Industry 4.0 to Pharma 4.0™, as shown in Figure 9.7.

Figure 9.7: From Industry 4.0 to Pharma 4.0™ Operating Model [42]

The ISPE Pharma 4.0™ SIG [42] working groups are addressing the following topics:

• Holistic Control Strategy

• Impact and Maturity Models

• Process Maps and Critical Thinking

• Plug and Produce

• Validation 4.0

• Management Communication Strategy

• Continuous Process Verification and Process Automation

• Connected Health

Figure 9.8 represents how ISPE is bridging the ISPE Pharma 4.0™ Operating Model with the SIG working groups.

Figure 9.8: Designing the ISPE Pharma 4.0™ Operating Model with the SIG Working Groups [42]

The Pharma 4.0 [42] mission statement is to:

“provide practical guidance, embedding regulatory best practices, to accelerate Pharma 4.0™ transformations.
Our objective is to enable organizations involved in the product lifecycle to leverage the full potential of
digitalization to provide faster innovations for the benefit of patients.”

The Pharma 4.0 Operating Model is intended to help pharmaceutical organizations create a fully automated
environment that considers data integrity from the beginning of the design period. Implementing new Industry 4.0-based
manufacturing concepts in Pharma 4.0 requires aligning expectations, interpretations, and definitions with
pharmaceutical regulations.

The twelve theses for Pharma 4.0 are as follows:

• Pharma 4.0 extends/describes the Industry 4.0 Operating Model for medicinal products.

• Unlike common Industry 4.0 approaches, Pharma 4.0 embeds health regulations best practices.

• Pharma 4.0 breaks silos in organizations by building bridges between industry, regulators, healthcare, and all
other stakeholders.

• For next-generation medicinal products, Pharma 4.0 is THE enabler and business case.

• For established products, Pharma 4.0 offers new business cases.

• Investment calculations for Pharma 4.0 require innovative approaches to business case calculation.

• A prerequisite for Pharma 4.0 is an established Pharmaceutical Quality System with controlled processes and
products.

• Pharma 4.0 is not an IT project.

• In addition to IT, the Pharma 4.0 Operating Model incorporates organizational, cultural, process, and resource
aspects.

• The Pharma 4.0 Maturity Model allows the operating models of organizations (innovative and established
industries, suppliers, and contractors) to be aligned to an appropriate desired state.

• Pharma 4.0 is not a must, but a competitive advantage. Missing Pharma 4.0 might be a business risk.

• When moving from blockbusters to niche products and personalized medicines, Pharma 4.0 offers new ways to
look at business cases.

10 Appendix 4 – References

1. ISPE Good Practice Guide: Asset Management, International Society for Pharmaceutical Engineering (ISPE),
First Edition, November 2019, www.ispe.org.

2. ISPE Good Practice Guide: Maintenance, International Society for Pharmaceutical Engineering (ISPE), First
Edition, May 2009, www.ispe.org.

3. ISPE Baseline® Pharmaceutical Engineering Guide, Volume 5 – Commissioning and Qualification, International
Society for Pharmaceutical Engineering (ISPE), Second Edition, June 2019, www.ispe.org.

4. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Calibration Management, International Society for
Pharmaceutical Engineering (ISPE), Second Edition, November 2010, www.ispe.org.

5. ISPE Good Practice Guide: Decommissioning of Pharmaceutical Equipment and Facilities, International Society
for Pharmaceutical Engineering (ISPE), First Edition, June 2017, www.ispe.org.

6. ISPE Good Practice Guide: Good Engineering Practice, International Society for Pharmaceutical Engineering
(ISPE), First Edition, December 2008, www.ispe.org.

7. ISPE Good Practice Guide: Operations Management, International Society for Pharmaceutical Engineering
(ISPE), First Edition, April 2016, www.ispe.org.

8. ISPE Good Practice Guide: Project Management for the Pharmaceutical Industry, International Society for
Pharmaceutical Engineering (ISPE), First Edition, November 2011, www.ispe.org.

9. ISPE Baseline® Pharmaceutical Engineering Guide, Volume 6 – Biopharmaceutical Manufacturing Facilities,
International Society for Pharmaceutical Engineering (ISPE), Second Edition, December 2013, www.ispe.org.

10. ISPE Good Practice Guide: Heating, Ventilation, and Air Conditioning, International Society for Pharmaceutical
Engineering (ISPE), First Edition, September 2009, www.ispe.org.

11. ISPE Good Practice Guide: Process Gases, International Society for Pharmaceutical Engineering (ISPE), First
Edition, July 2011, www.ispe.org.

12. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to GxP Compliant Laboratory Computerized Systems,
International Society for Pharmaceutical Engineering (ISPE), Second Edition, October 2012, www.ispe.org.

13. ISPE Baseline® Pharmaceutical Engineering Guide, Volume 3 – Sterile Product Manufacturing Facilities,
International Society for Pharmaceutical Engineering (ISPE), Third Edition, April 2018, www.ispe.org.

14. ISPE Baseline® Pharmaceutical Engineering Guide, Volume 4 – Water and Steam Systems, International Society
for Pharmaceutical Engineering (ISPE), Third Edition, September 2019, www.ispe.org.

15. ISO 55000:2014, Asset management – Overview, principles and terminology, International Organization for
Standardization (ISO), www.iso.org.

16. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Quality Risk Management –
Q9, Step 4, 9 November 2005, www.ich.org.

17. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Pharmaceutical
Development – Q8(R2), Step 5, August 2009, www.ich.org.

18. ISPE Good Practice Guide: Practical Application of the Lifecycle Approach to Process Validation, International
Society for Pharmaceutical Engineering (ISPE), First Edition, March 2019, www.ispe.org.

19. Vorne Industries Inc., www.leanproduction.com.

20. Capstone Technology, “How to Centerline A Process,” 28 May 2015, www.dataparc.com/blog/how-to-centerline-a-process.

21. The Maintenance Framework, First Edition, Global Forum on Maintenance & Asset Management (GFMAM),
February 2016, ISBN 978-0-9870602-5-9, www.gfmam.org.

22. Society for Maintenance & Reliability Professionals (SMRP), www.smrp.org.

23. ISPE Handbook: Sustainability, International Society for Pharmaceutical Engineering (ISPE), First Edition,
December 2015, www.ispe.org.

24. ISPE GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems, International Society for
Pharmaceutical Engineering (ISPE), Fifth Edition, February 2008, www.ispe.org.

25. ISO 55002:2018, Asset management – Management systems – Guidelines for the application of ISO 55001,
International Organization for Standardization (ISO), www.iso.org.

26. Commissioning Agents, Inc. (CAI), www.cagents.com.

27. US Department of Defense, “DOD Guide for Achieving Reliability, Availability, and Maintainability,” August 2005,
dod.defense.gov.

28. Pharmaceutical Quality Group – Chartered Quality Institute (joint publication), “A Guide to Supply Chain Risk
Management for the Pharmaceutical and Medical Device Industries and their Suppliers,” 2010, www.pqg.org.

29. Gransberg, Douglas D., O’Connor, Edward P., “Major Equipment Life-cycle Cost Analysis,” Minnesota
Department of Transportation, April 2015, dot.state.mn.us/research/TS/2015/201516.pdf.

30. 21 CFR Part 211 – Current Good Manufacturing Practice for Finished Pharmaceuticals, Code of Federal
Regulations, US Food and Drug Administration (FDA), www.fda.gov.

31. EudraLex Volume 4 – Guidelines for Good Manufacturing Practices for Medicinal Products for Human and
Veterinary Use, Annex 1: Manufacture of Sterile Medicinal Products, https://ec.europa.eu/health/documents/eudralex/vol-4_en.

32. ASTM International, West Conshohocken, PA, www.astm.org.

33. Wireman, T., Developing Performance Indicators for Managing Maintenance, Second Edition, Industrial Press,
Inc., 2005, ISBN 0831131845.

34. ISPE Guide Series: Product Quality Lifecycle Implementation (PQLI®) from Concept to Continual Improvement,
Part 3 – Change Management System as a Key Element of a Pharmaceutical Quality System, International
Society for Pharmaceutical Engineering (ISPE), First Edition, June 2012, www.ispe.org.

35. ISPE GAMP® Guide: Records and Data Integrity, International Society for Pharmaceutical Engineering (ISPE),
First Edition, March 2017, www.ispe.org.

36. ASTM Standard E2500-20, “Standard Guide for Specification, Design, and Verification of Pharmaceutical and
Biopharmaceutical Manufacturing Systems and Equipment,” ASTM International, West Conshohocken, PA,
www.astm.org.

37. ISO 55001:2014, Asset management – Management systems – Requirements, International Organization for
Standardization (ISO), www.iso.org.

38. Davidson, G.G. and A.W. Labib, “Learning from failures: design improvements using a multiple criteria decision-
making process,” Journal of Aerospace Engineering, January 2003, Volume 217, Issue 4.

39. Attri, Rajesh, Sandeep Grover, Nikhil Dev, “A graph theoretic approach to evaluate the intensity of barriers in the
implementation of total productive maintenance (TPM),” International Journal of Production Research, May 2014,
Volume 52, Issue 10.

40. Machinery Reliability Conference, Phoenix, AZ, April 2001.

41. Moore, Ron, “Safety and Reliability Concepts,” Reliabilityweb.com, reliabilityweb.com/articles/entry/safety_and_reliability_concepts.

42. ISPE Pharma 4.0™ Special Interest Group, www.ispe.org/initiatives/pharma-4.0.

43. US Nuclear Regulatory Commission, Regulatory Analysis Guidelines of the U.S. Nuclear Regulatory Commission
(NUREG/BR-0058, Revision 5), Appendix B: Cost Estimating and Best Practice, April 2017, www.nrc.gov.

44. US Department of Army, TM 5-698-1 Reliability/Availability of Electrical & Mechanical Systems for Command,
Control, Communications, Computer, Intelligence, Surveillance, and Reconnaissance (C4ISR) Facilities, 19
January 2007, www.armypubs.army.mil.

45. Rivera, Angel and Jim Kelly, “Johnson and Johnson Reliability Workbook,” Johnson & Johnson, 2012.

11 Appendix 5 – Glossary

11.1 Acronyms and Abbreviations

AHP Analytical Hierarchy Process
AI Artificial Intelligence
AMP Asset Management Plan
AMS Asset Management System
AR Augmented Reality
ASTM American Society for Testing and Materials
BCP Business Continuity Plan
BIM Building Information Modeling
BOM Bill of Materials
BP Business Processes
C&Q Commissioning and Qualification
CAPA Corrective and Preventive Actions
CAPEX Capital Expenditure
CBM Condition-Based Maintenance
CCMS Computerized Calibration Management System
CM Corrective Maintenance
CMMS Computerized Maintenance Management System
COTS Commercial Off-the-Shelf
CPP Critical Process Parameters
DD&P Defect Discovery and Prevention
DfR Design for Reliability
DOD Department of Defense
EAM Enterprise Asset Management
EOL End-of-Life
ETOP Engineering Turnover Package
EUAC Equivalent Uniform Annual Cost
FAT Factory Acceptance Testing
FEP Front End Planning
FFT Fast Fourier Transform
FMEA Failure Modes and Effects Analysis
FMECA Failure Modes, Effects, and Criticality Analysis
FRACAS Failure Reporting, Analysis, and Corrective Action System
FTA Fault Tree Analysis
FWP Feed Water Pump
GMP Good Manufacturing Practice
GxP Good “x” Practice
HSE Health, Safety, and Environment
HVAC Heating, Ventilation, and Air Conditioning
IT Information Technology
KPI Key Performance Indicators
LCC Lifecycle Cost
LCCA Lifecycle Cost Analysis
LPHW Low Pressure Hot Water
M&R Maintenance and Reconditioning
ML Machine Learning
MRO Maintenance, Repair, and Operations
MTBF Mean Time between Failures
MTBM Mean Time between Maintenance
MTTF Mean Time to Failure
MTTR Mean Time to Repair
MWT Mean Waiting Time
NDA Non-Disclosure Agreement
OEE Overall Equipment Effectiveness
OEM Original Equipment Manufacturer
OPEX Operating Expenses
PDF Probability Density Function
PdM Predictive Maintenance
PLC Programmable Logic Controller
PM Preventive Maintenance
PMO Preventive Maintenance Optimization
PSM Process Safety Management
PUW Purified Water
QA Quality Assurance
QMS Quality Management System
RACI Responsible, Accountable, Consulted, and Informed
RAM Reliability, Availability, and Maintainability
RAMS Reliability, Availability, Maintainability, and Safety
RBD Reliability Block Diagram
RCA Root Cause Analysis
RCD Reliability Centered Design
RCM Reliability Centered Maintenance
RPN Risk Priority Number
RTF Run-to-Failure
SAMP Strategic Asset Management Plan
SIF Safety Instrumented Functions
SIG Special Interest Group
SIL Safety Integrity Levels
SIPOC Suppliers, Inputs, Process, Outputs, and Customers
SIS Safety Instrumented System
SME Subject Matter Expert
SMRP Society for Maintenance & Reliability Professionals
SOP Standard Operating Procedures
SPC Statistical Process Control
SPF Single Point of Failure
TCO Total Cost of Ownership
TEEP Total Effective Equipment Performance
TOP Turnover Package
TPM Total Productive Maintenance
TTF Time to Failure
UA Ultrasonic Analysis
URS User Requirement Specification
VA Vibration Analysis
WFI Water for Injection

11.2 Definitions

Asset (ISO 55000 [15])

An item, thing, or entity that has potential or actual value to an organization.

Asset Criticality

Both an attribute of an asset and the process by which the importance of the asset to sustaining compliant
production is determined. Asset Criticality is used to prioritize, within the management system(s), those assets that
provide the most value toward operating throughput and are most important to quality, health, safety, and
environment.

Asset Management

A coordinated set of activities that helps the organization realize optimal value from its assets.

Asset Management System

A management system that supports the asset management policy and the strategic asset management plans.

Availability

The state in which a tangible asset is ready for its intended use. Availability is one of three factors that determine
Overall Equipment Effectiveness (OEE).

Bad Actor

Equipment that performs the worst in terms of its effectiveness at sustaining production in a safe and compliant manner.

Component

Part, item, device, subsystem, functional unit, equipment, or system that can be individually described and
considered.

A number of components (e.g., a population of components) or a sample may itself be considered as a component.
A component may consist of hardware, software, or both. Software consists of programs, procedures, rules,
documentation, and data from an information processing system.

Condition Monitoring

Activity, performed either manually or automatically, intended to measure at predetermined intervals the
characteristics and parameters of the actual state of a component.

Corrective Maintenance (CM)

Maintenance carried out after fault recognition and intended to put a component into a state in which it can perform a
required function.

Critical Asset (ISO 55000 [15])

Asset having potential to significantly impact on the achievement of the organization’s objectives.

Design for Reliability (DfR)

The process of incorporating equipment reliability and availability into equipment design.

Downtime

The time during which an asset is not available for its intended use. Downtime may be composed of both planned
and unplanned events.

Equipment

The items, articles, and implements utilized for a specific purpose in an operation or activity. The fixed assets of an
organization, other than land and buildings.

Facility

An area or building that is built, installed, or established to serve a particular purpose.

Failure

Termination of the ability of a component to perform a required function. After failure, the component has a fault,
which may be complete or partial.

Failure Mode

Manner in which the inability of a component to perform a required function occurs. A failure mode may be defined by
the function lost or the state transition that occurred.

Failure Modes and Effects Analysis (FMEA)

Systematic approach to analyzing the causes and effects of equipment failures and to determining the outcomes of all
known failure modes within a system or component.

Failure Mode, Effects, and Criticality Analysis (FMECA)

Extension of FMEA that adds a criticality analysis, ranking each failure mode by the combination of its severity and its
probability of occurrence.

Failure Rate

Number of failures of a component in a given time interval divided by the time interval.
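The definition above translates directly into arithmetic. The Python sketch below computes an observed failure rate
from hypothetical figures (4 failures in 8,000 operating hours) and, assuming a constant failure rate, the
corresponding MTBF as its reciprocal. The function name `observed_failure_rate` and the data are illustrative only.

```python
def observed_failure_rate(failures: int, interval_hours: float) -> float:
    """Observed failure rate: number of failures divided by the time interval."""
    if interval_hours <= 0:
        raise ValueError("interval must be positive")
    return failures / interval_hours


# Hypothetical example: 4 failures observed over 8,000 operating hours.
rate = observed_failure_rate(4, 8000.0)  # failures per hour

# Assumption: constant failure rate, so MTBF is the reciprocal of the rate.
mtbf_hours = 1.0 / rate

print(f"failure rate = {rate:.6f} per hour, MTBF = {mtbf_hours:.0f} hours")
```

If usage is counted in cycles rather than hours, the same calculation gives failures per cycle.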

Fault

State of a component characterized by inability to perform a required function, excluding the inability during PM
or other planned actions, or due to lack of external resources. A fault usually results from a failure, but in some
circumstances it may exist without a prior failure (a preexisting fault).

Fault Tree Analysis (FTA)

A top-down graphical method of modeling how a system failure can result from combinations of lower-level events,
using AND and OR logic gates in a tree form.
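Beyond its graphical use, a fault tree can be evaluated quantitatively. The sketch below assumes independent basic
events, in which case an AND gate multiplies the input probabilities and an OR gate gives one minus the product of
their complements. The tree (a pump failure OR both of two redundant fans failing) and its probabilities are
hypothetical; practical analyses typically work from minimal cut sets and must account for dependent events such as
common cause failures.

```python
from math import prod


def and_gate(probs):
    """Output event occurs only if all inputs occur (independent inputs assumed)."""
    return prod(probs)


def or_gate(probs):
    """Output event occurs if at least one input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical tree: top event "loss of cooling" occurs if the pump fails
# OR both redundant fans fail.
p_pump = 0.01
p_fan = 0.05
p_both_fans = and_gate([p_fan, p_fan])   # 0.05 * 0.05
p_top = or_gate([p_pump, p_both_fans])   # 1 - (1 - 0.01) * (1 - 0.0025)

print(f"P(top event) = {p_top:.6f}")
```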

Lifecycle

The stages that an equipment asset progresses through, from initial development through to disposal.

Obsolescence (for maintenance purposes)

Inability of a component to be maintained due to the unavailability on the market of the necessary resources at
acceptable technical and/or economic conditions.

Overall Equipment Effectiveness (OEE)

A calculated measurement that indicates how effectively a manufacturing operation performs with respect to its
design capacity, when scheduled to run. The factors in determining OEE are Availability (% Scheduled Time),
Performance (% Design Performance), and Quality (% Good Units). OEE is the product of these three factors and is
expressed as a percentage.
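A minimal sketch of the OEE calculation, assuming OEE is the product of the three factors and that Performance is
measured as ideal cycle time multiplied by units produced, divided by run time. This is one common convention, not
necessarily the one every site uses. The function name `oee` and all figures are hypothetical.

```python
def oee(run_time, scheduled_time, ideal_cycle_time, total_count, good_count):
    """Return (availability, performance, quality, OEE), each as a fraction 0..1."""
    availability = run_time / scheduled_time                     # share of scheduled time running
    performance = (ideal_cycle_time * total_count) / run_time    # share of design rate achieved
    quality = good_count / total_count                           # share of good units
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 min scheduled, 400 min running, ideal cycle of 1 min/unit,
# 360 units produced, of which 342 are good.
a, p, q, value = oee(400, 480, 1.0, 360, 342)
print(f"A={a:.1%} P={p:.1%} Q={q:.1%} OEE={value:.1%}")
```

Reporting the three factors alongside the single OEE figure shows which loss (downtime, speed, or quality) dominates.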

Predictive Maintenance (PdM)

Condition-based maintenance carried out following a forecast derived from repeated analysis or known
characteristics and evaluation of the significant parameters of the degradation of the component.
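One simple way to derive the forecast described above is to fit a trend to repeated condition readings and extrapolate
it to an alarm limit. The sketch below assumes hypothetical weekly vibration velocity readings on a pump bearing, a
linear degradation trend fitted by ordinary least squares, and an alarm level of 4.5 mm/s chosen purely for
illustration. Real degradation is often nonlinear, so treat the result as a rough planning estimate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_to_alarm(days, readings, alarm_level):
    """Extrapolate a linear degradation trend to the alarm level.

    Returns the days remaining after the last reading, or None if the
    trend is flat or improving (no crossing is forecast).
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (alarm_level - intercept) / slope
    return crossing_day - days[-1]


# Hypothetical weekly vibration readings (mm/s RMS) on a pump bearing.
days = [0, 7, 14, 21, 28]
velocity = [2.0, 2.2, 2.4, 2.6, 2.8]
remaining = days_to_alarm(days, velocity, alarm_level=4.5)
print(f"Forecast: alarm level reached in about {remaining:.0f} days")
```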

Preventive Maintenance (PM)

Maintenance carried out at predetermined intervals, or according to prescribed criteria, and intended to reduce the
probability of failure or the degradation of the functioning of a component.

Preventive Maintenance Optimization (PMO)

An intentional and structured process that targets the effectiveness and efficiency of an equipment’s PM program.
The focus is on preserving and restoring the equipment’s condition using only value-added tasks, materials, and
downtime.

Reactive Maintenance

Operating a piece of equipment until it fails and then making a repair or replacement. Failure does not necessarily
imply a complete breakdown; it might just mean that the equipment is operating outside of specified parameters.

Reliability

The probability that the equipment will perform satisfactorily for a given period of time under stated conditions.
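When the failure rate can be assumed constant (the exponential model), reliability over a period t is
R(t) = exp(-λt). The sketch below applies this assumption to a hypothetical failure rate of 0.0005 failures per hour;
it does not apply to components in wear-out, whose failure rate increases with time.

```python
import math


def reliability_exponential(t_hours, failure_rate_per_hour):
    """R(t) = exp(-lambda * t); valid only under a constant failure rate."""
    return math.exp(-failure_rate_per_hour * t_hours)


# Hypothetical: lambda = 0.0005 failures/hour (MTBF = 2,000 h), 500 h mission.
r = reliability_exponential(500, 0.0005)
print(f"R(500 h) = {r:.3f}")  # prints R(500 h) = 0.779
```

At t equal to the MTBF, this model gives R(t) = exp(-1), about 0.37, so most units do not survive to the MTBF.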

Reliability Centered Maintenance (RCM)

A systematic and structured process used to develop an efficient and effective maintenance plan for an asset to
minimize the probability of failure.

Reliability Growth

The positive improvement in a reliability parameter over a period of time due to the implementation of corrective
actions to system design, operation, maintenance procedures, or the associated manufacturing process.

Reliability Integration

The process of combining reliability tools into a seamless, cohesive program that maximizes equipment reliability at
the lowest possible cost.

Root Cause Analysis (RCA)/Root Cause Failure Analysis (RCFA)

A method used to determine the underlying cause(s) of a failure. The focus is on identifying and eliminating those
causes so that the failure does not recur, rather than on treating failure symptoms.

Severity (of a Failure or a Fault)

Potential or actual detrimental consequences of a failure or a fault. The severity of a failure may be related to safety,
availability, costs, quality, environment, etc.

Total Effective Equipment Performance (TEEP)

A calculated measurement that indicates how effectively a manufacturing operation performs with respect to its
design capacity, based upon total calendar time. The factors in determining TEEP are Utilization (% Utilized Time),
Performance (% Design Performance), and Quality (% Good Units). TEEP is expressed as a percentage.
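The relationship between TEEP and OEE can be made explicit. Assuming Availability is read as run time divided by
scheduled time, and Utilization as run time divided by total calendar time (the symbols t_run, t_sched, and t_cal are
introduced here for illustration), the two measures differ only by the fraction of calendar time that is scheduled:

```latex
\begin{aligned}
\text{Availability} &= \frac{t_{\text{run}}}{t_{\text{sched}}}, \qquad
\text{Utilization} = \frac{t_{\text{run}}}{t_{\text{cal}}}
  = \frac{t_{\text{sched}}}{t_{\text{cal}}} \times \text{Availability} \\
\text{OEE} &= \text{Availability} \times \text{Performance} \times \text{Quality} \\
\text{TEEP} &= \text{Utilization} \times \text{Performance} \times \text{Quality}
  = \frac{t_{\text{sched}}}{t_{\text{cal}}} \times \text{OEE}
\end{aligned}
```

Because scheduled time cannot exceed calendar time, TEEP is never greater than OEE; for example, equipment scheduled
for one 8-hour shift per 24-hour day has a TEEP equal to one third of its OEE.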

Total Productive Maintenance (TPM)

An operational philosophy where the total workforce of an organization plays a role in ensuring the performance of
equipment is maintained and improved. TPM is similar to Total Quality Management, except that the focus is on assets
rather than products.

Utility

An equipment asset that generates and/or distributes an essential supply service (e.g., electricity, water, steam, etc.)
to support a facility’s operation.

11.3 Additional Definitions for Reference

This section contains additional terms related to equipment reliability, categorized as follows:

• Fundamental

• Component

• Reliability

• Failure and Event

• Faults and States

• Maintenance

• Time

These listings may also contain terms from Section 11.2, with expanded definitions.

11.3.1 Fundamental Terms

Dependability

The ability to perform as and when required. Dependability characteristics include availability and its influencing
factors (reliability, recoverability, maintainability, maintenance support performance) and, in some cases, durability,
economics, integrity, safety, security, and conditions of use. Dependability is used descriptively as an umbrella term
for the time-related quality characteristics of a product or service.

Maintenance

The combination of all technical, administrative, and managerial actions during the lifecycle of a component intended
to retain it in or restore it to a state in which it can perform the required function.

Maintenance Management

All activities of the management that determine the maintenance objectives, strategies, and responsibilities, and
implementation of them by such means as maintenance planning, maintenance control, and the improvement of
maintenance activities and economics.

Maintenance Objective

The target assigned and accepted for the maintenance activities. These targets may include, for example, availability,
cost reduction, product quality, environment preservation, safety, and/or asset value preservation.

Maintenance Plan

Structured and documented set of tasks that include the activities, procedures, resources, and the time scale required
to carry out maintenance.

Maintenance Strategy

Management method used in order to achieve the maintenance objectives. Examples could be outsourcing of
maintenance, allocation of resources, etc.

Maintenance Support Performance

Ability of a maintenance organization to have the correct maintenance support at the necessary place to perform the
required maintenance activity when required.

Operations

Combination of all technical, administrative, and managerial actions, other than maintenance actions, that results in
the component being in use. Maintenance actions carried out by operators are not included in operation.

Required Function

Function, combination of functions, or a total combination of functions of a component which are considered
necessary to provide a given service. To provide a given service may also include asset value preservation. The
given service may be expressed or implied and may in some cases be below the original design specifications.

11.3.2 Component Terms

Component

Part, item, device, subsystem, functional unit, equipment, or system that can be individually described and
considered.

A number of components (e.g., a population of components) or a sample may itself be considered as a component.
A component may consist of hardware, software, or both. Software consists of programs, procedures, rules,
documentation, and data from an information processing system.

Consumable Component

Component or material that is expendable, may be regularly replaced, and generally is not component specific.
Generally, consumable components are relatively low cost compared to the component in which they are used.

Indenture Level

Level of subdivision within a component hierarchy. Examples of indenture levels are system, subsystem, and
component. From the maintenance perspective, the indenture level depends on the complexity of the component’s
construction, the accessibility to subcomponents, skill level of maintenance personnel, test equipment facilities, safety
considerations, etc.

Insurance Spare Part

Spare part which is not normally needed during the useful life of the component but whose unavailability would
involve an unacceptable downtime due to its provisioning. If the spare part is expensive, then for accountancy
purposes such a part may be considered as a capital asset.

Repairable Component

Component which, after a failure, may be restored under given conditions to a state in which it can perform a
required function. The given conditions may be economical, ecological, technical, and/or others.

Spare Part

Component intended to replace a corresponding component in order to retain or maintain the original required
function of the component. The original component may be subsequently repaired.

Any component that is dedicated to and/or exchangeable for a specific component is often referred to as a
replacement component.

“Like for like,” as applied to spare parts, means functionally equivalent: the replacement part meets the original
specification in terms of functionality, input range, output range, accuracy, operational specifications, and
materials of construction (MOC).

11.3.3 Reliability Terms

Active Redundancy

Redundancy wherein more than one means for performing a required function are operating simultaneously.
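For active redundancy, the required function is lost only if every redundant means fails. Assuming independent
channels (no common cause failures), system reliability is one minus the product of the channel unreliabilities. The
sketch below uses two hypothetical pumps, each with a reliability of 0.90 over the period of interest.

```python
from math import prod


def parallel_reliability(reliabilities):
    """Active (parallel) redundancy: the function is lost only if every
    channel fails. Assumes channels fail independently."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical: two identical pumps, each with R = 0.90 over the period.
print(f"single pump:        {0.90:.4f}")
print(f"two pumps (active): {parallel_reliability([0.90, 0.90]):.4f}")  # 0.9900
```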

Availability

The state in which a tangible asset is ready for its intended use. Ability to be in a state to perform as and when
required, under given conditions, assuming that the necessary external resources are provided.

This ability depends on the combined aspects of the reliability, maintainability, and recoverability of the component
and the maintenance supportability. Required external resources, other than maintenance resources, do not affect the
availability of the component although the component may not be available from the user’s viewpoint. Availability may
be quantified using appropriate measures or indicators and is then referred to as availability performance.
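One widely used way to quantify availability performance is as a ratio of mean times. The sketch below computes
inherent availability, MTBF / (MTBF + MTTR), and optionally adds a mean waiting time (MWT) for spare parts or
personnel as a simple proxy for maintenance support performance. This simplified model and its figures (MTBF of
2,000 h, MTTR of 8 h, MWT of 24 h) are assumptions for illustration.

```python
def availability(mtbf_hours, mttr_hours, mwt_hours=0.0):
    """Uptime fraction from mean times.

    With mwt_hours=0 this is inherent availability (failures and repair only);
    adding a mean waiting time gives a rough figure that also reflects
    maintenance support delays.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours + mwt_hours)


print(f"inherent:      {availability(2000, 8):.4f}")      # 0.9960
print(f"with 24 h MWT: {availability(2000, 8, 24):.4f}")  # 0.9843
```

The comparison shows how waiting for parts or people can cost more availability than the repair itself.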

Conformity

Fulfillment of a requirement.

Durability

Ability of a component to perform a required function under given conditions of use and maintenance, until a limiting
state is reached. A limiting state of a component may be characterized by the end of the useful life. The limiting state
may be redefined by changes in conditions of use.

Failure Modes and Effects Analysis (FMEA)

Systematic approach to analyzing the causes and effects of equipment failures and to determining the outcomes of all
known failure modes within a system or component.

Failure Mode, Effects, and Criticality Analysis (FMECA)

Extension of FMEA that adds a criticality analysis, ranking each failure mode by the combination of its severity and its
probability of occurrence.

Failure Rate

Number of failures of a component in a given time interval divided by the time interval. In some cases, the unit of
time can be replaced by a unit of use (e.g., number of cycles or operations). This can be applied to:

• Observed failure rate (as computed from a sample)

• Assessed failure rate (as inferred from sample information)

• Extrapolated failure rate (projected to other stress levels)

Fault Tree Analysis (FTA)

A top-down graphical method of modeling how a system failure can result from combinations of lower-level events,
using AND and OR logic gates in a tree form.

Intrinsic/Inherent Maintainability

Maintainability of a component determined by the original design.

Intrinsic/Inherent Reliability

Reliability of a component determined by design and manufacture.

Lifecycle

The stages that an equipment asset progresses through, from initial development through to disposal.

Maintainability

Ability of a component, under given conditions of use, to be retained in, or restored to, a state in which it can perform
a required function, when maintenance is performed under given conditions and using stated procedures and
resources.

Maintainability may be quantified using appropriate measures or indicators and is then referred to as maintainability
performance.

Obsolescence (for maintenance purposes)

Inability of a component to be maintained due to the unavailability on the market of the necessary resources at
acceptable technical and/or economic conditions. The necessary resources can include one or more subcomponents
needed to restore the component, as well as:

• Tools

• Monitoring or testing devices

• Documentary resources

• Skills, etc.

The unavailability of the resources can be due to:

• Technological development

• Market situation

• Absence of supplier

• Regulations

Redundancy

In a component, existence of more than one means for performing a required function when needed.

Reliability

The probability that the equipment will perform satisfactorily for a given period of time under stated conditions. Ability
of a component to perform a required function under given conditions for a given time interval.

It is assumed that the component is in a state to perform as required at the beginning of the time interval. Reliability
may be quantified as a probability or performance indicators by using appropriate measures and is then referred to
as reliability performance. In some cases, a given number of units of use can be considered instead of a given time
interval (number of cycles, number of running hours, number of kilometers, miles, etc.).

Reliability Growth

The positive improvement in a reliability parameter over a period of time due to the implementation of corrective
actions to system design, operation, maintenance procedures, or the associated manufacturing process.

Increase in reliability as a result of continued design modifications resulting from field data feedback.

Standby Redundancy

Redundancy wherein an alternative means for performing the particular function is only activated when the active
means is unavailable. Standby redundancy is often referred to as passive redundancy.
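The difference between standby and active redundancy can be illustrated with the classic two-unit case. Assuming
identical units with a constant failure rate λ, a standby unit that cannot fail while idle (cold standby), and perfect
switching, the standby pair has R(t) = e^(-λt)(1 + λt), compared with 2e^(-λt) - e^(-2λt) for the same pair in active
redundancy. The failure rate and period below are hypothetical, and real switching devices are not perfect.

```python
import math


def standby_reliability(t, lam):
    """Two identical units, cold standby, perfect switching, constant
    failure rate lam: R(t) = exp(-lam*t) * (1 + lam*t)."""
    return math.exp(-lam * t) * (1.0 + lam * t)


def active_reliability(t, lam):
    """Same two units in active redundancy: R(t) = 2e^(-lam*t) - e^(-2*lam*t)."""
    r = math.exp(-lam * t)
    return 2.0 * r - r * r


# Hypothetical: lambda = 0.0005 per hour over 2,000 h (lambda * t = 1).
t, lam = 2000, 0.0005
print(f"standby: {standby_reliability(t, lam):.4f}")  # 0.7358
print(f"active:  {active_reliability(t, lam):.4f}")   # 0.6004
```

Under these idealized assumptions standby redundancy performs better because the idle unit does not accumulate
failures; an imperfect switch or a standby unit that degrades while idle reduces that advantage.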

Useful Life

Time interval from a given instant until the instant when a limiting state is reached. The limiting state may be a
function of failure rate, maintenance support requirement, physical condition, economics, age, obsolescence,
changes in the user requirements, or other relevant factors.

11.3.4 Failure and Event Terms

Aging Failure

Failure whose probability of occurrence increases with the passage of calendar time. This time is independent of the
operating time of the component. Aging is a physical phenomenon which involves a modification of the physical and/
or chemical characteristics of the material.

Common Cause Failures

The result of an event(s) that, because of dependencies, causes a coincidence of failure states of components in
two or more separate channels of a redundant system, leading to the defined system failing to perform its intended
function. Common cause failures may reduce the effect of system redundancy.
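The effect of common cause failures on redundancy can be illustrated with a simplified beta-factor model, in which a
fraction β of each channel’s failures is assumed to disable both channels of a 1-out-of-2 redundant pair at once. The
channel failure probability q = 0.01 and the β values below are hypothetical; this is a sketch of the modeling idea,
not a substitute for a formal common cause analysis.

```python
def redundant_pair_failure_prob(q, beta):
    """Simplified beta-factor sketch for a 1-out-of-2 redundant pair.

    q:    probability that one channel fails over the period
    beta: fraction of channel failures assumed to be common cause
    """
    q_common = beta * q           # single event that fails both channels at once
    q_indep = (1.0 - beta) * q    # independent part of each channel's failure
    # The pair fails if the common cause event occurs, or otherwise if both
    # channels fail independently.
    return q_common + (1.0 - q_common) * q_indep ** 2


for beta in (0.0, 0.05, 0.10):
    print(f"beta={beta:.2f}: P(pair fails) = {redundant_pair_failure_prob(0.01, beta):.6f}")
```

Even a modest β dominates the result: with β = 0.10 the pair is roughly ten times more likely to fail than the
independent-failure estimate suggests.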

Criticality (of a failure or a fault)

Numerical index of the severity of a failure or a fault combined with the probability or frequency of its occurrence. The
numerical index in this context may be defined, for example, as an area in a matrix of severity versus frequency of
failure occurrence.
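A minimal sketch of a severity/frequency criticality index, assuming both are ranked on 1 to 5 scales and combined by
multiplication, with hypothetical thresholds mapping the index to low, medium, and high regions of the matrix.
Organizations define their own scales, combination rules, and regions, so every number and failure mode here is
illustrative.

```python
def criticality_index(severity, frequency):
    """Criticality as severity rank x frequency rank (both assumed on 1-5 scales)."""
    for value in (severity, frequency):
        if not 1 <= value <= 5:
            raise ValueError("ranks must be between 1 and 5")
    return severity * frequency


def criticality_region(index):
    """Map the index to a region of the matrix (hypothetical thresholds)."""
    if index >= 15:
        return "high"
    if index >= 6:
        return "medium"
    return "low"


# Hypothetical failure modes: (name, severity rank, frequency rank)
modes = [("seal leak", 4, 4), ("motor bearing wear", 3, 2), ("sight glass fouling", 1, 3)]
ranked = sorted(modes, key=lambda m: criticality_index(m[1], m[2]), reverse=True)
for name, s, f in ranked:
    idx = criticality_index(s, f)
    print(f"{name:20s} index={idx:2d} region={criticality_region(idx)}")
```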

Degradation

Detrimental change in physical condition, with time, use, or external cause. Degradation may lead to a failure. In
a system context, degradation may also be caused by failures within the system. (See Degraded State in Section
11.3.5.)

Failure

Termination of the ability of a component to perform a required function. After failure, the component has a fault,
which may be complete or partial. Failure is an event, as distinguished from fault which is a state. The concept as
defined does not apply to components consisting of software only.

Failure Cause

Circumstances during specification, design, manufacture, installation, use, or maintenance that result in failure.
Examples include:

• Misuse (caused by operation outside specified stress)

• Primary (not caused by an earlier failure)

• Secondary (caused by an earlier failure)

• Wear-out (caused by a mechanism whose failure rate increases with time or use)

• Design (caused by an intrinsic weakness)

• Software (caused by a program error despite no hardware failure)

Failure Criteria

Predefined conditions to be accepted as conclusive evidence of failure, e.g., a defined limiting state of wear,
crack propagation, performance degradation, leakage, emission, etc. beyond which it is deemed to be unsafe or
uneconomic to continue operation.

Failure Mechanism

Physical, chemical, or other processes which may lead or have led to the failure.

Failure Mode

Manner in which the inability of a component to perform a required function occurs. A failure mode may be defined by
the function lost or the state transition that occurred.

Hidden Failure

Failure that is not detected during normal operation.

Primary Failure

Failure of a component not caused either directly or indirectly by a failure or a fault of another component.

Secondary Failure

Failure of a component caused either directly or indirectly by a failure or a fault of another component.

Severity (of a Failure or a Fault)

Potential or actual detrimental consequences of a failure or a fault. The severity of a failure may be related to safety,
availability, costs, quality, environment, etc.

Sudden Failure

Failure that could not be anticipated by prior examination or monitoring.

Wear-Out Failure

Failure whose probability of occurrence increases with the operating time or the number of operations of the
component and the associated applied stresses. Wear-out is a physical phenomenon which results in a loss,
deformation or change of material.
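Wear-out behavior is often modeled with the Weibull distribution, whose hazard (instantaneous failure rate) is
h(t) = (β/η)(t/η)^(β-1) for shape parameter β and scale parameter η. A shape parameter greater than 1 gives a failure
rate that rises with operating time, matching the definition above; β = 1 gives a constant rate and β < 1 a
decreasing one. The parameters below (β = 2.5, η = 10,000 h) are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard: h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical bearing population: shape = 2.5 (> 1, wear-out), scale = 10,000 h.
for t in (2000, 5000, 8000):
    print(f"t = {t:5d} h: hazard = {weibull_hazard(t, 2.5, 10000):.2e} per hour")
```

A hazard that climbs with operating time is the quantitative signature of wear-out, and is what makes time- or
usage-based replacement worth considering for such components.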

11.3.5 Faults and States Terms

Degraded State

State in which the ability to provide the required function is reduced, but within defined limits of acceptability. A
degraded state may be the result of faults at lower indenture levels.

Disabled State/Outage

State of a component characterized by its inability to perform a required function, for any reason. A disabled state
may be either an up or down state.

Down State

State of a component characterized either by a fault, or by a possible inability to perform a required function during
PM. This state is related to availability performance. A down state is sometimes referred to as an internal disabled
state.

External Disabled State

Subset of the disabled state when the component is in an up state, but lacks required external resources or is
disabled due to planned actions other than maintenance.

Fault

State of a component characterized by inability to perform a required function, excluding the inability during PM
or other planned actions, or due to lack of external resources. A fault usually results from a failure, but in some
circumstances it may exist without a prior failure (a preexisting fault).

Fault Masking

Condition in which a fault exists in a subcomponent of a component but cannot be recognized because of a fault of
the component or because of another fault of that subcomponent or of another subcomponent. Fault masking may
conceal a progressive loss of redundancy.

Hazardous State

State of a component assessed as likely to result in an injury to persons, significant material damage, or other
unacceptable consequences.

Idle State

State of a component which is in an up state and non-operating, during non-required time.

Latent Fault

Existing fault that has not become apparent.

Operating State

State when a component is performing as required.

Partial Fault

Fault characterized by the fact that a component can only perform some, but not all, of the required functions. In
some cases, it may be possible to use the component with reduced performance.

Shutdown

Outage scheduled in advance for maintenance or other purposes. Shutdown may also be called planned outage.

Software Fault/Bug

Condition of a software component that may prevent it from performing as required.

Standby State

State of a component which is in an up state and non-operating during the required time.

Up State

State of a component characterized by the fact that it can perform a required function, assuming that the external
resources, if required, are provided.

11.3.6 Maintenance Terms

Compliance Test

Test used to show whether or not a characteristic or a property of a component complies with the stated
requirements.

Component Register

Record of individually identified components. Additional information, such as location, may also be stored on the
component register. The level of individual components to be registered should be specified.

Condition Monitoring

Activity, performed either manually or automatically, intended to measure at predetermined intervals the
characteristics and parameters of the actual state of a component. Monitoring is distinguished from inspection in that
it is used to evaluate any changes in the parameters of the component with time. Monitoring may be continuous, over
a time interval or after a given number of operations. Monitoring is usually carried out in the operating state. Review
and interpretation of operational, testing, or calibration data can indicate the condition of an asset and its potential
failure characteristics, so that the correct point of intervention can be established.

Condition-Based Maintenance (CBM)

PM that includes a combination of condition monitoring/inspection/testing, analysis, and the ensuing maintenance
actions. The condition monitoring/inspection/testing may be scheduled, by request, or continuous.

Corrective Maintenance (CM)

Maintenance carried out after fault recognition and intended to put a component into a state in which it can perform a
required function.

Deferred Corrective Maintenance

CM that is not immediately carried out after a fault detection but is delayed in accordance with given rules.

Failure Analysis

Logical and systematic examination of component failure modes and causes, before or after a failure, to identify the
consequences of failure as well as the probability of its occurrence. Failure analysis is generally performed to improve
dependability.

Fault Diagnosis

Actions taken for fault recognition, fault localization, and identification of causes.

Fault Localization

Actions taken to identify the faulty component at the appropriate indenture level. These actions may include black box
testing (a means of testing in which test cases are chosen using only the functional specifications of the component).

Function Checkout

Action taken after maintenance actions to verify that the component is able to perform as required. Function checkout
is usually carried out after the component has been in a down state.

Immediate Corrective Maintenance

CM that is carried out without delay after a fault has been detected to avoid unacceptable consequences.

Improvement

Combination of all technical, administrative, and managerial actions intended to improve the reliability,
maintainability, and/or safety of a component without changing its original function. An improvement may also be
introduced to prevent misuse in operation and to avoid failures.

Inspection

Examination for conformity by measuring, observing, or testing the relevant characteristics of a component.

Line of Maintenance/Maintenance Echelon

Position in an organization where specified levels of maintenance are to be carried out on a component. Examples of
line of maintenance are field (first line maintenance), workshop (second line maintenance), and manufacturer (third
line maintenance). The lines of maintenance are characterized by the skill required of the personnel, the facilities
available, the location, the complexity of the maintenance task, etc.

Maintenance Level

Maintenance task categorization by complexity. These tasks are divided into levels of increasing complexity.
Examples include:

• Level 1 is characterized by simple actions carried out with minimal training.

• Level 2 is characterized by basic actions which should be carried out by qualified personnel using detailed
procedures.

• Level 3 is characterized by complex actions carried out by qualified technical personnel using detailed
procedures.

• Level 4 is characterized by actions that require know-how in a specific technique or technology and are carried
out by specialized technical personnel.

• Level 5 is characterized by actions that require knowledge held by the manufacturer or a specialized company
with industrial logistic support equipment.

The maintenance level may be related to the indenture level.

Maintenance Outsourcing

Contracting out of all or part of the maintenance activities of an organization for a stated period of time. When all
maintenance activities are outsourced, this is referred to as complete maintenance outsourcing.

Maintenance Record

Part of maintenance documentation which contains the history of all maintenance related data for a component. The
history may contain records of all failures, faults, costs, component availability, up time and any other relevant data.

Maintenance Schedule

Plan produced in advance detailing when a specific maintenance task should be carried out.

Maintenance Support

Provision of resources, services, and management necessary to carry out maintenance. The provision may include,
for example, personnel, test equipment, workrooms, spare parts, documentation, tools, etc.

Maintenance Task Preparation

Supplying all necessary information and identifying the required resources to enable the maintenance task to be
carried out. The preparation may include a description of how to perform the work, reference to valid instructions
and/or documentation, required permits, spare parts, skill, tools, etc.

Mean Time Between Failures (MTBF) and Mean Time to Fail (MTTF)

The total cumulative functioning time of a population divided by the number of failures. As with failure rate, the
value may be observed, assessed, or extrapolated.

• MTBF is used for components that involve repair.

• MTTF is used for components with no repair.
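
The definition above is a single ratio, so a sketch is short. It assumes that functioning hours and failure counts have already been aggregated for the population; the pump fleet figures are hypothetical.

```python
def mean_time_between_failures(total_functioning_hours: float, failures: int) -> float:
    """Cumulative functioning time of a population divided by its number of failures.

    The same ratio gives MTBF for repairable components and MTTF for non-repairable ones.
    """
    if failures <= 0:
        raise ValueError("at least one failure is needed; with none, the ratio is undefined")
    return total_functioning_hours / failures


# Hypothetical population: 10 pumps, each functioning 4,000 h in the period, 5 failures in total.
fleet_hours = 10 * 4_000
print(mean_time_between_failures(fleet_hours, 5))  # 8000.0
```

Only the interpretation differs between MTBF and MTTF; the arithmetic is identical. A period with no failures does not yield a ratio at all and would need a separate statistical estimate.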

Mean Time to Repair (MTTR)

The mean time to carry out a defined maintenance action. It usually refers to CM.
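
A minimal sketch, assuming the durations of the defined maintenance action have already been extracted from work orders. Whether these durations include logistic delays is a site decision that this sketch does not make. The work order values are hypothetical.

```python
from statistics import mean


def mean_time_to_repair(repair_durations_hours: list[float]) -> float:
    """MTTR: arithmetic mean of the durations of a defined (usually corrective) maintenance action."""
    if not repair_durations_hours:
        raise ValueError("no repair durations recorded")
    return mean(repair_durations_hours)


# Hypothetical CM work orders for one filler, durations in hours.
print(mean_time_to_repair([2.0, 3.5, 1.5, 5.0]))  # 3.0
```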

Modification

Combination of all technical, administrative, and managerial actions intended to change one or more functions
of a component. Modification is not a maintenance action but has to do with changing the required function of a
component to a new required function. The changes may have an influence on the dependability characteristics.
Modification may involve the maintenance organization. Replacing the original component with a different version,
without changing the function or upgrading the dependability of the component, is called a replacement and is not a
modification.

Online Maintenance

Maintenance carried out on the component while it is operating and without impact on its performance.

On-site Maintenance

Maintenance carried out at the location where the component is normally located.

Operator Maintenance

Maintenance actions carried out by an operator. Such maintenance actions should be clearly defined.

Overhaul

Comprehensive set of PM actions carried out to maintain the required level of performance of a component. Overhaul
may be performed at prescribed intervals of time or after a set number of operations. Overhaul may require a
complete or partial dismantling of the component.

Predictive Maintenance (PdM)

Condition-based maintenance carried out following a forecast derived from repeated analysis or known
characteristics, and from evaluation of the significant parameters of the component's degradation.
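
One simple way to turn repeated measurements of a degradation parameter into a forecast is to fit a trend and extrapolate it to a limit. The sketch below assumes the degradation is roughly linear in time and uses an ordinary least-squares line. Real PdM programs often use other degradation models. The wear readings and the limit of 4.0 are hypothetical.

```python
from typing import Optional


def forecast_limit_crossing(times: list[float], values: list[float], limit: float) -> Optional[float]:
    """Forecast when a degradation parameter will reach `limit`.

    Assumes the degradation is roughly linear in time (a simplifying assumption) and
    fits value = intercept + slope * time by ordinary least squares.
    Returns None if the trend is flat, moving away from the limit, or already past it.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired (time, value) observations")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("observation times must not all be equal")
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = s_tv / s_tt
    intercept = v_mean - slope * t_mean
    if slope == 0:
        return None
    crossing = (limit - intercept) / slope
    # A crossing before the latest observation means the trend is moving away or the limit is already passed.
    return crossing if crossing >= max(times) else None


# Hypothetical wear measurements every 30 days against a hypothetical limit of 4.0.
days = [0, 30, 60, 90]
wear = [1.0, 1.5, 2.0, 2.5]
print(forecast_limit_crossing(days, wear, limit=4.0))  # approximately 180.0
```

The forecast crossing time, less a planning margin, would indicate when the maintenance action should be scheduled.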

Preventive Maintenance (PM)

Maintenance carried out at predetermined intervals, or according to prescribed criteria, and intended to reduce the
probability of failure or the degradation of the functioning of a component.

Reactive Maintenance

Operating a piece of equipment until it fails and then making a repair or replacement. Failure does not necessarily
imply a complete breakdown; it might just mean that the equipment is operating outside of specified parameters.

Rebuilding

Action following the dismantling of a component, and the repair or replacement of those subcomponents that are
approaching the end of their useful life and/or should be regularly replaced. Rebuilding differs from overhaul in that
the actions may include modifications and/or improvements. The objective of rebuilding is normally to provide a
component with an extended useful life.

Remote Maintenance

Maintenance of a component carried out without physical access by personnel to the component.

Repair

Physical action taken to restore the required function of a faulty component. Repair also includes fault localization
and function checkout.

Repair Rate

The reciprocal of MTTR.
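
Because the repair rate is simply the reciprocal of MTTR, its units follow from the units chosen for MTTR. A minimal sketch, assuming MTTR is expressed in hours:

```python
def repair_rate(mttr_hours: float) -> float:
    """Repair rate = 1 / MTTR; with MTTR in hours, the result is in repairs per hour."""
    if mttr_hours <= 0:
        raise ValueError("MTTR must be positive")
    return 1.0 / mttr_hours


print(repair_rate(4.0))  # 0.25 (hypothetical MTTR of 4 h)
```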

Restoration

Event at which the ability to perform as required is reestablished after a failure.

Routine Maintenance

Regular or repeated simple PM activities. Routine maintenance may include, for example, cleaning, tightening of
connections, replacement of connectors, checking liquid level, lubrication, etc.

Scheduled Maintenance

Maintenance carried out in accordance with an established time schedule or established number of units of use.
Deferred corrective maintenance may also be scheduled.
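
The two triggers in this definition, a time schedule and a number of units of use, can be checked together. The sketch below assumes the common "whichever comes first" rule when both intervals are set. The function name and the capper intervals are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def is_due(last_done: date, today: date, units_since_last: int,
           interval_days: Optional[int] = None,
           interval_units: Optional[int] = None) -> bool:
    """True when maintenance is due by calendar time or by units of use.

    When both intervals are set, this assumes the common 'whichever comes first' rule.
    """
    if interval_days is None and interval_units is None:
        raise ValueError("set a time interval, a usage interval, or both")
    due_by_time = interval_days is not None and today >= last_done + timedelta(days=interval_days)
    due_by_use = interval_units is not None and units_since_last >= interval_units
    return due_by_time or due_by_use


# Hypothetical capper: serviced every 90 days or every 10,000 cycles.
print(is_due(date(2024, 1, 1), date(2024, 2, 15), 12_000, interval_days=90, interval_units=10_000))  # True
print(is_due(date(2024, 1, 1), date(2024, 2, 15), 5_000, interval_days=90, interval_units=10_000))   # False
```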

Temporary Repair

Physical action taken to allow a faulty component to perform its required function for a limited time interval,
until a repair is carried out.

Unplanned Maintenance

Maintenance performed without prior planning in response to an immediate need, either because a piece of
equipment has failed or because failure is imminent.

11.3.7 Time Terms

Active Maintenance Time

Part of the maintenance time when active maintenance is carried out on a component, excluding logistic delays. An
active maintenance action may be carried out while the component is functioning.

Active Preventive Maintenance Task Time

Time interval when an active PM task is carried out. All technical delays are excluded from active PM task time.

Constant Failure Period

Period in the life of a component when the instantaneous failure intensity for a repairable component, or the
instantaneous failure rate for a non-repairable component, is approximately constant.

Corrective Maintenance Time

Part of the maintenance time when CM is carried out on a component, including logistic delays.

Downtime

The time during which an asset is not available for its intended use. Downtime may be composed of both planned
and unplanned events.

Time interval throughout which a component is in a down state.
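
Because downtime may combine planned and unplanned events, it is often reported in both categories. A minimal sketch, assuming each downtime event has already been tagged as "planned" or "unplanned" (hypothetical labels) with a duration in hours:

```python
from collections import defaultdict


def downtime_by_category(events: list[tuple[str, float]]) -> dict[str, float]:
    """Total downtime hours per category; categories here are limited to 'planned' and 'unplanned'."""
    totals: dict[str, float] = defaultdict(float)
    for category, hours in events:
        if category not in ("planned", "unplanned"):
            raise ValueError(f"unknown downtime category: {category!r}")
        totals[category] += hours
    return dict(totals)


# Hypothetical month for one granulator: one planned PM stop and two breakdowns.
month = [("planned", 8.0), ("unplanned", 2.5), ("unplanned", 1.5)]
print(downtime_by_category(month))  # {'planned': 8.0, 'unplanned': 4.0}
```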

Early Failure Period

Time interval in early life of a component when the instantaneous failure intensity for a repairable component, or the
instantaneous failure rate for a non-repairable component, is significantly higher than that of the subsequent period.

External Disabled Time

Time interval throughout which a component is in an external disabled state.

Idle Time

Time interval throughout which a component is in an idle state.

Lifecycle Cost

All costs generated during the lifecycle of the component. For a user or an owner of a component, the total lifecycle
cost may include only those costs pertaining to acquisition, operation, maintenance, and disposal.
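
For an owner, the four cost elements named above can be combined in a simple sum. The sketch assumes constant annual operating and maintenance costs and applies no discounting. Capital decisions often discount future costs to a net present value, which this sketch deliberately leaves out. The tablet press figures are hypothetical.

```python
def lifecycle_cost(acquisition: float, annual_operation: float, annual_maintenance: float,
                   disposal: float, service_years: int) -> float:
    """Undiscounted owner's lifecycle cost.

    acquisition + service_years * (annual operation + annual maintenance) + disposal.
    Constant annual costs and no discounting are simplifying assumptions.
    """
    if service_years < 0:
        raise ValueError("service life cannot be negative")
    return acquisition + service_years * (annual_operation + annual_maintenance) + disposal


# Hypothetical tablet press over a 10-year service life (currency units arbitrary).
print(lifecycle_cost(500_000, 40_000, 25_000, 20_000, 10))  # 1170000
```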

Logistic Delay

Accumulated time when maintenance cannot be carried out due to the need to acquire maintenance resources,
excluding any administrative delay. Logistic delays can be due to, for example, traveling to installations, pending
arrival of spare parts, specialists, test equipment and information, and unsuitable environmental conditions.

Maintenance Time

Time interval when maintenance is carried out on a component, including technical and logistic delays. Maintenance
may be carried out while the component is functioning.
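
The time terms in this section nest inside one another. Maintenance time includes technical and logistic delays. Active maintenance time excludes logistic delays. Active task time also excludes technical delays, mirroring the definition of active PM task time. The sketch below assumes that each work order is logged as segments tagged "work", "technical_delay", or "logistic_delay". These tags are hypothetical, and administrative delays are not modeled.

```python
from collections import defaultdict

SEGMENT_KINDS = ("work", "technical_delay", "logistic_delay")


def maintenance_time_breakdown(segments: list[tuple[str, float]]) -> dict[str, float]:
    """Split one maintenance action's logged segments (hours) into the time terms of this section."""
    hours: dict[str, float] = defaultdict(float)
    for kind, duration in segments:
        if kind not in SEGMENT_KINDS:
            raise ValueError(f"unknown segment kind: {kind!r}")
        hours[kind] += duration
    maintenance_time = hours["work"] + hours["technical_delay"] + hours["logistic_delay"]
    return {
        # Maintenance time: includes technical and logistic delays.
        "maintenance_time": maintenance_time,
        # Active maintenance time: excludes logistic delays.
        "active_maintenance_time": maintenance_time - hours["logistic_delay"],
        # Active task time: technical delays are excluded as well, leaving hands-on work.
        "active_task_time": hours["work"],
    }


# Hypothetical work order: make safe, wait for a spare part, replace seal, set up test rig.
log = [("technical_delay", 0.5), ("logistic_delay", 3.0), ("work", 2.0), ("technical_delay", 0.5)]
print(maintenance_time_breakdown(log))
# {'maintenance_time': 6.0, 'active_maintenance_time': 3.0, 'active_task_time': 2.0}
```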

Operating Time

Time interval throughout which a component is in the operating state.

Operating Time Between Failures

Total time duration of operating time between two consecutive failures of a component.

Operating Time to Failure

Accumulated operating time of a component from the instant it is first put into use until failure, or from the
instant of restoration until the next failure. Operating time between failures is a special case of operating time to
failure. Time to failure is often used instead of operating time to failure.

Preventive Maintenance Time

Part of maintenance time when PM is carried out on a component, including logistic delays.

Repair Time

Part of active CM time when repair is carried out on a component.

Required Time

Time interval throughout which the component is required to be in an up state.

Standby Time

Time interval throughout which a component is in a standby state.

Technical Delay

Accumulated time necessary to perform auxiliary technical actions associated with, but not part of, the maintenance
action, e.g., rendering the equipment safe and setting up test equipment.

Time Between Failures

Time duration between two consecutive failures of a component. Time between failures may include non-operating
time after restoration.

Time to Restoration

Time interval for which a component is in a down state due to a failure. Downtime for other reasons, e.g., for PM,
is excluded.
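
A minimal sketch of applying this definition to an outage log, assuming each outage is recorded with a cause label and the down and restored timestamps. The "failure" and "pm" labels are hypothetical. Outages with any cause other than failure are skipped, as the definition requires.

```python
from datetime import datetime


def times_to_restoration(outages: list[tuple[str, datetime, datetime]]) -> list[float]:
    """Hours from failure to restoration for failure-caused outages only.

    Each outage is (cause, down_at, restored_at); the cause labels are hypothetical.
    Outages for PM or other reasons are skipped.
    """
    hours = []
    for cause, down_at, restored_at in outages:
        if cause != "failure":
            continue
        if restored_at < down_at:
            raise ValueError("restoration cannot precede the failure")
        hours.append((restored_at - down_at).total_seconds() / 3600)
    return hours


outages = [
    ("failure", datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 11, 30)),
    ("pm", datetime(2024, 3, 3, 6, 0), datetime(2024, 3, 3, 14, 0)),
    ("failure", datetime(2024, 3, 5, 22, 0), datetime(2024, 3, 6, 1, 0)),
]
print(times_to_restoration(outages))  # [3.5, 3.0]
```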

Up Time

Time interval throughout which a component is in an up state.

Wear-Out Failure Period

Period in the life of a component when the instantaneous failure intensity for a repairable component, or the
instantaneous failure rate for a non-repairable component, increases significantly with time.
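
The early failure, constant failure, and wear-out failure periods correspond to a decreasing, roughly constant, and increasing failure rate. One common way to tell them apart, which this Guide does not prescribe, is the Weibull shape parameter β: β < 1, β ≈ 1, and β > 1 respectively. The sketch estimates β by median-rank regression using Bernard's approximation. It assumes complete times to failure of non-repairable components, with no suspensions. The ±0.1 band around 1 and the seal data are hypothetical. The failure intensity of repairable components is usually analyzed with other models, such as the Crow-AMSAA (power-law) process.

```python
import math


def weibull_shape(failure_times: list[float]) -> float:
    """Estimate the Weibull shape parameter beta by median-rank regression.

    Assumes complete data: every unit ran to failure, with no suspensions (censoring).
    Median ranks use Bernard's approximation F_i = (i - 0.3) / (n + 0.4).
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or any(t <= 0 for t in times):
        raise ValueError("need at least two positive failure times")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("failure times must not all be equal")
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return s_xy / s_xx  # slope of ln(-ln(1 - F)) against ln(t) estimates beta


def failure_period(beta: float, tolerance: float = 0.1) -> str:
    """Map beta to a life period; the +/- tolerance band around 1 is a hypothetical choice."""
    if beta < 1 - tolerance:
        return "early failure period"      # failure rate decreasing with time
    if beta > 1 + tolerance:
        return "wear-out failure period"   # failure rate increasing with time
    return "constant failure period"       # failure rate approximately constant


# Hypothetical operating hours to failure for six identical, non-repaired seals.
hours_to_failure = [410.0, 620.0, 770.0, 890.0, 1010.0, 1150.0]
beta = weibull_shape(hours_to_failure)
print(round(beta, 2), failure_period(beta))
```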

600 N. Westshore Blvd., Suite 900, Tampa, Florida 33609 USA
Tel: +1-813-960-2105, Fax: +1-813-264-2816

www.ISPE.org